2024-05-11 09:49 PM
Dear Community,
I am porting an old application from AVR to STM32, and I am facing a strange timing issue.
In a nutshell, the application reads a sector (512 bytes) from an SD card and outputs the content of the buffer to a GPIO with a 4 us cycle (3 us low, then 1 us of data signal).
The SD card read is working fine, and I have written a small assembly routine to output the GPIO signal with precise MCU cycle counting.
Using the DWT cycle counter in the debugger, it gives a very stable and precise count (288 cycles for a total of 4 us).
When using a logic analyser sampling at 24 MHz, I can see the signal shifted by 1 or 2 CPU cycles, and so a delay.
I have tried writing to ODR directly and to BSRR, but with no luck.
Attached:
- Screenshot of the logic analyzer
- Clock configuration
- Port configuration
I do not know where to look, to be honest.
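A quick sanity check on the measurement itself may help here. This is only a sketch, assuming the core runs at 72 MHz (which is what 288 cycles in 4 us implies) and that the analyzer samples asynchronously at 24 MHz. The function names are made up for illustration.

```c
#include <stdint.h>

/* Assumed figures from the post: 288 cycles per 4 us cell -> 72 MHz core clock. */
#define CPU_HZ      72000000u
#define ANALYZER_HZ 24000000u

/* CPU cycles that elapse during one logic-analyzer sample period. */
uint32_t cpu_cycles_per_sample(void)
{
    return CPU_HZ / ANALYZER_HZ;
}

/* CPU cycles contained in a time span given in nanoseconds. */
uint32_t cycles_in_ns(uint32_t ns)
{
    return (uint32_t)(((uint64_t)CPU_HZ * ns) / 1000000000u);
}
```

With these numbers one analyzer sample spans 3 CPU cycles (about 41.7 ns), so a shift of 1 or 2 cycles is finer than the capture can resolve, and an edge can appear to move by one sample purely because of where the sampling instants fall.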
2024-05-12 08:50 PM
Yes, this is my goal. I have done it with the ATMEGA328P, but the SPI speed does not allow accurate writing.
I had to use a lot of tricks to make it work... now I am trying with the STM32.
My approach is:
- FatFS to select the right file.
- Direct FAT allocation reading to get the cluster / sector match.
- Reading is very fast on the STM32: less than 3 ms to read a sector. As per the specification I have 20 ms.
- Using assembly to send the buffer bit by bit (not working), so I am now testing the approach with DMA and a timer.
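For the DMA and timer approach, one possible way to prepare the data is sketched below. It assumes a timer triggers one DMA transfer per microsecond into GPIOx_BSRR, that bits go out MSB first, and that each 4 us cell is three low slots followed by one data slot. The function name and the slot layout are illustrative assumptions, not something confirmed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOTS_PER_BIT 4u  /* 4 us bit cell, one DMA transfer per microsecond (assumed) */

/* Expand `len` bytes into BSRR words, MSB first: three "reset" slots,
 * then one data slot. `out` must hold len * 8 * SLOTS_PER_BIT words.
 * Returns the number of words written. */
size_t expand_to_bsrr(const uint8_t *src, size_t len, unsigned pin, uint32_t *out)
{
    assert(pin < 16u);
    const uint32_t set_word   = 1u << pin;          /* BSy bit: drive pin high */
    const uint32_t reset_word = 1u << (pin + 16u);  /* BRy bit: drive pin low  */
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        for (int b = 7; b >= 0; b--) {
            out[n++] = reset_word;
            out[n++] = reset_word;
            out[n++] = reset_word;
            out[n++] = ((src[i] >> b) & 1u) ? set_word : reset_word;
        }
    }
    return n;
}
```

Expanded this way, a full 512-byte sector would need 16384 words (64 KiB), which is more than the 20 KB of SRAM on an F103C8, so the buffer has to be produced in chunks while the DMA is running.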
I do not get your point about USART in synchronous mode or SPI. Do you mean having SPI to DMA and then DMA to USART?
Just a side question: I have the feeling that my Blue Pill does not have a genuine ST chip (the ID is different). Would that impact the cycle-to-cycle predictability?
Vincent
2024-05-12 11:25 PM - edited 2024-05-12 11:27 PM
>Just a side question: I have the feeling that my Blue Pill does not have a genuine ST chip (the ID is different). Would that impact the cycle-to-cycle predictability?
No, it is the same, because it is the same core.
See here for the chips inside:
https://www.richis-lab.de/STM32.htm
> I have the feeling that my Blue Pill does not have a genuine ST chip (the ID is different)
What's written on the chip?
What's its ID?
2024-05-12 11:33 PM
It is written:
STM32
F103C8T6
991KA 93
MYS A11
The CPU ID is 0x2ba01477 instead of 0x1ba01477
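To see exactly where the two values differ, they can be split into fields. This is a sketch assuming the number is the debug-port IDCODE that OpenOCD reports and that it follows the standard JTAG IDCODE layout (version in bits 31:28, part number in bits 27:12, JEP106 designer code in bits 11:1). The struct and function names are made up for illustration.

```c
#include <stdint.h>

typedef struct {
    unsigned version;   /* bits 31:28 */
    unsigned partno;    /* bits 27:12 */
    unsigned designer;  /* bits 11:1, JEP106 code */
} idcode_fields;

/* Split a 32-bit IDCODE into its three fields (bit 0 is always 1 and is ignored). */
idcode_fields decode_idcode(uint32_t id)
{
    idcode_fields f;
    f.version  = (id >> 28) & 0xFu;
    f.partno   = (id >> 12) & 0xFFFFu;
    f.designer = (id >> 1)  & 0x7FFu;
    return f;
}
```

Decoded this way, both IDs give part number 0xBA01 and designer 0x23B (ARM); only the version field differs, 2 instead of 1.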
2024-05-12 11:51 PM - edited 2024-05-12 11:55 PM
So it's probably a CS32F103C8T6 by CKS, not by ST. :)
see: https://www.eevblog.com/forum/beginners/unexpected-idcode-flashing-bluepill-clone/
But it should work fine too (it's just the same ARM core made by another company).
The bad (illegal) part is just that it's re-labeled, so it's a "fake STM32F103".
+
Maybe you cannot debug in STM32CubeIDE, because for some versions now it has been checking for the "correct" chip ID.
For debugging there you need a genuine ST chip.
2024-05-12 11:53 PM
OK, thanks.
I am using OpenOCD and debugging works fine.
I am implementing the DMA approach, but after that I am keen to understand why I get a shift of a cycle or more with the assembly code (maybe I need to disable all interrupts).
Vincent
2024-05-13 12:28 AM
Aaaa, you cannot expect fixed timing when an interrupt might happen.
Using DMA should be "better", but DMA and CPU still access the same internal bus, so any access might get one or more wait states until it gets the bus, if the bus is busy with another transfer at that moment.
2024-05-13 12:30 AM
You say "I am keen to understand why I get a shift of a cycle or more with the assembly code (maybe I need to disable all interrupts)".
If any interrupt happens, it will cost many more than one or two cycles. The processor has to save several registers onto the stack, jump to the interrupt service routine, do whatever is coded there, then pop those registers off the stack before resuming whatever your code was doing.
I think an approach to improve cycle accuracy would be to have only one clock in the system. Run the processor sufficiently slowly that you don't have any FLASH wait states (24 MHz?), and have AHB, APB1 and APB2 all at the same clock rate as SysClk. That's if you can get all the necessary processing done while running that slowly.
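As a rough illustration of that trade-off, the sketch below encodes the flash wait-state ranges given in the STM32F1 reference manual (0 up to 24 MHz, 1 up to 48 MHz, 2 up to 72 MHz) together with the cycle budget per pulse. A clone chip may behave differently, and the function names are made up for illustration.

```c
#include <stdint.h>

/* Flash wait states needed for a given SYSCLK, following the ranges in the
 * STM32F1 reference manual. Returns -1 above the 72 MHz maximum. */
int flash_wait_states(uint32_t sysclk_hz)
{
    if (sysclk_hz <= 24000000u) return 0;
    if (sysclk_hz <= 48000000u) return 1;
    if (sysclk_hz <= 72000000u) return 2;
    return -1;
}

/* CPU cycles available inside one pulse lasting `pulse_us` microseconds. */
uint32_t cycles_per_pulse(uint32_t sysclk_hz, uint32_t pulse_us)
{
    return (sysclk_hz / 1000000u) * pulse_us;
}
```

At 24 MHz there are no wait states but only 24 cycles in a 1 us pulse; at 72 MHz there are 72 cycles per pulse, with 2 wait states on flash accesses.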
Thinking about your application - emulating or linking-to a disk drive, I don't currently see why timing is so critical. I would expect even a synchronous motor on a genuine disk drive to suffer from some speed variation. The recorded data - FM, MFM or whatever, should be "self-clocking" in that the time between edges determines whether you have a 1 or 0 of data, and there should be significant latitude in those timings. The only exception is if the receiving end is poorly emulated and samples an entire sector of data based on the timing of the first edge, in which case any clock drift between sender and receiver can easily add up to more than a bit period. I think the fix is to use a different algorithm for reception e.g. make it self-clock as data comes in or sample sufficiently frequently so as not to miss any edges even if the clock drifts.
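To make the self-clocking idea concrete, here is a minimal sketch of a receiver that re-times on every pulse instead of counting from the first edge. It assumes a nominal 4 us cell where a pulse means 1 and an empty cell means 0; it only illustrates the principle and is not a description of how the real receiving hardware works.

```c
#include <stddef.h>
#include <stdint.h>

#define CELL_NS 4000u  /* nominal 4 us bit cell, from the thread */

/* Decode pulse timestamps (ns, ascending) into bits: each pulse is a 1,
 * each empty cell between two pulses is a 0. The first pulse is taken as
 * the first 1. Returns the number of bits written (at most `max_bits`). */
size_t decode_pulses(const uint32_t *t_ns, size_t n_pulses,
                     uint8_t *bits, size_t max_bits)
{
    size_t n = 0;

    for (size_t i = 0; i < n_pulses && n < max_bits; i++) {
        if (i > 0) {
            uint32_t gap   = t_ns[i] - t_ns[i - 1];
            uint32_t cells = (gap + CELL_NS / 2u) / CELL_NS;  /* round to nearest cell */
            for (uint32_t z = 1; z < cells && n < max_bits; z++)
                bits[n++] = 0;
        }
        if (n < max_bits)
            bits[n++] = 1;
    }
    return n;
}
```

Because each gap is measured from the previous pulse, a timing error of a few tens of nanoseconds per cell does not accumulate over the sector.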
2024-05-13 12:34 AM
Have you tried running the code from RAM instead of FLASH? This might improve timing.
2024-05-13 02:24 AM - edited 2024-05-13 02:25 AM
Forget about bit-banging, whether asm or DMA.
If you want cycle precision, just use a timer. Or SPI, or whatever other hardware is suitable.
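As one way to picture the SPI variant: if the SPI bit clock can be set to 1 MHz (an assumption; the prescalers are powers of two, so this needs a suitable bus clock, for example 64 MHz / 64), each 4 us cell becomes four SPI bits, 0001 for a 1 and 0000 for a 0. The sketch below assumes MSB-first shifting and the "3 us low, then 1 us data" cell from the first post; the function name is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand each source bit into a 4-bit SPI pattern (three low slots, then the
 * data slot), MSB first. Two source bits fit in one output byte, so `out`
 * must hold 4 * len bytes. Returns the number of bytes written. */
size_t encode_for_spi(const uint8_t *src, size_t len, uint8_t *out)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        for (int b = 7; b >= 1; b -= 2) {
            uint8_t first  = (src[i] >> b) & 1u;        /* sent in the high nibble */
            uint8_t second = (src[i] >> (b - 1)) & 1u;  /* sent in the low nibble  */
            out[n++] = (uint8_t)((first << 4) | second);
        }
    }
    return n;
}
```

A 512-byte sector expands to 2048 bytes this way, and the shift register rather than the CPU determines the edge timing, provided the transfer is fed continuously (for example by DMA) so there are no gaps between bytes.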
JW
2024-05-13 05:01 AM
Hello Danish1,
I do not have any interrupt handler implemented, which is why it is very strange.
Running at 24 MHz is not fast enough, 24 *** cycle to issue on GPIO will not work (same issue as the ATMEGA328P that was overclocked to 27 MHz).
I need to understand what makes the asm code take more time to execute (and I will try to experiment). What is strange is that the first data pulse iteration does not take 1 us but systematically 2.5 us, and after that I have 1 us or 1.04 us.
Synchronisation and accuracy of the clock are critical, as it is a single-GPIO interface without any clock pulse. At the beginning of the transmission there are 5 repetitions of a synchronisation pattern made of 10 bits (FFFFFFFF00), and then the real data transfer starts: 402 bytes of data.
After that, the disk head can move for 20 ms, based on 4 GPIO interrupts.
To better understand this, there is a small old book, Beneath Apple II DOS, where it is explained in section 3.7.
What does not make sense is why on an AVR I can manage to get a very precise clock cycle, and here on the STM32 it is not the case. I must have made a mistake.
I will try the following:
- Removing all interrupt handlers
- Reducing the speed of the clock
- Moving the asm into SRAM
- Finishing the implementation of the DMA with half buffering
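For the last item, the half-buffering logic can be sketched on its own, independent of the hardware. This assumes a circular DMA buffer split into two halves, with a half-transfer event and a transfer-complete event (interrupts or polled flags) signalling which half has just been sent. All names and the tiny buffer size are illustrative.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HALF_LEN 8u  /* illustrative size; a real buffer would be larger */

typedef struct {
    uint8_t        dma_buf[2 * HALF_LEN]; /* circular buffer the DMA would read */
    const uint8_t *src;                   /* next source byte to copy */
    size_t         remaining;             /* source bytes not yet copied */
} stream_t;

/* Fill one half (0 or 1) with the next chunk of source data, zero-padding the tail. */
static void refill_half(stream_t *s, unsigned half)
{
    uint8_t *dst = &s->dma_buf[half * HALF_LEN];
    size_t n = s->remaining < HALF_LEN ? s->remaining : HALF_LEN;

    memcpy(dst, s->src, n);
    memset(dst + n, 0, HALF_LEN - n);
    s->src += n;
    s->remaining -= n;
}

/* Prime both halves before the (hypothetical) DMA stream is started. */
void stream_start(stream_t *s, const uint8_t *src, size_t len)
{
    s->src = src;
    s->remaining = len;
    refill_half(s, 0);
    refill_half(s, 1);
}

/* Half-transfer event: the DMA has finished the first half, so refill it. */
void on_half_transfer(stream_t *s)     { refill_half(s, 0); }

/* Transfer-complete event: the DMA has finished the second half, so refill it. */
void on_transfer_complete(stream_t *s) { refill_half(s, 1); }
```

Each refill has to finish before the DMA wraps back into that half, so the refill must take less time than it takes to send one half.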