
Clock cycle shift on GPIO output STM32F103

vbesson
Senior

Dear Community,

I am porting an old AVR application to STM32, and I am facing a strange timing issue.

In a nutshell, the application reads a sector (512 bytes) from an SD card and outputs the buffer contents to GPIO with a 4us cycle (1us of data signal followed by 3us low).

The SD card read works fine, and I have written a small assembly routine to drive the GPIO signal with precise MCU cycle counting.

Using the DWT cycle counter in the debugger, I get a very stable and precise count (288 cycles for a total of 4us).

When using a logic analyzer sampling at 24 MHz, I can see the signal shifted by 1 or 2 CPU cycles, which adds delay.

I have tried to use ODR directly and BSRR but with no luck. 
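
For reference, the cycle budget quoted above can be checked with a small helper. This is only a sketch: `ns_to_cycles` is a hypothetical name, and the 72 MHz core clock is an assumption inferred from "288 cycles for a total of 4us".

```c
#include <stdint.h>

/* Convert a duration in nanoseconds to CPU cycles for a given core clock.
 * Integer math only; rounds to the nearest cycle.
 * Assumption: core_hz = 72000000 matches the clock tree in the screenshot. */
static uint32_t ns_to_cycles(uint32_t core_hz, uint32_t ns)
{
    uint64_t num = (uint64_t)core_hz * ns;
    return (uint32_t)((num + 500000000u) / 1000000000u);
}
```

With these assumptions, 4000 ns gives 288 cycles, 1000 ns gives 72, and 3000 ns gives 216. The 42 ns excess seen on the analyzer (3.042us instead of 3us) comes out as 3 cycles, which is also roughly one sample period of a 24 MHz analyzer (41.67 ns).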

Attached :

- Screenshot of the logic analyzer

Screenshot 2024-05-12 at 06.30.59.png
As you can see, the low time is 3.042us instead of 3us, and this does not happen on every cycle.
 

Clock configuration

Screenshot 2024-05-12 at 06.32.34.png

Port configuration:

 

GPIO_InitStruct.Pin = GPIO_PIN_13| READ_PULSE_Pin|READ_CLK_Pin;
GPIO_InitStruct.Mode = GPIO_MODE_OUTPUT_PP;
GPIO_InitStruct.Pull = GPIO_NOPULL;
GPIO_InitStruct.Speed=GPIO_SPEED_FREQ_HIGH;
HAL_GPIO_Init(GPIOC, &GPIO_InitStruct);
 
Assembly code : 
 
.global wait_1us
wait_1us:
.fnstart
push {lr}
nop ;// 1 1
nop ;// 1 2
mov r2,#20 ;// 1 3
wait_1us_1:
subs r2,r2,#1 ;// 1 1
bne wait_1us_1 ;// 1 2
pop {lr}
bx lr // return from function call
.fnend

.global wait_3us
wait_3us:
.fnstart
push {lr}
nop
nop
wait_3us_1:
subs r2,r2,#1
bne wait_3us_1
pop {lr}
bx lr // return from function call
.fnend
 
 
sendByte:
 
and r5,r3,0x80000000;// 1 1
lsl r3,r3,#1 ;// 1 2 // left shift r3 by 1
subs r4,r4,#1 ;// 1 3 //; dec r4 bit counter
//mov r6,#0 // Reset the DWT Cycle counter for debug cycle counting
//ldr r6,=DWTCYCNT
//mov r2,#0
//str r2,[r6] // end
bne sendBit ;// 1 4
beq process ;// 1 5
// Clk 15, Readpulse 14, Enable 13
sendBit:
ldr r6,=PIN_BSRR ;// 2 2
LDR r2, [r6] ;// 3 5
cmp r5,#0 ;// 1 6
ITE EQ ;// 1 7

 
ORREQ r2,r2, #0x80000000 ;// 1 8 data bit is 0: set BSRR bit 31 (BR15), drives pin 15 (Clk) low
ORRNE r2,r2, #0x00008000 ;// 1 9 data bit is 1: set BSRR bit 15 (BS15), drives pin 15 (Clk) high
 
 
ORR r2,r2, #0x00004000 ;// 1 8 set BSRR bit 14 (BS14), drives pin 14 (Read pulse) high
 
STR r2, [r6] ;// 1 10 set the GPIO port -> from this point we need 1us, 72 CPU cycles (to be confirmed)
bl wait_1us ;// 65 75 144 209
ORR r2,r2, #0xC0000000 ;// 1 12 ; // Bring the pin down
STR r2,[r6] ;// 1 13 ; //
; // We need to adjust the duration of the 3us function if it is the first bit (coming from process less 10 cycle)
cmp r4,#1
ite eq
moveq r2,#56
movne r2,#62
bl wait_3us ; // wait for 3 us in total
b sendByte
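
To make the BSRR constants in the listing easier to follow, here is a minimal C sketch of how the words are built. The helper names are hypothetical; the pin assignment (Clk on 15, Read pulse on 14) is taken from the comment in the listing. On the STM32F1, the low 16 bits of BSRR set pins and the high 16 bits reset them. The reference manual describes the register as write-only, so reading it first should not be necessary.

```c
#include <stdint.h>

/* Pin masks from the comment in the listing: Clk 15, Read pulse 14. */
#define CLK_PIN   (1u << 15)
#define PULSE_PIN (1u << 14)

/* Build a BSRR word: low half sets pins, high half resets pins. */
static uint32_t bsrr_word(uint16_t set_mask, uint16_t reset_mask)
{
    return (uint32_t)set_mask | ((uint32_t)reset_mask << 16);
}

/* Word written at the start of a bit cell: pulse high, Clk follows the data bit. */
static uint32_t bsrr_start_of_bit(int data_bit)
{
    return data_bit ? bsrr_word(CLK_PIN | PULSE_PIN, 0)
                    : bsrr_word(PULSE_PIN, CLK_PIN);
}

/* Word written after 1us: both pins low for the remaining 3us. */
static uint32_t bsrr_end_of_bit(void)
{
    return bsrr_word(0, CLK_PIN | PULSE_PIN);
}
```

These reproduce the values used in the assembly: 0x0000C000 for a 1 bit, 0x80004000 for a 0 bit, and 0xC0000000 to bring both pins down.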

 

To be honest, I do not know where to look.

 

40 REPLIES

I will try DMA + SRAM, and maybe check the interrupts as well.

 

Vincent 

vbesson
Senior

This is the view NVIC in CubeMX,

Screenshot 2024-05-13 at 14.06.54.png

Screenshot 2024-05-13 at 14.07.11.png

Would unchecking these change anything?

Vincent

 


@vbesson wrote:

Yes this is my goal, I have done it with the ATMEGA328P but the SPI speed does not allow accurate writing, 

...

I do not get your point with USART in synchronous or SPI ? you mean having a SPI to DMA and then DMA to USART ?

...


No, the idea is to use a timer to generate the clock you need and use that timer signal to clock either a USART in synchronous mode (using the TX signal as your data) or a SPI in Slave mode (using the MISO signal as your data).  Either approach takes care of sending data bits based on your clocking.  At this point you can add the circular double-buffer DMA mechanism to feed the data bytes to the USART or SPI.
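
The circular double-buffer part of this suggestion could look roughly like the sketch below. It only models the refill logic on the CPU side; `stream_t`, `refill_half`, and `HALF_LEN` are hypothetical names, and the actual DMA and peripheral setup is not shown. In a HAL project, `refill_half(&s, 0)` would typically be called from the half-transfer callback (for SPI, `HAL_SPI_TxHalfCpltCallback`) and `refill_half(&s, 1)` from the transfer-complete callback (`HAL_SPI_TxCpltCallback`).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HALF_LEN 8u  /* hypothetical half-buffer size, in bytes */

typedef struct {
    uint8_t dma_buf[2 * HALF_LEN];  /* circular buffer the DMA would read from */
    const uint8_t *src;             /* next source byte to send */
    size_t remaining;               /* source bytes not yet copied */
} stream_t;

/* Refill one half (0 or 1) while the DMA is busy with the other half.
 * Pads with zeros once the source runs out. Returns the bytes copied. */
static size_t refill_half(stream_t *s, unsigned half)
{
    uint8_t *dst = &s->dma_buf[(half & 1u) * HALF_LEN];
    size_t n = s->remaining < HALF_LEN ? s->remaining : HALF_LEN;

    memcpy(dst, s->src, n);
    memset(dst + n, 0, HALF_LEN - n);
    s->src += n;
    s->remaining -= n;
    return n;
}
```

The point of the design is that the CPU only has to finish each refill within the time the DMA needs to drain the other half, so interrupt latency no longer shows up on the pins.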

vbesson
Senior

This is getting really embarrassing ... ;)

With DMA in place, I get better accuracy but still some glitches... this is crazy... really really crazy

I do not understand...

V

 

vbesson
Senior

 

Frames should all be 1us high and 3us low, and I still have glitches...

Screenshot 2024-05-13 at 21.06.47.png

I don't see any glitches in your screenshot.

Timing also looks good. The imperfections may be an issue with synchronization between the logic analyzer and the microcontroller. Try a logic analyzer with a higher sample clock and you might see even better timing. Clock dither of the MCU and/or the logic analyzer can also be a factor. Rise time can also be a factor in imperfect measurement of pulse widths.

You say your logic analyzer is clocked at 24MHz, but that would make 3.063us about 73.512 clock cycles of your logic analyzer (or exactly 73.5 cycles and the 3.063 is a rounded number). So that cannot be correct. Does it sample at both edges of the clock? So 48 MSamples/second?


I have sampled at different logic analyzer rates and the pulse timing is still not right. I have a comparison with an ATMEGA328P.

The issue is that I have 402 bytes per data stream to send, which means 402*8*4 us = 12 864 us for a data sector.

When I get a 0.063 us shift from time to time, I can end up shifted by 1 or 2 bits or more by the end, which corrupts the data as checked by the 2-byte XOR CRC sent at the end of the transfer.
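
The arithmetic behind this concern can be sketched as follows. The function names are hypothetical, and the model assumes that every stretch is real, that the stretches simply add up, and that the receiver never resynchronizes on the pulses. `stretch_ns` must be non-zero.

```c
#include <stdint.h>

/* Total stream duration in ns for n_bytes sent at bit_ns per bit. */
static uint64_t stream_ns(uint32_t n_bytes, uint32_t bit_ns)
{
    return (uint64_t)n_bytes * 8u * bit_ns;
}

/* Number of stretched bit cells needed before the accumulated error
 * reaches one full bit period (rounded up). */
static uint32_t stretched_bits_for_one_bit_slip(uint32_t bit_ns, uint32_t stretch_ns)
{
    return (bit_ns + stretch_ns - 1u) / stretch_ns;
}
```

For 402 bytes at 4000 ns per bit the stream lasts 12 864 000 ns. With a 63 ns stretch, 64 stretched cells out of the 3216 in the stream would be enough to slip by one bit under these assumptions.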

I found an interesting article, "Disabling / enabling IRQ on STM32 for atomic read", and I am currently testing it.

Will keep you posted

Vincent

 

 

Use a logic analyzer or oscilloscope with a sampling frequency much higher than your MCU frequency, or you will not be able to distinguish measurement artifacts from real MCU output jitter. Imagine that you are generating a perfect square wave with frequency 72MHz/20=3.6MHz (period 277.8ns). If you sample this signal with an analyzer at 24MHz (sample period 41.67ns), you will see a jittering output: 2 consecutive periods of 7*41.67 (292ns) and one period of 6*41.67 (250ns), on a perfect square wave! And that can lead you to the wrong conclusion that the jitter comes from the MCU output...
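
The sampling effect described above can be reproduced numerically. The sketch below is an idealized model, not a description of any particular analyzer: `measured_periods` is a hypothetical helper, all times are in 72 MHz ticks, and an edge is assumed to be seen at the first sample instant at or after it.

```c
#include <stdint.h>

/* Measure successive periods (in analyzer samples) of an ideal square wave.
 * period_ticks and sample_ticks share one time base (here 72 MHz ticks).
 * Edge k occurs at k * period_ticks and is seen at the next sample instant. */
static void measured_periods(uint32_t period_ticks, uint32_t sample_ticks,
                             uint32_t *out, uint32_t count)
{
    uint32_t prev = 0;  /* sample index at which edge 0 is seen */
    for (uint32_t k = 1; k <= count; k++) {
        uint32_t t = k * period_ticks;
        uint32_t idx = (t + sample_ticks - 1u) / sample_ticks;  /* ceiling */
        out[k - 1] = idx - prev;
        prev = idx;
    }
}
```

With a 20-tick period and a 3-tick sample interval (24 MHz), the measured periods repeat as 7, 7, 6 samples, even though the source is perfect. A 288-tick period (4us) is an exact multiple of the sample interval, so this model shows a constant 96 samples; in practice the analyzer runs from its own oscillator, so an error of about one sample (41.67 ns) per edge is still plausible.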

vbesson
Senior

Hello Michal, 

Thanks, and I genuinely agree with you. However, it is not easy to find a logic analyzer that can sample above 24 MHz.

What I do is look at the data stream over the whole transmission period, where I should not see any significant delay. Unfortunately I do see delay, and the data frequency is not 250 kHz.

Vincent 

An ordinary oscilloscope should be able to handle it with ease. As others have already written, use SPI, USART, or Timer + DMA and you will get an easy, seamless pulse stream with minimal CPU load.
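
One possible way to apply the SPI suggestion, sketched under assumptions that are not from the thread: run the SPI at 1 Mbit/s, MSB first, and expand each data bit into four SPI bits (d, 0, 0, 0), so that every bit cell is 1us of data followed by 3us low. `expand_bits` is a hypothetical helper, and this only reproduces a single data line, not the separate Read pulse pin.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand each source bit (MSB first) into 4 output bits: d,0,0,0.
 * One input byte becomes 4 output bytes; out must hold 4 * n bytes. */
static void expand_bits(const uint8_t *in, size_t n, uint8_t *out)
{
    for (size_t i = 0; i < n; i++) {
        for (unsigned pair = 0; pair < 4; pair++) {
            /* Each output byte carries two source bits, in bit 7 and bit 3. */
            uint8_t hi = (in[i] >> (7u - 2u * pair)) & 1u;
            uint8_t lo = (in[i] >> (6u - 2u * pair)) & 1u;
            out[4 * i + pair] = (uint8_t)((hi << 7) | (lo << 3));
        }
    }
}
```

With this encoding the 402-byte stream becomes 1608 bytes for the DMA to send, and the bit timing is set by the SPI clock rather than by instruction counting.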