
Clock cycle shift on GPIO output STM32F103

vbesson
Associate III

Dear Community,

I am porting an old application written for AVR to STM32, and I am facing a strange timing issue.

In a nutshell, the application reads a sector (512 bytes) from an SD card and outputs the content of the buffer on GPIO with a 4 us cycle (3 us low, then a 1 us data signal).

The SD card read works fine, and I have written a small assembly routine to drive the GPIO signal with precise MCU cycle counting.

Using the DWT cycle counter in the debugger, I get a very stable and precise count (288 cycles for a total of 4 us).

With a logic analyser sampling at 24 MHz, I can see the signal shift by 1 or 2 CPU cycles, which adds delay.

I have tried writing to ODR directly and to BSRR, but with no luck.

Attached :

- Screenshot of the logic analyzer

Screenshot 2024-05-12 at 06.30.59.png
As you can see, I do not get 3 us but 3.042 us, and this is not always the case.
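As a cross-check of the numbers above, here is a minimal sketch in plain C (it runs on a PC, not on the target). It assumes the 72 MHz core clock implied by 288 cycles per 4 us and the 24 MHz analyser sample rate; the function names are made up for illustration. One sample at 24 MHz lasts about 41.7 ns, which is three CPU cycles, and 3.042 us is 73 samples where 3.000 us is 72. A one-sample difference like this could come from the analyser's sampling grid rather than from the code, so it may be worth confirming with a faster instrument.

```c
#include <stdint.h>

/* CPU cycles needed for a delay given in microseconds. */
uint32_t cycles_for_us(uint32_t cpu_hz, uint32_t us)
{
    return (uint32_t)(((uint64_t)cpu_hz * us) / 1000000u);
}

/* Whole CPU cycles that elapse during one analyser sample. */
uint32_t cpu_cycles_per_sample(uint32_t cpu_hz, uint32_t sample_rate_hz)
{
    return cpu_hz / sample_rate_hz;
}

/* Duration of n analyser samples in nanoseconds (rounded down). */
uint32_t samples_to_ns(uint32_t n, uint32_t sample_rate_hz)
{
    return (uint32_t)(((uint64_t)n * 1000000000u) / sample_rate_hz);
}
```

With these helpers, `cycles_for_us(72000000, 4)` gives the 288 cycles reported by the DWT counter, and `samples_to_ns(73, 24000000)` gives the 3.04 us reading.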
 

Clock configuration

Screenshot 2024-05-12 at 06.32.34.png

Port configuration:

 

GPIO_InitStruct.Pin = GPIO_PIN_13 | READ_PULSE_Pin | READ_CLK_Pin;
GPIO_InitStruct.Mode = GPIO_MODE_OUTPUT_PP;
GPIO_InitStruct.Pull = GPIO_NOPULL;
GPIO_InitStruct.Speed = GPIO_SPEED_FREQ_HIGH;
HAL_GPIO_Init(GPIOC, &GPIO_InitStruct);
 
Assembly code : 
 
.global wait_1us
wait_1us:
.fnstart
push {lr}
nop ;// 1 1
nop ;// 1 2
mov r2,#20 ;// 1 3
wait_1us_1:
subs r2,r2,#1 ;// 1 1
bne wait_1us_1 ;// 1 2
pop {lr}
bx lr // return from function call
.fnend

.global wait_3us
wait_3us: // r2 = loop count, loaded by the caller before the call
.fnstart
push {lr}
nop
nop
wait_3us_1:
subs r2,r2,#1
bne wait_3us_1
pop {lr}
bx lr // return from function call
.fnend
 
 
sendByte:
 
and r5,r3,#0x80000000 ;// 1 1 // isolate the MSB of r3 (next bit to send)
lsl r3,r3,#1 ;// 1 2 // left shift r3 by 1
subs r4,r4,#1 ;// 1 3 //; dec r4 bit counter
//mov r6,#0 // Reset the DWT Cycle counter for debug cycle counting
//ldr r6,=DWTCYCNT
//mov r2,#0
//str r2,[r6] // end
bne sendBit ;// 1 4
beq process ;// 1 5
// Clk 15, Readpulse 14, Enable 13
sendBit:
ldr r6,=PIN_BSRR ;// 2 2
LDR r2, [r6] ;// 3 5
cmp r5,#0 ;// 1 6
ITE EQ ;// 1 7

 
ORREQ r2,r2, #0x80000000 ;// 1 8 data bit is 0: BSRR bit 31 resets pin 15
ORRNE r2,r2, #0x00008000 ;// 1 9 data bit is 1: BSRR bit 15 sets pin 15
 
 
ORR r2,r2, #0x00004000 ;// 1 8 BSRR bit 14 sets pin 14
 
STR r2, [r6] ;// 1 10 set the GPIO port -> from this point we need 1us, 72 CPU cycles (to be confirmed)
bl wait_1us ;// 65 75 144 209
ORR r2,r2, #0xC0000000 ;// 1 12 ; // Bring the pin down
STR r2,[r6] ;// 1 13 ; //
; // We need to adjust the duration of the 3us function if it is the first bit (coming from process less 10 cycle)
cmp r4,#1
ite eq
moveq r2,#56
movne r2,#62
bl wait_3us ; // wait for 3 us in total
b sendByte
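To make the BSRR masks used in the assembly easier to check, here is a small sketch in plain C. It relies only on the STM32F1 BSRR layout (bits 0 to 15 set a pin, bits 16 to 31 reset it); the helper names are hypothetical and the pin roles are taken from the masks in the listing (pin 14 pulsed on every bit, pin 15 following the data bit).

```c
#include <stdint.h>

/* BSRR layout: writing 1 to bit n sets pin n, writing 1 to bit n+16 resets it. */
uint32_t bsrr_set(unsigned pin)   { return 1u << pin; }
uint32_t bsrr_reset(unsigned pin) { return 1u << (pin + 16u); }

/* Word written at the start of the 1 us data slot:
 * pin 14 goes high, pin 15 follows the data bit. */
uint32_t bsrr_data_slot(int data_bit)
{
    uint32_t word = bsrr_set(14);
    word |= data_bit ? bsrr_set(15) : bsrr_reset(15);
    return word;
}

/* Word written at the end of the data slot: both pins low. */
uint32_t bsrr_idle(void)
{
    return bsrr_reset(14) | bsrr_reset(15);
}
```

`bsrr_data_slot(1)` evaluates to 0x0000C000, `bsrr_data_slot(0)` to 0x80004000, and `bsrr_idle()` to 0xC0000000, which are the three values the assembly writes to the port.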

 

To be honest, I do not know where to look.

 

40 REPLIES

I will do some tests and get back to this thread.

Quick question: if I use DMA buffering with the USART, I cannot have a stop bit after each byte. Is there a way to remove the stop bit? It also means that I will set the bit period to 1 us (72 cycles) and rearrange the data stream as 0.0.0.DATABITS.

 

Uwe Bonnes
Principal III

How is data sampled on the receiving side? At what edge? I do not see any sensible setup/hold time guard!

I need to bit-stream data over a 1 wire interface with a 1 us data pulse (high or low), then 3 us with the data line low, no start bit, no stop bit. What is best: SPI? USART seems to have start and stop bits.


@vbesson wrote:

However it is not easy to find an AL[LA] with capability above 24 MHz. 



Dreamsourcelab, Saleae and various others: https://sigrok.org/wiki/Supported_hardware#Logic_analyzers

Or use an oscilloscope as Michal Dudka pointed out. If you want precise timing use proper equipment.

 


@vbesson wrote:
When I get a 0.063 us shift from time to time, I can end up with a shift of 1 or 2 bits or more by the end, and data corruption detected by the 2-byte XOR CRC sent at the end of the transfer.

What are you sending it to? Is it synchronous or asynchronous? In either case a tiny bit of drift shouldn't be a problem for a long stream of pulses since the receiving end should synchronize it somehow.

What is the clock source of your MCU? Perhaps that's the problem. Try to give the STM32 and the AVR the same clock source and compare if they behave the same. Usually AVRs are clocked at 8,12,16 or 20MHz and an STM32 can easily work with that as an input clock.

Kudo posts if you have the same problem and kudo replies if the solution works.
Click "Accept as Solution" if a reply solved your problem. If no solution was posted please answer with your own.

@vbesson wrote:

I need to bit-stream data over a 1 wire interface with a 1 us data pulse (high or low), then 3 us with the data line low, no start bit, no stop bit. What is best: SPI? USART seems to have start and stop bits.


1-wire doesn't have critical timing at all. A 1 has a low pulse of 1-15us and a 0 is a low pulse of 60us. Nanosecond dither is irrelevant.


@vbesson wrote:

I need to bit-stream data over a 1 wire interface with a 1 us data pulse (high or low), then 3 us with the data line low, no start bit, no stop bit. What is best: SPI? USART seems to have start and stop bits.


As I originally suggested, look into the "S" part of the USART.

Start/stop bits in USART can be problematic in your case. You can use SPI with a 1 Mbps data rate, where each group of four SPI bits generates one output bit. You should keep a small buffer in RAM and transfer the data to SPI by DMA. Your SW can then write new data into the lower/higher half of the buffer while the other half is being transmitted. But there is also another way.

You can route the SPI MOSI signal back into the MCU on a timer ETR pin and set the timer to "one-pulse" mode so that it works as a "pulse stretcher". Then you can transmit raw data by SPI (each SPI data bit equals one data bit at your output). This method needs minimal SW data preprocessing (a bit is a bit, unlike the first method, where one output bit is created from four SPI bits).
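The first method above (SPI at 1 Mbps, four SPI bits per output bit) amounts to expanding every data byte into four SPI bytes. A minimal sketch in plain C follows; it assumes MSB-first transmission and the pattern low, low, low, data for each bit, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand one data byte into four SPI bytes, MSB first.
 * Each data bit d becomes the nibble 0,0,0,d, so at 1 Mbps the line
 * stays low for 3 us and carries the data bit for 1 us. */
void expand_byte(uint8_t in, uint8_t out[4])
{
    for (int i = 0; i < 4; i++) {
        uint8_t hi = (uint8_t)((in >> (7 - 2 * i)) & 1u); /* first bit of the pair  */
        uint8_t lo = (uint8_t)((in >> (6 - 2 * i)) & 1u); /* second bit of the pair */
        out[i] = (uint8_t)((hi << 4) | lo);
    }
}

/* Expand n data bytes; dst must hold 4 * n bytes. Returns the SPI byte count. */
size_t expand_buffer(const uint8_t *src, size_t n, uint8_t *dst)
{
    for (size_t i = 0; i < n; i++) {
        expand_byte(src[i], &dst[4 * i]);
    }
    return 4 * n;
}
```

The cost is a fourfold increase in buffer size, which is the preprocessing that the pulse-stretcher method avoids.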


Yes indeed, my fault. I will definitely give it a try 😉

I need to write a few things (bit rearranging to get the 3 us delay), great 🙂 and I will certainly save RAM 😉

V

vbesson
Associate III

 

Hello All, 

Quick update on my test based on the feedback you gave me.

What I have tested:

  • Double buffer DMA with USART
  • Double buffer DMA with SPI
  • Double buffer DMA with GPIO & BSRR
  • Disable all IRQ with bit bang SPI and ASM
  • Reducing the clock speed to avoid congestion on the bus

and combination of all the above.

My feedback:

USART was a great approach to reduce the buffer size: I only needed a 2048-byte buffer (512 bytes * 4 clock cycles).

The issue with USART is the pause between bytes: even without stop bits, the USART waits a few cycles between bytes. So I cannot use this approach, as I need a continuous stream of bits, with 1 us of data after every 3 us low.

 

SPI: same as USART, giving the same results. The good thing is that with DMA I see more accuracy on the data stream.

 

Disabling all IRQs and bit-banging the GPIO output in ASM: disabling IRQs does not change anything, the accuracy is not there. This is the most frustrating part: an ASM function that does not take the same number of CPU cycles from one run to the next... there must be a way to do it. Maybe ST can help and provide a more detailed explanation.

 

Reducing clock speed: it has no effect on accuracy, and besides, I need the CPU speed to manage SPI without falling behind the SD card, as I am not bit-banging on SPI and on GPIO.

 

Double buffer DMA with GPIO & BSRR: for the moment this is the best approach, even though it is pretty ugly from a memory perspective. For the record, I have a buffer of 402 bytes to send on a 4 us cycle (3 us delay, 1 us data cycle). That means 13 chunks of 32 bytes, so I needed a uint32_t buffer (BSRR is 32 bits wide) of 2048 entries = 8192 bytes (64 bytes x 8 bits x 4 timer cycles, x 4 bytes per uint32_t). What I could do, and have not done yet, is write to ODR directly with uint16_t entries, which would cut the buffer in half. I need to test this, as my program is not finished and I will need more memory to manage the OLED screen.
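The buffer arithmetic above can be written down as a short sketch in plain C. The constant and function names are hypothetical; the figures (4 timer slots per bit, 32 data bytes per half buffer, 402 bytes to send) are the ones quoted in this post.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    SLOTS_PER_BIT  = 4,  /* 3 x 1 us low + 1 x 1 us data            */
    BITS_PER_BYTE  = 8,
    BYTES_PER_HALF = 32  /* data bytes covered by one half buffer   */
};

/* Number of DMA entries needed to stream data_bytes bytes. */
size_t dma_entries(size_t data_bytes)
{
    return data_bytes * BITS_PER_BYTE * SLOTS_PER_BIT;
}

/* RAM used by the DMA buffer for a given entry width in bytes. */
size_t dma_buffer_bytes(size_t data_bytes, size_t entry_size)
{
    return dma_entries(data_bytes) * entry_size;
}

/* Number of half-buffer refills needed to send total_bytes (rounded up). */
size_t half_buffers_needed(size_t total_bytes)
{
    return (total_bytes + BYTES_PER_HALF - 1) / BYTES_PER_HALF;
}
```

For 64 data bytes this gives 2048 entries, 8192 bytes with 32-bit entries and 4096 bytes with 16-bit entries. One thing to keep in mind with the 16-bit variant: a write to ODR drives all sixteen pins of the port, whereas BSRR only touches the pins whose bits are set.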

NB: I scratched my head over the DMA interrupt not triggering. I had used

 

HAL_DMA_Start(&hdma_tim2_up,  (uint32_t)DMA_BUFFER, (uint32_t)&(GPIOC->BSRR), 2048);

//instead of

HAL_DMA_Start_IT(&hdma_tim2_up,  (uint32_t)DMA_BUFFER, (uint32_t)&(GPIOC->BSRR), 2048);
       
      

 

This is how I prepare the buffer:

 

void initeDMABuffer(char * buffer){                   // TODO check number of CPU cycles in C and assembly
  
  uint32_t GPIO_14L_15L=   0xC0000000;                // No data pulse, no clock (reset pins 14 and 15)
  uint32_t GPIO_14H_15H=   0x0000C000;                // Data HIGH, clock HIGH (set pins 14 and 15)
  uint32_t GPIO_14H_15L=   0x80004000;                // Data LOW,  clock HIGH (set pin 14, reset pin 15)
  
  char c=0;
  int l=0;

  for (int j=0;j<DMA_BUFFER_SIZE;j++){                // For each source byte: 8 bits, 4 x 1us steps per bit
    c=buffer[j];                                      // DMA to GPIO is paced at 1us per entry, i.e. 72 clock cycles on an STM32F103
    for (int k=0;k<8;k++){
                                                      // computed upfront for optimization
      DMA_BUFFER[l]=GPIO_14L_15L;                     // Step 1: wait
      DMA_BUFFER[l+1]=GPIO_14L_15L;                   // Step 2: wait
      DMA_BUFFER[l+2]=GPIO_14L_15L;                   // Step 3: wait
      
      if (c & 0x80)                                   // Test the MSB (bit 7) of the current byte
        DMA_BUFFER[l+3]=GPIO_14H_15H;                 // Step 4: 1us data cycle, bit is 1 -> pin 15 high
      else
        DMA_BUFFER[l+3]=GPIO_14H_15L;                 // Step 4: 1us data cycle, bit is 0 -> pin 15 low
      c=c<<1;                                         // Left shift by 1 for the next iteration
      l+=4;                                           // Advance past the 4 entries of this bit
    }
                                              
  }

}
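A fill routine like this can be checked on a PC before it goes on the target. The sketch below, in plain C, re-implements the same layout (three idle entries, then one data entry per bit, MSB first) next to a decoder that rebuilds the bytes from the entries. The names fill_slots and decode_slots are hypothetical; the three constants are the BSRR words used above.

```c
#include <stddef.h>
#include <stdint.h>

#define SLOT_IDLE 0xC0000000u /* pins 14 and 15 low       */
#define SLOT_ONE  0x0000C000u /* pin 14 high, pin 15 high */
#define SLOT_ZERO 0x80004000u /* pin 14 high, pin 15 low  */

/* Same layout as initeDMABuffer: 4 entries per bit, MSB first. */
void fill_slots(const uint8_t *src, size_t n, uint32_t *slots)
{
    size_t l = 0;
    for (size_t j = 0; j < n; j++) {
        uint8_t c = src[j];
        for (int k = 0; k < 8; k++) {
            slots[l] = slots[l + 1] = slots[l + 2] = SLOT_IDLE;
            slots[l + 3] = (c & 0x80u) ? SLOT_ONE : SLOT_ZERO;
            c = (uint8_t)(c << 1);
            l += 4;
        }
    }
}

/* Rebuild n bytes from the entries. Returns 0 on success,
 * -1 if an entry does not match the expected layout. */
int decode_slots(const uint32_t *slots, size_t n, uint8_t *dst)
{
    size_t l = 0;
    for (size_t j = 0; j < n; j++) {
        uint8_t c = 0;
        for (int k = 0; k < 8; k++) {
            if (slots[l] != SLOT_IDLE || slots[l + 1] != SLOT_IDLE ||
                slots[l + 2] != SLOT_IDLE) {
                return -1;
            }
            if (slots[l + 3] == SLOT_ONE) {
                c = (uint8_t)((c << 1) | 1u);
            } else if (slots[l + 3] == SLOT_ZERO) {
                c = (uint8_t)(c << 1);
            } else {
                return -1;
            }
            l += 4;
        }
        dst[j] = c;
    }
    return 0;
}
```

Filling a buffer from a known pattern and decoding it back should return the original bytes; a mismatch points at the fill logic rather than at the DMA or timer setup.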

 

This is the half-buffer preparation during the DMA cycle:

 

void populateHalfDMABuffer(char * buffer,int pos,int half){

// GPIO13 -> Chip enable (active low)
// GPIO14 -> Clock pulse
// GPIO15 -> Data pulse

// buffer is the sector char buffer,
// pos is the current position in the buffer %64
// half selects the first (0) or second (1) half of the DMA array

uint32_t GPIO_14H_15H=   0x0000C000;
uint32_t GPIO_14H_15L=   0x80004000;

char c=0;
unsigned int l=half*1024;
unsigned int bsize=DMA_BUFFER_SIZE/2;
for (int i=0;i<bsize;i++){
  c=buffer[pos+i];
  for (int j=0;j<8;j++){

    if (c & 0x80)
      DMA_BUFFER[l+3]=GPIO_14H_15H;
    else
      DMA_BUFFER[l+3]=GPIO_14H_15L;
    c=c<<1;
    l+=4;
  }
}

return;
}

 

 

These are my 2 DMA Buffer callback functions:

 

volatile int ClusterSlice;

void HAL_DMA_HalfTxIntCallback(DMA_HandleTypeDef *hdma)
{
    if (ClusterSlice<13){
      ClusterSlice++;
      // Half the buffer has been transmitted: refill the first half
      populateHalfDMABuffer(sectorBuf,ClusterSlice*DMA_BUFFER_SIZE/2,0);
    }else{
      __disable_irq();
      HAL_TIM_Base_Stop_DMA(&htim2);
      __enable_irq();
      prepareNewSector=1;
    }
}

void HAL_DMA_FullTxIntCallback(DMA_HandleTypeDef *hdma)
{
    if (ClusterSlice<13){
      ClusterSlice++;
      // The whole buffer has been transmitted: refill the second half
      populateHalfDMABuffer(sectorBuf,ClusterSlice*DMA_BUFFER_SIZE/2,1);
    }else{
      __disable_irq();
      /* might not be necessary */
      //hdma_tim2_up.XferCpltCallback=NULL;
      HAL_TIM_Base_Stop_DMA(&htim2);
      __enable_irq();
      prepareNewSector=1;
      //printf("End of DMA\n");
    }
    // end of the initial buffer
}

 

In the end, this is the output on the logic analyser:

Full 402 Data Chunk (Screenshot 2024-05-22 at 06.23.38.png)

What is left to do:

- Manage disk head movement based on a GPIO interrupt (and then move to the right SD card sector and cluster)

- Do some testing to see if the timing is accurate enough.

I will keep you posted on how I progress. 

Vincent

 

vbesson
Associate III

Hello All, 

The delay between 2 data chunks (of 512 bytes) is causing some trouble.

I am moving towards using SPI with DMA to send the bytes.

I am having some issues with the DMA interrupts and I need a little help.

I want to get both the half-transfer and the transfer-complete DMA interrupts.

I did

  hdma_spi1_tx.XferHalfCpltCallback=HAL_DMA_HalfSpiTxIntCallback;
  hdma_spi1_tx.XferCpltCallback=HAL_DMA_FullSpiTxIntCallback;
 
  HAL_SPI_Transmit_DMA(&hspi1,DMA_BIT_BUFFER,1608);   

The interrupts never get called...

Should I use this instead?

  hdma_spi1_tx.XferHalfCpltCallback=HAL_DMA_HalfSpiTxIntCallback;
  hdma_spi1_tx.XferCpltCallback=HAL_DMA_FullSpiTxIntCallback;
 
  //HAL_SPI_Transmit_DMA(&hspi1,DMA_BIT_BUFFER,1608);               // 402*8*4
  HAL_SPI_Transmit_IT(&hdma_spi1_tx,DMA_BIT_BUFFER,1608);

In that case, I assume I have to set up the timer for hdma_spi1_tx?

Thanks for your help 

Vincent