
maximum throughput for usart

michaelmccartyeng
Associate II
Posted on September 25, 2012 at 04:22

Hello All, 

  Finally got to an STM32 Discovery specific question. I've turned my baud rate up to a really high value, 921600. I have a simple loop that sends a request for data and then gets data back from the STM32 as fast as possible.

  I only ever see about 300 kbit/s. I don't see any errors, so I'm wondering whether the processing of each packet simply takes so long that it bottlenecks my baud rate.

  I'm using the STM32F051 (F0 series) for testing.

  I have thought about DMA'ing the USART data into memory, then using a CRC carried inside the packet to check the memory on DMA transfer complete (TC), and marking the data as good, bad, or new. I think this would give me the maximum throughput, allowing one RX DMA completion to kick off the TX DMA. But I don't know if that's a good idea.
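  Roughly the shape I have in mind, as a register-level sketch only: crc16() is a placeholder for whatever check I end up using, the packet size is made up, and I'm assuming the F051's default request mapping of USART1_RX to DMA1 Channel 3 and USART1_TX to DMA1 Channel 2:

```c
#include "stm32f0xx.h"

/* Not shown: RCC clocks, USART1 setup at 921600 8N1, setting DMAR and
 * DMAT in USART1->CR3, and enabling DMA1_Channel2_3_IRQn in the NVIC. */

#define PKT_SIZE 104                            /* made-up packet size  */
static uint8_t rx_pkt[PKT_SIZE], tx_pkt[PKT_SIZE];

extern uint16_t crc16(const uint8_t *p, uint16_t n);  /* placeholder   */

static void dma_rx_arm(void)                    /* receive one packet   */
{
  DMA1_Channel3->CCR  &= ~DMA_CCR_EN;           /* disable to reload    */
  DMA1_Channel3->CPAR  = (uint32_t)&USART1->RDR;
  DMA1_Channel3->CMAR  = (uint32_t)rx_pkt;
  DMA1_Channel3->CNDTR = PKT_SIZE;
  DMA1_Channel3->CCR   = DMA_CCR_MINC | DMA_CCR_TCIE | DMA_CCR_EN;
}

static void dma_tx_start(const uint8_t *buf, uint16_t len)
{
  DMA1_Channel2->CCR  &= ~DMA_CCR_EN;
  DMA1_Channel2->CPAR  = (uint32_t)&USART1->TDR;
  DMA1_Channel2->CMAR  = (uint32_t)buf;
  DMA1_Channel2->CNDTR = len;
  DMA1_Channel2->CCR   = DMA_CCR_MINC | DMA_CCR_DIR | DMA_CCR_EN;
}

void DMA1_Channel2_3_IRQHandler(void)           /* Ch2/Ch3 share an IRQ */
{
  uint16_t rx_crc;

  if (DMA1->ISR & DMA_ISR_TCIF3) {              /* RX transfer complete */
    DMA1->IFCR = DMA_IFCR_CTCIF3;
    rx_crc = (uint16_t)((rx_pkt[PKT_SIZE - 2] << 8) | rx_pkt[PKT_SIZE - 1]);
    if (crc16(rx_pkt, PKT_SIZE - 2) == rx_crc) {  /* big-endian trailer */
      /* mark good, build the response in tx_pkt, ship it immediately  */
      dma_tx_start(tx_pkt, PKT_SIZE);
    }
    dma_rx_arm();                               /* re-arm for next one  */
  }
}
```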

  I'm currently using a circular buffer, pulling the bytes out into another buffer before processing that data as a packet. Then I build the response packet and send it using DMA. My packet sizes are just over 100 bytes.
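  For reference, the receive side is essentially this kind of ring buffer (a minimal sketch; the size and names are illustrative):

```c
#include <stdint.h>

/* Power-of-two size lets the free-running indices wrap with a mask
 * instead of a modulo; no overrun check shown. */
#define RB_SIZE 256
static volatile uint8_t  rb_buf[RB_SIZE];
static volatile uint16_t rb_head, rb_tail;

static void rb_put(uint8_t b)            /* producer: USART RX interrupt */
{
  rb_buf[rb_head & (RB_SIZE - 1)] = b;
  rb_head++;
}

static int rb_get(uint8_t *b)            /* consumer: main loop          */
{
  if (rb_head == rb_tail)
    return 0;                            /* empty                        */
  *b = rb_buf[rb_tail & (RB_SIZE - 1)];
  rb_tail++;
  return 1;
}
```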

  I want to determine whether the program flow is what's taking the time and causing the bottleneck, because that would apply across telemetry/peripherals.

  I guess I could run some tests, recording the time in ms between receiving the first byte and sending the response packet, to profile the code.
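  Something like a SysTick millisecond counter would do for that (a sketch, CMSIS-style; the F0's Cortex-M0 has no DWT cycle counter to lean on):

```c
#include "stm32f0xx.h"

static volatile uint32_t ms_ticks;

void SysTick_Handler(void)
{
  ms_ticks++;                              /* one tick per millisecond */
}

void profile_init(void)
{
  SysTick_Config(SystemCoreClock / 1000);  /* 1 ms SysTick interrupt   */
}

/* usage:
 *   uint32_t t0 = ms_ticks;
 *   ...receive packet, build and send response...
 *   elapsed_ms = ms_ticks - t0;
 */
```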

  Any advice is appreciated!

  Thanks

#usart-stm32-discovery
4 REPLIES
Posted on September 25, 2012 at 06:18

Well you could use GPIO toggling to apportion times to various parts of the process.

It should be pretty easy to time the inter-symbol gap between bytes, and frankly I'd believe it should be easy to saturate the USART output with data. The gap should tend to zero.
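For example, something like this around each stage, with the pin on a scope or logic analyzer (assumes a spare pin, PA8 here, already configured as a push-pull output):

```c
#include "stm32f0xx.h"

#define PROFILE_HIGH()  (GPIOA->BSRR = (1u << 8))  /* set PA8   */
#define PROFILE_LOW()   (GPIOA->BRR  = (1u << 8))  /* clear PA8 */

void process_packet(void)
{
  PROFILE_HIGH();          /* pin-high time = time spent in here */
  /* ...parse the packet, check the CRC, build the response... */
  PROFILE_LOW();
}
```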

I have found that when doing XMODEM-1K-CRC it is better to compute the CRC a byte at a time as each byte transmits, rather than computing it for the whole block at the end.
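The byte-at-a-time update for the CCITT polynomial that XMODEM uses looks like this (standard stuff, shown here for reference):

```c
#include <stdint.h>

/* CRC16-CCITT, polynomial 0x1021; XMODEM starts with crc = 0.
 * Call once per byte as it goes out the wire. */
static uint16_t crc16_update(uint16_t crc, uint8_t byte)
{
  int i;

  crc ^= (uint16_t)byte << 8;
  for (i = 0; i < 8; i++)
    crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                         : (uint16_t)(crc << 1);
  return crc;
}
```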

Building a buffer, and computing a CRC prior to dispatching a DMA transfer will add to your latency. Better perhaps to have a scatter-gather list of DMA buffers, or form buffers so header information can be prepended, or to transmit the payload portion whilst computing the CRC, and avoid copying buffers.

michaelmccartyeng
Associate II
Posted on September 25, 2012 at 13:20

Thanks Clive,

  Designing my buffers in such a way that the header information can simply be prepended before the data is shipped out is a good idea. I was thinking of that too; it would work for some portions of memory I'm sending, but other portions are contiguous 7K blocks. I never thought of the "scatter-gather", that makes sense. I could simply have a queue that holds chunks of memory and their sizes, and pass each one to the DMA on completion of the previous DMA transfer.
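  Something like this, maybe (a sketch; dma_tx_start() stands in for whatever actually loads and enables the TX DMA channel, and the first chunk still needs a kick when the DMA is idle):

```c
#include <stdint.h>

extern void dma_tx_start(const uint8_t *buf, uint16_t len);

typedef struct {
  const uint8_t *ptr;                  /* chunk of memory to send */
  uint16_t       len;
} chunk_t;

#define QLEN 8
static chunk_t q[QLEN];
static volatile uint8_t q_head, q_tail;

int tx_queue(const uint8_t *ptr, uint16_t len)
{
  uint8_t next = (uint8_t)((q_head + 1) % QLEN);
  if (next == q_tail)
    return 0;                          /* queue full              */
  q[q_head].ptr = ptr;
  q[q_head].len = len;
  q_head = next;
  return 1;
}

void tx_dma_tc_callback(void)          /* called from the TX DMA TC ISR */
{
  if (q_tail != q_head) {              /* more chunks pending?          */
    dma_tx_start(q[q_tail].ptr, q[q_tail].len);
    q_tail = (uint8_t)((q_tail + 1) % QLEN);
  }
}
```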

  As soon as I work out the bugs I introduced while trying to reduce latency, I'll set some timers and see exactly how many ms I spend in each area. I also have a bunch of "bytesToU16()" functions that I'm going to replace with structs so I can just point at the data.
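  In other words, something like this (field names made up; the packed attribute matters because the Cortex-M0 hard faults on unaligned word access, so the compiler has to generate byte-wise accesses):

```c
#include <stdint.h>

/* Overlay a struct on the receive buffer instead of byte-assembly
 * helpers. Assumes the wire format is little-endian, matching the
 * Cortex-M0. GCC syntax shown; under Keil/ARMCC it is __packed struct. */
typedef struct __attribute__((packed)) {
  uint8_t  type;
  uint16_t length;                     /* unaligned; packed keeps the
                                          access safe on the M0        */
  uint16_t seq;
  uint8_t  payload[96];
  uint16_t crc;
} packet_t;

void handle_packet(uint8_t *rx_buf)
{
  packet_t *pkt = (packet_t *)rx_buf;  /* just point at the data       */
  uint16_t  len = pkt->length;         /* no bytesToU16() copy needed  */
  (void)len;
}
```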

  I guess if I really wanted to not be lazy, I could just count the steps in the debugger and use the clock cycles to calculate the latency.

  Thx,

   MM

michaelmccartyeng
Associate II
Posted on September 28, 2012 at 04:21

After much changing and refactoring, it seems like my throughput only got worse :(

I changed all of my copying of data from memory into packets to simply passing the DMA pointers to the data to send. The only other thing I could do is change the logic around and remove my main switch-statement state machine.

I guess blindly trying to resolve a problem when I don't know what it is is futile. I should determine where the holdup is and try to fix that; for all I know it's on the PC side.

michaelmccartyeng
Associate II
Posted on September 28, 2012 at 04:29

Oops. I forgot I had turned on printf to send debug output to USART3. I was trying to see that in the USART or printf window in Keil, but it never worked, so at some point I was sending out data in a blocking way at a much slower rate. Commenting that out, I now get

BitsPerSec: 463552.0

when my baud is set to 921600: only 50% "goodput". Without further analyzing the actual time spent inside each function, I don't think I'll fiddle with the code anymore.
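For what it's worth, assuming 8N1 framing, each byte costs 10 bit times, so 921600 baud carries at most 921600 × 8/10 = 737,280 payload bits/s. The measured 463,552 bits/s is about 50% of the raw baud rate but about 63% of the achievable payload rate, so the turnaround gaps are a bit smaller than the raw number suggests.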

Is there any analyzer within Keil that can tell you where you spend the most time in your code, like a profiler?