2025-04-08 8:45 PM - last edited on 2025-04-09 2:17 AM by Andrew Neil
I am using custom hardware based on an STM32H750 to interface with an external UART device at 38400 baud (see STM32_block.png). UART2 is configured in asynchronous mode.
It is used to transfer the following 31 bytes of data to the external device:
ff 03 fe f0 00 1e 80 03 fe 00
16 00 12 42 50 00 10 01 00 00
00 00 00 00 00 00 20 01 00 83
89
UART2 then expects 32 bytes of data from the external device:
ff 03 fe f0 00 1f 80 03 fe 00
17 00 13 c2 54 20 01 00 00 00
00 00 00 00 00 00 10 01 00 00
d1 25
The software is based on Zephyr OS.
I used the async APIs uart_tx and uart_rx_enable for data transmission and reception, along with a semaphore to implement the timeout.
The following is a log extract from running this code:
System Clock Frequency: 400000000 Hz
AHB Clock Frequency: 200000000 Hz
APB1 Clock Frequency: 100000000 Hz
APB2 Clock Frequency: 100000000 Hz
<inf> com_test: processGroup1PortEvent: Port 0, Events 0x10
<inf> com_test: Resp rcv within 50 ms
<inf> com_test: Port 1 recv bytes: 32
<inf> com_test: Serial send-receive duration: 31375580 ns
From the STM32's perspective, sending 31 bytes and then receiving 32 bytes of data at 38400 baud took around 31 ms. The measured duration (31375580 ns in this log) fluctuates between 29 ms and 39 ms.
I used an oscilloscope to measure the actual time between packet transmission and reception (see UART2_txrx.jpg). CH1 is data coming out of the STM32, and CH2 is data from the UART device. The entire transfer took around 16.5 ms.
From these measurements, I see that there is an additional latency of roughly 12 to 22 ms.
That is quite a long time.
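The latency figure above is just the software-reported duration minus the duration seen on the oscilloscope. A minimal sketch of that arithmetic follows; the helper name `extra_latency_us` is made up for illustration and is not part of Zephyr or any ST library.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: extra time (in microseconds) that the software
 * measurement reports beyond what is visible on the wire.
 */
static int64_t extra_latency_us(int64_t software_us, int64_t wire_us)
{
    /* The software measurement should never be shorter than the wire time. */
    assert(software_us >= wire_us);
    return software_us - wire_us;
}
```

With the roughly 16.5 ms wire time, `extra_latency_us(29000, 16500)` gives 12500 and `extra_latency_us(39000, 16500)` gives 22500, which is where the 12 to 22 ms range comes from.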
By comparison, the response time of the UART device (see UART_dev_resp_time.jpg) shows that it took only around 360 µs to process the incoming data and send a response. (The device was not doing a straight echo.)
I have seen the explanation regarding the bus architecture:
https://community.st.com/t5/stm32-mcus-products/stm32h7-gpio-togle-max-frequency/m-p/336687
I understand that latency is expected. Have I reached the peak performance for this processor? (I know I can try to clock the CPU faster). Or am I missing something that can yield better results?
2025-04-08 10:56 PM
You can make a simple "bare metal" test without any RTOS. Just send the bytes, then receive the bytes. The difference between the timing of the bare-metal test and that 31 ms will be the software overhead of your Zephyr implementation. The fancier the libraries, the heavier the overhead. No free cheese.
2025-04-08 11:07 PM
> I understand that latency is expected. Have I reached the peak performance for this processor? (I know I can try to clock the CPU faster). Or am I missing something that can yield better results?
I think you are on the wrong track here.
> The software is based on Zephyr OS
You had better look into this OS: how it organizes tasks and interrupts, and how the task(s) performing the transmission are scheduled.
A bare-metal implementation will come very close to the minimum required time, which is "nr-of-bytes * bits-per-byte * bit-time", where "bits-per-byte" includes the start, stop, and parity bits.
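As a rough illustration of that formula, here is a small sketch in C. The 8N1 framing (8 data bits, no parity, 1 stop bit) is my assumption, since the frame format was not stated in the question, and the function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Bits on the wire per character: 1 start bit + data + parity + stop bits. */
static uint32_t uart_bits_per_char(uint32_t data_bits, uint32_t parity_bits,
                                   uint32_t stop_bits)
{
    return 1u + data_bits + parity_bits + stop_bits;
}

/*
 * Minimum time in microseconds (rounded down) to move n_bytes back to back:
 * nr-of-bytes * bits-per-byte * bit-time, with bit-time = 1 / baud.
 */
static uint64_t uart_wire_time_us(uint32_t n_bytes, uint32_t bits_per_char,
                                  uint32_t baud)
{
    assert(baud > 0u);
    return ((uint64_t)n_bytes * bits_per_char * 1000000u) / baud;
}
```

Assuming 8N1, `uart_bits_per_char(8, 0, 1)` is 10, and `uart_wire_time_us(63, 10, 38400)` for the 31 + 32 bytes in the question comes out at 16406 µs. That is close to the roughly 16.5 ms seen on the oscilloscope, so the wire itself is already running near its limit.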
Basically all STM32 MCUs have TxE interrupt capability, which allows you to write the next character into the TDR register while the previous one is still being shifted out.
You can safely assume the overhead and latency are entirely due to the OS implementation you use.