
STM32H750 UART latency [Zephyr]

BensonYoung
Associate

I am using custom STM32H750 hardware to interface with an external UART device at 38400 baud (see STM32_block.png). UART2 is configured in asynchronous mode.

It is used to transfer the following 31 bytes of data to the external device:

ff 03 fe f0 00 1e 80 03 fe 00
16 00 12 42 50 00 10 01 00 00
00 00 00 00 00 00 20 01 00 83
89

 

UART2 then expects 32 bytes of data back from the external device:

ff 03 fe f0 00 1f 80 03 fe 00
17 00 13 c2 54 20 01 00 00 00
00 00 00 00 00 00 10 01 00 00
d1 25

 

The software is based on Zephyr OS.

Below, I use the async APIs uart_rx_enable() and uart_tx() for transmission and reception, along with a semaphore for the timeout.

void port1_uart_callback(const struct device *dev, struct uart_event *evt, void *user_data)
{
    ARG_UNUSED(dev);
    ARG_UNUSED(user_data);

    /* Called from interrupt context: keep the work done here minimal */
    switch (evt->type) {
        case UART_TX_DONE:
            k_sem_give(&group_ports[GROUP_1_PORT].data_sem);
            break;

        case UART_TX_ABORTED:
            break;

        case UART_RX_RDY:
            /* Typically raised once the RX inactivity timeout expires or the buffer fills */
            group_ports[GROUP_1_PORT].data.rcvPacket.len += evt->data.rx.len;
            response_received = true;  // Flag response as received
            k_sem_give(&group_ports[GROUP_1_PORT].rcv_data_sem);
            break;

        case UART_RX_BUF_REQUEST:  // no second buffer supplied; one buffer is assumed to hold a whole reply
        case UART_RX_BUF_RELEASED:
        case UART_RX_DISABLED:
            /* Normal parts of the async RX life cycle; nothing to do */
            break;

        default:
            LOG_WRN("Unhandled UART event: %d", evt->type);
            break;
    }
}

int port1_serial_process(void)
{
    uint32_t events;
    int status;

    // ----- Initialise the timing API for UART performance measurement
    timing_t start_time, end_time; // Variables to store timing information
    uint64_t total_cycles;
    uint64_t total_ns;
    timing_init();
    timing_start();
    LOG_INF("Timing frequency: %u MHz", timing_freq_get_mhz());
    //----- End of timing setup

    // ----- Get system clock setting
    uint32_t sys_clk, hclk, pclk1, pclk2;

    // Retrieve clock frequencies using STM32 HAL APIs
    sys_clk = HAL_RCC_GetSysClockFreq(); // System clock frequency
    hclk = HAL_RCC_GetHCLKFreq();        // AHB clock frequency
    pclk1 = HAL_RCC_GetPCLK1Freq();      // APB1 clock frequency
    pclk2 = HAL_RCC_GetPCLK2Freq();      // APB2 clock frequency

    // Print the clock frequencies
    printk("System Clock Frequency: %u Hz\n", sys_clk);
    printk("AHB Clock Frequency: %u Hz\n", hclk);
    printk("APB1 Clock Frequency: %u Hz\n", pclk1);
    printk("APB2 Clock Frequency: %u Hz\n", pclk2);
    // ----- end Get system clock setting

    LOG_INF("Grp port 1 running!");
    uart_callback_set(group_ports[GROUP_1_PORT].uart, port1_uart_callback, NULL);
    k_event_init(&group_ports[GROUP_1_PORT].events);

    while (true) {
        /* Block until one of the port events is posted */
        events = k_event_wait(&group_ports[GROUP_1_PORT].events,
                              (uint32_t)(PORT_SYNC | PORT_SEND | PORT_SHUT_DOWN | PORT_RECEIVE |
                                         PORT_SEND_RECV | PORT_TIMEOUT | PORT_ALWAYS_RECV),
                              false,
                              K_FOREVER);

        LOG_INF("processGroup1PortEvent: Port %d, Events 0x%x", GROUP_1_PORT, events);
        if (events != 0U) {
            if (events & PORT_SEND_RECV) {
                start_time = timing_counter_get(); // Start timing

                // RX inactivity timeout is in microseconds: 520 us is about two
                // character times (20 bits) at 38400 baud
                if (uart_rx_enable(group_ports[GROUP_1_PORT].uart,
                                   (uint8_t *) &group_ports[GROUP_1_PORT].data.rcvPacket.pkt.buffer,
                                   PALCOMX_TRANSPORT_PACKET_MAX_SIZE, 520) != 0) {
                    LOG_ERR("Port 1 failed to enable UART RX");
                }
                // TX timeout is also in microseconds and only applies with flow control
                status = uart_tx(group_ports[GROUP_1_PORT].uart,
                                 (uint8_t *) &group_ports[GROUP_1_PORT].data.txPacket.pkt.buffer,
                                 group_ports[GROUP_1_PORT].data.txPacket.len, SYS_FOREVER_US);
                if (status < 0) {
                    LOG_WRN("Failed to send data: %d", status);
                }

                // Wait for response or timeout
                int rc = k_sem_take(&group_ports[GROUP_1_PORT].rcv_data_sem, K_MSEC(50));
                end_time = timing_counter_get(); // End timing (set on both paths)

                if (rc == 0) {
                    if (response_received) {
                        LOG_INF("Resp rcv within %d ms", 50);
                        k_sem_give(&group_ports[GROUP_1_PORT].udp_resp_sem);
                        response_received = false; // Reset the flag for next use
                        /* response packet received */
                        LOG_INF("Port 1 recv bytes: %d", group_ports[GROUP_1_PORT].data.rcvPacket.len);
                    } else {
                        LOG_INF("UART Resp handling error");
                    }
                } else {
                    LOG_INF("UART Timeout waiting for resp");
                    // housekeeping
                    group_ports[GROUP_1_PORT].data.rcvPacket.len = 0;
                }

                // Disable UART RX after operation
                uart_rx_disable(group_ports[GROUP_1_PORT].uart);
                // Clear the event flag
                k_event_clear(&group_ports[GROUP_1_PORT].events, PORT_SEND_RECV);

                total_cycles = timing_cycles_get(&start_time, &end_time);
                total_ns = timing_cycles_to_ns(total_cycles);
                LOG_INF("Serial send-receive duration: %llu ns", total_ns);
            }
        } // end of if (events != 0U) condition
    } // end of while (true) loop
}

Following is a log extract from the above code:

System Clock Frequency: 400000000 Hz
AHB Clock Frequency: 200000000 Hz
APB1 Clock Frequency: 100000000 Hz
APB2 Clock Frequency: 100000000 Hz

<inf> com_test: processGroup1PortEvent: Port 0, Events 0x10

<inf> com_test: Resp rcv within 50 ms

<inf> com_test: Port 1 recv bytes: 32

<inf> com_test: Serial send-receive duration: 31375580 ns

From the STM32's perspective, sending 31 bytes followed by receiving 32 bytes at 38400 baud took around 31 ms. The measured value (31375580 ns in this run) fluctuates between 29 ms and 39 ms.

I used an oscilloscope to measure the actual time between packet transmission and reception (see UART2_txrx.jpg). CH1 is data coming out of the STM32, and CH2 is data from the UART device. The entire transfer took around 16.5 ms.

From these measurements, I see an extra latency of 12 to 22 ms, which is quite a long time.
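One way to put these numbers side by side is to compute the time the bytes themselves need on the wire. The sketch below assumes 8N1 framing (start + 8 data + stop = 10 bits per character; the post does not state the framing), and adds the roughly 360 µs device turnaround from the scope plus the 520 µs RX inactivity timeout passed to uart_rx_enable(), which has to elapse after the last byte before UART_RX_RDY is raised. The function names are illustrative only.

```c
#include <stdio.h>

/* Time on the wire for n characters, in microseconds.
 * bits_per_char counts start + data + parity + stop bits (10 for 8N1). */
double wire_time_us(unsigned n_chars, unsigned bits_per_char, unsigned baud)
{
    return (double)n_chars * bits_per_char * 1e6 / baud;
}

/* Best-case duration of one send/receive cycle as seen by the software timer:
 * request on the wire, device turnaround, reply on the wire, and the RX
 * inactivity timeout that must pass before UART_RX_RDY is raised. */
double expected_cycle_us(void)
{
    const unsigned baud = 38400;
    const unsigned bits_per_char = 10;       /* assumption: 8N1 framing */
    const double turnaround_us = 360.0;      /* device response time from the scope */
    const double rx_idle_timeout_us = 520.0; /* timeout argument of uart_rx_enable() */

    return wire_time_us(31, bits_per_char, baud) + turnaround_us +
           wire_time_us(32, bits_per_char, baud) + rx_idle_timeout_us;
}

/* Print the expected minimum next to a measured duration (both in us). */
void print_budget(double measured_us)
{
    const double expected_us = expected_cycle_us();
    printf("Expected minimum:     %8.1f us\n", expected_us);
    printf("Measured:             %8.1f us\n", measured_us);
    printf("Unexplained overhead: %8.1f us\n", measured_us - expected_us);
}
```

Calling print_budget(31375.58) with the value from the log gives an expected minimum of about 17.3 ms, so roughly 14 ms of the measured time is not accounted for by the wire time, the device turnaround, or the RX timeout. The wire-plus-turnaround part alone (about 16.8 ms) is in the same range as the roughly 16.5 ms seen on the scope. With parity or two stop bits, bits_per_char would be 11 or 12 instead.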

By comparison, the UART device's own response time (UART_dev_resp_time.jpg) was only around 360 µs to process the incoming data and send a response (the device is not doing a straight echo).

I have seen the explanation of the bus architecture in this thread:

https://community.st.com/t5/stm32-mcus-products/stm32h7-gpio-togle-max-frequency/m-p/336687

I understand that some latency is expected. Have I reached the peak performance of this processor (I know I could clock the CPU faster), or am I missing something that could yield better results?

 

2 REPLIES
Pavel A.
Evangelist III

You can make a simple "bare metal" test without any RTOS: just send the bytes and receive the bytes. The difference between the bare-metal timing and those 31 ms will be the software overhead of your Zephyr implementation. The fancier the libraries, the heavier the overhead. No free cheese.
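For the timing side of such a bare-metal test, one option (an assumption, not something from the thread) is the Cortex-M7 DWT cycle counter. The sketch below shows only the host-checkable part: turning two raw 32-bit counter readings into nanoseconds, relying on unsigned subtraction so that a single counter wrap between the readings is handled. The function name is hypothetical, and the 400 MHz used in the usage note assumes the core runs at the SYSCLK printed in the log.

```c
#include <stdint.h>

/* Convert two raw readings of a free-running 32-bit cycle counter (e.g. DWT->CYCCNT
 * on a Cortex-M7) into elapsed nanoseconds.
 *
 * On the target the counter would typically be enabled once via CMSIS:
 *   CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
 *   DWT->CYCCNT = 0;
 *   DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;
 * and sampled as start = DWT->CYCCNT before the first TX byte and
 * end = DWT->CYCCNT after the last RX byte. */
uint64_t cycles_elapsed_ns(uint32_t start, uint32_t end, uint32_t cpu_hz)
{
    /* Unsigned subtraction stays correct across one wrap of the counter
     * (about 10.7 s at 400 MHz, far longer than one exchange). */
    uint32_t cycles = end - start;
    return (uint64_t)cycles * 1000000000ULL / cpu_hz;
}
```

For example, cycles_elapsed_ns(start, end, 400000000u) on readings taken around a polled send/receive gives a bare-metal figure that can be compared directly with the nanosecond value printed by the Zephyr build.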

 

Ozone
Lead III

> I understand that latency is expected. Have I reached the peak performance for this processor? (I know I can try to clock the CPU faster). Or am I missing something that can yield better results?

I think you are on the wrong track here.

> The software is base on Zephyr OS

You had better look into this OS: how it organizes tasks and interrupts, and how the task(s) performing the transmission are scheduled.

A bare-metal implementation will come very close to the minimum required time, which is "nr-of-bytes * bits-per-byte * bit-time", where "bits-per-byte" includes the start, stop, and parity bits.

Basically all STM32 MCUs have a TxE interrupt, which allows you to write the next character into the TDR register before the previous one has been fully transmitted.
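The TxE mechanism can be sketched as a small state machine that the interrupt handler calls each time the transmit data register empties. The sketch keeps the hardware out of the logic so it can be checked on a host: the struct, the function names, and passing the TDR as a pointer are illustrative assumptions. On the target, the byte would be written to the USART's TDR inside USART2_IRQHandler.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* State for an interrupt-driven transmit: the ISR hands over one byte per
 * TXE interrupt, so the CPU only touches the UART when TDR is empty. */
struct tx_state {
    const uint8_t *buf;
    size_t len;
    size_t next;  /* index of the next byte to write into TDR */
};

void tx_start(struct tx_state *s, const uint8_t *buf, size_t len)
{
    s->buf = buf;
    s->len = len;
    s->next = 0;
    /* On the target: enable the USART TXE interrupt here
     * (register and LL names vary by STM32 family). */
}

/* Called from the TXE interrupt. Stores the next byte in *tdr and returns true,
 * or returns false once the frame is exhausted. At that point the real ISR must
 * disable the TXE interrupt (TXE stays set while TDR is empty, so it would keep
 * firing) and can enable TC to detect the end of the last stop bit. */
bool tx_on_txe(struct tx_state *s, uint8_t *tdr)
{
    if (s->next >= s->len) {
        return false;
    }
    *tdr = s->buf[s->next++];
    return true;
}
```

Because the next byte is loaded while the previous one is still shifting out, the line stays busy back to back and the frame takes essentially the nr-of-bytes * bits-per-byte * bit-time given above.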

You can safely assume the overhead and latency is fully due to the OS implementation you use.