
Why the IRQ needs so many cycles

BSchm.0
Associate II

Hardware & HAL:

  • STM32F767ZI Nucleo 210MHz
  • Cube MCU Package 1.15
  • Application: USB (12 Mbit/s FS) to UART bridge, 115200 baud, 8bit 1s 1s, full-duplex test (RX-TX bridged)

Tested:

  • The test on /dev/ttyACM0 has been running without any errors on all packet sizes from 2 to 64 KB for 3 days (every comparison of the sent and received file was OK)
  • No delay between 64 bytes on TX
  • Flow control (NAK) on the USB OUT endpoint is working fine
  • I made some IRQ time measurements with a logic analyzer and synced them with a USB bus analyzer

Result:

  • The UART IRQ needs about 3 us
  • The USB IRQ handler checks 3 endpoints in a loop (Control, IN, OUT), and each takes a 3x cycle (e.g. OUT, DATA, ACK): 3 us x 9 = about 27 us. This multiple of the UART IRQ time matches the measured total USB IRQ time for transmitting OUT to TX well.

My Conclusion:

  • You cannot build a multi-channel (x2, x3 or x4) full-duplex USB-to-UART bridge with any STM32F without dropouts on the other channels! (Only the highest IRQ sub-group is fine.)
  • Either the CPU is too slow or the HAL needs too many cycles for IRQ handling.
  • If more than 3 EPs are checked in the USB IRQ handler, the time grows for a multi-channel bridge compared to a single channel, which makes it even worse.
  • Changing to HS (480 Mbit/s) makes it much worse because of the NAK EP flow control.
  • OK, the 3x IRQs needed to get the bulk data are due to the USB protocol, so that part is caused by USB.org :beaming_face_with_smiling_eyes:
  • There is no sample code from ST for more than one USB-to-UART bridge. Why? It does not work as a cheaper alternative to an FTDI x2/x4 ASIC.

Question:

  • Why do we need 3 us @ 210 MHz = 630 clock cycles with the HAL for this UART IRQ?
  • How many times faster is your HAL replacement code for UART and USB IRQ handling?
  • Any timing info on total IRQ times from third-party suppliers is welcome.
  • Don't forget to state your CPU core speed, otherwise the info is not comparable.
  • No guessing; only IRQ time measurements are welcome.
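To make timings from different core speeds comparable, as the question asks, it can help to convert between microseconds and clock cycles. The following is only a small sketch of that arithmetic; the function names `irq_cycles` and `cycles_to_ns` are made up for illustration and are not part of any ST library.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a measured ISR duration into core clock cycles.
 * At N MHz the core executes N cycles per microsecond. */
static uint32_t irq_cycles(uint32_t duration_us, uint32_t core_mhz)
{
    return duration_us * core_mhz;
}

/* Convert a cycle count back into nanoseconds (rounded down). */
static uint32_t cycles_to_ns(uint32_t cycles, uint32_t core_mhz)
{
    assert(core_mhz > 0);  /* avoid division by zero */
    return (uint32_t)(((uint64_t)cycles * 1000u) / core_mhz);
}
```

With the numbers from this post, `irq_cycles(3, 210)` gives the 630 cycles quoted above, and the same helper lets a measurement taken at another clock be restated in cycles before comparing.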

Attachment:

  • Channel 0: USB IRQ, Prio 7; Channel 1: flow control, LOW = NAK to OUT; Channel 2: UART IRQ, Prio 5; Channel 3: OS tick, Prio 0; Channel 4: RX DMA HT/FT, Prio 6

Hint: The 2x UART IRQ below is the sending of a new 64-byte buffer, because the TXE IRQ fires immediately: TXE is empty again as soon as the byte has moved to the shift register in the UART.

  • IRQ calls are framed by GPIO, so we won't see the NVIC stack pop time (the register push is included).
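For readers who want to reproduce this kind of measurement, here is a rough sketch of what "framing an IRQ call by GPIO" can look like. It is not the code used for the traces in this post. The `gpio_regs_t` struct is a host-side stand-in so the sketch compiles anywhere (on target you would use the GPIO register block from the device header), and `framed_isr` and `example_body` are hypothetical names. The only hardware fact relied on is that on STM32 GPIO, writing bit n of BSRR sets pin n and writing bit n+16 resets it.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the GPIO register block (assumption: only BSRR is modelled). */
typedef struct {
    volatile uint32_t BSRR;  /* bit n sets pin n, bit n+16 resets pin n */
} gpio_regs_t;

static inline void probe_high(gpio_regs_t *g, unsigned pin)
{
    g->BSRR = 1u << pin;
}

static inline void probe_low(gpio_regs_t *g, unsigned pin)
{
    g->BSRR = 1u << (pin + 16u);
}

/* Placeholder for the real handler body (e.g. the HAL IRQ handler call). */
static unsigned work_done;
static void example_body(void)
{
    work_done++;
}

/* Raise the probe pin, run the handler body, lower the pin again.
 * The logic analyzer then shows the body's duration as a pulse. */
static void framed_isr(gpio_regs_t *g, unsigned pin, void (*body)(void))
{
    probe_high(g, pin);
    body();
    probe_low(g, pin);
}
```

Because the pin is lowered before the handler returns, the exception return itself is outside the pulse, which is the limitation noted in the bullet above.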

12 REPLIES
Pavel A.
Evangelist III

> why we need 3us @ 210MHz = 630 clock cycles with HAL for this UART IRQ

Does the UART interrupt preempt USB?

We have no idea what the IRQ handler does besides the HAL library overhead.

Generally, prefer LL for working with UARTs. The HAL library is not good.

-- pa

Yes, but it's called NVIC (Nested Vectored Interrupt Controller, by ARM) in the RM: USB is Prio 7 (channel 0) and UART is Prio 5 (channel 2), so the UART IRQ nests inside the USB IRQ, and the USB IRQ continues later after the stack pop.

Roughly what total IRQ times do you get for simple stuff like UART, and at what core speed?

I can live with the UART IRQ time. Did you write the complete USB handler with LL?

A lot of LL macros were removed in 1.15 for STM32F7xx; I noticed this when porting my DMA RX handler from F3xx.

S.Ma
Principal

Don't use HAL for USART.

I'm not using the H7, but I think its USART IP version includes FIFOs now.

Do use LL.

I think an interrupt takes 12 cycles max to enter, then the same to exit, if you are not using float within interrupts.

Some people use DMA into a rolling buffer to emulate a FIFO, which is then checked every msec...
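The rolling-buffer idea can be sketched roughly as follows. This is a host-compilable illustration, not code from this thread: `rx_ring_t` and `rx_ring_drain` are hypothetical names, the buffer size is arbitrary, and `dma_remaining` stands in for the DMA's "transfers left" counter (on STM32 that would typically be the stream's NDTR value, which is an assumption about how you wire it up).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RX_BUF_SIZE 64u

typedef struct {
    uint8_t buf[RX_BUF_SIZE];  /* on target: filled by DMA in circular mode */
    size_t  read_pos;          /* next index the CPU will read */
} rx_ring_t;

/* Copy every byte received since the last call into out[].
 * dma_remaining: transfers left before the DMA wraps, 1..RX_BUF_SIZE.
 * Returns the number of bytes copied. */
static size_t rx_ring_drain(rx_ring_t *r, size_t dma_remaining,
                            uint8_t *out, size_t out_cap)
{
    assert(dma_remaining >= 1 && dma_remaining <= RX_BUF_SIZE);

    /* The DMA write index is how far it has advanced into the buffer. */
    size_t write_pos = (RX_BUF_SIZE - dma_remaining) % RX_BUF_SIZE;
    size_t n = 0;

    while (r->read_pos != write_pos && n < out_cap) {
        out[n++] = r->buf[r->read_pos];
        r->read_pos = (r->read_pos + 1u) % RX_BUF_SIZE;
    }
    return n;
}
```

Calling this from a 1 ms tick removes the per-byte RX interrupt entirely. The trade-off is that the sketch cannot detect an overrun: if more than a full buffer arrives between two polls, old data is silently overwritten, so the buffer must be sized for the baud rate and polling period.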

turboscrew
Senior III

I'm not familiar with the F7 family, but the ARMv7-M ARM says (about exceptions):

"When pushing context to the stack, the hardware saves eight 32-bit words, comprising xPSR, ReturnAddress, LR (R14), R12, R3, R2, R1, and R0."

Those are, of course popped at exception return.

That takes some cycles alone.

Then Cube has somewhat "complete" handling of interrupts. It checks just about all possible causes and calls the callbacks if callbacks for the pending causes are configured.

That makes interrupt processing a bit slow. At least slower than ad-hoc interrupt handling that only checks for necessary things.
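To illustrate what "ad-hoc interrupt handling that only checks for necessary things" might look like for a UART, here is a minimal sketch. It makes no claim about how much faster it is than the HAL; that would have to be measured. The `uart_regs_t` struct is a simplified host-side stand-in (the real F7 USART register block has more registers and a different layout, so on target use the type from the device header), `uart_chan_t` and `uart_isr_minimal` are hypothetical names, and the bit positions are written from memory of the F7 USART and should be verified against the reference manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the USART registers used here (assumption). */
typedef struct {
    volatile uint32_t CR1;
    volatile uint32_t ISR;
    volatile uint32_t RDR;
    volatile uint32_t TDR;
} uart_regs_t;

#define UART_FLAG_RXNE (1u << 5)  /* receive data register not empty */
#define UART_FLAG_TXE  (1u << 7)  /* transmit data register empty */
#define UART_CR1_TXEIE (1u << 7)  /* TXE interrupt enable */

typedef struct {
    const uint8_t *tx_data;  /* buffer currently being sent */
    size_t         tx_len;
    size_t         tx_pos;
    uint8_t        last_rx;  /* most recent received byte */
    uint32_t       rx_count;
} uart_chan_t;

/* Handles only RXNE and TXE; every other cause is ignored on purpose. */
static void uart_isr_minimal(uart_regs_t *u, uart_chan_t *ch)
{
    uint32_t isr = u->ISR;  /* read the status register once */

    if (isr & UART_FLAG_RXNE) {
        ch->last_rx = (uint8_t)u->RDR;  /* on hardware this read clears RXNE */
        ch->rx_count++;
    }

    if ((isr & UART_FLAG_TXE) && (u->CR1 & UART_CR1_TXEIE)) {
        if (ch->tx_pos < ch->tx_len) {
            u->TDR = ch->tx_data[ch->tx_pos++];
        } else {
            u->CR1 &= ~UART_CR1_TXEIE;  /* nothing left: stop TXE interrupts */
        }
    }
}
```

The design choice is simply to drop everything the application does not need (error callbacks, state machines, generic dispatch), which also means errors such as overrun are not handled here and would need their own check in real code.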

  • LL is part of the HAL, and the macros are not available in the F7xx HAL; macros also do not show what is going on at a breakpoint.
  • UART is easy and not the problem. Can you tell us how much speed you gain on the USB IRQ if the IRQ handler is self-coded?
  • To do full duplex with 4 UARTs at 115200 baud, one channel must stay below 21.7 us in total so that 4 channels run without losing bytes on the others (NVIC). Guessing and believing is a waste of time.
  • I now have a total time of 55 us for one channel (USB IN & OUT + UART). How should self-written code run at double the speed of the HAL? Please explain with an IRQ handler code example for USB.
  • If it's 8 32-bit words, push and pop take a little more than 80 ns @ 200 MHz (5 ns x 16), so that is not the issue.
  • The Cortex-M7 has 72 registers, and I also believe that not all of them are saved by push and pop on an IRQ: 720 ns would not be the issue either. With NVIC nesting in the impossible worst case, completely asynchronous, at 4 channels that is 720 ns x (3x4)-1 = 7 us, or with 8x32 bits below 1 us. NVIC push and pop is not the issue :thumbs_up:
  • The total USB IRQ time is still the issue.
  • Do you think you can make the USB HAL IRQ handling at least 2.2 times faster? I do not! ;)
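The 21.7 us budget above appears to follow from the time one byte occupies the line at 115200 baud, divided by 4 channels. The sketch below reproduces that arithmetic under the assumption of 10 bits per frame (1 start + 8 data + 1 stop); the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Time one UART frame occupies the line, in nanoseconds (rounded down). */
static uint64_t uart_frame_time_ns(uint32_t baud, uint32_t bits_per_frame)
{
    assert(baud > 0);
    return (uint64_t)bits_per_frame * 1000000000ull / baud;
}

/* Time available per channel if the handling for all channels has to
 * fit inside a single frame time (the worst case described above). */
static uint64_t per_channel_budget_ns(uint32_t baud, uint32_t bits_per_frame,
                                      uint32_t channels)
{
    assert(channels > 0);
    return uart_frame_time_ns(baud, bits_per_frame) / channels;
}
```

For 115200 baud and 10 bits per frame this gives about 86.8 us per byte and about 21.7 us per channel for 4 channels, which is where the measured 55 us per channel and the "2.2 times faster" requirement come from (55 / 21.7 is roughly 2.5, so 2.2 is on the optimistic side).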

thanks for the info turboscrew

S.Ma
Principal

Every time there is a jump or a call(back) in the interrupt, the benefit of the ART accelerator is lost, hence the RAM location for the ISR.

LL should not be used for UART interrupts; CMSIS is quite appropriate there.

Moreover, an interrupt for a UART makes sense to handle a crash and terminate packet transmission.

Everything else should be done by dma.

Pavel A.
Evangelist III

> but its called NVIC (Nested Vector Interrupt Controller by ARM) 

If I remember correctly, preemption (nesting) of interrupts should be enabled. By default NVIC does not preempt, only prioritizes.

> Did you write the complete USB handler with LL?

USB is complicated, so I use the HAL library for it. UARTs are simple, so they can be done with LL or home-brewed register access code.

If ST has not gotten around to providing an LL library for F7 yet, you can borrow the code from other compatible MCUs.

DMA can be a better solution for several busy UARTs, but more complicated.

--pa