cancel
Showing results for 
Search instead for 
Did you mean: 

Why the IRQ needs so many cycles

BSchm.0
Associate II

Hardware & HAL:

  • STM32F767ZI Nucleo 210MHz
  • Cube MCU Package 1.15
  • Application USB (12Mbit FS) to UART bridge 115200 baud, 8bit 1s 1s, full duplex test (RX-TX bridged)

Tested:

  • The test on /dev/ttyACM0 is running without any errors on all packet 2-64KB sizes for 3 days (all compares ok of sen received file)
  • no delay between 64 Bytes on TX
  • Flow Control working fine (NAK) on USB OUT endpoint
  • I made some IRQ time measurements with the logic analyzer and synced it with USB bus analyzer

Result:

  • UART IRQ needs about 3us
  • USB IRQ on 3x endpoints is checked by loop (Control,IN,OUT)( 3x cycle (e.g. OUT, DATA,ACK) = 3us x9 = about 27us and we see this multiplied UART IRQ time matches well with the measured USB IRQ total time for transmit OUT to TX

My Conclusion:

  • you could not do multi channel USB to UART full duplex x2 or x3 or x4 bridge with any STM32F with out drop outs on the other channels ! (only highest IRQ sub group is fine)
  • CPU to slow or HAL needs to many cycles for IRQ handling
  • if more then 3 EPs are checked in USB IRQ handler time will extend for multi channel bridge in comparison to single channel... makes it even worser
  • change to HS 480 Mbit makes it much worser because of NAK EP flow control
  • ok 3x times IRQ because of USB protocol to get the bulk data is caused by USB.org 😁
  • there is no sampe code from STM for more then 1x USBbride to UART ..why ..its not working to do cheaper then FDTI ASIC x2 x4

Question:

  • why we need 3us @ 210MHz = 630 clock cycles with HAL for this UART IRQ
  • how many times faster is your HAL replacemet code of UART and USB for IRQ handling
  • any time infos of thid party suppliers IRQ total times are welcome
  • dont forget your CPU core speed otherwise info is not comparable
  • no guessing only IRQ times measurements are welcome

Attachment:

  • Channel 0 USB IRQ Prio7, Channel 1 Flow Control w. LOW=NAK to OUT, Channel 2 UART IRQ Prio 5, Channel 3 OS tick Prio 0, Channel 4 RX DMA HT;FT Prio 6

Hint: 2xUART IRQ below is send of new 64 byte buffer because TXE IRQ is immediate empty after byte moved to shift out register in UART

  • IRQ calls are framed by GPIO so we won`t see NVIC stack pop time (push registers is included)

12 REPLIES 12
BSchm.0
Associate II

Solved:

The CPU is not really processing at core speed because of 5 DMA slots (CPU,DMA1,DMA2,USB,ETHER)

So only 210MHz /5 is available for CPU slot if not the complete ISR is inside instruction cache.. and pop could not happen because stack is not inside I-Cache

I checked by logic analyzer and it also matches with calculation if you single step your ISR . estimate the ISR cycles by 2 for each single step and multiply with 5/210MHz.. voila nearly the same as in logic analyzer

If you want the exactly cycle time you can enable at cortex M 7 cycle count register

void UART5_IRQHandler(void)
 {
   /* USER CODE BEGIN UART5_IRQn 0 */
+       volatile unsigned int *DWT_CYCCNT   = (volatile unsigned int *)0xE0001004;
+       COUNT1 = *DWT_CYCCNT;
 #if !defined(LED_LOGIC_ANALYZER)
   HAL_GPIO_TogglePin(GPIOG, GPIO_PIN_10); //green LED used UART IRQ
 #else
@@ -262,7 +261,7 @@ void UART5_IRQHandler(void)
   iIRQ_UART5++;
 #endif
 
-
+  COUNT2 = *DWT_CYCCNT;^M
   /* USER CODE END UART5_IRQn 1 */
 }

and print it out in you lowest IRQ routine to get NVIC worst case by RTOS in polling UART mode =)

iRX_DMAcounter: 31 iRX_DMAcounterNow/s: 2 iRX_DMAcounterMax/s: 3
iIRQ_UART5: 3061 iIRQ_UART5Now/s: 171 iIRQ_UART5Max/s: 190
iCDCBlockRequest: 340 iCDCBlockRequestNow/s: 9 iCDCBlockRequestMax/s: 10
iSeconds: 47
CYCLES: 1545
CYCLES min: 752 => 3.58 us
CYCLES max: 1566 => 7.46 us
CDCReceive: 350 CDCReceiveNow/s: 10 CDCReceiveMax/s: 10
iRX_DMAcounter: 34 iRX_DMAcounterNow/s: 3 iRX_DMAcounterMax/s: 3
iIRQ_UART5: 3251 iIRQ_UART5Now/s: 190 iIRQ_UART5Max/s: 190
iCDCBlockRequest: 350 iCDCBlockRequestNow/s: 10 iCDCBlockRequestMax/s: 10
iSeconds: 48
CYCLES: 1543
CYCLES min: 752 => 3.58 us
CYCLES max: 1566 => 7.46 us
CDCReceive: 359 CDCReceiveNow/s: 9 CDCReceiveMax/s: 10
iRX_DMAcounter: 36 iRX_DMAcounterNow/s: 2 iRX_DMAcounterMax/s: 3
iIRQ_UART5: 3422 iIRQ_UART5Now/s: 171 iIRQ_UART5Max/s: 190
iCDCBlockRequest: 359 iCDCBlockRequestNow/s: 9 iCDCBlockRequestMax/s: 10
iSeconds: 49
CYCLES: 1447
CYCLES min: 752 => 3.58 us
CYCLES max: 1566 => 7.46 us

 at after some seconds you get your NVIC event from e.g in this case its OS Tick with high paket load in full duplex at 115200 (brigded RX & TX load only and not 100% async)

CReceive: 100527 CDCReceiveNow/s: 181 CDCReceiveMax/s: 182
iRX_DMAcounter: 100124 iRX_DMAcounterNow/s: 182 iRX_DMAcounterMax/s: 182
iIRQ_UART5: 6690741 iIRQ_UART5Now/s: 12132 iIRQ_UART5Max/s: 12145
iCDCBlockRequest: 100530 iCDCBlockRequestNow/s: 181 iCDCBlockRequestMax/s: 182
                                                                              iSeconds: 738
CYCLES: 657
CYCLES min: 452 => 2.15 us
CYCLES max: 2193 => 10.44 us
 

 next step is to count the cycles intelligent by USB IRQ al lowest prio) NVICed by worst case.. result we lost a byte because we are too slow (below 86us) for IRQ handling service 😂 ..and /dev/ttyACM0 1,2,3,4 torture test aknowled by error 😂

>I think Interrupt takes 12 cycles max to enter, then same to exit, if not using float within interrupts.

Correct answer.. But HAL is not as bad as expected.. maybe HAL about 18 in best case..with out my debug pin and counter stuff.. in some case I have to care also about IDLE and DMA ring buffer

.. also depends on the amount of the flags and if NVIC happens to your UART

see my answer at the end of the post (solved)

5 DMA explain why we do nat have real 210MHz for ISR ..that was the stuff I want to understand.

We only have about 40MHz in real for processing ISRs :confounded_face:

IN the picture you can see dual load of emty UART TXE (2x). becaue its a NVIC of the USB you can see a pop and push time 😁 at fake 210MHz because of fixed DMA channels