2024-09-04 03:12 PM - edited 2024-09-05 10:54 AM
Hi,
I just tried to check the response to some events on a system with an H7A3 CPU and was surprised:
EXTI response time was 50 us! That is a lot for an ARM CPU running at 200 MHz; too much, I would say.
Edit: the timings were measured wrongly, my error: the pins have ceramic caps to filter out noise, and these also delay the signal.
See the "solution": the timing measured on a pin without any caps, from the edge to the INT (setting another pin), is close to expected.
Here are the wrong timings:
So I made some tests:
EXTI response time to an external signal, setting a port pin in the INT and checking the delay on a DSO:
on H7A3, at 200 MHz core, I-cache on, -O0: 50 us!! (never less than 50, sometimes more because of other INTs)
on H563, at 250 MHz core, I-cache on, -O2: 25 us! (no other INT coming, INT at prio 13)
on H563, at 250 MHz core, I-cache on, -O2: 25 us! (no other INT coming, INT at prio 1), no difference.
on H563, at 250 MHz core, I-cache on, -O0: 25 us! (no other INT coming, INT at prio 1), no difference.
on H743, at 200 MHz core, I+D cache on, -O2: 0.5 us! (constant) (=> this seems OK, but that is still 100 core clocks.)
All on Cube-generated init code, in STM32CubeIDE V1.14.1, standard settings (GCC, -C11 etc., no warnings/errors).
So... any explanation for this?
Is the "expected" timing of the NVIC given somewhere? I didn't see it in the DS, the RM or PM0253 (Cortex®-M7 processor programming manual), so does anybody know this? Or is it some kind of secret info...?
Is there really so much, and such different, delay on these CPUs, or am I doing something very wrong? Edit: yes.
2024-09-05 01:03 AM - edited 2024-09-05 10:47 AM
@TDK, I know about the ingenious tricks in the ARM core, like out-of-order execution and saving the stack frame with one instruction (in about 12 core clocks); that's why I never checked before what it is doing in reality.
Now the check shows some problems...
OK, it's clear: on a "hi-speed CPU" with a 6- or 7-stage pipeline and wait states on almost everything (because only the core, the cache and maybe CC-RAM can work without wait states) there will be some more clocks in the real world. BUT 50 us?
Edit: it's from the hardware: 10 nF at the pin, which delays the signal rise.
I didn't show the code because there is nothing special in it, it just makes a pulse to check on the scope:
Trigger/ch1 on the external signal; ch2 on the (LED) pin that is set in the EXTI INT:
/**
  * @brief This function handles EXTI line3 interrupt.
  */
void EXTI3_IRQHandler(void)
{
  /* USER CODE BEGIN EXTI3_IRQn 0 */
  HAL_GPIO_WritePin(GPIOB, GPIO_PIN_2, SET); // set LED on (HAL call needing about 200 ns)
  /* USER CODE END EXTI3_IRQn 0 */
  HAL_GPIO_EXTI_IRQHandler(DRIMP_2_EXTI3_Pin);
  /* USER CODE BEGIN EXTI3_IRQn 1 */
  HAL_GPIO_WritePin(GPIOB, GPIO_PIN_2, RESET); // set LED off
  /* USER CODE END EXTI3_IRQn 1 */
}
... and this comes out:
Added: about 10 us from the external signal (blue) to the CPU pin (yellow), caused by the opto-coupler there:
(with long persistence, to see many INTs on one screen) -> some are delayed a little because other INTs are coming, that's clear.
And the HAL_GPIO... set needs about 200 ns, I tested it separately. Nothing to worry about.
BUT the delay from signal to INT is NEVER shorter than about 50 us! (About 10 us of it comes from the circuit, with the opto-coupler at the input; but that still leaves 40 us of delay from the CPU that I cannot explain.)
Edit: the (almost) 50 us delay comes from the opto-coupler, because the rising (!) edge is used for the INT, and this is much slower than the falling edge (which is what I checked with the scope at first).
So just the 500 ns delay remains, but I will check this again too, with other settings for the stack.
But this is no "drama". :)
Now the "correct" test on H563ZIT, at 250 MHz core, I-cache on:
EXTI -> INT time is about 120...140 ns; so about 25 core cycles, more than expected, but close.
2024-09-04 04:57 PM
A reasonable question, but before comparing INT response times you have to make sure the conditions are the same. For example:
The H743 also has DCache on: do the others not?
Where is your stack memory (in which memory, with DCache or not)?
What happens on an INT?
The MCU pushes some registers onto the current stack. If this memory is slow, or DCache is not enabled, it takes longer.
I think what you have figured out is: enabling DCache (especially on the memory used for the stack) makes it faster (and you get reasonable values). All the other, slower INT response times can be due to "no DCache" or to a slower memory (clock config).
So, at first glance: it differs because DCache is on or off (at least also disable DCache on the H743, so that all tests effectively run "write through"). And check that the stack memory is in an SRAM that does not need extra paths through the bus fabric.
Also: make sure that the code execution (ICache) is similar; different chips can differ in Flash ROM speed. The first code fetch can be slower than any following repetition of the INT handler. Make sure to use the same Flash ROM latency setting, and measure only from the second time on (not the first time, which can take longer because the instructions still have to be loaded into ICache).
Or write the code so that the INT handler and the stack are located in ITCM/DTCM. If one setup runs the INT handler code from Flash memory while the other runs the same code from ITCM, the speed differs a lot.
2024-09-04 06:16 PM
Seeing the code would be helpful. It's hard to believe there's a 49.5 us difference between two chips that share the same core. Are you using HAL callbacks or are you putting code directly into the IRQ handler in the vector table?
> Is the "expected" timing from NVIC somewhere given ? I didnt see in ds, rm or PM0253, Cortex®-M7 processor programming manual -- so anybody knows this ? Or is it some kind of secret info... ??
It's quite hidden. 12-14 cycles on the M7.
Source: "Cortex-M for beginners" (arm.com)
Be aware there are other things that can increase this time, such as flash wait states, cache misses, probably others.
There is (I believe) additionally some delay between the edge happening and the NVIC bit getting set. Few cycles max I would imagine.
Also, there is a delay between when you write to GPIO->BSRR and the pin actually going high due to the time it takes to send that over the system bus. Less than a few us.
None of that explains the 50 us you're seeing.