
STM32F7xx: AXI interface delays interrupt service routine

andywild2
Associate III

Hello to all,

We have to measure movements via a quadrature encoder.
This is done via an external interrupt whose handler writes a timer capture value to external memory.
The old product used an STM32F2xx at 120 MHz, and the ISR took about 1 us.

However, with the STM32F7 it sometimes takes up to 3 us!

After heavy debugging I found this in the forum:
https://community.st.com/t5/stm32cubeide-mcus/delayed-interrupts-during-fmc-access-on-stm32h7/td-p/113181

It explains that FMC accesses which are still held in the AXI 4-stage storage buffer may get drained during the interrupt service routine, thus causing a huge delay!

I cannot believe that such a sophisticated design as the STM32F7xx has such bad interrupt behaviour!

How can I prevent the ISR from being delayed by the AXI bridge?

Thanks a lot for your help

Andy

8 REPLIES 8
SofLit
ST Employee

Hello,

Are you executing the interrupt handlers from FMC memory?

To give better visibility on the answered topics, please click on "Accept as Solution" on the reply which solved your issue or answered your question.

Hello,

No, the interrupt handler runs from the ITCM RAM of the CPU.
One other fact: even if I do not write the result to the external memory within the ISR, the fault persists.

 

What about the FMC timings compared to the STM32F2? Do you have the same memory timings?

Are you using the cache?

Is the FMC memory region you use cacheable by default? For example, these regions are not cacheable by default:

[Image: SofLit_0-1727107164891.png]

PS: On STM32F7 products the FMC is connected to AHB, not to AXI as is the case on the STM32H7.

For performance aspects, please refer to the application note AN4667 "STM32F7 Series system architecture and performance".

 


Hello,

Yes, I have the same memory timings.
I use the cache; however, if I disable it, I get the same faults.

I use the 0x6C000000 region which is cacheable by default.
Using the MPU with these settings:
TEX: 0
Shareable: Enable
Bufferable: Enable
Cacheable: Enable

made the fault occur less frequently.
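
For readers who want to see what these four settings amount to at register level, here is a minimal sketch of how they map onto the attribute bits of the ARMv7-M MPU_RASR register. The helper name `mpu_rasr_attr_bits` is hypothetical (it is not an ST or CMSIS function), and the region size, access permission and enable bits are deliberately left out because they are not stated above. In the STM32 HAL the same settings correspond to the TypeExtField, IsShareable, IsCacheable and IsBufferable fields of MPU_Region_InitTypeDef.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions of the memory attribute fields in the ARMv7-M MPU_RASR register. */
#define RASR_TEX_POS 19u  /* TEX[2:0] occupies bits 21:19 */
#define RASR_S_POS   18u  /* S: shareable  */
#define RASR_C_POS   17u  /* C: cacheable  */
#define RASR_B_POS   16u  /* B: bufferable */

/* Hypothetical helper, for illustration only: packs the attribute settings
 * into their MPU_RASR bit positions. SIZE, AP, XN and ENABLE are not set
 * here because the post does not give them. */
static uint32_t mpu_rasr_attr_bits(uint32_t tex, int shareable,
                                   int cacheable, int bufferable)
{
    assert(tex <= 7u);  /* TEX is a 3-bit field */
    return (tex << RASR_TEX_POS)
         | ((shareable  ? 1u : 0u) << RASR_S_POS)
         | ((cacheable  ? 1u : 0u) << RASR_C_POS)
         | ((bufferable ? 1u : 0u) << RASR_B_POS);
}
```

With the settings listed above, `mpu_rasr_attr_bits(0u, 1, 1, 1)` sets only the S, C and B bits, i.e. bits 18 to 16.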

Yes, you are right: in the STM32H7 the FMC is connected to the AXIM interface.

But in the STM32F7 the FMC is connected to the core via the bus matrix and the AXI interface.
This AXI interface also has a 4-stage storage buffer, according to the Cortex-M7 architecture manual.


@andywild2 wrote:

But in the STM32F7 the FMC is connected to the core via the bus matrix and the AXI interface.
This AXI interface also has a 4-stage storage buffer, according to the Cortex-M7 architecture manual.


There is an AXI-to-AHB bridge on the F7, so AXI transactions are transparent to the user.

You need to check whether other interrupts are delaying your read/write, or whether it is something else.

I don't think the F2 performs better than the F7.

 

Pavel A.
Evangelist III

IIRC the F7 indeed has two options for accessing internal flash as instruction memory, at different addresses: the default 0x08000000 and another one. Perhaps look at the second, non-default option?

the interrupt handler runs from the ITCM RAM of the CPU.

The vector table only, or whole handler function(s)?

 

Hello,

Thank you for asking about all these details. I will try to make it clear:
There are no other interrupts around except SysTick, which has priority 15, the lowest.
My external interrupt has priority 0, the highest.
Only the interrupt handler runs out of the ITCM RAM, not the vector table. Please note that I have no problem with the interrupt latency or with the duration of the ISR when no fault occurs. If I place the ISR in flash it takes a little longer, which is no problem: about 500 ns when running from RAM, about 750 ns when running from flash.
But when the fault occurs, the ISR takes up to 3500 ns!
I measure the duration as the difference between timer counts taken at the beginning and at the end of the ISR.
To say it again: the fault (which makes the ISR longer) does not occur when there are NO external memory accesses in the main program. Normally my main program updates the display with the results. The frame buffer is in external memory, which is why there is a lot of traffic on the FMC. To suppress the fault, I just disable the display updating:
Bingo, the ISR duration stays stable at about 500 ns.
In my theory, the "AXI to multi-AHB" bridge has a 4-stage internal buffer for accesses to the bus matrix on each of its three AHB channels. The problem occurs when an external asynchronous interrupt arrives just when this 4-stage buffer is more or less full and has to be drained. In the fault situation the buffer is unfortunately drained during the ISR, which makes the ISR considerably longer because the external memory accesses are slow.
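
The timer-count measurement mentioned above can be sketched roughly as follows. This is only an illustration under assumptions: the function names are hypothetical, the counter is assumed to be a free-running up-counter, and the 200 MHz timer clock in the usage note is an assumed value, not one stated in the post.

```c
#include <assert.h>
#include <stdint.h>

/* Ticks elapsed between two reads of a free-running 32-bit up-counter.
 * Unsigned subtraction stays correct across a single counter wrap. */
static uint32_t elapsed_ticks32(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Same idea for a 16-bit counter: the cast truncates the result to
 * 16 bits so that a wrap from 0xFFFF to 0 is handled as well. */
static uint16_t elapsed_ticks16(uint16_t start, uint16_t end)
{
    return (uint16_t)(end - start);
}

/* Convert a tick count to nanoseconds for a given timer clock in Hz.
 * 64-bit arithmetic avoids overflow of the intermediate product. */
static uint64_t ticks_to_ns(uint32_t ticks, uint32_t timer_hz)
{
    assert(timer_hz > 0u);
    return ((uint64_t)ticks * 1000000000ull) / timer_hz;
}
```

In the ISR one would read the counter (for example the timer's CNT register) once at entry and once at exit and pass both values to `elapsed_ticks32`. With an assumed 200 MHz timer clock, 100 ticks correspond to 500 ns and 700 ticks to 3500 ns.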

I think the AXI interface just acts as a bridge between the core and the bus matrix. As a simple interface, the AXI has no idea in which context the core is running. It does not distinguish between interrupt context and main program context. If the AXI decides to drain its buffer while the core is servicing an interrupt: bad luck for the interrupt service routine!

And in this regard the STM32F2xx is way more predictable than the STM32F7xx: the F2 services the interrupt with a stable duration of 1 us. It does not care about external memory accesses. It has no fancy AXI.

The F7 can service the interrupt in 500 ns if there is no fault. But you CAN NOT COUNT on this performance, because sometimes it takes up to 3-4 us!
Thus we have to reduce our spec for the highest pulse rate of the external interrupt to this slow speed, even though the chip is supposed to be much faster.
This is our big problem.
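
As rough arithmetic on the numbers above, the worst-case ISR duration puts an upper bound on the pulse rate that can be handled. The sketch below is a simplification under stated assumptions: one interrupt per pulse, interrupt entry/exit latency ignored, and no CPU time reserved for the main program. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on the sustainable interrupt rate in Hz if every interrupt
 * may take worst_case_isr_ns nanoseconds. Integer division rounds down. */
static uint32_t max_pulse_rate_hz(uint32_t worst_case_isr_ns)
{
    assert(worst_case_isr_ns > 0u);
    return 1000000000u / worst_case_isr_ns;
}
```

With 500 ns this bound is 2 MHz, with the F2's 1 us it is 1 MHz, and with the 3500 ns worst case it drops to roughly 285 kHz, which is why the occasional long ISR dominates the spec.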

 

Another fact:
I changed the FMC parameters to make the external memory extremely slow, i.e. DataSetupTime = 100.
That results in 5us with a SystemCoreClock of 200 MHz. And I turned off the data cache.

But the ISR still sometimes manages to do its work in only 650 ns (and that includes a write access to the slow external memory). In my opinion this proves that there must be a 4-stage buffer in the AXI interface. Without such a buffer this fast timing would not be possible!