
STM32F4 Flash data cache prefetch possible?

johannes23
Associate II
Posted on September 20, 2016 at 15:11

Is it possible to prefetch flash data to the ART data cache without stalling execution?

As far as I can see the only possibility is using DMA, probably using the DMA2D engine.  So the idea is not to copy data from Flash to SRAM, but to trigger the data prefetch, so it can then be accessed without latency at the natural address.

I think it would only require two clock cycles in the inner loop to prefetch, say, four 128-bit rows, while transferring only 4 words by using the DMA2D line offset.
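To make the line-offset idea concrete, here is a minimal host-side sketch of how the DMA2D source addresses would be generated. It assumes a memory-to-memory transfer in ARGB8888 (4 bytes per pixel) with one pixel per line, four lines, and a line offset of three pixels, so each line starts 16 bytes after the previous one. The struct and function names are hypothetical; the comments map the fields to the DMA2D registers FGMAR, NLR and FGOR. It only models address arithmetic and says nothing about whether DMA reads actually fill the ART data cache.

```c
#include <stdint.h>

#define FLASH_ROW_BYTES 16u /* one 128-bit flash row */

/* Hypothetical model of the DMA2D foreground (source) address generator. */
typedef struct {
    uint32_t base;            /* source start address (FGMAR) */
    uint32_t pixels_per_line; /* NLR.PL */
    uint32_t lines;           /* NLR.NL */
    uint32_t line_offset;     /* FGOR, in pixels, skipped after each line */
    uint32_t bytes_per_pixel; /* 4 for ARGB8888 */
} dma2d_src_model_t;

/* Start address of line n: each line advances by (PL + LO) pixels. */
static uint32_t dma2d_line_addr(const dma2d_src_model_t *m, uint32_t n)
{
    uint32_t stride = (m->pixels_per_line + m->line_offset) * m->bytes_per_pixel;
    return m->base + n * stride;
}

/* Count the distinct 128-bit rows touched, assuming pixel-aligned reads
 * at increasing addresses. */
static uint32_t dma2d_rows_touched(const dma2d_src_model_t *m)
{
    uint32_t count = 0;
    uint32_t last_row = UINT32_MAX;
    for (uint32_t n = 0; n < m->lines; n++) {
        uint32_t addr = dma2d_line_addr(m, n);
        for (uint32_t p = 0; p < m->pixels_per_line; p++) {
            uint32_t row = (addr + p * m->bytes_per_pixel) / FLASH_ROW_BYTES;
            if (row != last_row) {
                count++;
                last_row = row;
            }
        }
    }
    return count;
}
```

With `base = 0x08010000`, `pixels_per_line = 1`, `lines = 4`, `line_offset = 3` and `bytes_per_pixel = 4`, the model reads one word from each of four consecutive 128-bit rows, which is the access pattern described above.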

Would it be possible to discard the output of the DMA2D engine, perhaps by using unmapped address space, to further reduce bus occupation?

Or is there an easier way to trigger prefetching of flash data?

#read #flash #stm32f
3 REPLIES
Walid FTITI_O
Senior II
Posted on September 21, 2016 at 12:54

Hi taelman.johannes,

Chrom-ART Accelerator™ is the commercial appellation for STMicroelectronics’ implementation of the DMA2D peripheral on the STM32F429. Its features include:

· Two input pixel format converters (PFC). These blocks are able to read and to decode bitmap files.

· The blender, which computes and mixes the data from the two input PFC.

· The output PFC, which decodes the information to be sent to the destination.

· The FIFOs used for both the inputs and the outputs connected to a specific DMA controller

I recommend having a look at the STemWin applications in the STM32CubeF4 firmware package (http://www.st.com/content/st_com/en/products/embedded-software/mcus-embedded-software/stm32-embedded-software/stm32cube-embedded-software/stm32cubef4.html):

STM32Cube_FW_F4_V1.13.0\Projects\STM32469I-Discovery\Applications\STemWin

-Hannibal-
johannes23
Associate II
Posted on September 21, 2016 at 15:37

Hi Hannibal,

The ART I was referring to is the "Adaptive real-time memory accelerator (ART Accelerator™)", part of the flash memory read interface.

I'd like to prefetch data from Flash into the 8 rows of 128 bits of data cache of the ART, without stalling execution.

Incidentally, the DMA2D engine is indeed called the Chrom-ART Accelerator, and I think it could perhaps be used to initiate such a prefetch. This is certainly not a typical use of the DMA2D engine, and library APIs are unlikely to help, as it is critical to initiate the DMA transfer of only a few words in as few cycles as possible in an inner loop. Hopefully this only requires two instruction cycles, one to set the address and another to initiate the DMA, ignoring interrupt and ready status. The DMA2D PFCs and blender serve no specific purpose in this scenario, and preferably the output would be discarded completely.

Posted on September 21, 2016 at 16:59

The ART is bolted onto the prefetch path of the processor. I'm pretty sure it is not designed to provide flash line width data to the Bus/DMA, and contaminating the cache with high temporal data is not a design goal. I understand what you are asking, but it either works the way you want or it doesn't, and asking the question suggests you know it doesn't.

You should be able to readily demonstrate/test that the access to the Flash array is not folded, so each access is going to take the full wait-state hit. Put the patterns you want in internal SRAM.
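A minimal sketch of the SRAM suggestion, assuming a typical STM32 linker script where `const` data is placed in flash and non-const static data in internal SRAM. The names `pattern_flash`, `pattern_sram`, `pattern_init` and `pattern_sum` and the table contents are hypothetical. The table is copied once outside the inner loop, and the loop then reads only the SRAM copy, so it avoids flash wait states.

```c
#include <stdint.h>
#include <string.h>

#define PATTERN_WORDS 16u /* four 128-bit rows of data */

/* const: normally placed in flash by the linker on STM32 targets */
static const uint32_t pattern_flash[PATTERN_WORDS] = {
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
};

/* non-const static: placed in internal SRAM (.bss) */
static uint32_t pattern_sram[PATTERN_WORDS];

/* Copy the table once, before entering the time-critical loop. */
static void pattern_init(void)
{
    memcpy(pattern_sram, pattern_flash, sizeof pattern_sram);
}

/* Example inner loop: touches only the SRAM copy. */
static uint32_t pattern_sum(void)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < PATTERN_WORDS; i++) {
        sum += pattern_sram[i];
    }
    return sum;
}
```

Call `pattern_init()` once at start-up; after that the inner loop never reads the flash array directly.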
