
STM32 I-CODE and D-CODE buses

ds1
Associate II
Posted on June 26, 2015 at 14:40

The STM32 documentation says that the I-CODE and D-CODE buses are connected to the internal flash memory. The I-CODE bus is used to fetch instructions, and the D-CODE bus is used for data accesses in the code memory region (literal loads).

The question is: why are two separate buses used? Can they provide simultaneous and absolutely independent access to flash memory?
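To make the literal-load part concrete, here is a rough illustration of my own (toolchain-dependent, not from the documentation): the load instruction itself is fetched over I-CODE, while the constant word the compiler places next to the code is read over D-CODE.

/* Hedged illustration: a constant that typically ends up in the literal pool.
 * Depending on the toolchain and options, the compiler emits something like
 * "ldr r0, [pc, #imm]" here: the instruction is fetched over I-CODE, and the
 * 32-bit constant stored alongside the code is read over D-CODE. */
#include <stdint.h>

uint32_t magic(void)
{
    return 0xDEADBEEFu;   /* large constant -> often a PC-relative literal load */
}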

#stm32 #arm #cortex #bus-matrix
7 REPLIES
Posted on June 26, 2015 at 17:28

Can they provide simultaneous and absolutely independent access to flash memory?

This is highly doubtful. The ART unit does, however, provide an accelerated prefetch that is faster than an SRAM access on cache-line hits.

The I-CODE path is likely optimized for fixed size and alignment. ARM's design and integration manuals might be enlightening in this regard.

The FLASH array is large, wide, slow and awkward; I wouldn't expect more than one line to be presented at a time.
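If you just want the knobs, on the F4 parts the ART behaviour is controlled through FLASH->ACR. A minimal sketch, assuming an STM32F4 with the standard CMSIS device header; the latency value is only an example for 168 MHz at 3.3V, check the reference manual for what your clock and supply actually require:

/* Hedged sketch: enable the ART prefetch and the instruction/data caches on an
 * STM32F4 (register/bit names from the standard CMSIS device header).
 * FLASH_ACR_LATENCY_5WS is an example for 168 MHz at 3.3V - set the wait
 * states your clock and voltage range need before raising the clock. */
#include "stm32f4xx.h"

static void flash_art_enable(void)
{
    FLASH->ACR = FLASH_ACR_LATENCY_5WS   /* wait states for the flash array          */
               | FLASH_ACR_PRFTEN        /* prefetch into the ART line buffers       */
               | FLASH_ACR_ICEN          /* instruction cache on the I-Code path     */
               | FLASH_ACR_DCEN;         /* data cache on the D-Code path (literals) */
}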

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..
ds1
Associate II
Posted on June 28, 2015 at 18:12

clive1, thank you for your comment. I managed to find some hints on the subject here

http://community.arm.com/message/11530

(in a message posted by Simon Craske). But the info is still incomplete:

Cortex-M3 assumes that a read to the same address on either the D-Code or I-Code ports will access the same data; how this is achieved is up to the implementor, but can make a significant difference to the performance of the device. Options (not necessarily all practical) might include:

1. Simple arbiter (matrix) into a single slave (Flash/ROM/RAM).

2. Flash with prefetching to I-Code and a bypass for D-Code.

3. Dual-port RAM with read-only to I-Code, and read/write to D-Code.

4. Some kind of cache system - possibly supporting hit-under-miss between I/D-Code.

As far as I understand, the behaviour of the bus matrix and the interface to the flash memory depend on the MCU vendor and are not specified by ARM. It would be great to know the details of the implementation in the STM32.

Amel NASRI
ST Employee
Posted on June 29, 2015 at 12:48

Hello crtx2000,

The following 3 programming manuals may bring you some help:

- http://www.st.com/st-web-ui/static/active/en/resource/technical/document/programming_manual/DM00046982.pdf for Cortex-M4 based devices

- http://www.st.com/st-web-ui/static/active/en/resource/technical/document/programming_manual/CD00228163.pdf for Cortex-M3 based devices

- http://www.st.com/st-web-ui/static/active/en/resource/technical/document/programming_manual/DM00051352.pdf for Cortex-M0 based devices.

-Mayla-

To give better visibility on the answered topics, please click on Accept as Solution on the reply which solved your issue or answered your question.

Posted on June 29, 2015 at 15:38

Somehow I think you'd need to get a more specific set of answers from the engineers doing the integration of the IP.

Some more data specifically on the ART, its connections to the FLASH array and CPU buses, and its anticipated interactions.

As I said, the cache does provide an accelerated prefetch path, but it's important to understand that at some point you're going to have a single access to the array, and suitably arranged code will bottleneck on that. A cache-line hit into the prefetch path is faster than a 0-wait-state SRAM access.
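If you want to see that for yourself, a rough sketch of my own (nothing official) is to time a routine with the DWT cycle counter and compare a run from flash against a copy running from SRAM; work_under_test() is just a placeholder here:

/* Hedged sketch: time a routine with the Cortex-M DWT cycle counter (standard
 * CMSIS core symbols). Compare the count for a flash-resident copy against a
 * copy linked or copied into SRAM. work_under_test() is a placeholder. */
#include "stm32f4xx.h"

extern void work_under_test(void);              /* routine to be timed (placeholder) */

static uint32_t cycles_for(void (*fn)(void))
{
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;   /* turn the DWT block on    */
    DWT->CTRL        |= DWT_CTRL_CYCCNTENA_Msk;       /* enable the cycle counter */
    DWT->CYCCNT       = 0;                            /* reset the count          */
    fn();
    return DWT->CYCCNT;                               /* elapsed core cycles      */
}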

Perhaps you can explain in more detail why the data is critical/important to your project?

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..
jpeacock
Associate II
Posted on July 01, 2015 at 19:37

In general, separate I and D buses to a memory array are needed because code and data often do not originate on the same cache line. Code is (mostly) sequential, so a cache hit is likely on subsequent instruction fetches. Data behaves differently: accesses are more random and in a different area than the code. So the data cache line is independent of the instruction cache line.

For the M3 the cache is minimal, and on the M4 (for the STM series) it is only marginally larger. But the M4 cores can support much larger caches, and on some vendors' parts using the ARM big.LITTLE technology you will see an L1 cache on an M4 as large as 32K, so the dual bus is significant to the core IP design. As you move up the embedded ARM models the cache becomes a major performance feature (M7, A5, A7, etc.).

For the STM32F4 the dual bus has some other advantages when optimizing for the bus matrix.  Offloading data access to CCM or SRAM reduces contention with flash, but the D bus still lets you use DMA, running in parallel with the CPU, to transfer from flash to a peripheral or SRAM.  And if you use the SRAM2 bank for DMA you don't lose the D cache line.
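As a rough sketch of that last point (my own illustration using the Cube HAL, with placeholder names such as flash_table and sram_copy), a DMA2 memory-to-memory transfer that streams a constant table out of flash into SRAM while the CPU keeps executing could look something like this:

/* Hedged sketch (STM32F4 Cube HAL assumed): DMA2 memory-to-memory copy of a
 * constant table from flash into SRAM, running alongside the CPU.
 * flash_table and sram_copy are placeholder names for this illustration. */
#include "stm32f4xx_hal.h"

static const uint32_t flash_table[256] = { 1, 2, 3 };   /* example data kept in flash */
static uint32_t sram_copy[256];

void copy_table_with_dma(void)
{
    DMA_HandleTypeDef hdma = {0};

    __HAL_RCC_DMA2_CLK_ENABLE();                            /* only DMA2 does mem-to-mem on the F4 */

    hdma.Instance                 = DMA2_Stream0;
    hdma.Init.Channel             = DMA_CHANNEL_0;
    hdma.Init.Direction           = DMA_MEMORY_TO_MEMORY;
    hdma.Init.PeriphInc           = DMA_PINC_ENABLE;        /* "peripheral" side is the flash source */
    hdma.Init.MemInc              = DMA_MINC_ENABLE;
    hdma.Init.PeriphDataAlignment = DMA_PDATAALIGN_WORD;
    hdma.Init.MemDataAlignment    = DMA_MDATAALIGN_WORD;
    hdma.Init.Mode                = DMA_NORMAL;
    hdma.Init.Priority            = DMA_PRIORITY_LOW;
    hdma.Init.FIFOMode            = DMA_FIFOMODE_ENABLE;    /* FIFO is needed for mem-to-mem */
    hdma.Init.FIFOThreshold       = DMA_FIFO_THRESHOLD_FULL;
    HAL_DMA_Init(&hdma);

    HAL_DMA_Start(&hdma, (uint32_t)flash_table, (uint32_t)sram_copy, 256);
    /* ... CPU keeps executing from flash over the I-Code/ART path here ... */
    HAL_DMA_PollForTransfer(&hdma, HAL_DMA_FULL_TRANSFER, HAL_MAX_DELAY);
    HAL_DMA_DeInit(&hdma);
}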

  Jack Peacock
ds1
Associate II
Posted on July 02, 2015 at 15:04

My current project does not have any special requirements regarding internal flash access. I just needed some clarification on how the STM32 buses are organized. It's always good to understand the internals.

Thank you all for the helpful answers!

BeMercy
Associate II
Posted on July 02, 2015 at 15:23

Hi admin,

I need your help with a question about UART DMA, thank you: https://my.st.com/public/STe2ecommunities/mcu/Lists/cortex_mx_stm32/Flat.aspx?RootFolder=%2fpublic%2fSTe2ecommunities%2fmcu%2fLists%2fcortex_mx_stm32%2fSTM32F103%20UART%20DMA%EF%BC%88RX%20AND%20TX%EF%BC%89%20can%20%20execute%20only%20once%20%EF%BC%8C&FolderCTID=0x01200200770978C69A1141439FE559EB459D7580009C4E14902C3CDE46A77F0FFD06506F5B&TopicsView=https%3A%2F%2Fmy%2Est%2Ecom%2Fpublic%2FSTe2ecommunities%2Fmcu%2FLists%2Fcortex_mx_stm32%2FAllItems%2Easpx&currentviews=14