Predicting the ART Accelerator Flash memory read operations on the STM32F405RGT
I have configured FLASH_ACR with all caches and the prefetch buffer disabled, and with 5 wait states. My goal is to predict when the flash read operations happen and how long they take in processor cycles.
I have provided some example code below, but I am after a general understanding, not just an answer for this specific example.
f000 fdb7 bl <entry>
f004 0101 and.w r1, r4, #1
1e4b subs r3, r1, #1
426c negs r4, r5
4249 negs r1, r1
ea01 0005 and.w r0, r1, r5
401c ands r4, r3
4044 eors r4, r0
f000 fdb4 bl <exit>
According to the docs: "Each Flash memory read operation provides 128 bits from either four instructions of 32 bits or 8 instructions of 16 bits according to the program launched."
My current understanding is that every branch results in a flash read operation. When returning from the first "bl" (line 1), the ART therefore reads 128 bits. This would cover the instructions on lines 2 through 7 inclusive (16 bytes in total). Between line 7 and line 8, another flash read operation would then need to happen.
In total, this would cost me an extra 2*5 cycles on top of the cycles my 7 instructions (AND.W, SUBS, NEGS, NEGS, AND.W, ANDS, EORS) take. This result is unexpected, as it means that I spend over half of my cycles on flash read operations.
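To make the estimate concrete, here is a minimal Python sketch of the simple model described above. It is not a description of how the hardware actually behaves. It assumes that every newly touched 128-bit flash line costs a fixed stall equal to the number of wait states, and that the first and.w sits on a 16-byte boundary. The start address and the function names are hypothetical and only there for illustration.

```python
# Sizes in bytes of the Thumb-2 instructions between the two `bl` calls,
# read off the listing above (32-bit encodings are 4 bytes, 16-bit are 2).
# Order: and.w, subs, negs, negs, and.w, ands, eors
INSTRUCTION_SIZES = [4, 2, 2, 2, 4, 2, 2]

FLASH_LINE_BYTES = 16  # one flash read delivers 128 bits
WAIT_STATES = 5        # assumed stall per line fetch in this simple model


def flash_lines_touched(start_address, sizes, line_bytes=FLASH_LINE_BYTES):
    """Count the distinct 128-bit flash lines occupied by the instruction bytes."""
    lines = set()
    address = start_address
    for size in sizes:
        # A 4-byte instruction may straddle two lines, so record both ends.
        lines.add(address // line_bytes)
        lines.add((address + size - 1) // line_bytes)
        address += size
    return len(lines)


def estimated_stall_cycles(start_address, sizes, wait_states=WAIT_STATES):
    """Stall estimate under the assumption: each touched line costs `wait_states` cycles."""
    return flash_lines_touched(start_address, sizes) * wait_states


# Hypothetical address of the first and.w, assumed to be 16-byte aligned.
START = 0x08000100

print(flash_lines_touched(START, INSTRUCTION_SIZES))     # 2
print(estimated_stall_cycles(START, INSTRUCTION_SIZES))  # 10
```

With these assumptions the sketch reproduces the 2*5 stall cycles from my estimate. The seven instructions occupy 18 bytes in total, so they span two flash lines for any 2-byte-aligned start address.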