About performance on STM32F103C8T6
‎2020-06-22 5:37 AM
Would part of the program code execute faster if its section were linked so that it is copied from flash to RAM on startup? This code is executed many times.
Labels: STM32F1 Series
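For readers unfamiliar with the mechanism the question describes: with a GCC toolchain, one common approach is to tag the hot function with a section attribute and let the startup code copy that section from flash to RAM together with the initialized data. The sketch below is illustrative only and is not taken from the thread. `RAMFUNC` and `hot_sum` are hypothetical names, and `.RamFunc` is assumed to be an input section that your linker script maps to RAM with its load image in flash (some ST-generated GCC linker scripts collect it inside `.data`; check your own script).

```c
#include <stdint.h>

/* Hypothetical helper macro. On an ARM GCC build it places the function in
 * the ".RamFunc" input section (ASSUMED to be mapped to RAM by the linker
 * script and copied there by the startup code). On any other target it
 * expands to nothing, so the same file still builds on a host PC. */
#if defined(__GNUC__) && defined(__arm__)
#define RAMFUNC __attribute__((section(".RamFunc"), noinline))
#else
#define RAMFUNC
#endif

/* Hypothetical hot routine: sums the integers 1 .. n-1. */
RAMFUNC uint32_t hot_sum(uint32_t n)
{
    uint32_t sum = 0;
    for (uint32_t i = 1; i < n; i++) {
        sum += i;
    }
    return sum;
}
```

`noinline` keeps the compiler from folding the body back into a caller that still lives in flash. Whether the relocated code really runs from RAM can be confirmed by checking the function's address in the map file.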
‎2020-06-22 6:19 AM
Mostly linear code will run fast from flash. Jumps, however, will stall for the number of wait-state cycles. Putting the program in RAM will keep the RAM busy, possibly delaying data loads from RAM, or vice versa.
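To put numbers on the wait-state penalty mentioned above: the STM32F10x reference manual specifies 0 flash wait states for SYSCLK up to 24 MHz, 1 up to 48 MHz, and 2 up to 72 MHz. A small sketch of that lookup follows; the function name `flash_wait_states` is hypothetical.

```c
#include <stdint.h>

/* Flash wait states required for a given SYSCLK on STM32F10x parts.
 * Returns -1 for frequencies above the 72 MHz maximum. */
int flash_wait_states(uint32_t sysclk_hz)
{
    if (sysclk_hz <= 24000000u) return 0;
    if (sysclk_hz <= 48000000u) return 1;
    if (sysclk_hz <= 72000000u) return 2;
    return -1;
}
```

So at 72 MHz, every flash read that the prefetch buffer cannot hide costs two extra clock cycles, which is where taken branches tend to hurt.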
‎2020-06-22 2:30 PM
Is this a genuine STM32F103?
The counterfeits may behave differently, especially those which emulate FLASH by RAM loaded from an on-chip serial FLASH.
As Uwe said, running code from SRAM, while nominally faster because it imposes no wait states, will conflict with other uses of the SRAM. The 'F1 is a relatively simple design compared to, e.g., the 'F2: there is only one SRAM, and it can't be remapped to the I/D ports of the processor. So there are few degrees of freedom for improving execution from SRAM, and it may well turn out that running from SRAM is no faster, or even slower, than running from FLASH.
You'd have to benchmark it yourself, perhaps on the most computationally intensive part of your own code.
JW
‎2020-06-22 2:38 PM
The F1 is a sloth; the slowness of the FLASH is always exposed to the processor, as there is no attempt to cache or to hold lines for the prefetch queue.
Putting critical code in RAM, including the vector table, would certainly be worth trying and benchmarking.
With questions like this, JUST TRY IT; it is not hard to do or to quantify.
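One way to quantify it with cycle resolution is the Cortex-M3 DWT cycle counter. The following is a minimal sketch, not something posted in the thread: `ticks_init`, `ticks_now`, `time_it`, `workload` and `work_counter` are hypothetical names, the register addresses are the architecturally defined ARMv7-M ones, and the non-ARM branch is only a host fallback based on `clock()` so the harness can be tried on a PC.

```c
#include <stdint.h>
#include <time.h>

#if defined(__ARM_ARCH_7M__)
/* Cortex-M3 debug registers (ARMv7-M architecturally defined addresses). */
#define DEMCR      (*(volatile uint32_t *)0xE000EDFCu)
#define DWT_CTRL   (*(volatile uint32_t *)0xE0001000u)
#define DWT_CYCCNT (*(volatile uint32_t *)0xE0001004u)

static void ticks_init(void)
{
    DEMCR |= (1u << 24);   /* TRCENA: enable the DWT unit */
    DWT_CYCCNT = 0;
    DWT_CTRL |= 1u;        /* CYCCNTENA: start the cycle counter */
}
static uint32_t ticks_now(void) { return DWT_CYCCNT; }
#else
/* Host fallback: ticks are clock() units, not CPU cycles. */
static void ticks_init(void) {}
static uint32_t ticks_now(void) { return (uint32_t)clock(); }
#endif

/* Hypothetical workload; replace with the code under test. */
volatile uint32_t work_counter;
void workload(void) { work_counter = work_counter + 1u; }

/* Run fn() `repeats` times and return the elapsed ticks. */
uint32_t time_it(void (*fn)(void), uint32_t repeats)
{
    ticks_init();
    uint32_t start = ticks_now();
    for (uint32_t i = 0; i < repeats; i++) {
        fn();
    }
    return ticks_now() - start;  /* unsigned subtraction tolerates wrap-around */
}
```

Calling `time_it(workload, 1000)` once with the workload linked into flash and once with it linked into RAM gives two directly comparable numbers.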
‎2020-06-22 11:24 PM
Thanks!
I deliberately tested with non-optimized code, and it ran about 11% faster from RAM than from flash.
Only the code was relocated to RAM, not the vector table.
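uint32_t t1, t2, su, in;
t1 = Millis();               /* start time in ms */
su = 0;
in = 1;
la1:                         /* deliberately naive loop, 999,999 passes */
su += in;
in++;
if(in < 1000000) goto la1;
t2 = Millis();               /* end time in ms */
/* Prints end, start, elapsed. "R:" marks the RAM run; the flash run is labeled "F:" in the results. */
xprintf("R: %lu %lu %lu\n", (unsigned long)t2, (unsigned long)t1, (unsigned long)(t2-t1));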
-------------------------------------------------
F: 153125 152754 371
R: 153453 153125 328
------------------------------------------------
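Reading the two result lines as end time, start time and elapsed milliseconds, the flash run took 371 ms and the RAM run 328 ms. The arithmetic behind the "11% faster" figure can be sketched as follows; the function names are hypothetical and not part of the original test.

```c
#include <assert.h>
#include <stdint.h>

/* Percentage by which the RAM run is shorter than the flash run. */
double time_saved_percent(uint32_t flash_ms, uint32_t ram_ms)
{
    assert(flash_ms > 0);
    return 100.0 * ((double)flash_ms - (double)ram_ms) / (double)flash_ms;
}

/* Average time per loop pass in nanoseconds. */
double ns_per_iteration(uint32_t elapsed_ms, uint32_t iterations)
{
    assert(iterations > 0);
    return (double)elapsed_ms * 1e6 / (double)iterations;
}
```

With the posted numbers, `time_saved_percent(371, 328)` comes to roughly 11.6%, and the loop's 999,999 passes average about 371 ns each from flash versus about 328 ns from RAM. Since `Millis()` has 1 ms resolution, the last digit of each figure should not be over-interpreted.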
/usr/local/bin/stm32flash -b 115200 /dev/ttyUSB0
stm32flash 0.5
http://stm32flash.sourceforge.net/
Interface serial_posix: 115200 8E1
Version : 0x22
Option 1 : 0x00
Option 2 : 0x00
Device ID : 0x0410 (STM32F10xxx Medium-density)
- RAM : 20KiB (512b reserved by bootloader)
- Flash : 128KiB (size first sector: 4x1024)
- Option RAM : 16b
- System RAM : 2KiB
‎2020-06-23 10:58 PM
The test shows that this code executes faster from RAM on this device.
