2014-05-12 06:19 AM
Hello
I have a problem with the processing rate changing when I make even trivial changes to the source code. It looks as if the CPU were changing its clock, so the same instructions take significantly longer to execute.
For example, my code contains a primitive delay function built from NOPs. When I add a piece of additional code somewhere (from the programmer's point of view it does not matter where), the same source code sometimes runs significantly slower (20-50%), and the slowdown depends on the change. The strange thing is that I checked the clock and it is still 72 MHz. I've also examined some options in RCC, PWR, etc., but I am unable to determine what is wrong.
I am nearly sure the problem is uninitialised memory. It seems as if something in the CPU is reading memory from some address used by the firmware. Adding or removing a few lines of source code (an if (variable), a changed variable type, ...) can be enough to fix the slow rate, but sometimes it is very tricky to fix. The rate changes only when the program is recompiled with such changes. It must be memory related, since I found more issues with uninitialised structures in my case. I checked that my BSS section is zeroed too.
The PWM rate and other interrupts do not seem to be affected, even though 72 MHz is hardcoded in the init functions.
My code is based on ST StdLibrary 3.4.0, with no additional changes to the template. I'm using Linaro GCC.
Thanks for any help
#compiler-processing-speed
2014-05-12 06:48 AM
Are you aware that there are no uCs that can read flash op-codes at 72 MHz?
How would you solve that problem if contemporary flash works at a rate of ~20 MHz? Read 4 op-codes at a time, with some wait states in between? Well, that is how the STM32F103 works. So you read either 4 op-codes or 8 or 12 or ... N*4, use what is needed and discard the rest.
Now, if you need to load 4 NOPs, then either one or two fetches are needed, depending on how you instructed ld to place your NOPs.
If you want to test the performance without wait states, then execute from a memory that does not have wait states. In your case it is enough to have fixed timing, not necessarily without wait states: just align the op-codes with ld.
2014-05-12 06:58 AM
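The fetch-width argument above can be made concrete with a small host-side model. This is only a sketch: it assumes a 64-bit (8-byte) flash line, i.e. four 16-bit Thumb op-codes per read, and the function name is made up for illustration. It counts how many flash lines a piece of code touches depending on where the linker placed it.

```c
#include <stdint.h>

/* Assumed flash line width: 64 bits = four 16-bit Thumb op-codes. */
#define FLASH_LINE_BYTES 8u

/* Hypothetical helper: number of flash lines that must be read to
 * fetch `size` bytes of code starting at address `addr`. */
unsigned flash_lines_touched(uint32_t addr, uint32_t size)
{
    if (size == 0u) {
        return 0u;
    }
    uint32_t first_line = addr / FLASH_LINE_BYTES;
    uint32_t last_line  = (addr + size - 1u) / FLASH_LINE_BYTES;
    return (unsigned)(last_line - first_line + 1u);
}
```

With this model, four NOPs (8 bytes) starting on an 8-byte boundary need one read, while the same four NOPs shifted by one halfword need two. That is the "one or two fetches" case, and it changes whenever unrelated code moves the addresses.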
I seriously doubt the processor is changing speed. You need to examine the code that is looping and understand what is causing it to iterate longer. You can examine the original source, you can look at a disassembly listing.
Remember that uninitialized stack (local/auto) variables will have RANDOM content in them. Make sure you have caught all .BSS names, and review the .MAP file to confirm where things are situated.
/* This is the uninitialized data section */
.bss :
{
    . = ALIGN(4);
    /* This is used by the startup in order to initialize the .bss section */
    _sbss = .;
    *(.bss)
    *(.bss.*)
    *(COMMON)
    . = ALIGN(4);
    /* This is used by the startup in order to initialize the .bss section */
    _ebss = . ;
} >RAM
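For reference, the zero-fill that the startup code performs between _sbss and _ebss can be sketched in C as below. This is a minimal sketch, not the actual ST startup file (which is usually assembly); the function name zero_words is made up here, and it relies on both symbols being word aligned, which the ALIGN(4) statements above provide.

```c
#include <stdint.h>

/* Hypothetical helper: clear every 32-bit word in [start, end).
 * On the target the startup code would call it roughly like this,
 * using the symbols exported by the linker script:
 *
 *     extern uint32_t _sbss, _ebss;
 *     zero_words(&_sbss, &_ebss);
 */
void zero_words(uint32_t *start, uint32_t *end)
{
    while (start < end) {
        *start++ = 0u;
    }
}
```

Anything the linker places outside the _sbss.._ebss range is not touched by this loop, which is why it matters that every .bss.* and COMMON input section is listed.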
2014-05-12 09:34 AM
Thank you very much
I added the section (.bss.*) to my .ld file; unfortunately it does not help. I've also checked the .MAP file, but I can't see any other .bss section, and its address is correct, I think.
Here is the section list from my .ELF binary:
Section Headers:
[Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
[ 0]                   NULL            00000000 000000 000000 00      0   0  0
[ 1] .isr_vector       PROGBITS        08000000 008000 00010c 00   A  0   0  1
[ 2] .text             PROGBITS        08000110 008110 009f58 00  AX  0   0  8
[ 3] .data             PROGBITS        20000000 018000 00011c 00  WA  0   0  4
[ 4] .bss              NOBITS          2000011c 01811c 000444 00  WA  0   0  4
[ 5] ._usrstack        NOBITS          20000560 01811c 000100 00  WA  0   0  1
[ 6] .comment          PROGBITS        00000000 01811c 000026 01  MS  0   0  1
[ 7] .ARM.attributes   ARM_ATTRIBUTES  00000000 018142 000027 00      0   0  1
[ 8] .debug_aranges    PROGBITS        00000000 018170 0007b8 00      0   0  8
[ 9] .debug_pubnames   PROGBITS        00000000 018928 002dd3 00      0   0  1
[10] .debug_info       PROGBITS        00000000 01b6fb 017b04 00      0   0  1
[11] .debug_abbrev     PROGBITS        00000000 0331ff 005403 00      0   0  1
[12] .debug_line       PROGBITS        00000000 038602 0062e9 00      0   0  1
[13] .debug_frame      PROGBITS        00000000 03e8ec 002d6c 00      0   0  4
[14] .debug_str        PROGBITS        00000000 041658 004abd 01  MS  0   0  1
[15] .debug_loc        PROGBITS        00000000 046115 0080ab 00      0   0  1
[16] .debug_ranges     PROGBITS        00000000 04e1c0 000578 00      0   0  8
[17] .debug_pubtypes   PROGBITS        00000000 04e738 001ae9 00      0   0  1
[18] .shstrtab         STRTAB          00000000 050221 0000e2 00      0   0  1
[19] .symtab           SYMTAB          00000000 05064c 006370 10     20 901  4
[20] .strtab           STRTAB          00000000 0569bc 002f20 00      0   0  1
I know my problem is very curious, but it seems as if the whole code runs slower just because I modified a line which does nearly nothing. Even code which is not related to the main loop at all can affect its speed.
I've measured the running time (with SysTick) with and without the issue and I can see a significant difference. I can clearly see it on this delay function:
while (i --) for (j = 0; j < 200; j ++);
My firmware does not use any additional threads, only the main thread + interrupts. So if I can see a difference in this function, it must be related to the processing rate of instructions, or is the CPU overloaded in some strange way?
2014-05-12 03:19 PM
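When comparing SysTick readings taken before and after the delay loop mentioned above, keep in mind that SysTick is a down-counter that reloads after reaching zero. A minimal sketch of the elapsed-tick arithmetic follows; the function name is made up, and it assumes the counter wrapped at most once between the two readings.

```c
#include <stdint.h>

/* Hypothetical helper: ticks elapsed between two readings of a
 * down-counter that counts reload, reload-1, ..., 1, 0, reload, ...
 * `start` is the earlier reading, `end` the later one.
 * Assumption: at most one wrap happened in between. */
uint32_t systick_elapsed(uint32_t start, uint32_t end, uint32_t reload)
{
    if (start >= end) {
        return start - end;               /* no wrap */
    }
    return start + (reload + 1u) - end;   /* wrapped once through zero */
}
```

On the target the two readings would come from the SysTick current value register, with the reload value taken from the reload register. Comparing the elapsed ticks of a fast build and a slow build gives a number for the slowdown.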
Could it be wrong FLASH alignment?
I've checked FLASH->ACR and its value is still 32. But when I change the Flash Latency to 1, it works as fast as it should, and the ACR value is now 31. I guess Latency 1 is not OK for 72 MHz, but it works for me. So something is strange with the Flash Latency in my firmware.
It really seems as if instructions could be misaligned. I can put whatever line into the source to fix this problem and then it works correctly with Latency 2, as it should. Then when I add the next line, the problem may appear again. Sometimes the speed is only a little bit slower, sometimes it is even slower. Even swapping lines can do the trick. Very frustrating.
2014-05-12 04:12 PM
The current version of the library is 3.5.0
You want to make sure that SystemInit() in system_stm32f10x.c is being called; typically this is done prior to the C runtime startup code, and before the jump to main().
You should also consider outputting your clocks via PA8 (MCO) and using a scope to confirm the internal speeds.
Yes, FLASH is very slow (perhaps 35-40 ns), and will be sensitive to the flash line width. In the subsequent STM32 (F2/F3/F4) designs the performance of the flash is masked somewhat by a caching mechanism outside the core.
2014-05-12 11:43 PM
Post two examples (both source and disassembly, or mixed-view disassembly) with different behaviour, and explain exactly how they differ.
JW
2014-05-13 03:14 AM
Try to align the loop code to 8 bytes so it corresponds to the 64-bit prefetch buffer.
When correctly aligned at the start of the loop all 8 bytes are fetched and code execution continues smoothly.
When not aligned, possibly only one prefetched halfword is executable and the CPU needs to wait for another flash read.
2014-05-13 04:58 AM
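One way to try the 8-byte alignment suggested above for a single function, without touching the linker script, is the GCC aligned attribute. This is a sketch under assumptions: the function names are made up, the attribute is a GCC/Clang extension, and it aligns the function entry point, not necessarily the first instruction of the loop inside it.

```c
#include <stdint.h>

/* GCC/Clang extension: request an 8-byte aligned entry point, matching
 * the 64-bit prefetch buffer width mentioned above. noinline keeps the
 * loop in its own aligned function instead of being merged into callers. */
__attribute__((aligned(8), noinline))
void delay_loop(volatile uint32_t count)
{
    while (count--) {
        __asm__ volatile ("nop");
    }
}

/* Hypothetical check: nonzero if delay_loop starts on an 8-byte boundary.
 * Bit 0 is masked off because Thumb function pointers use it as a flag. */
int delay_loop_is_aligned(void)
{
    uintptr_t addr = (uintptr_t)&delay_loop & ~(uintptr_t)1;
    return (addr % 8u) == 0u;
}
```

GCC also has -falign-loops for loop heads, but like -falign-functions it may be ignored at -Os, so the MAP file or a disassembly listing is the place to confirm what was actually produced.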
Big thanks to all
I can clearly see how the addresses of functions change in the MAP file with small changes, and I've checked that not all of them are aligned even to 4 bytes. BUT
I've tried -falign-functions=8 to align all functions (which, by the way, works only with -O3, not with -Os), but it does not solve my issue.
My compiler is arm-none-eabi-gcc version 4.5.2 (Linaro GCC 4.5-2011.02-0)
Small alignment changes in the LD script helped too, but only for the current source code. When I changed some line, the problem was back again. So it is probably not an alignment issue, since aligning all functions does not help.
Do you think Latency 1 is reliable for 72 MHz? Because this temporary solution just works.
2014-05-13 05:31 AM
My compiler is arm-none-eabi-gcc version 4.5.2 (Linaro GCC 4.5-2011.02-0)
That's pretty antiquated, can't you find a newer 4.6.x or 4.7.x version?
Do you think Latency 1 is reliable for 72 MHz?
I'm surprised it's even functional.
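To put the latency question in numbers, here is a small host-side sketch that decodes a FLASH_ACR value and compares its latency field with the wait states ST's reference manual lists for the STM32F10x (0 up to 24 MHz, 1 up to 48 MHz, 2 up to 72 MHz). The macro, type and function names are made up for illustration, and it assumes the 32 and 31 reported earlier in the thread are hexadecimal.

```c
#include <stdbool.h>
#include <stdint.h>

/* FLASH_ACR bit positions on STM32F10x (illustrative names). */
#define ACR_LATENCY_MASK 0x07u       /* bits 2:0, wait states          */
#define ACR_HLFCYA       (1u << 3)   /* half-cycle access enable       */
#define ACR_PRFTBE       (1u << 4)   /* prefetch buffer enable         */
#define ACR_PRFTBS       (1u << 5)   /* prefetch buffer status (read)  */

typedef struct {
    unsigned latency;
    bool half_cycle;
    bool prefetch_enabled;
    bool prefetch_active;
} flash_acr_fields;

flash_acr_fields decode_flash_acr(uint32_t acr)
{
    flash_acr_fields f;
    f.latency          = acr & ACR_LATENCY_MASK;
    f.half_cycle       = (acr & ACR_HLFCYA) != 0u;
    f.prefetch_enabled = (acr & ACR_PRFTBE) != 0u;
    f.prefetch_active  = (acr & ACR_PRFTBS) != 0u;
    return f;
}

/* Wait states required for a given SYSCLK, valid up to 72 MHz. */
unsigned required_wait_states(uint32_t sysclk_hz)
{
    if (sysclk_hz <= 24000000u) return 0u;
    if (sysclk_hz <= 48000000u) return 1u;
    return 2u;
}

/* True if the programmed latency is at least the required one. */
bool latency_is_in_spec(uint32_t acr, uint32_t sysclk_hz)
{
    return decode_flash_acr(acr).latency >= required_wait_states(sysclk_hz);
}
```

Under that hexadecimal assumption, 0x32 decodes to latency 2 with prefetch enabled, which is in spec at 72 MHz, while 0x31 decodes to latency 1, which is not. That it happens to run on one board does not make it reliable across temperature and voltage.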