STM32F103 - different processing rate

zexx86
Associate II
Posted on May 12, 2014 at 15:19

Hello

I have a problem: the execution speed of my code changes when I make even trivial changes to the source code.

It looks as if the CPU were changing its clock, so the same instructions take significantly longer to execute.

For example, my code contains a primitive delay function built from NOPs. When I add a piece of code somewhere (from the programmer's point of view it does not matter where), the problem sometimes appears: the same source code runs significantly slower (20-50%), and the amount of slowdown depends on the change.

The strange thing is that I checked the clock and it is still 72 MHz. I've also examined some options in RCC, PWR, etc., but I am unable to determine what is wrong.

I am nearly sure the problem is in uninitialised memory.

It seems as if something in the CPU is reading memory at an address used by the firmware.

Adding or removing a few lines in the source code (an if (variable), a changed variable type, ...) is enough to fix the slow rate, but sometimes it is very tricky to fix.

The rate only changes when the program is recompiled with such changes.

It must be memory related, since I found more issues with uninitialised structures in my code.

I also checked that my BSS section is zeroed.

The PWM rate and other interrupts do not seem to be affected, even though 72 MHz is hardcoded in the init functions.

My code is based on ST StdLibrary 3.4.0, with no additional changes to the template.

Using Linaro GCC.

Thanks for any help

#compiler-processing-speed
15 REPLIES
os_kopernika
Associate II
Posted on May 12, 2014 at 15:48

Are you aware that there are no uCs that can read flash op-codes at 72MHz?

How would you solve that problem if contemporary flash works at ~20MHz rate?

Read 4 op-codes at a time? With some wait states in between?

Well, that is how STM32F103 works.

So you read either 4 op-codes or 8 or 12 or ... N*4, use what is needed and discard the rest.

Now, if you need to load 4 nops, then either one or two fetches are needed, depending on where you instructed ld to place your nops.

If you want to test the performance without wait states then execute from a memory that does not have wait states.

In your case it is enough to have fixed timing, not necessarily without wait states - just align op-codes with ld.
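
The one-or-two-fetches point can be sketched in C as below. This is only an illustrative model, assuming that one flash read delivers an aligned 64-bit line (four 16-bit op-codes); `flash_lines_touched` and `FLASH_LINE_BYTES` are names made up for this example and are not part of any ST library.

```c
#include <stdint.h>

/* Assumed width of one flash read: 64 bits = 8 bytes,
 * i.e. four 16-bit Thumb op-codes. */
#define FLASH_LINE_BYTES 8u

/* Number of aligned flash lines that must be read to cover `size` bytes
 * of code starting at address `start`. Hypothetical helper. */
static uint32_t flash_lines_touched(uint32_t start, uint32_t size)
{
    if (size == 0u) {
        return 0u;
    }
    uint32_t first = start / FLASH_LINE_BYTES;
    uint32_t last  = (start + size - 1u) / FLASH_LINE_BYTES;
    return last - first + 1u;
}
```

Under this model, four 16-bit nops placed at 0x08000110 fit in a single line, while the same four nops at 0x08000112 straddle two lines and need a second read. That is the kind of difference a one-line change elsewhere in the source can introduce by shifting addresses.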

Posted on May 12, 2014 at 15:58

I seriously doubt the processor is changing speed. You need to examine the code that is looping and understand what is causing it to take longer. You can examine the original source, or you can look at a disassembly listing.

Remember that uninitialized stack (local/auto) variables will have RANDOM content in them. Make sure you have caught all .BSS names, and review the .MAP file to confirm where things are situated.

/* This is the uninitialized data section */
.bss :
{
    . = ALIGN(4);
    /* This is used by the startup code to initialize the .bss section */
    _sbss = .;
    *(.bss)
    *(.bss.*)
    *(COMMON)
    . = ALIGN(4);
    /* This is used by the startup code to initialize the .bss section */
    _ebss = . ;
} >RAM
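
As a rough sketch of what the startup code does with those two symbols, and of why locals still need their own initializer, something like the following could be used. `zero_words`, `timer_cfg_t` and `make_default_cfg` are hypothetical names for illustration; the real startup file in the library template may do this in assembly.

```c
#include <stdint.h>

/* Zero every word in [start, end), the way startup code typically clears
 * .bss between the linker symbols _sbss and _ebss. */
static void zero_words(uint32_t *start, uint32_t *end)
{
    for (uint32_t *p = start; p < end; ++p) {
        *p = 0u;
    }
}

/* On the target the call would look like this (symbols from the script):
 *
 *     extern uint32_t _sbss, _ebss;
 *     zero_words(&_sbss, &_ebss);
 */

/* Hypothetical configuration structure, for illustration only. */
typedef struct {
    uint32_t period;
    uint32_t flags;
} timer_cfg_t;

static timer_cfg_t make_default_cfg(void)
{
    /* Locals live on the stack and are NOT covered by .bss zeroing,
     * so initialize them explicitly. */
    timer_cfg_t cfg = {0};
    cfg.period = 200u;
    return cfg;
}
```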

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..
zexx86
Associate II
Posted on May 12, 2014 at 18:34

Thank you very much

I added the *(.bss.*) entry to my .ld file; unfortunately it does not help.

I've also checked the .MAP file, but I can't see any other .bss section, and its address looks correct to me.

Here is the section list from my .ELF binary:

 

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .isr_vector       PROGBITS        08000000 008000 00010c 00   A  0   0  1
  [ 2] .text             PROGBITS        08000110 008110 009f58 00  AX  0   0  8
  [ 3] .data             PROGBITS        20000000 018000 00011c 00  WA  0   0  4
  [ 4] .bss              NOBITS          2000011c 01811c 000444 00  WA  0   0  4
  [ 5] ._usrstack        NOBITS          20000560 01811c 000100 00  WA  0   0  1
  [ 6] .comment          PROGBITS        00000000 01811c 000026 01  MS  0   0  1
  [ 7] .ARM.attributes   ARM_ATTRIBUTES  00000000 018142 000027 00      0   0  1
  [ 8] .debug_aranges    PROGBITS        00000000 018170 0007b8 00      0   0  8
  [ 9] .debug_pubnames   PROGBITS        00000000 018928 002dd3 00      0   0  1
  [10] .debug_info       PROGBITS        00000000 01b6fb 017b04 00      0   0  1
  [11] .debug_abbrev     PROGBITS        00000000 0331ff 005403 00      0   0  1
  [12] .debug_line       PROGBITS        00000000 038602 0062e9 00      0   0  1
  [13] .debug_frame      PROGBITS        00000000 03e8ec 002d6c 00      0   0  4
  [14] .debug_str        PROGBITS        00000000 041658 004abd 01  MS  0   0  1
  [15] .debug_loc        PROGBITS        00000000 046115 0080ab 00      0   0  1
  [16] .debug_ranges     PROGBITS        00000000 04e1c0 000578 00      0   0  8
  [17] .debug_pubtypes   PROGBITS        00000000 04e738 001ae9 00      0   0  1
  [18] .shstrtab         STRTAB          00000000 050221 0000e2 00      0   0  1
  [19] .symtab           SYMTAB          00000000 05064c 006370 10     20 901  4
  [20] .strtab           STRTAB          00000000 0569bc 002f20 00      0   0  1

I know my problem is very curious, but it seems as if the whole code runs slower just because I modified a line that does nearly nothing.

Even code that is not related to the main loop at all can affect its speed.

I've measured the running time (with SysTick) with and without the issue, and I can see a significant difference.

I can clearly see it in this delay function:

while (i--)
    for (j = 0; j < 200; j++);

My firmware does not use any additional threads, only the main thread plus interrupts.

So if I can see a difference in this function, it must be related to the rate at which instructions are processed, or is the CPU overloaded in some strange way?
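
A minimal sketch of how such a SysTick measurement could be computed is shown below. It assumes the two values are raw reads of the SysTick current-value register (a 24-bit down-counter), that the reload value is 0xFFFFFF, and that the counter wrapped at most once between the reads. `systick_elapsed` and `delay` are illustrative names; the counters are made volatile here so the compiler cannot remove or reshape the empty loop, which the original snippet does not show.

```c
#include <stdint.h>

#define SYSTICK_MASK 0x00FFFFFFu   /* SysTick is a 24-bit down-counter */

/* Ticks elapsed between two reads of the SysTick current value.
 * Valid if the reload value is 0xFFFFFF and at most one wrap occurred. */
static uint32_t systick_elapsed(uint32_t start_val, uint32_t end_val)
{
    return (start_val - end_val) & SYSTICK_MASK;
}

/* The delay loop from the post, with volatile counters so the
 * optimizer has to keep every iteration. */
static void delay(uint32_t i)
{
    volatile uint32_t n = i;
    volatile uint32_t j;

    while (n--) {
        for (j = 0; j < 200u; j++) {
            /* busy wait */
        }
    }
}
```

Comparing the elapsed tick count of the same `delay()` call between a "fast" and a "slow" build gives a number that can be checked against the disassembly of both builds.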

zexx86
Associate II
Posted on May 13, 2014 at 00:19

Could it be a wrong FLASH alignment?

I've checked FLASH->ACR and its value is still 32.

But when I change the Flash Latency to 1, it runs as fast as it should, and the ACR value is now 31.

I guess Latency 1 is not OK for 72 MHz, but it works for me.

So something is strange with the Flash Latency in my firmware.

It really seems as if the instructions could be misaligned.

I can put almost any line into the source to fix this problem, and it then works correctly with Latency 2, as it should. Then, when I add the next line, the problem may appear again.

Sometimes it is only a little slower, sometimes much slower.

Even swapping lines can do the trick. Very frustrating.
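
For reference, the FLASH_ACR values can be decoded as sketched below. This assumes the 32 and 31 quoted above are hexadecimal (0x32 and 0x31), which is the reading that matches the register layout in the STM32F10x reference manual; the macro and function names are made up for this example and are not the library's own definitions.

```c
#include <stdint.h>

/* FLASH_ACR bit fields on the STM32F10x. */
#define ACR_LATENCY_MASK 0x07u   /* bits 2:0, number of wait states  */
#define ACR_HLFCYA       0x08u   /* bit 3, flash half-cycle access   */
#define ACR_PRFTBE       0x10u   /* bit 4, prefetch buffer enable    */
#define ACR_PRFTBS       0x20u   /* bit 5, prefetch buffer status    */

static uint32_t acr_wait_states(uint32_t acr)
{
    return acr & ACR_LATENCY_MASK;
}

static int acr_prefetch_enabled(uint32_t acr)
{
    return (acr & ACR_PRFTBE) != 0u;
}

/* Wait states the reference manual asks for at a given SYSCLK:
 * 0 up to 24 MHz, 1 up to 48 MHz, 2 up to 72 MHz. */
static uint32_t required_wait_states(uint32_t sysclk_hz)
{
    if (sysclk_hz <= 24000000u) {
        return 0u;
    }
    if (sysclk_hz <= 48000000u) {
        return 1u;
    }
    return 2u;
}
```

Read this way, 0x32 is two wait states with the prefetch buffer on, and 0x31 is one wait state with the prefetch buffer on, which is below what the manual specifies for 72 MHz.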
Posted on May 13, 2014 at 01:12

The current version of the library is 3.5.0

You want to make sure that SystemInit() in system_stm32f10x.c is being called; typically this is done prior to the C runtime startup code, and before the jump to main().

You should also consider outputting your clocks via PA8 (MCO) and using a scope to confirm the internal speeds.

Yes, FLASH is very slow (perhaps 35-40 ns), and will be sensitive to the flash line width. In the subsequent STM32 (F2/F3/F4) designs the performance of the flash is masked somewhat by a caching mechanism outside the core.

Posted on May 13, 2014 at 08:43

Post two examples (both source and disassembly, or mixed-view disassembly) with different behaviour, and explain exactly how they differ.

JW
zzdz2
Associate II
Posted on May 13, 2014 at 12:14

Try to align the loop code to 8 bytes so it corresponds to the 64-bit prefetch buffer.

When correctly aligned at the start of the loop all 8 bytes are fetched and code execution continues smoothly.

When not aligned, possibly only one prefetched halfword is executable and the CPU needs to wait for another flash read.
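
One way to try this from C, sketched under the assumption of a GCC-compatible compiler, is the `aligned` function attribute. Note that it aligns the function entry, not the loop head inside it, so the disassembly still has to be checked; `delay_aligned` and `is_line_aligned` are hypothetical names for this example.

```c
#include <stdint.h>

/* GCC/Clang extension: request 8-byte alignment for the function entry,
 * matching the 64-bit prefetch buffer width. noinline keeps the body at
 * that address instead of being merged into the caller. */
__attribute__((aligned(8), noinline))
void delay_aligned(uint32_t count)
{
    volatile uint32_t n = count;

    while (n--) {
        /* busy wait */
    }
}

/* Returns 1 if a code address sits on an 8-byte boundary. */
static int is_line_aligned(uintptr_t code_addr)
{
    /* Clear bit 0 first: on Cortex-M a function pointer carries the
     * Thumb bit, which is not part of the real address. */
    uintptr_t addr = code_addr & ~(uintptr_t)1u;
    return (addr % 8u) == 0u;
}
```

Checking `is_line_aligned((uintptr_t)delay_aligned)` at run time, or looking up the symbol in the .MAP file, shows whether the request was honoured in a given build.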
zexx86
Associate II
Posted on May 13, 2014 at 13:58

Big thanks to all.

I can clearly see how the addresses of functions change in the MAP file when I make small changes, and I've checked that not all of them are aligned even to 4 bytes. BUT

I've tried -falign-functions=8 to align all functions (which, by the way, only works with -O3, not with -Os), but it does not solve my issue.

My compiler is arm-none-eabi-gcc version 4.5.2 (Linaro GCC 4.5-2011.02-0)

Small alignment changes in the LD script helped too, but only for the current source code. When I changed a line, the problem was back again.

So it is probably not an alignment issue, since aligning all functions does not help.

Do you think Latency 1 is reliable at 72 MHz? This temporary solution just works.
Posted on May 13, 2014 at 14:31

"My compiler is arm-none-eabi-gcc version 4.5.2 (Linaro GCC 4.5-2011.02-0)"

That's pretty antiquated, can't you find a newer 4.6.x or 4.7.x version?

"Do you think Latency 1 is reliable for 72Mhz?"

I'm surprised it's even functional.
