STM32F7 Questions regarding code and performance

megahercas6
Senior
Posted on December 02, 2015 at 08:20

Hello.

I have a very DSP-heavy program to run, and the STM32F4 is not up to the task, so I switched to the STM32F7 and HAL (which is just asking for trouble). I have a few questions about executing code from flash and storing DSP coefficients for the math.

1) The STMCubeMX CORTEX_M7 configuration allows for the AXI and TCM interfaces. I have a small program, around 16-32k. Which one should I use to get the best performance? Or is this not an issue?

2) As far as I understand, for best Cortex-M7 performance the DSP coefficients must be held in DTCM. What is the syntax to place an f32 array of 128-256 points in this part of memory? This is the code I need to have stored inside DTCM:

float sin_coeff[256];
float cos_coeff[256];
uint32_t i = 0;
i=0;
while(i<256)
{
sin_coeff[i]=(float)sinf((6.2831853f*128.0f*(float)i)/256);
cos_coeff[i]=(float)cosf((6.2831853f*128.0f*(float)i)/256);
i++;
}

8 REPLIES
Nesrine M_O
Lead II
Posted on December 02, 2015 at 11:28

Hi karpavicius.linas,

• "STMCubeMX CORTEX_M7 configuration allows for AXI and TCM interface. I have small program, around 16-32k. What i have to use to get best performance? Or this is not a issue?"

When the code size of the user application fits into the internal Flash memory, the internal Flash is the best execution region, accessed either:

– Through TCM (Flash-ITCM) by enabling the ART-accelerator or

– Through AXI/AHB by enabling the cache in order to reach 0-wait state at 216 MHz

Note that execution from Flash-ITCM with data in DTCM-RAM and execution from Flash-AXI with data in DTCM-RAM give the same CoreMark score, which is 5 CoreMark/MHz.

• "As far as i understand, for best Cortexm7 performance, DSP coefficients must be held in DTCM."

You are right. DTCM-RAM is accessible by bytes, half-words (16 bits), words (32 bits) or double words (64 bits), and it is accessible at the maximum CPU clock speed without latency, which enables the Cortex-M7 processor to achieve excellent performance in many control and DSP applications.

• "What is the syntax to generate f32 128-256 points array in this part of memory?"

The syntax depends on the tool that you use. You have to modify your linker file and create a new memory section covering 0x2000 0000 to 0x2000 FFFF (64 Kbytes of DTCM-RAM), then place your data in this area.

• I'd highly recommend having a look at the following application note:

http://www.st.com/web/en/catalog/tools/FM147/CL1794/SC961/SS1743/LN1920/PF262415?s_searchtype=keyword

It provides a demonstration of the performance of the STM32F7 Series devices in various memory partitioning configurations (different code and data locations).

-Syrine-

megahercas6
Senior
Posted on December 02, 2015 at 13:37

I am using IAR ARM.

It says that the variable is placed at:

cos_coeff <array> 0x2000086C float[50000]

But can it place an array that long so that it starts in DTCM RAM and ends outside this memory? Should the linker place it in SRAM?
Nesrine M_O
Lead II
Posted on December 02, 2015 at 15:02

Hi  karpavicius.linas,

With an array of 50000 elements, it is clear that you exceed the 64 Kbytes of DTCM-RAM, so part of the data is placed in DTCM and the rest is placed in SRAM.
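The arithmetic behind this can be checked with a small helper. This is only a sketch: the function name `fits_in_dtcm` and the macros are hypothetical, the DTCM range is the 0x2000 0000 to 0x2000 FFFF window mentioned earlier in this thread, and a 4-byte `float` is assumed. With 50000 floats the array needs 50000 × 4 = 200000 bytes, while DTCM-RAM holds 65536 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DTCM_BASE 0x20000000u
#define DTCM_SIZE (64u * 1024u)   /* 65536 bytes */

/* Assumption stated in the lead-in: float is 4 bytes on this target. */
static_assert(sizeof(float) == 4, "sketch assumes a 4-byte float");

/* Returns 1 if the range [addr, addr + bytes) lies entirely inside
 * DTCM-RAM, 0 otherwise. */
int fits_in_dtcm(uint32_t addr, size_t bytes)
{
    if (addr < DTCM_BASE) {
        return 0;
    }
    uint32_t offset = addr - DTCM_BASE;
    return offset < DTCM_SIZE && bytes <= (size_t)(DTCM_SIZE - offset);
}
```

For the address reported by the linker, `fits_in_dtcm(0x2000086Cu, 50000u * sizeof(float))` returns 0, while the original 256-entry table at the same address would fit.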

-Syrine-
Posted on December 02, 2015 at 15:08

There are surely a number of options to do this.

You could use fixed pointers, pragma/attribute settings with the memory suitably carved up in the linker script/scatter file, or a heap that's sufficiently large for these structures and parked in this area.

The math making the table could probably be done more cleanly:

sin_coeff[i]=(float)sinf((6.2831853f*128.0f*(float)i)/256);

sin_coeff[i]=sinf(6.2831853f*128.0f*(float)i*0.00390625f);

sin_coeff[i]=sinf(3.1415926535897932f*(float)i);

megahercas6
Senior
Posted on December 02, 2015 at 15:19

sin_coeff[i]=sinf(3.1415926535897932f*(float)i);

The compiler will turn my code into that; with the highest optimization level it should be smart enough to do the math beforehand. Also, it is a LUT, and sinf takes ages to calculate. That's why I never use sinf functions, only a LUT.


AvaTar
Lead
Posted on December 02, 2015 at 17:13

>sin_coeff[i]=sinf(3.1415926535897932f*(float)i);
>
>Compiler will make my code like that, with highest optimization, it should be smart enough to do math before. Also it is LUT and sinf takes ages to calculate, thats why i never use sinf functions, only LUT

Why not pre-generate the LUT and include it as an array of constants in the source code?
megahercas6
Senior
Posted on December 02, 2015 at 17:25

Because it is very flexible and gives the best results in terms of performance. It's all about execution speed with me.

AvaTar
Lead
Posted on December 02, 2015 at 20:31

It's just an idea.

I used this method - with integer calculations and a respective LUT - for a tight loop on an F100.

>Because it is very flexible, ...

Unless it reads and changes variables that define the resultant LUT at runtime, there is no additional flexibility, compared to a constant table.

Both methods need a rebuild/reflash cycle.

For my application, I wrote a tool to create that LUT as separate source file.

> and give best results in terms of performance. It's all about execution speed with me.

If access to RAM is faster than to code/Flash, that would be an argument. I have never tried an F7, though.

And honestly, in that case I would prefer an M7 with double-precision FPU ...