STM32H745 Maximal performance configuration

Jvela.11
Associate II

I'm working on a project that involves thermal camera image processing. I need to process a 384*288 buffer (16 bits per pixel) as fast as possible. I'm using the 480 MHz VOS0 example to run the processor at 400 MHz (cache enabled).

I have already optimized the code to use the FPU and I'm getting about 65 Hz (toggling once per processed frame in a loop), but I want to know if I can reach more speed by using ITCM SRAM instead of flash, or by some other technique.

Another issue is that I'm not getting the expected value when I use the cycle counter. For example, when I measure a __NOP(), the result is 32!

As for the "processing" itself: it is only multiplications and additions in float.
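For context, the figures quoted in this post (400 MHz core clock, about 65 frames per second, 384*288 pixels) can be turned into a per-pixel cycle budget. The helper below is only an illustrative sketch; the function name is hypothetical and not part of any ST library.

```c
#include <stdint.h>

/* Whole CPU cycles available per pixel for a given core clock,
 * frame rate and frame size. Returns 0 if any of the sizes is 0. */
static uint32_t cycles_per_pixel(uint32_t core_hz, uint32_t frames_per_s,
                                 uint32_t width, uint32_t height)
{
    uint64_t pixels_per_s = (uint64_t)frames_per_s * width * height;
    if (pixels_per_s == 0u) {
        return 0u;
    }
    return (uint32_t)(core_hz / pixels_per_s);
}
```

With the numbers above, `cycles_per_pixel(400000000u, 65u, 384u, 288u)` comes out at 55 cycles per pixel. That looks like a lot for a few float multiplications and additions, so it may be worth checking whether memory access, rather than arithmetic, dominates the loop.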

PS: Is there a way to instantly share memory between the two cores to do real parallel processing?

Thanks! I hope to get a response soon.

2 REPLIES
TDK
Guru

> I want to know if I can reach more speed by using ITCM SRAM instead of flash

Running code from RAM will result in a speed increase, but running from FLASH is pretty heavily optimized. If you have instruction cache enabled, I wouldn't expect a significant speed increase.

> Another issue is that I'm not getting the expected value when I use the cycle counter. For example, when I measure a __NOP(), the result is 32!

Measuring execution time has overhead, unless you have a very clever implementation. You're likely measuring the overhead more than the NOP.
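One common way to account for that overhead is to measure an empty region first and subtract the result from later measurements. The sketch below shows only the arithmetic; `net_cycles` is a hypothetical helper, and it assumes the start and end values are read from `DWT->CYCCNT` after the counter has been enabled (TRCENA in `CoreDebug->DEMCR`, CYCCNTENA in `DWT->CTRL`).

```c
#include <stdint.h>

/* Cycles between two 32-bit counter reads, minus a calibrated overhead.
 * Unsigned subtraction gives the right difference even if the counter
 * wrapped once between the two reads. Never returns a negative value.
 *
 * Intended use on the target (not compiled here):
 *   uint32_t t0 = DWT->CYCCNT;
 *   uint32_t t1 = DWT->CYCCNT;
 *   uint32_t overhead = t1 - t0;        // empty region: pure overhead
 *   t0 = DWT->CYCCNT;
 *   work();
 *   t1 = DWT->CYCCNT;
 *   uint32_t net = net_cycles(t0, t1, overhead);
 */
static uint32_t net_cycles(uint32_t start, uint32_t end, uint32_t overhead)
{
    uint32_t elapsed = end - start;
    return (elapsed > overhead) ? (elapsed - overhead) : 0u;
}
```

Even with this correction, a single instruction is hard to time reliably on a pipelined core; measuring a loop of many iterations and dividing tends to give more meaningful numbers.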

> Is there a way to instantly share memory between the two cores to do real parallel processing?

Both cores have access to SRAM1, 2 and 3. Put the array there and you can read it from either one. You'll have to manage the cache settings appropriately. If both cores are reading and writing, it can get complicated. If the flow is one way, it's much easier.
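For the one-way case, managing the cache on the Cortex-M7 side usually means cleaning the D-cache over the buffer after writing it (`SCB_CleanDCache_by_Addr`) or invalidating it before reading (`SCB_InvalidateDCache_by_Addr`). Those CMSIS calls work on whole 32-byte cache lines, so the range passed to them should be line-aligned. The helper below is a minimal sketch of that rounding; `cache_span_t` and `cache_span` are hypothetical names, not CMSIS or HAL APIs.

```c
#include <stdint.h>

#define DCACHE_LINE_BYTES 32u   /* Cortex-M7 data cache line size */

typedef struct {
    uintptr_t addr;    /* start address, rounded down to a cache line */
    uint32_t  bytes;   /* length, rounded up to whole cache lines */
} cache_span_t;

/* Expand [addr, addr + bytes) to the smallest line-aligned range that
 * covers it, suitable for the by-address cache maintenance calls. */
static cache_span_t cache_span(uintptr_t addr, uint32_t bytes)
{
    const uintptr_t mask = ~(uintptr_t)(DCACHE_LINE_BYTES - 1u);
    cache_span_t span;
    uintptr_t end = (addr + bytes + DCACHE_LINE_BYTES - 1u) & mask;

    span.addr = addr & mask;
    span.bytes = (uint32_t)(end - span.addr);
    return span;
}
```

Aligning the shared buffer itself to 32 bytes and padding it to a multiple of 32 avoids the rounding touching neighbouring variables, which matters most for invalidate operations.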

Jvela.11
Associate II

Thanks for the reply, TDK!

I'm using the 480 MHz VOS0 example from STM32CubeIDE (https://github.com/STMicroelectronics/STM32CubeH7/tree/beced99ac090fece04d1e0eb6648b8075e156c6c/Projects/NUCLEO-H745ZI-Q/Examples/PWR/PWR_VOS0_480MHZ) in 400 MHz mode. It already enables the caches (I and D). My cycle counter routine is very simple. I migrated the code from an STM32F446 (Cortex-M4), where I used to measure cycles to guide my manual C optimization.

I'm also building with -O2 optimization. If no hardware feature or extra configuration is missing, I will have to explore assembler optimization. In that case, I'd like to know if there is a detailed guide or example on how to properly mix ARM assembler with C.
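As a small illustration of mixing assembler with C, GCC's extended inline assembly lets a single instruction be embedded in a C function while the compiler still allocates the registers. The sketch below assumes a GCC-compatible toolchain and a target with a single-precision FPU; `mac_f32` is a hypothetical name. At -O2 the compiler will usually emit an equivalent instruction on its own, so this is shown for the syntax rather than as an expected speed-up.

```c
#include <stdint.h>

/* Returns acc + a * b, using one VFP multiply-accumulate where available. */
static inline float mac_f32(float acc, float a, float b)
{
#if defined(__GNUC__) && defined(__arm__) && defined(__ARM_FP) && (__ARM_FP & 0x4)
    /* The "t" constraint binds each operand to a single-precision VFP
     * register (s0-s31); "+t" marks acc as both read and written. */
    __asm__ ("vmla.f32 %0, %1, %2" : "+t"(acc) : "t"(a), "t"(b));
    return acc;
#else
    /* Portable fallback so the same source also builds on a host PC. */
    return acc + a * b;
#endif
}
```

Keeping the assembler inside a small inline function like this leaves the surrounding loop in C, which makes it easy to compare against the compiler's own output before committing to hand-written code.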

This is the clock configuration, based on the STM32CubeIDE HAL libraries:

static void SystemClock_Config_400MHz(void) {
    RCC_OscInitTypeDef RCC_OscInitStruct;
    HAL_StatusTypeDef ret = HAL_OK;

    /* Enable HSE Oscillator and activate PLL with HSE as source */
    RCC_OscInitStruct.OscillatorType = RCC_OSCILLATORTYPE_HSE;
    RCC_OscInitStruct.HSEState = RCC_HSE_BYPASS;
    RCC_OscInitStruct.HSIState = RCC_HSI_OFF;
    RCC_OscInitStruct.CSIState = RCC_CSI_OFF;
    RCC_OscInitStruct.PLL.PLLState = RCC_PLL_ON;
    RCC_OscInitStruct.PLL.PLLSource = RCC_PLLSOURCE_HSE;
    RCC_OscInitStruct.PLL.PLLM = 4;
    RCC_OscInitStruct.PLL.PLLN = 400;
    RCC_OscInitStruct.PLL.PLLFRACN = 0;
    RCC_OscInitStruct.PLL.PLLP = 2;
    RCC_OscInitStruct.PLL.PLLR = 2;
    RCC_OscInitStruct.PLL.PLLQ = 4;
    RCC_OscInitStruct.PLL.PLLVCOSEL = RCC_PLL1VCOWIDE;
    RCC_OscInitStruct.PLL.PLLRGE = RCC_PLL1VCIRANGE_1;

    ret = HAL_RCC_OscConfig(&RCC_OscInitStruct);
    if (ret != HAL_OK) {
        Error_Handler();
    }
}
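As a sanity check on the PLL values above, the P output frequency follows from the reference clock and the M, N and P dividers. The sketch below assumes an 8 MHz HSE input; that figure is an assumption about the board and is not stated in the post. `pll1_p_hz` is a hypothetical helper, not a HAL function.

```c
#include <stdint.h>

/* PLL1 P output in Hz: the reference is divided by M, multiplied by N
 * to give the VCO frequency, then divided by P. Integer dividers only
 * (PLLFRACN = 0, as in the configuration above). */
static uint32_t pll1_p_hz(uint32_t ref_hz, uint32_t m, uint32_t n, uint32_t p)
{
    uint64_t vco_hz = ((uint64_t)ref_hz * n) / m;
    return (uint32_t)(vco_hz / p);
}
```

Under that assumption, `pll1_p_hz(8000000u, 4u, 400u, 2u)` gives 400000000, which matches the intended 400 MHz, and the 2 MHz PLL input (8 MHz / 4) is consistent with the `RCC_PLL1VCIRANGE_1` setting.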

And this is the cache configuration:

static void CPU_CACHE_Enable(void) {
    /* Enable I-Cache */
    SCB_EnableICache();

    /* Enable D-Cache */
    SCB_EnableDCache();
}