STM32F746 slower than STM32F407?

Evgeny Popov · ‎2018-06-20

Posted on June 20, 2018 at 13:46

As I was short on speed with the STM32F407 Discovery, I ordered an STM32F746 Nucleo-144. What surprised me was that it is slower than the STM32F407, even though the STM32F407 tops out at 168 MHz and the STM32F746 goes up to 216 MHz. I configured my code with CubeMX. A simple test:

    while (1)
    {
        HAL_GPIO_WritePin(GPIOD, GPIO_PIN_12, GPIO_PIN_SET);
        HAL_GPIO_WritePin(GPIOD, GPIO_PIN_12, GPIO_PIN_RESET);
    }

gives me 150 ns pulses on the STM32F746, while the same test gives me 100 ns on the STM32F407.

I used an 8 MHz HSE external resonator. The clock configuration was actually the same for both MCUs, except that the STM32F746 has a 216 MHz maximum and the STM32F407 has 168 MHz.

So why is that? The STM32F746 should be more powerful. Also, it was released later.
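For scale, the pulse widths quoted above can be converted into core clock cycles. This is only a rough sketch: it assumes each core really ran at its maximum clock (216 MHz and 168 MHz), and `pulse_cycles` and `print_pulse_cycles` are hypothetical helper names, not part of any ST library.

```c
#include <stdio.h>

/* Convert a measured pulse width into core clock cycles:
 * cycles = t[ns] * f[MHz] / 1000 */
double pulse_cycles(double pulse_ns, double core_mhz)
{
    return pulse_ns * core_mhz / 1000.0;
}

/* Print the two measurements from the post.
 * ASSUMPTION: both cores were clocked at their maximum frequency. */
void print_pulse_cycles(void)
{
    printf("F746: %.1f cycles per pulse\n", pulse_cycles(150.0, 216.0));
    printf("F407: %.1f cycles per pulse\n", pulse_cycles(100.0, 168.0));
}
```

Under that assumption, each HAL call costs tens of core cycles on either part, so the measurement mostly reflects call overhead and bus access rather than raw core speed.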

#stm32f407 #stm32f746 #external-rc

STOne-32 · ‎2018-06-20

Posted on June 21, 2018 at 00:13

Hi,

Performance on an MCU is not just pin I/O toggling, which is limited by many factors: AHB speed (frequency, latency of the CPU core write, propagation to the physical I/O, and then the external bus load).

In advanced MCUs such as those with Cortex-M cores, peripherals (SPI, I/O, timers) should work autonomously, using DMA or built-in FIFOs, so that the CPU can do the important tasks (real MIPS), math computation and algorithms, unlike legacy MCUs.

On the STM32F7 we have a Cortex-M7 core providing twice the MIPS of the M4 (STM32F4) when running at the same speed, thanks to its pipeline and dual-issue instruction execution. Particular care should then be taken over where the code is placed, cache activation, etc.

Have a look at this app note:

http://www.st.com/content/ccc/resource/technical/document/application_note/0e/53/06/68/ef/2f/4a/cd/DM00169764.pdf/files/DM00169764.pdf/jcr:content/translations/en.DM00169764.pdf

Happy reading,

Ciao

STOne-32


Tesla DeLorean · ‎2018-06-20

Posted on June 20, 2018 at 14:32

>> So why is that? STM32F746 should be more powerful. Also it was released later

Perhaps jamming a GPIO up and down, via write buffers, and slower buses is not a good measure of computational work?

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..

Evgeny Popov · ‎2018-06-20

Posted on June 20, 2018 at 15:00

The same thing happens with SPI.

Evgeny Popov · ‎2018-06-20

Posted on June 20, 2018 at 15:14

By the way, I don't know what you mean by 'slower bus'. I used the same pins, and they use the same bus, AHB1. I always thought that more MHz = more speed. Isn't that related to the GPIO pins and the MCU's performance?

henry.dick · ‎2018-06-20

Posted on June 20, 2018 at 15:21

'I always thought, that more mhz = more speed'

Then your expectation is wrong.

That generalization is built on many assumptions, some of which apparently went wrong in your particular case.

waclawek.jan · ‎2018-06-20

Posted on June 20, 2018 at 22:53

... and the twice-as-much MHz 'H7

https://community.st.com/0D50X00009XkWanSAF

... ;)

gives me 150ns pulses on STM32F746

when the same test gives me 100ns on STM32F4

There may be other things influencing this sort of 'speed' too, but first, try switching on compiler optimization.
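"Compiler optimization" here means the optimization level of the C compiler (for example `-O2`), not anything in the clock tree. As a minimal sketch, assuming a GCC-based toolchain, the predefined `__OPTIMIZE__` macro can be used to check how a file was built; `built_with_optimization` is a hypothetical helper name.

```c
/* Returns 1 if this file was compiled with optimization enabled, 0 otherwise.
 * ASSUMPTION: GCC-compatible compiler, which defines __OPTIMIZE__ in
 * optimizing compilations. Other toolchains may not define it. */
int built_with_optimization(void)
{
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}
```

If this returns 0 in a timing test, the measured pulse widths include unoptimized call overhead.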

JW

Tesla DeLorean · ‎2018-06-20

Posted on June 21, 2018 at 00:49

Data transactions to buses beyond the TCM do not complete in a single cycle; I'd expect something on the AHB to take at least 4 cycles. I'd also expect any decision or arbitration hardware to add additional delays and sequencing. Buses which take more time are by definition slower. I can't find a good diagram of the timing, but you've got a 6-stage pipeline trying to stuff data into a subsystem that needs to sequence through something clocking at no more than 1/4 the speed/throughput. Observe the behaviour on a multi-lane motorway when everyone has to suddenly funnel into a single lane at half the rated speed limit.

Your code example is also very poor. You want to get this as close to single instructions as possible (repetitive stores), not call subroutines with asserts, comparisons, shifts and loads.

See how something like this works,

    {
        volatile uint32_t *p = (volatile uint32_t *)&GPIOD->BSRR;
        uint32_t a = 1 << 12, b = 1 << (12+16); /* set and reset masks for pin 12 */

        while(1)
        {
            /* unrolled stores: the loop branch is paid once per 8 pulses */
            *p = a;
            *p = b;
            *p = a;
            *p = b;
            *p = a;
            *p = b;
            *p = a;
            *p = b;
            *p = a;
            *p = b;
            *p = a;
            *p = b;
            *p = a;
            *p = b;
            *p = a;
            *p = b;
        }
    }

But also observe that the Whetstone and Dhrystone benchmarks don't do I/O.

If you want a pin to bang up and down at 54 MHz use a TIM, and zero CPU cycles.
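The 54 MHz figure can be checked with the usual timer relation f_out = f_tim / ((PSC + 1) * (ARR + 1)). A small sketch of that arithmetic follows; the 108 MHz timer clock is an assumption about the clock tree (it is not stated in the thread), and `tim_pwm_hz` is a hypothetical helper name.

```c
#include <stdint.h>

/* Output frequency of a timer channel in PWM mode:
 * f_out = f_tim / ((PSC + 1) * (ARR + 1))
 * The divisor is computed in 64 bits so large PSC/ARR values cannot overflow. */
uint32_t tim_pwm_hz(uint32_t tim_clk_hz, uint32_t psc, uint32_t arr)
{
    uint64_t div = ((uint64_t)psc + 1u) * ((uint64_t)arr + 1u);
    return (uint32_t)(tim_clk_hz / div);
}
```

With an assumed 108 MHz timer clock, `tim_pwm_hz(108000000u, 0u, 1u)` gives 54000000, and the output runs from the timer hardware with no CPU involvement once it is started.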


Evgeny Popov · ‎2018-06-21

Posted on June 21, 2018 at 07:22

I didn't use the BSRR register; I made a simple test with the ODR one:

GPIOA->ODR = 0x40;

GPIOA->ODR = 0x0000;

gives me around 12 ns, while the STM32F407 gives 25 ns. So yes, it's faster when we use the registers directly, but I don't understand why HAL is slower on the more powerful MCU. Maybe there is some way to optimize it?
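One way to see where the HAL time goes is to compare the work done per pin change. The sketch below is a host-side model only: `mock_gpio_t` and the `mock_*` functions are hypothetical stand-ins, loosely modelled on a HAL-style pin write (a function call with a comparison and a shift, as mentioned earlier in the thread) versus a single direct store.

```c
#include <stdint.h>

/* Host-side mock of the one register used here.
 * On target this would be the real GPIO peripheral. */
typedef struct { volatile uint32_t BSRR; } mock_gpio_t;

/* Simplified model of a HAL-style pin write: a call, a branch and a
 * shift before the store. A real HAL may also run parameter asserts. */
void mock_write_pin(mock_gpio_t *port, uint16_t pin, int state)
{
    if (state != 0) {
        port->BSRR = pin;                  /* low half-word sets the pin */
    } else {
        port->BSRR = (uint32_t)pin << 16;  /* high half-word resets the pin */
    }
}

/* Direct form: one store of a precomputed constant. */
void mock_set_pin12(mock_gpio_t *port)   { port->BSRR = 1u << 12; }
void mock_reset_pin12(mock_gpio_t *port) { port->BSRR = 1u << 28; }
```

Both forms end in the same register write; the difference is the instructions executed around it, which is why the result depends heavily on compiler optimization and inlining.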

Evgeny Popov · ‎2018-06-21

Posted on June 21, 2018 at 07:34

I'm not sure what you mean by ''switching on compiler optimization''. I guess it's not about CubeMX clock optimization.

But anyway, this is my clock config for both MCUs:

F4:

F7: