HAL_RCC_GetSysClockFreq(void) wastes code space with unnecessary (unt64_t) divide.

EdGi · ‎2020-04-24

HAL_RCC_GetSysClockFreq(void) in stm32f7xx_hal_rcc.c uses a (unt64_t) divide which could be avoided. This uses, on my build, 716 bytes of code space alone for the required _udivmoddi4 call, and this is not used anywhere else in my code.

Would it be possible to investigate recreating the calculation more efficiently?

There is already some bit manipulation in the calculation, and it would appear that the 64-bit datatype is just not required for the significant bits involved.

TDK · ‎2020-04-24

The relevant code is

pllvco = (uint32_t) ((((uint64_t) HSE_VALUE * ((uint64_t) ((RCC->PLLCFGR & RCC_PLLCFGR_PLLN) >> RCC_PLLCFGR_PLLN_Pos)))) / (uint64_t)pllm);

which is effectively

pllvco = HSE_VALUE * PLLN / PLLM;

They have chosen accuracy over small code size here. By multiplying by N before dividing by M, accuracy is increased in the event HSE_VALUE/HSI_VALUE is not a multiple of M.

HSE_VALUE * PLLN can definitely overflow a uint32_t. Consider PLLN = 432 and HSE_VALUE = 16MHz.

716 bytes is a lot, but hey if you're using HAL you've already made some significant concessions on code size. I could see an argument either way.

If you feel a post has answered your question, please click "Accept as Solution".

berendi · ‎2020-04-24

There are some unusual but perfectly legal clock settings that require 64 bit calculations.

The bit manipulations are there only to extract bitfields from configuration registers,

Can you perhaps propose a more efficient calculation? I would not trust the HAL developers with anything that complicated.

RMcCa · ‎2020-04-24

Calculate the value in advance and use pllvco = 0xnnnn;

2 cycles?

TDK · ‎2020-04-24

That assumes the clock never changes. Which in most applications is not the case. Most switch at least once from the starting HSI clock.

If you feel a post has answered your question, please click "Accept as Solution".

chaaalyy · ‎2020-04-24

Yepp... When it comes to HAL, even in a simple addition, i would bitcheck the result for possible overflows manually ;)

chaaalyy · ‎2020-04-24

1 cycle ... M7´s can do 2 instructions in one cycle xD

RMcCa · ‎2020-04-24

Excellent.

the clock initialization is one of the very few bits of HAL in my code. It works and is called only once.

EdGi · ‎2020-04-25

Thanks for commenting (everyone) and TDK, that is indeed the code in question.

While HSE_VALUE * PLLN can overflow a uint32_t, consider what that represents - it's the internal PLL frequency and of course it would have to be over ~4.3GHz before that would happen.

In fact, in the case of my chosen device and probably most others, this value is limited in CubeMX to 432MHz. The calculation will never overflow a uint32_t.

Correct me if I'm mistaken!

I do take onboard your comments about using the HAL and control over code size - their goal is code which works in every case, but this instance really could be improved!

Looking closer into my build, this function is only called to set baud rates in UARTs, so I could probably lose it by substituting a few HAL calls in there.

berendi · ‎2020-04-25

Note that the ordering of the multiplications and divisions in the expression is different from the actual ordering of the frequency multipliers and dividers in the hardware. The hardware divides HSE with PLLM first and then multiplies the resulting frequency with PLLN, but if they had coded it straight as

pllvco = (HSE_VALUE / PLLM) * PLLN;

a significant loss of precision were possible in the (HSE_VALUE / PLLM) subexpression when its result is not an integer. They had to choose between three possible ways of calculating this for every possible legal combination of HSE, PLLM and PLLN

The code above using integer arithmetic and sometimes getting imprecise results
The code above with floating point arithmetic, go figure the impact on code size and speed
The way they did it with 64 bit integer arithmetic

While (HSE_VALUE * PLLN) can be as high as 25*432 MHz, it is a pure mathematical value, never actually generated anywhere on the silicon.