2024-11-18 08:23 PM
I am new to Arm but old to assembly coding (inc Toshiba 32 bit CISC - now extinct). Just completed RCC startup code for a STM32F344 chip by looking at the 1124 pages (!) of the RM0364 manual and the CUBE LL start up code (!!!). The CUBE code has waits for: Flash latency/HSI clk/LSI clk/PLL clk/Sysclk ready (at min). As all the waits involve the CPU in reading flash correctly to perform the wait loop, I wondered why wait?? The CPU might as well be doing something useful while it is waiting - and save some space while doing it. So I cut out all the waits. Works fine - and code a lot smaller/neater. Yes of course everything has to settle down by the time the UART starts up and reads 250,000 baud sensibly but that is a long way down the track.
So why wait??
2024-11-18 08:38 PM
Correction - STM32F224 chip not 344!
2024-11-18 08:40 PM
Let me get it right this time - STM32F334 chip!!
2024-11-18 09:15 PM - edited 2024-11-19 01:37 AM
You are right - the cpu can be doing other things while waiting for e.g. the pll to stabilise. But stm32cube is meant to be simple, and code that takes advantage of this otherwise-wasted cpu time gets complicated.
For example the pll might overshoot the target frequency while trying to lock. So you can’t reliably switch to it until you are confident that it won’t go so fast as to exceed the specs and risk crashing.
So your clock-setting code now has two calls - one to start the pll and another after you have filled this otherwise-wasted time to switch to the pll.
Now consider this: the cpu will be running slowly until you switch to the pll, so you don’t want to try to do too much before switching; the optimum case might be to periodically check if the pll has locked and if so, switch to it. And then continue with what you were doing. The code could quickly get messy unless you use a timer interrupt to periodically check if the pll has locked (there is no pll-locked interrupt; I guess ST concluded it is too rare to justify the extra circuitry).
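The two-call idea above can be sketched roughly as follows. This is only a minimal illustration, not Cube code: `rcc_regs_t`, `pll_start` and `pll_try_switch` are hypothetical names, the register block is a plain struct so the sketch runs on a PC, and the bit positions are as I recall them for the STM32F3 RCC and should be checked against RM0364.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as I recall them for the STM32F3 RCC; verify against RM0364. */
#define RCC_CR_PLLON    (1u << 24)
#define RCC_CR_PLLRDY   (1u << 25)
#define RCC_CFGR_SW_MSK (3u << 0)
#define RCC_CFGR_SW_PLL (2u << 0)

/* Hypothetical stand-in for the RCC register block (not the CMSIS type). */
typedef struct { volatile uint32_t CR, CFGR; } rcc_regs_t;

/* Call 1: start the PLL and return at once, without waiting for lock. */
static void pll_start(rcc_regs_t *rcc)
{
    assert(rcc != 0);
    rcc->CR |= RCC_CR_PLLON;
}

/* Call 2: poll from the main loop or a timer tick. Switches SYSCLK to the
 * PLL only once PLLRDY is set; returns true when the switch was requested. */
static bool pll_try_switch(rcc_regs_t *rcc)
{
    assert(rcc != 0);
    if ((rcc->CR & RCC_CR_PLLRDY) == 0)
        return false;                       /* not locked yet, stay on HSI */
    rcc->CFGR = (rcc->CFGR & ~RCC_CFGR_SW_MSK) | RCC_CFGR_SW_PLL;
    return true;
}
```

On real hardware the pointer would be the RCC base address, and the caller would keep calling `pll_try_switch` between chunks of other start-up work until it returns true.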
It is all a trade-off between many different things. But one of the most expensive things is my time in writing and debugging code. That, after all, is why people use stm32cube to put together something that can be used as a basis for their code. Make stm32cube more efficient and people will find it harder to adapt it to their specific needs.
Edit: I just assumed there wasn't an interrupt as I've never thought to want it. So I didn't even bother to check before replying.
2024-11-19 01:30 AM
Nowhere is it said that Cube code is optimal.
> I cut out all the waits
You can do that *if* you know exactly what the consequences are, especially of cutting *all the waits*. And not every combination of options/steps/states is documented. For example, on some STM32s the system clock switch may require that the target clock is already up and stable, and the switch won't happen if it's not; this may or may not be documented.
Also, with some peripherals you want to be sure you are running at a certain frequency (e.g. timer-driven external circuitry depending on a particular pulse width, otherwise it bursts into flames). If you skip the waits, the actual frequency switch may happen only some time later, once the oscillators/PLL are stable.
Btw. RCC has interrupts, so you can make much of that setup to be interrupt-driven, if you want (although, as @Danish1 pointed out above, not all state changes can trigger an interrupt, so this may turn out to be tricky).
It's your leg, your shot, and you've been warned.
JW
2024-11-19 01:34 PM
Ta very much Danish and Jan for your excellent comments both (not to mention the warning!).
Yes I can see the PLL cd get itself in a frenzy and crash the flash.
Out of curiosity I set up the AHB and APB1/2 prescalers first, then set the PLL to 16 x the input 8 MHz HSI/2 clk for a PLL output of 64 MHz. With the very next instruction (assembly = machine code, so wait time nigh on nothing) I switched the Sysclk from HSI to PLL output. Works fine every time! Maybe ST designed a better PLL than they give themselves credit for. But I will wait for the PLL in the final code just in case.
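For the record, the arithmetic behind that setting is 8 MHz / 2 x 16 = 64 MHz. A small sketch of it in C follows; the helper names are hypothetical, and the PLLMUL encoding (field value = multiplier - 2, at bits 21:18 of RCC_CFGR) is as I recall it for the STM32F3 family, so check it against RM0364 before relying on it.

```c
#include <assert.h>
#include <stdint.h>

#define HSI_HZ 8000000u   /* nominal internal RC oscillator frequency */

/* PLL output when fed from HSI/2, for a multiplier of 2..16. */
static uint32_t pll_out_hz(uint32_t mul)
{
    assert(mul >= 2u && mul <= 16u);
    return (HSI_HZ / 2u) * mul;
}

/* PLLMUL field for RCC_CFGR. ASSUMPTION: encoding is (mul - 2) << 18. */
static uint32_t pllmul_bits(uint32_t mul)
{
    assert(mul >= 2u && mul <= 16u);
    return (mul - 2u) << 18;
}
```

With these, `pll_out_hz(16)` gives 64000000 and `pll_out_hz(8)` gives 32000000, the two multipliers that come up in this thread.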
All this just for the record!!!
And thank you both for your interest!
2024-11-19 03:56 PM
Jan
I read yr FLASH_ACR post - v interesting and thanks for the reference.
I had noted that the CUBE LL code implemented that flash wait check too. And yr explanation was better than ST's ref manual (RM0364 for the STM32F334 chip).
ST: I do wish the whole of that manual was written in a succinct imperative style rather than the passive/verbose style much of it is in currently. An army will lose every war without a system of clear and concise orders.
So I will include a read back check of flash wait states too!
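A read-back check of that kind might look like the sketch below. It is an illustration only: the function names are made up, the register is passed in as a pointer so the code runs off-target, and the LATENCY field position and the 24/48/72 MHz thresholds are as I recall them for the STM32F3 and should be confirmed in RM0364.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: LATENCY occupies bits 2:0 of FLASH_ACR on this family. */
#define FLASH_ACR_LATENCY_MSK (7u << 0)

/* Wait states needed for a given HCLK (thresholds as I recall them). */
static uint32_t flash_ws_for_hclk(uint32_t hclk_hz)
{
    if (hclk_hz <= 24000000u) return 0u;
    if (hclk_hz <= 48000000u) return 1u;
    return 2u;                              /* up to 72 MHz */
}

/* Write the new latency, then read it back until it has taken effect.
 * Gives up after max_polls reads so a fault cannot hang start-up. */
static bool flash_set_latency(volatile uint32_t *acr, uint32_t ws,
                              uint32_t max_polls)
{
    assert(acr != 0 && ws <= 7u);
    *acr = (*acr & ~FLASH_ACR_LATENCY_MSK) | ws;
    while (max_polls-- != 0u) {
        if ((*acr & FLASH_ACR_LATENCY_MSK) == ws)
            return true;                    /* new wait states confirmed */
    }
    return false;
}
```

The bounded loop is a design choice of this sketch; the thread itself only talks about waiting until the value reads back.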
Which inspires me to do some tests of just how many wait loops those PLL and Flash take in practice!
Will post back with some figures when done.
2024-11-19 05:31 PM
RCC typical setup wait loop count results:
Loop in assembly pseudo-code:
set loop cnt = -1
NRDY: read ready bit
inc loop cnt
tst ready bit
jp NRDY if not ready
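For readers more at home in C, the same counting loop can be sketched as below. This is a rough equivalent, not the assembly actually used: the register read goes through a hypothetical function pointer so the loop can be exercised on a PC, and `fake_read_cr` is a made-up stand-in that reports "ready" after a chosen number of reads.

```c
#include <assert.h>
#include <stdint.h>

#define RCC_CR_PLLRDY (1u << 25)   /* as I recall it; verify against RM0364 */

typedef uint32_t (*read_reg_fn)(void);

/* Mirrors the pseudo-code: start at -1, read, increment, test, repeat.
 * Returns 0 if the bit was already set on the first read. */
static int32_t count_ready_polls(read_reg_fn read_reg, uint32_t ready_mask)
{
    int32_t count = -1;
    uint32_t value;
    assert(read_reg != 0);
    do {
        value = read_reg();        /* read ready bit  */
        count++;                   /* inc loop cnt    */
    } while ((value & ready_mask) == 0u);   /* loop if not ready */
    return count;
}

/* Hypothetical fake register: not ready for the first N reads. */
static uint32_t fake_polls_until_ready;
static uint32_t fake_read_cr(void)
{
    if (fake_polls_until_ready == 0u)
        return RCC_CR_PLLRDY;
    fake_polls_until_ready--;
    return 0u;
}
```

Note that a C loop compiles to a different number of instructions per pass than the hand-written one, so counts taken this way would not be directly comparable with the figures in this post.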
Results:
FLASH_ACR ready loops = 33 decimal (always after change - didn't expect that!)
HSI ready = 0 (always - not so surprising - it's been on since reset)
8 x PLL ready = 63 decimal. EVERY time!
16 x PLL ready = 100 decimal. EVERY time!
So the PLL always takes the same loooong time to ramp up to speed, with longer times for higher mul factors - at least on this chip. Obviously not an analogue PLL circuit.
I have learnt lots!
Stuff I wd never have had any idea of if I had gone the HAL driver way, which (in my case) resulted in total obfuscation of what was happening under the hood.
Again ta Danish and Jan v much for your input.
Rex, New Zealand
2024-11-19 09:59 PM
I conclude you are not using LSE. That is the slowest to start in my experience.
As to the order of doing things, you could start the PLL before setting the APB/AHB dividers and wait-states. Only then wait for PLL lock. That way some of the PLL startup time is spent usefully.
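That ordering could be sketched like this. It is an illustration under several assumptions: the struct and function names are invented, the registers are simulated so the code runs off-target, the bit positions are as I recall them for the STM32F3 (check RM0364), and the APB1 /2 divider and 2 wait states are my guesses for a 64 MHz SYSCLK rather than values given in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RCC_CR_PLLON          (1u << 24)
#define RCC_CR_PLLRDY         (1u << 25)
#define RCC_CFGR_SW_MSK       (3u << 0)
#define RCC_CFGR_SW_PLL       (2u << 0)
#define RCC_CFGR_PPRE1_DIV2   (4u << 8)    /* ASSUMPTION: APB1 = HCLK/2 */
#define RCC_CFGR_PLLMUL16     (14u << 18)  /* x16, source left at HSI/2 */
#define FLASH_ACR_LATENCY_MSK (7u << 0)
#define FLASH_ACR_LATENCY_2WS (2u << 0)    /* ASSUMPTION: 2 WS at 64 MHz */

/* Hypothetical register bundle; on target these are fixed addresses. */
typedef struct { volatile uint32_t RCC_CR, RCC_CFGR, FLASH_ACR; } clk_regs_t;

static bool clock_init_overlapped(clk_regs_t *r, uint32_t max_polls)
{
    assert(r != 0);
    /* 1. Configure the PLL while it is off, then start it. */
    r->RCC_CFGR |= RCC_CFGR_PLLMUL16;
    r->RCC_CR   |= RCC_CR_PLLON;
    /* 2. Do the other set-up while the PLL is locking. */
    r->RCC_CFGR |= RCC_CFGR_PPRE1_DIV2;
    r->FLASH_ACR = (r->FLASH_ACR & ~FLASH_ACR_LATENCY_MSK)
                 | FLASH_ACR_LATENCY_2WS;
    /* 3. Only now wait for lock, with a bound so a fault cannot hang. */
    while ((r->RCC_CR & RCC_CR_PLLRDY) == 0u) {
        if (max_polls-- == 0u)
            return false;                   /* still on HSI */
    }
    /* 4. Switch SYSCLK to the PLL. */
    r->RCC_CFGR = (r->RCC_CFGR & ~RCC_CFGR_SW_MSK) | RCC_CFGR_SW_PLL;
    return true;
}
```

The saving is only the handful of cycles spent in step 2, so on a PLL that takes on the order of 100 polls to lock the gain is small, but it costs nothing.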
Danish
2024-11-20 12:43 AM
Ta for those ideas Danish. All is grist to this mill!
Yes - don't need any crystal - less bits, less failures. The 8 MHz HSI can do everything we need - receive 8 bytes of data at 250 kbaud, and apply those 4 words to 4 channels of ~20 kHz, 16 bit resolution PWM to drive high power RGBW LED modules. Thank you ST for your HRTIM1! (About 128x32/72 = 57 times faster than the 72 MHz EFM8 in the outgoing models, as in https://theatrelight.co.nz/led-stage-light/)
Having done those checks of real wait times and based on this chat with you and Jan, I now have a much clearer path forward!
After all, this is just the start up code, yet already it's prob about five times faster and a fifth of the size of the Cube LL code (not to mention the HAL implementation). Arm coding with the GNU assembler was initially quite daunting but with the help of a mile of macros and lots of SED preprocessing I am now more comfortable. Also the ST Programmer GUI allows me for example to test wait times as mentioned above.
I just like working at the nuts and bolts level. And I have a lifelong loathing of C!!!
Thank you for your help Danish!
Rex