Code Optimization

vbk22398 · ‎2024-09-23

Hi, I am using three different STM32 MCUs across our projects, including the STM32F423, STM32L496 and STM32F407. In general I use the HAL for this code. Now I need to create HAL layers for these boards only, and I need to reduce the overhead caused by the generic HAL code generated by STM32CubeIDE. Please suggest some solutions: what is an efficient and quick way to do this? Register-level code is too time-consuming and very error-prone, but at the same time I need to keep the overhead down. Kindly help!

Andrew Neil · ‎2024-10-08

@unsigned_char_array wrote:
Sometimes float can be used instead of double.

And remember that the Cortex-M4 FPU only does float - not double ...

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/10-useful-tips-to-using-the-floating-point-unit-on-the-arm-cortex--m4-processor#:~:text=Floating%20point%20numbers%20can%20be,handle%20the%20calculation%20in%20software.
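A minimal sketch of that point, assuming a 12-bit ADC and a 3.3 V reference (both are illustrative and not from this thread). In C an unsuffixed literal such as 3.3 is a double, so the first function below is evaluated in double precision, which the single-precision FPU cannot do in hardware. The f suffix keeps the second one in float. The function names are hypothetical.

```c
#include <stdint.h>

/* 3.3 and 4095.0 are double literals, so 'raw' is converted to double
 * and the whole expression is computed in double precision. */
float adc_to_volts_slow(uint16_t raw)
{
    return raw * 3.3 / 4095.0;
}

/* Same conversion kept in single precision by using float literals. */
float adc_to_volts(uint16_t raw)
{
    return (float)raw * (3.3f / 4095.0f);
}
```

GCC's -Wdouble-promotion warning can help find places where a float value is implicitly promoted to double.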

@unsigned_char_array wrote:
Or use fixed point instead of floating point.

Or just (scaled) integers ...
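As a rough illustration of both suggestions, here is a sketch with hypothetical helper names. The first function does an ADC conversion in millivolts with plain integers (again assuming a 12-bit ADC and a 3.3 V reference, which are not from this thread). The second is a Q15 fixed-point multiply, where values are scaled by 2^15.

```c
#include <stdint.h>

/* ADC counts to millivolts using integers only.
 * The +2047 rounds to nearest instead of truncating. */
uint32_t adc_to_millivolts(uint16_t raw)
{
    return ((uint32_t)raw * 3300u + 2047u) / 4095u;
}

/* Q15 multiply: 16384 represents 0.5, 8192 represents 0.25.
 * Assumes an arithmetic right shift for negative values (as GCC does).
 * No saturation: -32768 * -32768 does not fit in int16_t. */
int16_t q15_mul(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * (int32_t)b) >> 15);
}
```

Neither function touches the FPU, so the same code behaves identically on cores with and without one.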

vbk22398 · ‎2024-10-09

@Andrew Neil wrote:

Again, have you done any analysis to find specifically what bloat and/or bottlenecks you may or may not have? Or is this just some general notion that HAL code must be slow & bloated?

Why? What is the purpose of that?

I don't know how to do this. Could you explain to me how? I am not understanding "bloat and/or bottlenecks".

The reason why we are doing this is that the person reading the code shouldn't have to be bothered with the backend workings of all the sensors and their related logic. What he/she should see is readable, high-level user code.

#include "AllLib.h"

int main(void)
{
    init();  // calls all the related initialisation functions

    while (1) {
        control_logic();  // the application logic which we write
    }
}

Is it OK if I write all this in HAL, or in LL, or using CMSIS, for all the different STM32 cores which we have used?

SofLit · ‎2024-10-09

@vbk22398

Next time, please use the </> button to paste your code. I have edited your comment accordingly.


Andrew Neil · ‎2024-10-09

@vbk22398 wrote:
I am not understanding "bloat and/or bottlenecks"

"Bloat" = excessive code size;

"Bottlenecks" = the key places which hold up code execution; i.e., the things which have the biggest impact on execution speed.

So what you need to be doing is looking to see what is actually using up the most code space, and what is actually having the biggest impact on execution speed.

There's no point trying to "optimise" (sic) stuff which isn't actually causing you any real problems!

@vbk22398 wrote:
The reason why we are doing this is that the person reading the code shouldn't have to be bothered with the backend workings of all the sensors and their related logic. What he/she should see is readable, high-level user code.

But that has absolutely nothing to do with putting everything into one single .c and .h file.

You need to decide what is your key concern here: is it readability of the source code, or is it maximum execution speed and/or minimum code size?

Although they're not entirely mutually exclusive, they do tend to have conflicting demands.

The whole point of HAL is to "hide" (some of) the arcane details - and the cost of that is in extra code space and, thus, execution time.

Using LL is not a panacea: it means that you will have to write for yourself (some of) the stuff that HAL does for you - so you might not end up saving anything!

Again, this is why you need to be sure that the "optimisation" (sic) is actually needed.

vbk22398 · ‎2024-10-09

@Andrew Neil Sir, thanks for your time and reply. My only concern is that I have to reduce the time and space occupied by the code I write. My superior wants me to do it bare metal at register level, but I feel it is overwhelming, as there are lots of registers and bit fields to be concerned about. So I am looking for a smarter way to do this, or any alternative. That's why I posted this in the forum. Also I don't know how to find "the things which have the biggest impact on speed." Please suggest what I should do about this.

Andrew Neil · ‎2024-10-09

@vbk22398 wrote:
@Andrew Neil My only concern is that I have to reduce the time and space occupied by the code I write.

But if you don't know how much space your code currently occupies, or how long it currently takes to operate - how can you tell if either of them is "too much"?

And how will you know if any other code is any "better"?

How can you tell if there is any wasted space, or anything unduly slowing down the execution?

If there isn't anything, you can't optimise it out!

@vbk22398 wrote:
@Andrew Neil My superior wants me to do it bare metal at register level,

Well, if that's the task you've been set, then that's what you need to do!

@vbk22398 wrote:
@Andrew Neil but I feel it is overwhelming as there are lots of registers and bit fields to be concerned about.

Indeed - that is exactly why ST provides HAL (and all other manufacturers provide similar).

When you talk about "optimisation", you need to consider that the programmer's time & effort is also a resource which needs to be "optimised".

But you have been set the task of writing register-level code - if you don't feel up to doing that, then you need to discuss that with your Superior.

@vbk22398 wrote:
Also I don't know how to find "the things which have the biggest impact on speed." Please suggest what I should do about this.

If you don't know how to do that, how will you know if your register-level code is actually performing "optimally" - or even any better than HAL?

One of the simplest techniques is to toggle an output pin at suitable points in your code, view that on an oscilloscope or logic analyser, and get the timing from there.

Search "Code Profiling" for more advanced techniques; eg,

https://www.st.com/resource/en/user_manual/um2609-stm32cubeide-user-guide-stmicroelectronics.pdf#page=182

https://en.wikipedia.org/wiki/Profiling_(computer_programming)
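The pin-toggle technique mentioned above might look something like this. It is only a sketch: it assumes a spare output pin that has been given the user label PROBE in CubeMX (so that PROBE_Pin and PROBE_GPIO_Port exist in main.h), and the body of control_logic() here is a hypothetical stand-in for the code being timed. The #else branch is only there so the sketch also compiles on a PC.

```c
#ifdef USE_HAL_DRIVER                 /* defined in CubeMX-generated projects */
#include "main.h"                     /* assumption: a pin labelled PROBE in CubeMX */
#define PROBE_HIGH() HAL_GPIO_WritePin(PROBE_GPIO_Port, PROBE_Pin, GPIO_PIN_SET)
#define PROBE_LOW()  HAL_GPIO_WritePin(PROBE_GPIO_Port, PROBE_Pin, GPIO_PIN_RESET)
#else                                 /* host fallback so the sketch compiles off-target */
static int probe_level;               /* current simulated pin level */
static unsigned probe_pulses;         /* number of completed high pulses */
#define PROBE_HIGH() (probe_level = 1)
#define PROBE_LOW()  (probe_level = 0, probe_pulses++)
#endif

/* Hypothetical stand-in for the code being timed. */
static void control_logic(void)
{
    for (volatile int i = 0; i < 1000; i++) { }
}

/* The pulse width seen on the scope is the run time of control_logic()
 * plus the small, constant cost of the two pin writes. */
void timed_control_logic(void)
{
    PROBE_HIGH();
    control_logic();
    PROBE_LOW();
}
```

Measuring the same pulse before and after a change is what tells you whether the change actually made anything faster.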

TDK · ‎2024-10-09

If you have been given direction on what to do, I would try to do that rather than worrying about it being too difficult. It takes time to learn.

HAL is not some bloated beast that needs to be changed. It does what it does well, and is fairly general.

The task to "rewrite HAL but have it consume zero overhead while still being as general as possible" is not a reasonable task or expectation.


unsigned_char_array · ‎2024-10-09

@vbk22398 wrote:
My superior wants me to do it bare metal at register level, but I feel it is overwhelming, as there are lots of registers and bit fields to be concerned about.

Why does your superior want that? Performance reasons? Or because your superior wants all code to be written in house without any third party code?

STM32CubeMX has HAL and LL.
LL stands for Low Level; it basically uses only macros or inline functions to access registers directly, while HAL uses regular functions. In STM32CubeMX you can select per peripheral whether you want to use HAL or LL.
LL is less portable and harder to use. But I would call that bare metal.
My suggestion is to first get your code to work and then one-by-one rewrite the provided functions only if needed.
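To give a feel for what "inline functions that access registers directly" means, here is a simplified stand-in. It is not ST's actual LL source: gpio_regs_t, pin_set() and pin_reset() are hypothetical names, and the two-field struct only mimics the relevant part of a GPIO register block so that the sketch compiles anywhere. The real equivalents would be calls such as LL_GPIO_SetOutputPin() on the LL side and HAL_GPIO_WritePin() on the HAL side.

```c
#include <stdint.h>

/* Simplified stand-in for a GPIO register block; real projects use the
 * GPIO_TypeDef from the device header instead. */
typedef struct {
    volatile uint32_t ODR;   /* output data register */
    volatile uint32_t BSRR;  /* bit set/reset register (write-only on hardware) */
} gpio_regs_t;

/* LL-style: one inline store, no function call, no parameter checks. */
static inline void pin_set(gpio_regs_t *port, uint32_t pin_mask)
{
    port->BSRR = pin_mask;            /* low half-word sets pins */
}

static inline void pin_reset(gpio_regs_t *port, uint32_t pin_mask)
{
    port->BSRR = pin_mask << 16;      /* high half-word resets pins (STM32F4 layout) */
}
```

A wrapper this thin usually compiles down to a single store, which is why swapping one HAL call for its LL counterpart is only worth it where a measurement shows that the call matters.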

@vbk22398 wrote:
Also I don't know how to find "the things which have the biggest impact on speed."

Profiling. Measure the speed. One way is to set an IO pin before calling a function and clear it afterwards. You can use a logic analyzer or an oscilloscope to measure the duration of the function. Using different IO pins for different functions can give you a nice visual overview of the timing. You can also use timers to measure the duration of functions.
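One timer that suits this on a Cortex-M4 is the DWT cycle counter in the core itself, which is a different choice from a peripheral timer. The sketch below assumes a GCC build and an STM32F4 device header; the helper names (cycles_init, cycles_now, cycles_elapsed, cycles_to_us) are hypothetical. The #else branch is only a fallback so the sketch compiles on a PC.

```c
#include <stdint.h>

#if defined(__ARM_ARCH_7EM__)          /* GCC targeting Cortex-M4/M7 */
#include "stm32f4xx.h"                 /* assumption: F4 part; use stm32l4xx.h on the L496 */

static inline void cycles_init(void)
{
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;  /* enable the trace block */
    DWT->CYCCNT = 0;
    DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;             /* start the cycle counter */
}

static inline uint32_t cycles_now(void) { return DWT->CYCCNT; }
#else                                  /* host fallback so the sketch compiles off-target */
static uint32_t fake_cycles;
static inline void cycles_init(void) { fake_cycles = 0; }
static inline uint32_t cycles_now(void) { return fake_cycles; }
#endif

/* Unsigned subtraction gives the right answer even if the 32-bit
 * counter wrapped around once between the two readings. */
static inline uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert a cycle count to microseconds for a given core clock in Hz. */
static inline uint32_t cycles_to_us(uint32_t cycles, uint32_t core_hz)
{
    return (uint32_t)(((uint64_t)cycles * 1000000u) / core_hz);
}
```

Typical use would be to read cycles_now() before and after the function of interest and pass both readings to cycles_elapsed(). At a 168 MHz core clock, 168000 cycles correspond to 1000 microseconds.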
Generally you want to avoid busy-waiting on things like peripherals. Example:
A UART sends "Hello world!" at 9600 baud with 8 data bits, 1 stop bit and no parity. That is 12 characters of 10 bits each (including the start bit), so it should take 12.5 milliseconds. Usually the UART reports that it is done while the last byte is still being sent, so it can report completion a little sooner. Waiting for the UART to finish at the end of the send function makes the function take about 12.5 milliseconds. But you can instead check whether it has finished sending with a separate function, and do other things in the meantime.
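The arithmetic and the non-blocking idea can be sketched as follows. uart_tx_time_us() and send_hello() are hypothetical names, and huart2 assumes that USART2 was configured in CubeMX with its interrupt enabled. The HAL part only compiles inside a CubeMX project.

```c
#include <stdint.h>

/* Time to shift out n_bytes, in microseconds.
 * bits_per_frame = start + data + parity + stop bits (10 for 8N1). */
uint32_t uart_tx_time_us(uint32_t n_bytes, uint32_t bits_per_frame, uint32_t baud)
{
    return (uint32_t)(((uint64_t)n_bytes * bits_per_frame * 1000000u) / baud);
}

#ifdef USE_HAL_DRIVER                 /* defined in CubeMX-generated projects */
#include "main.h"
extern UART_HandleTypeDef huart2;     /* assumption: USART2 set up in CubeMX */
static volatile int tx_done = 1;      /* 1 when no transfer is in progress */

/* Start the transfer and return at once; the UART interrupt sends the rest. */
void send_hello(void)
{
    static uint8_t msg[] = "Hello world!";
    tx_done = 0;
    if (HAL_UART_Transmit_IT(&huart2, msg, sizeof msg - 1) != HAL_OK) {
        tx_done = 1;                  /* transfer did not start */
    }
}

/* HAL calls this from the UART interrupt when the transfer has completed. */
void HAL_UART_TxCpltCallback(UART_HandleTypeDef *huart)
{
    if (huart == &huart2) {
        tx_done = 1;
    }
}
#endif
```

With this shape the main loop can keep running control_logic() and simply check tx_done before starting the next message, instead of sitting in the send function for the whole 12.5 milliseconds.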


Andrew Neil · ‎2024-10-09

@unsigned_char_array wrote:
@vbk22398 wrote:
My superior wants me to do it bare metal at register level
Why does your superior want that?

@vbk22398 or is this a school assignment, and your Superior wants you to learn how to do register-level coding?

@unsigned_char_array wrote:
My suggestion is to first get your code to work and then one-by-one rewrite the provided functions only if needed.

^^^ Absolutely this! ^^^