2024-03-11 10:15 AM
I made an external flash loader for the STM32H735G-DK. I'm able to read, write and erase flash. However, programming 1M of flash takes ~50 seconds, while the ST-provided loader takes around 7 seconds for the same amount. Both use Octo-SPI with the same clocks (verified) and DTR (verified). I am testing on the same board with the same ST-LINK, so hardware is not the limiting factor.
Would anyone have an idea what method the ST loader uses to get such a large speed improvement?
2024-03-11 03:42 PM
I'd suspect you have some HAL_Delay() somewhere or something adding delay. Most of the erase/write is paced by the part, and shouldn't differ between implementations.
Perhaps instrument the code to see where the time goes, or use a TIM counter for microsecond delays.
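A minimal sketch of the counter-based delay idea, assuming a free-running 32-bit counter is available. The names read_counter, sim_counter and delay_ticks are hypothetical and not part of HAL. The counter is simulated here so the logic can be checked on a host; on target, read_counter would return a timer's count register instead. The busy-wait needs no interrupts, and the unsigned subtraction keeps it correct across counter wraparound.

```c
#include <stdint.h>

/* Simulated free-running counter (assumption for host testing).
 * On target this would be replaced by a read of a timer count register. */
static uint32_t sim_counter;

/* Hypothetical tick source: every read advances the simulated counter by 3. */
static uint32_t read_counter(void)
{
    sim_counter += 3;
    return sim_counter;
}

/* Busy-wait until at least `ticks` counter ticks have elapsed.
 * Unsigned subtraction gives the right elapsed value even if the
 * counter wraps past 0xFFFFFFFF during the wait.
 * Returns the elapsed tick count that was actually observed. */
static uint32_t delay_ticks(uint32_t ticks)
{
    uint32_t start = read_counter();
    uint32_t elapsed;

    do {
        elapsed = read_counter() - start;
    } while (elapsed < ticks);

    return elapsed;
}
```

The returned value can be logged while instrumenting, to see how long a wait really took compared to what was requested.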
2024-03-11 11:15 PM
You haven't told us yet if you are using HAL functions for your own programming.
If yes, go through these and check for while() and HAL_Delay().
And check if you are actually using the flash in octal mode. The speed difference is close to a factor of 8, so maybe you are using it in single bit/IO SPI mode.
2024-03-12 08:50 AM
Got 1M programming close to the ST loader: ~9.5 sec vs ~7 sec. It is good enough for now. You guys were right, it was a HAL issue.
Thank you all that commented.
2024-03-12 08:56 AM
@AS1956 We are some curious folks here, so could you please give us a hint what the actual problem was and how you solved it? ;)
Thanks!
2024-03-12 09:49 AM
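Overrode the __weak void HAL_Delay(uint32_t Delay) with:
void HAL_Delay(uint32_t Delay){
  (void)Delay;     /* requested delay is ignored; the loop length is fixed */
  volatile int i;  /* volatile so the compiler cannot remove the empty loop */
  for (i=0; i<0x1000; i++);
}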
Works well.
2024-03-12 09:57 AM
Wow, that's ... interesting.
HAL_Delay() is used in many, many HAL functions, so you might "break" other HAL stuff with this modification.
2024-03-12 10:41 AM
"External loaders" cannot use interrupts.
2024-03-12 11:03 AM
Seems to be working well in the loader context. I cannot take credit for this; I saw it in one of the examples from “stm32-external-loader-main”. What surprises me a bit is that the constant used in the delay does not influence the speed of any of the processes (erase, program, verify). I tried a few radically different values with no noticeable difference. I guess the bulk of the time is spent in self-timed operations in the memory.
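For self-timed operations, a common pattern is to poll the memory's busy flag rather than wait a fixed time. Below is a rough sketch of that pattern, not the code from the loader discussed here. The names read_status, sim_busy_polls and wait_until_ready are hypothetical, and the write-in-progress flag being bit 0 of the status register is an assumption to check against the flash datasheet. The status read is simulated so the logic can be run on a host; on target it would be a status-register read over OSPI.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed: write-in-progress flag is bit 0 of the status register. */
#define STATUS_WIP 0x01u

/* Simulation only: number of polls for which the "memory" stays busy. */
static uint32_t sim_busy_polls;

/* Hypothetical status read; reports busy until sim_busy_polls runs out. */
static uint8_t read_status(void)
{
    if (sim_busy_polls > 0) {
        sim_busy_polls--;
        return STATUS_WIP;
    }
    return 0;
}

/* Poll until the busy flag clears or max_polls reads have been made.
 * Returns true if the memory became ready, false on timeout. */
static bool wait_until_ready(uint32_t max_polls)
{
    while (max_polls-- > 0) {
        if ((read_status() & STATUS_WIP) == 0) {
            return true;
        }
    }
    return false;
}
```

With this structure the wait ends as soon as the memory reports ready, and the poll limit only acts as a safety timeout.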