Slow performance of custom external flash loader on STM32H735G-DK.

AS1956
Associate II

I made an external flash loader for the STM32H735G-DK. I'm able to read, write and erase flash. However, programming 1M of flash takes ~50 seconds. The ST-provided loader takes around 7 seconds for the same. Both use Octo-SPI, with the same clocks (verified) and DTR (verified). I am testing on the same board with the same ST-LINK, so hardware is not the limiting factor.

Would anyone have an idea what method the ST loader uses to get such a huge speed improvement?

17 REPLIES

I'd suspect you have some HAL_Delay() somewhere or something adding delay. Most of the erase/write is paced by the part, and shouldn't differ between implementations.

Perhaps instrument the code, or use a TIM counter for microsecond delays.
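One way to act on the instrumentation suggestion is to timestamp each loader step with a free-running counter. The following is a minimal sketch, assuming a 32-bit up-counter read through a hypothetical `read_counter` callback (on the target this could be a timer's CNT register or the Cortex-M7 DWT cycle counter). The names `elapsed_ticks`, `ticks_to_us` and `time_step_us` are illustrative and do not come from the thread.

```c
#include <stdint.h>

/* Hypothetical callback returning a free-running 32-bit up-counter,
 * e.g. a timer's CNT register or the DWT cycle counter on the target. */
typedef uint32_t (*counter_read_fn)(void);

/* Unsigned subtraction gives the right answer across a single wraparound. */
uint32_t elapsed_ticks(uint32_t start, uint32_t now)
{
    return now - start;
}

/* Convert ticks to microseconds for a counter clocked at counter_hz. */
uint32_t ticks_to_us(uint32_t ticks, uint32_t counter_hz)
{
    return (uint32_t)(((uint64_t)ticks * 1000000u) / counter_hz);
}

/* Time one step (erase, program, ...) and return its duration in microseconds. */
uint32_t time_step_us(counter_read_fn read_counter, uint32_t counter_hz,
                      void (*step)(void))
{
    uint32_t start = read_counter();
    step();
    return ticks_to_us(elapsed_ticks(start, read_counter()), counter_hz);
}
```

Comparing the per-step durations of the custom loader against the expected erase and program times from the memory's datasheet should show which step carries the extra delay.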

LCE
Principal

You haven't told us yet if you are using HAL functions for your own programming.

If yes, go through these and check for while() and HAL_Delay().

And check if you are actually using the flash in octal mode. The speed difference is close to a factor of 8, so maybe you are using it in single bit/IO SPI mode.
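The factor-of-8 estimate can be checked against the numbers in the first post. This is a rough sketch, assuming "1M" means 1 MiB (1,048,576 bytes); the helper name is made up for illustration.

```c
#include <stdint.h>

#define IMAGE_BYTES (1024u * 1024u)  /* assumption: "1M" = 1 MiB */

/* Average programming throughput in bytes per second (integer division). */
uint32_t throughput_bytes_per_s(uint32_t bytes, uint32_t seconds)
{
    return bytes / seconds;
}
```

With these assumptions the custom loader averages about 21 kB/s (50 s) and the ST loader about 150 kB/s (7 s), a ratio of roughly 7.1.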

AS1956
Associate II

Got 1M programming close to the ST loader: ~9.5 sec vs ~7 sec with the ST loader. It is good enough for now. You guys were right, it was a HAL issue.

Thanks to everyone who commented.

LCE
Principal

@AS1956 We're a curious bunch here, so could you please give us a hint about what the actual problem was and how you solved it?

Thanks!

AS1956
Associate II

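Overrode __weak void HAL_Delay(uint32_t Delay) with:

void HAL_Delay(uint32_t Delay){
  (void)Delay;        /* requested duration is ignored */
  volatile int i=0;   /* volatile so the empty loop is not optimized away */
  for (i=0; i<0x1000; i++);
}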

Works well.
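A delay override like this is often paired with overrides of the other weak HAL tick functions, so that nothing in the loader waits on a SysTick interrupt. The following is a minimal sketch, assuming the stock weak `HAL_InitTick()`, `HAL_GetTick()` and `HAL_Delay()` from the STM32 HAL. The `HAL_StatusTypeDef` stand-in exists only so the fragment compiles by itself (in a real loader it comes from the HAL headers), and `fake_tick` and `DELAY_LOOPS` are hypothetical names.

```c
#include <stdint.h>

/* Stand-in for the HAL type; in a real loader include the HAL header instead. */
typedef enum { HAL_OK = 0x00U, HAL_ERROR = 0x01U } HAL_StatusTypeDef;

/* Assumed loop count, copied from the post above; not calibrated to real time. */
#define DELAY_LOOPS 0x1000u

static uint32_t fake_tick;  /* software tick, advanced on every read */

/* Skip SysTick setup: no interrupt-driven tick is started. */
HAL_StatusTypeDef HAL_InitTick(uint32_t TickPriority)
{
    (void)TickPriority;
    return HAL_OK;
}

/* Return a value that always moves forward so HAL timeout loops can still expire. */
uint32_t HAL_GetTick(void)
{
    return ++fake_tick;
}

/* Short fixed busy-wait; the requested duration is ignored. */
void HAL_Delay(uint32_t Delay)
{
    (void)Delay;
    for (volatile uint32_t i = 0; i < DELAY_LOOPS; i++) {
        /* volatile keeps the compiler from removing the empty loop */
    }
}
```

With this approach HAL timeouts count calls to `HAL_GetTick()` rather than milliseconds, so timeout values no longer correspond to real time.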

LCE
Principal

Wow, that's ... interesting.

HAL_Delay() is used in many, many HAL functions, so you might "break" other HAL stuff with this modification.

 

Pavel A.
Evangelist III

"External loaders" cannot use interrupts.

 

AS1956
Associate II

Seems to be working well in the loader context. I cannot take credit for this; I saw it in one of the examples from “stm32-external-loader-main”. What surprises me a bit is that the constant used in the delay loop does not influence the speed of any of the processes (erase, program, verify). I tried a few radically different values with no noticeable difference. I guess the bulk of the time is taken by self-timed operations in the memory.
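That the delay constant barely matters would be consistent with a loader that polls the memory's status until each self-timed operation finishes. The following is a minimal sketch of such a poll loop, assuming a hypothetical `read_status` callback that returns the flash status register, with a write-in-progress flag in bit 0 (check the memory's datasheet). None of these names come from the thread, and `sim_read_status` is only a host-side stand-in for a real status read.

```c
#include <stdbool.h>
#include <stdint.h>

#define STATUS_WIP_BIT 0x01u  /* assumption: bit 0 = write in progress */

typedef uint8_t (*status_read_fn)(void);

/* Poll until the memory reports idle or max_polls reads have been made.
 * Returns true if the operation finished, false on timeout. */
bool wait_while_busy(status_read_fn read_status, uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if ((read_status() & STATUS_WIP_BIT) == 0u) {
            return true;
        }
    }
    return false;
}

/* Host-side stand-in: reports busy for the next sim_busy_reads reads. */
uint32_t sim_busy_reads;

uint8_t sim_read_status(void)
{
    if (sim_busy_reads > 0u) {
        sim_busy_reads--;
        return STATUS_WIP_BIT;
    }
    return 0u;
}
```

In a loop like this the wait ends as soon as the memory clears its busy flag, so the total time is set by the part rather than by any fixed delay between polls.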