Associate II
March 11, 2024
Solved

Slow performance of custom external flash loader on STM32H735G-DK.


I made an external flash loader for the STM32H735G-DK. I am able to read, write and erase the flash. However, programming 1 MB of flash takes about 50 seconds, while the ST-provided loader takes around 7 seconds for the same file. Both use Octo-SPI with the same clocks (verified) and DTR (verified). I am testing on the same board with the same ST-LINK, so the hardware is not the limiting factor.

Does anyone have an idea what method the ST loader uses to get such a huge speed improvement?

Best answer by AS1956

Seems to be working well in the loader context. I cannot take credit for this; I saw it in one of the examples from “stm32-external-loader-main”. What surprises me a bit is that the constant used in the delay loop does not influence the speed of any of the processes (erase, program, verify). I tried a few radically different values with no noticeable difference. I guess the bulk of the time is spent in self-timed operations in the memory.

8 replies

LCE
Principal II
March 11, 2024

Maybe you are using some blocking HAL stuff?

I would grab a scope and compare some signals.

Pavel A.
Super User
March 11, 2024

STM provided loader takes around 7 sec for the same. 

Does this include erase of 1 MB?

 

AS1956 (Author)
Associate II
March 11, 2024

14:59:38 : Memory Programming ...
14:59:38 : Opening and parsing file: testbinary1M.bin
14:59:38 : File : testbinary1M.bin
14:59:38 : Size : 1.13 MB
14:59:38 : Address : 0x90000000
14:59:38 : Erasing memory corresponding to segment 0:
14:59:38 : Erasing external memory sectors [0 18]
14:59:43 : Download in Progress:
14:59:46 : File download complete
14:59:46 : Time elapsed during download operation: 00:00:07.732

 

 

Yes, the ~7 seconds includes the erase, as the log above shows.

Tesla DeLorean
Guru
March 11, 2024

Hard to know..

ST uses 64KB sector erase, and 1 KB pages.

If ST's loader is smaller in RAM, there is more room for payload data. Perhaps it also repeats initialization / reset less often?

Some comparative logs at Verbose Level 3 might be informative.
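As a rough cross-check of the 64 KB sector size against the log above, here is a small sketch that computes which sectors a download touches. The byte count used in the usage note is an assumption (the log only shows "1.13 MB"), and the helper names are hypothetical.

```c
#include <stdint.h>

#define SECTOR_SIZE 0x10000u  /* 64 KB erase sector, as mentioned above */

/* Index of the first sector touched by a region starting at `offset`. */
static uint32_t first_sector(uint32_t offset)
{
    return offset / SECTOR_SIZE;
}

/* Index of the last sector touched; `size` must be non-zero. */
static uint32_t last_sector(uint32_t offset, uint32_t size)
{
    return (offset + size - 1u) / SECTOR_SIZE;
}
```

With an assumed file size of 1184890 bytes starting at offset 0, this gives sectors 0 through 18, which is consistent with the "Erasing external memory sectors [0 18]" line.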

AS1956 (Author)
Associate II
March 11, 2024

Logs included, if you wish to take a look.

Thanks

AS1956 (Author)
Associate II
March 11, 2024

too fast

AS1956 (Author)
Associate II
March 11, 2024

As per the MX25LM51245G data sheet, the programming page is 256 bytes. Are you saying that ST somehow uses 1 KB?

Tesla DeLorean
Guru
March 11, 2024

Actually it reports taking multiples of 0x1000 / 4 KB (16 x 256) per Write() operation, so there are some operational efficiencies there; it decomposes to 256-byte pages internally.
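The decomposition described here could look roughly like the sketch below: one large write is split into chunks that never cross a 256-byte page boundary. This is not ST's code; `write_split`, `page_program_fn` and the counting stand-in `count_only` are hypothetical names, and the real page program would issue the Octo-SPI command.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 256u  /* program page size quoted from the data sheet */

/* Hypothetical hook that programs at most one page; returns 0 on success. */
typedef int (*page_program_fn)(uint32_t addr, const uint8_t *data, size_t len);

/* Split one large write into chunks that never cross a page boundary. */
static int write_split(uint32_t addr, const uint8_t *data, size_t len,
                       page_program_fn program)
{
    while (len > 0) {
        size_t room = PAGE_SIZE - (addr % PAGE_SIZE); /* bytes left in page */
        size_t chunk = (len < room) ? len : room;
        int rc = program(addr, data, chunk);
        if (rc != 0) {
            return rc;  /* stop at the first failing page */
        }
        addr += (uint32_t)chunk;
        data += chunk;
        len -= chunk;
    }
    return 0;
}

/* Host-side stand-in that only counts calls, for trying the split logic. */
static unsigned g_calls;
static int count_only(uint32_t addr, const uint8_t *data, size_t len)
{
    (void)addr; (void)data; (void)len;
    g_calls++;
    return 0;
}
```

A page-aligned 0x1000-byte write results in 16 page programs, matching the "16 x 256" figure; an unaligned write costs one extra call because the first chunk only fills the remainder of its page.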

LCE
Principal II
March 12, 2024

You haven't told us yet if you are using HAL functions for your own programming.

If yes, go through these and check for while() and HAL_Delay().

And check if you are actually using the flash in octal mode. The speed difference is close to a factor of 8, so maybe you are using it in single bit/IO SPI mode.
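One way to avoid fixed delays in a loader is to poll the flash status register until the part reports idle, with an upper bound on the number of reads. The sketch below is only an illustration of that idea: `wait_ready`, `read_status_fn` and `fake_status` are hypothetical, and the write-in-progress bit position is an assumption that should be checked against the flash data sheet.

```c
#include <stdint.h>

#define STATUS_WIP 0x01u  /* assumed write-in-progress bit; verify in data sheet */

/* Hypothetical hook that reads the flash status register. */
typedef uint8_t (*read_status_fn)(void);

/* Poll until the flash reports idle, or give up after max_polls reads.
   Returns 0 when idle, -1 if the bound was reached. */
static int wait_ready(read_status_fn read_status, uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if ((read_status() & STATUS_WIP) == 0u) {
            return 0;
        }
    }
    return -1;
}

/* Host-side fake: reports busy for a set number of reads, then idle. */
static uint32_t fake_busy_reads;
static uint8_t fake_status(void)
{
    if (fake_busy_reads > 0u) {
        fake_busy_reads--;
        return STATUS_WIP;
    }
    return 0u;
}
```

With this pattern the wait ends as soon as the memory finishes its self-timed operation, so the loop bound only matters when something goes wrong.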

AS1956 (Author)
Associate II
March 12, 2024

I got 1 MB programming time close to the ST loader: ~9.5 sec vs ~7 sec with the ST loader. It is good enough for now. You guys were right, it was a HAL issue.

Thank you to all who commented.

LCE
Principal II
March 12, 2024

@AS1956  We are a curious bunch here, so could you please give us a hint about what the actual problem was and how you solved it? ;)

Thanks!

AS1956 (Author)
Associate II
March 12, 2024

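Overrode __weak void HAL_Delay(uint32_t Delay) with:

void HAL_Delay(uint32_t Delay){
  (void)Delay;          /* requested delay is ignored; the loop length is fixed */
  volatile int i=0;     /* volatile keeps the compiler from removing the empty loop */
  for (i=0; i<0x1000; i++);
}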

Works well.

LCE
Principal II
March 12, 2024

Wow, that's ... interesting.

HAL_Delay() is used in many, many HAL functions, so you might "break" other HAL stuff with this modification.

 

Pavel A.
Super User
March 12, 2024

"External loaders" cannot use interrupts.
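If interrupts are not available, a tick that is normally incremented from the SysTick handler will not advance, so tick-based timeouts need another time source. One possible workaround, shown here purely as an assumption and not as ST's method, is a tick that advances each time it is read. The function below uses the same signature as the weak HAL_GetTick(); `wait_with_timeout`, `never_done` and `always_done` are hypothetical helpers for trying it on a host.

```c
#include <stdint.h>

static uint32_t polled_tick;  /* advanced by calls, not by an interrupt */

/* Same signature as the weak HAL_GetTick(); each call advances "time" by one. */
uint32_t HAL_GetTick(void)
{
    return polled_tick++;
}

/* Timeout loop in the usual tick-difference style. */
static int wait_with_timeout(int (*done)(void), uint32_t timeout)
{
    uint32_t start = HAL_GetTick();
    while (!done()) {
        if ((HAL_GetTick() - start) > timeout) {
            return -1;  /* gave up */
        }
    }
    return 0;
}

/* Host-side stand-ins for a condition that never or always completes. */
static int never_done(void)  { return 0; }
static int always_done(void) { return 1; }
```

The "ticks" here no longer correspond to milliseconds, so any timeout value becomes a poll count and not a real duration.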