Reading external SPI flash with an external flash loader and the CLI is slow

Marco.R · ‎2020-04-20

Hello

I implemented an external flash loader for an STM32F767 with an external SPI flash. The flash size is 256 Mbit. I want to read out the whole flash from a *.bat file. I tried this line:

STM32_Programmer_CLI.exe -c port=SWD reset=HWrst freq=8000 -el "PathToFlashloader.stldr" -r 0x90000000 500000 "readfile.bin"

It works properly, but only up to a size of about 500 KBytes. (Is this limited by the ST-Link V3 or by the STM32CubeProgrammer CLI?)

What I do now is execute the same command (see above) several times with different addresses. In the end I get the whole memory, but in separate files, and it takes a long time because reconnecting to the device on every call is slow.
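The multi-call workaround described above can be sketched as a small Python helper. It only plans the reads and merges the resulting files. The base address, the 500000-byte read size, and the CLI arguments are copied from the command in this post; the `part_NNN.bin` file names and the helper names are hypothetical, and the total size assumes 256 Mbit = 33554432 bytes.

```python
import shutil

BASE_ADDRESS = 0x90000000               # start address used in the post
FLASH_SIZE = 256 * 1024 * 1024 // 8     # 256 Mbit expressed in bytes
CHUNK_SIZE = 500000                     # size of one read, as in the post


def chunk_plan(base=BASE_ADDRESS, total=FLASH_SIZE, chunk=CHUNK_SIZE):
    """Return (address, size, filename) for each read needed to cover the flash."""
    plan = []
    offset = 0
    while offset < total:
        size = min(chunk, total - offset)   # the last read may be shorter
        plan.append((base + offset, size, f"part_{len(plan):03d}.bin"))
        offset += size
    return plan


def read_command(address, size, filename, loader="PathToFlashloader.stldr"):
    """Build the argument list for one CLI call, mirroring the command in the post."""
    return [
        "STM32_Programmer_CLI.exe",
        "-c", "port=SWD", "reset=HWrst", "freq=8000",
        "-el", loader,
        "-r", f"0x{address:08X}", str(size), filename,
    ]


def merge_parts(filenames, output):
    """Concatenate the partial dumps, in order, into one image file."""
    with open(output, "wb") as out:
        for name in filenames:
            with open(name, "rb") as part:
                shutil.copyfileobj(part, out)
```

Each argument list could be passed to `subprocess.run`, and `merge_parts` then yields a single file. This only solves the "separate files" part; every call still pays the reconnect time.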

Is it possible to get the whole memory in one file?
Is it possible to chain several reads in one invocation, like this?

STM32_Programmer_CLI.exe -c port=SWD reset=HWrst freq=8000 -el "PathToFlashloader.stldr" -r 0x90000000 500000 "readfile.bin" -r 0x9007A120 500000 "readfile2.bin"

I tried this, but it doesn't work properly.

Are there any other hints for optimizing the script? Primarily I need to read the memory. Writing and erasing are not important.

Thanks

\Marco

Kraal · ‎2020-04-21

Hello,

I would like to have an answer from ST as well regarding this size limit.

Using the GUI, the maximum size you can read at once is 1M bytes, so it is strange that it is 500k with the CLI...

Also, regarding the read time: with the ST-Link/V2 I can't get below 8 s for 1M. Can you tell me how long it takes you to read 500k?

Marco.R · ‎2020-04-21

Hi @Kraal

I'll do some more tests on the maximum size. Maybe 1 MB is also possible, but after 500k I get wrong values. I'm not sure whether this comes from the ST-Link or from my implementation.

Reading 1M only takes about 2 s or less. I think it is much faster than the ST-Link V2, but I have never tested with an ST-Link V2, so I can't compare the two debuggers.

Do you use an external loader with the CLI? Do you have any hints?

Kraal · ‎2020-04-21

Interesting that it takes only 2 s with the V3. As I understand it, the V3 uses a high-speed USB interface (480 Mbit/s) while the V2 does not, so that could explain the difference.

I developed my own external loader based on the ST example, using either standard SPI+DMA or QSPI+DMA. For both, the minimum time for 1M was around 8 s, although that was with the GUI.

If I have the time I will test with the CLI, and maybe also with my logic analyzer.

Andreas Bolsch · ‎2020-04-21

The limiting factor here is probably SWD clock rather than USB speed. STLink-V2 is limited to 4 MHz, whereas V3 allows up to 24 MHz.

With V2 I got up to about 150 kBytes/s, with V3 close to 1 MBytes/s. That matches the SWD clock ratio fairly well.
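As a quick check of that comparison, the figures from this post can be put side by side. This is only arithmetic on the quoted numbers; "close to 1 MBytes/s" is taken as 1000 kBytes/s, which is an assumption, and per-chunk overhead is ignored.

```python
# Figures quoted in the post above; the V3 throughput is rounded to 1000 kB/s.
SWD_CLOCK_MHZ = {"V2": 4, "V3": 24}
THROUGHPUT_KBYTES_S = {"V2": 150, "V3": 1000}


def v3_over_v2(table):
    """Ratio of the V3 figure to the V2 figure."""
    return table["V3"] / table["V2"]


def seconds_to_read(size_bytes, kbytes_per_s):
    """Transfer time at a sustained rate, ignoring any per-chunk overhead."""
    return size_bytes / (kbytes_per_s * 1000)


clock_ratio = v3_over_v2(SWD_CLOCK_MHZ)
throughput_ratio = v3_over_v2(THROUGHPUT_KBYTES_S)
one_mib = 1024 * 1024

print(f"SWD clock ratio:  {clock_ratio:.2f}")
print(f"throughput ratio: {throughput_ratio:.2f}")
for probe, rate in THROUGHPUT_KBYTES_S.items():
    print(f"{probe}: 1 MiB in about {seconds_to_read(one_mib, rate):.1f} s")
```

The clock ratio is 6 and the throughput ratio is a little under 7. At these rates, 1 MiB would take roughly 7 s on the V2 and about 1 s on the V3, which is in the same range as the 8 s and "2 s or less" reported earlier in the thread.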

There was even little difference between flash read and flash programming (for the V3 with dual flash, so that the actual page programming time is almost completely absorbed by the transfer time).

Kraal · ‎2020-04-22

Hi @Andreas Bolsch ,

I would agree regarding the SWD clock, but in @Marco.R's case the SWD clock is set to 8 MHz, so twice mine.

In any case I will test my setup further; maybe I can improve something.

@Khouloud GARSI, can we get an official answer from ST regarding the maximum size that we can read with the CubeProgrammer? Why exactly do we have this limitation, and is it something that can be changed in a future revision of the tool?

Marco.R · ‎2020-04-22

I have run some new tests and I have to correct my statement.

It is possible to read out the whole flash with one line:

STM32_Programmer_CLI.exe -c port=SWD reset=HWrst freq=24000 -el "PathToSTLDR" -vb 1 -r 0xD0000000 10 "dummy.bin" -r 0xD0000000 33554432 "flash.bin"

There is no size limit. The programmer splits the read into 500 kB chunks.

I measured with the logic analyzer. This is the result:

Most of the time is lost between two 500 kB chunks (>1.2 s). The transfer time of one 500 kB chunk is less than 0.6 s. For 32 MB the transfer took about 2 min.

@Khouloud GARSI: Is it possible to reduce the time between the 500 kB chunks?
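These measurements add up to the observed total. A minimal sketch of the arithmetic, assuming "500kB" means 500000 bytes and using the quoted bounds (0.6 s per chunk, 1.2 s per gap) as if they were exact values:

```python
import math

TOTAL_BYTES = 33554432   # read size from the command in the post (32 MiB)
CHUNK_BYTES = 500000     # assumption: "500kB" taken as 500000 bytes
TRANSFER_S = 0.6         # per chunk, upper bound quoted in the post
GAP_S = 1.2              # between chunks, lower bound quoted in the post


def estimate_read_time(total, chunk, transfer_s, gap_s):
    """Return (chunks, transfer seconds, gap seconds, total seconds)."""
    chunks = math.ceil(total / chunk)
    transfer = chunks * transfer_s      # slight overestimate: last chunk is partial
    gaps = (chunks - 1) * gap_s         # one gap between each pair of chunks
    return chunks, transfer, gaps, transfer + gaps


chunks, transfer, gaps, total = estimate_read_time(
    TOTAL_BYTES, CHUNK_BYTES, TRANSFER_S, GAP_S
)
print(f"{chunks} chunks: {transfer:.1f} s transfer + {gaps:.1f} s gaps = {total:.1f} s")
print(f"share of time spent in gaps: {gaps / total:.0%}")
```

With these inputs the estimate is 68 chunks and about 121 s, in line with the roughly 2 min measured, and about two thirds of that time is spent between chunks rather than transferring data.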

Thanks

\Marco

Kraal · ‎2020-04-25

I also tested the CLI with a Nucleo-L432KC and a QuadSPI flash memory.

@Marco.R, I was wondering why you had to do a dummy read beforehand, but the CLI would not do a full read without it (it would stop after the first chunk). @Khouloud GARSI, is this a known issue?

In my case it takes 2 min 25 s to read 16M in one go. The transfer is divided into chunks of 43520 bytes (my device has only 64k of SRAM), with a delay of approx. 300 ms in between.

I believe the difference in delays is due to the size of the chunks and the time it takes the ST-Link to retrieve them.

On a side note, I see that Init of the flash loader is always called, even in between chunks. Is that really necessary?
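Working backwards from the figures quoted above gives a rough split between waiting and transferring. This is a sketch only: "16M" is assumed to mean 16 MiB, the 300 ms delay is treated as exact, and everything that is not a gap is counted as transfer time.

```python
import math

TOTAL_BYTES = 16 * 1024 * 1024   # assumption: "16M" taken as 16 MiB
CHUNK_BYTES = 43520              # chunk size quoted in the post
GAP_S = 0.3                      # approximate delay between chunks
TOTAL_S = 2 * 60 + 25            # measured total: 2 min 25 s


def split_measured_time(total_bytes, chunk_bytes, gap_s, total_s):
    """Split a measured total into gap time and remaining transfer time."""
    chunks = math.ceil(total_bytes / chunk_bytes)
    gap_total = (chunks - 1) * gap_s
    transfer_total = total_s - gap_total
    return {
        "chunks": chunks,
        "gap_total_s": gap_total,
        "transfer_total_s": transfer_total,
        "per_chunk_s": transfer_total / chunks,
        "bytes_per_s": total_bytes / transfer_total,
    }


result = split_measured_time(TOTAL_BYTES, CHUNK_BYTES, GAP_S, TOTAL_S)
for key, value in result.items():
    print(f"{key}: {value:.3f}" if isinstance(value, float) else f"{key}: {value}")
```

Under these assumptions there are 386 chunks, the gaps account for about 115 s of the 145 s, and the remaining time corresponds to a raw transfer rate in the region of 570 kBytes/s.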

Thanks

Kraal

Tesla DeLorean · ‎2020-04-25

This could all be done more efficiently, but it requires a lot of effort.

The chunk size is a function of the SRAM size, and how much is consumed by your loader.

The Init() calls likely relate to switching in/out of mapped mode. You might want to have your code determine if the clock and memory are already up, and expedite the startup.


Kraal · ‎2020-04-25

Hi @Community member

Thank you for the insights.

Regarding the init() calls, they are not really time-consuming compared to the other tasks, so I can live with them. However, I believe the logic is flawed here: it is CubeProg that splits the amount of memory I want to read into smaller chunks, so it should know that there won't be any other kind of operation in between. My knowledge is not deep enough, so I might be missing something here.

Regarding the memory-mapped mode, I think a better implementation should be made. I understand that not all memories and microcontrollers support memory-mapped mode, but maybe a field could be added to dev_inf.c/.h to indicate that it is enabled. The read operation could then be done by the debugger directly, without a limited RAM buffer in between.

Best regards,

Kraal