STLINK-V3MINIE - Feature: Pipelined Data Transfers

Question

@ ST engineers: This is a feature request for the firmware of the STLINK-V3MINIE and related debuggers/programmers: pipelined data transfers over the existing USB High-Speed link.

While trying to improve the STLINK-V3MINIE's performance for large continuous reads from MCU memory, I discovered there is huge room for improvement: SWD's full potential (~2038 KB/s = 24M [cycles/s] * 4 [bytes/transfer] / 46 [cycles/transfer], with 1 KB = 1024 bytes) is far from being reached. (Note that USB shouldn't be the bottleneck, judging from the maximum clock speed that the USB IP of the STLINK-V3MINIE's STM32F723 can support.) I only achieve ~549 KB/s when reading from an STM32H7S3 MCU with ST's programming SW (v2.20.0) through an STLINK-V3MINIE (FW V3J17M10):
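The peak figure quoted above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in the post: the function name and parameter names are made up for illustration, and the 46 cycles per 32-bit transfer is the figure assumed in the post, not something taken from ST documentation.

```python
def swd_peak_throughput_kib_s(swclk_hz: float,
                              cycles_per_transfer: int = 46,
                              bytes_per_transfer: int = 4) -> float:
    """Peak payload rate in KiB/s if transfers run back to back with no gaps."""
    bytes_per_second = swclk_hz * bytes_per_transfer / cycles_per_transfer
    return bytes_per_second / 1024


# 24 MHz SWCLK, as requested with freq=24000 on the command line
peak = swd_peak_throughput_kib_s(24e6)
print(f"theoretical SWD peak: {peak:.0f} KiB/s")
```

The result is an upper bound only: it ignores WAIT responses, gaps between transfers, and any per-request setup in the probe.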

STM32_Programmer_CLI --connect port=SWD freq=24000 ap=1 mode=HOTPLUG -halt --read 0x24000000 0x60000 /tmp/dmp.bin

A simplified sequence diagram of the communication used here is shown:

I measured the SWD clock line with an oscilloscope and can confirm that 24 MHz is reached, so SWD should only take ~491 us (= 1 KB / ~2038 [KB/s]).
But when looking at the USB timings, the time from the JTAG_READMEM_32BIT command to the arrival of the 1024 bytes of data is ~1642 us (of which only ~40 us is spent sending the command). So it seems like most of the time (~1111 us = ~1642 us - ~40 us - ~491 us) is not usefully spent, considering the USB data rate is much faster than the SWD data rate.
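The time budget for one 1024-byte read can be written out as a small calculation. This is a sketch using only the numbers measured in the post; the helper name `unexplained_time_us` is hypothetical, and the ideal SWD time assumes the ~2038 KB/s theoretical peak.

```python
SWD_PEAK_KIB_S = 2038.0  # theoretical peak from the post (1 KB = 1024 bytes)


def unexplained_time_us(total_us: float, command_us: float,
                        payload_bytes: int,
                        peak_kib_s: float = SWD_PEAK_KIB_S) -> float:
    """Time left after subtracting the command phase and the ideal SWD time."""
    ideal_swd_us = payload_bytes / 1024 / peak_kib_s * 1e6
    return total_us - command_us - ideal_swd_us


# Measured: ~1642 us from command to data, ~40 us of it for the command itself
idle = unexplained_time_us(total_us=1642, command_us=40, payload_bytes=1024)
print(f"time not accounted for: ~{idle:.0f} us")
```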

In order to improve performance, I tried dividing the buffer read into bigger chunks (6144 instead of 1024 bytes) and leaving out the status queries:

[Sequence diagram: Read memory region; use bigger chunks, leave out status]

and this did improve performance a bit (~633 instead of ~549 KB/s), but there still seems to be a lot of time uselessly spent: the time from JTAG_READMEM_32BIT command to 6144-byte data arrival is ~9340 us (of which only ~40 us for the sending of the command), which is again much higher than the time SWD communication should take: ~2944 us (= 6144 bytes / ~2038 [KB/s]).
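One way to compare the two chunk sizes is to look at which fraction of each request the SWD link would be busy if it ran at its theoretical peak. The sketch below uses only the timings reported above; `swd_utilization` is a name chosen for illustration.

```python
def swd_utilization(payload_bytes: int, measured_us: float,
                    peak_kib_s: float = 2038.0) -> float:
    """Ideal SWD time divided by the measured command-to-data time."""
    ideal_swd_us = payload_bytes / 1024 / peak_kib_s * 1e6
    return ideal_swd_us / measured_us


small = swd_utilization(1024, 1642)   # 1024-byte chunks, ~1642 us each
large = swd_utilization(6144, 9340)   # 6144-byte chunks, ~9340 us each
print(f"1024-byte chunks: {small:.1%} of the time explained by ideal SWD")
print(f"6144-byte chunks: {large:.1%} of the time explained by ideal SWD")
```

Both values come out at roughly 30%, which suggests that most of the extra time scales with the payload size and is not a fixed per-request cost.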

Therefore I wanted to go one step further and enable pipelined transfers: queue multiple concurrent read requests such that all the time that was previously spent on things other than SWD communication (i.e., preparation, USB communication, and finalization) can overlap with SWD communication. I figured two concurrent read requests would be a good starting point. Here is what the communication would look like:

When I tested this, reading the data timed out after I had sent two commands. This is likely because the STLINK-V3MINIE's firmware doesn't support two concurrent read requests. It seems like the firmware is protected with an authentication header, so I wasn't able to easily patch the firmware to add this feature. Therefore I would like to ask ST whether they can implement such a feature.
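The expected benefit of two requests in flight can be estimated with a simple two-stage pipeline model. This is a rough sketch, not a description of how the firmware works: it assumes each chunk consists of an SWD stage and an "everything else" stage that can fully overlap, and it splits the measured ~9340 us per 6144-byte chunk using the theoretical ~2944 us SWD time. Both the split and the function names are assumptions.

```python
def sequential_us(n_chunks: int, swd_us: float, other_us: float) -> float:
    """One request at a time: every chunk pays for both stages."""
    return n_chunks * (swd_us + other_us)


def pipelined_us(n_chunks: int, swd_us: float, other_us: float) -> float:
    """Two requests in flight: after the first chunk fills the pipeline,
    each further chunk costs only the slower of the two stages."""
    return swd_us + other_us + (n_chunks - 1) * max(swd_us, other_us)


REGION_BYTES = 0x60000             # size read in the CLI example
CHUNK_BYTES = 6144
n = REGION_BYTES // CHUNK_BYTES    # number of chunk requests

swd = 2944.0                       # assumed: ideal SWD time per chunk
other = 9340.0 - swd               # assumed: remaining measured time per chunk

seq = sequential_us(n, swd, other)
pipe = pipelined_us(n, swd, other)
print(f"{n} chunks: sequential {seq / 1000:.0f} ms, "
      f"pipelined {pipe / 1000:.0f} ms, speed-up {seq / pipe:.2f}x")
```

The estimate depends entirely on how the per-chunk time is divided between the two stages. If the SWD stage in practice takes most of the measured time, the slower stage dominates and the gain shrinks to roughly the duration of the shorter stage per chunk.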

Thanks a lot!

S C · Accepted Answer

Hello,

Thank you for your effort and suggestion. I'm aware there is room for improvement regarding the maximum performance of the ST-Link at the SWD communication level.

First, I would like to focus on the context of a read command on STM32: currently the maximum absolute gain would apply to a full flash read-back (4 MB max currently on STM32) after a programming phase. Transfers in a debug context are much smaller. Without going into the details, this might explain why the effort has not already been made.

Anyway, I'll share my analysis of the topic, as it differs from yours. Your theoretical calculation gives about 1.92 µs for a 32-bit word. However, if you measure SWCLK with an oscilloscope, you can see that a word actually takes around 4 µs and is followed by 1.75 µs of silence! So if we want to improve, the work lies here (at the SWD communication implementation level only). I'm not even counting the WAIT state and ReadOk checks, which also lengthen the whole frame slightly compared to the theoretical calculation.

As you said, USB is not the bottleneck. Currently the USB transfer and the SWD transfer are done sequentially. Another improvement would be to parallelize both flows (as you suggest), but this would not give the results you currently expect: the USB side will at some point have to wait for data from the SWD side anyway, so the maximum gain we can expect is the USB transfer duration (optimal with ST-Link, as you saw with 6144-byte transfers). I measured 380 µs with a USB analyzer for such a transfer (note that this is also far from 480 Mbit/s). The other latency you could see at the USB level is the wait for the end of the SWD transaction.

So in conclusion, I would say that your suggestion is not the solution for greater performance, and with ST-Link I'm afraid you have to cope with the performance of the official firmware.

Depending on the context, the overall performance of a tool might also be improved by reducing the SWD traffic (sometimes this is possible), once you have identified that SWD is the bottleneck. Improving the SWD performance of the ST-Link is one of the background tasks, depending on priority assignment.

Best regards
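The timing figures given in the answer can be cross-checked against the ~9340 us measured for a 6144-byte read. The sketch below only combines numbers stated in the thread (4 µs per word, 1.75 µs gap, 380 µs USB transfer); the variable names are illustrative, and WAIT states and ReadOk checks are not modelled.

```python
WORD_ACTIVE_US = 4.0       # observed duration of one 32-bit word on SWCLK
WORD_GAP_US = 1.75         # observed silence after each word
USB_TRANSFER_US = 380.0    # USB transfer time measured for 6144 bytes
MEASURED_TOTAL_US = 9340.0 # command-to-data time reported in the question
CHUNK_BYTES = 6144

words = CHUNK_BYTES // 4
swd_us = words * (WORD_ACTIVE_US + WORD_GAP_US)

# Payload rate the SWD side reaches with these per-word timings
effective_kib_s = CHUNK_BYTES / 1024 / (swd_us * 1e-6)

# Upper bound on the gain if the USB transfer were fully hidden behind SWD
max_gain = USB_TRANSFER_US / MEASURED_TOTAL_US

print(f"SWD time for {words} words: {swd_us:.0f} us")
print(f"effective SWD rate: {effective_kib_s:.0f} KiB/s")
print(f"best-case gain from overlapping USB: {max_gain:.1%}")
```

With these inputs the SWD phase alone accounts for most of the measured time, and hiding the USB transfer would save only a few percent per chunk, which is consistent with the conclusion of the answer.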
