2016-05-18 07:19 AM
Hi All,
I am using the latest STM32 Cube distro with an STM32F7x DISCOVERY board for hardware (class 10 uSD card in the slot) and the IAR Kickstart development environment.
The USB MSC Device project (high speed port through ULPI) builds and works well. However, the file transfer rate (writing from PC to the device) is very low (~350kBps).
Debugging the code showed that on line 351 of usbd_conf.c, DMA is disabled for the USB HS port. I enabled DMA at this point (changed the 0 to a 1) but there is still no improvement.
There are a couple of other posts on this subject already (https://my.st.com//public/STe2ecommunities/mcu/Lists/STM32Java/USB%20MSC%20low%20transfer%20speed and https://my.st.com//public/STe2ecommunities/mcu/Lists/STM32Java/Slow%20transfer%20rate%20USB-HS%20MSC) but no advice available.
Am I missing something obvious? Am I expecting too much performance from the hardware?
Thanks in advance.
#device #usb #msc
2016-05-18 07:54 AM
That's basically citing the posts of one guy using an F4 and who said it got 3MBps at one point. I'm not sure how representative that is of the F7 implementation.
The critical paths tend to be the use of large consecutive transfers, and streamlining for such. Check also optimal clocking and buffering.
I'm not convinced that HAL/Cube is the way to get efficient and high performance USB code, you might want to review commercial USB stacks.
2016-05-18 08:52 AM
Hi clive1
I was surprised to see no change in performance after enabling DMA, which to me suggested that either a) I hadn't actually enabled it, or b) the bottleneck is somewhere else.
One of the cited posts stated that they achieved ~3MBps with an F4. I think that should be achievable with an F7 running at 215MHz.
I have reviewed the clock setup and it looks OK. The SD interface is using DMA.
Interested in any other thoughts you have.
2016-05-18 10:40 AM
I'd agree with a) and b), you'd need to mine deeper into the code to ensure it is using DMA on both the SDIO and USB interfaces.
You could look at the writes being dispatched to the card. The command overhead is very high: you're talking to a card with the brains of an 8051 doing caretaker work of some high speed data transfer buffers, so the larger the transfers you can do on each interaction at the command level, the more efficient things will be.
Yes, 350KBps is a bit disappointing, I can do 600-700 KBps on a USB-FS connection on an F4.
The upper ceiling is going to be the sustained write speed of the card itself, bus clock/width being critical on the STM32 side, and how effectively you can pipeline the data between the USB and SDIO interfaces.
I'd recommend you benchmark the SDIO write in the code as generated, i.e. wrap FatFs around it and write several MB with 32KB blocks. This should demonstrably be several MBps; 5-6 MBps might be reasonable for a quality card, not all Class 10 cards work alike.
2016-05-18 11:38 AM
Thanks for the info about the F4 speed - very interesting. I really think that the F7 should be able to at least match that :).
I think I'll look at a) first, followed by some benchmarking of the SD card writes to make sure that there's nothing strange happening there.
I've worked with STM32F1/STM32F2 micros and USB on the high speed port, and DMA makes a massive difference. Even when using the built-in PHY on the HS port at FS, DMA increases throughput a lot.
2016-05-18 12:01 PM
I really think that the F7 should be able to at least match that :).
You'd hope, but you're using a whole other library, and maybe they "improved" the SDIO peripheral. I like my Cortex-M7 with an FPU-D, so really haven't expended any resources on the F7 to this point, but the new reboot of the part looks more promising.
2016-05-18 12:13 PM
Hi chris250 and clive1,
With a quick look at the MSC_Standalone example (\STM32Cube_FW_F7_V1.3.0\Projects\STM32746G-Discovery\Applications\USB_Device\MSC_Standalone):
1) SD card Read/Write size
This example handles read/write of the SD card sector by sector, because the MSC_MEDIA_PACKET macro is set to the sector (block) size.
usbd_conf.h
// line 49
#define MSC_MEDIA_PACKET 512
You may increase this figure to 512 x N (N = 2, 3, 4, ...), so that the stack can read/write multiple sectors at a time. It reduces overhead of sector read/write of SD card.
For example, set it to the cluster size of the FAT file system,
#define MSC_MEDIA_PACKET 4096
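To put rough numbers on why a bigger MSC_MEDIA_PACKET helps, here is a small host-side sketch. It is not taken from the Cube sources; the function names are hypothetical, and the per-call overhead and raw card rate are made-up inputs for illustration, not measurements. It counts how many storage write calls a transfer needs and estimates throughput when every call carries a fixed overhead.

```c
#include <assert.h>
#include <stdint.h>

/* Number of storage write calls needed to move total_bytes when the
 * MSC layer hands over at most media_packet bytes per call.
 * media_packet must be non-zero. */
uint32_t write_calls_needed(uint32_t total_bytes, uint32_t media_packet)
{
    assert(media_packet != 0u);
    /* ceiling division, done in 64 bits so the sum cannot overflow */
    return (uint32_t)(((uint64_t)total_bytes + media_packet - 1u) / media_packet);
}

/* Estimated throughput in kB/s (1 kB = 1000 bytes) for a simple model:
 *   time per call = fixed overhead + payload time at the raw card rate.
 * overhead_us and raw_kBps are ASSUMED inputs for illustration only. */
uint32_t estimated_kBps(uint32_t media_packet, uint32_t overhead_us, uint32_t raw_kBps)
{
    assert(raw_kBps != 0u);
    /* bytes / (kB/s) gives milliseconds; scale by 1000 for microseconds */
    uint64_t payload_us = ((uint64_t)media_packet * 1000u) / raw_kBps;
    uint64_t total_us = (uint64_t)overhead_us + payload_us;
    assert(total_us != 0u);
    /* bytes per microsecond, scaled by 1000, is kB/s */
    return (uint32_t)(((uint64_t)media_packet * 1000u) / total_us);
}
```

With an assumed 1000 us overhead per call and an assumed 10000 kB/s raw rate, estimated_kBps(512, 1000, 10000) gives 487 and estimated_kBps(32768, 1000, 10000) gives 7663. A 1 MiB transfer drops from 2048 calls to 32. Real figures depend on the card and the driver, so treat this only as a way to reason about the trend.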
2) Heap size
A buffer of MSC_MEDIA_PACKET bytes is taken from the heap. If you increase this macro value, increase the heap size by at least as much. In the EWARM IDE, the heap size is set in the linker options.
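One way to keep the two settings from drifting apart is a compile-time guard. This is a minimal sketch assuming a C11 compiler. MSC_MEDIA_PACKET is the macro from usbd_conf.h, while SD_BLOCK_SIZE, APP_HEAP_SIZE and sectors_per_media_packet() are hypothetical names used only for illustration. APP_HEAP_SIZE has to be kept in step with the linker heap setting by hand, since the linker value is not visible to the C code here.

```c
#include <assert.h>
#include <stdint.h>

#define SD_BLOCK_SIZE     512u     /* SD sector size */
#define MSC_MEDIA_PACKET  4096u    /* 8 sectors per transfer */
#define APP_HEAP_SIZE     0x2000u  /* HYPOTHETICAL mirror of the linker heap size */

/* The packet buffer should hold a whole number of sectors. */
static_assert(MSC_MEDIA_PACKET % SD_BLOCK_SIZE == 0u,
              "MSC_MEDIA_PACKET must be a multiple of the sector size");

/* Lower bound only: the heap also has to hold whatever else the
 * application and the USB stack allocate. */
static_assert(APP_HEAP_SIZE >= MSC_MEDIA_PACKET,
              "heap is smaller than the MSC media packet buffer");

/* Sectors moved per storage read/write call at the current setting. */
uint32_t sectors_per_media_packet(void)
{
    return MSC_MEDIA_PACKET / SD_BLOCK_SIZE;
}
```

If MSC_MEDIA_PACKET is raised to 32768 without touching APP_HEAP_SIZE, the second check fails at build time instead of the allocation failing at run time.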
Tsuneo
2016-05-18 01:31 PM
It reduces overhead of sector read/write of SD card.
One sector would certainly be brutal on performance. Thank you for your insight.
2016-05-19 03:29 AM
Tsuneo/clive1 - many thanks for your knowledge.
I have been doing some debugging this morning. Changing the MSC_MEDIA_PACKET define to 32768 (and the linker config to suit) has improved the performance from ~350kBps to ~5.5MBps.
I have found that the MSC device does not enumerate correctly if the HS-USB DMA is enabled, so the above figures are with HS-USB DMA disabled. There is a warning in the code about enabling HS-USB DMA for this project:
usbd_conf.c, line ~346:
/* Be aware that enabling DMA mode will result in data being sent only by multiple of 4 packet sizes. This is due to the fact that USB DMA does not allow sending data from non word-aligned addresses. For this specific application, it is advised to not enable this option unless required. */
I'm not totally sure what this means - I assume "packet" means USB packet. I would have thought that all the data being DMA'ed would be aligned but clearly I am mistaken.
2016-09-22 02:07 AM
Considering only the USB channel, it is possible to achieve ~15Mbytes/sec (or ~125Mbit/sec) with the STM32 F7 Discovery, using the HS USB CDC class with DMA enabled (external SRAM to USB DMA transfer) and the provided ST USB stack. So I do not think the bottleneck is on the USB side.
Regards