
USB MSC device low transfer rate - STM32F7

charles239955
Associate II
Posted on May 18, 2016 at 16:19

Hi All,

I am using the latest STM32 Cube distro with an STM32F7x DISCOVERY board for hardware (class 10 uSD card in the slot) and IAR Kickstart development environment.

The USB MSC Device project (high speed port through ULPI) works/builds well. However... the file transfer rate (writing from PC to the device) is very low (~350kBps).

Debugging the code showed that on line 351 of usbd_conf.c, DMA is disabled for the USB HS port. I enabled DMA at this point (changed the 0 to a 1) but there is still no improvement.

There are a couple of other posts on this subject already (https://my.st.com//public/STe2ecommunities/mcu/Lists/STM32Java/USB%20MSC%20low%20transfer%20speed and https://my.st.com//public/STe2ecommunities/mcu/Lists/STM32Java/Slow%20transfer%20rate%20USB-HS%20MSC) but no advice available.

Am I missing something obvious? Am I expecting too much performance from the hardware?

Thanks in advance.

#device #usb #msc
11 REPLIES
Posted on May 18, 2016 at 16:54

That's basically citing the posts of one guy using an F4 and who said it got 3MBps at one point. I'm not sure how representative that is of the F7 implementation.

The critical paths tend to be the use of large consecutive transfers, and streamlining for such. Check also optimal clocking and buffering.

I'm not convinced that HAL/Cube is the way to get efficient and high performance USB code, you might want to review commercial USB stacks.

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..
charles239955
Associate II
Posted on May 18, 2016 at 17:52

Hi clive1

I was surprised to see no change in performance after enabling DMA... which to me suggested that: a) I hadn't actually enabled it, or b) the issue/bottleneck is somewhere else.

One of the cited posts stated that they achieved ~3MBps with an F4. I think that should be achievable with an F7 running at 216MHz.

I have reviewed the clock setup and it looks OK. The SD interface is using DMA.

Interested in any other thoughts you have.

Posted on May 18, 2016 at 19:40

I'd agree with a) and b), you'd need to mine deeper into the code to ensure it is using DMA on both the SDIO and USB interfaces.

You could look at the writes being dispatched to the card. The command overhead is very high: you're talking to a card with the brains of an 8051 doing caretaker work for some high speed data transfer buffers. The larger the transfers you can do on each interaction at the command level, the more efficient things will be.

Yes 350KBps is a bit disappointing, I can do 600-700 KBps on a USB-FS connection on an F4.

The upper ceiling is going to be the sustained write speed of the card itself, bus clock/width being critical on the STM32 side, and how effectively you can pipeline the data between the USB and SDIO interfaces.

I'd recommend you benchmark the SDIO write in the code as generated, i.e. wrap FatFs around it and write several MB with 32KB blocks. This should demonstrably be several MBps; 5-6 MBps might be reasonable for a quality card, but not all Class 10 cards work alike.
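A rough sketch of such a benchmark loop, for anyone wanting to try it. The names `write_fn`, `tick_ms_fn`, `bench_write` and `throughput_kBps` are hypothetical and not part of Cube or FatFs; the idea is that on the target the callbacks would wrap something like FatFs `f_write()` and a millisecond tick such as `HAL_GetTick()`, but that wiring is an assumption and is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback types: on target these might wrap FatFs f_write()
 * and a millisecond tick source. They are assumptions for this sketch. */
typedef int (*write_fn)(const void *buf, size_t len);   /* 0 on success */
typedef uint32_t (*tick_ms_fn)(void);                   /* free-running ms */

/* bytes per millisecond equals kB/s when 1 kB = 1000 bytes. */
static uint32_t throughput_kBps(uint64_t bytes, uint32_t elapsed_ms)
{
    if (elapsed_ms == 0u) {
        return 0u;              /* too fast to measure with a ms tick */
    }
    return (uint32_t)(bytes / elapsed_ms);
}

/* Write block_count blocks of block_len bytes and return kB/s,
 * or -1 if any write fails. */
static int32_t bench_write(write_fn write, tick_ms_fn now,
                           const void *block, size_t block_len,
                           uint32_t block_count)
{
    uint32_t start = now();

    for (uint32_t i = 0; i < block_count; i++) {
        if (write(block, block_len) != 0) {
            return -1;
        }
    }

    /* Unsigned subtraction stays correct across a tick wrap-around. */
    uint32_t elapsed = now() - start;
    return (int32_t)throughput_kBps((uint64_t)block_len * block_count, elapsed);
}
```

For example, 128 blocks of 32KB is 4MB; if that takes about one second the helper reports roughly 4194 kB/s. Repeating the run with 512-byte blocks should show how much of the time is command overhead rather than data transfer.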
charles239955
Associate II
Posted on May 18, 2016 at 20:38

Thanks for the info about the F4 speed - very interesting. I really think that the F7 should be able to at least match that :).

I think I'll look at a) first, followed by some benchmarking of the SD card writes to make sure that there's nothing strange happening there.

I've worked with STM32F1/STM32F2 micros and USB on the high speed port, and DMA makes a massive difference. Even when using the built-in PHY on the HS port at FS, DMA increases throughput a lot.

Posted on May 18, 2016 at 21:01

I really think that the F7 should be able to at least match that :).

You'd hope, but you're using a whole other library, and maybe they "improved" the SDIO peripheral. I like my Cortex-M7 with an FPU-D, so I really haven't expended any resources on the F7 to this point, but the new reboot of the part looks more promising.

tsuneo
Senior
Posted on May 18, 2016 at 21:13

Hi chris250 and clive1,

With a quick look at the MSC_Standalone example (\STM32Cube_FW_F7_V1.3.0\Projects\STM32746G-Discovery\Applications\USB_Device\MSC_Standalone):

1) SD card Read/Write size

This example handles read/write of the SD card sector by sector, because the MSC_MEDIA_PACKET macro is set to the sector (block) size.

usbd_conf.h
// line 49
#define MSC_MEDIA_PACKET 512

You may increase this figure to 512 x N (N = 2, 3, 4, ...), so that the stack can read/write multiple sectors at a time. It reduces overhead of sector read/write of SD card. For example, set it to the cluster size of the FAT file system:

#define MSC_MEDIA_PACKET 4096

2) Heap size

A buffer of MSC_MEDIA_PACKET bytes is taken on the heap. If you increase this macro value, increase the heap size by as much. In the EWARM IDE, the heap size appears in the linker options.

Tsuneo
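To put numbers on the point about MSC_MEDIA_PACKET, here is a small arithmetic sketch. It assumes the stack issues one storage read/write call per MSC_MEDIA_PACKET-sized chunk, which is my reading of the description above rather than something verified against the library source. The helper names `storage_calls` and `sectors_per_call` are hypothetical.

```c
#include <stdint.h>

#define SD_SECTOR_SIZE 512u   /* sector (block) size used by the example */

/* Hypothetical helper: storage calls needed to move total_bytes, assuming
 * one call per media_packet-sized chunk (rounded up for a partial chunk). */
static uint32_t storage_calls(uint32_t total_bytes, uint32_t media_packet)
{
    return (total_bytes + media_packet - 1u) / media_packet;
}

/* Hypothetical helper: sectors carried by each storage call. */
static uint32_t sectors_per_call(uint32_t media_packet)
{
    return media_packet / SD_SECTOR_SIZE;
}
```

Under that assumption, a 1 MiB transfer takes 2048 calls at 512 bytes, 256 calls at 4096 bytes, and 32 calls at 32768 bytes, so the per-command overhead on the card is paid far less often as the packet grows.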
Posted on May 18, 2016 at 22:31

It reduces overhead of sector read/write of SD card.

One sector would certainly be brutal on performance. Thank you for your insight.

charles239955
Associate II
Posted on May 19, 2016 at 12:29

Tsuneo/clive1 - many thanks for your knowledge.

I have been doing some debugging this morning. Changing the MSC_MEDIA_PACKET define to 32768 (and the linker config to suit) has improved the performance from ~350kBps to ~5.5MBps.

I have found that the MSC device does not enumerate correctly if the HS-USB DMA is enabled, so the above figures are with HS-USB DMA disabled. There is a warning in the code about enabling HS-USB DMA for this project:

usbd_conf.c, line ~346:

/* Be aware that enabling DMA mode will result in data being sent only by
   multiple of 4 packet sizes. This is due to the fact that USB DMA does
   not allow sending data from non word-aligned addresses.
   For this specific application, it is advised to not enable this option
   unless required. */

I'm not totally sure what this means - I assume "packet" means USB packet. I would have thought that all the data being DMA'ed would be aligned, but clearly I am mistaken.
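For reference, "word-aligned" here would normally mean an address that is a multiple of 4 bytes. The sketch below only illustrates that idea; it does not explain what the ST comment means by "multiple of 4 packet sizes", and it says nothing about which buffers the USB stack actually hands to the DMA. The names `is_word_aligned` and `xfer_buf` are hypothetical.

```c
#include <stdalign.h>
#include <stdint.h>

/* Returns 1 if ptr sits on a 32-bit (4-byte) boundary, otherwise 0. */
static int is_word_aligned(const void *ptr)
{
    return ((uintptr_t)ptr & 3u) == 0u;
}

/* Hypothetical transfer buffer forced onto a 4-byte boundary (C11). */
static alignas(4) uint8_t xfer_buf[512];
```

The start of `xfer_buf` passes the check, but `xfer_buf + 1` does not. A transfer that starts at an odd offset inside an otherwise aligned buffer is therefore unaligned, which is one way data that looks aligned can end up at a non word-aligned address.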

74gnas
Associate
Posted on September 22, 2016 at 11:07

Considering only the USB channel, it is possible to achieve ~15Mbytes/sec (or ~125Mbit/sec) with the STM32 F7 Discovery, using the HS USB CDC class with DMA enabled (external SRAM to USB DMA transfer) and the provided ST USB stack. So I do not think the bottleneck is on the USB side.

Regards