32F417: CDC_Receive_FS() or USBD_CDC_ReceivePacket() - any known bugs?

PHolt.1
Senior III

I am using USB for both MSC (removable drive) and CDC (virtual COM port).

MSC works perfectly.

CDC out has been working perfectly for ~2 years.

CDC in had never really been tested, and about 1 byte in every 20 is corrupted.

I am running CDC to Teraterm on the PC, and the out mode has been used for debugs.

The in mode gets activated if you press keys on the keyboard.

I have a simple RTOS task which copies in data to output. That is where I see the corruption.

It is entirely fixable by disabling certain RTOS tasks! But the ones which "fix" it don't do anything related to USB. I am thus suspecting there is a critical region which is unprotected and RTOS related task switching messes it up.

These functions should be well known because they are generated by Cube MX. I don't use MX (the code generated tends to work but is bloated junk and often works by luck because the bloated code covers up timing issues) but it was used when this project was being created ~ 3 years ago by someone else.
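One way to rule out a shared-buffer race between the receive callback and the RTOS task is to copy each packet into a single-producer/single-consumer ring before the endpoint is re-armed. The following is only a host-compilable sketch, not the Cube code: rx_ring_t, rx_ring_put(), rx_ring_get() and on_cdc_packet() are made-up names, and it assumes the generated CDC_Receive_FS() re-arms the OUT endpoint with USBD_CDC_ReceivePacket(), which should be checked against the project's own usbd_cdc_if.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RX_RING_SIZE 256u
static_assert((RX_RING_SIZE & (RX_RING_SIZE - 1u)) == 0u,
              "ring size must be a power of two");

/* Hypothetical ring: one producer (USB receive callback), one consumer (task). */
typedef struct {
    uint8_t     data[RX_RING_SIZE];
    atomic_uint head;   /* advanced only by the producer */
    atomic_uint tail;   /* advanced only by the consumer */
} rx_ring_t;

/* Producer: store up to len bytes, return how many actually fitted. */
size_t rx_ring_put(rx_ring_t *r, const uint8_t *buf, size_t len)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    size_t stored = 0;

    while (stored < len && (head - tail) < RX_RING_SIZE) {
        r->data[head & (RX_RING_SIZE - 1u)] = buf[stored++];
        head++;
    }
    /* Publish the new head only after the bytes are in place. */
    atomic_store_explicit(&r->head, head, memory_order_release);
    return stored;
}

/* Consumer: fetch up to max bytes, return how many were available. */
size_t rx_ring_get(rx_ring_t *r, uint8_t *out, size_t max)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    size_t n = 0;

    while (n < max && tail != head) {
        out[n++] = r->data[tail & (RX_RING_SIZE - 1u)];
        tail++;
    }
    atomic_store_explicit(&r->tail, tail, memory_order_release);
    return n;
}

/* Hypothetical stand-in for the body of CDC_Receive_FS(): copy first, re-arm after. */
size_t on_cdc_packet(rx_ring_t *r, const uint8_t *buf, uint32_t len)
{
    size_t stored = rx_ring_put(r, buf, len);
    /* On the target, the call that re-arms the OUT endpoint (assumed to be
     * USBD_CDC_ReceivePacket()) would go here, after the copy, so the
     * driver cannot refill buf while it is still being read. */
    return stored;
}
```

The echo task would then call rx_ring_get() and never touch the driver's own receive buffer. If the corruption persists with this in place, the shared buffer is probably not the cause and the fault lies further down, nearer the FIFO.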

24 REPLIES
PHolt.1
Senior III

After a lot more work I narrowed the data corruption down to a DMA driving SPI3, and I was yielding to the RTOS (with taskyield()) while waiting for DMA transfer to complete. Removing this yield fixes the issue.

This is completely weird. AIUI, USB FS does not use DMA. Only USB HS does. FS uses interrupts, but trying the highest priority for USB makes no difference (it is normally set below the timers for a specific reason: I need minimum latency for one timer ISR, and USB "suffers" regular interrupts as any connected PC checks the MSC removable drive at 1Hz; this reads a sector from the filesystem which uses DMA + SPI2 and takes ~200us).

I am ok with this crappy fix (the RTOS yield is hardly needed since the DMA transfers are all very fast) but wonder what the reason might be.

> I am running CDC to Teraterm on the PC, and the out mode has been used for debugs.

I am not interested in Cube nor RTOS, but you probably have the terminology wrong - USB is Host-centric:

0693W00000SuDXCQA3.png

JW

PHolt.1
Senior III

Sure; I had it backwards. The corruption is on the PC -> target direction.

I have found some interesting things, like the 1st byte is always corrupted (usually comes in as 0x00). I may be able to track that down.

It's always a good idea to start with minimal code.

JW

PHolt.1
Senior III

Not possible for me to debug this. This is the chain involved just in receiving the first (corrupted) byte, and nothing is documented:

0693W00000SuE8DQAV.png

The Cube USB Device driver is documented in UM1734. Of course raw Cube functions are documented in your Cube's UM; I am too lazy to look up its number. And some of those functions are even yours, even if you don't think of them as such (e.g. HAL_PCD_DataOutStageCallback()).

Most of those functions do nothing but increment a counter and pass a pointer. As you appear to catch the error at the end of the chain, you may want to instrument all functions in that chain to leave a trace.
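A trace of this kind can be as small as a circular array that each function in the chain writes one entry into, to be inspected in the debugger after the bad byte shows up. This is a minimal sketch with made-up names (trace_log(), trace_at(), the TP_* IDs); it is not interrupt-safe if called from several priority levels at once, so treat it as a starting point only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_DEPTH 64u
static_assert((TRACE_DEPTH & (TRACE_DEPTH - 1u)) == 0u,
              "trace depth must be a power of two");

/* Hypothetical trace point IDs, one per instrumented function. */
enum { TP_READ_PACKET = 1, TP_DATA_OUT_STAGE = 2, TP_CDC_RECEIVE = 3 };

typedef struct {
    uint8_t  point;   /* which function left the entry */
    uint8_t  ep;      /* endpoint number or address seen there */
    uint16_t len;     /* byte count seen there */
    uint8_t  first;   /* first data byte, 0 if there was none */
} trace_entry_t;

static trace_entry_t     trace_buf[TRACE_DEPTH];
static volatile uint32_t trace_count;   /* total entries ever written */

/* Record one entry, overwriting the oldest once the buffer is full. */
void trace_log(uint8_t point, uint8_t ep, const uint8_t *data, uint16_t len)
{
    trace_entry_t *e = &trace_buf[trace_count & (TRACE_DEPTH - 1u)];

    e->point = point;
    e->ep    = ep;
    e->len   = len;
    e->first = (data != NULL && len > 0u) ? data[0] : 0u;
    trace_count++;
}

/* Look back in time: age 0 is the newest entry. Returns NULL if out of range. */
const trace_entry_t *trace_at(uint32_t age)
{
    if (age >= trace_count || age >= TRACE_DEPTH) {
        return NULL;
    }
    return &trace_buf[(trace_count - 1u - age) & (TRACE_DEPTH - 1u)];
}
```

Logging the first byte at each stage shows where in the chain it first turns into 0x00, without having to stop on a breakpoint while MSC traffic keeps hitting the same functions.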

Make sure the error is not introduced on the transmitting side. Try a different (simpler) terminal emulator. Bus analyzers are very useful in debugging USB issues.

JW

Two more things:

-- the OTG hardware is complex. One of the things it does is that it first throws an interrupt where software is supposed to pull data out of its FIFO, and then it throws an interrupt which means "all data pulled, now act upon it". The reason is the existence of embedded DMA in the OTG module even if it's not present in its OTG_FS incarnation; I am not going into details here. What you've posted above is a consequence of the second interrupt, but if there are invalid data for any reason, they are read in the first interrupt. Look for USB_ReadPacket() (unless it changed from what I've analysed, CubeF4 v1.21). You definitely need to instrument it to be able to debug this, as all data for all endpoints go through that function. Luckily you deal with one very particular endpoint so that's your starting point.

-- the OTG hardware is complex, thus quirky. One of the quirks is that you must not access the FIFO for two endpoints simultaneously (intertwined). You push or pull all data for one transaction for one endpoint, then all data for one transaction for the other endpoint. The common risk is to push data in main and simultaneously pull because of the USB interrupt. That results in unpredictable behaviour. If you need to push data to the FIFO in main, think about disabling the USB interrupt or something similar. I have no idea how or whether this is dealt with in Cube at all, and I am also not interested at all. And you using an RTOS may make this worse, I don't know; I don't use RTOS and don't know and don't want to know anything about them. This may not be your problem.
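The "disable the USB interrupt while pushing to the FIFO" idea could look roughly like the guard below. It is a host-compilable sketch with mocks: usb_irq_disable() and usb_irq_enable() stand in for the CMSIS calls NVIC_DisableIRQ(OTG_FS_IRQn) and NVIC_EnableIRQ(OTG_FS_IRQn), and usb_fifo_lock(), usb_fifo_unlock() and push_packet_guarded() are made-up names. Whether Cube already does something equivalent is not known here.

```c
#include <assert.h>
#include <stdint.h>

/* Mock interrupt mask so the sketch builds on a PC. On the target these
 * two would wrap NVIC_DisableIRQ(OTG_FS_IRQn) / NVIC_EnableIRQ(OTG_FS_IRQn). */
static int usb_irq_masked;
static void usb_irq_disable(void) { usb_irq_masked = 1; }
static void usb_irq_enable(void)  { usb_irq_masked = 0; }

int usb_irq_is_masked(void) { return usb_irq_masked; }

/* Nesting depth, so a guarded function may call another guarded function. */
static unsigned guard_depth;

void usb_fifo_lock(void)
{
    usb_irq_disable();   /* mask first, then count */
    guard_depth++;
}

void usb_fifo_unlock(void)
{
    assert(guard_depth > 0u);   /* unlock without lock is a bug */
    if (--guard_depth == 0u) {
        usb_irq_enable();       /* unmask only when the outermost guard ends */
    }
}

/* Stand-in for the FIFO itself: counts 32-bit words "written". */
uint32_t fifo_words_written;

/* Hypothetical wrapper: push one whole packet with the USB interrupt masked,
 * so the ISR cannot pull from another endpoint's FIFO halfway through. */
void push_packet_guarded(const uint8_t *src, uint16_t len)
{
    (void)src;   /* the mock does not copy any data */
    usb_fifo_lock();
    /* On the target, the call that fills the TX FIFO (USB_WritePacket()
     * in the Cube low-level driver) would sit here. */
    fifo_words_written += ((uint32_t)len + 3u) / 4u;
    usb_fifo_unlock();
}
```

With an RTOS the depth counter is shared between tasks, so this alone does not stop two tasks from interleaving their own FIFO writes; a mutex around the same region would be needed for that.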

JW

PHolt.1
Senior III

Thank you Jan as always.

I am not using OTG (what I call "USB Controller"). This was specifically not used because customers expect to be able to plug in all kinds of stuff and expect it to work, which causes problems, and not just with power consumption of the said device. It might have been nice to support a flash stick but then the FatFS file system would need extending to support the new logical drive.

This USB is "Slave only".

This is the function

0693W00000SuGDtQAN.png 

I can see this reads the (64 byte?) USB FIFO into a buffer. However, it is getting called all the time, possibly because it does both MSC and CDC, and Windows is polling the MSC device at about 1Hz all the time.

This is the debug stack before the breakpoint gets hit

0693W00000SuGEmQAN.png

There are only two lines in the project where USB_ReadPacket is called, and this shows line 1328

0693W00000SuGFfQAN.png

I will try to get the content of that buffer when the corrupt data is seen.

PHolt.1
Senior III

Hard to get very far due to the constant activity on those functions (apparently, as I say, due to MSC activity).

However, working on the fact that the first received byte is always corrupted, I've been trying to set up code which one can breakpoint on. Unfortunately the corrupted bytes are variable, and can exist in other packets.

So I went to look whether one can disable interrupts during output (either USB, or all using the "critical" macro). This is the function used to transmit

0693W00000SuI2sQAF.png 

and USBD_LL_Transmit is not easy because it is called constantly by MSC

0693W00000SuI4KQAV.png

I think if I put in code in this one which I can breakpoint on if it gets called with a non MSC packet, that might do it, but I don't know how.
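One possible way to get a breakpoint that only fires for non-MSC traffic is to filter on the endpoint address, since USBD_LL_Transmit() receives it as its ep_addr argument. The sketch below is an assumption-laden example: the two MSC endpoint addresses are guesses and must be replaced with the values from the project's own descriptors, and tx_trap() is a made-up helper to be called at the top of USBD_LL_Transmit().

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED addresses, not taken from the project: check the descriptors. */
#define MSC_IN_EP_ADDR   0x81u
#define MSC_OUT_EP_ADDR  0x01u
static_assert((MSC_IN_EP_ADDR & 0x80u) != 0u,
              "IN endpoint addresses have bit 7 set");

/* Visible in the debugger; volatile so the compiler keeps the stores. */
volatile uint32_t non_msc_tx_hits;
volatile uint8_t  non_msc_last_ep;
volatile uint32_t non_msc_last_size;

/* True for control (endpoint 0, either direction) and the MSC bulk endpoints. */
static int is_msc_or_control(uint8_t ep_addr)
{
    if ((ep_addr & 0x7Fu) == 0u) {
        return 1;
    }
    return ep_addr == MSC_IN_EP_ADDR || ep_addr == MSC_OUT_EP_ADDR;
}

/* Hypothetical hook: call as tx_trap(ep_addr, size) on entry to the
 * transmit function and put the breakpoint on the marked line. */
void tx_trap(uint8_t ep_addr, uint32_t size)
{
    if (!is_msc_or_control(ep_addr)) {
        non_msc_last_ep   = ep_addr;
        non_msc_last_size = size;
        non_msc_tx_hits++;          /* <- breakpoint here */
    }
}
```

The same filter works on the receive side if the endpoint number is available there, which avoids relying on size=1 to tell CDC from MSC.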

This is pdev for MSC:

0693W00000SuI4yQAF.png

I suspect the id=0 refers to the single USB port.

I managed to get a breakpoint on size=1 (unlikely with MSC, but can happen) but the stack trace sequence is again very long and appears to be interrupt-driven, even for TX.

Anyway I tried some simple things, like a flag in USB_ReadPacket (which reads the FIFO) which is set around that, and waiting on that flag in USB_WritePacket, but that doesn't help.

I wonder if that ST code was ever supposed to work in both directions? In my test code the two paths may be occurring very close in time, so I put in a 100ms delay, but it doesn't help.

Fortunately USB VCP input is not a core feature of the product (output is, for debugs) so I am ok to abandon it. There is one place I want to use it (selecting factory test menus) but there I can just dump non-ASCII chars, and most of the corrupted chars are in that category. Also the percentage of corrupted bytes is highly sensitive to what other tasks are running, especially SPI3 DMA, and most of these are not needed during factory test.
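Dropping the non-printable characters before the menu code sees them could be done with a filter along these lines. It is a sketch only; keep_printable() is a made-up name, and keeping CR and LF is an assumption about what the menu needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy only printable ASCII (0x20..0x7E) plus CR and LF from in to out.
 * Returns the number of bytes written to out (at most out_max). */
size_t keep_printable(const uint8_t *in, size_t len,
                      uint8_t *out, size_t out_max)
{
    assert(in != NULL && out != NULL);

    size_t n = 0;
    for (size_t i = 0; i < len && n < out_max; i++) {
        uint8_t c = in[i];
        if ((c >= 0x20u && c <= 0x7Eu) || c == '\r' || c == '\n') {
            out[n++] = c;
        }
    }
    return n;
}
```

This only hides the corruption: a byte that gets mangled into another printable character still passes through, so it suits a factory test menu but not real data.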

I also wondered about interrupt priorities. Currently the USB int is below timer ints (because I have some very short timer ISRs which are critical on latency) but moving it to the top makes no difference.

Also USB MSC is rock solid.