USB unexpected resets STM32F7

James Murray · 2018-05-01

Posted on May 01, 2018 at 23:55

I'm looking for any pointers here on where to start debugging.

I have a composite device based on example code that implements CDC device + FATfs and MSC device. The FATfs and MSC are not active at the same time. The CDC is a gateway to SPI. Processor is STM32F769I on the discovery board.

When a remote processor is powered up the MSC reports 'Media not present' and the FATfs is active. When the remote processor is off, the FATfs is unused and the MSC is active. That side appears to be working ok.

However, when the device is running and CDC <--> SPI is working ok I am intermittently getting port resets reported by the Linux kernel. When it reconnects, it is assigned a different /dev/ttyACM* number so the application loses connection.

I've traced this in Wireshark and the pattern of events seems very similar each time. I do not have a USB bus-level sniffer to diagnose any deeper.

Endpoints:

EP0

EP1 CDC control

EP2 CDC data

EP3 MSC

Wireshark seems to show a failed IN transaction from the MSC endpoint, with a message such as 'protocol error' or 'connection reset by peer'.

Then there's usually a failed IN transaction from EP1: 'cannot send after transport endpoint shutdown'.

Then a number of failed INs saying 'No such file or directory'.
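For reference, those messages look like the standard errno strings for the URB status field in the capture. A minimal sketch of a lookup table follows, with the meanings paraphrased from my reading of the Linux kernel's USB error-code notes; the function and struct names are made up for illustration, and the wording of each meaning is my summary rather than the kernel's text.

```c
#include <errno.h>
#include <stddef.h>

/* One URB status value and a short note on what the host is reporting.
 * The notes are a paraphrase and should be checked against the kernel docs. */
struct urb_status_note {
    int status;             /* negative errno, as shown in a usbmon capture */
    const char *meaning;
};

static const struct urb_status_note urb_status_notes[] = {
    { -EPROTO,     "bit-stuff error, no response within the bus turnaround time, or unknown USB error" },
    { -ECONNRESET, "URB was unlinked asynchronously by the host driver" },
    { -ESHUTDOWN,  "device or host controller has been disabled" },
    { -ENOENT,     "URB was cancelled synchronously by the host driver" },
};

/* Hypothetical helper: returns a note for the statuses quoted above. */
const char *urb_status_meaning(int status)
{
    size_t count = sizeof urb_status_notes / sizeof urb_status_notes[0];
    for (size_t i = 0; i < count; i++) {
        if (urb_status_notes[i].status == status)
            return urb_status_notes[i].meaning;
    }
    return "not in this table";
}
```

If that reading is right, only the first failure (the protocol error on the MSC IN) says anything about the device; the later ones would be the host cancelling outstanding URBs while it tears the port down.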

Then the Host issues a port reset and enumeration completes and it works again for a bit. The processor is not resetting. USB_Init is not getting called. It is almost like the USB core gets stuck, but I don't know how to determine if that is true.

The resets occur 5-15 minutes apart and only seem to occur when CDC is active and MSC is in 'Media not present' mode. The failure always seems to be from the MSC endpoint.

I had hoped that changing the TXFIFO registers (per my other post) would resolve this, but no change.

I have been delayering the example code, as I found it very difficult to follow and the layering has a noticeable performance cost. I can now understand the code better and MSC speed has improved, but the random port reset problem is still there.

In debugging I have caught an instance where a TXFIFO was partially filled and the code wasn't hitting the TXFE interrupt, but I'm hoping that problem was caused by the TXFIFO address/size registers being scrambled.
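One way to rule the FIFO layout in or out is to read the size registers back at runtime and check that the regions do not overlap and fit in the FIFO RAM. A minimal sketch, assuming the usual OTG register packing (GRXFSIZ depth in bits 15:0; TX FIFO start address in bits 15:0 and depth in bits 31:16, all in 32-bit words). The function names are hypothetical, and the total FIFO RAM size is left as a parameter to be taken from the reference manual for the core in use.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One FIFO region, in 32-bit words. */
struct fifo_region {
    uint16_t start;
    uint16_t depth;
};

/* Decode a TX FIFO size register value: start in bits 15:0, depth in 31:16. */
struct fifo_region fifo_from_txf(uint32_t reg)
{
    struct fifo_region r = { (uint16_t)(reg & 0xFFFFu), (uint16_t)(reg >> 16) };
    return r;
}

/* Decode GRXFSIZ: the RX FIFO starts at word 0, depth in bits 15:0. */
struct fifo_region fifo_from_grxfsiz(uint32_t reg)
{
    struct fifo_region r = { 0u, (uint16_t)(reg & 0xFFFFu) };
    return r;
}

/* True when every region is non-empty, fits in total_words, and no two
 * regions overlap. Gaps between regions are allowed. */
bool fifo_layout_ok(const struct fifo_region *r, size_t n, uint32_t total_words)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t end_i = (uint32_t)r[i].start + r[i].depth;
        if (r[i].depth == 0u || end_i > total_words)
            return false;
        for (size_t j = i + 1; j < n; j++) {
            uint32_t end_j = (uint32_t)r[j].start + r[j].depth;
            if (r[i].start < end_j && r[j].start < end_i)
                return false;   /* regions i and j share at least one word */
        }
    }
    return true;
}
```

Calling this periodically with values read from the live registers, and logging a failure rather than halting, would show whether the layout ever changes after initialisation.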

Any suggestions on where to start are appreciated.

James

Tesla DeLorean · 2018-05-01

Posted on May 02, 2018 at 00:09

I wouldn't probe peripheral registers (FIFO and data registers particularly) in a debug view; this certainly causes broken behaviour in USART and SDMMC.

Instrument your code so you can track flow and interaction via a USART or SWV rather than stopping or probing in the debugger, which is likely to stall endpoints or cause the loss of real-time responsiveness.
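As a rough illustration of that kind of instrumentation, here is a minimal sketch of an in-memory event log: the USB interrupt handler records small fixed-size events, and the main loop drains them to the USART or SWV when it has time. Everything here (names, event fields, capacity) is hypothetical, and it assumes a single producer; if several interrupt priorities log events, trace_put would need protecting.

```c
#include <stdint.h>

#define TRACE_CAPACITY 256u   /* power of two so indices wrap with a mask */

struct trace_event {
    uint32_t timestamp;  /* caller-supplied, e.g. a tick or cycle count */
    uint8_t  ep;         /* endpoint number */
    uint8_t  code;       /* application-defined event code */
    uint16_t arg;        /* small payload, e.g. a byte count */
};

static struct trace_event trace_buf[TRACE_CAPACITY];
static volatile uint32_t trace_head;  /* total events written (interrupt side) */
static uint32_t trace_tail;           /* total events read (main-loop side) */

/* Record one event. No blocking and no I/O, so it is cheap in an ISR. */
void trace_put(uint32_t timestamp, uint8_t ep, uint8_t code, uint16_t arg)
{
    uint32_t head = trace_head;
    struct trace_event *e = &trace_buf[head & (TRACE_CAPACITY - 1u)];
    e->timestamp = timestamp;
    e->ep = ep;
    e->code = code;
    e->arg = arg;
    trace_head = head + 1u;   /* publish only after the slot is filled */
}

/* Fetch the oldest surviving event. Returns 1 and fills *out, or 0 if
 * the log is empty. Events overwritten before being read are skipped. */
int trace_get(struct trace_event *out)
{
    uint32_t head = trace_head;
    if (head - trace_tail > TRACE_CAPACITY)
        trace_tail = head - TRACE_CAPACITY;
    if (trace_tail == head)
        return 0;
    *out = trace_buf[trace_tail & (TRACE_CAPACITY - 1u)];
    trace_tail++;
    return 1;
}
```

The main loop would call trace_get in a loop and print each event, so the slow output never runs inside the USB interrupt.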


James Murray · 2018-05-01

Posted on May 02, 2018 at 00:32

Yes, good tip. I had earlier been using the LCD on the disco board, but any scrolling causes a problem. I've presently removed all of the LCD and touchscreen code from my project. Emitting some debug data via USART is easy enough.

What I'm really looking for, though, are any pointers on where to start looking. What problem in my firmware is likely to cause the Linux kernel to reset the USB port (hub)?

James

Tesla DeLorean · 2018-05-01

Posted on May 02, 2018 at 03:55

Hard to say, USB isn't much my thing these days, but I'd start with the descriptors and the interactions with the endpoint. On Windows I built filter drivers to track the requests/responses on the USB stack at the IRP/URB level, and on the MSC the SCSI CDB and sense codes.

With Linux you have the source, so how it reacts to errors and sense codes, and how things are retried and reset, should be more apparent.
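On the sense-code side, it may be worth confirming what the device returns to REQUEST SENSE while in 'Media not present' mode. A minimal sketch of the 18-byte fixed-format sense data for that case (sense key NOT READY, additional sense code 0x3A, qualifier 0x00); the function and macro names are made up for illustration and are not taken from the ST middleware.

```c
#include <stdint.h>
#include <string.h>

#define SENSE_LEN              18u
#define SENSE_KEY_NOT_READY    0x02u
#define ASC_MEDIUM_NOT_PRESENT 0x3Au

/* Fill a fixed-format SCSI sense buffer. Unused fields are left at zero. */
void build_fixed_sense(uint8_t out[SENSE_LEN], uint8_t key, uint8_t asc, uint8_t ascq)
{
    memset(out, 0, SENSE_LEN);
    out[0]  = 0x70u;            /* response code: current error, fixed format */
    out[2]  = key & 0x0Fu;      /* sense key lives in the low nibble */
    out[7]  = SENSE_LEN - 8u;   /* additional sense length: bytes after byte 7 */
    out[12] = asc;              /* additional sense code */
    out[13] = ascq;             /* additional sense code qualifier */
}
```

Comparing the bytes in the capture against this layout is a quick way to see whether the host is getting a well-formed answer each time it polls the absent media.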
