More USB Shenanigans: The Futility of EPENA.

PhucXDoan · 2024-07-06

This is a long one, so sit tight around the campfire, and let me tell you the tale of The Futility of EPENA.

Several days ago, I began to implement a Mass Storage interface for my own USB stack, equipped with a Bulk IN and a Bulk OUT endpoint. The first thing I did was wait for the OTEPDIS interrupt ("OUT token received when endpoint disabled"); this lets me know that the host wants to send me a love letter, that is, a Command Block Wrapper. This OTEPDIS interrupt is a little strange, however: RM0351 (Rev. 10) states that it only applies to Control OUT endpoints, yet it nonetheless seems to fire for non-Control EPs as well (at least on the STM32L476RG).

Anyway, on this interrupt, I then configure the Bulk OUT endpoint to receive a single 8-byte packet in order to get the first 8 bytes of the 31-byte CBW. Of course, I could just use a 64-byte EP so the whole CBW could be received in one go, but I wanted to make sure my USB stack could handle multi-packet transfers. After configuring OTG_DOEPTSIZx with the packet count (PKTCNT of 1) and byte count (XFRSIZ of 8), I set CNAK and EPENA for the Bulk OUT endpoint in OTG_DOEPCTLx.
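A rough C sketch of that RM-style arming step is below. The `out_ep_regs` struct is a hypothetical stand-in for one endpoint's OTG_DOEPCTLx/OTG_DOEPTSIZx pair (on hardware these are memory-mapped registers), and the bit positions are my reading of RM0351 for endpoints other than EP0, so double-check them against your own reference manual.

```c
#include <stdint.h>

/* Bit positions as read from RM0351 for OTG_DOEPCTLx / OTG_DOEPTSIZx (x > 0).
   Verify against your own reference manual. */
#define DOEPCTL_EPENA        (1UL << 31)
#define DOEPCTL_CNAK         (1UL << 26)
#define DOEPTSIZ_PKTCNT_POS  19
#define DOEPTSIZ_PKTCNT_MSK  (0x3FFUL << DOEPTSIZ_PKTCNT_POS)
#define DOEPTSIZ_XFRSIZ_MSK  0x7FFFFUL

/* Hypothetical stand-in for one OUT endpoint's register pair. */
typedef struct {
    volatile uint32_t DOEPCTL;
    volatile uint32_t DOEPTSIZ;
} out_ep_regs;

/* RM-style arming: program packet count and byte count first,
   then clear the NAK status and enable the endpoint. */
static void out_ep_arm(out_ep_regs *ep, uint32_t pktcnt, uint32_t xfrsiz)
{
    ep->DOEPTSIZ = ((pktcnt << DOEPTSIZ_PKTCNT_POS) & DOEPTSIZ_PKTCNT_MSK)
                 | (xfrsiz & DOEPTSIZ_XFRSIZ_MSK);
    ep->DOEPCTL |= DOEPCTL_CNAK | DOEPCTL_EPENA;
}
```

For the first chunk of the CBW described above, the call would be `out_ep_arm(ep, 1, 8)`.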

Shortly thereafter, RXFLVL gets asserted, indicating there's a packet available to read, and it's indeed the 8-byte packet for the Bulk OUT endpoint, as expected. Repeat this process a couple more times and I end up with the whole 31-byte CBW. It was all smooth sailing, and I was ready to clock out for the day, until I realized... I forgot to actually re-enable the Bulk OUT endpoint after the first packet!
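Each RXFLVL event means one status word can be popped from OTG_GRXSTSP, and it's the PKTSTS field of that word that tells a "Data OUT packet" apart from an "OUT data transfer completed" entry. A minimal decoding sketch, assuming the device-mode field layout and PKTSTS codes as I read them in RM0351 (the names `rx_status` and `rx_status_decode` are hypothetical):

```c
#include <stdint.h>

/* Device-mode PKTSTS codes of OTG_GRXSTSR/GRXSTSP (as read from RM0351). */
enum pktsts {
    PKTSTS_GLOBAL_OUT_NAK = 0x1,
    PKTSTS_OUT_DATA       = 0x2,  /* "Data OUT packet" */
    PKTSTS_OUT_XFER_DONE  = 0x3,  /* "OUT data transfer completed" */
    PKTSTS_SETUP_DONE     = 0x4,
    PKTSTS_SETUP_DATA     = 0x6,
};

typedef struct {
    uint32_t epnum;   /* bits [3:0]   endpoint number        */
    uint32_t bcnt;    /* bits [14:4]  byte count of packet   */
    uint32_t dpid;    /* bits [16:15] data PID               */
    uint32_t pktsts;  /* bits [20:17] packet status          */
} rx_status;

/* Split one word popped from OTG_GRXSTSP into its fields. */
static rx_status rx_status_decode(uint32_t word)
{
    rx_status s;
    s.epnum  =  word        & 0xFU;
    s.bcnt   = (word >> 4)  & 0x7FFU;
    s.dpid   = (word >> 15) & 0x3U;
    s.pktsts = (word >> 17) & 0xFU;
    return s;
}
```

In a handler, a PKTSTS of 0x2 would be followed by reading BCNT bytes from the endpoint's FIFO, while 0x3 carries no data of its own.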

I was dumbfounded, struck by a tempest of confusion! How is it possible for the whole 31-byte transfer to be completed when the endpoint should've become disabled after receiving the first packet?

Well, after some consideration, I realized what should've happened is that the "OUT data transfer completed" pattern gets pushed into the RX-FIFO after reading in the first 8-byte packet. What ended up happening in reality was a series of "Data OUT packet" patterns (altogether making up the 31-byte CBW) and then finally the actual "OUT data transfer completed" word. But I only ever configured the endpoint for a single 8-byte packet, not this whole parade!

So I put my thinking cap on: okay, so what exactly are the values of PKTCNT and XFRSIZ of the OTG_DOEPTSIZx register after reading in each packet? Well, after the very first 8-byte packet, PKTCNT and XFRSIZ became zero (as expected), but according to the RM (pg. 1793):

The OUT data transfer completed pattern for an OUT endpoint is written to the receive FIFO on one of the following conditions:

– The transfer size is 0 and the packet count is 0

– The last OUT data packet written to the receive FIFO is a short packet (0 ≤ packet size < maximum packet size).
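The two quoted conditions can be written as a plain predicate, which makes it easy to see why the first 8-byte packet (with an 8-byte maximum packet size) should already have produced the completion word: PKTCNT and XFRSIZ both reach zero. This only restates the RM text, using hypothetical names; it is not a model of what the core actually did.

```c
#include <stdbool.h>
#include <stdint.h>

/* The RM0351 conditions for writing the "OUT data transfer completed"
   pattern. xfrsiz/pktcnt are the values after the packet is accounted for;
   mps is the endpoint's maximum packet size. */
static bool rm_expects_xfer_complete(uint32_t xfrsiz, uint32_t pktcnt,
                                     uint32_t last_pkt_len, uint32_t mps)
{
    bool sizes_exhausted = (xfrsiz == 0) && (pktcnt == 0);
    bool short_packet    = last_pkt_len < mps;  /* 0 <= size < MPS */
    return sizes_exhausted || short_packet;
}
```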

So the very next thing in the RX-FIFO should be the "OUT data transfer completed" pattern, right? Well, as I already said, this isn't the case: the Bulk OUT endpoint keeps receiving more data from the host, and it is only at the last packet (a short packet, in fact) that the pattern actually gets pushed into the RX-FIFO. As for the PKTCNT and XFRSIZ fields, PKTCNT stays at 0 while XFRSIZ underflows.

Of course, I could be wrong. I rarely run into anything that straight up contradicts the RM like this (usually it's some awkward, slightly ambiguous wording), but from what I'm witnessing, this is not what is happening at all in my own painstakingly handwritten code!

To make matters worse, I decided to do something strange: what if I never configure and enable the Bulk OUT endpoint for the transfer at all? Would I still receive data packets in the RX-FIFO? As it turns out: yes!

So what gives?

After some tinkering, I came to the conclusion that the whole shebang of configuring OTG_DOEPTSIZx and related registers for OUT transfers is complete baloney(?). The only thing that actually matters is clearing the NAK status of the OUT endpoint via CNAK. In other words, EPENA doesn't seem to be what determines whether or not the OUT endpoint receives packets.

I verified this by replacing every procedure call that would've configured and enabled the OUT endpoint for a transfer with a single line that just sets the CNAK bit, and everything seems to run perfectly fine, with no issues on the enumeration or CDC side of things. So it seems like the whole operational procedure for configuring OUT transfers can just be skipped...
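For contrast with the full arming sequence, the "single line" described above amounts to roughly the following. The helper name is hypothetical, and CNAK is taken to be bit 26 of OTG_DOEPCTLx, per RM0351 as I read it. As the follow-up post explains, this only appears to work because the transfer-size fields underflow.

```c
#include <stdint.h>

#define DOEPCTL_CNAK  (1UL << 26)

/* The shortcut observed on the STM32L476RG: no OTG_DOEPTSIZx programming
   and no EPENA, only clearing the endpoint's NAK status. */
static void out_ep_clear_nak(volatile uint32_t *doepctl)
{
    *doepctl |= DOEPCTL_CNAK;
}
```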

Now, it should be noted that this is just what I'm observing on the STM32L476RG. I don't have other MCUs with the USB OTG FS core on hand, and I suspect things could differ on them, since the Synopsys IP core is configured differently from part to part, even within the same line of MCUs. This probably explains why the RM is written the way it is: it's trying to cover all the weird quirks of the USB OTG FS core.

Regardless, I thought I'd share these findings for any masochists out there who want to write their own stack. If I keep working on the BOT interface and find out there's an important detail I'm misunderstanding (like the EPENA bit actually being important in some very specific edge case), then I'll make an update post with that information.

PhucXDoan · 2024-07-12

I believe I've figured out the puzzle, and it's quite multi-faceted.

First, I believe I made an error in my original setup, since I can't reproduce it now. I likely configured the wrong endpoint by accident, that is, I set PKTCNT and XFRSIZ for a completely different endpoint. What happened as a result, however, made things even more confusing.

See, once an endpoint becomes activated (upon SET_CONFIGURATION), it is "not enabled" (EPENA off), but this doesn't necessarily mean it ignores incoming OUT packets. Since the endpoint comes up activated but not in NAK mode, it ACKs any received packets and pushes them into the RX-FIFO.

But since PKTCNT and XFRSIZ are zero on initialization, and thus on activation, how could the core possibly think it's okay to accept any OUT packets?

Well, here's the kicker: the PKTCNT and XFRSIZ fields are decremented first and only then checked for zero. If the result is zero, EPENA gets shut off and the endpoint goes into NAK mode. The same happens if a short packet is received.

Since the endpoint is activated, is not in NAK mode, and has PKTCNT=0 and XFRSIZ=0, once the core receives an OUT packet for that OUT endpoint, the fields underflow and thus become non-zero! I happened to allude to this in the original post.
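The decrement-then-check behaviour described above can be modelled in a few lines of C to see what starting from PKTCNT=0 does. This is a hypothetical software model of the observed behaviour, not the core's actual logic. It assumes a 10-bit PKTCNT and a 19-bit XFRSIZ (the OTG_DOEPTSIZx field widths for endpoints other than EP0, as I read RM0351), and, following the observation further down that XFRSIZ reaching zero doesn't stop reception, only PKTCNT and short packets end the transfer.

```c
#include <stdbool.h>
#include <stdint.h>

#define PKTCNT_MASK  0x3FFU     /* assumed 10-bit PKTCNT field */
#define XFRSIZ_MASK  0x7FFFFU   /* assumed 19-bit XFRSIZ field */

/* Hypothetical model of one OUT endpoint's transfer state. */
typedef struct {
    uint32_t pktcnt;
    uint32_t xfrsiz;
    bool     nak;     /* true: endpoint NAKs further OUT packets */
} out_ep_model;

/* One OUT packet of `len` bytes arrives. Returns true if it was ACKed
   and pushed into the RX-FIFO. Fields are decremented first (wrapping at
   their width) and only then checked. */
static bool model_rx_packet(out_ep_model *ep, uint32_t len, uint32_t mps)
{
    if (ep->nak)
        return false;
    ep->pktcnt = (ep->pktcnt - 1U) & PKTCNT_MASK;
    ep->xfrsiz = (ep->xfrsiz - len) & XFRSIZ_MASK;
    if (ep->pktcnt == 0 || len < mps)
        ep->nak = true;   /* completion word pushed, EPENA cleared */
    return true;
}
```

Under these assumptions, an endpoint that starts at PKTCNT=0 accepts 1024 full-size packets before the wrapped counter gets back to zero. A 31-byte CBW is a short packet and ends its transfer by itself, whereas a long enough run of full-size packets can eventually hit the wraparound.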

So in my situation, I was handling the Mass Storage transactions mostly fine only because the CBWs are 31 bytes long, which always results in a short-packet scenario. This egregious handling of OUT transfers only becomes an issue once PKTCNT reaches zero, at which point the core stops accepting any more packets (although some may still be buffered in the RX-FIFO). When this happens, the "OUT transfer completed" word gets shoved into the RX-FIFO (after the last buffered packet), which is quite unexpected if it all happens in the middle of a sector write, and that's how I managed to figure all of this out.

Regardless, it seems like there are only really two fields that matter: CNAK and PKTCNT. The former takes the OUT endpoint out of NAK mode, and the latter indicates how many OUT packets the core should ACK before going into NAK mode (unless the transfer is ended prematurely by a short packet). The EPENA and XFRSIZ fields still don't seem to do anything important. I'm guessing XFRSIZ is useful for determining the difference between the expected and the actual amount of data received, but that's redundant given that each packet comes with its own BCNT field indicating its length anyway. Furthermore, a quick test suggests that XFRSIZ reaching zero while PKTCNT is non-zero doesn't stop the core from pushing more packets into the RX-FIFO.

Likewise, EPENA doesn't seem to do much to "start" the OUT endpoint or to indicate whether it's enabled; CNAK and NAKSTS seem better suited for those jobs, respectively.

To circumvent this nasty underflow issue, I believe all I have to do is put the endpoints into NAK mode (via SNAK) once I activate them on SET_CONFIGURATION. From there on out, I can set PKTCNT and CNAK and then handle things more reliably. Although I'll probably still configure XFRSIZ and EPENA for completeness' sake.
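Put together, the workaround might look something like the sketch below: activate the endpoint on SET_CONFIGURATION already parked in NAK mode, then arm it with a PKTCNT sized for the expected transfer, with XFRSIZ and EPENA set for completeness as the post suggests. Everything here is a hypothetical helper over a stand-in register struct; the bit positions are my reading of RM0351, and XFRSIZ is rounded up to whole packets because, as I understand the RM's OUT transfer procedure, the OUT transfer size should be a multiple of the maximum packet size.

```c
#include <stdbool.h>
#include <stdint.h>

#define DOEPCTL_EPENA        (1UL << 31)
#define DOEPCTL_SNAK         (1UL << 27)
#define DOEPCTL_CNAK         (1UL << 26)
#define DOEPCTL_EPTYP_BULK   (2UL << 18)
#define DOEPCTL_NAKSTS       (1UL << 17)
#define DOEPCTL_USBAEP       (1UL << 15)
#define DOEPCTL_MPSIZ_MSK    0x7FFUL
#define DOEPTSIZ_PKTCNT_POS  19
#define DOEPTSIZ_XFRSIZ_MSK  0x7FFFFUL

/* Hypothetical stand-in for one OUT endpoint's register pair. */
typedef struct {
    volatile uint32_t DOEPCTL;
    volatile uint32_t DOEPTSIZ;
} out_ep_regs;

/* On SET_CONFIGURATION: activate as Bulk OUT but immediately set NAK, so
   the endpoint can't ACK (and underflow on) packets nobody asked for. */
static void bulk_out_activate(out_ep_regs *ep, uint32_t mps)
{
    ep->DOEPCTL = DOEPCTL_USBAEP | DOEPCTL_EPTYP_BULK
                | (mps & DOEPCTL_MPSIZ_MSK) | DOEPCTL_SNAK;
}

/* Arm for up to `len` bytes (mps must be non-zero). PKTCNT bounds the
   transfer; XFRSIZ and EPENA are set for completeness. */
static void bulk_out_arm(out_ep_regs *ep, uint32_t len, uint32_t mps)
{
    uint32_t pktcnt = (len == 0) ? 1U : (len + mps - 1U) / mps;
    ep->DOEPTSIZ = (pktcnt << DOEPTSIZ_PKTCNT_POS)
                 | ((pktcnt * mps) & DOEPTSIZ_XFRSIZ_MSK);
    ep->DOEPCTL |= DOEPCTL_CNAK | DOEPCTL_EPENA;
}

/* NAKSTS reports whether the core is currently NAKing this endpoint. */
static bool bulk_out_is_naking(const out_ep_regs *ep)
{
    return (ep->DOEPCTL & DOEPCTL_NAKSTS) != 0;
}
```

SNAK and CNAK are write-only strobes on real hardware (they read back as zero), so the read-modify-write in `bulk_out_arm` behaves differently on a plain variable than on the actual register; likewise, NAKSTS is only meaningful when read from the real OTG_DOEPCTLx.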


waclawek.jan · 2024-07-06

Thanks for sharing this.

The Synopsys OTG module is quirky and has a lot of historical and configuration layers (even those which are not actually enabled in the STM32 do influence the design and behaviour, e.g. the absence/presence/exact type of DMA used). There is complex signaling going on between the USB-facing "core", the RAM/FIFO/(DMA/AHB-master) interface, and the processor-facing logic. And the documentation is poor, to put it mildly.

Let me make a guess: wMaxPacketSize in your endpoint descriptor for the given Bulk OUT endpoint is set to 64, isn't it? And DOEPCTL.MPSIZ is set to 8?

So what IMO happens here is that the host sent a single 31-byte packet and the USB core received it as such, pushed all of it (i.e. 8 words) into the FIFO plus one "end of packet" marker; it then sent messages through the GRXSTSP mailbox to pop data of DOEPCTL.MPSIZ size until your program had popped all the data, and at that point the "end of packet" marker caused the "transfer completed" message to be sent through the mailbox.
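To make the word counting concrete: packet data comes out of the RX-FIFO 32 bits at a time, so a 31-byte packet takes ceil(31/4) = 8 reads, with the first byte in the least significant bits. A minimal sketch, where the FIFO data-register pointer is a hypothetical parameter (on hardware, each read of that register pops the next word):

```c
#include <stdint.h>

/* Number of 32-bit FIFO reads needed to drain a packet of bcnt bytes. */
static uint32_t fifo_words_for(uint32_t bcnt)
{
    return (bcnt + 3U) / 4U;
}

/* Drain one bcnt-byte packet from the RX-FIFO into dst. The last word of
   a non-multiple-of-4 packet is only partially used. */
static void fifo_read_packet(volatile uint32_t *fifo, uint8_t *dst,
                             uint32_t bcnt)
{
    for (uint32_t i = 0; i < bcnt; i += 4U) {
        uint32_t word = *fifo;                        /* pops next word */
        uint32_t n = (bcnt - i < 4U) ? (bcnt - i) : 4U;
        for (uint32_t b = 0; b < n; ++b)
            dst[i + b] = (uint8_t)(word >> (8U * b)); /* byte 0 = bits 7:0 */
    }
}
```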

It might have been different; I am not an insider, and I also don't know your particular setup. Details matter a lot.

IMO EPENA behaviour changed between versions 2.x and 3.x of the OTG; I faintly recall seeing radically different endpoint enable/disable procedures. (I also had quite a fight when I needed to disable an endpoint on the 'F407 (v2.81a), and IIRC it was simple on the 'F446 (v3.20a). But that was a couple of years ago, and I don't remember the details and don't wish to revisit them either.) So the discrepancy between documentation and reality might be of historical origin. But again, I am not an insider, and I gave up trying to support multiple versions of the OTG due to the crappy documentation and significant undocumented differences/features. You might've seen some of my frustrated posts here.

If you are serious about USB, I recommend getting a protocol analyzer (possibly a cheap, full-speed-only one, and maybe also a full-speed-only hub, which is very nontrivial to find); some logic analyzers and oscilloscopes can analyze USB waveforms, too. Seeing things happen on the bus is an eye opener and facilitates debugging greatly.

JW

PhucXDoan · 2024-07-06

wMaxPacketSize is set appropriately, actually, but it's still possible something funny is happening on the bus that isn't revealed on the Wireshark side of things, or that I really did configure things weirdly and it has been miraculously working just fine. I'd love to get an analyzer, but it seems like the Beagle goes for nearly $500, which isn't exactly cheap for a hobbyist like me. It's fun to play detective with the limited tools I have, though.

FBL · 2024-07-08

Hi @PhucXDoan

Thank you for sharing

Would it be possible to share the Wireshark traces, along with a screenshot of your findings correlated to the expected behavior? We have already forwarded your request to our expert. As soon as I get a response, I will share it with you.


Grant Bt · 2024-07-08

Get a used one. I have several, including two of these:

Ellisys USB Tracker Model 110b

https://www.ebay.com/itm/204708517420

Edit: check to make sure that it supports the speed you are operating at.

PhucXDoan · 2024-07-08

I'm not sure how Wireshark traces would help, as they look pretty much identical to what would happen if I had configured the OUT transfer properly as described in the RM ("Generic non-isochronous OUT data transfers"). What I'd expect is for the OUT endpoint to not receive any of the packets, since I never explicitly set EPENA to enable it. Nonetheless, setting just CNAK is sufficient for the OUT endpoint to receive data and push it into the RX-FIFO, which makes configuring OTG_DOEPTSIZx and similar registers moot.

I think in the end I'll open-source my USB stack and revisit this thread (and perhaps create a minimal example of this situation).

PhucXDoan · 2024-07-12