USB Bulk XACT Error

DrWidget · ‎2024-04-01

I'm creating a new USB device with a custom USB class on an STM32F779NIH. We have previously developed a fully operational device doing the same thing with USB full speed; now we're trying to do the same with USB high speed using an external ULPI. I'm reasonably certain the issue is firmware, not hardware.

The overall scheme uses bulk transfers to send data out at a periodic rate, say every 1 ms, so 8 microframes elapse before data is sent back in response to a read request. The PC read function blocks until it receives that data, which creates a timing loop that turns out to be quite resilient compared to Windows timers. Back to this device: when I run my user application, it generates a read request, and the device returns a response after 1 ms with a set of data that is correctly decoded. This is repeated 8 times, but on the 8th packet my device returns no data and a status of 0xC0000011 (USBD_STATUS_XACT_ERROR) according to Wireshark:

According to page 34 of this PDF: https://www.intel.com/content/dam/www/public/us/en/documents/technical-specifications/ehci-specification-for-usb.pdf it may mean:

Set to a one by the Host Controller during status update in the case where the host did not receive a valid
response from the device (Timeout, CRC, Bad PID, etc.). This bit may only be set for isochronous IN transactions.

However, 1) this definition is for isochronous transfers, 2) the bit position doesn't align with the data in the image above, and 3) as far as I can tell, this is being set somewhere in the device firmware rather than by the host. This is the first time I'm using a ULPI, so I'm uncertain whether it is perhaps generating this directly. I am, however, fairly certain this isn't a timeout issue (at least from the device side), as the PC blocks until it receives the data. I can run the device as fast as one packet per 125 us or slow it down to one packet per 2 ms, and it always throws the error on the 8th packet.
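One reason the bit position may not line up with the EHCI table: 0xC0000011 is a Windows USBD_STATUS value (from usb.h), not a raw EHCI status field. As a rough sketch, the value can be split into its state bits and looked up by name. The helper names below are hypothetical; the numeric constants are the ones defined in usb.h, and only a handful are listed.

```c
#include <stdint.h>

/* A few USBD_STATUS values as defined in the Windows usb.h header. */
#define USBD_STATUS_SUCCESS             0x00000000u
#define USBD_STATUS_STALL_PID           0xC0000004u
#define USBD_STATUS_DEV_NOT_RESPONDING  0xC0000005u
#define USBD_STATUS_XACT_ERROR          0xC0000011u
#define USBD_STATUS_BABBLE_DETECTED     0xC0000012u

/* Hypothetical helper: the top two bits carry the state code
 * (0 = success, 1 = pending, 2 or 3 = completed with error). */
static unsigned usbd_status_state(uint32_t status)
{
    return (unsigned)(status >> 30);
}

/* Hypothetical helper: map a few status values to their usb.h names. */
static const char *usbd_status_name(uint32_t status)
{
    switch (status) {
    case USBD_STATUS_SUCCESS:            return "USBD_STATUS_SUCCESS";
    case USBD_STATUS_STALL_PID:          return "USBD_STATUS_STALL_PID";
    case USBD_STATUS_DEV_NOT_RESPONDING: return "USBD_STATUS_DEV_NOT_RESPONDING";
    case USBD_STATUS_XACT_ERROR:         return "USBD_STATUS_XACT_ERROR";
    case USBD_STATUS_BABBLE_DETECTED:    return "USBD_STATUS_BABBLE_DETECTED";
    default:                             return "unknown";
    }
}
```

Read this way, the status is an error code reported by the Windows host stack for the transfer, so it does not have to match any single bit in the EHCI descriptor.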

The closest explanation I've found elsewhere was this link:

https://community.infineon.com/t5/Knowledge-Base-Articles/USBD-STATUS-XACT-ERROR-0xC0000011-on-high-bandwidth-isochronous-IN-endpoint/ta-p/248718

Which has the same hex error code, but again, it refers to isochronous rather than bulk which is throwing me.

If anyone can point me in the direction of what to look for it would be greatly appreciated!

FBL · ‎2024-04-02

Hello @DrWidget

Can you reproduce this on a reference board? Could you send me the project by private message?

If this is a custom class for bulk transfers, I'm wondering why the trace shows URB_FUNCTION_BULK_OR_INTERRUPT_TRANSFER (0x0009). Could you reproduce the issue with a standard class? Also, could you specify your ULPI transceiver?

To give better visibility on the answered topics, please click on Accept as Solution on the reply which solved your issue or answered your question.

DrWidget · ‎2024-04-02

We are using the Microchip USB3300 PHY, but as this is a first prototype, we are open to changing devices if necessary. If you have a reference board you recommend, I'm happy to purchase it to try to reproduce my results there to rule out any hardware issues.

I certainly can send you the project, but it's rather large at the moment so probably better if I can strip it down to the core USB functionality first. Attempting to reproduce this with a standard class is also on the list of things to try, and I just purchased a hardware bus analyzer so hopefully I'll get a lower-level view in a couple days of what's going on.

In the meantime, I did make one other interesting discovery. I stripped down the PC side to just basic libusb reads so that, even though there is a bus reset, it keeps trying to read. On the following plot, the upper trace is the device 2 ms timer interrupt for sending data. The lower trace is the global USB interrupt. This is immediately following a bus reset:

Generally, these interrupts come in on the microframe (125us) as expected, but whenever data is sent out, we'll also have an interrupt for the next read request as well (not sure what the third interrupt is for):

Shortly before it starts to experience problems, we stop seeing an interrupt response to transferring the data:

Then we stop getting the 125 us microframe interrupts. At this point, the data is still being received by the PC (I can correlate what is sent with Wireshark) and I *think* I'm only getting the read requests, but something else has stopped (still investigating exactly what that is). The bus is reset again (the 125 us interrupts come back on the right) and the whole process repeats until the next time an error occurs:

When I try to send data every microframe, it stalls for every 8 packets received, but as I slow it down, the number of packets between stalls becomes more variable.
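To work out which sources are behind each pulse on the global USB interrupt trace (including the unidentified third one), one option is to count the pending flags each time the handler is entered. This is only a sketch: the struct and function names are hypothetical, and the bit positions are the OTG GINTSTS layout as I understand it (they should match the USB_OTG_GINTSTS_* definitions in the device header). The 125 us interrupts are presumably SOF.

```c
#include <stdint.h>

/* GINTSTS bit positions (assumed to match USB_OTG_GINTSTS_* in the CMSIS header). */
#define GINTSTS_SOF      (1u << 3)   /* start of (micro)frame        */
#define GINTSTS_RXFLVL   (1u << 4)   /* Rx FIFO non-empty            */
#define GINTSTS_USBSUSP  (1u << 11)  /* suspend detected             */
#define GINTSTS_USBRST   (1u << 12)  /* bus reset                    */
#define GINTSTS_ENUMDNE  (1u << 13)  /* enumeration done             */
#define GINTSTS_IEPINT   (1u << 18)  /* IN endpoint interrupt        */
#define GINTSTS_OEPINT   (1u << 19)  /* OUT endpoint interrupt       */

/* Hypothetical counters, one per interrupt source of interest. */
typedef struct {
    uint32_t sof, rxflvl, usbsusp, usbrst, enumdne, iepint, oepint;
} irq_counts_t;

/* Increment a counter for every source flagged in a GINTSTS snapshot. */
void count_irq_sources(uint32_t gintsts, irq_counts_t *c)
{
    if (gintsts & GINTSTS_SOF)     c->sof++;
    if (gintsts & GINTSTS_RXFLVL)  c->rxflvl++;
    if (gintsts & GINTSTS_USBSUSP) c->usbsusp++;
    if (gintsts & GINTSTS_USBRST)  c->usbrst++;
    if (gintsts & GINTSTS_ENUMDNE) c->enumdne++;
    if (gintsts & GINTSTS_IEPINT)  c->iepint++;
    if (gintsts & GINTSTS_OEPINT)  c->oepint++;
}
```

On the target this would be fed something like USBx->GINTSTS & USBx->GINTMSK at the top of the OTG interrupt handler, and the counters inspected in a debugger after a stall.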

DrWidget · ‎2024-04-03

I made some further progress with this and now I can confidently say the issue is firmware, not hardware. I'm mostly detailing this for the other people I've seen that have run into this error message, but I do still have a question at the bottom. :)

I know that in USB everything is driven by the host. For a device to transmit to the host, it must first wait for a read packet to come in. The goal of our project is to stream real-time data to the PC, so what I was attempting to do above was send out data at a regular interval: a timer takes a snapshot of our data, puts it into a byte array, and passes it to our transmit function:

USBD_StatusTypeDef  USBD_CUSTOM_TransmitD2H(USBD_HandleTypeDef *pdev, uint8_t* buf, uint16_t length, uint8_t interface)
{
  // Default to failure so an unhandled interface never returns an uninitialized status
  USBD_StatusTypeDef usb_status = USBD_FAIL;
  pdev->ep_in[USB_I0_CUSTOM_EPD2H_ADDR & 0xFU].total_length = length;

  if (interface == 0)
  {
    // Start the IN transfer on the device-to-host endpoint
    usb_status = USBD_LL_Transmit(pdev, USB_I0_CUSTOM_EPD2H_ADDR, buf, length);
  }
  else
  {
    // handle other interfaces here
  }
  return usb_status;
}

Digging further down from USBD_LL_Transmit to HAL_PCD_EP_Transmit to USB_EPStartXfer to USB_WritePacket, it appears that the data is copied directly into the Tx FIFO. However, all these functions always return HAL_OK, and I don't see anything along the way that checks that the Tx FIFO actually has space to receive the data. Until I dug into this, I had assumed that if there was a failure, these functions would give some indication, so that my code above USBD_CUSTOM_TransmitD2H could pause sending until space was available again.

As a temporary hack, I set a flag in HAL_PCD_IRQHandler (stm32f7xx_hal_pcd.c) on the transfer completed interrupt just before HAL_PCD_DataInStageCallback(hpcd, (uint8_t)epnum) and then my transmit function checks that flag. If it is true then I assume my last packet went out already so it's safe to send another (in which case I set the flag to false again). Since this is handling all the transmissions on my IN endpoint it is probably also getting set during the enumeration so when I run my PC code, it still gives me a single bus stall and reset, but after that first reset of the bus I can then run continuously for hundreds of thousands of packets with a packet going out on almost every microframe when I run the user code as a high-priority process on Windows. As a normal-priority process there are frequent millisecond gaps in the transmission of data, but it correctly throttles the transmission of data and no longer throws errors.
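The flag scheme described above can be sketched roughly as follows. This is a PC-runnable simulation, not the actual firmware: start_transfer() is a hypothetical stand-in for USBD_CUSTOM_TransmitD2H, and the flag is inverted relative to the description (it marks "busy" rather than "ready") so that the first packet after startup can go out without relying on an earlier completion.

```c
#include <stdbool.h>
#include <stdint.h>

/* True while a packet has been handed to the IN endpoint and has not completed. */
static volatile bool tx_busy = false;

/* Hypothetical stand-in for the real transmit call; it only counts calls. */
static unsigned packets_started = 0;
static void start_transfer(const uint8_t *buf, uint16_t len)
{
    (void)buf;
    (void)len;
    packets_started++;
}

/* Called from the periodic timer. Returns false when the previous
 * packet is still pending, so the caller skips this tick. */
bool try_transmit(const uint8_t *buf, uint16_t len)
{
    if (tx_busy) {
        return false;
    }
    tx_busy = true;
    start_transfer(buf, len);
    return true;
}

/* Called from the transfer-complete path for the IN endpoint. */
void on_transfer_complete(void)
{
    tx_busy = false;
}
```

If the timer and the USB interrupt run at different priorities, the check-and-set in try_transmit() would likely need to be protected (for example by briefly masking the USB interrupt), which this sketch leaves out.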

So in hindsight, I think (and someone please correct me if I'm wrong) that the error I saw in Wireshark was effectively telling the PC that the FIFO was full. Normally, if the PC were requesting the data first, it would be a failure on the PC side to read the data after it had requested it. But since this was more of a "push" approach, the PC simply wasn't keeping up, and particularly when running as a normal-priority process, a single delay from the OS running another process caused the FIFO to overfill quickly.

What I have now isn't a perfect solution, as I still get a single stall that I would like to avoid. Is there a function, a variable, or a register that I can poll to test whether the Tx FIFO has space available for my packet?
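One register that looks relevant is the per-endpoint DTXFSTS, whose INEPTFSAV field reports the free Tx FIFO space in 32-bit words; as far as I can tell, PCD_WriteEmptyTxFifo in the HAL compares against the same field. A minimal sketch of that comparison, written as a pure function so it can be tried off-target (the function name and mask macro are hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* INEPTFSAV occupies the low 16 bits of DTXFSTS and counts free 32-bit words
 * (assumed to match USB_OTG_DTXFSTS_INEPTFSAV in the CMSIS header). */
#define INEPTFSAV_MASK 0xFFFFu

/* Returns true when a packet of len_bytes fits in the free FIFO space
 * reported by a DTXFSTS snapshot. */
bool tx_fifo_has_space(uint32_t dtxfsts, uint32_t len_bytes)
{
    uint32_t free_words   = dtxfsts & INEPTFSAV_MASK;
    uint32_t needed_words = (len_bytes + 3u) / 4u;  /* round up to whole words */
    return free_words >= needed_words;
}
```

On the target the call would look something like tx_fifo_has_space(USBx_INEP(epnum)->DTXFSTS, length). Free FIFO space alone may not prove that the previous transfer on the endpoint has completed, so this is probably best combined with a transfer-complete check.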

FBL · ‎2024-04-04

Hi @DrWidget

Maybe you could configure TXFELVL so that the TXFE interrupt triggers when the FIFO is completely empty. This way, you only write to the FIFO when you know there is space available, which should help avoid the stalls.

USB_OTG_GlobalTypeDef *USBx = hpcd->Instance;
// TXFELVL is a GAHBCFG bit; when set, TXFE indicates a completely empty Tx FIFO
USBx->GAHBCFG |= USB_OTG_GAHBCFG_TXFELVL;

// TXFE interrupt service routine (illustrative name: TXFE is reported through
// the global OTG interrupt and DIEPINT rather than a dedicated vector)
void OTG_TXFE_IRQHandler(void) {
    // Check which endpoint triggered the interrupt
    if (USBx_INEP(epnum)->DIEPINT & USB_OTG_DIEPINT_TXFE) {
        // Clear the TXFE interrupt
        USBx_INEP(epnum)->DIEPINT = USB_OTG_DIEPINT_TXFE;
        
        // Transmit data if there is data waiting to be sent
        TransmitDataIfAvailable();
    }
}

DrWidget · ‎2024-04-09

Looks like that interrupt is already being used in stm32f7xx_hal_pcd.c to call PCD_WriteEmptyTxFifo. I did get a slightly cleaner result by putting my transmit routines inside the DataIn callback. Once a read occurs, the DataIn callback gets called and I can trigger another packet to be sent without any overflows. Still trying to figure out all the layers of this USB stack...
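For anyone landing here later, the DataIn-driven approach can be sketched as a small queue: the timer only enqueues snapshots, and the DataIn callback starts the next transfer once the previous one has completed. This is a PC-runnable simulation under assumptions: PKT_SIZE, QUEUE_LEN, and all function names are hypothetical, and start_transfer() stands in for the real transmit call.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PKT_SIZE  64u   /* hypothetical packet size in bytes */
#define QUEUE_LEN 8u    /* hypothetical queue depth          */

static uint8_t queue[QUEUE_LEN][PKT_SIZE];
static volatile uint32_t head = 0;        /* next slot the timer writes  */
static volatile uint32_t tail = 0;        /* slot currently being sent   */
static volatile bool in_flight = false;   /* a transfer is outstanding   */

/* Hypothetical stand-in for the real transmit call; records what was sent. */
static unsigned sent_count = 0;
static uint8_t last_first_byte = 0;
static void start_transfer(const uint8_t *buf, uint16_t len)
{
    (void)len;
    sent_count++;
    last_first_byte = buf[0];
}

/* Start the oldest queued packet if the endpoint is idle. */
static void send_next_if_idle(void)
{
    if (in_flight || head == tail) {
        return;
    }
    in_flight = true;
    start_transfer(queue[tail % QUEUE_LEN], (uint16_t)PKT_SIZE);
}

/* Timer side: copy a snapshot into the queue; returns false if it is full. */
bool on_timer_snapshot(const uint8_t *data)
{
    if (head - tail == QUEUE_LEN) {
        return false;  /* host is not keeping up: drop this snapshot */
    }
    memcpy(queue[head % QUEUE_LEN], data, PKT_SIZE);
    head++;
    send_next_if_idle();
    return true;
}

/* DataIn callback side: release the finished slot and chain the next packet. */
void on_data_in_complete(void)
{
    if (!in_flight) {
        return;  /* completion not caused by this queue */
    }
    tail++;
    in_flight = false;
    send_next_if_idle();
}
```

The slot being sent is only released after completion, so the buffer stays valid for the whole transfer. When the host stalls, snapshots are dropped at the queue instead of being pushed into a full FIFO.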