2010-09-29 08:39 PM
How to handle too much USB receive data on connectivity devices
2011-05-17 05:09 AM
Hi duncan,
I too am using a CL ('105) for USB comms, and have had to face similar problems (by the way, I’m assuming that you’re using STM32_USB-FS-Device_Lib_V3.2.1 (circa 07/05/2010)?). My project is a commercial one, using the CDC (aka Virtual COM Port) profile, and I was not happy with the standard Windows driver – throughput seems to be poor – so we’ve plumped for using the Thesycon driver – not cheap!
Whilst this now appears to allow up to the full theoretical bandwidth for a Bulk Transfer (which is what a CDC profile is using underneath) - ~2..3 Mbps - this brings its own problems with ST’s stack! This includes us finding a bug regarding Zero Length Packets (and a cure – ST have been made aware of it, and promise that it will go in the next library update, real soon now!).
Anyway, back to your exact question.
I started integrating USB into our existing project by using ST’s VCP demo, then modifying it as needed.
They service each incoming packet in situ within EP3_OUT_Callback() – they read it into a local buffer, USB_Rx_Buffer, by the call to USB_SIL_Read(), with USB_Rx_Cnt being set to the number of bytes received (up to the maximum of 64). As far as I can see, this is the least that the callback has to do, otherwise the USB hardware will remain tied up indefinitely, with further packets from the host being NAKed.
They then (somewhat simplistically) send it (the whole contents of their local buffer) to the UART, via a call to USB_To_USART_Send_Data(). As they put it, “USB data will be immediately processed, this allow next USB traffic beeing NAKed till the end of the USART Xfet” [sic].
The trouble is, this is a blocking function, and will ‘baby-sit’ the USART until the buffer has emptied, but remember, this call is within a callback, that itself has been (indirectly) called from a USB ISR, so effectively it’s still inside an interrupt – it maxes-out the CPU! (this won’t be noticed if only typing data into a terminal emulator though).
I changed this to still immediately transfer the whole of the latest packet within the callback (in order to free up the hardware ASAP), but to my own (larger) FIFO (circular buffer). Subsequently, foreground code then ‘consumes’ the FIFO's contents at its own pace – I don’t even need a separate flag – the head and tail pointers of the FIFO are just inspected, and if they’re different, then there’s FIFO contents to consume. Clearly, there’s a separate issue of average ‘producer’ and ‘consumer’ rates being matched, otherwise the FIFO will overflow, but our specific higher-level protocol sorts that out – we never ‘stream’ data continuously, so the FIFO's size is chosen according to our protocol's 'blockiness'.
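For anyone wanting to try the same thing, here is a minimal sketch of that kind of FIFO, written from the description above rather than taken from my project. The names (rx_fifo_t, fifo_put_packet() and so on) and the 256-byte size are hypothetical. It assumes exactly one producer (the USB callback) and one consumer (foreground code) on a single-core part, which is why plain volatile indices are used and no separate flag is needed.

```c
#include <stdint.h>

#define RX_FIFO_SIZE 256u  /* hypothetical size; must be a power of two */

typedef struct {
    uint8_t data[RX_FIFO_SIZE];
    volatile uint32_t head;  /* written only by the producer (USB callback) */
    volatile uint32_t tail;  /* written only by the consumer (foreground)   */
} rx_fifo_t;

/* Number of bytes waiting to be consumed. */
static uint32_t fifo_used(const rx_fifo_t *f)
{
    return (f->head - f->tail) & (RX_FIFO_SIZE - 1u);
}

/* Free space. One slot is kept empty so head == tail always means "empty". */
static uint32_t fifo_free(const rx_fifo_t *f)
{
    return (RX_FIFO_SIZE - 1u) - fifo_used(f);
}

/* Producer side: store a whole packet or nothing. Returns 1 on success. */
static int fifo_put_packet(rx_fifo_t *f, const uint8_t *src, uint32_t len)
{
    uint32_t head = f->head;
    uint32_t i;

    if (len > fifo_free(f)) {
        return 0;  /* would overflow; caller decides what to do */
    }
    for (i = 0; i < len; i++) {
        f->data[head] = src[i];
        head = (head + 1u) & (RX_FIFO_SIZE - 1u);
    }
    f->head = head;  /* publish only after the data is in place */
    return 1;
}

/* Consumer side: fetch one byte if there is one. Returns 1 on success. */
static int fifo_get(rx_fifo_t *f, uint8_t *out)
{
    if (f->head == f->tail) {
        return 0;  /* indices equal: nothing to consume */
    }
    *out = f->data[f->tail];
    f->tail = (f->tail + 1u) & (RX_FIFO_SIZE - 1u);
    return 1;
}
```

The callback would call fifo_put_packet() with the packet it has just read, and the foreground loop would call fifo_get() until it returns 0. What to do when fifo_put_packet() returns 0 is exactly the producer/consumer rate question mentioned above; this sketch only reports it.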
This appears to be more in line with what you want (and should apply regardless of what USB profile you’re using – the technique of calling USB_SIL_Read() within the callback should be universal?).
NB whilst you might be tempted to call USB_SIL_Read() from foreground code, I think that this might be dangerous – ST’s stack doesn’t seem to be ‘robust’! At least calling it from within the callback guarantees that it’s directly in response to a hardware interrupt, rather than some asynchronous event in your code.
Regarding the USB stack’s robustness (or lack of it!), I have a separate discussion about how slowly one can clock a STM32 and still get correct enumeration – theoretically, as long as SYSCLK is either 72 or 48 MHz, it should work (no matter what (lower) frequency HCLK is). However, I’ve discovered that anything less than 72 MHz (for both of them) is ‘flaky’. I cannot prove it, but I think that there may be a race condition within their stack which, when combined with Windows’ variable latency, causes hit and miss behaviour. Alas, to date, no-one has replied. Out of interest, what clock frequencies are you running your CL at?
Also, I have similar problems to you (with “interrupt crap storms”) when sending data to a PC with an app that’s ‘not there anymore’. I await leppie’s update!
I hope this helps.
Regards, Mike.
2011-05-17 05:09 AM
Thanks for your comprehensive reply. I am basically doing what you are, but USB to CAN. I do exactly as you say: read data in the callback and put it into a big FIFO. I do have a requirement to stall the USB when my FIFO is full. The data rate on CAN is much slower than USB, so almost all large writes to the device get stalled waiting to make it onto the CAN bus.
I have been stepping through the ST library today looking at how it works. I see from the ref manual that the EP is disabled in HW upon an rx complete. It remains disabled until a call to USB_SIL_Write (one of its sub-functions). I am also quite bothered by the fact that USB_SIL_Read actually sets the interrupt flag, re-enabling the endpoint.
I am less than impressed with this library so far, with its poor documentation and over-abbreviated code! I see no provision in it to handle data only at the rate you can consume it without doing all the work under interrupt context.
I have no choice but to work out a way to do this. The answer lies in when the EP is disabled and re-enabled. I am also worried that if I don't read from the EP during the callback, another EP will have an rx and overwrite the internal USBD_Data_Buffer, which needs to remain intact until I can use it.
Can you go into more detail about exactly how you deal with data arriving on USB faster than you can send it? Eventually your rx buffer must fill and require you to hold off USB receive until you can deal with it again...
Thanks again,
Ashley
2011-05-17 05:09 AM
OK, I have this dialed now!!! Firstly I just want to thank ST for making the most amazingly unintuitive USB stack, but in saying that, it does actually work. Here is how to handle USB bulk transfer data on a connectivity device where the USB data reception rate must be limited to prevent internal buffer overruns... The USB FS stack does not allow for this internally, but they have given all the tools to do it. The problem with the stack is that they make you do a buffer read in the callback. All their buffer read functions re-enable the endpoint right there and then. But we don't want it re-enabled, well, not until we have handled the data. I have attached some code. Some of it is specific to my project but easily adapted to get the same result:
In the USB endpoint callback, we call our data handling function:
void EP2_OUT_Callback(void)
{
    // This occurs when a transfer complete interrupt is handled.
    // This is for each packet, not the full transfer. So, for a 1k transfer
    // this will be called when each 64 byte packet arrives.
    // *** THIS function is called under USB interrupt context ***
    // This function MUST remove the data from USBD_Data_Buf, otherwise it
    // could be overwritten by another endpoint's receive.
    // Note that the endpoint is disabled in hardware upon transfer complete.
    // Only this EP is disabled, so others could still overwrite the USBD_Data_Buf.
    // We must only re-enable the endpoint after all data has been handled to
    // ensure the EP stays in a NAK state until we are ready.
    // Warning: USB_SIL_Read() and PCD_EP_Read() both re-enable the interrupt, so
    // we can't use them.
    // The function below implements a custom read function to read from the EP
    // without re-enabling it. It is only re-enabled once data has been handled.
    USBBlockArrived();
}
In our data handling function we do this:
void USBBlockArrived(void)
{
    CopyUSBRxDataToRxBuffer();
    ProcessUSBReceivedBlock();
}

// This one is just a USB_SIL_Read, with the endpoint enable removed
void CopyUSBRxDataToRxBuffer(void)
{
    USB_OTG_EP *ep;
    uint32_t i = 0;

    // This part is from USB_SIL_Read()
    // Get the structure pointer of the selected Endpoint
    ep = PCD_GetOutEP(EP2_OUT & 0x7F);

    // Get the number of received data
    uint32_t dataLength = ep->xfer_len;

    // This part copied from PCD_EP_Read()
    // Copy received data into application buffer
    g_USBRxDataBufFree = FALSE;
    for (i = 0; i < dataLength; i++)
    {
        g_USBRxDataBuf[i] = ep->xfer_buff[i];
    }
    g_USBRxDataBufSize = dataLength;

    // Setup for the Xfer
    ep->xfer_buff = g_USBRxDataBuf;
    ep->xfer_len = dataLength;
    ep->xfer_count = 0;
    ep->is_in = 0;
    ep->num = EP2_OUT & 0x7F;
}
The actual data processing happens in here:
// *** This function can be called under USB interrupt context ***
void ProcessUSBReceivedBlock(void)
{
    // TODO:
    // Need to protect the tx fifo as will be accessed under interrupt
    // Need a way to check that the EP interrupt never happens while the rx buffer is not free.
    // Tidy up tx fifo protected access functions
    // Ensure protected access functions are used everywhere
    // Disable EP interrupt, or complete USB int? Probably just EP mask as that is the only int that can mess things up...

    // if(room in our tx buffer && usb rx buffer full)
    if (!g_USBRxDataBufFree && (GetRoomInTransmitFifo() >= TX_FIFO_HEADROOM))
    {
        // Add the USB received data to our transmit FIFO or buffer here
        g_USBRxDataBufFree = TRUE;
        EnableUSBRxEndpoint();
    }
}
Then we can reenable the endpoint:
void EnableUSBRxEndpoint(void)
{
    USB_OTG_EP *ep;

    ep = PCD_GetOutEP(EP2_OUT & 0x7F);
    OTGD_FS_EPStartXfer(ep);
}
So, this works mint while our transmit buffer has room, but eventually the transmit buffer will fill up, leaving the USB endpoint blocked. So, in main code, we need to keep checking for room in the transmit buffer. When there is room, we use the last received data, re-enable the endpoint and life goes on...
void DoUSBReceive(void)
{
    // Disable USB interrupts here
    HW_SetUSBOTGInterrupts(DISABLE);
    // We disable all USB interrupts while doing this as we don't want USB calling this function
    // while we are working on the rx buffer. Disable all USB ints as it may be risky just to
    // disable the EP tx complete interrupt. There seems to be a required order for interrupt
    // handling of fifo level and tx complete. Disabling only one could mess up the order...
    ProcessUSBReceivedBlock();
    HW_SetUSBOTGInterrupts(ENABLE);
}

void mainloop(void)
{
    // Called repeatedly from main code, alongside whatever else the loop does
    DoUSBReceive();
}
To enable/disable interrupts:
void HW_SetUSBOTGInterrupts(FunctionalState state)
{
    if (state == ENABLE)
    {
        NVIC->ISER[OTG_FS_IRQn >> 0x05] = (uint32_t)0x01 << (OTG_FS_IRQn & (uint8_t)0x1F);
    }
    else
    {
        NVIC->ICER[OTG_FS_IRQn >> 0x05] = (uint32_t)0x01 << (OTG_FS_IRQn & (uint8_t)0x1F);
    }
}
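In case the shifts and masks in that function look cryptic: each NVIC ISER/ICER register holds 32 interrupt bits, so the IRQ number is split into a register index (IRQn / 32) and a bit position (IRQn % 32). Writing a 1 sets or clears just that interrupt, so no read-modify-write is needed. The helper names below are made up for illustration, and the value 67 for OTG_FS_IRQn is my assumption from the connectivity-line device header; check your own stm32f10x.h.

```c
#include <stdint.h>

/* Index of the 32-bit ISER/ICER word that holds this IRQ's bit. */
static uint32_t nvic_word_index(uint32_t irqn)
{
    return irqn >> 5;                       /* same as irqn / 32 */
}

/* Mask selecting this IRQ's bit within that word. */
static uint32_t nvic_bit_mask(uint32_t irqn)
{
    return (uint32_t)1u << (irqn & 0x1Fu);  /* same as 1 << (irqn % 32) */
}
```

With an IRQ number of 67 this selects word 2 and bit 3, i.e. a mask of 0x00000008 written to ISER[2] or ICER[2].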
I have tested this code reasonably well and it works perfectly in my application. It would be good for any application sending USB to a slower communications channel (e.g. UART) or writing to slow memory etc... This is actually how it is done on non-connectivity devices, but instead of enabling/disabling endpoints you can use FreeUserBuffer after you have handled the data...
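If you want to convince yourself of the hold-off logic without hardware, the scheme can be modelled on a PC. This is only a sketch: bridge_t and the bridge_* functions are hypothetical stand-ins for the globals and functions above (staging_free for g_USBRxDataBufFree, ep_armed for the endpoint being enabled), the 256-byte transmit capacity is invented, and the "slow channel" is just a counter.

```c
#include <stdint.h>
#include <string.h>

#define PKT_MAX          64u
#define TX_FIFO_CAPACITY 256u      /* hypothetical size of the transmit FIFO */
#define TX_FIFO_HEADROOM PKT_MAX   /* room needed before accepting a packet  */

typedef struct {
    uint8_t  staging[PKT_MAX];  /* stands in for g_USBRxDataBuf               */
    uint32_t staging_len;
    int      staging_free;      /* stands in for g_USBRxDataBufFree           */
    int      ep_armed;          /* 1: endpoint takes a packet, 0: it NAKs     */
    uint32_t tx_used;           /* bytes queued for the slow channel          */
} bridge_t;

static void bridge_init(bridge_t *b)
{
    memset(b, 0, sizeof *b);
    b->staging_free = 1;
    b->ep_armed = 1;
}

/* Shared by the "interrupt" and the "main loop", like ProcessUSBReceivedBlock(). */
static void bridge_process(bridge_t *b)
{
    if (!b->staging_free &&
        (TX_FIFO_CAPACITY - b->tx_used) >= TX_FIFO_HEADROOM) {
        b->tx_used += b->staging_len;  /* queue the staged packet           */
        b->staging_free = 1;
        b->ep_armed = 1;               /* re-arm only once data is handled  */
    }
}

/* Host attempts one packet. Returns 1 if accepted, 0 if it would be NAKed. */
static int bridge_host_send(bridge_t *b, const uint8_t *pkt, uint32_t len)
{
    if (!b->ep_armed || len > PKT_MAX) {
        return 0;
    }
    b->ep_armed = 0;                   /* EP is disabled on transfer complete */
    memcpy(b->staging, pkt, len);
    b->staging_len = len;
    b->staging_free = 0;
    bridge_process(b);                 /* what the OUT callback does          */
    return 1;
}

/* Slow channel sends n bytes, then the main loop polls for room. */
static void bridge_drain(bridge_t *b, uint32_t n)
{
    b->tx_used -= (n > b->tx_used) ? b->tx_used : n;
    bridge_process(b);
}
```

In this model the host can push packets until the transmit side is full, one more packet then sits in the staging buffer, and every further attempt is refused until bridge_drain() has freed at least TX_FIFO_HEADROOM bytes.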
On a side note, WinUSB provides a free driver that can achieve full data rates using the above code. You can't do VCP with WinUSB, but there is no need, as your application can just read and write the pipes directly without the VCP complexity (unless you want all the modem-type functionality).
Cheers to the people who helped me get this sorted...
2011-05-17 05:09 AM
Sorry for resurrecting an old thread, but Duncan's work just helped me a lot. Thanks for that!
I had one issue with your code, though: you used uint32_t dataLength = ep->xfer_len; but in my system I sometimes received zero-length packets. I'm using CDC and a virtual COM port; zero-length packets never arrived using HyperTerminal, but consistently appeared using TeraTerm. When a zero-length packet arrives, ep->xfer_len returns 64 instead of zero. So I used uint32_t dataLength = ep->xfer_count; instead. This has been working so far. The same code is in the original ST library, as well. In my opinion this is a real bug in the ST sources. What do you think?
jens
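To make the difference concrete, here is a small sketch using a cut-down stand-in for the endpoint structure. mock_ep_t and received_length() are hypothetical names, and the meaning given to the two fields is only what I infer from the behaviour described above, not something taken from ST's documentation. The clamp to the buffer size is an extra safety measure of mine.

```c
#include <stdint.h>

/* Cut-down stand-in for the stack's endpoint structure; only the two
   fields discussed above are modelled. */
typedef struct {
    uint32_t xfer_len;    /* observed to read 64 even for a zero-length packet */
    uint32_t xfer_count;  /* observed to match the bytes actually received     */
} mock_ep_t;

/* Length to copy out of the endpoint buffer: use the received count and
   never exceed the size of the destination buffer. */
static uint32_t received_length(const mock_ep_t *ep, uint32_t buf_size)
{
    return (ep->xfer_count < buf_size) ? ep->xfer_count : buf_size;
}
```

With this, a zero-length packet gives a length of 0 and the copy loop simply does nothing, whereas reading xfer_len would have copied 64 stale bytes.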