I too am using a Connectivity Line part (a '105) for USB comms, and have had to face similar problems. (By the way, I'm assuming that you're using STM32_USB-FS-Device_Lib_V3.2.1, circa 07/05/2010?)
My project is a commercial one, using the CDC (aka Virtual COM Port) profile, and I was not happy with the standard Windows driver – throughput seemed poor – so we plumped for the Thesycon driver – not cheap!
Whilst this now appears to allow up to the full theoretical bandwidth for a Bulk Transfer (which is what CDC uses underneath) – ~2..3 Mbps – it brings its own problems with ST's stack! Among other things, we found a bug regarding Zero Length Packets (and a cure – ST have been made aware of it, and promise that it will go into the next library update, real soon now!).
Anyway, back to your exact question.
I started integrating USB into our existing project by using ST’s VCP demo, then modifying it as needed.
They service each incoming packet in situ within EP3_OUT_Callback() – they read it into a local buffer, USB_Rx_Buffer, via a call to USB_SIL_Read(), with USB_Rx_Cnt being set to the number of bytes received (up to the maximum of 64). As far as I can see, this is the least that the callback has to do, otherwise the USB hardware will remain tied up indefinitely, with further packets from the host being NAKed.
They then (somewhat simplistically) send it (the whole contents of their local buffer) to the UART, via a call to USB_To_USART_Send_Data(). As they put it, “USB data will be immediately processed, this allow next USB traffic beeing NAKed till the end of the USART Xfet” [sic].
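For reference, the shape of the demo's callback is roughly as below. This is a sketch from memory, not a verbatim copy of ST's code – the hardware-facing functions (USB_SIL_Read(), USB_To_USART_Send_Data()) are stubbed out here so the sketch is self-contained, but the names and the call sequence are those of the VCP demo:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define VIRTUAL_COM_PORT_DATA_SIZE 64

/* Stand-in "packet from the host", so the stubs have something to return. */
static uint8_t  host_packet[VIRTUAL_COM_PORT_DATA_SIZE] = "hello";
static uint32_t host_packet_len = 5;

uint8_t  USB_Rx_Buffer[VIRTUAL_COM_PORT_DATA_SIZE];
uint32_t USB_Rx_Cnt;

/* Stub: the real USB_SIL_Read() copies the endpoint's packet memory into
   the supplied buffer and returns the byte count received. */
static uint32_t USB_SIL_Read(uint8_t ep, uint8_t *dst)
{
    (void)ep;
    memcpy(dst, host_packet, host_packet_len);
    return host_packet_len;
}

/* Stub: the real USB_To_USART_Send_Data() BLOCKS, polling the UART for
   every byte -- which is the problem, because the caller is (indirectly)
   inside the USB interrupt. */
static void USB_To_USART_Send_Data(uint8_t *buf, uint32_t len)
{
    fwrite(buf, 1, len, stdout);
}

/* The demo's OUT-endpoint callback: read the packet, then push it all
   to the UART before returning. */
void EP3_OUT_Callback(void)
{
    USB_Rx_Cnt = USB_SIL_Read(0x03 /* EP3_OUT */, USB_Rx_Buffer);
    USB_To_USART_Send_Data(USB_Rx_Buffer, USB_Rx_Cnt);
}
```

(On non-CL parts the demo also re-arms the endpoint at the end of the callback; on the '105 the OTG core's flow differs.)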
The trouble is that this is a blocking function: it 'baby-sits' the USART until the buffer has emptied. But remember, this call is within a callback that has itself been (indirectly) called from a USB ISR, so effectively it's still inside an interrupt – it maxes out the CPU! (You won't notice this if you're only typing data into a terminal emulator, though.)
I changed this so that the callback still immediately transfers the whole of the latest packet (in order to free up the hardware ASAP), but into my own (larger) FIFO (circular buffer). Foreground code then 'consumes' the FIFO's contents at its own pace – I don't even need a separate flag: the FIFO's head and tail pointers are just inspected, and if they differ, there's content to consume. Clearly, there's a separate issue of matching the average 'producer' and 'consumer' rates, otherwise the FIFO will overflow, but our specific higher-level protocol sorts that out – we never 'stream' data continuously, so the FIFO's size is chosen to suit our protocol's 'blockiness'.
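A minimal sketch of that FIFO follows. The names and the size are my own, not ST's – the only thing taken from my scheme is the single-producer/single-consumer discipline: only the callback writes the head, only foreground code writes the tail, so no locking is needed. The size is a power of two so the index wrap is a cheap mask; as noted above, the FIFO deliberately doesn't check for overflow, since in our case the higher-level protocol prevents it (you may want a check in yours):

```c
#include <stdint.h>

#define USB_FIFO_SIZE 1024u           /* must be a power of two */
#define USB_FIFO_MASK (USB_FIFO_SIZE - 1u)

static volatile uint8_t  usb_fifo[USB_FIFO_SIZE];
static volatile uint32_t fifo_head;   /* written only by the callback (ISR context) */
static volatile uint32_t fifo_tail;   /* written only by foreground code            */

/* Called from within EP3_OUT_Callback(), straight after USB_SIL_Read(),
   so the endpoint hardware is freed as soon as possible. */
void usb_fifo_put(const uint8_t *pkt, uint32_t len)
{
    while (len--)
        usb_fifo[fifo_head++ & USB_FIFO_MASK] = *pkt++;
    /* No overflow check: our higher-level protocol guarantees the
       foreground consumer keeps up over any burst. */
}

/* Polled from the foreground loop: head != tail means data is waiting.
   Copies out up to 'max' bytes and returns how many were copied. */
uint32_t usb_fifo_get(uint8_t *dst, uint32_t max)
{
    uint32_t n = 0;
    while (n < max && fifo_tail != fifo_head)
        dst[n++] = usb_fifo[fifo_tail++ & USB_FIFO_MASK];
    return n;
}
```

Note that head and tail are free-running counters, masked only on access; since 2^32 is a multiple of the (power-of-two) FIFO size, the wrap-around stays consistent, and `head != tail` is the complete "data available" test.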
This appears to be more in line with what you want (and should apply regardless of which USB profile you're using – the technique of calling USB_SIL_Read() within the callback should be universal?).
NB whilst you might be tempted to call USB_SIL_Read() from foreground code, I think that this might be dangerous – ST’s stack doesn’t seem to be ‘robust’! At least calling it from within the callback guarantees that it’s directly in response to a hardware interrupt, rather than some asynchronous event in your code.
Regarding the USB stack's robustness (or lack of it!), I have a separate discussion about how slowly one can clock an STM32 and still get correct enumeration. Theoretically, as long as SYSCLK is either 72 or 48 MHz it should work, no matter what (lower) frequency HCLK is. However, I've discovered that anything less than 72 MHz (for both of them) is 'flaky'. I cannot prove it, but I suspect there's a race condition within their stack which, combined with Windows' variable latency, causes hit-and-miss behaviour. Alas, to date, no-one has replied. Out of interest, what clock frequencies are you running your CL at?
Also, I have similar problems to you (with “interrupt crap storms”) when sending data to a PC with an app that’s ‘not there anymore’. I await leppie’s update!
I hope this helps.