2022-08-30 02:55 PM
I am using USB for both MSC (removable drive) and CDC (virtual COM port).
MSC works perfectly.
CDC out has been working perfectly for ~2 years.
CDC in had never really been tested, and ~ 1 byte every 20 is corrupted.
I am running CDC to Teraterm on the PC, and the out mode has been used for debugs.
The in mode gets activated of you press keys on the keyboard.
I have a simple RTOS task which copies in data to output. That is where I see the corruption.
It is entirely fixable by disabling certain RTOS tasks! But the ones which "fix" it don't do anything related to USB. I am thus suspecting there is a critical region which is unprotected and RTOS related task switching messes it up.
These functions should be well known because it is generated by Cube MX. I don't use MX (the code generated tends to work but is bloated junk and often works by luck because the cloated code covers up timing issues) but it was used when this project was being created ~ 3 years ago by someone else.
2022-09-01 04:55 AM
> I am not using OTG
By "OTG hardware" I don't mean "hardware which accomplishes the OTG functionality", rather, "the Synopsys USB IP module, which ST calls OTG_FS and OTG_HS".
I also wrote:
>> You definitively need to instrument it (i.e. modify) to be able to debug this, as all data for all endpoints go through that function. Luckily you deal with one very particular endpoint so that's your starting point.
> I wonder if that ST code was ever supposed to work in both directions?
Maybe yes. I don't know, as I've said, I don't care, I write my own code, in my own way, in my own environment (read: no RTOS). This is the same as I've told you when you complained about lack of "public wisdom" about ETH/lwip: these are very complex modules with complex interactions with environment, so most knowledge except for the very lowest level is very circumstance-specific thus very hard to specify, and may not apply to your particular setup anyway.
Btw. upstream works in the same way as downstream (just in the opposite order), i.e. you tell the OTG module only the fact that you are going to transmit so and so data on that and that endpoint and leave it so, and then the module throws an interrupt and in that interrupt are you supposed to enter the data. So it may be that the "transmit" functions you've mentioned don't actually move data. As I've said, that issue with FIFO may not be your problem.
JW
2022-09-01 06:17 AM
Tracing the TX code it indeed looks like the TX path just sets up some memory values, and does not do any obvious hardware interaction, so something is monitoring those values (struct members) and kicks off the TX.
It is like ETH, where the 32F4 ETH controller is continually monitoring some pointers in a linked list of packets, to see if any new ones have been left there. I spent a whole load of time there too but that now seems to be all working fine. (Actually I have some questions on that which I will post separately).
The TX path is
CDC_Transmit_FS
USBD_CDC_TransmitPacket
USBD_LL_Transmit
HAL_PCD_EP_Transmit
and in there, just some hpcd-> value gets set
so "something else" is monitoring the memory. And presumably enabling the interrupt you mention.
I would happily test out suggestions for any low hanging fruit but I am out of ideas : - )
I doubt the RTOS is at all relevant, except to the extent of exposing an issue via timing being "just right".
2022-09-01 09:31 AM
As I've said, the lowest I see hanging is USB_ReadPacket().
JW
2022-09-05 02:12 PM
It is hard to debug this one because MSC shares half the functions in the function stack with CDC, and windows is polling the MSC device at about 1Hz.
If there was a way to disable the MSC polling then I could trace the CDC data through.
2022-09-05 03:24 PM
Even a single fact that the ST's USB stack does everything in interrupt context is enough for me to never use it. And the malloc() call is the best of it all! Knowing all of it and the great ST's code quality and team's competence level, when I needed an USB stack, I just went strait to TinyUSB. Integrating it in my project was a breeze and it works reliably. Of course, I cannot stop you from enjoying the fun of fighting against another broken bloatware from ST, but just in case:
https://github.com/hathach/tinyusb/tree/master/examples/device/cdc_msc_freertos
2022-09-05 03:28 PM
Unfortunately no way to suspend poling in Windows. For this you need your own host which you can modify (Linux or another MCU with USB). Try to make a faster trace, to ITM or just write to RAM buffer.
2022-09-05 11:14 PM
I don't have any heap usage (stupid idea in embedded work) in my project, except
I don't know why TinyUSB was not used for this project when it was started (not by me) but probably because it was started c. 2018 and TinyUSB wasn't around then, looking at the project dates. The guy who did it used the Cube MX "code generator" to generate the skeleton code. Like many others, he expected ST code to work!
MSC is rock solid and fast. It is just CDC "out" (PC to target) where I have a problem.
I will try ITM debug. Should be fast enough; use it already. Thanks. Problem is I don't really understand where to look. The structure (no pun intended) of that stack is right on the limit of my C competence. ST just love a struct for everything. Probably packet buffers in those two IRQ handlers.
2022-09-06 02:26 AM
The 1st keystroke, always corrupted in CDC_get_rx_bytes, is delivered correctly into fred1234[0] here. I trigger the breakpoint, line 984, with "3" on the terminal, and this enables it to isolate the issue
This is CDC_get_rx_bytes:
It is all the stuff in between where I struggle to find the data. Various buffers get passed by address.
AFAICT the data (the byte, and count=1) is right oin here (called from an ISR) and USBD_CDC_DataOut returns the right data also.
2022-09-06 05:21 AM
OK, found it and fixed it. It was junk in CDC_Receive_FS().
Various junk, but most notably it was enabling reception of next packet before reading out the previous one. Most of the time this would work (and did) unless the RTOS got in just at the wrong point and delayed things just long enough.
Final code:
One ISR loads data into a 64 byte linear buffer and another ISR calls (eventually) CDC_Receive_FS() where in the above example this buffer gets copied into a 1024 byte circular buffer. This is the flow:
"Even a single fact that the ST's USB stack does everything in interrupt context is enough for me to never use it."
On the one hand you have a go at me for polling ETH RX and on the other you say the above.
In the end it turned out that the USB CDC "OUT" ISR chain was delivering the right data all along.
My criticism of "HAL" is that it has poor granularity. All the code is basically bloated. But it shows up especially in the ISRs, which people rarely look at because by their nature they are convoluted, and interrupt generation (IP bits) behaviour is often nontrivial. ISR-driven software tends to be convoluted anyway. It is only when one starts to step through one of the ex-MX generated ISRs that one sees just how bad it is. It is stepping through a large number of possible int sources (most of which are not enabled anyway). They get away with this at 168MHz and 1 clock per cycle... You would never write ISRs that way yourself; it would do only what is needed and then gets out of there.
Also a lot of the code comments are simply wrong - probably due to the coder not speaking English.
2022-09-06 11:35 AM
CDC_Receive_FS() is apparently your code. If it originated from CubeMX-generated, it would be interesting to see what exactly did CubeMX generate.
JW