2020-03-30 09:45 AM
Dear Ethernet experts,
ST prepares a fix for RX descriptors initialization issue reported here previously.
Please can you take a look?
https://github.com/STMicroelectronics/STM32CubeH7/issues/7
@Piranha @alister
Thanks in advance
-- pa
2020-03-30 12:08 PM
The DMA engine stops when it increments the current descriptor pointer to the tail descriptor pointer. But how the driver initialises the tail pointer depends on how the driver is intended to work.
When I posted the inspection and fixes/improvements to the driver at https://community.st.com/s/question/0D50X0000C6eNNSSQ2/bug-fixes-stm32h7-ethernet, I wasn't sure what the intention was. I changed the wild tail descriptor pointer assignment to zero because that was the least code and cycles to make the DMA engine never stop, and if my rx buffer pool ever became empty, I made the next buffer-free signal the rx thread to receive and rebuild the descriptors, and then RBU is needed if the rx buffer pool is still empty when the past packet is received before the last descriptor's done. Then I found even without running out of buffers, if descriptor_count * buffer_size can be close to a giant packet size (RM0433 naming), and more or less depending on interpacket gap, RBU is needed for that too.
The bit about receiving the last packet before the last descriptor is done requires some backstory. My project is memory constrained and its ethernet packets are small. So I defined a small rx buffer size to maximise the buffers. For testing I whipped the target with giant packets, so the received packets are long buffer-chains and the buffer handling needs to be robust, then to properly test, I reduced the number of buffers to minimum and then the number of descriptors to minimum... and then any problems would be quickly found.
This is what ST intended: They're initialising the tail pointer to the last descriptor because they know at initialisation the current descriptor will be the first and they're re-writing the tail pointer at the new last (i.e. furthest from current) every time they receive a packet. If descriptor_count * buffer_size can be close to a giant packet size, I expect ST's driver would need RBU handling too.
EDIT: There's an assumption in ST's intention that at init, the current rx descriptor points to the first (lowest address) descriptor. But if init's not explicitly resetting it, and especially if the buffer size is small, if the driver was later deinitialised and reinitialised, it'd RBU.
BTW a code snippet in a recent post to your bug in github, has volatile on the BackupAddr0 and BackupAddr1 members of ETH_DMADescTypeDef. That's a bug too because volatile on those only cost cycles. They're not accessed by the DMA engine (DSL says they're skipped).
2020-03-31 04:10 AM
Thank you Alister. So if I understand your comment correctly, this register can be set differently depending on the overall design of the driver, there's no single "correct answer". If so, fixing that issue is not meaningful without a proper fix of the whole ETH driver. Your comment basically explains why the ST examples seem to work (with luck) even without any fix.
-- pa
2020-03-31 10:00 AM
First someone in HAL team "fixes" the working code from v1.3.0 by adding braces in wrong places. We report the bug and offer 3 correct versions, including the old one from v1.3.0. ST have a link to the topic, but they are unable to read/understand it, therefore they try to reinvent the wheel by themselves. They threat this as some big issue and after four months come up with a... again a "fix" broken because of wrong braces. You explain it to them, they offer even more tragic "fix" and comment it with complete nonsense, which reveals that ST's developers don't know what a pointer arithmetic is at all. But they have no shame of writing this: "I hope this makes things clearer. Please do not hesitate if you still have questions about the suggested fix." Apparently because of a Dunning–Kruger effect. After another comments from you, they silently go and read what a pointer arithmetic is, and finally come to a correct solution, which is exactly the code I offered in a bug report topic. And that is one of the simplest and least important issues in that driver...
I'm sorry, but this is ridiculous. If they cannot calculate a single trivial pointer, then they are unable of making anything non-trivial, such as a stable, performing and flexible drivers.
2020-03-31 03:06 PM
>ST examples seem to work
Yes, assigning the tail pointer any value not in the descriptor list makes the DMA engine never stop until the next descriptor isn't ready. Then ST's driver would recover its intention when it received its first packet.
Our first experience with ST's driver was probably typical, e.g., a ping of default data size worked ok, so next job...
BTW almost the entire list of problems in the doc of my thread were found by code inspection only. It'd take much too long to find them by testting.
2020-04-21 08:21 PM
>The DMA engine stops when it increments the current descriptor pointer to the tail descriptor pointer.
Also, the DMA won't start if the current descriptor pointer = the tail descriptor pointer.
If not for this, a more correct implementation for ST would have been to initialize the tail as the first, because it's a ring and first is the greatest number of increments from first.
2021-11-02 12:10 PM
After updating to the HAL version with this fix, if a connection is made during initialization (tcp syn packet), I get a crash. The timing is a bit too tricky to make it occur with a debugger, but I have narrowed the change in code down to the change made by this fix.
Possible Issue - My suspicion is that when the tail points to the start of the last descriptor (RxDesc + (ETH_RX_DESC_CNT - 1)), it tries to consider that last descriptor's buffer as data owned by the application. I'll have to check exactly how this leads to a crash in my case, it may end up being bad handling somewhere.
Possible Solution - Setting DMACRDTPR to (RxDesc + ETH_RX_DESC_CNT) seems to work well and fits what is described by the figure in the reference manual: Eth->Descriptors-> Figure 798. DMA descriptor ring.
Possible Issue with Solution - As alister describes above, assigning the tail pointer any value not in the descriptor list makes the DMA loop until a packet is received. Is this also true for exactly (RxDesc + ETH_RX_DESC_CNT)? If so, why does the reference describe doing this when having only one descriptor?
2021-11-02 04:23 PM
>Possible Issue/Possible Solution
Initialising ETH_DMACRXDTPR as (RxDesc + (ETH_RX_DESC_CNT - 1)) is sensible if the driver uses ETH_DMACRXDTPR to stop the DMA if there are insufficient receive buffers. Initialising it as (RxDesc + ETH_RX_DESC_CNT) is sensible if the driver does not use ETH_DMACRXDTPR, and is the same as writing it any value not in the descriptor list.
>why does the reference describe doing this when having only one descriptor?
DMA will stop without using the descriptor or caring who owns it if ETH_DMACRXDTPR points to it.
>I get a crash
If you could clarify some details, it'd be easier for someone on Community to know or guess the cause.
What is the "crash" exactly? Just Ethernet rx stopping, or a fault exception or what?
What Cube FW_H7 are you using?
What changes have you made?
How many rx descriptors have you dimensioned?
>if a connection is made during initialization (tcp syn packet), I get a crash
Are you saying if this TCP connection occurs later it does not crash?
2021-11-03 09:23 AM
>If you could clarify some details, it'd be easier for someone on Community to know or guess the cause.
I am not looking for too much help, I will track it down further later. Given the odd behavior, it likely is specific to our implementation.
I just wanted to make a note in case it was something obvious or others are seeing the same thing.
The crash: The stm32 resets without hitting any fault handler, which I would not expect to be possible.
Changes: Only options for the hal eth and drivers, a large amount elsewhere.
Cube FW: STM32Cube FW_H7 V1.8.0
RX descriptors: 4
>Are you saying if this TCP connection occurs later it does not crash?
This is correct, everything works well unless the TCP connection occurs in a tiny window, maybe 200ms, during startup.
2021-11-03 10:26 AM
> The stm32 resets without hitting any fault handler
Stack overflow/corruption? Do you have any of the watchdog timers active?