2022-04-20 5:01 AM
I have a custom board with an STM32F417 MCU. It needs to communicate with a PC over Ethernet (UDP).
I have used Cube MX 6.5.0 to set up my project and have enabled LWIP. The board is now sending/receiving UDP packets with the PC.
However when the PC sends a UDP packet with a payload of 1007 bytes, the packet becomes truncated. Only 958 bytes of data appear in the pbuf in the udp_receive_callback .
It appears that there is a problem reconstructing the packet payload from the pbuf chain. I am calling pbuf_copy_partial(p, buff, p->tot_len, 0) in the udp_receive_callback function.
Any suggestions regarding this issue? Has anyone successfully received large UDP packets using a Cube MX 6.5.0 generated project?
Thanks
Solved! Go to Solution.
2022-04-21 6:02 AM
I have fixed this problem. It was caused by the ethernet rx buffer size being set at 1000 bytes.
This is caused by a bug in the STM32CubeF4 MCU Firmware Package STM32Cube FW_F4 V1.27.0
More info here: https://github.com/STMicroelectronics/STM32CubeF4/pull/119
2022-04-21 6:02 AM
I have fixed this problem. It was caused by the ethernet rx buffer size being set at 1000 bytes.
This is caused by a bug in the STM32CubeF4 MCU Firmware Package STM32Cube FW_F4 V1.27.0
More info here: https://github.com/STMicroelectronics/STM32CubeF4/pull/119
2022-04-23 9:20 AM
The ridiculous constant buffer size is only one of the problems with that code...
The code is setting the OWN bit before modifying other descriptor values in the subsequent lines. And, instead of SET_BIT() macro, it uses WRITE_REG() macro, which corrupts (clears) all other bits configured before. These are two critical and absolutely idiotic bugs in a single trivial line of code! As HAL developers have hard time understanding even the most basic issues and solutions, I'm writing the corrected code here:
if (heth->RxDescList.ItMode != 0U)
{
  WRITE_REG(dmarxdesc->DESC1, ETH_RX_BUF_SIZE | ETH_DMARXDESC_RCH);
}
else
{
  WRITE_REG(dmarxdesc->DESC1, ETH_DMARXDESC_DIC | ETH_RX_BUF_SIZE | ETH_DMARXDESC_RCH);
}
 
/* Ensure rest of descriptor is written to RAM before the OWN bit */
__DMB();
 
SET_BIT(dmarxdesc->DESC0, ETH_DMARXDESC_OWN);And I can even tell how all of this disaster happened... Let's take a look on the new rewritten H7 driver:
This code is correct. For H7 the buffer size is configured in a register and setting OWN bit in that IF construction is the last modification of the descriptor values. But then the H7 code was ported to F4, for which the hardware and descriptor format differs.
For buffer size the H7 code hasn't "a similar name to copy-paste" close to this code location, therefore the developer just put in the "nice" 1000 just to get it compiling. The fact, that the developer didn't use the ETH_RX_BUF_SIZE macro, shows that he/she doesn't understand HAL software architecture.
As for the OWN bit, the developer was probably instructed to read my article about barriers, where part of the conclusion is: "The universal fix is to put __DMB() macro just before: Setting OWN bit." So he/she added the necessary setting of OWN bit literally after the __DMB(). Most likely because the developer doesn't even understand the concept of a descriptor.
After disabling interrupts, one cannot just unconditionally enable them. They have to be restored to the previous state like it is shown in this topic. Another critical bug for both - H7 and F4!
That's about bugs. But wait, that's not all...
This code is not even from the new H7 driver, but from the previous F4 driver. Testing DMA status is useless as a simple write to ETH_DMATPDR register is correct and is more efficient. It's done efficiently for ETH_DMARPDR, so why are there different approaches? But the comment "Set the Tail pointer address" on the line before is copied from H7 and is wrong for the F4. Chaos!
This line is also copied over from H7 without reading the reference manual. On F4 any value issues a poll demand and therefore calculating some complex pointer is waste of resources and bloated code. Actually even writing zero is not the most most efficient code, but this one is:
ETH->DMATPDR = (uint32_t)ETH;  // Any value issues a descriptor list poll demand.For memory barriers the new drivers mostly correctly uses the more efficient DMB instruction, but why there is DSB at lines 918 and 992? This time it comes from the lines 938 and 1012 of the new H7 driver. There is no reason for using DSB in any of these!
Those additional BackupAddr0 and BackupAddr1 elements of the descriptor structure are necessary for H7, because it's hardware modifies the descriptor completely and looses those addresses. But for F4 those are completely useless waste of memory, because the F4 hardware doesn't modify the fields, where addresses are stored. Obviously the developer has not read the reference manual either.
Apparently the H7 code was written by someone, who at least to some level understands what he is doing, but then porting to F4 was done by someone, who definitely doesn't understand what he/she is doing. Is this disgusting attitude for software considered normal for ST?
@Imen DAHMEN, the fun with the new rewritten driver starts...
P.S. All of these issues of F4 code are also the same for F7, F2 and F1. Therefore, if someone is currently porting that code to those series, fix it for those also.
2022-04-25 1:06 PM
Hi @Piranha ,@grise
Thanks for you feedback and for taking time to report this with details.
I raised this issue internally to check it asap (Internal ticket number 126954) .
(PS: Internal ticket number 126954, is an internal tracking number and is not accessible or usable by customers).
Imen
2022-04-28 5:57 PM
Looked even closer to those drivers...
The code in both of these lines of the new F4 driver sets the OWN bit, but then modifies other descriptor values after that. On H7, from which the code was ported, there are two factors that can suspend the DMA - OWN bit and descriptor tail pointer. Therefore on H7, if the tail pointer is managed accordingly, the order of setting the OWN bit relative to other descriptor words doesn't matter. But on F4 there are no tail pointer and therefore, when the OWN bit is set, the descriptor and buffer data must not be modified by the CPU anymore, because, if the DMA is already running, it can start working on that buffer at any moment. Not respecting that is a serious critical bug and there are already several topics in the forum reporting the F4 driver not working. Again - the developer, who ported the code from H7, has not read the reference manual for F4.
The issues reported in my previous comment are trivial to fix, but this one requires a significant rewrite of ETH_Prepare_Tx_Descriptors() function. Additionally the code logic is poor also - it configures the first descriptor and then runs a loop with almost the same code to configure an additional descriptors. The code should be written as a single loop, which configures any number of descriptors. For a competent developer it should be obvious...
@Imen DAHMEN, after this one, I would say this new F4 ETH driver should be reverted to the previous version, which was at least mostly working.
2022-08-05 6:07 AM
ST have released CubeF4 v1.27.1, where they have fixed the issues with setting OWN bit, buffer size and a __DMB() placement - only the first 3 issues from my report in this topic. All other issues, including the completely broken ETH_Prepare_Tx_Descriptors() function, which is the most critical issue, are still there.
And, despite the reworked ETH driver for F4 still being broken, they didn't listen to me and pushed the code from CubeF4 v1.27.1 to F7 series with the release of CubeF7 v1.17.0. The ETH drivers in those two packages are absolutely the same.
2023-05-21 4:43 AM
Another stupid bug in the reworked drivers for F4, F7 and H7:
https://github.com/STMicroelectronics/stm32f4xx_hal_driver/issues/14
The code in function HAL_ETH_ReleaseTxPacket() at lines 1462 and 1500 assumes that the ETH_TX_DESC_CNT is a power of two. As proposed, the correct fix is to use the INCR_TX_DESC_INDEX() macro, which was created exactly for this purpose.
And then for F4 and F7 there are also these two:
https://github.com/STMicroelectronics/STM32CubeF7/issues/92
https://github.com/STMicroelectronics/STM32CubeF7/issues/93
The first one requires a rework of a code logic for a proper calculation of the size of individual buffers of a frame, when the frame consists of more than a single buffer.
