2025-07-07 6:09 AM - last edited on 2025-07-09 2:13 AM by Amel NASRI
Hello Folks
I'm looking for suggestions / hints on what's the best way to troubleshoot a strange problem we're experiencing.
We're running AzureRTOS/ThreadX + NetXDuo (version 6.1.0) with an MQTT client that subscribes to a broker, waits for a message and then replies.
As the PHY we're using the LAN8742A.
The problem we're experiencing is that while the application (based on pc) is flooding of messages the STM32F4 board, then it suddenly stuck somewhere on NetX side.
The rest of the application keeps running correctly. To troubleshoot the issue (which is highly reproducible), we have tried:
None of these attempts gave us a clue about what's happening. The only thing we've noticed is that, once this happens, the ETH ISR no longer triggers.
For reference, we're sending bursts of 10 messages, one every 10 ms. After 4 or 5 bursts the IP stack gets stuck, and we also lose ping (the PC is pinging the board).
Any suggestion?
Regards
Davide
2025-07-07 6:40 AM
> The problem we're experiencing is that while the application (based on pc) is flooding of messages the STM32F4 board, then it suddenly stuck somewhere on NetX side.
This sentence is not fully comprehensible.
Do you mean that the PC floods the F4-based board, which then suddenly stops responding?
I would recommend instrumenting the code in question, including the ETH interrupts. Perhaps use GPIO toggles, watched on a scope, to keep the additional load low.
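As a rough sketch of the GPIO-toggle idea: on STM32F4 a single write to a port's BSRR register sets (bits 0..15) or resets (bits 16..31) a pin, so each edge costs one store and no read-modify-write. The block simulates that register on a host so it can run anywhere; on target you would write to `GPIOx->BSRR` of whichever spare pin is wired to the scope. The pin numbers and the `sim_*`/`trace_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Host stand-in for a GPIO port; only the output levels matter here. */
typedef struct {
    uint32_t odr;  /* output data register: current pin levels */
} sim_gpio_t;

static sim_gpio_t sim_port;  /* on target: a real port such as GPIOB */

/* Emulates a BSRR write: bits 0..15 set pins, bits 16..31 reset them.
 * If both bits of a pin are written, set takes priority, as on STM32F4. */
static void sim_bsrr_write(sim_gpio_t *port, uint32_t value)
{
    uint32_t set = value & 0xFFFFu;
    uint32_t reset = value >> 16;
    port->odr = (port->odr & ~reset) | set;
}

/* One store per edge, cheap enough to leave in an ISR while debugging. */
static void trace_pin_high(sim_gpio_t *port, unsigned pin)
{
    assert(pin < 16u);
    sim_bsrr_write(port, 1u << pin);          /* target: GPIOx->BSRR = 1u << pin; */
}

static void trace_pin_low(sim_gpio_t *port, unsigned pin)
{
    assert(pin < 16u);
    sim_bsrr_write(port, 1u << (pin + 16u));  /* target: GPIOx->BSRR = 1u << (pin + 16u); */
}

/* Hypothetical pin assignment: one spare pin per event of interest. */
enum { TRACE_ETH_ISR = 3, TRACE_ETH_THREAD = 4 };

/* Raise on entry, drop on exit: pulse width on the scope = time in the handler. */
static void example_eth_isr(void)
{
    trace_pin_high(&sim_port, TRACE_ETH_ISR);
    /* ... existing ETH IRQ handling ... */
    trace_pin_low(&sim_port, TRACE_ETH_ISR);
}
```

Toggling a second pin in the Ethernet thread makes it easy to see on the scope whether the ISR keeps firing while the thread stalls, or the other way round.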
Also, review your error handling.
Perhaps overflow (packet loss due to overrun) is not handled well, or not at all.
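One common way this kind of overflow turns into a permanent hang, sketched under simplifying assumptions: if the driver only returns an RX descriptor to the DMA after it has obtained a fresh buffer from the packet pool, a flood that empties the pool leaves descriptors owned by the CPU. As far as I know, the STM32F4 ETH DMA then suspends reception ("receive buffer unavailable") and stops signalling received frames until descriptors are handed back and a receive poll demand is issued, which would match the ETH ISR no longer firing. The model below (all names and sizes hypothetical, not the ST driver) shows the alternative policy: when no buffer is free, drop the frame and give the descriptor back with its old buffer, so the ring never starves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RX_RING_SIZE 4u   /* hypothetical descriptor count */
#define POOL_SIZE    6u   /* hypothetical packet-pool size */

/* Simplified DMA RX descriptor: who owns it and which pool buffer it holds. */
typedef struct {
    bool owned_by_dma;
    int  buffer_id;
} rx_desc_t;

typedef struct {
    rx_desc_t desc[RX_RING_SIZE];
    size_t    next;                /* next descriptor the driver inspects */
    size_t    hw_next;             /* next descriptor the DMA writes */
    bool      pool_free[POOL_SIZE];
    int       held[POOL_SIZE];     /* buffers currently owned by the stack */
    size_t    held_count;
    unsigned  delivered;           /* frames handed to the stack */
    unsigned  dropped;             /* frames dropped: no free buffer */
} rx_model_t;

static int pool_alloc(rx_model_t *m)
{
    for (int i = 0; i < (int)POOL_SIZE; i++) {
        if (m->pool_free[i]) {
            m->pool_free[i] = false;
            return i;
        }
    }
    return -1;  /* pool exhausted: what a sustained flood produces */
}

static void rx_init(rx_model_t *m)
{
    *m = (rx_model_t){0};
    for (size_t i = 0; i < POOL_SIZE; i++)
        m->pool_free[i] = true;
    for (size_t i = 0; i < RX_RING_SIZE; i++) {
        m->desc[i].buffer_id = pool_alloc(m);
        m->desc[i].owned_by_dma = true;
    }
}

/* Hardware side: write a frame into the next DMA-owned descriptor.
 * Returns false on a CPU-owned descriptor, where a real MAC would suspend. */
static bool dma_receive(rx_model_t *m)
{
    rx_desc_t *d = &m->desc[m->hw_next];
    if (!d->owned_by_dma)
        return false;
    d->owned_by_dma = false;
    m->hw_next = (m->hw_next + 1u) % RX_RING_SIZE;
    return true;
}

/* Driver side: every completed descriptor goes back to the DMA, with a
 * fresh buffer if one exists, otherwise with its old one (frame dropped). */
static void rx_service(rx_model_t *m)
{
    while (!m->desc[m->next].owned_by_dma) {
        rx_desc_t *d = &m->desc[m->next];
        int fresh = pool_alloc(m);
        if (fresh >= 0) {
            assert(m->held_count < POOL_SIZE);
            m->held[m->held_count++] = d->buffer_id;  /* stack now owns it */
            d->buffer_id = fresh;
            m->delivered++;
        } else {
            m->dropped++;  /* reuse the old buffer, lose this frame */
        }
        d->owned_by_dma = true;
        m->next = (m->next + 1u) % RX_RING_SIZE;
    }
}

/* Stack side: the application eventually frees what it was holding. */
static void stack_release_all(rx_model_t *m)
{
    for (size_t i = 0; i < m->held_count; i++)
        m->pool_free[m->held[i]] = true;
    m->held_count = 0;
}
```

With this policy a flood costs dropped frames, but reception resumes by itself as soon as the stack releases buffers.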
And given the ETH buffer capacities and the core performance of your setup, you might need to lower your expectations.
If I remember correctly, most application processors (Cortex-A, x86) and their associated network interface ICs have internal buffers large enough for at least two jumbo frames.
2025-07-07 8:27 AM
I had a similar problem with the H7, and my workaround was to rewrite the interface between HAL and NetX.
I tested almost all the examples, and with the original code none of them could withstand a SYN flood without crashing.
The STM code multiplexes the Ethernet interrupts, defers them to a thread, demultiplexes them there, and then does some processing on STM global variables without any mutex protection.
It works fine with low traffic, but not under heavy load.
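A minimal sketch of a safer ISR-to-thread hand-off, assuming a C11 toolchain: the ISR only ORs event bits into one atomic word, and the thread takes and clears them in a single atomic exchange, so an event posted while the thread is busy can be neither lost nor half-cleared. Within ThreadX itself, `tx_event_flags_set()` from the ISR and `tx_event_flags_get()` with `TX_OR_CLEAR` in the thread give the same semantics. The event bit names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical event bits the ETH ISR reports to the Ethernet thread. */
#define ETH_EVT_RX    (1u << 0)
#define ETH_EVT_TX    (1u << 1)
#define ETH_EVT_ERROR (1u << 2)

static _Atomic uint32_t eth_pending;

/* ISR side: a single atomic OR merges new events with any still pending,
 * instead of overwriting a plain global the thread may be reading. */
static void eth_isr_post(uint32_t events)
{
    assert(events != 0u);
    atomic_fetch_or(&eth_pending, events);
    /* On target, also wake the thread here (e.g. a ThreadX semaphore put). */
}

/* Thread side: read and clear in one step, so nothing posted between a
 * separate "read" and "clear" can slip through. */
static uint32_t eth_thread_take(void)
{
    return atomic_exchange(&eth_pending, 0u);
}
```

The thread then handles every bit it received; anything posted afterwards simply shows up on the next call.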
After rewriting the interrupt interface we achieved a stable solution: no Ethernet traffic, bad packets or similar can crash it.
Obviously, CPU load is still a limit: reduce the time and resources given to the Ethernet thread and you can lose some packets, but without any crash. This is why we use only the H7; we have heavy HTTP traffic over IPv6 and IPv4 in parallel.
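Losing packets without crashing can be sketched as a bounded queue between the receive path and the Ethernet thread: when the queue is full the producer drops and counts the frame instead of overwriting anything, and the consumer handles at most a fixed budget per wake-up before yielding. This is a single-context model with hypothetical names and sizes; in a real ISR/thread split, `count` would need atomic updates or a short interrupt lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_CAP 8u   /* hypothetical capacity */

typedef struct {
    uint32_t items[QUEUE_CAP];
    size_t   head;      /* next slot to read  */
    size_t   tail;      /* next slot to write */
    size_t   count;
    uint32_t dropped;   /* frames rejected because the queue was full */
} frame_queue_t;

/* Producer: never overwrite unread data; a full queue means drop and count. */
static bool queue_push(frame_queue_t *q, uint32_t frame_id)
{
    if (q->count == QUEUE_CAP) {
        q->dropped++;
        return false;
    }
    q->items[q->tail] = frame_id;
    q->tail = (q->tail + 1u) % QUEUE_CAP;
    q->count++;
    return true;
}

/* Consumer: take at most `budget` frames per wake-up, then yield the CPU. */
static size_t queue_drain(frame_queue_t *q, size_t budget, uint32_t *out)
{
    assert(out != NULL || budget == 0);
    size_t done = 0;
    while (q->count > 0 && done < budget) {
        out[done++] = q->items[q->head];   /* on target: process the frame */
        q->head = (q->head + 1u) % QUEUE_CAP;
        q->count--;
    }
    return done;
}
```

The budget bounds how long the Ethernet thread can monopolise the CPU under a flood, and the drop counter makes the losses visible instead of silent.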
2025-07-07 8:56 AM
@Ozone wrote:
> > The problem we're experiencing is that while the application (based on pc) is flooding of messages the STM32F4 board, then it suddenly stuck somewhere on NetX side.
> This sentence is not fully comprehensible.
> Do you mean that the PC floods the F4-based board, which then suddenly stops responding?
Yes, I mean the PC floods the F4 board, and the F4 board suddenly stops responding.
> I would recommend instrumenting the code in question, including the ETH interrupts. Perhaps use GPIO toggles, watched on a scope, to keep the additional load low.
> Also, review your error handling.
> Perhaps overflow (packet loss due to overrun) is not handled well, or not at all.
I had a look with a profiling tool (Tracealyzer) and I don't see any interrupt problems there.
By the way, I could enable some of the debugging defines NetXDuo offers and see if anything interesting shows up.
The fact is that after I send the first burst (8 messages @ 10 ms each), everything works fine. The second burst works too; after that, the third burst may either work or get stuck.
I can lower my expectations about performance; I can even accept losing packets, that's fine.
What I can't accept is that after losing packets, communication can't be restored.
As the low-level interface between the LAN8742 and NetX, I'm using the integration provided in the ST examples.
Thanks
Davide
2025-07-07 9:02 AM
@mbarg.1 wrote:
> I had a similar problem with the H7, and my workaround was to rewrite the interface between HAL and NetX.
> I tested almost all the examples, and with the original code none of them could withstand a SYN flood without crashing.
> The STM code multiplexes the Ethernet interrupts, defers them to a thread, demultiplexes them there, and then does some processing on STM global variables without any mutex protection.
> It works fine with low traffic, but not under heavy load.
> After rewriting the interrupt interface we achieved a stable solution: no Ethernet traffic, bad packets or similar can crash it.
> Obviously, CPU load is still a limit: reduce the time and resources given to the Ethernet thread and you can lose some packets, but without any crash. This is why we use only the H7; we have heavy HTTP traffic over IPv6 and IPv4 in parallel.
I think you've got the point: something goes wrong only under heavy load, because something is broken between the NetXDuo low-level interface and the LAN8742 integration.
Were you using NetX too, or LwIP?
Regards
Davide
2025-07-07 6:21 PM
> first burst (8 messages @ 10 ms each), everything works fine. The second burst works too; after that, the third burst may either work or get stuck.
People familiar with testing network equipment know that proper testing takes thousands of hours, covering all combinations of packet sizes and data patterns. ST does not provide any low-level examples or tests.
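To illustrate sweeping "all combinations of packet sizes and data patterns", a host-side helper might generate payloads like this. The pattern set, the names and the sweep bounds are assumptions; 1472 bytes is the largest UDP payload that fits a 1500-byte Ethernet MTU without fragmentation (1500 - 20 IP - 8 UDP).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A few classic data patterns; the selection is an assumption. */
typedef enum {
    PAT_ZEROS,
    PAT_ONES,
    PAT_AA,          /* 10101010 */
    PAT_55,          /* 01010101 */
    PAT_INCREMENT,   /* 0, 1, 2, ... wrapping at 256 */
    PAT_COUNT
} pattern_t;

/* Fill `buf` with `len` bytes of the selected pattern. */
static void fill_payload(uint8_t *buf, size_t len, pattern_t pat)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        switch (pat) {
        case PAT_ZEROS:     buf[i] = 0x00; break;
        case PAT_ONES:      buf[i] = 0xFF; break;
        case PAT_AA:        buf[i] = 0xAA; break;
        case PAT_55:        buf[i] = 0x55; break;
        case PAT_INCREMENT: buf[i] = (uint8_t)i; break;
        default:            buf[i] = 0x00; break;
        }
    }
}

/* Number of (size, pattern) cases when sizes go from min_len to max_len
 * in steps of `step`; useful to estimate how long a full sweep takes. */
static size_t sweep_case_count(size_t min_len, size_t max_len, size_t step)
{
    assert(step > 0 && min_len <= max_len);
    return ((max_len - min_len) / step + 1u) * (size_t)PAT_COUNT;
}
```

A sender would loop over every size and pattern, transmit each payload several times, and check the board's replies, which quickly adds up to many hours once repetitions and inter-packet gaps are varied too.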
2025-07-08 8:22 AM
@Davide Dalfra: yes, I do use ThreadX + NetXDuo.
Two years ago STM stopped supporting LwIP, even though they have now decided to move back to FreeRTOS+ on new processors.
In my experience ThreadX has so far proven 100% reliable, while FreeRTOS had many more instabilities in several C functions. One caveat: always check that you don't overflow a thread's stack. There is no warning, and it is very difficult to predict the right stack size.
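On the stack-overflow caveat: to my knowledge ThreadX does offer some help if you build with `TX_ENABLE_STACK_CHECKING`, including a callback registered through `tx_thread_stack_error_notify()`. Independently of that, a common way to size stacks empirically is a watermark: pre-fill the stack with a known byte, run the worst-case load (such as the message flood described in this thread), then count how many fill bytes were never overwritten. A sketch, assuming a downward-growing stack as on Cortex-M; the 0xEF fill value mirrors what ThreadX uses as far as I know, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill value; any byte unlikely to appear in real stack data works. */
#define STACK_FILL_BYTE 0xEFu

/* Pre-fill a thread stack before the thread is created. */
static void stack_prefill(uint8_t *stack, size_t size)
{
    memset(stack, STACK_FILL_BYTE, size);
}

/* The stack grows downward, so never-touched fill bytes remain at the low
 * end. Counting them gives the smallest headroom the thread ever had. */
static size_t stack_unused_bytes(const uint8_t *stack, size_t size)
{
    assert(stack != NULL);
    size_t n = 0;
    while (n < size && stack[n] == STACK_FILL_BYTE)
        n++;
    return n;
}
```

The result is an estimate: a real stack value that happens to equal the fill byte makes the headroom look slightly larger, so keep a safety margin on top of the measured figure.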
Even though the final design runs on custom boards, I always develop on STM Nucleo boards, so I can share possible bugs and limitations with others and rule out hardware-related problems.
As said above, look at the Ethernet interrupt routines and you will easily see that the proposed interface makes no sense.
Mike
2025-07-08 8:24 AM
@Pavel A.: to crash any demo, just use nc to generate a flood of SYNs; you don't need hours or days.
Mike