STM32F429 + NetxDUO - Heavy traffic problems

Davide Dalfra
Associate III

Hello Folks

I'm looking for suggestions on the best way to troubleshoot a strange problem we're experiencing.
We're running Azure RTOS ThreadX + NetX Duo (version 6.1.0) with an MQTT client that subscribes to a broker, waits for a message and then replies.
The PHY is a LAN8742A.

The problem we're experiencing is that while the application (based on pc) is flooding of messages the STM32F4 board, then it suddenly stuck somewhere on NetX side.

The rest of the application keeps running correctly. To narrow down the issue (which is highly reproducible) we have:

  • Increased the IP packet pool
  • Increased the number of RX descriptors
  • Disabled all other tasks, leaving only the NetX IP instance and the task running the MQTT client (which waits for a message and then replies with a static "Hello" message)
  • Forced the link speed down from 100 Mbit to 10 Mbit

None of these attempts gave us a clue about what's happening. The only thing we've noticed is that once this happens, the ETH ISR no longer triggers.

For reference, we send bursts of 10 messages, one every 10 ms. After 4 or 5 bursts the IP stack gets stuck and we also lose ping (the PC is pinging the board).

 

Any suggestions?

 

Regards

Davide

4 REPLIES
Ozone
Principal

> The problem we're experiencing is that while the application (based on pc) is flooding of messages the STM32F4 board, then it suddenly stuck somewhere on NetX side.

This sentence is not fully comprehensible.
Do you mean that the PC floods the F4-based board, which then suddenly stops responding?

I would recommend instrumenting the code in question, including the ETH interrupts. Perhaps use GPIO toggles to keep the additional load low, and use a scope.
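
To illustrate the GPIO-toggle idea, here is a minimal sketch. The helper names are hypothetical and the register is passed in as a pointer so the snippet also compiles on a PC; on an STM32F4 the pointer would be the address of the port's BSRR register, where bits 0..15 set a pin and bits 16..31 reset it. Which pin is free for probing is an assumption left to the reader.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical alias: on target this points at a GPIO port's BSRR register. */
typedef volatile uint32_t gpio_bsrr_t;

/* Drive the probe pin high: BSRR bits 0..15 set the matching output pin. */
static inline void probe_high(gpio_bsrr_t *bsrr, unsigned pin)
{
    assert(pin < 16u);
    *bsrr = (uint32_t)1u << pin;
}

/* Drive the probe pin low: BSRR bits 16..31 reset the matching output pin. */
static inline void probe_low(gpio_bsrr_t *bsrr, unsigned pin)
{
    assert(pin < 16u);
    *bsrr = (uint32_t)1u << (pin + 16u);
}
```

Calling probe_high() as the first statement of the ETH interrupt handler and probe_low() as the last one gives a pulse on the scope whose width is the handler's run time and whose absence shows that the interrupt has stopped firing. Each call is a single register write, so the added load is small.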

And review your error handling.
Perhaps overflow (loss of packets due to overrun) is not handled well, or at all.
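
One overrun case worth checking, offered as a hypothesis rather than a diagnosis: on the STM32F4 Ethernet DMA, when no receive descriptor is available the RBUS flag (bit 7 of ETH_DMASR) is set and reception is suspended until software clears the flag and writes to the receive poll demand register (ETH_DMARPDR). The sketch below shows that recovery step. The struct is a stand-in for the real register block (not the CMSIS definition) so the snippet can be compiled on a PC, and the function name is hypothetical. Whether the ST NetX Duo driver already does this is not something established here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the two ETH DMA registers involved (assumption: not the CMSIS layout). */
typedef struct {
    volatile uint32_t DMASR;   /* DMA status; flags are write-1-to-clear on hardware */
    volatile uint32_t DMARPDR; /* receive poll demand; any write restarts RX polling */
} eth_dma_regs_t;

#define DMASR_RBUS (1u << 7)   /* receive buffer unavailable status */

/* Returns true if reception was suspended and a resume was requested. */
static bool eth_rx_resume_if_suspended(eth_dma_regs_t *dma)
{
    assert(dma != NULL);
    if ((dma->DMASR & DMASR_RBUS) == 0u) {
        return false;          /* RX DMA is not suspended, nothing to do */
    }
    dma->DMASR = DMASR_RBUS;   /* write 1 to clear the flag */
    dma->DMARPDR = 0u;         /* the written value is ignored, the write itself resumes RX */
    return true;
}
```

A check like this would typically run after receive descriptors have been handed back to the DMA, since resuming with no free descriptor just suspends reception again.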

And given the ETH buffer capacity and core performance of your setup, you might need to lower your expectations.
If I remember correctly, most application processors (Cortex-A, x86) and their associated network interface ICs have internal buffer capacity for at least two jumbo frames.

mbarg.1
Senior III

I had a similar problem with the H7, and my workaround involved rewriting the interface between the HAL and NetX.

I tested almost all the examples, and with the original code none could withstand a SYN flood without crashing.

The ST code multiplexes the Ethernet interrupts, hands them to a thread, demultiplexes them there, and then runs some processing on ST global variables with no mutexes.

It works fine with low traffic, but not under heavy load.

After rewriting the interrupt interface we achieved a stable solution: we found no way to crash it with any Ethernet traffic, bad packets or similar.

Obviously, CPU load is a limit: reduce the time and resources available to the Ethernet thread and you can lose some packets, but without any crash. This is why we use only the H7; we have heavy HTTP traffic over IPv6 and IPv4 in parallel.
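
The hazard described above (an ISR and a thread both touching a shared event word without protection) can be sketched as follows. This is not the rewritten interface from this post, which is not shown in the thread. It is a generic illustration that uses C11 atomics as a stand-in so it can be compiled on a PC; on target, an interrupt lockout around the read-and-clear or ThreadX event flags (tx_event_flags_set / tx_event_flags_get) would be the usual alternatives. All names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

#define EVT_RX  (1u << 0)   /* frame received */
#define EVT_TX  (1u << 1)   /* transmit complete */
#define EVT_ERR (1u << 2)   /* DMA or MAC error */

/* Shared between the ISR (producer) and the driver thread (consumer). */
static atomic_uint pending_events;

/* ISR side: OR the new event bits in as one atomic read-modify-write. */
static void eth_events_post(unsigned mask)
{
    assert(mask != 0u);
    atomic_fetch_or(&pending_events, mask);
}

/* Thread side: fetch and clear in one atomic step, so an event posted
 * between a separate "read" and "clear" cannot be lost. */
static unsigned eth_events_take(void)
{
    return atomic_exchange(&pending_events, 0u);
}
```

With a plain `events |= mask` in the ISR and a separate read followed by `events = 0` in the thread, an interrupt arriving between the read and the clear has its event erased, and the driver then waits forever for an event that was already signalled. That kind of lost wake-up only shows up when interrupts arrive close together, which matches a failure that appears only under heavy load.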

 

 


@Ozone wrote:

> The problem we're experiencing is that while the application (based on pc) is flooding of messages the STM32F4 board, then it suddenly stuck somewhere on NetX side.

> This sentence is not fully comprehensible.
> Do you mean that the PC floods the F4-based board, which then suddenly stops responding?

Yes, I mean that the PC floods the F4 board and the board suddenly stops responding.


> I would recommend instrumenting the code in question, including the ETH interrupts. Perhaps use GPIO toggles to keep the additional load low, and use a scope.
>
> And review your error handling.
> Perhaps overflow (loss of packets due to overrun) is not handled well, or at all.


I had a look with a profiling tool (Tracealyzer) and I see no problem with the interrupts there.
That said, I could enable some of the debugging defines NetX Duo offers, to see if anything interesting shows up.

The fact is that after I send the first burst (8 messages, 10 ms apart) everything works fine. The second burst works too; after that, the third burst may work or may get stuck.

I could lower my expectations about performance; I can even accept losing packets, that's fine.
What I can't accept is that, after losing packets, communication can't be restored.
As the low-level interface between the LAN8742 and NetX, I'm using the integration provided in the ST examples.
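
To make the "losing packets is fine, never recovering is not" requirement concrete, here is a minimal sketch of a receive path that drops a frame when the packet pool is empty but still hands the descriptor back to the DMA. It is a PC-compilable model with hypothetical names and a toy pool, not the ST driver code, and returning buffers to the pool after the stack has consumed them is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_BUFFERS 2u
#define BUFFER_BYTES 64u

/* Toy packet pool (assumption: stands in for the real packet pool). */
static unsigned char pool_mem[POOL_BUFFERS][BUFFER_BYTES];
static size_t pool_used;
static unsigned frames_delivered;
static unsigned frames_dropped;

static void *pool_alloc(void)
{
    return (pool_used < POOL_BUFFERS) ? pool_mem[pool_used++] : NULL;
}

/* Simplified RX descriptor: one buffer plus the ownership flag. */
typedef struct {
    void *buffer;
    bool  owned_by_dma;
} rx_desc_t;

/* Service one completed descriptor. Returns true if the frame went up the stack. */
static bool rx_service(rx_desc_t *d)
{
    assert(d != NULL && !d->owned_by_dma);
    void *fresh = pool_alloc();
    bool delivered = (fresh != NULL);
    if (delivered) {
        frames_delivered++;    /* a real driver would pass d->buffer to the IP stack here */
        d->buffer = fresh;     /* attach the replacement buffer */
    } else {
        frames_dropped++;      /* pool empty: drop the frame and reuse the same buffer */
    }
    d->owned_by_dma = true;    /* always re-arm, otherwise the RX ring drains and stalls */
    return delivered;
}
```

The point of the design is the unconditional re-arm on the last line: if the failure path returns early without giving the descriptor back, every dropped frame permanently shrinks the receive ring, and once it is empty no further receive interrupts can occur.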

Thanks
Davide


@mbarg.1 wrote:

> I had a similar problem with the H7, and my workaround involved rewriting the interface between the HAL and NetX.
>
> I tested almost all the examples, and with the original code none could withstand a SYN flood without crashing.
>
> The ST code multiplexes the Ethernet interrupts, hands them to a thread, demultiplexes them there, and then runs some processing on ST global variables with no mutexes.
>
> It works fine with low traffic, but not under heavy load.
>
> After rewriting the interrupt interface we achieved a stable solution: we found no way to crash it with any Ethernet traffic, bad packets or similar.
>
> Obviously, CPU load is a limit: reduce the time and resources available to the Ethernet thread and you can lose some packets, but without any crash. This is why we use only the H7; we have heavy HTTP traffic over IPv6 and IPv4 in parallel.

 

 


I think you got the point. Something happens only under heavy load, due to something broken between the NetX Duo low-level interface and the LAN8742 integration.

Were you using NetX too, or lwIP?

 

Regards

Davide