STM32MP1-DK2: Ethernet stress test crashes the board

FFisc.1 · ‎2020-03-26

Hello

On an STM32MP1-DK2 discovery board running the latest starter image, flooding the Ethernet interface with SYN packets leads to a reboot of the system. This behavior is reproducible and appears after around 30 seconds.

I stress-tested the network interface of the STM32MP1-DK2 board with high loads generated by a self-written SYN flood application.

The board is connected via Gbit Ethernet to a developer PC, which sends the SYN packets (around 400 Mb/s on the sender device).

After around 30 seconds the board reboots due to a stack exception.

Is this issue already known?

I attached the debug output captured over the serial interface during the network attack.

Thank you and best regards

Florian

Olivier GALLIEN · ‎2020-03-26

Hi @FFisc.1

Thanks for your report.

I updated the heading and body of the post for clarity.

I am escalating this to an expert.

BR,

Olivier

Olivier GALLIEN
In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.

Christophe Guibout · ‎2020-03-31

Hello @FFisc.1 ,

The attached trace gives some explanations:

- a WARN_ON in GPU/DRM, which shows that the system is overloaded (the CPU is held by a high-priority thread for a long time)

- then a NETDEV WATCHDOG

- finally, the board reboots due to a hardware watchdog because userland no longer seems to be scheduled.

I could not reproduce the issue by flooding the board with the following command, executed from the sender device: ping -f

No lost packet seen.

- Could you please share the application you use to send the SYN flood?

- You mentioned you're using the latest starter image. Can you confirm that you are on MMDV-1.2.0? Could you please tell me your kernel version?

BR,

Christophe


FFisc.1 · ‎2020-03-31

Hello Christophe,

thank you for your reply.

"ping -f" does not produce enough network load to trigger this behavior (around 200 KB/s on the sender device, measured with ipstat). I found that the tool hping3 creates enough network load to reproduce the stack exceptions:

$ hping3 --flood STM32MP1_IP_ADDR

Run this multiple times in parallel to increase the network load. I started four instances, which together create around 60 MB/s of traffic.

So you can install and use hping3 to reproduce this error.

I downloaded and flashed the image within this .tar file:

en.FLASH-stm32mp1-openstlinux-4.19-thud-mp1-19-10-09.tar.xz

My image runs the kernel:

Linux stm32mp1 4.19.49 #1 SMP PREEMPT Sun Jun 9 07:17:25 UTC 2019 armv7l armv7l armv7l GNU/Linux

I do not understand what "MMDV-1.2.0" is supposed to mean.

I hope you can reproduce this behaviour now. If you need more details, please let me know.

Best regards

Florian

Christophe Guibout · ‎2020-03-31

Hello @FFisc.1 ,

The latest version is stm32mp1-openstlinux-4.19-thud-mp1-20-02-19 which is part of STM32MP15-Ecosystem-v1.2.0, please have a look in the release note.

So you're not on the latest release, but that is not a big deal for this problem: when using hping3, I am able to reproduce a NETDEV WATCHDOG. It is not systematic, but in the other cases the eth0 adapter is reset in less than 30 s.

This is an issue which needs to be analysed. I'll keep you posted.

BR,

Christophe


Christophe Guibout · ‎2020-04-10

Hello @FFisc.1 ,

When Ethernet is flooded with SYN packets (DoS attack), the Ethernet driver remains in the RX IRQ handler (net_rx), so the kernel no longer schedules, which leads to a NETDEV WATCHDOG and/or a reset of the eth0 adapter.
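
One way to observe this on the board is to watch the per-CPU receive softirq counters while the flood is running. The following is only a minimal sketch: it assumes that "net_rx" corresponds to the kernel's NET_RX softirq and that /proc/softirqs has its usual layout (a header row of CPUs, then one row per softirq type). The function name net_rx_counts is hypothetical.

```shell
#!/bin/sh
# Print the NET_RX softirq counter for each CPU as "cpuN=count".
# $1: file to read (defaults to /proc/softirqs)
net_rx_counts() {
    awk '$1 == "NET_RX:" {
        for (i = 2; i <= NF; i++) printf "cpu%d=%s\n", i - 2, $i
    }' "${1:-/proc/softirqs}"
}

# Example: sample the counters once per second during the flood.
# while true; do net_rx_counts; sleep 1; done
```

If one counter grows much faster than the other, that CPU is the one doing most of the receive processing.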

The biggest issue is that this resets the board through a hardware reset: it will be fixed in the next DV, STM32MP15-Ecosystem-v2.0.0, which is planned in the coming months.

To avoid the NETDEV WATCHDOG, the kernel needs to be able to schedule, so the idea is to handle the Ethernet IRQ on only one CPU and keep the other one available.

The following command handles the Ethernet IRQ only on the secondary CPU (49 is the Ethernet IRQ number, as listed in /proc/interrupts; the value 2 is a CPU bitmask that selects CPU1):

echo 2 > /proc/irq/49/smp_affinity
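
The IRQ number may differ between images, so it can be looked up rather than hard-coded. The following is a minimal sketch, assuming the interrupt is listed in /proc/interrupts with the interface name (for example eth0) as the last field of its line. The helper names find_irqs and cpu_mask are hypothetical.

```shell
#!/bin/sh
# Print the IRQ numbers whose /proc/interrupts line ends with the given name.
# $1: name to look for (assumed here to be the interface name, e.g. eth0)
# $2: file to read (defaults to /proc/interrupts)
find_irqs() {
    awk -v name="$1" '$NF == name { sub(":", "", $1); print $1 }' \
        "${2:-/proc/interrupts}"
}

# Convert a zero-based CPU index into the hex bitmask used by smp_affinity.
cpu_mask() {
    printf '%x\n' $((1 << $1))
}

# Example (as root): pin every eth0 interrupt to CPU1.
# for irq in $(find_irqs eth0); do
#     cpu_mask 1 > "/proc/irq/$irq/smp_affinity"
# done
```

The setting is not persistent, so it has to be applied again after each boot.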

BR,

Christophe


FFisc.1 · ‎2020-04-15

Hello Christophe,

thanks for the error description and the workaround.

I do not fully understand the cause of this error yet. Maybe you can answer the following question:

As I understand it, applying the full preempt_rt patch to the Linux kernel, as is done in the STM32MP15-Ecosystem image, replaces all hard interrupts with threaded IRQs to provide a preemptible system.

Is this "RX IRQ handler" a threaded IRQ and if yes, how can it block the kernel from scheduling?
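
Whether a running kernel is actually a PREEMPT_RT build can be checked from its version string. The following is a minimal sketch, assuming RT kernels carry "PREEMPT RT" or "PREEMPT_RT" in the output of uname -v. The function name is_rt_kernel is hypothetical.

```shell
#!/bin/sh
# Return success if the kernel version string looks like a PREEMPT_RT build.
# $1: version string (defaults to the output of `uname -v`)
is_rt_kernel() {
    case "${1:-$(uname -v)}" in
        *PREEMPT_RT*|*"PREEMPT RT"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
# if is_rt_kernel; then echo "PREEMPT_RT kernel"; else echo "not an RT kernel"; fi
```

Under that assumption, the uname output quoted earlier in this thread ("#1 SMP PREEMPT ...") would be reported as not an RT kernel.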

Best regards

Florian

Christophe Guibout · ‎2020-04-22

Hello @FFisc.1 ,

The RX IRQ handler is not threaded, which explains the issue.

Nevertheless, "threading" this driver would help to improve performance. I added this to my TODO list.

BR,

Christophe
