stmmac - XDP program in native mode - RCU errors

NGell.1 · 2022-11-13

Hi there,

I'm working with the stmmac driver, which I understand has integrated XDP support.

On my board I'm running Linux kernel 5.15.32 with PREEMPT_RT enabled, and I'm running an AF_XDP program in user space at high RR priority on an isolated CPU.

When I run the XDP program in SKB mode on the network interface driven by stmmac, that is, at the socket buffer level, everything works fine.

When I try to run the same XDP program in native mode, that is, at the network driver level, I get a stream of the following errors:

NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #8

And from this point the system is not very usable anymore...
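
For readers hitting the same message, the number after "handler #" can be decoded. This is a minimal sketch, assuming the number is the hexadecimal pending-softirq bitmask (which is how kernel/time/tick-sched.c formats this warning) and that the bit order follows the softirq enum in include/linux/interrupt.h. The helper name decode_softirq_mask is made up for illustration.

```python
# Softirq vectors in bit order, as listed in the kernel's
# include/linux/interrupt.h (HI_SOFTIRQ = 0 ... RCU_SOFTIRQ = 9).
SOFTIRQ_NAMES = [
    "HI", "TIMER", "NET_TX", "NET_RX", "BLOCK",
    "IRQ_POLL", "TASKLET", "SCHED", "HRTIMER", "RCU",
]


def decode_softirq_mask(mask_text):
    """Return the names of the softirq vectors set in a hex mask.

    Assumption: mask_text is the value printed after "handler #",
    interpreted as a hexadecimal bitmask.
    """
    mask = int(mask_text.strip().lstrip("#"), 16)
    return [name for bit, name in enumerate(SOFTIRQ_NAMES) if mask & (1 << bit)]


print(decode_softirq_mask("8"))  # -> ['NET_RX']
```

Under those assumptions, #8 is bit 3, NET_RX: network receive softirq work was still pending when the CPU tried to stop its tick.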

The AF_XDP program transmits an ICMP request, which is received on the other side, and a reply is transmitted back. The AF_XDP program then calls poll() on the XSK socket, but no data is read from the socket, and this error appears.

Do you have any idea what could cause this?

Thanks a lot,

Nir.

Olivier GALLIEN · 2022-11-15

Hi @NGell.1

Linux kernel 5.15.32 with preempt_rt is not a configuration ST will support.

I can only offer some generic hints below, and advise you to look for help in the upstream Linux community:

"If you're seeing a few of those every now and then, nothing to be alarmed about. It can happen when the CPU is stressed and is normal. If you're constantly seeing it then you might want to consider either reducing the CPU load or disabling NOHZ.

For understanding what NOHZ does, I highly recommend this Stack Overflow answer to: "How NOHZ=ON affects do_timer() in Linux kernel?".

If you want to get rid of the message, you can simply add the following to /boot/armbianEnv.txt:

extraargs=nohz=off

But if you decide to do this, make sure you fully understand what it does by reading the following article:

no_hz.rst - Documentation/timers/no_hz.rst - Linux source code (v5.15.32) - Bootlin
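
After a reboot it may be worth confirming that the option actually reached the kernel. This is a small sketch, assuming the parameter appears verbatim in /proc/cmdline. kernel_param is a hypothetical helper, and the simple whitespace split ignores quoted values that contain spaces.

```python
def kernel_param(cmdline, name):
    """Look up `name` on a kernel command line string.

    Returns the value for "name=value", True for a bare flag,
    or None if the parameter is absent. If the parameter is
    repeated, the last occurrence is returned.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else True
    return value


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            print("nohz =", kernel_param(f.read(), "nohz"))
    except OSError as exc:
        print("cannot read /proc/cmdline:", exc)
```

If this prints "nohz = None", the boot configuration change did not take effect and the kernel is still using its default.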

Hope it helps.

Olivier

Olivier GALLIEN
In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.

Kevin HUBER · 2022-11-24

Hello @NGell.1 ,

In order to give better visibility on the answered topics, please click on 'Select as Best' on the reply which solved your issue or answered your question. See also 'Best Answers'

Best Regards,

Kevin

NGell.1 · 2022-12-06

Hi Olivier,

Thanks for your reply.

While investigating the problem further, I discovered the following:

When loading an XDP program in SKB or native mode, it is working just fine.

When loading an AF_XDP program in SKB mode, it is also working properly.

When loading an AF_XDP program in NATIVE mode, and forcing XDP_COPY, again, everything is working properly.

When loading an AF_XDP program in NATIVE mode, and trying to use XDP_ZEROCOPY, the problem comes up.

cat /proc/interrupts shows that no eth1 interrupts are being triggered, and the network interface no longer responds.

If I kill the user process without unloading the XDP program, interrupts are suddenly triggered again and the network interface becomes functional again.
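
To confirm that the eth1 counters are frozen rather than just slow, two snapshots of /proc/interrupts can be compared. This is a rough sketch, assuming the usual layout (a header row of CPU columns, then one row per IRQ with per-CPU counts followed by the controller and device names). The function names irq_counts and stalled_irqs are hypothetical, and the device match is a plain substring test, so "eth1" would also match "eth10".

```python
import time


def irq_counts(interrupts_text, device):
    """Return {irq_label: count summed over CPUs} for rows naming `device`."""
    lines = interrupts_text.splitlines()
    if not lines:
        return {}
    n_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        fields = line.split()
        # Columns after the per-CPU counts hold controller and device names.
        if not fields or not any(device in f for f in fields[1 + n_cpus:]):
            continue
        counts = fields[1:1 + n_cpus]
        totals[fields[0].rstrip(":")] = sum(int(c) for c in counts if c.isdigit())
    return totals


def stalled_irqs(before_text, after_text, device):
    """Return IRQ labels for `device` whose count did not change between snapshots."""
    before = irq_counts(before_text, device)
    after = irq_counts(after_text, device)
    return [irq for irq in after if after[irq] == before.get(irq, 0)]


if __name__ == "__main__":
    try:
        with open("/proc/interrupts") as f:
            first = f.read()
        time.sleep(1)
        with open("/proc/interrupts") as f:
            second = f.read()
        print("stalled eth1 IRQs:", stalled_irqs(first, second, "eth1"))
    except OSError as exc:
        print("cannot read /proc/interrupts:", exc)
```

Running it once while the zero-copy socket is open and once after killing the user process would show whether the same IRQ lines go from stalled to counting again.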

Can you assist in debugging this issue?

Perhaps you could forward this issue to the following gentlemen, they might be interested in helping too:

Giuseppe Cavallaro <peppe.cavallaro@st.com>, Alexandre Torgue <alexandre.torgue@st.com>, Jose Abreu <joabreu@synopsys.com>

Thanks a lot,

Nir.