
Issues with crypto IP core

dimax
Senior

We have encountered a problem with the CRYPTO IP core on the STM32MP157C.

Our setup is an STM32MP157C-DK2 with the latest image/SDK installed (openstlinux-5.10-dunfell-mp1-21-11-17).

1. First of all, our IPsec solution based on strongSwan doesn't work at all when stm32-cryp.ko is loaded: after processing several packets, the IPsec connection gets stuck. The only message we got in the kernel ring buffer is:

```
[ 102.064269] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!
```

It gets stuck no matter which cipher or settings are used. However, if we don't use stm32-cryp at all (e.g. if we unload this module), the IPsec connection works perfectly.

We haven't found a simple way to reproduce this bug without deploying an IPsec infrastructure (this is very simple to do with AlgoVPN [1]), so we can give you access to our test environment or provide more details on request.

2. Moreover, we ran a performance test using cryptodev-tests ([2]; this package is also available in the Yocto SDK) and `openssl speed`, and it looks like the software implementations are much faster than the hardware-accelerated one.

The first test was performed with the userspace software implementation (as evidence, the CPU was mostly in userspace during this test: 18.02 s of 18.38 s):

```
root@stm32mp1:~# cat /proc/crypto | grep cbc
root@stm32mp1:~# time openssl speed -evp aes-256-cbc -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 1829060 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 548756 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 145037 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 36751 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 4614 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 16384 size blocks: 2304 aes-256-cbc's in 3.00s
...
type            16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes 16384 bytes
aes-256-cbc     9754.99k   11706.79k   12376.49k   12544.34k   12599.30k   12582.91k
real  0m 18.38s
user  0m 18.02s
sys   0m 0.00s
```

The second test uses the hardware-accelerated algorithm:

```
root@stm32mp1:~# insmod stm32-cryp.ko
root@stm32mp1:~# cat /proc/crypto | grep cbc
name         : cbc(des3_ede)
driver       : stm32-cbc-des3
name         : cbc(des)
driver       : stm32-cbc-des
name         : cbc(aes)
driver       : stm32-cbc-aes
root@stm32mp1:~# time openssl speed -evp aes-256-cbc -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 32666 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 26338 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 15378 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 5661 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 818 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 16384 size blocks: 408 aes-256-cbc's in 3.01s
...
type            16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes 16384 bytes
aes-256-cbc      174.22k     561.88k    1312.26k    1932.29k    2233.69k    2220.82k
real  0m 18.13s
user  0m 0.07s
sys   0m 2.59s
```

We got very similar results with the speed test from the cryptodev source code:

```
root@stm32mp1:~# insmod stm32-cryp.ko
root@stm32mp1:~# time ./speed
Testing AES-128-CBC cipher:
    Encrypting in chunks of 65536 bytes: done. 11.47 MB in 5.00 secs: 2.29 MB/sec
real  0m 5.00s
user  0m 0.00s
sys   0m 0.01s
```

So the question is: is this (about 2 MB/s for chunks of 16 KB or more) the real performance of the crypto IP core, or is it caused by the drivers or some other HW/SW interaction problem? As far as we know, we are not the only ones who have run into this issue ([3], last answer).
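The summary rows printed by `openssl speed` are just operations × block size ÷ elapsed seconds, in units of 1000 bytes/s. The Python sketch below is an illustration only (the helper names are hypothetical; only the counts are copied from the two runs above): it recomputes both summary rows and the software/hardware ratio for each block size.

```python
# Raw counts from the two `openssl speed -evp aes-256-cbc -elapsed` runs above:
# block size in bytes -> (operations completed, elapsed seconds)
SW_RUN = {16: (1829060, 3.00), 64: (548756, 3.00), 256: (145037, 3.00),
          1024: (36751, 3.00), 8192: (4614, 3.00), 16384: (2304, 3.00)}
HW_RUN = {16: (32666, 3.00), 64: (26338, 3.00), 256: (15378, 3.00),
          1024: (5661, 3.00), 8192: (818, 3.00), 16384: (408, 3.01)}


def throughput_k(ops, block, seconds):
    """Throughput in units of 1000 bytes/s, as in the openssl summary row."""
    return ops * block / seconds / 1000


def compare(sw, hw):
    """Map block size -> (SW k/s, HW k/s, SW/HW ratio), rounded for display."""
    rows = {}
    for block in sorted(sw):
        sw_ops, sw_secs = sw[block]
        hw_ops, hw_secs = hw[block]
        sw_k = throughput_k(sw_ops, block, sw_secs)
        hw_k = throughput_k(hw_ops, block, hw_secs)
        rows[block] = (round(sw_k, 2), round(hw_k, 2), round(sw_k / hw_k, 1))
    return rows


if __name__ == "__main__":
    for block, (sw_k, hw_k, ratio) in compare(SW_RUN, HW_RUN).items():
        print(f"{block:>5} bytes: SW {sw_k:>9.2f}k  HW {hw_k:>8.2f}k  SW/HW {ratio:>4.1f}x")
```

With these counts, the software path comes out about 56x faster at 16-byte blocks and about 5.7x faster at 16384-byte blocks.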

It's worth noting that during the hardware-accelerated test the CPU was heavily used (95.4%) in kernel space by the irq/60-54001000 task, so this method can't even be used to reduce CPU load by offloading work to the crypto IP.

P.S. We added these lines to local.conf to build strongSwan and OpenSSL with cryptodev support:

```
PACKAGECONFIG_append_pn-openssl = " cryptodev-linux"
IMAGE_INSTALL_append = " strongswan cryptodev-module cryptodev-tests"
```

Thank you in advance!

[1]: https://github.com/trailofbits/algo

[2]: https://github.com/cryptodev-linux/cryptodev-linux/tree/master/tests

[3]: https://community.st.com/s/question/0D50X0000C4POdo/crypto-api

Kevin HUBER
ST Employee

Hello @dimax,

Thank you for your detailed message.

I will try to help you.

1. stm32-cryp.ko problem

Regarding your first question, about your issue with stm32-cryp.ko, you reported this error:

[ 102.064269] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!

If you see a few of these every now and then, there is nothing to be alarmed about: it can happen when the CPU is under stress, and it is normal. If you see it constantly, you might want to consider either reducing the CPU load or disabling NOHZ.

But if you decide to do this, make sure you fully understand what it does by reading the kernel documentation: https://elixir.bootlin.com/linux/v5.10.10/source/Documentation/timers/no_hz.rst
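To tell an occasional message from a constant stream, one option is to count the messages in a saved kernel log. The following is a minimal Python sketch, assuming the log was captured with kernel timestamps enabled (as in the message quoted above, e.g. saved with `dmesg > boot.log`); the function names are hypothetical, and the sketch deliberately does not define what rate counts as "constantly".

```python
import re
from typing import List

# Matches kernel log lines such as:
# [ 102.064269] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!
NOHZ_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+NOHZ tick-stop error")


def nohz_timestamps(log_text: str) -> List[float]:
    """Kernel timestamps (seconds since boot) of every NOHZ tick-stop error line."""
    stamps = []
    for line in log_text.splitlines():
        match = NOHZ_RE.match(line.strip())
        if match:
            stamps.append(float(match.group(1)))
    return stamps


def nohz_rate_per_minute(log_text: str) -> float:
    """Messages per minute between the first and last occurrence.

    Returns 0.0 when there are fewer than two messages, because a single
    message gives no interval to measure a rate over.
    """
    stamps = nohz_timestamps(log_text)
    if len(stamps) < 2:
        return 0.0
    span = stamps[-1] - stamps[0]
    if span <= 0:
        return 0.0
    return (len(stamps) - 1) / span * 60.0
```

For example, `nohz_rate_per_minute(open("boot.log").read())` gives the average rate over the period in which the messages appeared.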

To help you with this error and the stuck behavior, I need:

  • Your complete log, from boot up to the issue.
  • Your dts files and/or your dtb, if you made any modifications to them.

2. Cryptodev performance tests

Unfortunately, it is a known Linux issue that the cryptodev framework and other frameworks are not optimized for HW engines.

The result is better performance in full SW than with the HW-accelerated framework for crypto functions that require many cyclic operations on small data sizes (linked to the key size):

- The Linux framework uses work queues, which add scheduling overhead.

- Using DMA will not help (it takes more time to configure than to copy the data).
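Both points above are about overhead paid per request. One way to see how the hardware-path numbers from the first post split between a fixed per-request cost and a per-byte cost is to fit a straight line, t(n) = a + b·n, to the average time per request at each block size n. This is an assumed, deliberately simple model (unweighted ordinary least squares), not a description of what the driver does internally, and the helper names are hypothetical:

```python
from typing import Dict, Tuple

# Hardware-path counts from the `openssl speed` run with stm32-cryp.ko loaded:
# block size in bytes -> (operations completed, elapsed seconds)
HW_RUN = {16: (32666, 3.00), 64: (26338, 3.00), 256: (15378, 3.00),
          1024: (5661, 3.00), 8192: (818, 3.00), 16384: (408, 3.01)}


def per_op_micros(run: Dict[int, Tuple[int, float]]) -> Dict[int, float]:
    """Average wall-clock time per request, in microseconds, for each block size."""
    return {block: secs / ops * 1e6 for block, (ops, secs) in run.items()}


def fit_cost_model(run: Dict[int, Tuple[int, float]]) -> Tuple[float, float]:
    """Unweighted least-squares fit of t(n) = fixed_us + per_byte_us * n."""
    times = per_op_micros(run)
    xs = sorted(times)
    ys = [times[x] for x in xs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    per_byte_us = sxy / sxx
    fixed_us = mean_y - per_byte_us * mean_x
    return fixed_us, per_byte_us


if __name__ == "__main__":
    fixed_us, per_byte_us = fit_cost_model(HW_RUN)
    # 1 byte per microsecond equals 1e6 bytes/s, i.e. 1 MB/s
    print(f"fixed ~{fixed_us:.1f} us/request, ~{per_byte_us:.3f} us/byte "
          f"(~{1 / per_byte_us:.2f} MB/s)")
```

On those counts, the fit gives roughly 76 µs of fixed cost per request and roughly 0.44 µs per byte (about 2.25 MB/s).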

This is the same issue for any vendor.

Given that, I advise you to use the SW implementation instead of HW acceleration, if possible in your project.

One more question:

In the first part of your post, you mention that IPsec gets stuck when stm32-cryp.ko is loaded.

But during your performance tests in the second part, you run `insmod stm32-cryp.ko`, and the crypto seems to work. Did you disable IPsec for this test?

Regards,

Kevin

dimax
Senior

Yes, IPsec was disabled during testing.

I went on and ran the same test on an old x86 machine. There I get about a 5x improvement with HW acceleration.

How can you explain that?

→ ./speed

Testing AES-128-CBC cipher:

Encrypting in chunks of 65536 bytes: done. 6.13 GB in 5.00 secs: 1.23 GB/sec

→ sudo rmmod aesni_intel

→ ./speed

Testing AES-128-CBC cipher:

Encrypting in chunks of 65536 bytes: done. 1.23 GB in 5.00 secs: 0.25 GB/sec

dimax
Senior

And here are test results with an NXP part that show up to a 100x performance increase with the use of HW acceleration:

https://community.nxp.com/t5/Layerscape/Openvpn-is-not-working-with-IOT-GW-LS1021a-hardware-encryption/td-p/406332

Bernard PUEL
ST Employee

Hello,

ST's policy for Linux is to rely on the Linux frameworks plus HW driver adaptations to these frameworks, so that the overall solution can be upstreamed and maintained by the community (which gives ST's solution more sustainability for customers).

Doing this for the cryptodev framework gives the results you highlight here: in some cases, performance is very low compared to the full-CPU implementation, with the same CPU load.

A study is ongoing to check what could be done to improve this (depending on the HW capability and the SW adaptation required on top of the HW drivers), but there is no short-term solution.