A guide to the HAL of the AES accelerator, or how to fix it

Dominik Lorych · 2022-01-18

Hi,

I am writing this both as a guide for other people struggling with their STM32 AES accelerator and as a suggestion to STM on how to improve their HAL so it is easier to use. My testing and debugging were done on an STM32WB, but I think the accelerators on other STM32 chips have similar properties. I used an STM32L4A6 before, and its accelerator seemed to have the same properties.

First, these are the badly documented properties of the HAL you might have problems with:

1:

Setting the DataWidthUnit field to byte buffers only changes how the data itself is processed. Key and IV are still expected as word (32-bit) buffers.
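A minimal sketch of that conversion, assuming the HAL wants the first byte of the key or IV in the most significant byte of the first word (for example, key bytes 2b 7e 15 16 ... become the word 0x2B7E1516). The helper bytes_to_words is hypothetical and not part of the HAL.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack a byte buffer (key or IV stored as uint8_t[])
 * into 32-bit words, most significant byte first.
 * Assumption: this big-endian packing is what the HAL expects.
 * nbytes must be a multiple of 4; any trailing bytes are ignored. */
static void bytes_to_words(const uint8_t *in, uint32_t *out, size_t nbytes)
{
    for (size_t i = 0; i < nbytes / 4; i++) {
        out[i] = ((uint32_t)in[4 * i]     << 24) |
                 ((uint32_t)in[4 * i + 1] << 16) |
                 ((uint32_t)in[4 * i + 2] << 8)  |
                  (uint32_t)in[4 * i + 3];
    }
}
```

On a little-endian Cortex-M, simply casting a uint8_t pointer to uint32_t * would give the opposite byte order inside each word (and may be misaligned), which is why the bytes are shifted explicitly here.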

2:

The accelerator can only process data whose size is a multiple of 4 bytes. The HAL does not return an error when it is given an invalid data size, but the result will not be a valid AES result.

3:

When using an AES mode that takes an initialization vector, it might seem like the HAL always expects 4 words / 16 bytes as IV, but this is not true. Actually, it expects 3 words / 12 bytes, and the last word must always be set according to the reference manual. ~~For AES-CTR it must be set to 0x00000001~~, for AES-GCM it must be set to 0x00000002. So the example code generated by CubeMX (with key and IV set to 0) will never produce valid results, because it breaks the HAL's assumption that the last word of the IV is set properly.

Update for 3: AES-CTR actually seems to work with a 128-bit IV. I had previously only tested AES-GCM, and found that the specification is correct there. So I assumed that the specification is also correct for AES-CTR, but that does not seem to be the case. For details, see the comments below.

Now to the improvements to the HAL that could be made here:

In general, I think it should not be necessary to read the chip's reference manual to use HAL code. The HAL code should be documented well enough that the reference manual is not needed. Also, when the HAL does not return an error, no user will expect its result to be invalid.

1:

In general I am fine with this property of the HAL, but it could be documented more clearly. However, adding code that converts byte buffers to word buffers when writing key and IV into the registers would greatly improve usability, because key and IV are usually stored as byte buffers. This code should also be relatively small, and in any case smaller than converting the data to word buffers before calling the HAL.

2:

Here the minimal solution would be to at least document in the HAL that sizes are expected to be multiples of 4 bytes, for example in the documentation of the HAL_CRYP_Encrypt function. But checking the input data size should also be implemented, as no one will expect to receive invalid data when the HAL says HAL_OK.
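A sketch of the kind of size check meant here. aes_size_is_valid is a hypothetical helper; the idea is that the caller (or a future HAL version) returns an error instead of starting the operation when the check fails.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pre-check: the accelerator only handles whole 32-bit words,
 * so reject any size that is not a multiple of 4 bytes instead of silently
 * producing a wrong result. */
static bool aes_size_is_valid(size_t size_in_bytes)
{
    return (size_in_bytes % 4u) == 0u;
}
```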

3:

When only three words of the IV can actually be used, then why do you let the user supply 4 words? Also, why is CubeMX generating example code which will never generate valid results? I would suggest shortening the IV to 3 words. Then the user can set those 3 words, and the HAL would fill in the correct value for the last word. But again, at least document the assumptions of the HAL.
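A sketch of what the suggested 3-word interface could do internally for AES-GCM, assuming the last word must be 0x00000002 as stated above. The function gcm_fill_iv is hypothetical. Per the update to point 3, AES-CTR appears to accept a full 128-bit IV, so this sketch is limited to GCM.

```c
#include <stdint.h>

/* Hypothetical helper: the user supplies only the 96-bit (3-word) GCM IV,
 * and the 4th word is filled in with 0x00000002 (assumed value for GCM). */
static void gcm_fill_iv(const uint32_t iv96[3], uint32_t iv128[4])
{
    iv128[0] = iv96[0];
    iv128[1] = iv96[1];
    iv128[2] = iv96[2];
    iv128[3] = 0x00000002u;
}
```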

TL;DR:

The HAL of the AES accelerator has multiple badly documented or undocumented properties. It should not be necessary to read the reference manual to know how to use HAL code, so please, STM, improve the HAL. To STM, feel free to contact me for more details.

Jocelyn RICARD · 2022-01-20

Hello Dominik,

Thank you for your constructive feedback.

I transmitted it internally for action.

Best regards

Jocelyn

Pavel A. · 2022-01-20

@Jocelyn RICARD And while we are at it, another small thing:

The size argument of HAL_CRYP_Encrypt and HAL_CRYP_Decrypt is 16-bit. This makes it impossible to encrypt or decrypt more than 64K at once. IMO this should be documented, or the argument should simply be changed to 32 bits to remove the limitation.
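Because the size parameter is 16 bits wide, a larger value is silently reduced modulo 65536 when it is passed in (70000 becomes 4464, for example). A hypothetical range check such as cryp_size_fits avoids that; it makes no assumption about whether the size counts bytes or words.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check before calling HAL_CRYP_Encrypt/HAL_CRYP_Decrypt:
 * anything above UINT16_MAX would be truncated by the uint16_t parameter. */
static bool cryp_size_fits(size_t size)
{
    return size <= UINT16_MAX;
}
```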

https://github.com/STMicroelectronics/stm32wbxx_hal_driver/blob/2c5f06638be516c1b772f768456ba637f077bac8/Src/stm32wbxx_hal_cryp.c#L1203

MSaez.1 · 2022-07-04

Hello Dominik,

This is actually pretty helpful. I have been struggling for quite a few days with the AES hardware acceleration, and I agree that it is very hard to use and badly documented. The STM32 firmware implementation of the Cryptographic Library is regrettably not much better.

I have one question regarding the IV registers. In the NIST Special Publication 800-38A example CTR-AES256.Encrypt, the initial counter is 128 bits long (f0f1f2f3f4f5f6f7f8f9fafbfcfdfeff).

Does this mean that this example cannot be reproduced at all with the STM32 AES HW accelerator?

Key 603deb1015ca71be2b73aef0857d7781 1f352c073b6108d72d9810a30914dff4

Init. Counter f0f1f2f3f4f5f6f7f8f9fafbfcfdfeff

Block #1

Input Block f0f1f2f3f4f5f6f7f8f9fafbfcfdfeff

Output Block 0bdf7df1591716335e9a8b15c860c502

Plaintext 6bc1bee22e409f96e93d7e117393172a

Ciphertext 601ec313775789a5b7a7f504bbf3d228
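These vectors can be written as 32-bit words, assuming the same most-significant-byte-first packing the HAL uses for key and IV. Note that the Input Block of block #1 is identical to the full 128-bit Init. Counter. The last step of CTR (ciphertext = plaintext XOR output block) can be checked without the accelerator; ctr_xor_block is a hypothetical helper, and computing the output block itself still requires AES-256.

```c
#include <stdint.h>

/* NIST SP 800-38A, CTR-AES256.Encrypt, block #1, as 32-bit words
 * (assumed big-endian packing, first byte in the most significant position). */
static const uint32_t nist_key[8] = {
    0x603deb10u, 0x15ca71beu, 0x2b73aef0u, 0x857d7781u,
    0x1f352c07u, 0x3b6108d7u, 0x2d9810a3u, 0x0914dff4u
};
static const uint32_t nist_ctr[4]     = {0xf0f1f2f3u, 0xf4f5f6f7u, 0xf8f9fafbu, 0xfcfdfeffu};
static const uint32_t nist_out_blk[4] = {0x0bdf7df1u, 0x59171633u, 0x5e9a8b15u, 0xc860c502u};
static const uint32_t nist_plain[4]   = {0x6bc1bee2u, 0x2e409f96u, 0xe93d7e11u, 0x7393172au};
static const uint32_t nist_cipher[4]  = {0x601ec313u, 0x775789a5u, 0xb7a7f504u, 0xbbf3d228u};

/* CTR mode: ciphertext block = plaintext block XOR AES_K(counter block). */
static void ctr_xor_block(const uint32_t in[4], const uint32_t keystream[4],
                          uint32_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = in[i] ^ keystream[i];
}
```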

Thank you and best regards,

Manuel

Dominik Lorych · 2022-07-04

Hi Manuel,

There is an AES-CTR example in the AES_Modes example project for the WB55 that uses a 128-bit IV. It uses the key and IV from the NIST example, but NOT the plaintext and ciphertext from that example (usually these examples implement the NIST examples fully). I just checked whether it works when I replace the plaintext and ciphertext with the ones from the NIST example, and it worked after I switched the DataWidthUnit to 32 bits (word). So it is indeed possible to use a 128-bit IV; the WB55 specification is wrong there.

I had previously tested AES-GCM and found that the specification is correct there: you actually cannot pass more than 96 bits. But AES-GCM also works differently internally. MbedTLS appends 0x00000001 as the counter value to a 96-bit IV (making it 128 bits), while the accelerator expects the 96-bit IV appended with 0x00000002. Both give the same results (although the counter values are different!). Other IV lengths will not work, because MbedTLS processes them to make them 128 bits long, while the accelerator simply assumes that the IV is 96 bits long.
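A sketch of the two counter values, following NIST SP 800-38D: for a 96-bit IV, J0 = IV || 0x00000001, and inc32(J0) = IV || 0x00000002 is the counter block used for the first plaintext block, while J0 itself is only used when computing the tag. That is consistent with both configurations giving the same results. The function names are hypothetical.

```c
#include <stdint.h>

/* J0 for a 96-bit IV (NIST SP 800-38D): IV || 0x00000001. */
static void gcm_j0_from_iv96(const uint32_t iv[3], uint32_t j0[4])
{
    j0[0] = iv[0];
    j0[1] = iv[1];
    j0[2] = iv[2];
    j0[3] = 0x00000001u;
}

/* inc32: increment only the low 32 bits modulo 2^32; the upper 96 bits are
 * copied unchanged. inc32(J0) = IV || 0x00000002 for a 96-bit IV. */
static void gcm_inc32(const uint32_t in[4], uint32_t out[4])
{
    out[0] = in[0];
    out[1] = in[1];
    out[2] = in[2];
    out[3] = in[3] + 1u; /* unsigned arithmetic wraps 0xFFFFFFFF to 0 */
}
```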

I assumed that the specification would also be correct for AES-CTR as it was correct for AES-GCM, but seemingly that is not the case.

Best regards,

Dominik

MSaez.1 · 2022-07-04

Thank you very much for your response. I finally found the source of the confusion.

The documentation says that the IVR register needs to be initialized with the B0 vector described in the NIST example, but this is not completely true.

In Example 2 of the standard, the B0 vector is defined as:

56101112 13141516 17000000 00000010

(defined following A.2.1 Formatting of the Control Information and the Nonce)

For the encryption phase in CTR mode, however, the standard defines a different vector (Ctr1) as the initialization vector:

06101112 13141516 17000000 00000001

(defined following A.3 Formatting of the Counter Blocks)
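Both vectors follow the formatting rules in Appendix A of the standard: B0 starts with a flags byte encoding Adata, the tag length and q - 1 (where q = 15 - nonce length), followed by the nonce and the payload length, while Ctr_i's flags byte only carries q - 1, followed by the nonce and the counter i. A minimal sketch, using the parameters read off B0 above (8-byte nonce 10..17, associated data present, 6-byte tag, 16-byte payload); the function names are hypothetical and lengths are not validated.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* B0 (A.2.1): flags | nonce (n bytes) | payload length Q (q bytes, big-endian).
 * Assumes 7 <= n <= 13 and that plen fits in q bytes. */
static void ccm_format_b0(const uint8_t *nonce, size_t n, int have_adata,
                          size_t tag_len, uint64_t plen, uint8_t b0[16])
{
    size_t q = 15 - n;
    b0[0] = (uint8_t)((have_adata ? 0x40 : 0x00)   /* Adata bit */
                      | (((tag_len - 2) / 2) << 3) /* encoded tag length */
                      | (q - 1));                  /* encoded q */
    memcpy(&b0[1], nonce, n);
    for (size_t k = 0; k < q; k++)
        b0[15 - k] = (uint8_t)(plen >> (8 * k));
}

/* Ctr_i (A.3): flags (only q - 1) | nonce | counter i (q bytes, big-endian). */
static void ccm_format_ctr(const uint8_t *nonce, size_t n, uint64_t i,
                           uint8_t ctr[16])
{
    size_t q = 15 - n;
    ctr[0] = (uint8_t)(q - 1);
    memcpy(&ctr[1], nonce, n);
    for (size_t k = 0; k < q; k++)
        ctr[15 - k] = (uint8_t)(i >> (8 * k));
}
```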

By trying the encryption phase in Python, I was able to find the correct configuration of the IV (Ctr1). Now the encryption phase of CCM is working fine for me, and I get the same results as the standard examples. This could be a recommendation for improving the documentation in the reference manual.

I guess now I have to change the IV to the B0 vector defined in the standard and perform the CBC phase to calculate the MAC. That is my next step.

Thank you again for your support.