2026-05-13 1:30 AM - edited on 2026-05-13 7:01 AM by Laurids_PETERSE
The STM32H5 microcontroller family integrates a hardware HASH peripheral capable of computing MD5, SHA-1, SHA-224, and SHA-256 digests, which can be efficiently supplied with data using DMA.
In many real-world applications, input data is naturally handled as a byte stream (for example, network packets, encrypted payloads, or TLV structures). A common approach is to configure the DMA for byte transfers, using a byte-wide source and the HASH_DIN register as a 32-bit word destination. However, if the DMA data width and HASH data packing are not configured correctly, the transfer may still complete successfully, but the resulting digest can be incorrect.
In particular:
This article explains the root cause of this behavior and provides guidance on how to configure DMA and HASH data packing correctly on STM32H5 to ensure valid digest computation.
A typical configuration on STM32H5 is:
From the application perspective, this approach is attractive because:
A common (but incorrect) way to use DMA in this context is to rely on the fact that the message is a sequence of bytes, and to assume that the DMA + HASH combination implicitly packs these bytes into 32‑bit words as needed by the HASH engine. In practice, the CPU path uses a single 32-bit write, while the DMA path generates multiple 32-bit writes from bytes. As a result, they do not produce the same input word sequence for the HASH core.
The figure below (CPU vs DMA transfer) illustrates this behavior using a simple 4‑byte input buffer [B3 B2 B1 B0]:
Even though the original byte values are the same and the DMA transfer completes without error, the mismatch in byte‑to‑word packing leads to h₁ ≠ h₂.
Under this kind of configuration, the HASH transfer:
This is particularly confusing because:
However, the HASH engine is not receiving the same sequence of 32‑bit input words in the two cases, hence the different digest values.
Internally, the HASH core processes the message as a sequence of 32‑bit words. Conceptually, for a 4‑byte chunk (B3, B2, B1, B0) of the message, the core expects a single 32‑bit word (for example, [B3 B2 B1 B0], depending on endianness and word-swapping configuration).
On STM32H5:
When data is written by the CPU as a single 32‑bit word, the relationship between the four bytes and the resulting word is explicit and well‑defined. In contrast, DMA may process the message as bytes without a consistent packing strategy. In that case, each byte can end up as a separate 32-bit word written to HASH_DIN.
The digest difference comes from the fact that the HASH engine sees two different messages:
Although the original input buffer [B3 B2 B1 B0] is identical in both cases, the sequence of 32‑bit words processed by the HASH core is not. Since the HASH algorithm operates on these words, not directly on the conceptual byte stream, the resulting digest h₂ is different from h₁.
In other words:
The next sections of this article show how to configure the data path so that:
The root cause of the issue is that the CPU path and the DMA path do not present the same 32‑bit words to the HASH peripheral:
To solve this, we must ensure that the GPDMA packs the bytes into 32‑bit words in the same way a CPU 32‑bit write would. On STM32H5, this is done with the Programmed Data Handling feature, controlled by the PAM, SB, DB and DH bits of GPDMA_CxTR1 (see RM0481, “Programmed data handling”).
The idea is:
Before configuring the DMA, you must decide what you want the HASH engine to see, in terms of byte order inside each 32‑bit word.
For a 4‑byte group B3, B2, B1, B0, typical options are:
Depending on:
This convention must be:
Once clarified, consult the Programmed data handling table in the DMA section of your microcontroller’s reference manual. It determines the combination of SDW_LOG2, DDW_LOG2, PAM[1:0], SB, DB, and DH needed to obtain the required destination data stream.
For the typical case discussed in this article:
The “Programmed data handling” table gives, for each combination of:
The destination data stream may therefore appear as [B7, B6, B5, B4, B3, B2, B1, B0] or as [B3, B2, B1, B0], depending on the use case.
In practice:
Conceptual example (to be aligned with your actual convention):
In code, after HAL_DMA_Init() you typically do something like:
/* Positions and masks from RM / device headers */
#define GPDMA_CTR1_PAM_Pos 11U
#define GPDMA_CTR1_PAM_Msk (0x3U << GPDMA_CTR1_PAM_Pos)
/* Example encoding – adapt to device headers */
#define GPDMA_PAM_PACK (0x2U << GPDMA_CTR1_PAM_Pos) // 10: PACK
/* Configure PAM in GPDMA_CxTR1 after HAL_DMA_Init() */
void HAL_HASH_MspInit(HASH_HandleTypeDef *hhash)
{
__HAL_RCC_HASH_CLK_ENABLE();
/* HASH DMA Init */
/* GPDMA1_REQUEST_HASH_IN Init */
handle_GPDMA1_Channel0.Instance = GPDMA1_Channel0;
handle_GPDMA1_Channel0.Init.Request = GPDMA1_REQUEST_HASH_IN;
handle_GPDMA1_Channel0.Init.BlkHWRequest = DMA_BREQ_SINGLE_BURST;
handle_GPDMA1_Channel0.Init.Direction = DMA_MEMORY_TO_PERIPH;
handle_GPDMA1_Channel0.Init.SrcInc = DMA_SINC_INCREMENTED;
handle_GPDMA1_Channel0.Init.DestInc = DMA_DINC_FIXED;
handle_GPDMA1_Channel0.Init.SrcDataWidth = DMA_SRC_DATAWIDTH_BYTE;
handle_GPDMA1_Channel0.Init.DestDataWidth = DMA_DEST_DATAWIDTH_WORD;
handle_GPDMA1_Channel0.Init.Priority = DMA_LOW_PRIORITY_LOW_WEIGHT;
handle_GPDMA1_Channel0.Init.SrcBurstLength = 4;
handle_GPDMA1_Channel0.Init.DestBurstLength = 4;
handle_GPDMA1_Channel0.Init.TransferAllocatedPort = DMA_SRC_ALLOCATED_PORT0|DMA_DEST_ALLOCATED_PORT0;
handle_GPDMA1_Channel0.Init.TransferEventMode = DMA_TCEM_BLOCK_TRANSFER;
handle_GPDMA1_Channel0.Init.Mode = DMA_NORMAL;
if (HAL_DMA_Init(&handle_GPDMA1_Channel0) != HAL_OK)
{
Error_Handler();
}
// Read the current value (if you want to preserve the other bits)
uint32_t reg = GPDMA1_Channel0->CTR1;
// Clear only PAM[1:0]
reg &= ~GPDMA_CTR1_PAM_Msk;
// Set PAM = PACK (byte -> packed into word)
reg |= GPDMA_PAM_PACK;
// Write back to the register
GPDMA1_Channel0->CTR1 = reg;
__HAL_LINKDMA(hhash, hdmain, handle_GPDMA1_Channel0);
if (HAL_DMA_ConfigChannelAttributes(&handle_GPDMA1_Channel0, DMA_CHANNEL_NPRIV) != HAL_OK)
{
Error_Handler();
}
/* USER CODE BEGIN HASH_MspInit 1 */
/* USER CODE END HASH_MspInit 1 */
}
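With the channel configured, the transfer itself is started through the HAL HASH API. A minimal sketch follows, assuming a global `hhash` handle already initialized and linked to `handle_GPDMA1_Channel0` as above; the `HAL_HASH_Start_DMA()` signature shown (digest buffer passed to the start call) is the one used by the STM32H5 HAL, and may differ in other HAL versions.

```c
/* Sketch only: assumes hhash is initialized (algorithm and data type
   set) and linked to handle_GPDMA1_Channel0 as shown above. */
static uint8_t message[64];   /* byte-stream input  */
static uint8_t digest[32];    /* SHA-256 digest out */

if (HAL_HASH_Start_DMA(&hhash, message, sizeof message, digest) != HAL_OK)
{
    Error_Handler();
}
/* Completion can be observed by polling HAL_HASH_GetState() or via
   the HAL completion callback. */
```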
If you prefer using CubeMX, the same configuration is expressed in the Data Handling section of the GPDMA channel (as shown below):
CubeMX then generates:
void HAL_HASH_MspInit (HASH_HandleTypeDef* hhash)
{
DMA_DataHandlingConfTypeDef DataHandlingConfig;
/* USER CODE BEGIN HASH_MspInit 0 */
/* USER CODE END HASH_MspInit 0 */
/* Peripheral clock enable */
__HAL_RCC_HASH_CLK_ENABLE();
/* HASH DMA Init */
/* GPDMA1_REQUEST_HASH_IN Init */
handle_GPDMA1_Channel0.Instance = GPDMA1_Channel0;
handle_GPDMA1_Channel0.Init.Request = GPDMA1_REQUEST_HASH_IN;
handle_GPDMA1_Channel0.Init.BlkHWRequest = DMA_BREQ_SINGLE_BURST;
handle_GPDMA1_Channel0.Init.Direction = DMA_MEMORY_TO_PERIPH;
handle_GPDMA1_Channel0.Init.SrcInc = DMA_SINC_INCREMENTED;
handle_GPDMA1_Channel0.Init.DestInc = DMA_DINC_FIXED;
handle_GPDMA1_Channel0.Init.SrcDataWidth = DMA_SRC_DATAWIDTH_BYTE;
handle_GPDMA1_Channel0.Init.DestDataWidth = DMA_DEST_DATAWIDTH_WORD;
handle_GPDMA1_Channel0.Init.Priority = DMA_LOW_PRIORITY_LOW_WEIGHT;
handle_GPDMA1_Channel0.Init.SrcBurstLength = 4;
handle_GPDMA1_Channel0.Init.DestBurstLength = 4;
handle_GPDMA1_Channel0.Init.TransferAllocatedPort = DMA_SRC_ALLOCATED_PORT0|DMA_DEST_ALLOCATED_PORT0;
handle_GPDMA1_Channel0.Init.TransferEventMode = DMA_TCEM_BLOCK_TRANSFER;
handle_GPDMA1_Channel0.Init.Mode = DMA_NORMAL;
if (HAL_DMA_Init(&handle_GPDMA1_Channel0) != HAL_OK)
{
Error_Handler();
}
DataHandlingConfig.DataExchange = DMA_EXCHANGE_NONE;
DataHandlingConfig.DataAlignment = DMA_DATA_PACK;
if (HAL_DMAEx_ConfigDataHandling(&handle_GPDMA1_Channel0, &DataHandlingConfig) != HAL_OK)
{
Error_Handler();
}
__HAL_LINKDMA(hhash, hdmain, handle_GPDMA1_Channel0);
if (HAL_DMA_ConfigChannelAttributes(&handle_GPDMA1_Channel0, DMA_CHANNEL_NPRIV) != HAL_OK)
{
Error_Handler();
}
/* USER CODE BEGIN HASH_MspInit 1 */
/* USER CODE END HASH_MspInit 1 */
}
Key point: with PACK correctly configured, the DMA no longer sends one full 32‑bit word per byte, but packs several bytes into each destination word exactly as described by the table.
Once the GPDMA is packing bytes correctly, the HASH must interpret the 32‑bit words in the same byte order you decided in step 1.
This is controlled by the HASH data format configuration (for example HASH_DATATYPE and any word/byte swap options).
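As an illustration of where that choice lives, here is a minimal sketch assuming byte-level swapping is wanted, so that the HASH core reinterprets each 32‑bit word as 4 bytes in stream order. The exact define names (for example `HASH_BYTE_SWAP` in recent HAL versions versus `HASH_DATATYPE_8B` in older ones) depend on the HAL in use; check `stm32h5xx_hal_hash.h`.

```c
/* Tell the HASH core how to reorder bytes inside each 32-bit word
   before processing (byte swap = treat each word as 4 stream bytes). */
hhash.Instance = HASH;
hhash.Init.DataType = HASH_BYTE_SWAP;   /* HASH_DATATYPE_8B on older HALs */
hhash.Init.Algorithm = HASH_ALGOSELECTION_SHA256;
if (HAL_HASH_Init(&hhash) != HAL_OK)
{
    Error_Handler();
}
```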
From an instructional perspective:
The goal is that, for a given message:
To summarize what must be respected to fix the issue:
By explicitly using GPDMA programmed data handling, you can control packing, not just alignment.
This ensures that the HASH peripheral receives the correct input word stream and resolves the digest mismatch problem.