2025-09-25 5:26 AM
Hello everyone,
I am working on the STM32U5G9ZJT6Q MCU and am trying to use the JPEG encoder, feeding it data from RAM through GPDMA in linked-list mode. The data to encode is a 640*480 8-bit grayscale image (stored in a .c file; for the JPEG, I only use the tables with index 0, with no chrominance).
I am not using the HAL functions for JPEG/DMA, even though JPEG_DMA_StartProcess (called from HAL_JPEG_Encode_DMA in stm32u5xx_hal_jpeg.c) seems to support DMA linked-list transfers.
I ran the JPEG_EncodingFromFLASH_DMA example for the STM32U5G9 Discovery kit without issue, but having to work with normal DMA transfers and handle the input/output in interrupts seems a bit off to me. I would like to use linked-list mode for simpler automation.
I configured the jpeg_rx (input) and jpeg_tx (output) DMA transfers as linked lists. The JPEG output works fine: the nodes are correctly updated and stream data as expected.
However, the JPEG input DMA stalls after the first node, even though the next node is correctly loaded (it is not an LLR or SAR issue, as it links to the right addresses).
Here are the values for the number of DMA transfers when the program stalls.
From my understanding, each jpeg_rx request, raised when the 8-word IFIFO is half empty, allows 4 words (16 bytes) to be transferred into it.
I set my transfer length to a multiple of 4 words (in my case GPDMA->BR1.BNDT = 9600) to avoid a request leaving the IFIFO under the threshold, which would prevent any further request. Still, it doesn't work, so there's definitely something I haven't understood (the reason could be totally unrelated).
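Since GPDMA's BNDT counts bytes rather than words, 9600 corresponds to 2400 words, i.e. 600 four-word jpeg_rx requests (with the macros in the draft below, L_NODE_IN evaluates to 38400 bytes, i.e. 2400 requests). A small sketch of that alignment check, assuming the 16-byte request size stated above and the 16-bit width of the BNDT field; the macro and function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define JPEG_RX_REQUEST_BYTES 16u /* assumption from the post: 4 words per jpeg_rx request */
#define GPDMA_BNDT_MAX 0xFFFFu    /* CBR1.BNDT is a 16-bit byte count */

/* True when a node's BNDT (in bytes) fits in the field and ends exactly on a
 * jpeg_rx request boundary, so a node never finishes in mid-request. */
static bool bndt_is_request_aligned(uint32_t bndt_bytes)
{
    return bndt_bytes != 0u && bndt_bytes <= GPDMA_BNDT_MAX &&
           bndt_bytes % JPEG_RX_REQUEST_BYTES == 0u;
}

/* Number of jpeg_rx requests needed to drain one node. */
static uint32_t jpeg_rx_requests_per_node(uint32_t bndt_bytes)
{
    assert(bndt_is_request_aligned(bndt_bytes));
    return bndt_bytes / JPEG_RX_REQUEST_BYTES;
}
```

A whole 640*480 frame (307200 bytes) does not fit in one node's BNDT, which is why the frame has to be split across several nodes in the first place.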
I tried changing the granularity of transfer completion, the order of register initialization and the register values, enabling/disabling JPEG header generation, and balancing the bandwidth on each port (I tried every port configuration without noticeable change, so I don't think the issue comes from port allocation, apart from GPDMA.TR2.DREQ), among other configurations.
Why would the input DMA stall after the first node when the output doesn't? Do I need to manually re-arm the GPDMA in an interrupt for each transfer (and thus use standard DMA)?
Note: the quantization tables are derived from the JPEG standard; MX_JPEG_Init gives the same results as my code for the basic configuration used for now, so I didn't include those functions.
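Since set_tables() is not shown, here is one common way a quality factor such as the 75 passed to cfg_jpeg() is applied to a baseline table entry (for example from ITU-T T.81 Annex K, Table K.1). This is a sketch of the IJG libjpeg scaling rule, an assumption rather than the author's actual code; quant_scale() is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/* Scale one baseline quantization entry to a quality factor in 1..100
 * using the IJG rule: quality < 50 divides, quality >= 50 sharpens. */
static uint8_t quant_scale(uint8_t base, uint8_t quality)
{
    assert(quality >= 1 && quality <= 100);
    uint32_t scale = (quality < 50) ? 5000u / quality : 200u - 2u * quality;
    uint32_t q = ((uint32_t)base * scale + 50u) / 100u; /* rounded */
    if (q < 1u)
        q = 1u;   /* a zero divisor is not allowed */
    if (q > 255u)
        q = 255u; /* baseline tables hold 8-bit values */
    return (uint8_t)q;
}
```

With this rule, the first luminance entry (16) becomes 8 at quality 75 and stays 16 at quality 50.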
Here is my code draft (don't mind the missing functions/headers; I copy-pasted the essentials).
==== cfg.h ====
#include "stm32u5g9xx.h"
struct cfg_encoder_s {
JPEG_TypeDef *jpeg;
struct jpeg_dma_s {
DMA_Channel_TypeDef *gpdma;
uint32_t *nodes;
uint32_t mem_addr; // remove in the future
} in, out;
uint32_t length_encoded;
};
extern struct cfg_encoder_s enc;
void cfg_gpdma_jpeg_in(struct cfg_encoder_s *enc, uint32_t *src_buf);
void cfg_gpdma_jpeg_out(struct cfg_encoder_s *enc, uint32_t *dst_buf);
void cfg_gpdma_jpeg_in_nodes(struct cfg_encoder_s *enc);
void cfg_gpdma_jpeg_out_nodes(struct cfg_encoder_s *enc);
void cfg_jpeg(struct cfg_encoder_s *enc, uint8_t quality);
void jpeg_start(struct cfg_encoder_s *enc);
void jpeg_end(struct cfg_encoder_s *enc);
#define BURST_LENGTH 8
#define BURST_LENGTH_IN BURST_LENGTH
#define BURST_LENGTH_OUT BURST_LENGTH
#define RES_X 640 // has to be multiple of size MCU (8 or 16, 8 here)
#define RES_Y 480 // has to be multiple of size MCU (8 or 16, 8 here)
#define RES (RES_Y * RES_X)
#define SIZE_MCU_X 8
#define SIZE_MCU_Y 8
#define N_MCU_X (RES_X / SIZE_MCU_X)
#define N_MCU_Y (RES_Y / SIZE_MCU_Y)
#define N_MCU_TOTAL (N_MCU_X * N_MCU_Y)
#define L_BUF_IN (RES / 1)
#define N_SUB_BUF_IN 8
#define L_SUB_BUF_IN (L_BUF_IN / N_SUB_BUF_IN)
#define L_NODE_IN L_SUB_BUF_IN
#define N_NODE_IN N_SUB_BUF_IN
#define L_BUF_OUT (RES / 1)
#define N_SUB_BUF_OUT 8
#define L_SUB_BUF_OUT (L_BUF_OUT / N_SUB_BUF_OUT)
#define L_NODE_OUT (L_SUB_BUF_OUT / 4) // parenthesized so the macro expands safely inside expressions
#define N_NODE_OUT N_SUB_BUF_OUT
==== cfg.c ====
#include "cfg.h"
struct cfg_encoder_s enc = {0};
static uint32_t cfg_nodes_in[N_NODE_IN][8] __attribute__((aligned(4))) = {0};
static uint32_t cfg_nodes_out[N_NODE_OUT][8] __attribute__((aligned(4))) = {0};
void cfg_gpdma_jpeg_in(struct cfg_encoder_s *enc, uint32_t *src_buf) {
uint32_t ccr, dar, sar, tr1, tr2, llr, la;
const uint32_t br1 = L_NODE_IN & 0xffff; // fixed
enc->in.gpdma->CCR &= ~(1 << DMA_CCR_EN_Pos); // disable dma during config
enc->in.gpdma->CCR |= (1 << DMA_CCR_RESET_Pos); // reset gpdma channel
ccr = enc->in.gpdma->CCR & ~(0xc37f07); // reserved bit mask
ccr |= (0b10 << DMA_CCR_PRIO_Pos); /* prio */
ccr |= (1 << DMA_CCR_LAP_Pos); /* linkedlist allocated port */
ccr |= (0 << DMA_CCR_LSM_Pos); /* link step mode */ // channel completed at last LLI (0) or ONCE for the current LLI (1)
ccr |= (0 << DMA_CCR_TOIE_Pos); /* trigger ovr ITE */
ccr |= (0 << DMA_CCR_SUSPIE_Pos); /* cplt susp ITE */
ccr |= (1 << DMA_CCR_USEIE_Pos); /* user setting error ITE */
ccr |= (1 << DMA_CCR_ULEIE_Pos); /* update link trsfer error ITE */
ccr |= (1 << DMA_CCR_DTEIE_Pos); /* data trfr error ITE */
ccr |= (0 << DMA_CCR_HTIE_Pos); /* HT cplt ITE */
ccr |= (1 << DMA_CCR_TCIE_Pos); /* T cplt ITE */
ccr |= (0 << DMA_CCR_SUSP_Pos); /* susp */
ccr |= (0 << DMA_CCR_RESET_Pos); /* reset */
enc->in.gpdma->CCR = ccr;
tr1 = enc->in.gpdma->CTR1 & ~(0xcffbfbfb); // reserved bit mask
tr1 |= (0 << DMA_CTR1_DSEC_Pos); /* dest sec */
tr1 |= (0 << DMA_CTR1_DAP_Pos); /* dest port */
tr1 |= (0 << DMA_CTR1_DHX_Pos); /* dest half word exch */
tr1 |= (0 << DMA_CTR1_DBX_Pos); /* dest byte exch */
tr1 |= ((BURST_LENGTH_IN-1) << DMA_CTR1_DBL_1_Pos); /* dest burst size of length (value-1) */
tr1 |= (0 << DMA_CTR1_DINC_Pos); /* dest inc */
tr1 |= (0b10 << DMA_CTR1_DDW_LOG2_Pos); /* dst data width of burst */
tr1 |= (0 << DMA_CTR1_SSEC_Pos); /* src sec */
tr1 |= (1 << DMA_CTR1_SAP_Pos); /* src port */
tr1 |= (0 << DMA_CTR1_SBX_Pos); /* src byte exch */
tr1 |= (0 << DMA_CTR1_PAM_Pos); /* align/padding */
tr1 |= ((BURST_LENGTH_IN-1) << DMA_CTR1_SBL_1_Pos); /* src burst size of length */
tr1 |= (1 << DMA_CTR1_SINC_Pos); /* src inc */
tr1 |= (0b10 << DMA_CTR1_SDW_LOG2_Pos); /* src data width burst */
tr2 = enc->in.gpdma->CTR2 & ~(0xc37fce7f); // reserved bit mask
tr2 |= (0b00 << DMA_CTR2_TCEM_Pos); /* trigger cplt transfer event mode */
tr2 |= (0b00 << DMA_CTR2_TRIGPOL_Pos); /* trigger event polarity */
tr2 |= (0 << DMA_CTR2_TRIGSEL_Pos); /* trigger event selection (no) */
tr2 |= (0 << DMA_CTR2_TRIGM_Pos); /* trigger mode */
tr2 |= (1 << DMA_CTR2_BREQ_Pos); /* 1: hw request at block level, 0: at burst level */
tr2 |= (1 << DMA_CTR2_DREQ_Pos); /* 1: hw request driven by the destination peripheral, 0: by the source */
tr2 |= (0 << DMA_CTR2_SWREQ_Pos); /* software req */
tr2 |= (124 << DMA_CTR2_REQSEL_Pos); /* gpdma hw req selection (jpeg_rx) */
dar = (uint32_t) &enc->jpeg->DIR; // fixed
enc->in.mem_addr = (uint32_t) &src_buf[0];
sar = enc->in.mem_addr; // to be updated
// Link first node to register for init
enc->in.nodes = &cfg_nodes_in[0][0];
la = ((uint32_t) enc->in.nodes) & 0xfffc;
llr = la | (1 << DMA_CLLR_ULL_Pos) | (0b11111 << DMA_CLLR_UDA_Pos); // (UT1-UT2-UB1-USA-UDA) update fields, LBA->LLR
enc->in.gpdma->CLLR = llr;
enc->in.gpdma->CLBAR = ((uint32_t) enc->in.nodes) & 0xffff0000;
// Link 2d node to the first one for update
la = ((uint32_t) &cfg_nodes_in[1][0]) & 0xfffc;
llr = la | (1 << DMA_CLLR_ULL_Pos) | (0b00010 << DMA_CLLR_UDA_Pos);
// init first node with full parameters
// tr1, tr2, br1, sar, dar, llr
cfg_nodes_in[0][0] = tr1;
cfg_nodes_in[0][1] = tr2;
cfg_nodes_in[0][2] = br1;
cfg_nodes_in[0][3] = sar;
cfg_nodes_in[0][4] = dar;
cfg_nodes_in[0][5] = llr;
return;
}
void cfg_gpdma_jpeg_out(struct cfg_encoder_s *enc, uint32_t *dst_buf) {
uint32_t ccr, dar, sar, tr1, tr2, llr, la;
const uint32_t br1 = L_NODE_OUT & 0xffff; // fixed
enc->out.gpdma->CCR &= ~(1 << DMA_CCR_EN_Pos); // disable dma during config
enc->out.gpdma->CCR |= (1 << DMA_CCR_RESET_Pos); // reset gpdma channel
ccr = enc->out.gpdma->CCR & ~(0xc37f07);
ccr |= (0b10 << DMA_CCR_PRIO_Pos); /* prio */
ccr |= (1 << DMA_CCR_LAP_Pos); /* linkedlist allocated port */
ccr |= (0 << DMA_CCR_LSM_Pos); /* link step mode */ // channel completed at last LLI (0) or ONCE for the current LLI (1)
ccr |= (0 << DMA_CCR_TOIE_Pos); /* trigger ovr ITE */
ccr |= (0 << DMA_CCR_SUSPIE_Pos); /* cplt susp ITE */
ccr |= (1 << DMA_CCR_USEIE_Pos); /* user setting error ITE */
ccr |= (1 << DMA_CCR_ULEIE_Pos); /* update link trsfer error ITE */
ccr |= (1 << DMA_CCR_DTEIE_Pos); /* data trfr error ITE */
ccr |= (0 << DMA_CCR_HTIE_Pos); /* HT cplt ITE */
ccr |= (1 << DMA_CCR_TCIE_Pos); /* T cplt ITE */
ccr |= (0 << DMA_CCR_SUSP_Pos); /* susp */
ccr |= (0 << DMA_CCR_RESET_Pos); /* reset */
enc->out.gpdma->CCR = ccr;
tr1 = enc->out.gpdma->CTR1 & ~(0xcffbfbfb);
tr1 |= (0 << DMA_CTR1_DSEC_Pos); /* dest sec */
tr1 |= (1 << DMA_CTR1_DAP_Pos); /* dest port */
tr1 |= (0 << DMA_CTR1_DHX_Pos); /* dest half word exch ??*/
tr1 |= (0 << DMA_CTR1_DBX_Pos); /* dest byte exch ??*/
tr1 |= ((BURST_LENGTH_OUT-1) << DMA_CTR1_DBL_1_Pos); /* dest burst size of length (value-1) */
tr1 |= (1 << DMA_CTR1_DINC_Pos); /* dest inc */
tr1 |= (0b10 << DMA_CTR1_DDW_LOG2_Pos); /* dst data width of burst */
tr1 |= (0 << DMA_CTR1_SSEC_Pos); /* src sec */
tr1 |= (0 << DMA_CTR1_SAP_Pos); /* src port */
tr1 |= (0 << DMA_CTR1_SBX_Pos); /* src byte exch ?? */
tr1 |= (0 << DMA_CTR1_PAM_Pos); /* align/padding (irrelevant if burst_size = data_size) */
tr1 |= ((BURST_LENGTH_OUT-1) << DMA_CTR1_SBL_1_Pos); /* src burst size of length */
tr1 |= (0 << DMA_CTR1_SINC_Pos); /* src inc */
tr1 |= (0b10 << DMA_CTR1_SDW_LOG2_Pos); /* src data width burst */
tr2 = enc->out.gpdma->CTR2 & ~(0xc37fce7f);
tr2 |= (0b00 << DMA_CTR2_TCEM_Pos); /* trigger cplt transfer event mode */ // |= keeps the masked value read above
tr2 |= (0b00 << DMA_CTR2_TRIGPOL_Pos); /* trigger event polarity */
tr2 |= (0 << DMA_CTR2_TRIGSEL_Pos); /* trigger event selection (no) */
tr2 |= (0 << DMA_CTR2_TRIGM_Pos); /* trigger mode */
tr2 |= (1 << DMA_CTR2_BREQ_Pos); /* 1: hw request at block level, 0: at burst level */
tr2 |= (0 << DMA_CTR2_DREQ_Pos); /* 1: hw request driven by the destination peripheral, 0: by the source */
tr2 |= (0 << DMA_CTR2_SWREQ_Pos); /* software req */
tr2 |= (125 << DMA_CTR2_REQSEL_Pos); /* gpdma hw req selection (jpeg_tx) */
sar = (uint32_t) &enc->jpeg->DOR; // fixed
enc->out.mem_addr = (uint32_t) &dst_buf[0];
dar = enc->out.mem_addr; // to be updated
enc->out.nodes = &cfg_nodes_out[0][0];
la = ((uint32_t) enc->out.nodes) & 0xfffc;
llr = la | (1 << DMA_CLLR_ULL_Pos) | (0b11111 << DMA_CLLR_UDA_Pos); // (UT1-UT2-UB1-USA-UDA) update fields, LBA->LLR
// Link first node to register for init
enc->out.gpdma->CLLR = llr;
enc->out.gpdma->CLBAR = ((uint32_t) enc->out.nodes) & 0xffff0000;
// Link 2d node to the first one for update
la = ((uint32_t) &cfg_nodes_out[1][0]) & 0xfffc;
llr = la | (1 << DMA_CLLR_ULL_Pos) | (0b00001 << DMA_CLLR_UDA_Pos);
// init first node with full parameters
// tr1, tr2, br1, sar, dar, llr
cfg_nodes_out[0][0] = tr1;
cfg_nodes_out[0][1] = tr2;
cfg_nodes_out[0][2] = br1;
cfg_nodes_out[0][3] = sar;
cfg_nodes_out[0][4] = dar;
cfg_nodes_out[0][5] = llr;
return;
}
void cfg_jpeg(struct cfg_encoder_s *enc, uint8_t quality) {
enc->length_encoded = 0;
uint32_t tmp;
tmp = enc->jpeg->CR & ~(0x787f);
tmp |= (1 << JPEG_CR_OFF_Pos); /* flush output fifo */
tmp |= (1 << JPEG_CR_IFF_Pos); /* flush input fifo */
tmp |= (0 << JPEG_CR_ODMAEN_Pos); /* en out dma */ // disable dma for cfg
tmp |= (0 << JPEG_CR_IDMAEN_Pos); /* en in dma */
tmp |= (0 << JPEG_CR_HPDIE_Pos); /* en header parsing done interrupt */
tmp |= (1 << JPEG_CR_EOCIE_Pos); /* en end of conversion interrupt */
tmp |= (0 << JPEG_CR_OFNEIE_Pos); /* en output fifo not empty interrupt */
tmp |= (0 << JPEG_CR_OFTIE_Pos); /* en output fifo threshold interrupt */
tmp |= (0 << JPEG_CR_IFNFIE_Pos); /* en input fifo not full interrupt */
tmp |= (0 << JPEG_CR_IFTIE_Pos); /* en input fifo threshold interrupt */
tmp |= (1 << JPEG_CR_JCEN_Pos); /* en jpeg core */
enc->jpeg->CR = tmp;
enc->jpeg->CONFR0 &= ~(1 << JPEG_CONFR0_START_Pos);
tmp = enc->jpeg->CONFR1 & ~(0xffff01fb);
tmp |= (RES_Y << JPEG_CONFR1_YSIZE_Pos); /* YSIZE nb lines in src img */
tmp |= (0 << JPEG_CONFR1_HDR_Pos); /* enable header gen/parsing */
tmp |= (0b00 << JPEG_CONFR1_NS_Pos); /* reg = [nb compo. - 1] for scan hdr */
tmp |= (0b00 << JPEG_CONFR1_COLORSPACE_Pos); /* color space */ // 00 for grayscale, 01 for YUV
tmp |= (0 << JPEG_CONFR1_DE_Pos); /* 0 encode mode, 1 decode */
tmp |= (0 << JPEG_CONFR1_NF_Pos); /* reg = [nb of color components - 1] */
enc->jpeg->CONFR1 = tmp;
// max 67108863 (1<<26)-1
tmp = enc->jpeg->CONFR2 & ~(0x3ffffff);
tmp |= ((N_MCU_TOTAL - 1) << JPEG_CONFR2_NMCU_Pos); // reg[0:25] = [nb mcu to encode - 1]
enc->jpeg->CONFR2 = tmp;
tmp = enc->jpeg->CONFR3 & ~(0xffff << 16);
tmp |= (RES_X << JPEG_CONFR3_XSIZE_Pos); // nb pixel per line (16 bit value)
enc->jpeg->CONFR3 = tmp;
// Y8 1 block of 8x8 data
tmp = enc->jpeg->CONFR4 & ~0xffff;
tmp |= (1 << JPEG_CONFR4_HSF_Pos); /* hori sampling factor */ // 1 -> no subsampling
tmp |= (1 << JPEG_CONFR4_VSF_Pos); /* vert sampling factor */
tmp |= (0 << JPEG_CONFR4_NB_Pos); /* nb of blocks to a particular color in the MCU (Y-1) */
tmp |= (0b00 << JPEG_CONFR4_QT_Pos); /* Quant table used */
tmp |= (0 << JPEG_CONFR4_HA_Pos); /* select ac huff table (0 or 1) */
tmp |= (0 << JPEG_CONFR4_HD_Pos); /* select dc huff table (0 or 1) */
enc->jpeg->CONFR4 = tmp;
// unused CONFRx, reset them
enc->jpeg->CONFR5 &= ~0xffff;
enc->jpeg->CONFR6 &= ~0xffff;
enc->jpeg->CONFR7 &= ~0xffff;
set_tables(enc, quality); // from ITU CCITT T.81 JPEG ISO/IEC 10918-1 guidelines
return;
}
static void init_node(uint32_t nodes[8], uint32_t config[8], uint8_t curr_idx, uint8_t n) {
for(int i = 0; i < n; i++)
nodes[i] = config[i];
return;
}
// destination buffer is fixed, jpeg->dir
void cfg_gpdma_jpeg_in_nodes(struct cfg_encoder_s *enc) {
const uint32_t br1 = cfg_nodes_in[0][2];
uint32_t sar;
uint32_t llr;
uint32_t la;
uint32_t top_idx;
for(top_idx = 1; top_idx < N_NODE_IN - 1; top_idx++) {
sar = enc->in.mem_addr + (((top_idx) * br1) % L_BUF_IN);
la = ((uint32_t) &cfg_nodes_in[top_idx+1][0]) & 0xfffc;
llr = la | (0b00010 << DMA_CLLR_UDA_Pos) | (1 << DMA_CLLR_ULL_Pos); // (UT1-UT2-UB1-USA-UDA) update fields order
init_node(cfg_nodes_in[top_idx], (uint32_t [8]) {sar, llr}, top_idx, 2); // only sar and llr updated
}
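// last node: last chunk, then circular mode back to node 0
sar = enc->in.mem_addr + ((top_idx * br1) % L_BUF_IN);
la = ((uint32_t) &cfg_nodes_in[0][0]) & 0xfffc;
// node 0 is the only full-format node (6 words), so all update bits are requested here
llr = la | (0b11111 << DMA_CLLR_UDA_Pos) | (1 << DMA_CLLR_ULL_Pos);
init_node(cfg_nodes_in[top_idx], (uint32_t [8]){sar, llr}, top_idx, 2);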
return;
}
void cfg_gpdma_jpeg_out_nodes(struct cfg_encoder_s *enc) {
const uint32_t br1 = L_NODE_OUT & 0xffff;
uint32_t dar;
uint32_t llr;
uint32_t la;
uint8_t top_idx;
for(top_idx = 1; top_idx < N_NODE_OUT - 1; top_idx++) {
dar = (uint32_t) enc->out.mem_addr + (((top_idx) * br1) % L_BUF_OUT);
la = ((uint32_t) &cfg_nodes_out[top_idx+1][0]) & 0xfffc;
llr = la | (0b00001 << DMA_CLLR_UDA_Pos) | (1 << DMA_CLLR_ULL_Pos); // (UT1-UT2-UB1-USA-UDA) update fields
init_node(cfg_nodes_out[top_idx], (uint32_t [8]) {dar, llr}, top_idx, 2); // only dar and llr updated
}
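// last node: last chunk, then circular mode back to node 0
dar = (uint32_t) enc->out.mem_addr + ((top_idx * br1) % L_BUF_OUT);
la = ((uint32_t) &cfg_nodes_out[0][0]) & 0xfffc;
// node 0 is the only full-format node (6 words), so all update bits are requested here
llr = la | (0b11111 << DMA_CLLR_UDA_Pos) | (1 << DMA_CLLR_ULL_Pos);
init_node(cfg_nodes_out[top_idx], (uint32_t [8]){dar, llr}, top_idx, 2);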
return;
}
void jpeg_start(struct cfg_encoder_s *enc) {
// move flush I/OFIFO from cfg here
enc->jpeg->CONFR0 |= (1 << JPEG_CONFR0_START_Pos);
enc->out.gpdma->CCR |= (1 << DMA_CCR_EN_Pos);
enc->in.gpdma->CCR |= (1 << DMA_CCR_EN_Pos);
enc->jpeg->CR |= (1 << JPEG_CR_ODMAEN_Pos) | (1 << JPEG_CR_IDMAEN_Pos);
return;
}
//supposed to be put in EOC as a callback function
void jpeg_end(struct cfg_encoder_s *enc) {
if(!(enc->jpeg->SR & (1 << JPEG_SR_OFNEF_Pos)))
return;
// suspend transfer when fifo dma empty, wait for it to be suspended, then manually extract data from fifo gpdma
while((enc->out.gpdma->CSR & (0b11111111 << DMA_CSR_FIFOL_Pos)) != 0); // wait until there is no data left in fifo
enc->out.gpdma->CCR |= (1 << DMA_CCR_SUSP_Pos); // suspend
while(!(enc->out.gpdma->CSR & (1 << DMA_CSR_SUSPF_Pos))); // wait until the channel is suspended
enc->length_encoded += L_NODE_OUT - (enc->out.gpdma->CBR1 & 0xffff); // sub to the remaining item transfers in BNDT
while((enc->jpeg->SR & (1 << JPEG_SR_OFNEF_Pos)) != 0) { // 4 bytes=1 word
*(uint32_t *) (enc->out.mem_addr + enc->length_encoded % L_BUF_OUT) = enc->jpeg->DOR;
enc->length_encoded += 4;
}
enc->jpeg->CR |= (1 << JPEG_CR_OFF_Pos) | (1 << JPEG_CR_IFF_Pos);
enc->in.gpdma->CCR |= (1 << DMA_CCR_RESET_Pos);
enc->out.gpdma->CCR |= (1 << DMA_CCR_RESET_Pos);
}
==== main.h ====
#include "img.h"
#include "cfg.h"
#include "stm32u5xx_hal.h"
#define JPEG_IN_DMA_IRQn ((IRQn_Type) GPDMA1_Channel0_IRQn)
#define JPEG_IN_DMA_CH ((DMA_Channel_TypeDef *) GPDMA1_Channel0_BASE_NS)
#define JPEG_OUT_DMA_IRQn ((IRQn_Type) GPDMA1_Channel1_IRQn)
#define JPEG_OUT_DMA_CH ((DMA_Channel_TypeDef *) GPDMA1_Channel1_BASE_NS)
void Error_Handler(void);
void dma_in_cplt(DMA_HandleTypeDef *hdma);
void dma_out_cplt(DMA_HandleTypeDef *hdma);
extern uint32_t jpeg_cplt;
extern uint32_t dma_in_error;
extern uint32_t dma_out_error;
extern struct cfg_encoder_s *jpeg_enc;
==== main.c ====
extern uint8_t img[IMG_RES] __attribute__((aligned(4))) ;
extern uint8_t img_jpeg[IMG_RES] __attribute__((aligned(4))) ;
int main(void)
{
HAL_Init();
SystemPower_Config();
SystemClock_Config();
__HAL_RCC_GPDMA1_CLK_ENABLE();
__HAL_RCC_JPEG_CLK_ENABLE();
HAL_NVIC_SetPriority(JPEG_IN_DMA_IRQn, 6, 0);
HAL_NVIC_SetPriority(JPEG_OUT_DMA_IRQn, 7, 0);
HAL_NVIC_SetPriority(JPEG_IRQn, 8, 0);
HAL_NVIC_EnableIRQ(JPEG_IN_DMA_IRQn);
HAL_NVIC_EnableIRQ(JPEG_OUT_DMA_IRQn);
HAL_NVIC_EnableIRQ(JPEG_IRQn);
MX_ICACHE_Init();
jpeg_enc->in.gpdma = (DMA_Channel_TypeDef*) JPEG_IN_DMA_CH;
jpeg_enc->out.gpdma = (DMA_Channel_TypeDef*) JPEG_OUT_DMA_CH;
jpeg_enc->jpeg = (JPEG_TypeDef*) JPEG;
cfg_jpeg(jpeg_enc, 75);
cfg_gpdma_jpeg_in(jpeg_enc, (uint32_t*) img);
cfg_gpdma_jpeg_in_nodes(jpeg_enc);
cfg_gpdma_jpeg_out(jpeg_enc, (uint32_t*) img_jpeg);
cfg_gpdma_jpeg_out_nodes(jpeg_enc);
jpeg_start(jpeg_enc);
while (!jpeg_cplt && !dma_out_error && !dma_in_error) {}
}
void dma_in_cplt(DMA_HandleTypeDef *hdma) {
dma_in_cnt++;
}
void dma_out_cplt(DMA_HandleTypeDef *hdma) {
dma_out_cnt++;
jpeg_enc->length_encoded += L_NODE_OUT;
}
==== stm32u5xx_it.c ====
extern uint32_t jpeg_cplt;
extern uint32_t dma_in_error;
extern uint32_t dma_out_error;
extern struct cfg_encoder_s *jpeg_enc;
union dma_sr_u { // just to see regs without using SFRs
uint32_t data;
struct {
uint32_t idlef:1;
uint32_t reserved_0:7;
uint32_t tcf:1;
uint32_t htf:1;
uint32_t dtef:1;
uint32_t ulef:1;
uint32_t usef:1;
uint32_t suspf:1;
uint32_t tof:1;
uint32_t reserved_1:1;
uint32_t fifol:8;
uint32_t reserved_2:8;
};
};
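static void jpeg_in_dma_irq(void); // declared before use by the IRQ handlers
static void jpeg_out_dma_irq(void);
void GPDMA1_Channel0_IRQHandler(void)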
{
jpeg_in_dma_irq();
return;
}
void GPDMA1_Channel1_IRQHandler(void)
{
jpeg_out_dma_irq();
return;
}
static void jpeg_in_dma_irq() {
DMA_Channel_TypeDef *dma = (DMA_Channel_TypeDef*)JPEG_IN_DMA_CH;
union dma_sr_u sr = {.data=dma->CSR};
if(sr.tcf) {
dma->CFCR |= (1 << DMA_CFCR_TCF_Pos);
dma_in_cplt(&handle_GPDMA1_Channel1);
}
if(sr.htf)
dma->CFCR |= (1 << DMA_CFCR_HTF_Pos);
if(sr.usef)
dma->CFCR |= (1 << DMA_CFCR_USEF_Pos);
if(sr.ulef)
dma->CFCR |= (1 << DMA_CFCR_ULEF_Pos);
if(sr.dtef)
dma->CFCR |= (1 << DMA_CFCR_DTEF_Pos);
if(sr.tof)
dma->CFCR |= (1 << DMA_CFCR_TOF_Pos);
return;
}
static void jpeg_out_dma_irq() {
DMA_Channel_TypeDef *dma = (DMA_Channel_TypeDef*)JPEG_OUT_DMA_CH;
union dma_sr_u sr = {.data=dma->CSR};
if(sr.tcf) {
dma->CFCR |= (1 << DMA_CFCR_TCF_Pos);
dma_out_cplt(&handle_GPDMA1_Channel0);
}
if(sr.htf)
dma->CFCR |= (1 << DMA_CFCR_HTF_Pos);
if(sr.usef)
dma->CFCR |= (1 << DMA_CFCR_USEF_Pos);
if(sr.ulef)
dma->CFCR |= (1 << DMA_CFCR_ULEF_Pos);
if(sr.dtef)
dma->CFCR |= (1 << DMA_CFCR_DTEF_Pos);
if(sr.tof)
dma->CFCR |= (1 << DMA_CFCR_TOF_Pos);
return;
}
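In the listing above, every CLLR value combines the low 16 bits of the next node's address with a set of update flags, and the GPDMA fetches exactly one word from the next node per flag set. A small host-side sketch for checking that the flags match the node format; the bit positions are those RM0456 gives for linear GPDMA channels (verify them against DMA_CLLR_*_Pos in your header), and the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define CLLR_UT1 (1u << 31)  /* reload CTR1 from the next node */
#define CLLR_UT2 (1u << 30)  /* reload CTR2 */
#define CLLR_UB1 (1u << 29)  /* reload CBR1 */
#define CLLR_USA (1u << 28)  /* reload CSAR */
#define CLLR_UDA (1u << 27)  /* reload CDAR */
#define CLLR_ULL (1u << 16)  /* reload CLLR */
#define CLLR_LA_MASK 0xFFFCu /* low 16 address bits, word aligned */

/* Build a CLLR word pointing at next_node_addr. The upper 16 address bits
 * come from CLBAR, so all nodes must share the same 64 KB region. */
static uint32_t cllr_make(uint32_t next_node_addr, uint32_t update_flags)
{
    assert((next_node_addr & 0x3u) == 0u); /* nodes must be word aligned */
    return (next_node_addr & CLLR_LA_MASK) | update_flags;
}

/* Words fetched from the next node: one per update flag, in the order
 * CTR1, CTR2, CBR1, CSAR, CDAR, CLLR. This must equal the number of words
 * actually stored in that node. */
static unsigned cllr_words_fetched(uint32_t cllr)
{
    const uint32_t bits[] = { CLLR_UT1, CLLR_UT2, CLLR_UB1,
                              CLLR_USA, CLLR_UDA, CLLR_ULL };
    unsigned n = 0;
    for (unsigned i = 0; i < sizeof bits / sizeof bits[0]; i++)
        if (cllr & bits[i])
            n++;
    return n;
}
```

In the draft, compact nodes such as cfg_nodes_in[1] hold only {SAR, LLR}, so a CLLR pointing at them should request 2 words (USA|ULL for the input, UDA|ULL for the output). The `0b11111 << DMA_CLLR_UDA_Pos` plus ULL combination requests 6 words and only matches a full-format node such as cfg_nodes_in[0].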
2025-09-26 7:51 AM
Hello @guigrogue ,
The DMA controller treats each linked-list node as a complete set of configuration registers that it loads before starting the transfer. If registers such as BNDT, CTR1, CTR2, or DAR are left uninitialized or zeroed in the input DMA nodes, the DMA may misconfigure itself and stall after the first node.
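A minimal sketch of what fully initializing every input node could look like: each node stores all six words a linear GPDMA channel reloads (CTR1, CTR2, CBR1, CSAR, CDAR, CLLR), and the last node links back to the first. The struct, the LLI_UPDATE_ALL value (UT1|UT2|UB1|USA|UDA|ULL per RM0456) and lli_build_ring() are hypothetical; base_addr is the bus address of nodes[0], i.e. `(uint32_t)&nodes[0]` on the target:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One full-format node, in the order the controller loads the registers. */
struct lli_full {
    uint32_t ctr1, ctr2, cbr1, csar, cdar, cllr;
};

#define LLI_UPDATE_ALL 0xF8010000u /* UT1|UT2|UB1|USA|UDA|ULL */

/* Fill n nodes that stream consecutive chunk_bytes slices of src into a
 * fixed peripheral register dst (e.g. JPEG->DIR); each node reloads every
 * register from the next one and the last node wraps to the first. */
static void lli_build_ring(struct lli_full *nodes, size_t n, uint32_t base_addr,
                           uint32_t ctr1, uint32_t ctr2, uint32_t src,
                           uint32_t dst, uint32_t chunk_bytes)
{
    assert(n > 0u && chunk_bytes > 0u && chunk_bytes <= 0xFFFFu);
    assert((base_addr & 0x3u) == 0u);
    /* CLLR keeps only the low 16 address bits; the rest comes from CLBAR. */
    assert((base_addr & 0xFFFFu) + n * sizeof nodes[0] <= 0x10000u);
    for (size_t i = 0; i < n; i++) {
        uint32_t next = base_addr + (uint32_t)(((i + 1u) % n) * sizeof nodes[0]);
        nodes[i].ctr1 = ctr1;
        nodes[i].ctr2 = ctr2;
        nodes[i].cbr1 = chunk_bytes; /* BNDT, in bytes */
        nodes[i].csar = src + (uint32_t)i * chunk_bytes;
        nodes[i].cdar = dst;         /* fixed peripheral data register */
        nodes[i].cllr = (next & 0xFFFCu) | LLI_UPDATE_ALL;
    }
}
```

On the target, the channel would then be started by writing `base_addr & 0xFFFF0000` to CLBAR and `(base_addr & 0xFFFC) | LLI_UPDATE_ALL` to CLLR before setting EN, as the draft does for its first node.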
It might be worth fully initializing all relevant registers in each input DMA node to see if that fixes the issue.
Kind regards,
DHIF Khaled
2025-09-26 9:09 AM
Hi @Kheled_DHIF,
Thank you for your answer,
I didn't mention it, but I previously tried a non-compact linked-list configuration, initializing BR1, TR1, TR2, SAR, DAR and LLR for each node.
Unfortunately, it didn't change the outcome.
I have never struggled with GPDMA linked-list Peripheral-to-Memory (P2M) transfers in the past, and this is the first time I set up a Memory-to-Peripheral (M2P) linked list.
Apart from the port allocation, hardware request and DREQ (leaving aside the SAR/DAR and other address settings), which change with the transfer direction, is there an important difference between P2M and M2P that I could have missed and that requires particular care (for JPEG use or in general)? Any hint on where the issue could be (possibly unrelated to the previous question)?
In the U5G9 DK example, for an RGB 240x320 image, HAL_JPEG_GetDataCallback is called 15 times and HAL_JPEG_DataReadyCallback 3 times (on input and output DMA transfer completion respectively), but JPEG_EncodeInputHandler and JPEG_EncodeOutputHandler are each called 960 times (1920 in total). Using a linked-list GPDMA transfer is mostly motivated by getting rid of the input/output handlers in favor of a more automated system (ideally only keeping track of the encoded output length, which is unknown in advance, through output DMA completion). The higher the image resolution, the more CPU cycles those polling checks waste.
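Tracking only the encoded length through output DMA completion comes down to the bookkeeping already split between dma_out_cplt() and jpeg_end() in the code draft above. A sketch of that arithmetic as one function (the name and parameters are hypothetical): completed nodes counted in the TC interrupt, plus the part of the current node already written (node length minus the BNDT left in CBR1), plus any words drained from JPEG->DOR by software after suspending the channel:

```c
#include <assert.h>
#include <stdint.h>

/* Total bytes produced by the encoder once the output channel is suspended. */
static uint32_t encoded_length(uint32_t completed_nodes, uint32_t node_bytes,
                               uint32_t bndt_remaining, uint32_t drained_words)
{
    assert(bndt_remaining <= node_bytes);
    return completed_nodes * node_bytes   /* full nodes, counted in TC IRQs */
         + (node_bytes - bndt_remaining)  /* partial current node */
         + drained_words * 4u;            /* words read from DOR by the CPU */
}
```

For example, 3 completed 9600-byte nodes, 1000 bytes left in BNDT and 5 drained words give 28800 + 8600 + 20 = 37420 bytes.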
2025-09-29 2:45 AM
Hello @guigrogue ,
Thank you for sharing your detailed setup and efforts so far.
Based on the behavior described and the STM32U5 reference manual, a key cause of the input DMA linked-list stalling might be the large transfer size (BNDT) per node.
The JPEG input FIFO issues DMA requests in fixed bursts of 4 words (16 bytes), as you correctly identified, so it is important that the linked-list node transfer size is aligned to this granularity.
I recommend reducing the BNDT value in each linked-list node to smaller chunks, such as 256 or 512 words per node (BNDT = 1024 or 2048, since BNDT counts bytes), instead of large values like 9600. This better matches the peripheral’s DMA request size and helps prevent the DMA from stalling while waiting for requests.
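Because BNDT is a byte count, a node size expressed in words has to be multiplied by 4, and smaller nodes mean more of them per frame. A quick sketch of that arithmetic for the 640*480 8-bit image from the question (307200 bytes); the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Convert a node size in 32-bit words to the byte count written to BNDT. */
static uint32_t words_to_bndt(uint32_t words)
{
    assert(words * 4u <= 0xFFFFu); /* BNDT is a 16-bit field */
    return words * 4u;
}

/* Nodes needed to cover image_bytes with equal-sized nodes. */
static uint32_t nodes_for_image(uint32_t image_bytes, uint32_t bndt_bytes)
{
    assert(bndt_bytes != 0u && image_bytes % bndt_bytes == 0u);
    return image_bytes / bndt_bytes;
}
```

With 256-word nodes the frame needs 300 nodes, and with 512-word nodes 150, compared with the 8 nodes of the original draft, so the node arrays and their memory footprint grow accordingly.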
Best regards,
DHIF Khaled
2025-09-29 8:16 AM
Hello @Khaled_DHIF (sorry for the previous typo),
After reducing the BNDT value while keeping the granularity aligned, there is still no progress, unfortunately.
Basically, the issue is that the second input node is never executed (even though it is loaded, in compact format or not). The bigger BNDT is, the more data I can feed to the JPEG encoder (and thus the more encoded data is output).
The peripheral's DMA definitely stalls between two automatic transfers, but as long as BNDT is aligned, I don't see how the length would impact the system or cause an issue.
What happens when a jpeg_rx request fires while the input channel is transitioning to the next node and isn't ready yet? Could that be the reason for the stall?
The example disables the input DMA (JPEG_CR_IDMAEN) to pause further transfers and re-enables it when launching a new DMA transfer. That might be meant to solve this very issue, but for standard DMA transfers.
Yours faithfully,
2025-09-30 8:25 AM
Hello @Khaled_DHIF,
It's working now, with just 2 more lines of code lol, basically doing the same as the HAL_JPEG_Pause/Resume functions.
On each node's transfer-complete IRQ, I disable the JPEG input DMA at the beginning of the interrupt and re-enable it at the end; the next nodes are then executed as normal. Re-enabling the JPEG input DMA fires the jpeg_rx request again (when the transfer granularity is aligned) and life is good again.
static void jpeg_in_dma_irq() {
DMA_Channel_TypeDef *dma = (DMA_Channel_TypeDef*)JPEG_IN_DMA_CH;
union dma_sr_u sr = {.data=dma->CSR};
if(sr.tcf) {
JPEG->CR &= ~(1 << JPEG_CR_IDMAEN_Pos); // disable input dma
dma->CFCR |= (1 << DMA_CFCR_TCF_Pos);
dma_in_cplt();
}
if(sr.htf)
dma->CFCR |= (1 << DMA_CFCR_HTF_Pos);
if(sr.usef)
dma->CFCR |= (1 << DMA_CFCR_USEF_Pos);
if(sr.ulef)
dma->CFCR |= (1 << DMA_CFCR_ULEF_Pos);
if(sr.dtef)
dma->CFCR |= (1 << DMA_CFCR_DTEF_Pos);
if(sr.tof)
dma->CFCR |= (1 << DMA_CFCR_TOF_Pos);
JPEG->CR |= (1 << JPEG_CR_IDMAEN_Pos); // re-enable dma
return;
}
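The two added lines amount to a pause/resume around the transfer-complete handling. A minimal sketch of that pattern as a reusable helper, assuming IDMAEN sits at bit 11 of JPEG_CR as in RM0456 (check JPEG_CR_IDMAEN_Pos in your header); the helper name and the callback are hypothetical, and the claim that re-enabling re-issues jpeg_rx comes from the observation above rather than from documentation:

```c
#include <assert.h>
#include <stdint.h>

#define JPEG_CR_IDMAEN_BIT (1u << 11) /* assumption: IDMAEN position in JPEG_CR */

/* Run the input channel's TC handling with JPEG input DMA requests masked,
 * then unmask them so the codec raises jpeg_rx for the freshly loaded node.
 * cr points at JPEG->CR on target; handle_tc clears flags, counts nodes, etc. */
static void jpeg_in_tc_with_rearm(volatile uint32_t *cr, void (*handle_tc)(void))
{
    *cr &= ~JPEG_CR_IDMAEN_BIT; /* pause input DMA requests */
    if (handle_tc)
        handle_tc();
    *cr |= JPEG_CR_IDMAEN_BIT;  /* resume: IDMAEN set again */
}
```

Keeping the masking in one helper makes it harder to add an early return in the ISR that skips the re-enable and leaves the encoder starved.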