STM32H7 JPEG encoder MCU blocks - corrupted JPEG image
‎2025-06-16 11:59 PM
I am having trouble understanding the MCU block layout for the JPEG encoder. I have an STM32H743 MCU connected to a video decoder through the DCMI interface. In RAM I have the captured 8-bit ITU-R BT.656 YCrCb 4:2:2 output. When I save the captured data from RAM to a debug file, I can see that the image is captured properly. The byte stream looks like: Cb0-1, Y0, Cr0-1, Y1, Cb2-3, Y2, Cr2-3, Y3, ... That is 720 Y, 360 Cb and 360 Cr samples, so 1440 bytes plus 8 bytes of blanking per line. I have 240 lines, so the resolution I am capturing is 720x240.
As described in AN4996, each 4:2:2 MCU contains 256 bytes, organized as two 8×8 Y blocks plus one 8×8 Cb block plus one 8×8 Cr block. For a 720×240 image, I should need 45 horizontal MCUs (720 / 16) × 30 vertical MCUs (240 / 8) = 1,350 MCUs in total. My confusion is about the two Y blocks. I implemented them as:
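The MCU arithmetic above can be checked with a small sketch. The constant and function names here are illustrative and not taken from AN4996 or the ST HAL; the geometry (720x240, 16x8 pixels per 4:2:2 MCU, 64 bytes per block) comes from the post.

```c
#include <assert.h>
#include <stdint.h>

/* Geometry taken from the post: 720x240 image, YCbCr 4:2:2. */
enum {
    IMG_WIDTH   = 720,
    IMG_HEIGHT  = 240,
    MCU_WIDTH   = 16,             /* two 8x8 Y blocks side by side */
    MCU_HEIGHT  = 8,
    BLOCK_BYTES = 64,             /* one 8x8 block of 8-bit samples */
    MCU_BYTES   = 4 * BLOCK_BYTES /* Y1 + Y2 + Cb + Cr */
};

static_assert(MCU_BYTES == 256, "a 4:2:2 MCU is 256 bytes");

/* Number of MCUs needed to cover `pixels` along one axis (rounds up). */
static uint32_t mcu_count(uint32_t pixels, uint32_t mcu_extent)
{
    return (pixels + mcu_extent - 1u) / mcu_extent;
}

static uint32_t mcus_per_row(void) { return mcu_count(IMG_WIDTH, MCU_WIDTH); }
static uint32_t mcus_per_col(void) { return mcu_count(IMG_HEIGHT, MCU_HEIGHT); }
static uint32_t mcus_total(void)   { return mcus_per_row() * mcus_per_col(); }
```

Rounding up does not matter for 720x240, since both dimensions divide evenly, but it keeps the helper usable for sizes that do not.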
Y1, first row:
y0, y1, ..., y7
Y1, second row:
y16, y17, ..., y23
Y2, first row:
y8, y9, ..., y15
Y2, second row:
y24, y25, ..., y31
Is this correct? In other words, does Y1 contain the first 8 pixel columns of the MCU and Y2 the second 8 pixel columns?
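The layout asked about above can be written down as an index mapping. This is only a sketch of the arrangement described in the post (Y1 at offset 0, Y2 at 64, Cb at 128, Cr at 192); the function names are made up for illustration and are not part of any ST API.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets of the four 8x8 blocks inside one 256-byte 4:2:2 MCU,
 * matching the pointers used in the post. */
enum { Y1_OFFSET = 0, Y2_OFFSET = 64, CB_OFFSET = 128, CR_OFFSET = 192 };

/* Byte offset inside the MCU of the luma sample at row 0..7, col 0..15.
 * Columns 0..7 land in Y1, columns 8..15 land in Y2. */
static uint32_t luma_offset(uint32_t row, uint32_t col)
{
    assert(row < 8u && col < 16u);
    uint32_t base = (col < 8u) ? Y1_OFFSET : Y2_OFFSET;
    return base + row * 8u + (col & 7u);
}

/* Byte offset of a chroma sample at row 0..7, col 0..7 in the block that
 * starts at `base` (CB_OFFSET or CR_OFFSET). In 4:2:2, one chroma column
 * covers two luma columns. */
static uint32_t chroma_offset(uint32_t base, uint32_t row, uint32_t col)
{
    assert(row < 8u && col < 8u);
    return base + row * 8u + col;
}
```

With the numbering used in the question (16 luma samples per MCU row), y16 is row 1, column 0 and maps to Y1 offset 8, while y24 is row 1, column 8 and maps to Y2 offset 72.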
Here is my code that processes the YCrCb byte stream:
#include <stdint.h>

#define MCU_BLOCK_ROWS 8 // rows in one 8x8 block
#define MCU_BLOCK_COLS 8 // columns in one 8x8 block
#define MCU_BLOCK_SIZE (MCU_BLOCK_ROWS * MCU_BLOCK_COLS) // 8x8 block = 64 bytes
#define MCU_SIZE 256 // Y1 + Y2 + Cb + Cr = 4 x 64 bytes

__attribute__((section(".axiram"))) static uint8_t mcuBuffers[2][MCU_SIZE];

static void ExtractMcuToBuffer(uint8_t* src, uint8_t* dest, uint32_t block_cnt) {
    uint8_t* y1 = dest;       // Y block 1
    uint8_t* y2 = dest + 64;  // Y block 2
    uint8_t* cb = dest + 128; // Cb block
    uint8_t* cr = dest + 192; // Cr block
    // 720x240 -> 720/16 x 240/8 = 45 x 30 = 1350 MCUs
    uint32_t mcuRowIdx = block_cnt / 45;
    uint32_t mcuColIdx = block_cnt - (mcuRowIdx * 45);
    //the number of bytes in one line, left offset + right offset + width = 1576,
    // 4 byte was 1 pixel with this offset
    uint32_t lineOffset = 1448; // bytes per captured line, including blanking; determined by trial and error from what DCMI captures
    // Optimized MCU extraction for 4:2:2 UYVY format
    uint32_t rowIdx = 0;
    uint32_t colIdx = 0;
    uint32_t byteOffset = (lineOffset * MCU_BLOCK_ROWS * mcuRowIdx) // skip MCU_BLOCK_ROWS source lines per MCU row
                        + (lineOffset * 0)                          // line within the MCU; we start at line 0
                        + mcuColIdx * 32; // column offset: one MCU row spans 32 source bytes (16 Y + 8 Cr + 8 Cb); 45 x 32 = 1440 bytes per line
    uint32_t* crycby = (uint32_t*)&src[byteOffset];
    // 64 iterations: 8 rows x 8 words, each 32-bit word carries 2 Y + 1 Cr + 1 Cb
    for (int i = 0; i < MCU_BLOCK_SIZE; i++) {
        uint8_t Y1 = (uint8_t)((*crycby & 0x000000FF) >> 0);
        uint8_t Cr = (uint8_t)((*crycby & 0x0000FF00) >> 8);
        uint8_t Y2 = (uint8_t)((*crycby & 0x00FF0000) >> 16);
        uint8_t Cb = (uint8_t)((*crycby & 0xFF000000) >> 24);
        crycby++; // advance by four bytes
        *cr++ = Cr;
        *cb++ = Cb;
        if (colIdx < MCU_BLOCK_COLS / 2) {
            // first 4 words of the row -> 8 samples for Y block 1
            *y1++ = Y1; // y[n]
            *y1++ = Y2; // y[n+1]
        } else {
            // last 4 words of the row -> 8 samples for Y block 2
            *y2++ = Y1; // y[n]
            *y2++ = Y2; // y[n+1]
        }
        if (colIdx < MCU_BLOCK_COLS - 1) {
            colIdx++;
        } else {
            // end of this MCU row: move to the next source line
            colIdx = 0;
            rowIdx++;
            byteOffset = (lineOffset * MCU_BLOCK_ROWS * mcuRowIdx) + rowIdx * lineOffset + mcuColIdx * 32;
            crycby = (uint32_t*)&src[byteOffset];
        }
    }
}
block_cnt runs from 0 to 1349.
With this code, I got this JPEG image:
The file header and resolution look okay, but the image content is corrupted, and I am not sure why.
Any ideas?
Accepted Solutions
‎2025-06-18 5:51 AM
I found a solution. My implementation was basically correct. The example code adds too much complexity and is not easy to integrate. The actual problem was a cache coherency issue with DMA: I had to clean the D-cache each time I created an MCU block. Something like this:
ExtractMcuToBuffer(inputPtr, mcuBuffers[bufferIndex], currentMcu);
// Critical: Clean cache after buffer generation
SCB_CleanDCache_by_Addr((uint32_t*)mcuBuffers[bufferIndex], MCU_SIZE);
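The clean is needed because the CPU's writes to the MCU buffer can still be sitting in the D-cache when the DMA reads the buffer from RAM. The CMSIS cache maintenance routines work on whole cache lines, which are 32 bytes on the Cortex-M7, so it can help to pass a line-aligned range. Below is a minimal sketch of a helper that widens an arbitrary range to whole lines; `cache_range_t` and `cache_align_range` are hypothetical names, not part of CMSIS or the HAL.

```c
#include <assert.h>
#include <stdint.h>

#define DCACHE_LINE_SIZE 32u /* Cortex-M7 data cache line size in bytes */

typedef struct {
    uintptr_t start; /* rounded down to a cache line boundary */
    uint32_t  size;  /* multiple of the cache line size */
} cache_range_t;

/* Expand [addr, addr + len) so that it covers whole cache lines. */
static cache_range_t cache_align_range(uintptr_t addr, uint32_t len)
{
    const uintptr_t mask = (uintptr_t)(DCACHE_LINE_SIZE - 1u);
    cache_range_t r;
    uintptr_t end = (addr + len + mask) & ~mask; /* round end up */
    r.start = addr & ~mask;                      /* round start down */
    r.size  = (uint32_t)(end - r.start);
    return r;
}

/* Hypothetical call site on the target, using the names from the post:
 *   cache_range_t r = cache_align_range((uintptr_t)mcuBuffers[bufferIndex], MCU_SIZE);
 *   SCB_CleanDCache_by_Addr((uint32_t *)r.start, (int32_t)r.size);
 */
```

Since MCU_SIZE is 256, a multiple of 32, each of the two buffers already covers whole lines if the array itself starts on a 32-byte boundary. The declaration in the question does not request that alignment explicitly, so adding something like `__attribute__((aligned(32)))` would be an assumption worth checking against the linker map.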
‎2025-06-17 6:46 AM
Hello @rob-bits
Please refer to the JPEG example below:
Saket_Om
‎2025-06-18 12:40 AM
Hello @Saket_Om
Thanks, I have already tried to adapt the example code to my case. However, in the JPEG_Encode_DMA() function, the MCU blocks are created with pRGBToYCbCr_Convert_Function(), which may call JPEG_ARGB_MCU_YCbCr422_ConvertBlocks(). I have YCrCb data, not RGB, and I do not want to do any conversion, just encode it into JPEG. If I understand correctly, YCrCb is the format that the JPEG encoder needs. So please guide me on how to resolve this issue. Do you have an example or tutorial for YCrCb 4:2:2 input?
Here is the code that you suggested:
uint32_t JPEG_Encode_DMA(JPEG_HandleTypeDef *hjpeg, uint32_t RGBImageBufferAddress, uint32_t RGBImageSize_Bytes, uint32_t *jpgBufferAddress )
{
  pJpegBuffer = jpgBufferAddress;
  uint32_t DataBufferSize = 0;

  /* Reset all Global variables */
  MCU_TotalNb = 0;
  MCU_BlockIndex = 0;
  Jpeg_HWEncodingEnd = 0;
  Output_Is_Paused = 0;
  Input_Is_Paused = 0;

  /* Get RGB Info */
  RGB_GetInfo(&Conf);
  JPEG_GetEncodeColorConvertFunc(&Conf, &pRGBToYCbCr_Convert_Function, &MCU_TotalNb);

  /* Clear Output Buffer */
  Jpeg_OUT_BufferTab.DataBufferSize = 0;
  Jpeg_OUT_BufferTab.State = JPEG_BUFFER_EMPTY;

  /* Fill input Buffers */
  RGB_InputImageIndex = 0;
  RGB_InputImageAddress = RGBImageBufferAddress;
  RGB_InputImageSize_Bytes = RGBImageSize_Bytes;
  DataBufferSize = Conf.ImageWidth * MAX_INPUT_LINES * BYTES_PER_PIXEL;

  if(RGB_InputImageIndex < RGB_InputImageSize_Bytes)
  {
    /* Pre-Processing */
    MCU_BlockIndex += pRGBToYCbCr_Convert_Function((uint8_t *)(RGB_InputImageAddress + RGB_InputImageIndex), Jpeg_IN_BufferTab.DataBuffer, 0, DataBufferSize,(uint32_t*)(&Jpeg_IN_BufferTab.DataBufferSize));
    Jpeg_IN_BufferTab.State = JPEG_BUFFER_FULL;
    RGB_InputImageIndex += DataBufferSize;
  }
  ...
As you can see, it is written for RGB images.
Thanks!
Rob
