2025-06-16 11:59 PM
I have an issue to understand the MCU blocks for JPEG encoder. I have an stm32h743 MCU which is connected to a video decoder with dcmi interface. In ram I have the captured 8-bit ITU-R BT.656 YCrCb 4:2:2 output. Saving the captured data from ram to a debug file, I can see the image is captured properly. The byte stream looks like: Cb0-1, Y0,Cr0-1, Y1, Cb2-3, Y2, CR2-3, Y3 .... total 720Y and 360CB and 360 CR, total 1440 bytes + 8 byte blanking per line. And I have 240 lines so the resolution that I am capturing is 720x240.
As described in the AN4996, each 4:2:2 MCU contains 256 bytes organized as two 8×8 Y blocks plus one 8×8 Cb block plus one 8×8 Cr block. For a 720×240 image, I should need 45 horizontal MCUs × 30 vertical MCUs = 1,350 total MCUs. My confusion is the two Y blocks. I implemented the two Y blocks as:
Y1 first row:
y0, y1 ... y7
Y1 second row:
y16, y17...y23
Y2 first row:
y8, y9, ... y15
Y2 second row:
y24, y25 .. y31
Is this correct?
So Y1 contains the first 8 pixel columns, and the Y2 contains the second 8 pixel columns?
Here is my code that processes the ycrcb byte stream:
#define MCU_BLOCK_ROWS 8 // 8x8 block = 64 bytes
#define MCU_BLOCK_COLS 8 // 8x8 block = 64 bytes
#define MCU_BLOCK_SIZE (MCU_BLOCK_ROWS * MCU_BLOCK_COLS) // 8x8 block = 64 bytes
#define MCU_SIZE 256
__attribute__((section(".axiram"))) static uint8_t mcuBuffers[2][MCU_SIZE];
static void ExtractMcuToBuffer(uint8_t* src, uint8_t* dest, uint32_t block_cnt) {
uint8_t* y1 = dest; // Y block 1
uint8_t* y2 = dest + 64; // Y block 2
uint8_t* cb = dest + 128; // Cb block
uint8_t* cr = dest + 192; // Cr block
// 720x240 → 720 / 16 bytes, 240/8 bytes -> 45x30 = 1350 MCUs
uint32_t mcuRowIdx = block_cnt / 45;
uint32_t mcuColIdx = block_cnt - (mcuRowIdx * 45);
//the number of bytes in one line, left offset + right offset + width = 1576,
// 4 byte was 1 pixel with this offset
uint32_t lineOffset = 1448; //how much bytes are in a line -> determined by trial and error, this is how many samples are captured by dcmi, including blanking
// Optimized MCU extraction for 4:2:2 UYVY format
uint32_t rowIdx = 0;
uint32_t colIdx = 0;
uint32_t byteOffset = (lineOffset * MCU_BLOCK_ROWS * mcuRowIdx ) //we need to skip MCU_BLOCK_ROWS of lines times mcuRowIdx
+ (lineOffset * 0) //we need to do offset lines based on the row idx, in the begining we are at 0 line, for second line of MCU we need to get the second line of the ycrcb buffer
+ mcuColIdx * 32; // we need an offset based on in which MCU block we are working on, 0, 1... One MCU block consist of 32 bytes,16xY, 8xCr and 8xCb 32 x 45 -> 1440 bytes, max(mcuColIdx) = 44, 32 * 44 + 32 -> 1440 samples
uint32_t* crycby = (uint32_t*)&src[byteOffset];
for(int i = 0; i < MCU_BLOCK_SIZE;i++) {
uint8_t Y1 = (uint8_t) ((*crycby & 0x000000FF) >> 0);
uint8_t Cr = (uint8_t) ((*crycby & 0x0000FF00) >> 8);
uint8_t Y2 = (uint8_t) ((*crycby & 0x00FF0000) >> 16);
uint8_t Cb = (uint8_t) ((*crycby & 0xFF000000) >> 24);
crycby++; //increase four bytes in the address
*cr++ = Cr; // cr
*cb++ = Cb; // cb
//building Y1
if(colIdx < MCU_BLOCK_COLS/2) {
*y1++ = Y1; //yn
*y1++ = Y2; // yn+1
} else {//building Y2
*y2++ = Y1; //yn
*y2++ = Y2; // yn+1
}
if(colIdx < MCU_BLOCK_COLS - 1) {
colIdx++;
} else { //here we switch line
colIdx = 0;
rowIdx++;
byteOffset = (lineOffset * MCU_BLOCK_ROWS * mcuRowIdx ) + rowIdx * lineOffset + mcuColIdx * 32;
crycby = (uint32_t*)&src[byteOffset];
}
}
}
The block_cnt goes from 0 to 1349.
With this code, I got this jpeg image:
The file header, resolution looks okay, but the content is corrupted. And I am not sure why.
Any idea?