STM32H7 JPEG encoder MCU blocks - corrupted JPEG image

rob-bits
Associate II

I am having trouble understanding the MCU blocks for the JPEG encoder. I have an STM32H743 MCU connected to a video decoder through the DCMI interface. In RAM I have the captured 8-bit ITU-R BT.656 YCrCb 4:2:2 output. Saving the captured data from RAM to a debug file, I can see that the image is captured properly. The byte stream looks like: Cb0-1, Y0, Cr0-1, Y1, Cb2-3, Y2, Cr2-3, Y3, ... That is 720 Y, 360 Cb and 360 Cr samples per line, i.e. 1440 bytes plus 8 bytes of blanking per line. I have 240 lines, so the resolution I am capturing is 720x240.
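The capture geometry above can be turned into a small address calculation. The sketch below is a hypothetical helper (mcu_origin() and the macro names are illustrative, not from the thread). It assumes the 1448-byte line stride described here and the 16x8-pixel MCU of 4:2:2 subsampling, where each MCU row is 16 pixels x 2 bytes = 32 bytes of the packed stream, and maps an MCU index to its top-left pixel and to the byte offset of its first sample in the capture buffer. This is the same expression the question's ExtractMcuToBuffer() uses for byteOffset.

```c
#include <stdint.h>

/* Geometry taken from the question: 720x240 capture, 4:2:2 => 16x8-pixel MCUs */
#define IMG_WIDTH         720u
#define IMG_HEIGHT        240u
#define MCU_W             16u
#define MCU_H             8u
#define MCUS_PER_ROW      (IMG_WIDTH / MCU_W)            /* 45   */
#define MCUS_PER_COL      (IMG_HEIGHT / MCU_H)           /* 30   */
#define MCU_COUNT         (MCUS_PER_ROW * MCUS_PER_COL)  /* 1350 */
#define LINE_STRIDE       1448u        /* 1440 active bytes + 8 blanking bytes */
#define BYTES_PER_MCU_ROW (MCU_W * 2u) /* 16 pixels x 2 bytes = 32 */

typedef struct {
    uint32_t x;           /* top-left pixel column of the MCU */
    uint32_t y;           /* top-left pixel row of the MCU    */
    uint32_t byte_offset; /* offset of its first byte in the capture buffer */
} mcu_origin_t;

/* Hypothetical helper: MCU index (0..MCU_COUNT-1) -> position in the frame */
static inline mcu_origin_t mcu_origin(uint32_t mcu_index)
{
    mcu_origin_t o;
    uint32_t row = mcu_index / MCUS_PER_ROW;
    uint32_t col = mcu_index % MCUS_PER_ROW;
    o.x = col * MCU_W;
    o.y = row * MCU_H;
    o.byte_offset = o.y * LINE_STRIDE + col * BYTES_PER_MCU_ROW;
    return o;
}
```

For the last MCU (index 1349) this gives pixel (704, 232) and byte offset 337,344; its last row ends at 239 x 1448 + 1408 + 32 = 347,512 bytes, inside the 1448 x 240 = 347,520-byte frame.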

As described in AN4996, each 4:2:2 MCU contains 256 bytes organized as two 8×8 Y blocks plus one 8×8 Cb block plus one 8×8 Cr block, so one MCU covers 16×8 pixels. For a 720×240 image, I should therefore need 45 horizontal MCUs × 30 vertical MCUs = 1,350 MCUs in total. My confusion is about the two Y blocks. I implemented them as:

Y1 first row:
y0, y1 ... y7
Y1 second row:
y16, y17...y23
Y2 first row:
y8, y9, ... y15
Y2 second row:
y24, y25 .. y31
Is this correct?

So Y1 contains the first 8 pixel columns, and Y2 contains the second 8 pixel columns?
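Numbering the luma samples inside one MCU row by row, as the question does (y0..y15 on the first line, y16..y31 on the second), the layout can be written as an offset function. The sketch below uses hypothetical helper names; it follows the 256-byte Y1, Y2, Cb, Cr layout stated in the question and the JPEG rule that a component with two horizontal 8x8 blocks per MCU stores the left block first. With it, y16 (pixel 0 of row 1) lands at offset 8 and y24 (pixel 8 of row 1) at offset 72, i.e. the second rows of Y1 and Y2, which matches the layout above.

```c
#include <stdint.h>

/* Layout of one 256-byte 4:2:2 MCU (16x8 pixels):
 *   [  0.. 63] Y1: left  8x8 luma block (pixel columns 0..7)
 *   [ 64..127] Y2: right 8x8 luma block (pixel columns 8..15)
 *   [128..191] Cb: 8x8, one sample per horizontal pixel pair
 *   [192..255] Cr: 8x8, same arrangement as Cb
 * px = 0..15 (column inside the MCU), py = 0..7 (row inside the MCU). */

/* Hypothetical helper: byte offset of the luma sample at (px, py) */
static inline uint32_t mcu422_y_offset(uint32_t px, uint32_t py)
{
    uint32_t block = (px < 8u) ? 0u : 64u;  /* Y1 or Y2 */
    return block + py * 8u + (px % 8u);
}

/* Hypothetical helper: pixels 2k and 2k+1 share one Cb sample */
static inline uint32_t mcu422_cb_offset(uint32_t px, uint32_t py)
{
    return 128u + py * 8u + px / 2u;
}

/* Hypothetical helper: pixels 2k and 2k+1 share one Cr sample */
static inline uint32_t mcu422_cr_offset(uint32_t px, uint32_t py)
{
    return 192u + py * 8u + px / 2u;
}
```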

Here is my code that processes the YCrCb byte stream:

```c
#include <stdint.h>

#define MCU_BLOCK_ROWS 8                                  // rows in one 8x8 block
#define MCU_BLOCK_COLS 8                                  // columns in one 8x8 block
#define MCU_BLOCK_SIZE (MCU_BLOCK_ROWS * MCU_BLOCK_COLS)  // 8x8 block = 64 bytes
#define MCU_SIZE 256                                      // Y1 + Y2 + Cb + Cr = 4 x 64 bytes

__attribute__((section(".axiram"))) static uint8_t mcuBuffers[2][MCU_SIZE];

static void ExtractMcuToBuffer(uint8_t* src, uint8_t* dest, uint32_t block_cnt)
{
    uint8_t* y1 = dest;        // Y block 1 (left 8 pixel columns)
    uint8_t* y2 = dest + 64;   // Y block 2 (right 8 pixel columns)
    uint8_t* cb = dest + 128;  // Cb block
    uint8_t* cr = dest + 192;  // Cr block

    // 720x240 -> 720/16 = 45 MCUs per row, 240/8 = 30 MCU rows -> 45x30 = 1350 MCUs
    uint32_t mcuRowIdx = block_cnt / 45;
    uint32_t mcuColIdx = block_cnt - (mcuRowIdx * 45);

    // The number of bytes in one line: left offset + right offset + width = 1576,
    // 4 bytes was 1 pixel with this offset.
    // Bytes per captured line, determined by trial and error: this is how many
    // samples the DCMI captures per line, including blanking (1440 + 8).
    uint32_t lineOffset = 1448;

    // MCU extraction for the packed 4:2:2 capture
    uint32_t rowIdx = 0;
    uint32_t colIdx = 0;
    uint32_t byteOffset = (lineOffset * MCU_BLOCK_ROWS * mcuRowIdx) // skip MCU_BLOCK_ROWS lines per MCU row above
                        + (lineOffset * 0)                          // line inside the MCU: row 0 at the start
                        + mcuColIdx * 32;                           // one MCU row = 32 bytes: 16 Y, 8 Cr, 8 Cb
                                                                    // (32 x 45 = 1440 bytes, max(mcuColIdx) = 44)
    uint32_t* crycby = (uint32_t*)&src[byteOffset];

    for (int i = 0; i < MCU_BLOCK_SIZE; i++)   // 64 words x 4 bytes = 8 rows x 32 bytes
    {
        // Bytes as read from one little-endian 32-bit word: Y, Cr, Y, Cb
        uint8_t Y1 = (uint8_t)((*crycby & 0x000000FF) >> 0);
        uint8_t Cr = (uint8_t)((*crycby & 0x0000FF00) >> 8);
        uint8_t Y2 = (uint8_t)((*crycby & 0x00FF0000) >> 16);
        uint8_t Cb = (uint8_t)((*crycby & 0xFF000000) >> 24);
        crycby++;          // advance four bytes in the address

        *cr++ = Cr;
        *cb++ = Cb;

        if (colIdx < MCU_BLOCK_COLS / 2)
        {   // first 4 words of the row -> pixels 0..7 -> Y1
            *y1++ = Y1;    // y(n)
            *y1++ = Y2;    // y(n+1)
        }
        else
        {   // last 4 words of the row -> pixels 8..15 -> Y2
            *y2++ = Y1;    // y(n)
            *y2++ = Y2;    // y(n+1)
        }

        if (colIdx < MCU_BLOCK_COLS - 1)
        {
            colIdx++;
        }
        else
        {   // end of this MCU row: switch to the next captured line
            colIdx = 0;
            rowIdx++;
            byteOffset = (lineOffset * MCU_BLOCK_ROWS * mcuRowIdx) + rowIdx * lineOffset + mcuColIdx * 32;
            crycby = (uint32_t*)&src[byteOffset];
        }
    }
}
```

The block_cnt goes from 0 to 1349.

With this code, I got this JPEG image:

[Image: the resulting JPEG output]

The file header and resolution look okay, but the content is corrupted, and I am not sure why.

Any idea?
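For testing the extraction on a PC with synthetic frames, the sketch below is a hypothetical byte-wise variant of ExtractMcuToBuffer() (the name extract_mcu_uyvy and its parameters are illustrative, not from the thread). It assumes the byte order described at the top of the question (Cb, Y, Cr, Y) and indexes bytes directly, so it does not depend on endianness or on 4-byte alignment of the source pointer. Note that the word-based code in the question reads each little-endian word as Y, Cr, Y, Cb, i.e. it expects a different byte order than the one described; if colours come out shifted, it is worth checking which byte the DCMI buffer actually starts with.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy one 16x8-pixel 4:2:2 MCU out of a packed
 * Cb, Y, Cr, Y capture buffer into the 256-byte Y1|Y2|Cb|Cr layout.
 *   frame        start of the capture buffer (first line)
 *   stride       bytes per captured line (1448 in the question)
 *   mcus_per_row MCUs per image row (45 for 720 pixels)
 *   mcu_index    0 .. total MCUs - 1                                  */
void extract_mcu_uyvy(const uint8_t *frame, size_t stride,
                      uint32_t mcus_per_row, uint32_t mcu_index,
                      uint8_t mcu[256])
{
    uint32_t mcu_row = mcu_index / mcus_per_row;
    uint32_t mcu_col = mcu_index % mcus_per_row;
    uint8_t *y1 = mcu;        /* left 8x8 luma block  (pixels 0..7)      */
    uint8_t *y2 = mcu + 64;   /* right 8x8 luma block (pixels 8..15)     */
    uint8_t *cb = mcu + 128;  /* 8x8 Cb block, one sample per pixel pair */
    uint8_t *cr = mcu + 192;  /* 8x8 Cr block                            */

    for (uint32_t row = 0; row < 8u; row++) {
        /* 16 pixels x 2 bytes = 32 bytes per MCU row */
        const uint8_t *line = frame + (size_t)(mcu_row * 8u + row) * stride
                                    + (size_t)mcu_col * 32u;
        for (uint32_t pair = 0; pair < 8u; pair++) {
            const uint8_t *p = line + pair * 4u;  /* p[0]=Cb p[1]=Y p[2]=Cr p[3]=Y */
            uint8_t *ydst = (pair < 4u) ? y1 + row * 8u + pair * 2u
                                        : y2 + row * 8u + (pair - 4u) * 2u;
            ydst[0] = p[1];                       /* even pixel of the pair */
            ydst[1] = p[3];                       /* odd pixel of the pair  */
            cb[row * 8u + pair] = p[0];
            cr[row * 8u + pair] = p[2];
        }
    }
}
```

For the question's frame this would be called as extract_mcu_uyvy(src, 1448, 45, block_cnt, dest) for block_cnt = 0..1349.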

1 ACCEPTED SOLUTION

rob-bits
Associate II

I found a solution. Basically, my implementation was correct. The example code adds too much complexity and is not easy to integrate... Anyway, the issue I was facing was a cache issue with DMA: I had to clean the D-cache each time I created an MCU block. Something like this:

```c
ExtractMcuToBuffer(inputPtr, mcuBuffers[bufferIndex], currentMcu);
// Critical: clean the cache after generating the buffer
SCB_CleanDCache_by_Addr((uint32_t*)mcuBuffers[bufferIndex], MCU_SIZE);
```
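A likely explanation for why the clean is needed: with the Cortex-M7 D-cache enabled and the buffer in cacheable write-back memory, the bytes written by ExtractMcuToBuffer() can still sit in the cache when the DMA feeding the JPEG codec reads the buffer from SRAM, so the codec receives stale data. The sketch below packages that ordering (write, clean, then hand over) around a 32-byte-aligned ping-pong buffer. The names mcu_buffer_for(), mcu_buffer_commit() and mcu_pingpong are hypothetical; SCB_CleanDCache_by_Addr() is the CMSIS call used in the post, and the STM32H743xx guard is an assumption that lets the same file build on a host, where the clean becomes a no-op. The original declaration also placed the buffers in a .axiram linker section; that attribute can be added back if the linker script defines it.

```c
#include <stdint.h>

#if defined(STM32H743xx)
#include "stm32h7xx.h"   /* CMSIS core: SCB_CleanDCache_by_Addr() */
#define MCU_DCACHE_CLEAN(ptr, len) \
    SCB_CleanDCache_by_Addr((uint32_t *)(ptr), (int32_t)(len))
#else
/* Host / unit-test build: no D-cache, so there is nothing to clean */
#define MCU_DCACHE_CLEAN(ptr, len) ((void)(ptr), (void)(len))
#endif

#define MCU_SIZE 256     /* a multiple of the 32-byte Cortex-M7 cache line */

/* Hypothetical double buffer; aligned(32) so each buffer covers whole cache lines */
static uint8_t mcu_pingpong[2][MCU_SIZE] __attribute__((aligned(32)));

/* Even MCUs use buffer 0, odd MCUs use buffer 1 */
uint8_t *mcu_buffer_for(uint32_t mcu_index)
{
    return mcu_pingpong[mcu_index & 1u];
}

/* Call after the CPU has finished writing buf and before the DMA reads it */
void mcu_buffer_commit(uint8_t *buf)
{
    MCU_DCACHE_CLEAN(buf, MCU_SIZE);
}
```

In the question's loop this would read: buf = mcu_buffer_for(currentMcu); ExtractMcuToBuffer(inputPtr, buf, currentMcu); mcu_buffer_commit(buf); and only then pass buf (MCU_SIZE bytes) to the JPEG input DMA. Alternating between two buffers is only safe if the DMA has finished with a buffer before it is refilled two MCUs later.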


3 REPLIES
Saket_Om
ST Employee

Hello @rob-bits 

Please refer to the JPEG example below: 

STM32CubeN6/Projects/STM32N6570-DK/Examples/JPEG/JPEG_EncodingFromOSPI_DMA at main · STMicroelectronics/STM32CubeN6 · GitHub

Saket_Om

Hello @Saket_Om 

 

Thanks, I have already tried to interpret the example code for my case. However, in the JPEG_Encode_DMA() function the MCU blocks are created with pRGBToYCbCr_Convert_Function(), which may call the JPEG_ARGB_MCU_YCbCr422_ConvertBlocks() function. But I have YCrCb data, not RGB. I do not want to do any conversion, just encode the data into JPEG. If I understand correctly, YCrCb is the format the JPEG encoder needs. So please guide me on how to resolve this issue. Do you have an example or tutorial for a YCrCb 4:2:2 input?

Here is the code that you suggested:

```c
uint32_t JPEG_Encode_DMA(JPEG_HandleTypeDef *hjpeg, uint32_t RGBImageBufferAddress, uint32_t RGBImageSize_Bytes, uint32_t *jpgBufferAddress)
{
  pJpegBuffer = jpgBufferAddress;
  uint32_t DataBufferSize = 0;

  /* Reset all Global variables */
  MCU_TotalNb = 0;
  MCU_BlockIndex = 0;
  Jpeg_HWEncodingEnd = 0;
  Output_Is_Paused = 0;
  Input_Is_Paused = 0;

  /* Get RGB Info */
  RGB_GetInfo(&Conf);
  JPEG_GetEncodeColorConvertFunc(&Conf, &pRGBToYCbCr_Convert_Function, &MCU_TotalNb);

  /* Clear Output Buffer */
  Jpeg_OUT_BufferTab.DataBufferSize = 0;
  Jpeg_OUT_BufferTab.State = JPEG_BUFFER_EMPTY;

  /* Fill input Buffers */
  RGB_InputImageIndex = 0;
  RGB_InputImageAddress = RGBImageBufferAddress;
  RGB_InputImageSize_Bytes = RGBImageSize_Bytes;
  DataBufferSize = Conf.ImageWidth * MAX_INPUT_LINES * BYTES_PER_PIXEL;

  if(RGB_InputImageIndex < RGB_InputImageSize_Bytes)
  {
    /* Pre-Processing */
    MCU_BlockIndex += pRGBToYCbCr_Convert_Function((uint8_t *)(RGB_InputImageAddress + RGB_InputImageIndex), Jpeg_IN_BufferTab.DataBuffer, 0, DataBufferSize, (uint32_t*)(&Jpeg_IN_BufferTab.DataBufferSize));
    Jpeg_IN_BufferTab.State = JPEG_BUFFER_FULL;

    RGB_InputImageIndex += DataBufferSize;
  }
  ...
```

As you can see, it is for RGB images.

Thanks!

Rob

 
