
Is using DMA2D to convert YCbCr 4:2:2 to ARGB8888 with STM32H7 and DCMI as input possible ?

Viony
Associate II

Hello everyone,

The question has already been asked here :

https://community.st.com/s/question/0D50X00009XkfEhSAJ/stm32h7-and-ycbcr?t=1534420367411

but I don't think the answer is complete.

Anyway, I have a PAL camera. Its signal is converted to YCbCr 4:2:2. If I connect those signals to the DCMI input of an STM32H7 MCU, can I then use the DMA2D, which on the H7 line can convert YCbCr to RGB, to display the image on a screen?

I am asking this because in the AN4996 it is said "the Chrom-Art Accelerator™ peripheral (also called DMA2D) allows to convert YCbCr blocks (output of the JPEG decoder) to RGB pixels ready for display." But is this DMA2D conversion possible if I don't use the JPEG decoder ?

The chain would be :

PAL Camera -> PAL-to-YCbCr -> DCMI -> DMA2D YCbCr to RGB -> Display

Is it possible? What latency can I expect?

Thanks !

Viony.

9 REPLIES 9

I don't know, but from reading the 'H7 RM I'd say no, or not in a simple way: the DMA2D appears to accept YCbCr data in the 8x8-block format described in the YCbCr subchapter, while the camera/converter most probably outputs line-by-line (raster-order) data. So some processor involvement appears to be inevitable. It is then questionable whether it is worth shuffling the data and then using the DMA2D, or doing the whole conversion on the processor. I'd say the latter.

JW

This is one of those real-world situations/applications which really should have a solid worked example. Perhaps we can get one of the video-leaning FAEs to weigh in with some sage advice, and perhaps an app note and code. @STOne-32

Viony
Associate II

Thanks for your answers. I guess I have to do the YCbCr to RGB conversion through software.

What worries me is whether the processor will be fast enough to do the conversion. Judging from this older post:

https://community.st.com/s/question/0D50X00009XkbA7SAJ/stm32f429i-using-dcmi-in-ycbcr-mode-with-the-tvp5150-and-tft-lcd

It seems difficult. First I have to up-sample the YCbCr 4:2:2 to 4:4:4, then do the YCbCr 4:4:4 to RGB conversion, probably (but not certainly) followed by an upscale of the frame. In this older post the author uses an STM32F429; maybe the H7 line would be fast enough.

Viony.

> First I have to up-sample the YCbCr 4:2:2 to 4:4:4, then make the YCbCr 4:4:4 to RGB conversion,

I wouldn't do that in two steps. The two pixels with common chroma follow each other, don't they? So I'd read the two luma and two chroma bytes into registers; that's one single 32-bit read. Then I'd split the two chroma and one luma into separate registers and perform either the multiply-adds or a lookup (lookup may turn out to be faster or slower, depending on where the lookup table is placed), store the resulting RGB pixel, and then repeat with the remaining luma. This should all fit into registers; if the compiler can't cope with it, then help it with a bit of inline asm. As a very rough guesstimate, I'd say three to five dozen instructions, so on the 'F4 some 50 cycles per pixel. The 'H7 is much more complex; I have no estimate there, but I wouldn't hold my breath for a twofold improvement.
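The approach described above (one 32-bit read, chroma terms computed once and reused for both lumas) could be sketched in C roughly as follows. This is only an illustration under assumptions not confirmed in the thread: the byte order inside the word is taken as Cb in the lowest byte, then Y0, Cr, Y1; the data is assumed to be video-range BT.601 (Y 16..235), using the common 8-bit fixed-point coefficients 298/409/100/208/516; and the function name `ycbcr422_word_to_argb8888` is hypothetical.

```c
#include <stdint.h>

/* Drop the 8 fractional bits and clamp the result to 0..255. */
static uint8_t clamp_q8(int32_t v)
{
    if (v < 0) {
        return 0;
    }
    v >>= 8;
    return (uint8_t)(v > 255 ? 255 : v);
}

/* Combine one scaled luma with the shared chroma terms into ARGB8888. */
static uint32_t make_argb(int32_t luma, int32_t r_c, int32_t g_c, int32_t b_c)
{
    uint32_t r = clamp_q8(luma + r_c);
    uint32_t g = clamp_q8(luma + g_c);
    uint32_t b = clamp_q8(luma + b_c);
    return 0xFF000000u | (r << 16) | (g << 8) | b;
}

/* Convert one 4:2:2 word (assumed layout: Cb, Y0, Cr, Y1 from the
 * lowest byte up) into two opaque ARGB8888 pixels. */
void ycbcr422_word_to_argb8888(uint32_t word, uint32_t out[2])
{
    int32_t cb = (int32_t)(word & 0xFFu) - 128;
    int32_t y0 = (int32_t)((word >> 8) & 0xFFu) - 16;
    int32_t cr = (int32_t)((word >> 16) & 0xFFu) - 128;
    int32_t y1 = (int32_t)((word >> 24) & 0xFFu) - 16;

    /* Chroma contributions (with +128 rounding), shared by both pixels. */
    int32_t r_c = 409 * cr + 128;
    int32_t g_c = -100 * cb - 208 * cr + 128;
    int32_t b_c = 516 * cb + 128;

    out[0] = make_argb(298 * y0, r_c, g_c, b_c);
    out[1] = make_argb(298 * y1, r_c, g_c, b_c);
}
```

If the converter delivers the bytes in a different order, or full-range data, only the four unpacking lines and the coefficients change; the structure of sharing the chroma terms stays the same.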

On second thought, repacking into the 8x8 format and running it through the DMA2D might be faster, especially if working through fast memory/cache. SDRAM is definitely a performance killer; the caches in the 'H7 may help the processor, but they won't help with DMA or the LCD controller.

JW

Viony
Associate II

Yeah, you are right about the YCbCr 4:2:2 to YCbCr 4:4:4 up-sampling, but I am still worried about the time it will take. If I understood correctly, in YCbCr 4:2:2 mode a DMA request is generated when the DCMI_DR register is full (32 bits: Cb1 Y1 Cr1 Y2). This data is then transferred into the DMA2 FIFO (up to 4 words), and once the FIFO is full everything is transferred to memory as a burst (for example into SDRAM).

I did some math (maybe I am completely wrong):

My input PIXCLK is 54 MHz and the YCbCr 4:2:2 data is output as 8-bit parallel, which means I need 4 PIXCLK cycles to fill the DCMI_DR register (74 ns). A DMA request is then generated. According to AN4031 on the DMA (Chapter 3: How to predict DMA latencies), the transfer from the DCMI_DR register (peripheral) to the DMA FIFO should take 4 AHB cycles (the DMA2 access to DCMI goes through the bus matrix). For the STM32F429 the maximum AHB2 clock is 180 MHz, which gives 22 ns.

That being so, a DMA burst occurs every 318 ns (4 x 74 ns + 22 ns), and the burst itself also lasts 4 AHB cycles. So, at this step, getting 4 x 32-bit YCbCr 4:2:2 words into the SDRAM takes 340 ns.

Am I wrong to say, based on the reasoning above, that I only have 340 ns to convert the YCbCr 4:2:2 data into RGB (using an 'F429)? Every burst transfers 4 x 32 bits, so 8 pixels.
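The arithmetic above can be reproduced with a few lines of C, which also makes it easy to try other clock values. This is only a sketch of the same back-of-the-envelope model (4 PIXCLK cycles per word, 4 AHB cycles per transfer, 4 words per burst, 8 pixels per burst); the struct and function names are made up for illustration, and the last field additionally assumes the CPU does nothing but conversion.

```c
/* Rough DCMI/DMA time budget, following the estimate in this post. */
typedef struct {
    double word_fill_ns;         /* time to fill DCMI_DR (4 bytes)           */
    double ahb_xfer_ns;          /* one 4-cycle AHB transfer                 */
    double burst_period_ns;      /* 4 words collected + one FIFO transfer    */
    double burst_total_ns;       /* period + the burst to memory itself      */
    double cpu_cycles_per_pixel; /* CPU cycles available per pixel (8/burst) */
} dcmi_budget_t;

dcmi_budget_t dcmi_budget(double pixclk_mhz, double ahb_mhz, double cpu_mhz)
{
    dcmi_budget_t b;
    b.word_fill_ns = 4.0 * 1000.0 / pixclk_mhz;
    b.ahb_xfer_ns = 4.0 * 1000.0 / ahb_mhz;
    b.burst_period_ns = 4.0 * b.word_fill_ns + b.ahb_xfer_ns;
    b.burst_total_ns = b.burst_period_ns + b.ahb_xfer_ns;
    /* ns per pixel times cycles per ns */
    b.cpu_cycles_per_pixel = (b.burst_total_ns / 8.0) * (cpu_mhz / 1000.0);
    return b;
}
```

Calling `dcmi_budget(54.0, 180.0, 180.0)` gives roughly 318.5 ns per burst period, 340.7 ns in total, and a little under 8 CPU cycles per pixel on a 180 MHz core.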

Regarding the repacking into 8x8, I have to see how it can be done. But I guess it requires having a certain number of lines of the frame already written into the SDRAM, or maybe working on frame N while frame N+1 is written into another buffer.
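For reference, a repacking step of this kind might look like the sketch below. It rests on assumptions that should be checked against the reference manual: that the DMA2D expects the JPEG-style 4:2:2 MCU layout (a 16x8 pixel tile stored as two 8x8 luma blocks, then one 8x8 Cb block, then one 8x8 Cr block, 256 bytes in total), that MCUs follow each other in raster order, and that the captured lines sit in memory as Cb, Y0, Cr, Y1 bytes. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum { MCU_W = 16, MCU_H = 8, MCU_BYTES = 256 };

/* Repack one 16x8 tile of interleaved 4:2:2 data into one MCU.
 * src points at the tile's top-left byte; stride_bytes is bytes per line. */
void pack_mcu_422(const uint8_t *src, size_t stride_bytes,
                  uint8_t mcu[MCU_BYTES])
{
    uint8_t *y_left = mcu;         /* luma block, pixels 0..7  */
    uint8_t *y_right = mcu + 64;   /* luma block, pixels 8..15 */
    uint8_t *cb = mcu + 128;
    uint8_t *cr = mcu + 192;

    for (unsigned row = 0; row < MCU_H; row++) {
        const uint8_t *line = src + row * stride_bytes;
        for (unsigned pair = 0; pair < MCU_W / 2; pair++) {
            const uint8_t *p = line + pair * 4;   /* Cb, Y0, Cr, Y1 */
            unsigned x = pair * 2;
            uint8_t *yblk = (x < 8) ? y_left : y_right;
            unsigned bx = x & 7u;

            yblk[row * 8 + bx] = p[1];
            yblk[row * 8 + bx + 1] = p[3];
            cb[row * 8 + pair] = p[0];
            cr[row * 8 + pair] = p[2];
        }
    }
}

/* Repack a whole frame; width must be a multiple of 16, height of 8.
 * out must hold width * height * 2 bytes. */
void pack_frame_422(const uint8_t *frame, unsigned width, unsigned height,
                    uint8_t *out)
{
    size_t stride = (size_t)width * 2;
    for (unsigned my = 0; my < height / MCU_H; my++) {
        for (unsigned mx = 0; mx < width / MCU_W; mx++) {
            pack_mcu_422(frame + (size_t)my * MCU_H * stride
                               + (size_t)mx * MCU_W * 2,
                         stride, out);
            out += MCU_BYTES;
        }
    }
}
```

As guessed above, one row of MCUs needs 8 complete lines before it can be repacked, so this works either on a band of 8 buffered lines or on a whole previous frame.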

> My input PIXCLK is 54 MHz

Whoa. What PAL camera outputs 27Mpixels/sec? Depending on particular sub-standard, the audio subcarrier in PAL is at 5.5-6.5MHz so the luma bandwidth is around 5-6MHz.

Most of the time you don't need to be bothered by the DMA details at all.

The general idea is, that you capture a whole frame and then process it, whatever time it takes; more frames may then pass by until you get to the next one.

If you want to process the moving picture on the fly, then your processing time per pixel is roughly equal to the pixel period (you can buffer a bit and use the retrace/sync time, but that's what, 10-20%). 27 Mpx/s at roughly 200M instructions per second works out to about 7 instructions per pixel, and this is clearly impossible to do in software.

JW

Viony
Associate II

I planned to use an ADV7280A from Analog Devices. The nominal output frequency is 27 MHz, but with the interlace-to-progressive feature this frequency becomes 54 MHz.

Since I have a latency constraint between the input (PAL) and what is displayed, I assume I won't be able to meet it with this solution. An FPGA seems to be the solution (I am crying).

Anyway, thanks for your clear answer, I learnt a lot !

Is there no PAL-to-RGB converter, similar to the mentioned one?

JW

gvigelet
Associate II

Hi All,

Sorry for the late reply, but here ya go. The H7 will do YCbCr to RGB color conversion; I attached a full project to this post. I did this on a Nucleo H7 with no display. I flashed a YCbCr frame to location 0x08180000. The YCbCr 4:2:2 frame is included in y422.zip and is the output of the JPEG decoder decoding a JPEG image. The conversion is to RGB565, but you can take a look at the MX_DMA2D_Init function and change the parameters to suit your needs. The converted frame will reside in SRAM at 0x24000000. The image is 640x400 and can be saved from the debugger:

save y422_r565.ihex 0x24000000, 0x2407D000

The RGB565 converted frame result is in the y422_r565.zip. Hope this helps you out.
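The end address in the save command follows from the frame size: 640 x 400 pixels at 2 bytes per RGB565 pixel. A small helper (hypothetical name, shown only to make the arithmetic explicit) gives the same range and can be reused when changing the output format:

```c
#include <stdint.h>

/* Size in bytes of a frame with the given dimensions and pixel size. */
uint32_t frame_bytes(uint32_t width, uint32_t height, uint32_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}
```

`frame_bytes(640, 400, 2)` is 0x7D000, so the RGB565 frame spans 0x24000000 to 0x2407D000. The same frame in ARGB8888 would need `frame_bytes(640, 400, 4)` = 0xFA000 bytes, which may not fit in the same SRAM region, depending on the part.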

- George