STM32H743 JPEG Decoder Performance
‎2024-07-30 07:10 AM
Hello,
I'm trying to use the HW JPEG decoder on an STM32H743 MCU. When looking at the AN4996 Application Note, I found the following table:
However, I'm unable to come even close to these values... This is what I measured:
Image size | JPEG decode | DMA2D YCbCr |
320x240 | 2.5 ms | 2.5 ms |
640x480 | 10 ms | 12 ms |
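As a quick sanity check on the measurements above, the times can be converted to a pixel rate. The sketch below is plain C with a hypothetical helper name (`pixels_per_ms` is not part of any ST library) and takes times in microseconds so that everything stays in integer arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Pixels processed per millisecond for a width x height image that
   took time_us microseconds. Hypothetical helper for illustration. */
uint64_t pixels_per_ms(uint32_t width, uint32_t height, uint32_t time_us)
{
    assert(time_us > 0);
    return ((uint64_t)width * height * 1000u) / time_us;
}
```

With the values from the table, `pixels_per_ms(320, 240, 2500)` and `pixels_per_ms(640, 480, 10000)` both give 30720, so the decode time scales linearly with the pixel count at roughly 30.7 Mpixel/s. The 12 ms DMA2D step for 640x480 works out to 25600 pixels per millisecond.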
I checked my code against the Examples from the Firmware Package:
- JPEG_MJPEG_VideoDecoding
- JPEG_DecodingFromFLASH_DMA
All my settings look similar. I'm decoding the JPEG image from DTCM RAM to external SDRAM. Compiler optimization has no effect (everything is done by DMA anyway)...
Is there anything else that has to be configured for the JPEG peripheral? Where are the values in the application note coming from?
Accepted Solutions
‎2024-08-07 04:11 AM
Hello @d.zipperle.5512730769682947E12,
The performance measurements in the application note were obtained under the conditions shown in the table below:
To reproduce that performance, please make sure you are using the same conditions:
-1- Board: STM32H743I-EVAL that comes with a 32-bit SDRAM
-2- MDMA output channel destination data increment and size: WORD (hmdmaOut.Init.DestinationInc: MDMA_DEST_INC_WORD; hmdmaOut.Init.DestDataSize: MDMA_DEST_DATASIZE_WORD)
-3- DMA2D format RGB565 (and same for LTDC format)
-4- LCD display turned off during the JPEG operations to reduce the contention on the SDRAM between the LTDC and the DMA2D mainly
-5- The image used is a 4:2:0 image
I hope this helps.
Kaouthar
To give better visibility on the answered topics, please click on Accept as Solution on the reply which solved your issue or answered your question.
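Condition -2- of the accepted solution can be written as C assignments on the MDMA handle. The following is a minimal sketch, not the application note's or the example project's code: the field and constant names are the ones quoted in the reply, the helper `jpeg_out_mdma_use_word_dest` is hypothetical, and the stand-in definitions in the `#else` branch use placeholder values only so that the snippet compiles off-target. It assumes the destination buffer is 32-bit aligned. Other MDMA settings (burst size, buffer transfer length) may need to stay consistent with the wider access and are not shown.

```c
#include <assert.h>
#include <stdint.h>

#ifdef USE_HAL_DRIVER
#include "stm32h7xx_hal.h"  /* on target: real MDMA types and constants */
#else
/* Host-side stand-ins with placeholder values (NOT the real register
   encodings), so the sketch compiles without the HAL. */
enum { MDMA_DEST_INC_BYTE = 1, MDMA_DEST_INC_WORD = 4 };
enum { MDMA_DEST_DATASIZE_BYTE = 1, MDMA_DEST_DATASIZE_WORD = 4 };
typedef struct {
    uint32_t DestinationInc;
    uint32_t DestDataSize;
} MDMA_InitTypeDef;
typedef struct {
    MDMA_InitTypeDef Init;
} MDMA_HandleTypeDef;
#endif

/* Hypothetical helper: select 32-bit destination accesses for the JPEG
   output MDMA channel. Intended to run before the channel is initialized. */
void jpeg_out_mdma_use_word_dest(MDMA_HandleTypeDef *hmdmaOut)
{
    assert(hmdmaOut != 0);
    hmdmaOut->Init.DestinationInc = MDMA_DEST_INC_WORD;
    hmdmaOut->Init.DestDataSize   = MDMA_DEST_DATASIZE_WORD;
}
```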
‎2024-08-07 09:03 AM
Hello Kaouthar,
Thanks for your valuable explanations.
-1- Board: STM32H743I-EVAL that comes with a 32-bit SDRAM
We also have 32-bit SDRAM with the FMC running @ 250 MHz (SDCLK = 125 MHz)
-2- MDMA output channel destination data increment and size: WORD (hmdmaOut.Init.DestinationInc: MDMA_DEST_INC_WORD; hmdmaOut.Init.DestDataSize: MDMA_DEST_DATASIZE_WORD)
Confirmed
-3- DMA2D format RGB565 (and same for LTDC format)
Confirmed
-4- LCD display turned off during the JPEG operations to reduce the contention on the SDRAM between the LTDC and the DMA2D mainly
I'm not exactly sure what this means: are we supposed to write during VSYNC blanking to the SDRAM?
-5- The image used is a 4:2:0 image
Confirmed.
AFAICS everything looks as suggested, plus we're clocking the CPU @ 480 MHz and SDRAM clock is 25% faster @ 125 MHz. Nevertheless we cannot reach the measured performance and we get display distortion.
There must be something we're overlooking...
Thanks,
Osama
‎2024-08-08 12:56 AM
Hi @osama2 ,
-4- LCD display turned off during the JPEG operations to reduce the contention on the SDRAM between the LTDC and the DMA2D mainly
I'm not exactly sure what this means: are we supposed to write during VSYNC blanking to the SDRAM?
Please try to:
- call BSP_LCD_DisplayOff(0); before starting the decode
- call BSP_LCD_DisplayOn(0); after the end of the DMA2D copy to re-enable the LTDC display
Note that performance can be improved further by placing the JPEG YCbCr output buffer in the internal AXI SRAM at 0x24000000.
Could you please test the performance under the same frequency conditions as mentioned in the table below:
Could you please share the performance values you have obtained?
I hope this helps.
Thank you.
Kaouthar
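Whether a full YCbCr frame fits into the AXI SRAM can be estimated up front. The sketch below assumes 4:2:0 output organized as 16x16-pixel MCUs of 384 bytes each (four Y blocks plus one Cb and one Cr block of 64 bytes) and the 512 KB AXI SRAM of the STM32H743. The function names are hypothetical, and in a real project part of that RAM is usually taken by other data.

```c
#include <assert.h>
#include <stdint.h>

/* AXI SRAM size on the STM32H743 (mapped from 0x24000000). */
#define AXI_SRAM_BYTES (512u * 1024u)

/* Bytes of decoder output for one 4:2:0 frame, assuming 16x16-pixel
   MCUs of 384 bytes and rounding the image up to whole MCUs. */
uint32_t ycbcr420_frame_bytes(uint32_t width, uint32_t height)
{
    assert(width > 0 && height > 0);
    uint32_t mcus_x = (width + 15u) / 16u;
    uint32_t mcus_y = (height + 15u) / 16u;
    return mcus_x * mcus_y * 384u;
}

/* Non-zero if a buffer of the given size fits into an otherwise
   empty AXI SRAM. */
int fits_in_axi_sram(uint32_t bytes)
{
    return bytes <= AXI_SRAM_BYTES;
}
```

For 640x480 this gives 460800 bytes, which fits. For 800x480 it gives 576000 bytes, which does not, so larger images would need chunked output or a different buffer location.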
‎2024-08-12 02:43 AM
Hi @KDJEM.1,
thank you for your feedback.
I was able to identify and fix the problem and my performance now looks like this:
Image size | JPEG decode | DMA2D YCbCr (LTDC disabled) | DMA2D YCbCr (LTDC enabled) |
640x480 | 4.4 ms | 6.8 ms | 10.8 ms |
I tested with the same frequency conditions as the table from the application note and now the values look plausible...
As mentioned in my first question, I looked at both example projects from the STM32CubeMX Repository (V1.11.2). As you can see, in both examples the MDMA output destination size + increment is configured for BYTE size.
Tracking down this problem cost a lot of time, and I wasn't able to find any reference (other than the example code) for the correct / optimal MDMA configuration...
I would suggest fixing the example code to clarify this for future use. Or is there any specific reason why the MDMA in the example projects isn't configured for optimal performance?
Regards,
Dominik
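The effect of the MDMA destination data size can be estimated by counting destination accesses. This is a simplified model that assumes one bus access per data item and ignores bursts, SDRAM row timing and contention. `dest_accesses` is a hypothetical helper, and 460800 bytes is the 4:2:0 YCbCr size of a 640x480 frame (640 x 480 x 1.5 bytes).

```c
#include <assert.h>
#include <stdint.h>

/* Number of destination accesses needed to write `bytes` bytes when
   each access carries `access_bytes` bytes (1 = BYTE, 4 = WORD).
   A partial last item still costs one access. */
uint32_t dest_accesses(uint32_t bytes, uint32_t access_bytes)
{
    assert(access_bytes == 1u || access_bytes == 2u ||
           access_bytes == 4u || access_bytes == 8u);
    return (bytes + access_bytes - 1u) / access_bytes;
}
```

`dest_accesses(460800, 1)` is 460800 and `dest_accesses(460800, 4)` is 115200, so the WORD configuration needs a quarter of the write accesses of the BYTE configuration for the same frame.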
‎2024-08-12 03:48 AM
Hello @d.zipperle.5512730769682947E12 ,
Glad to know that the issue is fixed and thank you for confirming the source of the problem and for sharing the fix.
So, to give better visibility on the answered topics, please click on Accept as Solution on the reply which solved your issue or answered your question.
Your proposal is tracked internally as a request to enhance performance for this example. I assume that this wasn't the main focus for this example but it is important to consider it.
Internal ticket number: 188510 (this is an internal tracking number and is not accessible or usable by customers).
Thank you for your contribution in STCommunity.
Kaouthar