2024-05-09 10:23 AM
I am encoding a 200x300 image using the JPEG codec available in the STM32H743. The input format is ARGB8888, with the input buffer having a size of 240000 bytes (200*300*4). As I do not know what the output size of the JPEG codec will be, I have also allocated a 240000-byte output buffer. The quality is set to 100, but I found that the issue occurs with other quality settings as well.
My problem is that when I use the JPEG accelerator with the CPU feeding and retrieving the data, the output size of the JPEG image (i.e., the size from the beginning of the output buffer to the last non-null byte) is 19116 bytes for both optimization levels -O0 and -O3, which is fine. When I use the JPEG codec with DMA, more specifically the HAL_JPEG_Encode_DMA HAL function, and set the optimization level to -O0, the output size is 25796 bytes. I do not understand the difference in size compared to the CPU-fed alternative, but apart from that, this size also seems to be fine. When using DMA and -O3, however, the output size becomes 200260 bytes. This happens because the last 20 bytes are shifted to a later position in the output buffer, like so:
=====
25776 bytes of data --
174464 bytes of 0x00 (empty) space --
20 bytes of data --
more empty space
=====
I checked that the last 20 bytes are exactly the same for -O0 and -O3, so the output from the JPEG codec seems to be identical; it is just the position of the last 20 bytes in the output buffer that changes. Does anyone have any idea what could cause this behaviour?
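For reference, this is roughly what my encode call looks like (a simplified sketch: the JPEG handle initialization and MSP/clock setup are omitted, hjpeg is the usual CubeMX handle name, and the 4:2:0 subsampling is just what I picked here):

#include "stm32h7xx_hal.h"

#define IMG_WIDTH   200U
#define IMG_HEIGHT  300U
#define BUF_SIZE    (IMG_WIDTH * IMG_HEIGHT * 4U)   /* 240000 bytes */

extern JPEG_HandleTypeDef hjpeg;            /* CubeMX-generated JPEG handle */

static uint8_t inputBuffer[BUF_SIZE];       /* ARGB8888 source image */
static uint8_t outputBuffer[BUF_SIZE];      /* oversized buffer for the JPEG output */

void EncodeImage(void)
{
  JPEG_ConfTypeDef conf;

  conf.ColorSpace        = JPEG_YCBCR_COLORSPACE;
  conf.ChromaSubsampling = JPEG_420_SUBSAMPLING;   /* subsampling assumed for this sketch */
  conf.ImageWidth        = IMG_WIDTH;
  conf.ImageHeight       = IMG_HEIGHT;
  conf.ImageQuality      = 100;                    /* issue also occurs with other quality values */
  HAL_JPEG_ConfigEncoding(&hjpeg, &conf);

  /* DMA-driven encode; the CPU-fed variant gives me the expected 19116 bytes */
  HAL_JPEG_Encode_DMA(&hjpeg, inputBuffer, BUF_SIZE, outputBuffer, BUF_SIZE);
}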
2024-05-22 03:39 AM
Hello @LFerr.8
At higher optimization levels, the compiler may assume that certain data structures are aligned to specific boundaries (e.g., 4-byte, 8-byte, or 16-byte boundaries) in order to generate more efficient code. The -O3 level enables more aggressive optimizations that can change how memory is accessed and managed, which can inadvertently affect DMA operations if the DMA setup code or the JPEG codec relies on behavior that is altered by these optimizations.
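For example, you can take the alignment of the DMA buffers out of the compiler's hands by forcing them onto a 32-byte boundary, the Cortex-M7 cache line size (a minimal sketch using GCC attribute syntax; the buffer names are just placeholders):

/* Force the DMA buffers onto 32-byte boundaries so their alignment
   no longer depends on the optimization level */
__attribute__((aligned(32))) static uint8_t jpegInputBuffer[240000];
__attribute__((aligned(32))) static uint8_t jpegOutputBuffer[240000];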
It can also be a cache coherency issue. The STM32H743 has a data cache that can affect DMA operations, because the DMA operates independently of the CPU and reads from or writes to main memory directly, bypassing the cache. If the cache is not properly managed, the CPU and the DMA may have different views of the data, leading to inconsistencies. Different optimization levels also change how often the CPU accesses memory and how it uses the cache: at lower optimization levels (-O0) the compiler may generate code that accesses memory more frequently, which can inadvertently keep the cache and memory in sync, while at higher optimization levels (-O3) it may reduce the number of reads and writes and rely more on the cache. This can lead to situations where the cache is not synchronized with memory when DMA operations occur.
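The usual pattern on the H7 is to clean the D-cache over the input buffer before starting the transfer and to invalidate it over the output buffer before the CPU reads the result. A minimal sketch using the CMSIS cache maintenance functions and the placeholder buffers from the snippet above (addresses and lengths should be 32-byte aligned):

#include "stm32h7xx_hal.h"

extern JPEG_HandleTypeDef hjpeg;   /* CubeMX-generated JPEG handle */

/* The 32-byte aligned placeholder buffers from the snippet above */
__attribute__((aligned(32))) static uint8_t jpegInputBuffer[240000];
__attribute__((aligned(32))) static uint8_t jpegOutputBuffer[240000];

void StartJpegEncodeDma(void)
{
  /* Write any cached input data back to SRAM before the DMA reads it */
  SCB_CleanDCache_by_Addr((uint32_t *)jpegInputBuffer, sizeof(jpegInputBuffer));

  HAL_JPEG_Encode_DMA(&hjpeg, jpegInputBuffer, sizeof(jpegInputBuffer),
                      jpegOutputBuffer, sizeof(jpegOutputBuffer));
}

void HAL_JPEG_EncodeCpltCallback(JPEG_HandleTypeDef *hjpeg)
{
  /* Drop stale cache lines so the CPU sees what the DMA actually wrote */
  SCB_InvalidateDCache_by_Addr((uint32_t *)jpegOutputBuffer, sizeof(jpegOutputBuffer));
}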
Another thing to look at is the interrupt handling. If the JPEG encoding process is interrupted, the way these interrupts are handled could be affected by the optimization level, which could mean that data gets written to the wrong place in memory.
2024-09-03 10:47 AM - edited 2024-09-03 10:47 AM
It seems I was using the JPEG codec wrong altogether. Before sending the ARGB data to the codec, you must first convert it to the YCbCr format and then organize it into MCUs (minimum coded units). This can be done using the jpeg_utils module provided by ST. Only then can the data be encoded.
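Roughly what that conversion step looks like, based on the JPEG encoding examples in the STM32Cube firmware package (a sketch from memory, so double-check the exact signatures in jpeg_utils.h; jpeg_utils_conf.h also has to be configured for ARGB8888 input, and the buffer names here are just placeholders):

#include "stm32h7xx_hal.h"
#include "jpeg_utils.h"    /* Utilities/JPEG from the STM32Cube package */

extern JPEG_HandleTypeDef hjpeg;   /* CubeMX-generated JPEG handle */

static JPEG_ConfTypeDef                  jpegConf;
static JPEG_RGBToYCbCr_Convert_Function  pRGBToYCbCrFunc;
static uint32_t                          totalMCUs;

void EncodeArgbImage(uint8_t *argbBuf, uint32_t argbSize,
                     uint8_t *mcuBuf, uint8_t *jpegOutBuf, uint32_t outSize)
{
  uint32_t convertedBytes = 0;

  jpegConf.ColorSpace        = JPEG_YCBCR_COLORSPACE;
  jpegConf.ChromaSubsampling = JPEG_420_SUBSAMPLING;
  jpegConf.ImageWidth        = 200;
  jpegConf.ImageHeight       = 300;
  jpegConf.ImageQuality      = 100;
  HAL_JPEG_ConfigEncoding(&hjpeg, &jpegConf);

  /* Build the RGB->YCbCr tables and pick the conversion function that
     matches the encoding parameters */
  JPEG_InitColorTables();
  JPEG_GetEncodeColorConvertFunc(&jpegConf, &pRGBToYCbCrFunc, &totalMCUs);

  /* Convert the whole ARGB8888 image into MCU-ordered YCbCr blocks
     (block index 0, all input bytes in one go) */
  pRGBToYCbCrFunc(argbBuf, mcuBuf, 0, argbSize, &convertedBytes);

  /* Only now is the data in the format the hardware codec expects */
  HAL_JPEG_Encode_DMA(&hjpeg, mcuBuf, convertedBytes, jpegOutBuf, outSize);
}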