2026-01-28 6:47 PM - edited 2026-01-28 7:00 PM
Product:
MCU/SoC: STM32MP2
Description:
In our ST Linux environment, the hantro-vpu driver registers:
/dev/video5 -> VDEC (H.264/VP8 Decoder)
/dev/video6 -> VENC (H.264/VP8 Encoder)
We want to implement direct V4L2 API usage for H.264/VP8 encoding and decoding, without using GStreamer.
Specific Questions:
2026-03-24 1:21 AM
Hi @SullyNiu ,
STMicroelectronics does not deliver any official example or demo that uses the pure V4L2 API for H.264/VP8 encoding or decoding.
The wiki page https://wiki.st.com/stm32mpu/wiki/V4L2_video_codec_overview only points, in its chapter “2.2. APIs description”, to the official Video for Linux API documentation, in particular https://www.kernel.org/doc/html/v6.6/userspace-api/media/v4l/v4l2.html, which contains some examples of API usage.
https://www.kernel.org/doc/html/v6.6/userspace-api/media/v4l/dev-stateless-decoder.html is also relevant, but I assume you already know this documentation.
I hope this information helps.
Regards,
JC.
2026-05-06 5:47 AM
Hi Sully,
The answer depends a lot on the camera or video source you are using, especially the format and resolution it outputs.
On STM32MP2, the Hantro VPU is exposed through V4L2 mem2mem devices, so direct V4L2 usage is possible in principle. However, for H.264 and VP8 decoding, one important limitation must be kept in mind: the VDEC side supports H.264/VP8 up to around 1080p, more precisely 1920x1088.
So if the camera outputs a stream beyond this limit, or a portrait-oriented stream such as 720x1280, it may not be accepted by the H.264/VP8 decoder even if the total number of pixels looks reasonable.
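As a rough illustration of why a portrait stream can be rejected, here is a minimal C sketch of a pre-check. It assumes the limit applies to width and height independently, using the 1920x1088 figure quoted above; the names fits_vdec_limit, VDEC_MAX_WIDTH and VDEC_MAX_HEIGHT are made up for this example. The driver on the target remains the authority on what is really accepted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed limits, taken from the 1920x1088 figure quoted in this thread.
 * The real limits must be read from the driver on the target BSP. */
#define VDEC_MAX_WIDTH  1920u
#define VDEC_MAX_HEIGHT 1088u

/* Returns true when both dimensions fit the assumed decoder window.
 * Width and height are checked separately, so 720x1280 fails on height
 * even though it has fewer pixels than 1920x1080. */
static bool fits_vdec_limit(uint32_t width, uint32_t height)
{
    return width > 0 && height > 0 &&
           width <= VDEC_MAX_WIDTH && height <= VDEC_MAX_HEIGHT;
}
```

With these assumed limits, fits_vdec_limit(1920, 1080) is true and fits_vdec_limit(720, 1280) is false.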
For camera-based use cases, I would first check what the camera can output natively:
1. Raw YUV, for example NV12/YUYV
2. MJPEG/JPEG
3. H.264
4. VP8
If the goal is to go beyond the H.264/VP8 decode limit, the JPEG/MJPEG path is worth checking. The Hantro VPU supports JPEG/MJPEG at higher resolutions, up to 4K depending on the exact format and BSP capabilities. So the best architecture depends on the source format provided by the camera, not only on the VPU block itself.
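For pure V4L2 usage, I would recommend starting with capability checks on the target, replacing /dev/videox with the actual node (for example /dev/video5 for VDEC or /dev/video6 for VENC in the setup described above):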
v4l2-ctl -d /dev/videox -D
v4l2-ctl -d /dev/videox --list-formats-out-ext
v4l2-ctl -d /dev/videox --list-formats-ext
Regarding DMA, the recommended approach is to avoid userspace copies and use V4L2 streaming buffers, ideally DMABUF when the full pipeline supports it. In practice, this means validating that the producer and consumer devices support DMABUF import/export, using V4L2_MEMORY_DMABUF where applicable, and keeping compatible formats such as NV12 or other supported GPU/display formats when possible.
The main constraints to check are:
1. Camera output format and resolution
2. H.264/VP8 decoder limit on VDEC
3. JPEG/MJPEG capability if higher resolution is required
4. Pixel format compatibility between camera, VPU, display, or further processing
5. DMABUF support across the complete pipeline, not only on the VPU node
There is also an interesting related discussion here:
It is not exactly the same topic as Hantro VPU encoding/decoding, but it gives useful feedback for camera preview optimization on STM32MP2. In that case, the working solution was not based on GStreamer, but on a direct zero-copy path:
libcamera -> DMA-BUF FD -> EGLImage using EGL_LINUX_DMA_BUF_EXT -> GL_TEXTURE_EXTERNAL_OES -> Qt/OpenGL rendering
The important points from that feedback are:
1. Frames are not copied back to CPU
2. The DMA-BUF FD is retrieved from libcamera requestCompleted()
3. The FD is passed to the UI thread
4. EGLImage is created once per FD and reused
5. glEGLImageTargetTexture2DOES is used to bind the image to an OpenGL texture
However, this also has constraints:
1. The format must be compatible with GPU/EGL
2. EGL/OpenGL work must be done on the UI thread
3. EGLImage creation should be cached per FD to avoid overhead
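The "create once per FD and reuse" point can be sketched in plain C as a small lookup table. This is only an illustration under assumptions: image_handle stands in for EGLImageKHR, create_fn is a hypothetical wrapper around the real eglCreateImageKHR call with EGL_LINUX_DMA_BUF_EXT, and CACHE_SLOTS is an arbitrary bound. It also assumes the FDs stay open for the whole stream, since a closed FD number can be reused by the kernel for a different buffer.

```c
#include <stddef.h>

#define CACHE_SLOTS 16 /* assumed upper bound on distinct buffers */

typedef void *image_handle;                        /* stands in for EGLImageKHR */
typedef image_handle (*create_fn)(int dmabuf_fd);  /* hypothetical EGL wrapper */

struct image_cache {
    int fds[CACHE_SLOTS];
    image_handle images[CACHE_SLOTS];
    size_t used;
};

/* Return the image for `fd`, creating it only the first time the FD is
 * seen. Returns NULL when creation fails or the cache is full. */
static image_handle cache_get(struct image_cache *c, int fd, create_fn create)
{
    image_handle img;

    for (size_t i = 0; i < c->used; i++)
        if (c->fds[i] == fd)
            return c->images[i];   /* hit: no new EGLImage needed */
    if (c->used == CACHE_SLOTS)
        return NULL;
    img = create(fd);              /* miss: create once, then remember */
    if (img == NULL)
        return NULL;
    c->fds[c->used] = fd;
    c->images[c->used] = img;
    c->used++;
    return img;
}

/* Stand-in creator used only to demonstrate the cache; real code would
 * call eglCreateImageKHR here, on the thread that owns the EGL context. */
static int fake_create_calls;
static image_handle fake_create(int dmabuf_fd)
{
    (void)dmabuf_fd;
    fake_create_calls++;
    return &fake_create_calls;
}
```

With a zero-initialized struct image_cache, calling cache_get twice with the same FD invokes the creator only once; a second FD triggers a second creation.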
So I would not assume that a stream is supported only because the V4L2 device node exists. The safest path is to query the exact V4L2 capabilities on the target BSP, check the camera output modes, and then decide whether the best path is H.264/VP8 through Hantro, MJPEG/JPEG through Hantro, raw DMA-BUF to GPU/display, or a mixed pipeline.