2026-02-03 2:16 AM
Dear all,
We are using the STM32N6570-DK to encode a video stream (800x600px) and store that at a reasonable data rate to an external sd-card.
I have extended the VENC_SDCARD_ThreadX example to a certain extent to cover the obvious timing differences between the video capture loop and a disk write loop. The sample code is a great way to go forward, but at the moment we reach a data write rate of about 25kb/s when using FileX.
We initially used the HAL implementation, but I have changed the fx_stm32_sd_driver_glue.c code slightly to use the BSP layer for card speed negotiation, which is not present in the HAL layer; debug output shows that we're at least using 4-bit wide data transfers:
CardInfo type 1, version 1, class 1461, spd 0
Inst PWR 00000013 CLKCR 00004004
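For readers who want to interpret the CLKCR value in the debug output, here is a minimal decoding sketch. It assumes the N6 SDMMC uses the same SDMMC_CLKCR field layout as the STM32H7-style SDMMC peripheral (CLKDIV in bits 9:0, WIDBUS in bits 15:14, DDR in bit 18, BUSSPEED in bit 19); that layout and the helper names are assumptions and should be checked against the N6 reference manual.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of SDMMC_CLKCR under the ASSUMED H7-style layout (verify for the N6). */
typedef struct {
    uint32_t clkdiv;    /* bits 9:0   clock divide factor                     */
    uint32_t widbus;    /* bits 15:14 0 = 1-bit, 1 = 4-bit, 2 = 8-bit         */
    uint32_t ddr;       /* bit 18     1 = double data rate                    */
    uint32_t busspeed;  /* bit 19     1 = high-speed modes (SDR50 and up)     */
} clkcr_fields_t;

/* Hypothetical helper: split a raw register value into its fields. */
static clkcr_fields_t decode_clkcr(uint32_t clkcr)
{
    clkcr_fields_t f;
    f.clkdiv   = clkcr & 0x3FFu;
    f.widbus   = (clkcr >> 14) & 0x3u;
    f.ddr      = (clkcr >> 18) & 0x1u;
    f.busspeed = (clkcr >> 19) & 0x1u;
    return f;
}

/* Under the same assumption, the card clock is the kernel clock divided
 * by 2 * CLKDIV, with CLKDIV == 0 meaning "no division". */
static uint32_t clkcr_effective_divider(uint32_t clkdiv)
{
    assert(clkdiv <= 0x3FFu);
    return (clkdiv == 0u) ? 1u : 2u * clkdiv;
}
```

If the layout assumption holds, decode_clkcr(0x00004004) gives WIDBUS = 1 (4-bit, matching the observation above), CLKDIV = 4 (kernel clock divided by 8), and DDR and BUSSPEED both 0, so the resulting card clock depends on the kernel clock configured for the SDMMC instance.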
It seems that just before reaching down to the HAL layer, the single transaction (typically ~200 blocks at once) is split into single-sector writes in the sd_write_data function (fx_stm32_sd_driver.c).
Is there a reference implementation showing how this can be set up in the right way?
Being in the very early project stages, I can substitute FileX with FatFs or jump to Zephyr if this can be done with reasonable effort.
Btw, the current state is here: https://github.com/svogl/venc-sdcard-threadx/tree/feature/filex-bsp-integration
Some implementation notes:
The BSP layer (stm32n6570_discovery_sd.c) seems to be lacking an implementation of the 1V8 voltage switch present on the board (i.e. HAL_SD_DriveTransceiver_1_8V_Callback); I have implemented this in fx_stm32_sd_driver_glue.c
Thanks for your help with this great board,
Simon
2026-02-06 2:29 AM
Responding to myself,
As a counter-proof and workaround, we have implemented a memory FIFO buffer that accumulates subsequent (file-wise unaligned) writes and triggers DMA-able requests. This effectively eliminates the bottleneck. From a low-level OS developer's view, I would have expected this to be present in the driver stack.
As we do this for fun AND business I am happy to share the code but would like to have a solution that works for the rest of the developer community.
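For readers following along, the general idea of such an accumulating buffer can be sketched as below. This is not the code referred to above, only a minimal illustration under stated assumptions: the names (write_coalescer_t, wc_write, wc_flush, the sink callback) and the 32 KB chunk size are hypothetical, the sink is assumed to be synchronous (it has finished with the buffer when it returns), and in a real application the sink would wrap whatever file or block write call is in use.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WC_SECTOR_SIZE   512u
#define WC_CHUNK_SECTORS 64u   /* 32 KB per flush; an assumed, tunable value */
#define WC_CHUNK_BYTES   (WC_SECTOR_SIZE * WC_CHUNK_SECTORS)

/* Hypothetical sink: receives one staged chunk, returns 0 on success. */
typedef int (*wc_sink_t)(const uint8_t *data, size_t len, void *ctx);

typedef struct {
    alignas(32) uint8_t buf[WC_CHUNK_BYTES]; /* aligned staging area for DMA */
    size_t    fill;                          /* bytes currently staged       */
    wc_sink_t sink;
    void     *ctx;
} write_coalescer_t;

static void wc_init(write_coalescer_t *wc, wc_sink_t sink, void *ctx)
{
    wc->fill = 0;
    wc->sink = sink;
    wc->ctx  = ctx;
}

/* Append arbitrary-sized data; hand a full chunk to the sink whenever
 * the staging buffer fills up. */
static int wc_write(write_coalescer_t *wc, const void *data, size_t len)
{
    const uint8_t *src = (const uint8_t *)data;
    assert(wc->sink != NULL);
    while (len > 0) {
        size_t room = WC_CHUNK_BYTES - wc->fill;
        size_t n    = (len < room) ? len : room;
        memcpy(wc->buf + wc->fill, src, n);
        wc->fill += n;
        src      += n;
        len      -= n;
        if (wc->fill == WC_CHUNK_BYTES) {
            int rc = wc->sink(wc->buf, WC_CHUNK_BYTES, wc->ctx);
            if (rc != 0) {
                return rc;
            }
            wc->fill = 0;
        }
    }
    return 0;
}

/* Push out whatever is left (for example before closing the file). */
static int wc_flush(write_coalescer_t *wc)
{
    if (wc->fill == 0) {
        return 0;
    }
    int rc = wc->sink(wc->buf, wc->fill, wc->ctx);
    if (rc == 0) {
        wc->fill = 0;
    }
    return rc;
}

/* Example sink for host-side testing: only counts calls and bytes. */
typedef struct { size_t calls; size_t bytes; } wc_stats_t;

static int wc_counting_sink(const uint8_t *data, size_t len, void *ctx)
{
    (void)data;
    wc_stats_t *s = (wc_stats_t *)ctx;
    s->calls++;
    s->bytes += len;
    return 0;
}
```

With a 45-byte header followed by sixteen 4096-byte packets, the sink sees two full 32 KB chunks and, after wc_flush, one final 45-byte remainder, instead of many small or unaligned requests. A single staging buffer blocks the producer while the sink runs; a double-buffered variant would be needed to overlap capture and disk writes.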
Simon
2026-02-04 1:37 AM - edited 2026-02-04 1:38 AM
Hi,
to get useful speed for video data, the SD card should be written in 16 KB ... 64 KB blocks.
Basically every disk/file "handling system" should do this... so if you have FileX working now,
look at the settings (in Cube) first, maybe there is something that sets the max. block size to 512;
check and change the cache settings... give it BIG buffers. Try... generate code and check;
if it is still doing single-block writes, why not change this yourself... if it's working fine then.
If you prefer to move to FATFS or Zephyr...i cannot promise it will have instantly big block writes, never tried.
Or on FileX : ->
2026-02-04 5:21 AM
Hi, thanks for coming back to this on short notice. We have been working onward from the example code; there is no CubeMX support file, but I think we have enabled the right settings in the code.
If you could have a glimpse at the stack trace, you can see that the application code sends 100-300 sectors in one go (160 in this example) that are passed up through the filex stack as expected.
It is the stm-provided fx_stm32_sd_driver.c file that splits it into single-buffer writes in a loop, which is not what it should be doing. Could you point us to the latest version of that file?
Also, for the N6, I don't see a specific ThreadX support package. Which of the available ones would be the one to use?
Thanks a lot
Simon
2026-02-04 9:37 AM - edited 2026-02-04 10:50 AM
Hi,
as the N6xx is an M55 core - this should be the package you need.
(Don't see it in your Cube pic... so look on git -> M55)
https://github.com/STMicroelectronics/stm32-mw-threadx
+
see example with filex -> on N6570-DK
https://github.com/STMicroelectronics/STM32CubeN6
-- but look: is there multi/block write - or not.
+
I looked at my H743 project, which uses FATFS: same as you found, multi-sector write is called, but then 1-sector writes are called:
disk_write(fs->drv, fs->win, wsect, 1);
BUT I found multi-sector read + write in ff.c (cc is the sector count):
if (disk_write(fs->drv, wbuff, sect, cc) != RES_OK) ABORT(fs, FR_DISK_ERR);
Btw, I am only doing fast reads; that's working, as fast as it should, so I never looked before at how it is doing writes...
So if you don't need an RTOS, I would try just using FATFS.
+
Ai -> its possible:
/* USER CODE BEGIN PV */
FX_FILE my_file;
uint8_t write_buffer[512 * 4]; // Buffer for 4 sectors (2KB)
ULONG bytes_written;
/* USER CODE END PV */

// ... Inside a thread ...

// 1. Open file
fx_file_create(&sdio_disk, "multi_sec.bin");
fx_file_open(&sdio_disk, &my_file, "multi_sec.bin", FX_OPEN_FOR_WRITE);

// 2. Perform multi-sector write
fx_file_write(&my_file, write_buffer, sizeof(write_buffer), &bytes_written);

// 3. Close file
fx_file_close(&my_file);
So seems the fast multi-sector write is only done with DMA (and some extra settings "...flush" ).
2026-02-04 11:14 AM
Well, I did speed tests with a bare-metal + FatFs implementation earlier on; I got results in the range of 500 - 700 kb/s, still a factor of 10 away from what would be expected/needed.
As I see it, the single-block writes to the sdcard keep the card controller busy & block the overall write performance. Who wrote the original stm driver code - maybe he has an idea?
Thanks for looking into this,
Simon
2026-02-04 11:37 AM
Hmm...what is your "700 kb/s" ? kbit/s ?
I only do reads; I tested that as well and got about 16 MB/s (Mbyte/s), no DMA used. (Didn't get DMA working :( )
SD unit at 100MHz clock (div 1 set), 4 bit mode. (Otherwise no hi speed anyway.)
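As a rough sanity check for numbers like these, the theoretical peak of the SD data bus can be computed from clock and bus width. The sketch below uses a hypothetical helper name and gives only an upper bound: it ignores command overhead, CRC and start/stop bits, and the card's internal busy time during writes.

```c
#include <assert.h>
#include <stdint.h>

/* Theoretical peak payload rate of the SD data lines in bytes per second.
 * Hypothetical helper; an upper bound, not an achievable throughput. */
static uint64_t sd_bus_peak_bytes_per_s(uint32_t clock_hz,
                                        uint32_t width_bits,
                                        int ddr)
{
    assert(width_bits == 1u || width_bits == 4u || width_bits == 8u);
    /* One bit per data line per clock edge used (two edges in DDR mode). */
    uint64_t bits_per_s = (uint64_t)clock_hz * width_bits * (ddr ? 2u : 1u);
    return bits_per_s / 8u;
}
```

For a 100 MHz clock in 4-bit single-data-rate mode this gives 50,000,000 bytes/s, so a measured 16 MB/s read rate is roughly a third of the bus limit.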
What did you set/use?
2026-02-05 2:35 AM
Hi AScha.3,
Read is easy - writing is a fundamentally different operation as the SDCard controller needs to write to flash blocks.
I have isolated the behaviour in a minimally changed example code based on Fx_uSD_File_Edit, plz find the code here (CubeMX generated, no other edits except starting the filex demo code):
https://github.com/svogl/stm32-fx-usd-file-edit
The code replicates the VENC behavior: write an mp4 header first (45 bytes), then write video packets to the file; see app_filex.c lines 290-294 (the buffers are DMA-aligned, around line 78).
put a breakpoint at fx_stm32_sd_write_blocks, you can see single-block writes all over; commenting out the header part gives one big DMA write as expected.
The file system code is fed with DMA-aligned buffers; I would have expected the filesystem to copy into its internal sector cache and propagate that to the write function in blocks as big as possible. Apparently this is not happening.
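The effect of the 45-byte header on sector alignment can be worked out with a little arithmetic. The sketch below only describes the geometry of a write relative to 512-byte sector boundaries; the type and function names are hypothetical, and it makes no claim about how FileX or the ST driver actually schedules the resulting pieces.

```c
#include <assert.h>
#include <stdint.h>

#define SPLIT_SECTOR_SIZE 512u

/* How a write of `len` bytes at `file_offset` falls onto sector boundaries. */
typedef struct {
    uint32_t head_bytes;    /* bytes needed to finish a partly filled sector */
    uint32_t full_sectors;  /* whole sectors that could go out in one burst  */
    uint32_t tail_bytes;    /* bytes left over in a final partial sector     */
} write_split_t;

static write_split_t split_write(uint64_t file_offset, uint32_t len)
{
    write_split_t s = {0u, 0u, 0u};
    uint32_t in_sector = (uint32_t)(file_offset % SPLIT_SECTOR_SIZE);

    if (in_sector != 0u) {
        uint32_t to_boundary = SPLIT_SECTOR_SIZE - in_sector;
        s.head_bytes = (len < to_boundary) ? len : to_boundary;
        len -= s.head_bytes;
    }
    s.full_sectors = len / SPLIT_SECTOR_SIZE;
    s.tail_bytes   = len % SPLIT_SECTOR_SIZE;
    return s;
}
```

A 160-sector packet (81920 bytes) written at offset 0 is exactly 160 full sectors. Written at offset 45, the same packet becomes a 467-byte head, 159 full sectors, and a 45-byte tail. The full-sector part then starts 467 bytes into the caller's buffer, which is no longer 4-byte aligned even if the buffer itself is; this may be one reason a driver falls back to a bounce buffer and single-sector writes, though that is an assumption and not confirmed here.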
Simon