2026-04-25 8:36 AM
Apologies in advance, this is a long one.
We are developing a camera application on an STM32H747. We have 32GB of SDRAM configured through FMC.
In summary: camera -> [A1,A2,A3] -> unpack and debayer -> [B] -> scale -> [C] -> DMA2D (combine with [D]) -> [E] -> LTDC+DSI processes and sends to display
This all works, mostly, except for a horizontal band of flickering pixels across the middle of the display which we call "disco". I created tools to pull data from any of the framebuffers and view it, so that we can figure out which link in the chain is causing the problem.
Here's where it gets weird. Disco appears in *every* framebuffer, [A1,2,3], [B], [C], and [E], which suggests that the camera is generating the issue -- which would be weird because this camera hardware+driver was proven out before. BUT, when I turn off the DMA2D->LTDC+DSI->display sequence, so that it is only [A1,2,3]->[B]->[C] as fast as possible, I see the problem *nowhere*. Disco is completely gone from all framebuffers [A],[B],[C], which suggests that the camera and unpacking/debayering/scaling are fine. Until I turn the display back on.
So the display processing is causing the issue, but the display processing doesn't even *touch* the upstream framebuffers [A] and [B] in which the problem appears. So... what's happening?
The only two theories I have are (1) hardware, the display wires are electrically affecting the camera wires or (2) the addition of the display processing prevents the camera from properly writing to SDRAM, maybe by overloading SDRAM? Note that caching is enabled on our SDRAM MPU region, and we are cleaning the cache with SCB_CleanDCache_by_Addr on the source buffer before [A] to [B] and [B] to [C] and before lighting off [C] -> DMA2D.
Additional observation: the "disco" starts to recede, then go away, and we display correctly if I slow the unpack/debayer/display sequence *way* down.
Is it possible for us to overload the SDRAM via reading with DMA2D, exceed its bandwidth or simply disrupt concurrent writes, to produce data corruption? How should we avoid this? Are there any other reasons this might be happening?
Thanks for any ideas -- this was a lot of work to have it glitch somewhere, feels kind of like a needle in a haystack.
2026-04-25 11:01 AM
I can confirm SDRAM and FMC can be a bottleneck. I worked with a similar dual-core H7 series part which had an LCD on a 16-bit LTDC interface and SDRAM for frame buffers. We had a lot of similar issues when trying to draw the backbuffer in SDRAM while LTDC was simultaneously streaming the frontbuffer from SDRAM. AN4861 from page 28 onward explains what's going on and where the limits are. Basically it's a challenge to put two trucks on a narrow two-way street. We suspected electronic glitches also, but couldn't find any. If you suspect that, then make a test procedure. Our final solution was to ditch SDRAM completely and use internal AXI SRAM for a single 8-bit buffer with a palette. The LTDC peripheral can turn an 8-bit palette into 16-bit color space on-the-fly. We lost some color accuracy and had to do tricks to time the frame updates onto blanking periods, but otherwise it worked very well.
For your case I have some ideas:
1. Review DMA transfer priorities. Obviously SDRAM to LTDC is the highest. SPI -> SDRAM could be next. Unpack and debayer is probably done in SW only so could be the last priority (if not using DMA, then it probably is anyway).
2. Review cache line and FMC memory alignments. LTDC prefers transfers to be 1KB aligned. But this only helps a little, if at all.
3. Reduce amount of read & write back operations from SDRAM. E.g. combine camera frame upscaling with unpack and de-bayer operations.
4. Move something into internal SRAM (you have combined 1MB of it). Maybe use it for packed camera frames?
5. If your LCD isn't 24-bit and is 16-bits, then maybe reduce camera frame color depth already at de-bayer phase to reduce size of transfers required? Unless you save camera image or do something else.
6. Not an idea, but a comment about using D-cache on frame buffers. LTDC doesn't care about it. DMA2D doesn't care about it. It only gives a benefit if you write the frame buffer with the CPU pixel-by-pixel, as it then flushes written data in larger chunks. But since the cache is so small compared to the frame buffer, I think you just waste the cache on that short-lived data. Rather not cache frame buffers at all and let frequently used RAM data live in cache. If you want to speed up CPU drawing operations, use a scratch area and transfer that with DMA2D.
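To illustrate idea 5, here is a minimal sketch of packing to RGB565 at the de-bayer output, plus a helper to compare frame sizes. The function names and the 480x272 figure in the note below are made up for illustration; they are not from the original poster's code.

```c
#include <stdint.h>

/* Pack 8-bit-per-channel RGB into one RGB565 pixel by keeping the
 * top 5, 6 and 5 bits of red, green and blue respectively. */
static inline uint16_t rgb888_to_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r & 0xF8u) << 8) | ((g & 0xFCu) << 3) | (b >> 3));
}

/* Bytes occupied by one frame at a given pixel size. */
static inline uint32_t frame_bytes(uint32_t width, uint32_t height,
                                   uint32_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}
```

For a hypothetical 480x272 frame, frame_bytes() gives 391680 bytes at 3 bytes per pixel and 261120 bytes at 2, so every later stage that reads or writes that buffer moves a third less data.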
2026-04-25 12:33 PM
Thank you Mikk! I will try each of your ideas. One big picture question, it comes up when I describe the problem to coworkers, and I don't have a good answer: this is a pretty powerful processor and a small-ish display... how are we running into throughput issues at all, SDRAM or not? Aside from using SDRAM, are we doing something wrong architecturally? I've been trying to design for speed all the way, although image quality is also important. I've looked through AN4861 (it's why we spliced in DMA2D, which solved some severe shifting), but maybe I need to read it again, I don't quite have the mental model yet as far as throughputs and capacities.
2026-04-25 2:58 PM
Well, there are two things - bandwidth and timing. The system may have the mathematical performance to do everything that you want over a certain time, but not at the same moment. AFAIK H7 has quite short FIFOs on SPI, LTDC and pretty much every peripheral. When LTDC is fetching its block of pixels through the bus matrix, DMA has lined up and finally SPI shows up with its few words, it then has to wait, and due to its tiny FIFO, an overrun is quick to arrive. To verify that, check the overrun flags. The "disco" should then mean that old pixels remain in the new frame.
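A rough way to put numbers on the "bandwidth" half of this is to add up the average traffic of each stream and compare it with the theoretical bus peak. The sketch below does only that; the struct, the function names and any numbers you feed it are assumptions for illustration, and the peak figure ignores refresh, row activation and read/write turnaround, so real usable bandwidth is lower.

```c
#include <stddef.h>
#include <stdint.h>

/* One stream of memory traffic: a full frame moved once per frame period. */
typedef struct {
    uint32_t width;
    uint32_t height;
    uint32_t bytes_per_pixel;
    uint32_t frames_per_second;
} stream_t;

/* Average bytes per second moved by one stream. */
static uint64_t stream_bytes_per_second(const stream_t *s)
{
    return (uint64_t)s->width * s->height * s->bytes_per_pixel
           * s->frames_per_second;
}

/* Sum of the average traffic of all streams (reads and writes alike). */
static uint64_t total_bytes_per_second(const stream_t *streams, size_t count)
{
    uint64_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += stream_bytes_per_second(&streams[i]);
    }
    return total;
}

/* Theoretical bus peak: one transfer of bus_width_bits per clock.
 * Ignores refresh, row activation and bus turnaround overheads. */
static uint64_t bus_peak_bytes_per_second(uint32_t bus_width_bits,
                                          uint32_t clock_hz)
{
    return (uint64_t)(bus_width_bits / 8u) * clock_hz;
}
```

Count every read and every write of each buffer as its own stream. If the total comes out well below the peak and corruption still appears, that points toward the "timing" half: short bursts where several masters want the bus at the same moment.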
I'll add one more idea: try to orchestrate the timing of operations. Use LTDC line interrupt to start camera image processing after last visible line. That's when LTDC is releasing SDRAM and just sending blanking lines - I guess there are some amount of them?
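To see how much time the blanking lines actually give you, the window can be estimated from the panel timings. This is only a sketch: the struct and field names are invented here, the values must come from your own panel configuration, and it assumes a continuously refreshed (video-mode style) display where LTDC fetches pixels only during active lines.

```c
#include <stdint.h>

/* Panel timing parameters; names are illustrative only. */
typedef struct {
    uint32_t hsync, hbp, active_w, hfp; /* horizontal, in pixel clocks */
    uint32_t vsync, vbp, active_h, vfp; /* vertical, in lines */
    uint32_t pixel_clock_hz;
} panel_timing_t;

/* Duration of one full line (sync + porches + active), in nanoseconds. */
static uint64_t line_time_ns(const panel_timing_t *t)
{
    uint64_t clocks = (uint64_t)t->hsync + t->hbp + t->active_w + t->hfp;
    return clocks * 1000000000ull / t->pixel_clock_hz;
}

/* Time per frame spent in vertical blanking lines, in nanoseconds. */
static uint64_t vblank_time_ns(const panel_timing_t *t)
{
    uint64_t lines = (uint64_t)t->vsync + t->vbp + t->vfp;
    return lines * line_time_ns(t);
}
```

If the window is short compared with the time one processing pass takes, only the most bus-sensitive step (for example the camera transfer) is worth aligning to it.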
2026-04-28 12:10 AM
As an update... I looked at the priorities and they were already very high, checked alignments and improved in a couple of places. I tried turning off caching and that slowed my CPU based debayering *way* down so that was a no-go.
After a bunch of experiments turning things on and off to see what seemed to reduce load enough (it was perhaps camera SPI/DMA and display DMA2D that didn't like each other), I was able to *just barely* squeeze two raw camera buffers into RAM, and then went from a three buffer system to a two buffer system. So far that seems to be working (knock on wood, need to integrate this into production and see what happens).
Thanks for the ideas Mikk! Thanks to you we treated it as an SDRAM bottleneck issue right away, and pulled the trigger on moving things to RAM.
Still a little weirded out by this issue though. During all of my experiments, the rhyme and reason of what was causing the most load and which reads/writes would disrupt each other and which wouldn't, never really made sense. I don't have a better approach for next time as far as predicting this problem (or if this problem will recur here), I'll just test for this early/often I guess.
2026-04-28 1:58 AM - edited 2026-04-28 2:06 AM
Hi @Chris Rice . I'm glad you found some ways to improve it.
I need to correct my comment about caching. Saying not to use cache at all wasn't accurate. For de-bayering, areas [A1-A3] surely need to be cached for read operations, because the de-bayering algorithm that you probably run in software reads each mosaiced pixel multiple times. You wrote that you clean (flush) the data cache and that was what I wanted to comment on. Cleaning makes sense when you write into SDRAM with the CPU, because cleaning transfers data that was first written into cache from the cache into SDRAM. So I thought maybe you write de-bayered pixels one-by-one and want to avoid expensive SDRAM accesses of a couple of bytes and let cleaning do it in bigger chunks. Thus cache gives a boost, but maybe there's a better use of cache elsewhere: instead of writing through cache into SDRAM, write those pixels into a scratch area (small 2D area, e.g. single 460 pixel line) and let DMA2D transfer it into SDRAM. An alternative is to configure the [B] and [C] areas with the MPU to be write-through read-allocate.
Another issue that I spotted right now - you wrote that you clean cache "on the source buffer before [A] to [B] and [B] to [C]". This is a bit weird. Clean makes sense after write by CPU, invalidate makes sense before read with CPU. There is also clean & invalidate function that does both, but it only makes sense when frame buffer is modified....
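As a sketch of that rule: clean before a DMA master reads what the CPU wrote, invalidate before the CPU reads what a DMA master wrote. The helper names below are made up; only SCB_CleanDCache_by_Addr and SCB_InvalidateDCache_by_Addr are real CMSIS calls, and that part compiles only when the CMSIS device header is included before this code. The range helper rounds a buffer out to whole 32-byte cache lines (the Cortex-M7 D-cache line size).

```c
#include <stddef.h>
#include <stdint.h>

#define DCACHE_LINE_SIZE 32u /* Cortex-M7 D-cache line size in bytes */

typedef struct {
    uintptr_t start; /* line-aligned start address */
    size_t size;     /* size in bytes, multiple of the line size */
} cache_range_t;

/* Expand [addr, addr + len) outward to whole cache lines. */
static cache_range_t cache_align_range(uintptr_t addr, size_t len)
{
    uintptr_t mask = ~(uintptr_t)(DCACHE_LINE_SIZE - 1u);
    uintptr_t start = addr & mask;
    uintptr_t end = (addr + len + DCACHE_LINE_SIZE - 1u) & mask;
    cache_range_t r = { start, (size_t)(end - start) };
    return r;
}

#if defined(__DCACHE_PRESENT) && (__DCACHE_PRESENT == 1U)
/* CPU wrote buf and a DMA master will read it: push dirty lines to memory. */
static void cache_before_dma_reads(void *buf, size_t len)
{
    cache_range_t r = cache_align_range((uintptr_t)buf, len);
    SCB_CleanDCache_by_Addr((uint32_t *)r.start, (int32_t)r.size);
}

/* A DMA master wrote buf and the CPU will read it: drop stale lines. */
static void cache_before_cpu_reads(void *buf, size_t len)
{
    cache_range_t r = cache_align_range((uintptr_t)buf, len);
    SCB_InvalidateDCache_by_Addr((uint32_t *)r.start, (int32_t)r.size);
}
#endif
```

Invalidating a line that is shared with unrelated data discards any unwritten changes to that data, so it is safest to keep frame buffers aligned to, and sized in multiples of, 32 bytes.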
2026-04-28 8:30 AM
Good catch on the clean vs invalidate, writing that down I also realized I had the clean where I should have had an invalidate, and fixed it -- should have updated my question. Was hopeful that would fix the problem but it didn't have a noticeable effect. Moving things to RAM seems to have done the trick.