2024-07-18 03:45 AM - edited 2024-07-18 08:09 AM
Hi everyone,
I am observing some data corruption in an external SDRAM connected to an STM32 F7 FMC peripheral and I am trying to understand what is happening and how to prevent it.
I forgot to mention in the initial description above that if the "SDRAM write 1" is removed, then the data read back matches the "SDRAM write 2".
Kind regards
2024-07-18 04:06 AM
The write buffers aren't that deep.
The cache would be the thing implementing write-back vs. write-through; with write-back, the data reaches memory when the cache line is evicted.
2024-07-18 04:29 AM
Hi Tesla,
sorry for my ignorance.
If I understand you correctly, my theory is wrong: the issue I am facing is not related to concurrency/timing of accesses from the FMC to the external SDRAM, but to caching. And if the cache is set to write-through, the issue should not be observable.
Could you share how we control FMC caching on the STM32 F7 MCU?
How do we check whether write-through or write-back is set, and how do we change it to write-through?
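On Cortex-M7 parts such as the STM32F7, the cache policy of an address range is normally selected through an MPU region's TEX/C/B attribute bits in MPU_RASR. The sketch below only computes the RASR value and does not touch hardware. The field positions and the TEX/C/B encodings follow the ARMv7-M architecture. The enum and helper names, and the choice of a full-access, executable, non-shareable region, are assumptions for illustration. On target, the value would be written to MPU_RASR with MPU_RBAR holding the region base (0xC0000000 is the default for FMC SDRAM bank 1), or passed to CMSIS's ARM_MPU_SetRegion().

```c
#include <stdint.h>

/* ARMv7-M MPU_RASR bit positions (Cortex-M7). */
#define RASR_XN_POS      28u
#define RASR_AP_POS      24u
#define RASR_TEX_POS     19u
#define RASR_S_POS       18u
#define RASR_C_POS       17u
#define RASR_B_POS       16u
#define RASR_SIZE_POS     1u
#define RASR_ENABLE_POS   0u

/* Hypothetical names for the three policies discussed in the thread. */
typedef enum {
    POLICY_WRITE_THROUGH,  /* Normal memory, TEX=000 C=1 B=0, no write-allocate */
    POLICY_WRITE_BACK,     /* Normal memory, TEX=001 C=1 B=1, read/write-allocate */
    POLICY_NON_CACHEABLE   /* Normal memory, TEX=001 C=0 B=0 */
} cache_policy_t;

/* The region size is encoded as 2^(SIZE+1) bytes.
 * Returns -1 if bytes is not a power of two of at least 32. */
static int rasr_size_field(uint32_t bytes)
{
    if (bytes < 32u || (bytes & (bytes - 1u)) != 0u) {
        return -1;
    }
    int log2 = 0;
    while ((1u << log2) != bytes) {
        log2++;
    }
    return log2 - 1;
}

/* Build an MPU_RASR value for a full-access, executable, non-shareable
 * region. Returns 0 (region disabled) if the size cannot be encoded. */
static uint32_t make_rasr(uint32_t bytes, cache_policy_t policy)
{
    uint32_t tex, c, b;
    switch (policy) {
    case POLICY_WRITE_THROUGH: tex = 0u; c = 1u; b = 0u; break;
    case POLICY_WRITE_BACK:    tex = 1u; c = 1u; b = 1u; break;
    default:                   tex = 1u; c = 0u; b = 0u; break;
    }

    int size = rasr_size_field(bytes);
    if (size < 0) {
        return 0u;
    }

    return (0u << RASR_XN_POS)              /* instruction fetches allowed */
         | (3u << RASR_AP_POS)              /* AP=011: full read/write access */
         | (tex << RASR_TEX_POS)
         | (0u << RASR_S_POS)               /* not shareable */
         | (c << RASR_C_POS)
         | (b << RASR_B_POS)
         | ((uint32_t)size << RASR_SIZE_POS)
         | (1u << RASR_ENABLE_POS);
}
```

For example, make_rasr(8u * 1024u * 1024u, POLICY_WRITE_THROUGH) yields 0x0302002D for an 8 MB window; switching to POLICY_WRITE_BACK only changes the TEX/C/B bits (0x030B002D).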
2024-07-18 06:04 AM
Is this repeatable? Is the same memory region 0xFF after reading it back? If not, I would expect this is a hardware or signal integrity issue. Is this a custom board? It doesn't look like a cache issue, since it appears in the middle of a large buffer.
2024-07-18 08:08 AM - edited 2024-07-18 08:10 AM
Hi TDK,
Yes. The behavior is repeatable. When reading back the buffer, the 0xFF bytes always appear at the same position.
Yes. The board is a custom one.
In an attempt to check whether it could be a cache issue, I configured the FMC/SDRAM MPU region as non-cacheable, but the behavior remained the same. (For context, the application is based on Zephyr RTOS, and I changed this setting in Zephyr's devicetree: "zephyr,memory-attr = <( DT_MEM_ARM_MPU_RAM )>;" became "zephyr,memory-attr = <( DT_MEM_ARM_MPU_RAM_NOCACHE)>;". I am not yet familiar with the kernel internals, so I could not confirm that this change did what I intended.) So either this change did not do what I intended, or this really is not a cache issue.
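One way to confirm what the devicetree change actually produced is to halt in the debugger and, for each MPU region, write the region number to MPU_RNR (0xE000ED98), read MPU_RBAR (0xE000ED9C) to find the region covering the SDRAM, then read its MPU_RASR (0xE000EDA0) and decode the attribute bits. Below is a minimal host-side decoder, assuming the standard ARMv7-M encodings for Normal memory. The enum and function names are hypothetical, and any encoding outside the three listed (device, strongly-ordered, or the TEX=1xx outer/inner policy forms) is simply reported as "other".

```c
#include <stdint.h>

/* Hypothetical classification of the TEX/C/B bits of an MPU_RASR value. */
typedef enum {
    ATTR_DISABLED,       /* ENABLE bit clear: region not in use */
    ATTR_WRITE_THROUGH,  /* TEX=000 C=1 B=0 */
    ATTR_WRITE_BACK,     /* TEX=000 or 001, C=1 B=1 */
    ATTR_NON_CACHEABLE,  /* TEX=001 C=0 B=0 */
    ATTR_OTHER           /* device, strongly-ordered, or TEX=1xx encodings */
} rasr_attr_t;

static rasr_attr_t decode_rasr(uint32_t rasr)
{
    if ((rasr & 1u) == 0u) {
        return ATTR_DISABLED;
    }
    uint32_t tex = (rasr >> 19) & 0x7u;
    uint32_t c   = (rasr >> 17) & 0x1u;
    uint32_t b   = (rasr >> 16) & 0x1u;

    if (tex == 0u && c == 1u && b == 0u) {
        return ATTR_WRITE_THROUGH;
    }
    if ((tex == 0u || tex == 1u) && c == 1u && b == 1u) {
        return ATTR_WRITE_BACK;
    }
    if (tex == 1u && c == 0u && b == 0u) {
        return ATTR_NON_CACHEABLE;
    }
    return ATTR_OTHER;
}

/* Region size in bytes from the SIZE field: 2^(SIZE+1). */
static uint64_t rasr_region_bytes(uint32_t rasr)
{
    uint32_t size = (rasr >> 1) & 0x1Fu;
    return (uint64_t)1u << (size + 1u);
}
```

If the region covering the SDRAM still decodes as write-back after the devicetree change, the attribute did not take effect; if it decodes as non-cacheable and the corruption persists, caching can be ruled out.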
One, perhaps important, detail that I forgot to mention in my initial post (apologies for that) is that if the "SDRAM write 1" is removed, then the data read back matches the "SDRAM write 2".
Have you got any suggestions on how I could continue this investigation?
2024-07-18 11:36 AM - edited 2024-07-18 11:37 AM
Are you sure it's not a code bug? The weird behavior starts after value 0x80, which is a nice round number in binary. Perhaps show a complete program that exhibits the problem. I'm not sure how the OS would be involved in an SDRAM write; surely that wouldn't be buffered by the OS.
What effect does changing the starting address by 4 bytes have?
There is also the confounding factor that your screenshot shows "1 messages dropped". It's unclear what that means. It would be best to examine the memory directly using the debugger.
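A common way to separate an addressing problem from a data-line or signal-integrity problem is an "address-in-address" pass: write each word's own index into it, then read everything back. If two offsets alias to the same physical cell, the first mismatching word comes back holding the index of the word that overwrote it, which also reveals the aliasing stride. The sketch below runs against a plain array so it can be tried on a PC. On target, `mem` would point at the SDRAM window (assumed here to be the default FMC SDRAM bank 1 address, 0xC0000000), and the region should be non-cacheable or the D-cache cleaned and invalidated (e.g. CMSIS SCB_CleanInvalidateDCache()) before the read pass, otherwise reads may be served from the cache rather than the SDRAM. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Write each 32-bit word's index into it, then read everything back.
 * Returns the index of the first mismatching word, or n_words if all match.
 * On a mismatch, *found (if non-NULL) receives the value actually read,
 * i.e. the index of the word that aliased onto this one. */
static size_t addr_in_addr_test(volatile uint32_t *mem, size_t n_words,
                                uint32_t *found)
{
    for (size_t i = 0; i < n_words; i++) {
        mem[i] = (uint32_t)i;
    }
    for (size_t i = 0; i < n_words; i++) {
        uint32_t v = mem[i];
        if (v != (uint32_t)i) {
            if (found != NULL) {
                *found = v;
            }
            return i;
        }
    }
    return n_words;
}
```

Calling it a second time with `mem + 1` is one way to try the 4-byte offset experiment suggested above.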
2024-07-18 01:12 PM - edited 2024-07-18 08:19 PM
As long as all memory accesses go through the cache and there is no concurrent access (e.g. from multiple cores), the values you "read" from "memory" should always reflect the last value written. The cache policy only determines when writes are flushed to SDRAM; it should have no observable effect on data reads from the software's point of view.
The data you read from "memory" may actually originate from the cache instead of actual SDRAM, but this is what you want, since that's exactly how caches improve performance.
2024-07-19 08:19 AM - edited 2024-07-20 12:39 AM
I never discard the possibility of a bug in the code, but this time it was not the case.
We developed several boards with two types of SDRAM, one 8MB and the other 32MB; both share the same pinout.
The issue only manifested itself on the boards with the smaller 8MB part (not on the 32MB one), and two factors may have contributed to it (number 1 below definitely; number 2, I am not sure):
Once the "number of rows" and "number of columns" were corrected to 12 and 8, the application worked as expected.
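The row/column fix can be sanity-checked with a little arithmetic. The sketch below assumes a 16-bit data bus and 4 internal banks for both parts, and a 13-row/9-column geometry for the 32MB part; the thread only confirms 12 rows/8 columns for the 8MB part. It also assumes the FMC splits the linear address as row | bank | column, with the column in the lowest bits (check this against the SDRAM address-mapping table in the reference manual), and that the device simply ignores address bits it does not decode. The type and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical description of an SDRAM geometry. */
typedef struct {
    unsigned row_bits;   /* e.g. 12 for a 4096-row device */
    unsigned col_bits;   /* e.g. 8 for a 256-column device */
    unsigned bank_bits;  /* 2 for 4 internal banks (assumed) */
    unsigned bus_bytes;  /* 2 for a 16-bit data bus (assumed) */
} sdram_geom_t;

/* Capacity = rows * columns * banks * bus width in bytes. */
static uint64_t sdram_capacity(sdram_geom_t g)
{
    return ((uint64_t)1u << (g.row_bits + g.col_bits + g.bank_bits)) * g.bus_bytes;
}

/* Smallest byte stride at which two addresses alias when the controller is
 * programmed with geometry cfg but the device only decodes dev's bits.
 * Assumes a row | bank | column split (column lowest) and that undecoded
 * address bits are ignored by the device. Returns 0 if no bit is dropped. */
static uint64_t alias_stride(sdram_geom_t cfg, sdram_geom_t dev)
{
    if (cfg.col_bits > dev.col_bits) {
        /* The lowest undecoded column bit aliases first. */
        return (uint64_t)cfg.bus_bytes << dev.col_bits;
    }
    if (cfg.row_bits > dev.row_bits) {
        /* The lowest undecoded row bit sits above all column and bank bits. */
        return (uint64_t)cfg.bus_bytes << (cfg.col_bits + cfg.bank_bits + dev.row_bits);
    }
    return 0u;
}
```

Under these assumptions, 12/8 gives 8MB and 13/9 gives 32MB. If the 8MB boards had been running with the 13/9 setting (an assumption; the thread only says the values were corrected to 12 and 8), alias_stride() reports 512 bytes: two addresses 0x200 apart would land in the same cell, so a later write silently replaces earlier data. That would be consistent with the observation that removing "SDRAM write 1" makes the read-back match "SDRAM write 2".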
I am still curious whether it would have worked had the A12 pins not been routed/connected on the board.
Sorry for the spam, and thanks to everyone for the support.
Everyone's input steered me in the right direction.
Kind regards.
2024-07-19 09:08 AM
Thanks for coming back with the answer.