Strangest problem ever seen - Specific value in flash causes DMA underrun. Any ideas?

PWint · 2021-12-06

I have the strangest problem I have ever seen, and I would not believe it had it not been this repeatable:

I have a single DMA controller running three streams: two for the ADCs and one for the DAC. All three can run simultaneously, the DAC at 1 Msps and both ADCs at 40 ksps. The ADC DMA streams run constantly, while the DAC DMA stream runs for about 1 second on command.

After implementing our embedded loader that writes to flash, I started getting DMA FIFO errors on the DAC stream.

After A LOT of trial and error, I found that the problem appears whenever the word at address 0x0810000C is programmed. The rest of the flash in sector 8 is still erased.

The following process is 100% repeatable, even on different boards (same CPU, different IO configuration, same code):

Use STM32CubeProgrammer for all steps below.

1. Check that the entire sector 8 is erased (all 0xFFFFFFFF).
2. Write 0x08000000 to flash @0x0810000C.
3. Perform a hard reset and run.
4. Perform a DMA access on the DAC while the ADC streams are running.
5. This triggers a FIFO error on the DAC stream.
6. I can now repeat steps 4 and 5 above and consistently get the same result.
7. Write 0xFFFFFFFF to flash @0x0810000C.
8. Perform a hard reset and run.
9. Perform a DMA access on the DAC while the ADC streams are running.
10. The DAC DMA transfer completes normally.
11. I can now repeat steps 9 and 10 and consistently get the same result, and I can also repeat steps 1 to 11 and consistently get the same results.

I've used a hardware breakpoint to make sure none of my code ever accesses flash in sector 8.

It gets stranger: to check that my hardware breakpoint was set correctly, I wrote some test code to read @0x0810000C. This worked, but when I programmed my board a second time, it would not come out of reset. Again, this took a VERY LONG investigation. The RESET pin on my board was toggling at 13 kHz, and the debugger could not reconnect at all.

My solution was to change my BOOT0 pin from low to high so that my test code would never execute. I could then use the STM32CubeProgrammer CLI (neither the IDE nor the GUI worked) to do a mass erase, which succeeded ONLY when I did not connect under reset. This recovered the board.

Any idea why this flash address is special? I have considered that my code may be at fault, but it runs flawlessly for days on end until I write that one flash address. Not every value triggers it, either: writing all zeros or 0xA5A5A5A5 does not cause the same problem.

TDK · 2021-12-06

Include your chip number. There are thousands of different STM32 parts.

> The RESET pin on my board was toggling at 13 kHz and the debugger would in no way re-connect.

That indicates the chip is resetting itself. Examine RCC->CSR (the reset flags) to determine why.


waclawek.jan · 2021-12-06

There must be some reason why you write various values to flash @0x0810000C, so I assume this value is used somewhere else in your program, and that this is what causes your problem.

Reduce your program by removing features and observe when the problem disappears.

JW

PWint · 2021-12-06

I am using an STM32H745.

When I was in that last state (where the chip kept resetting itself), I could not examine CSR because I could not connect to the CPU.
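Since the debugger could not attach, one option is to let the firmware capture the reset flags itself, as early as possible in startup. On STM32H7 parts the flags live in RCC->RSR rather than a CSR register. The following is a minimal, host-testable sketch of decoding a captured RSR value. The bit positions are my reading of RM0399 and the RCC_RSR_*_Pos macros, so treat them as assumptions and check them against your device header; the table is also partial (CPU1-side and shared flags only), and decode_reset_flags is a hypothetical helper name. On target you would read RCC->RSR once at boot, store or print it, and then clear the flags (for example with the HAL macro __HAL_RCC_CLEAR_RESET_FLAGS()) so the next reset shows fresh values.

```c
#include <stddef.h>
#include <stdint.h>

/* Reset-cause flags in RCC_RSR on STM32H745 (partial list).
 * ASSUMPTION: bit positions as read from RM0399 / stm32h745xx.h;
 * verify them against your own device header before relying on this. */
typedef struct {
    unsigned bit;      /* bit position in RCC_RSR */
    const char *name;  /* flag name as used in the reference manual */
} rsr_flag_t;

static const rsr_flag_t rsr_flags[] = {
    {17, "C1RSTF"},    /* CPU1 (Cortex-M7) reset */
    {19, "D1RSTF"},    /* D1 domain reset */
    {20, "D2RSTF"},    /* D2 domain reset */
    {21, "BORRSTF"},   /* brown-out reset */
    {22, "PINRSTF"},   /* NRST pin reset */
    {23, "PORRSTF"},   /* power-on / power-down reset */
    {24, "SFT1RSTF"},  /* software reset requested by CPU1 */
    {26, "IWDG1RSTF"}, /* independent watchdog 1 */
    {28, "WWDG1RSTF"}, /* window watchdog 1 */
    {30, "LPWR1RSTF"}, /* low-power reset */
};

/* Store the names of all known flags set in `rsr` into `out` (at most
 * `max` entries, in ascending bit order) and return how many were set. */
size_t decode_reset_flags(uint32_t rsr, const char **out, size_t max)
{
    size_t count = 0;
    for (size_t i = 0; i < sizeof rsr_flags / sizeof rsr_flags[0]; ++i) {
        if (rsr & (UINT32_C(1) << rsr_flags[i].bit)) {
            if (count < max) {
                out[count] = rsr_flags[i].name;
            }
            ++count;
        }
    }
    return count;
}
```

A captured value with several flags set decodes to all of them, so a reset loop driven by a watchdog would show up as the watchdog flag alongside PINRSTF.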

PWint · 2021-12-06

I use bank 1 when upgrading software and swap the banks after the download is complete. This is how I originally found the problem: it only occurred on boards that had been programmed this way, so they had data written in bank 1 of flash. I never read that memory; it is only written during a software upgrade.
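Because the banks are swapped after an upgrade, the CPU address 0x0810000C does not always hit the same physical bank. A minimal sketch of that mapping follows, assuming the STM32H745 layout of two 1 MB banks of eight 128 KB sectors starting at 0x08000000, and assuming the swap state is bit 31 (SWAP_BANK_OPT) of FLASH_OPTSR_CUR; check both against RM0399. flash_locate and the macro names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMED STM32H745 layout: 2 MB of flash at 0x08000000, split into two
 * 1 MB banks of eight 128 KB sectors. Check RM0399 for your part. */
#define FLASH_WINDOW_BASE  UINT32_C(0x08000000)
#define BANK_SIZE_BYTES    UINT32_C(0x00100000)  /* 1 MB */
#define SECTOR_SIZE_BYTES  UINT32_C(0x00020000)  /* 128 KB */

/* ASSUMPTION: SWAP_BANK_OPT is bit 31 of FLASH_OPTSR_CUR. */
#define OPTSR_SWAP_BANK_BIT 31u

typedef struct {
    unsigned bank;    /* physical bank, 1 or 2 */
    unsigned sector;  /* sector index within that bank, 0..7 */
} flash_location_t;

/* Map a CPU-visible flash address to the physical bank and sector it
 * hits, given the current option status register value. Returns false
 * if the address lies outside the 2 MB flash window. */
bool flash_locate(uint32_t addr, uint32_t optsr_cur, flash_location_t *loc)
{
    if (addr < FLASH_WINDOW_BASE ||
        addr >= FLASH_WINDOW_BASE + 2u * BANK_SIZE_BYTES) {
        return false;
    }
    uint32_t offset = addr - FLASH_WINDOW_BASE;
    unsigned mapped_bank = (offset < BANK_SIZE_BYTES) ? 1u : 2u;
    bool swapped = ((optsr_cur >> OPTSR_SWAP_BANK_BIT) & 1u) != 0;

    /* With SWAP_BANK set, the two banks trade places in the memory map. */
    loc->bank = swapped ? (3u - mapped_bank) : mapped_bank;
    loc->sector = (unsigned)((offset % BANK_SIZE_BYTES) / SECTOR_SIZE_BYTES);
    return true;
}
```

With the swap bit clear, 0x0810000C is the fourth word of the first sector of bank 2 (the "sector 8" above, counting 128 KB sectors across both banks); with it set, the same address lands in bank 1. So which physical bank STM32CubeProgrammer actually wrote depends on the swap state at the time.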

It was by reducing functionality that I found that only that one specific address plays a role. I agree the most reasonable cause is something in my code changing because of that value being written, but my code does not use it.

I am hoping that someone knows whether the ROM code might use it before my code even runs.
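One observation about the address itself: 0x0810000C is 0x0C bytes past the start of the upper flash megabyte, and in a Cortex-M vector table offset 0x0C is the HardFault entry. On the STM32H745 the Cortex-M4 core's boot address comes from the BOOT_CM4_ADD0/ADD1 option bytes (as far as I know the factory default points at 0x08100000), so it may be worth checking the option bytes in STM32CubeProgrammer to see whether a vector table is expected there. The sketch below only does the slot arithmetic; vector_slot_name is a hypothetical helper, and the table base is an assumption you pass in.

```c
#include <stddef.h>
#include <stdint.h>

/* Names of the 16 system entries at the start of an ARMv7-M (Cortex-M4/M7)
 * vector table; entries from index 16 onward are device interrupts. */
static const char *const k_system_vectors[16] = {
    "Initial MSP", "Reset",        "NMI",        "HardFault",
    "MemManage",   "BusFault",     "UsageFault", "Reserved",
    "Reserved",    "Reserved",     "Reserved",   "SVCall",
    "DebugMonitor", "Reserved",    "PendSV",     "SysTick",
};

/* Return the name of the slot that `addr` would occupy if a vector table
 * started at `table_base`, or NULL if `addr` is below the base or not
 * word aligned relative to it. Slots 16 and above are reported as "IRQ". */
const char *vector_slot_name(uint32_t table_base, uint32_t addr)
{
    if (addr < table_base || ((addr - table_base) & 3u) != 0) {
        return NULL;
    }
    uint32_t index = (addr - table_base) / 4u;  /* one 32-bit word per entry */
    return (index < 16u) ? k_system_vectors[index] : "IRQ";
}
```

For a base of 0x08100000, the word at 0x0810000C is index 3, the HardFault slot.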