STM32F746 Firmware and Debugger Lockup

JD1800
Associate II
Posted on August 03, 2017 at 19:09

We are seeing a lockup issue with an STM32F746 design. On the surface it looks like a normal firmware lockup: the watchdog triggers a reset and the board recovers. But if I disable the watchdog and try to use the debugger to see what happened, I cannot get the debugger to connect. Every time it hangs, the debug port is also inoperable until the processor is reset. From my testing, the code appears to be completely stopped. Using GPIO toggles, I cannot detect any interrupt activity, but the LCD controller keeps refreshing the LCD and reading the display data from SDRAM.

The issue can be quite infrequent. The device uses a CAN connection, and I was not able to see it lock up until we had several other CAN devices on the network. It also seems to be worse in the end application, where there is much more CAN traffic, but generating CAN traffic with a USB-CAN adapter does not make it fail more frequently. Sometimes it locks up a few times within an hour; other times it takes more than a day. Yet it does not seem to depend on the actual CAN traffic: when two of the boards are connected to the same CAN network, they do not lock up at the same time.

I have tried using the debugger's ITM interface to output various debug information, but that seems to prevent the lockup, or at least make it much less frequent. Most recently I created a trace buffer that gets dumped over ITM following a watchdog reset, but it has now run for two days without a lockup.
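A post-mortem trace buffer of the kind described above can be sketched as a small ring buffer with a validity marker. This is a minimal sketch, not the original poster's code: it assumes the buffer is linked into RAM that the startup code does not zero (backup SRAM, for instance, through a linker-script section of your choosing), so that after a watchdog reset `trace_init()` finds the marker intact and the old entries can be dumped over ITM or any other channel. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical post-mortem trace buffer. On the target it must live in RAM that
   the startup code does not clear (e.g. backup SRAM via a linker section), so
   its contents survive a watchdog reset. */
#define TRACE_MAGIC   0x54524345u  /* arbitrary "contents valid" marker */
#define TRACE_ENTRIES 64u          /* power of two, so indexing can use a mask */

typedef struct {
    uint32_t magic;
    uint32_t head;                   /* total number of entries ever logged */
    uint32_t entries[TRACE_ENTRIES];
} trace_buf_t;

/* Call early after reset. Returns 1 if the previous contents are still valid
   (warm reset), 0 if the buffer was uninitialized and has been cleared. */
static int trace_init(trace_buf_t *t) {
    if (t->magic == TRACE_MAGIC) return 1;
    memset(t, 0, sizeof *t);
    t->magic = TRACE_MAGIC;
    return 0;
}

/* One store and one increment. Not interrupt-safe as written: if both thread
   code and ISRs log, wrap this in a critical section. */
static void trace_log(trace_buf_t *t, uint32_t code) {
    t->entries[t->head & (TRACE_ENTRIES - 1u)] = code;
    t->head++;
}

/* Copies up to 'max' of the most recent entries into 'out', oldest first,
   and returns how many were copied. */
static uint32_t trace_dump(const trace_buf_t *t, uint32_t *out, uint32_t max) {
    assert(out != NULL || max == 0u);
    uint32_t n = t->head < TRACE_ENTRIES ? t->head : TRACE_ENTRIES;
    if (n > max) n = max;
    for (uint32_t i = 0; i < n; i++)
        out[i] = t->entries[(t->head - n + i) & (TRACE_ENTRIES - 1u)];
    return n;
}
```

Because `trace_log()` is only a store and an increment, it disturbs timing less than emitting over ITM at the point of each event, although, as noted above, even a buffer like this may be enough to mask a timing-dependent fault.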

Has anyone else seen something like this? Does anyone know of anything firmware can do that would cause the debugger to fail to connect? I did use the GPIO lock register to lock the configuration of the SWD pins, but the debugger still fails to connect once the board locks up. I did not see any other registers that looked like they could prevent the debugger from working. The product is already in production and installed in the end application, so any assistance would be appreciated.
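For reference, the GPIO lock mentioned above uses a fixed key sequence on `GPIOx_LCKR`. The sketch below follows the write/write/write/read/read sequence from the STM32F7 reference manual, with the `LCKK` key bit at position 16; on the target `lckr` would be `&GPIOA->LCKR` from the CMSIS device header, and the helper name is hypothetical. The lock only freezes the pins' configuration registers until the next reset; it does nothing to keep the debug access path working if the system itself is stalled, which fits the observation that the debugger still could not connect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GPIO_LCKR_LCKK (1u << 16)  /* lock key bit in GPIOx_LCKR */

/* SWDIO = PA13, SWCLK = PA14 on STM32F7. */
#define SWD_PIN_MASK ((uint16_t)((1u << 13) | (1u << 14)))

/* Hypothetical helper applying the LCKR key sequence to the pins in 'pin_mask'.
   Once the sequence succeeds, their configuration is frozen until the next
   reset. Returns true if LCKK reads back as 1, i.e. the lock is active. */
static bool gpio_lock_pins(volatile uint32_t *lckr, uint16_t pin_mask) {
    const uint32_t pins = pin_mask;
    *lckr = GPIO_LCKR_LCKK | pins;  /* WR LCKK = 1 + pin bits */
    *lckr = pins;                   /* WR LCKK = 0 + pin bits */
    *lckr = GPIO_LCKR_LCKK | pins;  /* WR LCKK = 1 + pin bits */
    (void)*lckr;                    /* RD LCKR (part of the key sequence) */
    return (*lckr & GPIO_LCKR_LCKK) != 0u;  /* optional RD confirming the lock */
}
```

On the target the call would be `gpio_lock_pins(&GPIOA->LCKR, SWD_PIN_MASK)`.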

#stm32f7 #debugger-connection-failure #stm32f7-lockup
Andrew Berry
Associate II
Posted on April 19, 2018 at 22:27

I'm having a similar issue on an STM32F777 with a single-threaded application that also uses QSPI.  For reference, in case it brings up other commonalities with your applications, I'm also using the Ethernet MAC, LCD via FMC, USB Host (using CubeMX-provided stack), as well as the other basic peripherals such as UARTs, SPI, etc.

During the hang, no interrupts seem to run at all, even at maximum priority.  Even the WWDG EWI does not run prior to the WWDG resetting the system.

I did not suspect the QSPI itself, but I do have the timeout enabled, and come to think of it, it was around the time I added functionality that hits the QSPI a lot harder that I started seeing the hangs.  I will give this a try now, and report back.

When I added a trace buffer in the backup RAM that could be dumped after reset, the problem stopped. 

Did you implement this using ETM, or your own logging system? I've done the latter, but if you have any guidance or suggested reading on using ETM, I'd be grateful to hear it; I haven't seen any other practical guidance on this. When I increased the amount of logging my DIY trace system does, the hangs seem to have become more frequent. Before, the frequency ranged from not once in ten hours to four times an hour; now it's almost every five minutes, which makes this a great time to try the QSPI timeout fix, I guess!

Andrew Berry
Associate II
Posted on April 21, 2018 at 09:10

I've now tested this on both my application hardware and on a Nucleo-F767ZI, and can confirm that the QSPI timeout feature is the culprit. I've created a minimal program that sweeps the LPTR register value while repeatedly reading from QSPI in memory mapped mode and it reliably locks up the MCU. Scoping nCS and a toggled IO quite clearly shows that it happens when a QSPI read is attempted right about when nCS is released after the timeout occurs.
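If the timeout counter is the trigger, the straightforward mitigation is to keep it disabled while the QSPI is in memory-mapped mode. With the HAL this corresponds to `TimeOutActivation = QSPI_TIMEOUT_COUNTER_DISABLE` in the `QSPI_MemoryMappedTypeDef` passed to `HAL_QSPI_MemoryMapped()`. The register-level sketch below assumes the `QUADSPI_CR` layout from RM0385 (EN at bit 0, TCEN at bit 3, PRESCALER at bits 31:24); the helper names are hypothetical, and the bit positions should be checked against the manual for your exact part.

```c
#include <assert.h>
#include <stdint.h>

/* QUADSPI_CR fields used here (RM0385; verify for your exact part). */
#define QSPI_CR_EN        (1u << 0)   /* peripheral enable */
#define QSPI_CR_TCEN      (1u << 3)   /* timeout counter enable (memory-mapped mode) */
#define QSPI_CR_PRESC_POS 24u         /* PRESCALER[7:0] at bits 31:24 */

/* Hypothetical helper: CR value for memory-mapped use, with the timeout
   counter enabled only if explicitly requested. */
static uint32_t qspi_cr_value(uint8_t prescaler, int use_timeout) {
    uint32_t cr = ((uint32_t)prescaler << QSPI_CR_PRESC_POS) | QSPI_CR_EN;
    if (use_timeout) cr |= QSPI_CR_TCEN;
    return cr;
}

/* Hypothetical helper: strip TCEN from an existing CR value (e.g. one produced
   by generated init code). The reference manual only allows TCEN to change
   while BUSY = 0, so apply this before entering memory-mapped mode. */
static uint32_t qspi_cr_without_timeout(uint32_t cr) {
    return cr & ~QSPI_CR_TCEN;
}
```

The trade-off is that without the timeout, nCS is not released after the last memory-mapped access, so the flash cannot drop into its lower-power state while idle.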

This definitely looks like something that needs to get added to the errata sheet.

If anyone cares to experiment with it themselves, the minimal demonstration is attached as an Atollic project. The program itself is quite simple and is contained entirely in main.c; much of that file is QSPI configuration defines and GPIO macros, to avoid dragging in any HAL or Cube baggage. It doesn't use any peripherals aside from QSPI and GPIO, and it is not necessary to connect an actual QSPI memory.

________________

Attachments:

STM32F7 QSPI fault.zip : https://st--c.eu10.content.force.com/sfc/dist/version/download/?oid=00Db0000000YtG6&ids=0680X000006Hxfh&d=%2Fa%2F0X0000000b0m%2FYT5zu8VVi.DDZsvwxGitZE.uBZ5Fo9hXlnkKydYcZN4&asPdf=false
Jaroslav BECKA
ST Employee

Dear STM32 users,

I am sorry to hear about your problems, but I think I may offer you a solution.

I am convinced that this is caused by speculative read accesses performed by the Cortex-M7 core to Normal memory regions. This is related to dynamic branch prediction: the processor speculatively prefetches from branch target addresses. The Cortex-M7 makes extensive use of such speculative reads to Normal memory areas, and it can also reorder memory accesses in these areas to optimize them.

If the processor tries to read from an address that never completes the transfer on the bus, because there is no physical memory at that address, the result can be high latency or system errors.

If any external memory is used in the application (the external RAM area 0x6000 0000 - 0x9FFF FFFF is Normal memory by default), the MPU must be configured according to the memory map of the application.
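To make the default attributes concrete, the sketch below classifies an address using the ARMv7-M architectural default memory map, which the Cortex-M7 applies when the MPU is disabled (or, with PRIVDEFENA set, to privileged accesses outside every enabled region). On STM32F7 the QSPI window at 0x9000 0000 falls in the Normal external RAM range, while FMC SDRAM at its default 0xC000 0000 falls in the Device range. The function and enum names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { MEM_NORMAL, MEM_DEVICE, MEM_STRONGLY_ORDERED } mem_type_t;

/* Default memory type from the ARMv7-M architectural memory map.
   Normal regions may be read speculatively; Device and Strongly-ordered
   regions are not. */
static mem_type_t default_mem_type(uint32_t addr) {
    if (addr < 0x40000000u) return MEM_NORMAL;           /* Code, SRAM */
    if (addr < 0x60000000u) return MEM_DEVICE;           /* Peripheral */
    if (addr < 0xA0000000u) return MEM_NORMAL;           /* External RAM */
    if (addr < 0xE0000000u) return MEM_DEVICE;           /* External device */
    if (addr < 0xE0100000u) return MEM_STRONGLY_ORDERED; /* Private Peripheral Bus */
    return MEM_DEVICE;                                   /* Vendor-specific system */
}
```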

Your problem will probably be solved by creating a background region for the whole QSPI memory area, configured as Device or Strongly-ordered. Then, based on the size of your memory, create a subregion with memory attributes that match the purpose of this memory. The processor cannot perform speculative reads to Device or Strongly-ordered memory.
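A minimal sketch of that layout, computing raw `MPU_RBAR`/`MPU_RASR` values for a background region over the 256 MB QSPI window at 0x9000 0000 and a higher-numbered subregion for the memory actually fitted. Assumptions, not taken from the reply: a 16 MB flash, mapped read-only and executable as Normal write-through memory; the helper names are hypothetical. On the target the values would be written with the CMSIS `ARM_MPU_SetRegion(rbar, rasr)` helper from `mpu_armv7.h` (between `ARM_MPU_Disable()` and `ARM_MPU_Enable(MPU_CTRL_PRIVDEFENA_Msk)`), or through `HAL_MPU_ConfigRegion()`.

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7-M MPU_RBAR / MPU_RASR fields (see PM0253). */
#define RBAR_VALID    (1u << 4)
#define RASR_XN       (1u << 28)   /* execute-never */
#define RASR_AP_POS   24u          /* AP[2:0] */
#define RASR_C        (1u << 17)   /* cacheable; with TEX=000, B=0: write-through */
#define RASR_SIZE_POS 1u
#define RASR_ENABLE   1u

#define AP_NO_ACCESS 0u  /* no access at any privilege level */
#define AP_READ_ONLY 6u  /* read-only, privileged and unprivileged */

/* SIZE field for a power-of-two region of 'bytes' (32 B .. 4 GB): log2(bytes) - 1. */
static uint32_t mpu_size_field(uint64_t bytes) {
    assert(bytes >= 32u && bytes <= (1ull << 32) && (bytes & (bytes - 1u)) == 0u);
    uint32_t shift = 0;
    while ((1ull << shift) < bytes) shift++;
    return shift - 1u;
}

/* 'base' must be aligned to the region size. */
static uint32_t mpu_rbar(uint32_t region, uint32_t base) {
    return (base & ~0x1Fu) | RBAR_VALID | (region & 0xFu);
}

/* Region 0: whole 256 MB QSPI window, Strongly-ordered (TEX=000, C=0, B=0),
   no access, execute-never. No speculative reads; explicit accesses fault. */
static uint32_t qspi_background_rasr(void) {
    return RASR_XN | (AP_NO_ACCESS << RASR_AP_POS) |
           (mpu_size_field(256ull << 20) << RASR_SIZE_POS) | RASR_ENABLE;
}

/* Region 1 (the higher-numbered region wins where regions overlap): only the
   bytes that physically exist, Normal write-through, read-only, executable. */
static uint32_t qspi_flash_rasr(uint64_t flash_bytes) {
    return (AP_READ_ONLY << RASR_AP_POS) | RASR_C |
           (mpu_size_field(flash_bytes) << RASR_SIZE_POS) | RASR_ENABLE;
}
```

With these helpers, region 0 would use `mpu_rbar(0, 0x90000000u)` with `qspi_background_rasr()`, and region 1 `mpu_rbar(1, 0x90000000u)` with `qspi_flash_rasr(16ull << 20)`. Speculative fetches beyond the fitted 16 MB then land in the Strongly-ordered background and are never issued to the bus.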

You can find more about this in AN4861: LCD-TFT display controller (LTDC) on STM32 MCUs; AN4838: Managing memory protection unit (MPU) in STM32 MCUs; or the Cortex-M7 programming manual (PM0253).

Configuring the MPU properly is a must in Cortex-M7 based microcontrollers.

I hope it will help you to solve the issue.

Good luck!

Best regards,

Jaroslav