Random Usage Faults running HAL code on STM32F410

nrose · ‎2019-09-17

I have a fairly simple bit of code running in a CubeMX-generated project, using HAL code to communicate over SPI. I run a loopback test a thousand times. Sometimes it passes; other times I get an INVSTATE Usage Fault which (according to Keil) appears to be caused by the first line of the following code:

{
  /* Transfer loop */
  while(hspi->RxXferCount > 0U)
  {
    /* Check the RXNE flag */
    if(__HAL_SPI_GET_FLAG(hspi, SPI_FLAG_RXNE))
    {
      *((uint16_t*)pData) = hspi->Instance->DR;
      pData += sizeof(uint16_t);
      hspi->RxXferCount--;
    }
    else
    {
      /* Timeout management */
      if((Timeout == 0U) || ((Timeout != HAL_MAX_DELAY) && ((HAL_GetTick()-tickstart) >= Timeout)))
      {
        errorcode = HAL_TIMEOUT;
        goto error;
      }

Timeout is HAL_MAX_DELAY, so HAL_GetTick() is never called.
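The point about HAL_GetTick() follows from C's short-circuit evaluation of `||` and `&&`. The following is a minimal host-side sketch of that, not the HAL code itself: `fake_get_tick()`, `tick_calls` and `timed_out()` are hypothetical names, and `MAX_DELAY` is assumed to match HAL_MAX_DELAY (0xFFFFFFFF in the STM32 HAL headers).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: same value as HAL_MAX_DELAY in the STM32 HAL headers. */
#define MAX_DELAY 0xFFFFFFFFU

/* Hypothetical stand-in for HAL_GetTick() that counts its own calls. */
static uint32_t tick_calls = 0U;
static uint32_t fake_tick = 0U;

static uint32_t fake_get_tick(void)
{
    tick_calls++;
    return fake_tick;
}

/* Same shape as the timeout test in the quoted receive loop. */
static bool timed_out(uint32_t timeout, uint32_t tickstart)
{
    return (timeout == 0U) ||
           ((timeout != MAX_DELAY) &&
            ((fake_get_tick() - tickstart) >= timeout));
}
```

With `timeout` equal to `MAX_DELAY`, both the `== 0U` test and the `!= MAX_DELAY` test are false, so the right-hand side of `&&` is never evaluated and the tick function is not entered.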

I'm particularly confused because INVSTATE should only be raised when the core tries to execute an instruction while not in Thumb state (EPSR.T clear), and the Cortex-M4 has no other instruction set. I can't see how this can happen unless the flash is occasionally being misread. The clock is 16 MHz, to give the flash an easy life.
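For anyone chasing the same fault, here is a rough sketch of the checks involved, written as plain functions so it can be tried on a host. The bit positions are the Cortex-M3/M4 ones (CFSR at 0xE000ED28, with the UsageFault bits in its upper half); the function names are made up for this example. One common way to lose the Thumb bit is a branch, return or exception return through a corrupted address or stack frame, which is only one possible cause and not necessarily what happened here.

```c
#include <stdbool.h>
#include <stdint.h>

/* UsageFault bits as they appear in the 32-bit CFSR (0xE000ED28). */
#define CFSR_UNDEFINSTR (1UL << 16)
#define CFSR_INVSTATE   (1UL << 17)
#define CFSR_INVPC      (1UL << 18)
#define CFSR_NOCP       (1UL << 19)

/* T bit in xPSR, as found in a stacked exception frame. */
#define XPSR_T          (1UL << 24)

/* True if a captured CFSR value reports INVSTATE. */
static bool cfsr_has_invstate(uint32_t cfsr)
{
    return (cfsr & CFSR_INVSTATE) != 0U;
}

/* A function pointer or return address loaded into PC must have bit 0
   set; bit 0 clear asks the core to leave Thumb state. */
static bool branch_target_keeps_thumb(uint32_t target)
{
    return (target & 1UL) != 0U;
}

/* A stacked xPSR with T clear faults with INVSTATE after exception return. */
static bool stacked_xpsr_is_thumb(uint32_t xpsr)
{
    return (xpsr & XPSR_T) != 0U;
}
```

In a fault handler, the stacked PC, LR and xPSR from the exception frame can be run through checks like these to see whether the frame itself was damaged.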

waclawek.jan · ‎2019-09-17

Is this a "known good" hardware such as Nucleo/Disco, or your own board? In the latter case, can you try the same code on some of the Nucleo/Disco boards?

JW

Tesla DeLorean · ‎2019-09-17

For Keil, check that you have an appropriate stack depth (set in startup.s) so that your interrupt and callback routines can run without trashing things.

While I have alignment issues with the code shown, it really shouldn't exhibit random faults.
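The alignment concern is presumably the `*((uint16_t*)pData)` store: `pData` is a byte pointer, so nothing guarantees it is 2-byte aligned. The Cortex-M4 normally handles unaligned halfword accesses in hardware (unless unaligned trapping is enabled), which is why this should not cause random faults. A sketch of an alignment-agnostic store, with the hypothetical helper name `store_frame16()`:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store one 16-bit frame at dst without assuming dst is 2-byte aligned.
   Returns the advanced pointer, mirroring `pData += sizeof(uint16_t)`. */
static uint8_t *store_frame16(uint8_t *dst, uint16_t frame)
{
    assert(dst != NULL);
    memcpy(dst, &frame, sizeof frame);  /* byte-wise copy, any alignment */
    return dst + sizeof frame;
}
```

Compilers typically reduce a fixed-size `memcpy` like this to a single store where the target allows it, so the cost is usually small.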

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..

nrose · ‎2019-09-18

That was my first thought too, so I doubled the stack space. It now has 2K, and probably only needs a few hundred bytes.
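One way to replace "probably" with a measurement is stack painting: fill the stack region with a known pattern at startup and later count how much of it is still untouched. A minimal sketch, assuming a descending stack; `STACK_FILL`, `stack_paint()` and `stack_unused_words()` are names invented for this example, and on a target the region bounds would come from the startup file or linker symbols.

```c
#include <stddef.h>
#include <stdint.h>

/* Arbitrary fill pattern chosen for this sketch. */
#define STACK_FILL 0xA5A5A5A5UL

/* Fill the region with the pattern (run early, before the stack is deep). */
static void stack_paint(uint32_t *low, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        low[i] = (uint32_t)STACK_FILL;
    }
}

/* Count words at the low end that still hold the pattern. The stack grows
   down from the high end, so these words have never been written. */
static size_t stack_unused_words(const uint32_t *low, size_t words)
{
    size_t n = 0;
    while (n < words && low[n] == (uint32_t)STACK_FILL) {
        n++;
    }
    return n;
}
```

For a 2K stack (512 words), a result close to 512 after a long test run would support the "few hundred bytes" estimate; a result near zero would point back at the stack.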

Tesla DeLorean · ‎2019-09-18

On the F4 I'd double check voltage and capacitors on VCAP pin(s)

nrose · ‎2019-09-18

Alas not. The cap is a 4.7 µF ceramic, as per the data sheet. The voltage is 1.134 V. VOS is 3.

nrose · ‎2019-09-18

Has to be my hardware - it's part of a complex system, so using an EVK would be tough.

But I'd figure a single-chip micro running from internal flash with an internal clock and a good supply and ground, with the manufacturer's rated caps, should be pretty sure to work. Otherwise, what's the point?

nrose · ‎2019-09-19

Fixed it, although I don’t understand everything.

There is a bug in (some or all of) the STM32 processors where the SPI BSY status can occasionally be in the wrong state.

This is mentioned in the chip errata, along with some workarounds.

The HAL SPI code I was using implements one of the workarounds, but the included timeout was too short, so it generates an error, and leaves the SPI in a bad state. For reasons I don’t (and probably don’t need to) understand, this leads to the Usage Fault. My guess is that the bad state resulted in an internal bus overflow at some later point, causing the fault.
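The mechanism described here, a bounded wait on BSY that gives up too early, can be sketched as below. This is not the HAL implementation: the register and tick source are simulated so it runs on a host, and `wait_not_busy()`, `sim_now()`, `sim_sr` and `sim_idle_at` are hypothetical names. The only hardware fact assumed is that BSY is bit 7 of SPI_SR on the STM32F4.

```c
#include <stdbool.h>
#include <stdint.h>

#define SR_BSY (1UL << 7)   /* BSY is bit 7 of SPI_SR on STM32F4 */

/* Tick source; on target this role would be played by HAL_GetTick(). */
typedef uint32_t (*tick_fn)(void);

/* Poll *sr until BSY clears or timeout_ticks elapse.
   Returns true if the bus went idle, false on timeout. */
static bool wait_not_busy(const volatile uint32_t *sr, tick_fn now,
                          uint32_t timeout_ticks)
{
    const uint32_t start = now();
    while ((*sr & SR_BSY) != 0U) {
        if ((uint32_t)(now() - start) >= timeout_ticks) {
            return false;   /* caller must recover the peripheral */
        }
    }
    return true;
}

/* Host-side simulation: BSY stays set until sim_idle_at ticks have passed. */
static volatile uint32_t sim_sr = SR_BSY;
static uint32_t sim_tick = 0U;
static uint32_t sim_idle_at = 0U;

static uint32_t sim_now(void)
{
    if (sim_tick >= sim_idle_at) {
        sim_sr &= ~SR_BSY;
    }
    return sim_tick++;
}
```

If BSY takes 50 ticks to clear, a 10-tick timeout returns false with BSY still set, while a 100-tick timeout returns true. What the caller does on the false path matters as much as the timeout value: carrying on with a transfer while the peripheral is still busy is the kind of "bad state" described above.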

Before changing the timeout, I updated to the latest HAL code (1.24.1). Guess what: the timeout has been increased by a factor of 10! Obviously I’m not the only person to have this problem.

There are other problems with the new code. In particular, I have to do a dummy read before the main read, but this may be specific to my application.

waclawek.jan · ‎2019-09-19

With the shorter timeout, wasn't there a conflict on the SPI with some slave, possibly two chips outputting different levels on the same wire at the same time?

JW

nrose · ‎2019-09-19

I only had two devices on the SPI. The STM32 was the slave, and it signalled the host when it was ready for data, which probably prevents any conflict. Note also that the BSY problem is somewhat infrequent.