Random hardfault on STM32F4

DerekSavage · ‎2024-03-30

Hello,

I held off on posting because I wanted to ask a clear question once I had found the source of the problem. However, I have been working on this nightly for two weeks now and I still can't pinpoint where my program is going wrong. No matter what I do, getting my STM32F4 to do anything useful causes a fault anywhere between 5 seconds and 5 minutes later.

Code:

- https://github.com/DerekSavage1/Word-Clock-Rev-3

Hardware:

- STM32F411CE on custom circuit board (can provide schematics)

- 24 MHz external crystal

- 32.768 kHz external RTC crystal

The steps that I have taken:
- Commented out sections of code until it worked. The program will only work if the while loop is empty or only declares a variable.

- Converted most arrays to switch statements to avoid memory errors

- Enabled all warnings with -Wpedantic
- Rewrote the matrix logic in a file on my computer without HAL calls and checked with all warning flags and ASan

- Stepped through the code in debug mode. Never found the source of the crashes as it takes multiple loops to cause a fault.

- Ordered an STM32F4 board on Amazon to see whether it will also hard fault on a differently designed board. It should arrive within a few days.

- Increased the stack size from 0x400 to 0x800, then 0x1200, then 0x10000. Same issue.
- Always looked at the fault analyzer and stack trace. They almost always look like this:

#0 HardFault_Handler () at ../Core/Src/stm32f4xx_it.c:87

#1 <signal handler called>

#2 0x00000000 in ?? ()

#3 0x08001324 in activateDigit (digit=113 'q') at ../Drivers/Numeric_Display/Numeric_Display.c:32

Backtrace stopped: previous frame inner to this frame (corrupt stack?)
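When the debugger's backtrace is corrupt like this, the eight words the Cortex-M core pushes on exception entry are often more trustworthy than the unwound frames. The sketch below is one way to read them; `fault_frame_t`, `fault_frame_read`, `addr_in_region` and `thumb_bit_set` are hypothetical names, and the usual way to obtain the stack pointer (testing bit 2 of LR in the handler to choose MSP or PSP) is left out. A stacked PC of 0 with the Thumb bit clear would be consistent with a call through a NULL function pointer, but that is an assumption about this project, not a diagnosis.

```c
#include <stdint.h>

/* Registers the Cortex-M core stacks automatically on exception entry,
 * in ascending address order starting at the active stack pointer. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12;
    uint32_t lr;    /* link register of the interrupted code */
    uint32_t pc;    /* address of the instruction that was interrupted */
    uint32_t xpsr;  /* program status register, bit 24 is the Thumb bit */
} fault_frame_t;

/* Hypothetical helper: copy the eight stacked words into a struct. */
fault_frame_t fault_frame_read(const uint32_t *sp)
{
    fault_frame_t f = { sp[0], sp[1], sp[2], sp[3], sp[4], sp[5], sp[6], sp[7] };
    return f;
}

/* Returns 1 if addr lies inside [base, base + size), else 0.
 * For a flash check on this part one might pass base 0x08000000. */
int addr_in_region(uint32_t addr, uint32_t base, uint32_t size)
{
    return addr >= base && (addr - base) < size;
}

/* Cortex-M only executes Thumb code, so a cleared T bit in the stacked
 * xPSR points at a branch to an even (invalid) target address. */
int thumb_bit_set(uint32_t xpsr)
{
    return (xpsr & (1u << 24)) != 0;
}
```

Logging `pc` and `lr` from this frame on every fault, and checking whether they point into flash, should show whether the faults cluster around one call site or are scattered.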

Some nights I would get an idea: what if it's because of X, or I haven't looked at Y?
However, last night I had only one thing in my loop: a function call. I had tested the function on my own computer with no errors. I tried again today, and the function appeared to work fine; it would only fault when called in conjunction with HAL_RTC_GetTime() and HAL_RTC_GetDate().

While it is possible that I could strip out more code to illustrate a minimum viable fault, this is an example of how little is in my main function:

int main(void)
{
  HAL_Init();
  SystemClock_Config();

  MX_GPIO_Init();
  MX_DMA_Init();
  MX_TIM1_Init();
  MX_RTC_Init();
  MX_TIM3_Init();

  HAL_TIM_Encoder_Start(&htim3, TIM_CHANNEL_ALL); // Start the encoder interface

  while (1)
  {
	HAL_RTC_GetTime(&hrtc, &sTime, RTC_FORMAT_BIN);
	HAL_RTC_GetDate(&hrtc, &sDate, RTC_FORMAT_BIN);

	displayTime(sTime.Hours, sTime.Minutes, color, brightness);
//	DMA_Send(&htim1);
  }
}

Even though I had tested displayTime() on my machine I decided to comment out most of the function and it would still fault:

void displayTime(uint8_t hours, uint8_t minutes, uint32_t color, uint8_t brightness) {
   color = 0x404040;
   //all else is commented out
}

The reason I have been saying "fault" instead of specifying which type of fault is that it, too, is different each time: invalid instruction, stack error, etc. I can get a list of them if needed.
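To keep that list comparable between runs, the raw Configurable Fault Status Register value (SCB->CFSR in CMSIS, at address 0xE000ED28) can be decoded into bit names. The sketch below is a minimal decoder that also runs on a PC; `cfsr_describe` and the table are hypothetical names, and the bit positions are the standard Cortex-M4 ones.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

typedef struct { uint32_t mask; const char *name; } cfsr_bit_t;

/* Cortex-M4 CFSR flags, lowest bit first. */
static const cfsr_bit_t cfsr_bits[] = {
    { 1u << 0,  "IACCVIOL" },  { 1u << 1,  "DACCVIOL" },
    { 1u << 3,  "MUNSTKERR" }, { 1u << 4,  "MSTKERR" },
    { 1u << 5,  "MLSPERR" },   { 1u << 7,  "MMARVALID" },
    { 1u << 8,  "IBUSERR" },   { 1u << 9,  "PRECISERR" },
    { 1u << 10, "IMPRECISERR" }, { 1u << 11, "UNSTKERR" },
    { 1u << 12, "STKERR" },    { 1u << 13, "LSPERR" },
    { 1u << 15, "BFARVALID" }, { 1u << 16, "UNDEFINSTR" },
    { 1u << 17, "INVSTATE" },  { 1u << 18, "INVPC" },
    { 1u << 19, "NOCP" },      { 1u << 24, "UNALIGNED" },
    { 1u << 25, "DIVBYZERO" },
};

/* Writes the names of all set flags into out, separated by spaces, and
 * returns how many known flags were set. Output is truncated if out is
 * too small. */
int cfsr_describe(uint32_t cfsr, char *out, size_t out_len)
{
    int count = 0;
    size_t used = 0;

    if (out_len > 0) {
        out[0] = '\0';
    }
    for (size_t i = 0; i < sizeof cfsr_bits / sizeof cfsr_bits[0]; i++) {
        if ((cfsr & cfsr_bits[i].mask) == 0) {
            continue;
        }
        count++;
        if (used < out_len) {
            int n = snprintf(out + used, out_len - used, "%s%s",
                             used ? " " : "", cfsr_bits[i].name);
            if (n > 0) {
                used += (size_t)n;
            }
        }
    }
    return count;
}
```

If the decoded flags change from crash to crash while the code stays the same, that tends to point away from a single software bug and toward corruption or a hardware cause.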

Is it possible that I have configured something incorrectly or there is an error in my pre-generated code that is creating a memory error? My guess is that the code has been stomped on by an initialization which causes actions like reading encoder values, function calls, and RTC calls to go wrong.
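One way to test the memory-error theory without guessing at stack sizes is to paint the unused stack with a known pattern at startup and later check how much of it survives. This is only a sketch: `STACK_FILL`, `stack_paint` and `stack_unused_words` are hypothetical names, and on the target the painted range must stop below the current stack pointer (the linker symbols for the stack bounds depend on the project's linker script).

```c
#include <stdint.h>
#include <stddef.h>

#define STACK_FILL 0xDEADBEEFu  /* arbitrary marker value, an assumption */

/* Fill a region with the marker. On the target, call this early in main()
 * on the part of the stack that is not yet in use. */
void stack_paint(uint32_t *lowest, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        lowest[i] = STACK_FILL;
    }
}

/* The Cortex-M stack grows downward, so untouched words stay at the low
 * end of the region. Returns how many marker words are still intact. */
size_t stack_unused_words(const uint32_t *lowest, size_t words)
{
    size_t n = 0;
    while (n < words && lowest[n] == STACK_FILL) {
        n++;
    }
    return n;
}
```

If the count never gets close to zero even after a fault, a plain stack overflow becomes an unlikely explanation.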

I have an oscilloscope if you would like me to probe something.

Any help would be greatly appreciated.

AScha.3 · ‎2024-03-30

VCAP should be 4.7 µF.

If you feel a post has answered your question, please click "Accept as Solution".


STOne-32 · ‎2024-03-30

Dear @DerekSavage ,

Maybe I overlooked them in the GitHub files, but can you please share the schematics and PCB layout, in particular the power pins / VCAP with their associated capacitors, and the datasheet for the 32 kHz crystal? I also see that the system clock is set to the PLL using HSI and not HSE.
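The HSI/HSE point matters because the same PLL dividers give a different system clock from the 16 MHz HSI than from a 24 MHz crystal. A minimal sketch of the arithmetic follows; `pll_cfg_t` and `pll_sysclk_hz` are hypothetical names, and the limits in the checks are the STM32F411 values as I understand them from the reference manual, so please verify them there.

```c
#include <stdint.h>

typedef struct { uint32_t m, n, p; } pll_cfg_t;  /* PLLM, PLLN, PLLP dividers */

/* Returns the resulting SYSCLK in Hz, or 0 if a limit is violated.
 * Assumed limits (check RM0383): PLLM 2..63, PLLN 50..432, PLLP in
 * {2,4,6,8}, VCO input 1..2 MHz, VCO output 100..432 MHz, SYSCLK <= 100 MHz. */
uint32_t pll_sysclk_hz(uint32_t src_hz, pll_cfg_t c)
{
    if (c.m < 2u || c.m > 63u) return 0u;
    if (c.n < 50u || c.n > 432u) return 0u;
    if (c.p != 2u && c.p != 4u && c.p != 6u && c.p != 8u) return 0u;

    /* VCO input = src / M, compared without losing the remainder. */
    if (src_hz < 1000000u * c.m || src_hz > 2000000u * c.m) return 0u;

    /* VCO output = src * N / M, computed in 64 bits to avoid overflow. */
    uint64_t vco_out = (uint64_t)src_hz * c.n / c.m;
    if (vco_out < 100000000u || vco_out > 432000000u) return 0u;

    uint64_t sysclk = vco_out / c.p;
    return sysclk <= 100000000u ? (uint32_t)sysclk : 0u;
}
```

With M=12, N=96, P=2 this gives 96 MHz from a 24 MHz HSE but 64 MHz from the 16 MHz HSI, so timing that assumes one source will be off if the other is actually selected.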

Cheers,

ST1

AScha.3 · ‎2024-03-30

Hi,

You didn't say: can you make a small loop that toggles an output (with an LED, or look at it with a scope)?

Does such a simple, small program run fine (for hours)?

Just to be sure it's not a hardware problem, like spikes on the supply...

If that's OK, your problem seems to be the LSE clock.

Try switching to the LSI and see if it changes anything.

- Leave out everything related to the RTC or LSE, to test whether the fault has to do with this.


DerekSavage · ‎2024-03-30

Yes I can. I am at work at the moment, but I will be home in a few hours and will share them.

DerekSavage · ‎2024-03-30

I was able to get home early.
It won't let me upload .SchDoc files so I hope these screenshots will suffice:

AScha.3 · ‎2024-03-30

VCAP should be 4.7 µF.


DerekSavage · ‎2024-03-30

I have one somewhere, let me find it and solder it on.

STOne-32 · ‎2024-03-30

Another useful feature to try is the LSE high drive mode: activate it at startup, before the LSE is switched on. After reset it is set to low drive. This is to rule out a stability issue with your selected crystal and the 0 ohm resistor, which is not necessary and which I have seen as a source of issues in other designs.
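At register level, the high drive setting is the LSEMOD bit (bit 3) of RCC_BDCR on the STM32F411. The sketch below sets it before LSEON, following the order suggested above; `lse_enable_high_drive` is a hypothetical helper, and it takes the register address as a parameter so it can be tried on a PC as well. On the target, writes to RCC_BDCR only take effect after the PWR clock is enabled and the DBP bit in PWR_CR is set, and the code should then wait for LSERDY.

```c
#include <stdint.h>

#define BDCR_LSEON   (1u << 0)  /* LSE oscillator enable */
#define BDCR_LSERDY  (1u << 1)  /* LSE oscillator ready (set by hardware) */
#define BDCR_LSEMOD  (1u << 3)  /* 0 = low power, 1 = high drive */

/* Selects high drive mode and then switches the LSE on.
 * Returns 0 on success, -1 if the LSE was already enabled (in which case
 * the register is left untouched). On the target, pass &RCC->BDCR. */
int lse_enable_high_drive(volatile uint32_t *bdcr)
{
    if (*bdcr & BDCR_LSEON) {
        return -1;
    }
    *bdcr |= BDCR_LSEMOD;  /* choose the drive level first */
    *bdcr |= BDCR_LSEON;   /* then start the oscillator */
    return 0;
}

/* Returns 1 once hardware reports the oscillator as stable. */
int lse_is_ready(const volatile uint32_t *bdcr)
{
    return (*bdcr & BDCR_LSERDY) != 0;
}
```

Comparing the LSE startup time and the fault rate between low and high drive is a cheap way to see whether the crystal circuit is marginal.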

Our goal is to eliminate any hardware issue before moving on to software. Good debugging!

Ciao

ST1

DerekSavage · ‎2024-03-30

I have said many times "I've fixed it this time" only for it to fail some minutes later. That being said, it is behaving as expected right now. I am going to keep it running with my debugger attached, and if the clock is still functioning a few hours from now, this might just be the answer.