2024-03-30 12:34 PM
Hello,
I held off on posting because I wanted to ask a clear question once I had found the source of the problem. However, I have been working on this nightly for two weeks now and I still can't pinpoint what is going wrong in my program, or where. No matter what I do, getting my STM32F4 to do anything useful causes a fault anywhere from 5 seconds to 5 minutes later.
Code:
- https://github.com/DerekSavage1/Word-Clock-Rev-3
Hardware:
- STM32F411CE on custom circuit board (can provide schematics)
- 24 MHz external crystal
- 32.768 kHz external RTC crystal
The steps that I have taken:
- Commented out sections of code until it worked. The program will only work if the while loop is empty or only declares a variable.
- Converted most arrays to switch statements to avoid memory errors
- Enabled all warnings with -Wpedantic
- Rewrote the matrix logic in a file on my computer without HAL calls and checked with all warning flags and ASan
- Stepped through the code in debug mode. Never found the source of the crashes as it takes multiple loops to cause a fault.
- Ordered an STM32F4 board on Amazon to see if it will hard fault on a differently designed board. It should arrive within a few days.
- Increased the stack size from 0x400 to 0x800, then 0x1200, then 0x10000. Same issue.
- Always looked at fault analyzer and stack trace. They almost always look like this:
#0 HardFault_Handler () at ../Core/Src/stm32f4xx_it.c:87
#1 <signal handler called>
#2 0x00000000 in ?? ()
#3 0x08001324 in activateDigit (digit=113 'q') at ../Drivers/Numeric_Display/Numeric_Display.c:32
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
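One way to check whether the larger stack sizes really rule out an overflow is to "paint" the stack region with a known pattern at startup and later count how much of the pattern survives. The sketch below is a minimal, hypothetical illustration of that technique: the names `stack_paint`, `stack_unused_words`, and `STACK_PAINT` are not from the project, and on the target you would have to pass in the stack bounds from your linker script and paint only the part below the current stack pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fill pattern; any value unlikely to occur naturally works. */
#define STACK_PAINT 0xDEADBEEFu

/* Fill a stack region with the pattern. Call this early, and only on
 * memory that is not yet in use (below the current stack pointer). */
void stack_paint(uint32_t *base, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        base[i] = STACK_PAINT;
    }
}

/* Count the words that still hold the pattern, starting at the lowest
 * address. Cortex-M stacks grow downward, so this is the headroom that
 * has never been touched. A result of 0 suggests an overflow. */
size_t stack_unused_words(const uint32_t *base, size_t words)
{
    size_t n = 0;
    while (n < words && base[n] == STACK_PAINT) {
        n++;
    }
    return n;
}
```

If the untouched headroom stays large right up to the fault, a stack overflow becomes a less likely explanation and the corruption is probably coming from somewhere else.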
Some nights I would get an idea: what if it's because of X, or I haven't looked at Y?
However, last night I had only one thing in my loop: a function call. I had tested the function on my own computer with no errors. I tried again today and the function appeared to work fine; it only faults when it is called in conjunction with HAL_RTC_GetTime() and HAL_RTC_GetDate().
While it is possible that I could strip out more code to illustrate a minimum viable fault, this is an example of how little is in my main function:
int main(void)
{
    HAL_Init();
    SystemClock_Config();
    MX_GPIO_Init();
    MX_DMA_Init();
    MX_TIM1_Init();
    MX_RTC_Init();
    MX_TIM3_Init();
    HAL_TIM_Encoder_Start(&htim3, TIM_CHANNEL_ALL); // Start the encoder interface
    while (1)
    {
        HAL_RTC_GetTime(&hrtc, &sTime, RTC_FORMAT_BIN);
        HAL_RTC_GetDate(&hrtc, &sDate, RTC_FORMAT_BIN);
        displayTime(sTime.Hours, sTime.Minutes, color, brightness);
        // DMA_Send(&htim1);
    }
}
Even though I had tested displayTime() on my machine, I decided to comment out most of the function, and it would still fault:
void displayTime(uint8_t hours, uint8_t minutes, uint32_t color, uint8_t brightness) {
    color = 0x404040;
    // all else is commented out
}
The reason I have been saying "fault" instead of naming a specific fault type is that the type, too, is different each time: invalid instruction, stack error, etc. I can get a list of them if needed.
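Since the fault type changes from run to run, it can help to log the Configurable Fault Status Register (SCB->CFSR, at address 0xE000ED28 on Cortex-M4) on every fault and decode it the same way each time. Below is a minimal sketch of such a decoder; `decode_cfsr` and the table are hypothetical helpers, not part of the HAL, and the bit names follow the ARMv7-M layout as I understand it. On the target you would pass in the value read from SCB->CFSR inside HardFault_Handler.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    uint32_t    mask;
    const char *name;
} cfsr_flag_t;

/* CFSR = MemManage (bits 0-7) | BusFault (bits 8-15) | UsageFault (bits 16-31) */
static const cfsr_flag_t cfsr_flags[] = {
    {1u << 0,  "IACCVIOL"},    {1u << 1,  "DACCVIOL"},
    {1u << 3,  "MUNSTKERR"},   {1u << 4,  "MSTKERR"},
    {1u << 5,  "MLSPERR"},     {1u << 7,  "MMARVALID"},
    {1u << 8,  "IBUSERR"},     {1u << 9,  "PRECISERR"},
    {1u << 10, "IMPRECISERR"}, {1u << 11, "UNSTKERR"},
    {1u << 12, "STKERR"},      {1u << 13, "LSPERR"},
    {1u << 15, "BFARVALID"},   {1u << 16, "UNDEFINSTR"},
    {1u << 17, "INVSTATE"},    {1u << 18, "INVPC"},
    {1u << 19, "NOCP"},        {1u << 24, "UNALIGNED"},
    {1u << 25, "DIVBYZERO"},
};

/* Write the names of all set flags into out, separated by spaces.
 * Returns the number of flags that were set. */
int decode_cfsr(uint32_t cfsr, char *out, size_t out_len)
{
    int count = 0;
    if (out_len == 0) {
        return 0;
    }
    out[0] = '\0';
    for (size_t i = 0; i < sizeof cfsr_flags / sizeof cfsr_flags[0]; i++) {
        if (cfsr & cfsr_flags[i].mask) {
            size_t used = strlen(out);
            snprintf(out + used, out_len - used, "%s%s",
                     count ? " " : "", cfsr_flags[i].name);
            count++;
        }
    }
    return count;
}
```

For example, a value of 0x00010000 decodes to UNDEFINSTR (an undefined instruction). A collection of decoded values that share no pattern across runs would be consistent with a hardware or power problem more than with a single software bug.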
Is it possible that I have configured something incorrectly or there is an error in my pre-generated code that is creating a memory error? My guess is that the code has been stomped on by an initialization which causes actions like reading encoder values, function calls, and RTC calls to go wrong.
I have an oscilloscope if you would like me to probe something.
Any help would be greatly appreciated.
2024-03-30 01:17 PM
Dear @DerekSavage ,
Maybe I overlooked them in the GitHub files, but can you please share the schematics and PCB layout, in particular the power pins, VCAP and its associated capacitors, and the datasheet for the 32 kHz crystal? I also see that the system clock is set to the PLL using HSI and not HSE.
Cheers,
ST1
2024-03-30 01:35 PM
Hi,
one thing you didn't mention: can you run a small loop that just toggles an output (with an LED, or watch it with a scope)?
Does such a simple, small program run fine (for hours)?
Just to be sure it's not a hardware problem, like spikes on the supply...
If that is OK, your problem seems to be the LSE clock.
Try switching to the LSI and see if it changes anything.
- Leave out anything related to the RTC or LSE, to test whether the fault has to do with this.
2024-03-30 01:37 PM
Yes I can. I am at work at the moment, but I will be home in a few hours and will share them.
2024-03-30 02:01 PM
I was able to get home early.
It won't let me upload .SchDoc files so I hope these screenshots will suffice:
2024-03-30 02:09 PM
VCAP should be 4.7 µF.
2024-03-30 02:23 PM
I have one somewhere, let me find it and solder it on.
2024-03-30 02:50 PM
Another useful feature to activate at startup, before the LSE is turned on, is high drive mode; after reset it is set to low drive. I suspect a possible stability issue with your selected crystal and the 0 ohm resistor, which is not necessary and which I have seen as a source of issues in other designs.
Our goal is to eliminate any hardware issue before moving on to software. Good debugging!
Ciao
ST1
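As a rough illustration of the high drive suggestion above, the sketch below sets the LSE mode bit in the backup domain control register before the oscillator is started. Everything here is an assumption to verify against the STM32F411 reference manual: the bit positions (LSEON at bit 0, LSEMOD at bit 3 of RCC_BDCR) come from my reading of it, and `lse_select_high_drive` is a hypothetical helper, not a HAL function. On the target you would pass `&RCC->BDCR`, after enabling write access to the backup domain (for example with HAL_PWR_EnableBkUpAccess()).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed RCC_BDCR bit positions for the STM32F411; check the manual. */
#define BDCR_LSEON  (1u << 0)  /* LSE oscillator enable */
#define BDCR_LSEMOD (1u << 3)  /* 0 = low-power drive, 1 = high drive */

/* Select LSE high drive mode. Refuses to change the mode while the LSE
 * is already enabled, following the advice to do this before the LSE
 * is turned on. Returns true if the mode bit is set afterwards. */
bool lse_select_high_drive(volatile uint32_t *bdcr)
{
    if (*bdcr & BDCR_LSEON) {
        return false;  /* LSE already on: leave the register untouched */
    }
    *bdcr |= BDCR_LSEMOD;
    return (*bdcr & BDCR_LSEMOD) != 0;
}
```

Taking the register as a pointer keeps the helper testable on a PC with an ordinary variable standing in for RCC_BDCR.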
2024-03-30 04:00 PM
I have said "I've fixed it this time" many times before, only for it to fail some minutes later. That being said, it is behaving as expected right now. I am going to keep it on with my debugger active, and if the clock is still functioning a few hours from now, this might just be the answer.