2024-05-29 01:40 AM - edited 2024-05-29 02:03 AM
Hi!
I'm working on a STM32H7 running with FreeRTOS.
I'm facing a rather peculiar bug. One specific variable, which is basically a counter, sometimes gets its value corrupted.
This happens inside a specific task, which is actually quite simple, and overall looks something like:
//Task initialization
...
variable_of_interest = 0;
//Task infinite loop
while (1) {
//do some simple stuff
variable_of_interest++;
}
I see that (apparently at random) variable_of_interest sometimes just goes back to the value '1' and stays like that, even after going through the loop again.
First I suspected an overflow, so I checked all memory definitions for the FreeRTOS tasks, as well as the system stack, heap, everything. There seems to be no issue, and all stacks and heaps actually have plenty of free space.
Then I defined the variable as volatile, suspecting that it might be some optimization issue. Didn't work either.
I also disabled cache, also didn't help.
I defined the variable statically, outside the task, so as to be able to see it in the memory map file. I see that it is here:
.bss.variable_of_interest
0x24022618 0x4 _xxx/release/yyy.o
I checked variables around it and before it there is another counter, and after it there is a filepath name which is a string that is just being used in the task, but not being modified whatsoever.
I'm not able to reproduce it when debugging, so I added this basic check to my code to validate some values regarding memory contents:
if (variable_of_interest == 1) { // This condition is met only when the bug appears
uint32_t* variable_address = &variable_of_interest;
uint32_t memory_map_address = 0x24022618;
uint32_t variable_address_data = *((uint32_t*)variable_address );
uint32_t current_variable_of_interest = variable_of_interest;
uint32_t memory_map_address_data = *((uint32_t*)memory_map_address );
printf("variable_address: %p memory_map_address: 0x%08lX\r\n", variable_address , memory_map_address);
printf("variable_address_data: 0x%08lX\r\n", variable_address_data );
printf("current_variable_of_interest: 0x%08lX\r\n", current_variable_of_interest);
printf("memory_map_address_data: 0x%08lX\r\n", memory_map_address_data);
}
Until the bug happens, both addresses are the same, as are the three data values. Then I was getting:
variable_address: 0x24022618 memory_map_address: 0x24022618
variable_address_data: 0x00000001
current_variable_of_interest: 0x00000001
memory_map_address_data: 0x00000003
3 is the actual correct value. This means that the memory content of the variable is actually correct; however, my program is somehow getting the value from somewhere else and thinks it is 1. I'm aware that optimizations during compilation might make this last piece of check code do something different from what I'm expecting. In fact, with all optimization disabled I have so far not been able to reproduce the bug. However, if the volatile keyword is not helping, I'm not sure how optimization could be causing this.
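One way to rule out the compiler reusing a value it already holds in a register is to make every read in the check a volatile access. In the check above, the pointer is a plain uint32_t*, so even if the variable itself is declared volatile, reads through that pointer are not. The following is only a minimal sketch under stated assumptions: the counter is a file-scope uint32_t, and the names counter_snapshot_t, snapshot_counter and print_snapshot are hypothetical, not taken from the original project. On the target, the second read could go through the literal address from the map file instead.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the counter from the post; volatile forces a real load
   from memory for every access made through this name. */
static volatile uint32_t variable_of_interest;

/* Hypothetical helper type: two independent loads of the same location. */
typedef struct {
    uint32_t via_name;    /* load through the variable's name */
    uint32_t via_pointer; /* load through a volatile-qualified pointer */
} counter_snapshot_t;

static counter_snapshot_t snapshot_counter(void)
{
    /* The pointer keeps the volatile qualifier, so the dereference below
       cannot be replaced by a value the optimizer already knows. */
    volatile uint32_t *p = &variable_of_interest;
    counter_snapshot_t s;

    s.via_name = variable_of_interest;
    s.via_pointer = *p;
    return s;
}

static void print_snapshot(const counter_snapshot_t *s)
{
    assert(s != NULL);
    /* PRIX32 avoids assuming that uint32_t is unsigned long. */
    printf("via_name: 0x%08" PRIX32 " via_pointer: 0x%08" PRIX32 "\r\n",
           s->via_name, s->via_pointer);
}
```

If the two fields ever differ on the target with this version, something else (another task, an interrupt, or a DMA transfer) wrote to the location between the two loads; if they never differ, the earlier mismatch was more likely an artifact of how the optimized check was compiled.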
Could it be something related to data getting corrupted in the FreeRTOS stack? I'm sure, though, that the task is getting all the stack memory it needs.
What else could cause this? Is there anything I might try to do differently? Sometimes just making many changes in the code gets rid of the bug, but that does not help me identify the root cause...
Thanks!
2024-05-29 08:05 AM
Hello @TVare.1
The issue you're having with variable_of_interest getting corrupted is tricky. It seems to happen at random during debugging. Here are a few potential causes for this issue:
If the issue persists, could you please share the code used to reproduce it, to allow a better analysis of the problem?
2024-05-30 12:21 AM
Hi,
I already checked the individual stack size and high water mark of the tasks, and all of them are fine, with a couple of thousand bytes free even when the bug happens.
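For context on what the high water mark can and cannot show: FreeRTOS fills a task's stack with the byte 0xA5 at creation, and uxTaskGetStackHighWaterMark() reports how much of that pattern is still intact at the far end of the stack, in stack words rather than bytes on a 32-bit port. The host-runnable sketch below imitates the idea on a plain array. It assumes a stack that grows downward, so the low addresses are used last, and paint_stack and count_untouched are hypothetical names, not FreeRTOS APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FILL_BYTE 0xA5u /* same pattern value FreeRTOS uses to paint stacks */

/* Hypothetical helper: paint the whole stack buffer before the task runs. */
void paint_stack(uint8_t *stack, size_t size)
{
    assert(stack != NULL);
    memset(stack, FILL_BYTE, size);
}

/* Hypothetical helper: count the bytes at the low end of the buffer that
   still hold the pattern. The scan stops at the first byte that differs,
   so it measures the deepest point that was ever overwritten. */
size_t count_untouched(const uint8_t *stack, size_t size)
{
    size_t n = 0;

    assert(stack != NULL);
    while (n < size && stack[n] == FILL_BYTE) {
        n++;
    }
    return n;
}
```

A healthy mark only shows that the task's own stack usage never reached the low end of its stack. It says nothing about writes that land outside the stack, for example a buffer overrun into a neighbouring .bss variable.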
Regarding optimizations: yes, when removing all of them I was not able to reproduce the bug. However, this is not helpful, as I do need the optimizations to fit everything in memory and reach performance goals.
At the moment, the workaround has been to move some dynamically allocated buffers outside the tasks. Since doing this, the bug has not appeared again (yet). But since I have not identified the original root cause, I'm not fully satisfied with the solution, because I don't know if it's really solved or if it has been a coincidence.
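Since relocating the buffers changed the behaviour, one possible way to tell a real fix from a coincidence is to surround the counter with guard words and check them periodically: if a neighbouring buffer overruns, a guard should change before or together with the counter. This is only a sketch with hypothetical names (guarded_counter_t, GUARD_PATTERN, guarded_counter_init, guards_intact). It relies on C keeping struct members in declaration order, and it will not catch a stray write that lands on the counter alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GUARD_PATTERN 0xDEADBEEFu /* arbitrary marker value, hypothetical */

/* The counter sits between two guard words of the same width, so there
   is no padding between the three members. */
typedef struct {
    volatile uint32_t guard_before;
    volatile uint32_t counter;
    volatile uint32_t guard_after;
} guarded_counter_t;

void guarded_counter_init(guarded_counter_t *g)
{
    assert(g != NULL);
    g->guard_before = GUARD_PATTERN;
    g->counter = 0;
    g->guard_after = GUARD_PATTERN;
}

/* Returns true while both guard words still hold the marker value. */
bool guards_intact(const guarded_counter_t *g)
{
    assert(g != NULL);
    return g->guard_before == GUARD_PATTERN &&
           g->guard_after == GUARD_PATTERN;
}
```

Calling guards_intact() once per loop iteration and logging the first failure would narrow down when a neighbour is overwritten. Moving the original buffers back while the guards are in place would show whether they were the actual source.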
At the moment I'm not able to share the code, but I'll try to share more information if this issue gets bigger.
Thanks