2021-09-01 03:25 AM
We are seeing the CPU stall when writing to the internal FLASH memory:
taskENTER_CRITICAL();
__HAL_FLASH_CLEAR_FLAG(FLASH_FLAG_EOP | FLASH_FLAG_OPERR | FLASH_FLAG_PROGERR | FLASH_FLAG_WRPERR | FLASH_FLAG_PGAERR | FLASH_FLAG_SIZERR | FLASH_FLAG_PGSERR | FLASH_FLAG_MISERR | FLASH_FLAG_FASTERR | FLASH_FLAG_RDERR | FLASH_FLAG_OPTVERR | FLASH_FLAG_PEMPTY);
// Search me why this bit is checked by STM HAL LIBRARY prior to writing, but it should be 0, otherwise writing cannot happen (thus, clear the bit)
SET_BIT (FLASH->SR, (FLASH_SR_PEMPTY));
// Loop writing until end address is reached
int bufferIndex = 0;
while((_address < _endAddress) && (FLASHStatus == HAL_OK))
{
uint64_t data;
data = (uint64_t)buffer[bufferIndex++];
data = data | (uint64_t)buffer[bufferIndex++] << 32;
FLASHStatus = HAL_FLASH_Program(FLASH_TYPEPROGRAM_DOUBLEWORD, _address, data);
// 64bit data register requires increment by 8 for next word address
_address = _address + 8;
}
if (FLASHStatus != HAL_OK)
{
returnValue = ERROR;
}
}
taskEXIT_CRITICAL();
This function leads to a stalled CPU (it does not recover either; the watchdog has to reset the CPU) when my UART logger is running in parallel. The UART logger is a dedicated task that prints strings via UART1, interrupt-based (TXE).
In my code, I disable all interrupts (taskENTER_CRITICAL) prior to accessing the FLASH. But what I see in the registers when the stall occurs (I access the CPU via STM CubeProgrammer after it has happened) is that the TXE and TC bits in UART1 are set.
So my guess is that the logger is busy and the TXE interrupt is serviced (putting a byte on the TX pin via the peripheral). The ISR is left, and the FLASH write function disables all interrupts. Right after that, the UART peripheral finishes putting the byte out and raises TXE and TC. What happens at that point?
We are already in contact with an STM field engineer, but up to now, to no avail. This is really driving me nuts.
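On the "what happens at that point" question, a general note: with BASEPRI-based masking on a Cortex-M core, an interrupt that is raised while it is masked is not lost. It stays pending and is taken once the mask is lifted again. Below is a minimal host-side sketch of that rule, assuming the FreeRTOS port implements taskENTER_CRITICAL() by raising BASEPRI (as the Cortex-M3/M4 ports do). The function names are hypothetical, and priorities are treated as plain numbers 0..15, ignoring the shift into the implemented priority bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of Cortex-M BASEPRI masking; not a FreeRTOS or CMSIS API.
 * BASEPRI == 0 means "no masking". Otherwise every interrupt whose priority
 * value is numerically >= BASEPRI is held pending until BASEPRI is lowered. */
bool irq_is_masked(uint8_t basepri, uint8_t irq_priority)
{
    /* Assumption: 4 implemented priority bits, so values run from 0 to 15. */
    assert(basepri <= 15u && irq_priority <= 15u);
    return (basepri != 0u) && (irq_priority >= basepri);
}

/* Returns true if a pending interrupt can be taken right now. */
bool irq_can_run(uint8_t basepri, uint8_t irq_priority, bool pending)
{
    return pending && !irq_is_masked(basepri, irq_priority);
}
```

With BASEPRI raised to 5, a pending interrupt at priority 6 cannot run; as soon as BASEPRI goes back to 0 it can. So a TXE/TC event during the critical section should simply be serviced after taskEXIT_CRITICAL().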
2021-09-01 11:11 AM
Where precisely does it "stall"? Show a stack trace. If CPU is powered, it's probably executing code somewhere. Do you unlock FLASH prior to this somewhere?
2021-09-01 11:55 PM
Yes, unlock of FLASH is done. When not using the logger, everything works just fine.
I will try to get a stack trace later; I am currently working on a different bug that needs fixing first. I also have to figure out exactly how to get a stack trace, as I have never done that on this platform.
Thanks for your reply anyway, I will come back to it as soon as I can
2021-09-10 03:17 AM
OK, I got my hands on a JLINK debugger, set the debugging level to MAX and let the CPU run into the problem. I cannot actually put a breakpoint on the issue, so I let the CPU run into the stall (whatever that is) and hit the PAUSE button.
This is the stack that the debug viewer puts out:
Thread #9 536870932 (DEVTask : Running [P: 1]) (Suspended : Signal : SIGTRAP:Trace/breakpoint trap)
USART1_IRQHandler() at isr.c:122 0x80075f8
<signal handler called>() at 0xfffffffd
prvPortStartFirstTask() at port.c:267 0x8006c30
xPortStartScheduler() at port.c:379 0x8006ef6
The issue comes from the clear (erase) function, not the write as mentioned earlier:
taskENTER_CRITICAL();
__HAL_FLASH_CLEAR_FLAG(FLASH_FLAG_EOP | FLASH_FLAG_OPERR | FLASH_FLAG_PROGERR | FLASH_FLAG_WRPERR | FLASH_FLAG_PGAERR | FLASH_FLAG_SIZERR | FLASH_FLAG_PGSERR | FLASH_FLAG_MISERR | FLASH_FLAG_FASTERR | FLASH_FLAG_RDERR | FLASH_FLAG_OPTVERR | FLASH_FLAG_PEMPTY);
SET_BIT (FLASH->SR, (FLASH_SR_PEMPTY));
// The current CPU has no separate banks, so FLASH_BANK_1 is passed as the bank parameter
FLASH_EraseInitTypeDef pEraseInit;
pEraseInit.Banks = FLASH_BANK_1;
pEraseInit.NbPages = 1;
pEraseInit.Page = page;
pEraseInit.TypeErase = FLASH_TYPEERASE_PAGES;
uint32_t PageError;
HAL_StatusTypeDef result;
result = HAL_FLASHEx_Erase(&pEraseInit, &PageError);
taskEXIT_CRITICAL();
LOGGER_ERROR("Error clearing: %X", HAL_FLASH_GetError());
I have put breakpoints on #15 and #19 - while #15 is reachable, #19 is not. When I hit the breakpoint on #15 and step further, the return from HAL_FLASHEx_Erase is HAL_OK. This indicates that the FLASH page has been cleared and clearing is actually finished (the busy flag is polled within that function)
The ST field engineer mentioned the following yesterday:
"After writing the last data in the USART_TDR register, it is mandatory to wait for TC=1
before disabling the USART or causing the microcontroller to enter the low-power mode"
I do not disable the USART; I only mask the interrupts via BASEPRI and re-enable them afterwards.
BASEPRI is 0 at the moment I hit PAUSE, so the interrupts had been re-enabled.
The USART1 ISR register is 0x006210DA
I think that there is an interrupt active that is not handled.
I will update later on with more information
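For reference, the ISR value quoted above can be decoded bit by bit. The sketch below assumes the low eight USART_ISR bits are laid out as PE, FE, NE, ORE, IDLE, RXNE, TC, TXE (bits 0 to 7), which is the layout on the STM32 families that have separate ISR/ICR registers; the exact part is not stated in this thread, so check it against the reference manual. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Names of USART_ISR bits 0..7 (assumed layout, see the note above). */
static const char *const flag_names[8] = {
    "PE", "FE", "NE", "ORE", "IDLE", "RXNE", "TC", "TXE"
};

/* Returns the name of one of the low eight flag bits. */
const char *usart_isr_flag_name(unsigned bit)
{
    assert(bit < 8u);
    return flag_names[bit];
}

/* Returns only the receive error flags: PE, FE, NE, ORE (bits 0..3). */
uint32_t usart_isr_rx_errors(uint32_t isr)
{
    return isr & 0x0Fu;
}

/* Prints the set flags among bits 0..7 and returns how many are set. */
int usart_isr_print_flags(uint32_t isr)
{
    int count = 0;
    for (unsigned bit = 0; bit < 8u; ++bit) {
        if ((isr & (1u << bit)) != 0u) {
            printf("%s ", usart_isr_flag_name(bit));
            ++count;
        }
    }
    printf("\n");
    return count;
}
```

Calling usart_isr_print_flags(0x006210DA) lists FE, ORE, IDLE, TC and TXE, so under this layout the value shows a framing error and an overrun error flagged while RXNE is clear.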
2021-09-10 03:28 AM
SysTick?
Does the USART overrun?
It could continuously re-enter the USART IRQ handler if error flags are set and never cleared.
See overrun, framing, parity, noise flags on reception.
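A rough sketch of what clearing those flags in the handler could look like. It assumes a USART with a write-1-to-clear ICR register whose PECF, FECF, NCF and ORECF bits sit at the same positions (0 to 3) as the PE, FE, NE and ORE flags in ISR. The struct is a hypothetical stand-in so the snippet builds on a host; on target it would be the real USART1 register block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two USART registers used here. */
typedef struct {
    volatile uint32_t ISR; /* status flags (read) */
    volatile uint32_t ICR; /* flag clear register (write 1 to clear) */
} usart_regs_t;

#define RX_ERROR_MASK 0x0Fu /* PE | FE | NE | ORE, bits 0..3 */

/* Clears any pending receive error flags and returns the ones that were set.
 * Without this, a flagged error with its interrupt enabled keeps the IRQ
 * asserted and the handler is re-entered as soon as it returns. */
uint32_t usart_clear_rx_errors(usart_regs_t *usart)
{
    assert(usart != NULL);
    uint32_t errors = usart->ISR & RX_ERROR_MASK;
    if (errors != 0u) {
        usart->ICR = errors; /* same bit positions as in ISR */
    }
    return errors;
}
```

Something like this would be called near the top of USART1_IRQHandler(), before the normal TXE/RXNE handling.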
2021-09-10 05:24 AM
The overrun error was the problem. I just do not understand WHY it occurs.
The USART does not receive anything; the RX line is not even connected to anything.
Yes, turning off RXNE is the simplest way to get rid of the issue, but I'd like to understand why this overrun error appears all of a sudden.
2021-09-10 06:07 AM
Found it
The UART PCB routing appears to be poor - The RX is reading parts of the TX data when the pin is floating.
2021-09-10 08:51 AM
RX and TX both need external pullups to prevent this. They should never be floating.
2021-09-10 09:13 AM
Or, more critically, RXNE shouldn't be enabled on a dead port, and perhaps the RX side of the U(S)ART shouldn't be enabled at all, as the two sides can be enabled independently.
Watch the power routing, and make sure there is sufficient bulk capacitance. The Write/Erase uses a charge pump to develop programming voltages internally, and might be stressing things in ways normal operation does not.
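As an illustration of turning the receive side off, here is a small sketch that clears RE and RXNEIE in CR1 and EIE in CR3. The bit positions (RE = bit 2, RXNEIE = bit 5, EIE = bit 0 of CR3) are assumed from the usual STM32 USART layout and should be checked against the reference manual for the actual part. The struct is a hypothetical host-side stand-in, not the CMSIS definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the USART control registers used here. */
typedef struct {
    volatile uint32_t CR1;
    volatile uint32_t CR3;
} usart_ctrl_t;

#define CR1_RE      (1u << 2) /* receiver enable */
#define CR1_RXNEIE  (1u << 5) /* RXNE (and overrun) interrupt enable */
#define CR3_EIE     (1u << 0) /* error interrupt enable */

/* Leaves the transmitter untouched and switches off everything that can
 * raise an interrupt from the unused RX side. */
void usart_make_tx_only(usart_ctrl_t *usart)
{
    assert(usart != NULL);
    usart->CR1 &= ~(CR1_RE | CR1_RXNEIE);
    usart->CR3 &= ~CR3_EIE;
}
```

The transmit-related bits (UE, TE, TXEIE) are deliberately left as they are, so a TX-only logger keeps working.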
2021-09-10 10:38 AM
The hardware designer routed TX and RX directly from the MCU IOs to the onboard connector without any pull-up or pull-down resistors. I have to admit that I assumed them to be there.
I have now added them in the MCU (internal pull resistors on the pins).
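For anyone finding this later, enabling an internal pull-up comes down to the two-bit field per pin in the GPIO PUPDR register (00 = none, 01 = pull-up, 10 = pull-down). A minimal sketch of the bit manipulation follows. The pin number used in the note below is hypothetical, since the thread does not say which pins carry USART1, and the function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define GPIO_PULL_NONE 0u
#define GPIO_PULL_UP   1u
#define GPIO_PULL_DOWN 2u

/* Returns the PUPDR value with the two-bit field of `pin` set to `pull`,
 * leaving the fields of all other pins unchanged. */
uint32_t gpio_pupdr_with_pull(uint32_t pupdr, unsigned pin, uint32_t pull)
{
    assert(pin < 16u && pull <= GPIO_PULL_DOWN);
    uint32_t shift = pin * 2u;
    pupdr &= ~(3u << shift); /* clear the old setting for this pin */
    pupdr |= pull << shift;  /* write the new one */
    return pupdr;
}
```

On target this would be used as a read-modify-write of the port's PUPDR register, for example with pin 10 and GPIO_PULL_UP if RX happened to sit on pin 10. With the HAL, setting the Pull field of the GPIO init structure to GPIO_PULLUP achieves the same thing.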