2018-08-21 05:27 AM
I am trying to use FLASH to emulate EEPROM on an STM32F767, based on AN3969. My FLASH is organised as a single bank, so when executing from FLASH I would expect execution to stall while writing or erasing. The longest stall is a sector erase, which might take 0.5 s for a 32 KB sector at PSIZE=32 according to the data sheet.
However, with APB1 clocking at 36 MHz, I need to service the WWDG at least every 58 ms, so a stall of 0.5 s causes a WWDG reset, which is what I see.
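For reference, the 58 ms figure is consistent with the WWDG timeout formula, t = (1 / PCLK1) × 4096 × 2^WDGTB × (T[5:0] + 1). A minimal sketch of that arithmetic, assuming the largest prescaler (WDGTB = 3) and the largest counter value (T[5:0] = 63); the function name is hypothetical and only illustrates the calculation:

```c
#include <stdint.h>

/* WWDG timeout in microseconds (rounded down):
 *   t = (1 / PCLK1) * 4096 * 2^WDGTB * (T[5:0] + 1)
 * pclk1_hz : APB1 clock in Hz
 * wdgtb    : prescaler field, 0..3
 * t_low6   : low six bits of the down-counter, 0..63 */
uint32_t wwdg_timeout_us(uint32_t pclk1_hz, uint32_t wdgtb, uint32_t t_low6)
{
    uint64_t pclk_cycles = 4096ULL * (1ULL << wdgtb) * (uint64_t)(t_low6 + 1U);
    return (uint32_t)((pclk_cycles * 1000000ULL) / pclk1_hz);
}
```

With PCLK1 = 36 MHz, WDGTB = 3 and T[5:0] = 63 this gives roughly 58.25 ms, far shorter than a 0.5 s sector erase.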
I have tried placing the relevant routines in ITCM RAM, but I still get the WWDG reset. I have disabled all interrupts, so my FLASH_WaitForLastOperation() polls for a pending SysTick interrupt to increment its timeout counter. Similarly, I tried feeding the WWDG whenever a WWDG interrupt is pending.
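A minimal sketch of that kind of polling, written as a single step so it can be exercised off-target. Several things here are assumptions for illustration: the function name, the reload value, passing the registers in as pointers (on the target they would be the addresses of FLASH->SR, SysTick->CTRL, WWDG->SR and WWDG->CR), and polling SysTick's COUNTFLAG rather than an interrupt-pending bit.

```c
#include <stdint.h>

#define FLASH_SR_BSY_MASK      (1UL << 16) /* FLASH_SR.BSY: operation in progress */
#define SYSTICK_COUNTFLAG_MASK (1UL << 16) /* SysTick CTRL.COUNTFLAG */
#define WWDG_SR_EWIF_MASK      (1UL << 0)  /* WWDG_SR.EWIF: counter reached 0x40 */

typedef enum { POLL_BUSY, POLL_DONE, POLL_TIMEOUT } poll_result_t;

/* One iteration of a wait loop that runs with interrupts disabled.
 * Hypothetical helper: registers are passed as pointers for testability. */
poll_result_t flash_poll_step(volatile uint32_t *flash_sr,
                              volatile uint32_t *systick_ctrl,
                              volatile uint32_t *wwdg_sr,
                              volatile uint32_t *wwdg_cr,
                              uint32_t wwdg_reload,
                              uint32_t *ticks_left)
{
    /* Early-wakeup flag set: refresh the watchdog, then clear the flag. */
    if ((*wwdg_sr & WWDG_SR_EWIF_MASK) != 0U) {
        *wwdg_cr = wwdg_reload; /* reload the down-counter */
        *wwdg_sr = 0U;          /* EWIF is cleared by writing 0 */
    }

    /* COUNTFLAG reads 1 if SysTick reached zero since the last read;
     * real hardware clears it on read. */
    if ((*systick_ctrl & SYSTICK_COUNTFLAG_MASK) != 0U && *ticks_left > 0U) {
        (*ticks_left)--;
    }

    if ((*flash_sr & FLASH_SR_BSY_MASK) == 0U) {
        return POLL_DONE;
    }
    if (*ticks_left == 0U) {
        return POLL_TIMEOUT;
    }
    return POLL_BUSY;
}
```

On the target, the caller would loop on this step until it returns something other than POLL_BUSY, and the step itself (plus anything it calls) would also have to live in RAM.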
Does anyone know of any "gotchas" as to why I am still getting a stall, even though I'm executing from ITCM RAM?
My current work-around is to slow APB1 so much that the WWDG only needs servicing roughly once a second, and to do that I needed to slow AHB a long way as well. I don't like doing this but it seems to work (even if I leave my FLASH-writing code in FLASH).
One suspected bug I encountered is that the routine EE_VerifyPageFullyErased() seems to check from the specified starting address up to PAGE0_END_ADDRESS - so it cannot work when checking PAGE1.
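One possible way around this, sketched under assumptions rather than taken from ST's code: pass the extent of the page being checked instead of relying on a fixed end address. The function name and signature below are hypothetical; on the target the arguments would presumably be derived from the page's base address and the page size.

```c
#include <stddef.h>
#include <stdint.h>

#define ERASED_HALFWORD 0xFFFFu /* erased FLASH reads back as all ones */

/* Returns 1 if every 16-bit word in [start, start + n_halfwords) reads as
 * erased, otherwise 0. The caller supplies the range, so the same routine
 * works for any page. */
int page_fully_erased(const volatile uint16_t *start, size_t n_halfwords)
{
    for (size_t i = 0; i < n_halfwords; i++) {
        if (start[i] != ERASED_HALFWORD) {
            return 0;
        }
    }
    return 1;
}
```

Unlike the routine described above, this sketch reads every half-word in the range; whether a coarser stride is acceptable depends on how the emulation writes its records.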
Any comments?
2018-08-21 12:16 PM
Any read from FLASH during programming will stall the processor. This includes reads of (constant) data. Are you sure the routine moved to ITCM RAM does not read constants from FLASH? (Hint: examine the disassembly or assembler listing, or step through in the debugger and observe the target addresses from which each ldr reads.)
On the F7, I am not sure whether the processor and cache may perform a significant amount of prefetch that would run in parallel with execution from ITCM RAM for long enough to cause this sort of problem, but if you find no other culprit, it may be worth investigating this path too.
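When checking ldr targets this way, a small helper that classifies an address by region can make the listing quicker to scan. This is only a sketch: the address ranges are assumptions for a 2 MB STM32F76x part (16 KB ITCM RAM at 0x00000000, FLASH on the ITCM interface at 0x00200000, FLASH on AXIM at 0x08000000) and should be checked against the reference manual; the names are hypothetical.

```c
#include <stdint.h>

typedef enum {
    REGION_ITCM_RAM,   /* safe to read during a FLASH operation */
    REGION_FLASH_ITCM, /* FLASH seen through the ITCM interface */
    REGION_FLASH_AXIM, /* FLASH seen through the AXIM interface */
    REGION_OTHER       /* SRAM, peripherals, etc. */
} mem_region_t;

/* Assumed memory map for a 2 MB STM32F76x device; verify before relying on it. */
mem_region_t classify_address(uint32_t addr)
{
    if (addr <= 0x00003FFFu) {
        return REGION_ITCM_RAM;
    }
    if (addr >= 0x00200000u && addr <= 0x003FFFFFu) {
        return REGION_FLASH_ITCM;
    }
    if (addr >= 0x08000000u && addr <= 0x081FFFFFu) {
        return REGION_FLASH_AXIM;
    }
    return REGION_OTHER;
}
```

Any literal-pool load or branch target that lands in either FLASH region during an erase would be a candidate for the stall.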
JW
2018-08-22 02:39 AM
The source-code around the fault (from stm32f7xx_hal_flash_ex.c) is as below. I have added the suffix _RAM to routines that are placed in ITCM RAM:
/* Erase by sector by sector to be done*/
for(index = pEraseInit->Sector; index < (pEraseInit->NbSectors + pEraseInit->Sector); index++)
{
  FLASH_Erase_Sector_RAM(index, (uint8_t) pEraseInit->VoltageRange);
  /* Wait for last operation to be completed */
  status = FLASH_WaitForLastOperation_RAM((uint32_t)FLASH_TIMEOUT_VALUE);
  /* If the erase operation is completed, disable the SER Bit and SNB Bits */
  CLEAR_BIT(FLASH->CR, (FLASH_CR_SER | FLASH_CR_SNB));
  if(status != HAL_OK)
  {
    /* In case of error, stop erase procedure and return the faulty sector*/
    *SectorError = index;
    break;
  }
}
The single-stepping disassembly (the screenshot is too small for me to read) is reproduced here as a code snippet without the addresses:
9B01 ldr r3, [sp, #4]
689B ldr r3, [r3, #8]
9302 str r3, [sp, #8]
E01E b 0x0000010C
9B01 ldr r3, [sp, #4]
691B ldr r3, [r3, #16]
B2DB uxtb r3, r3
4619 mov r1, r3
9802 ldr r0, [sp, #8]
F000F830 bl 0x0000013C <FLASH_Erase_Sector_RAM>
F24C3050 movw r0, #0xC350
F000F886 bl 0x000001F0 <FLASH_WaitForLastOperation_RAM>
4603 mov r3, r0
F88D300F strb.w r3, [sp, #15]
4B12 ldr r3, =0x40023C00
691B ldr r3, [r3, #16]
4A11 ldr r2, =0x40023C00
F02303FA bic r3, r3, #0xFA
6113 str r3, [r2, #16]
F89D300F ldrb.w r3, [sp, #15]
If I single-step the "9802 ldr r0, [sp, #8]" instruction, I get the fault. By implication, enough of "bl <FLASH_Erase_Sector_RAM>" would have to execute speculatively to start the erase. I've specified FLASH_VOLTAGE_RANGE_3, so there's quite a lot of code to get through first:
//void RUN_IN_RAM_SECTION FLASH_Erase_Sector_RAM(uint32_t Sector, uint8_t VoltageRange)
//{
B086 sub sp, sp, #24
9001 str r0, [sp, #4]
460B mov r3, r1
F88D3003 strb.w r3, [sp, #3]
//--- stm32f7xx_hal_flash_ex.c -- 583 ------------------------
//uint32_t tmp_psize = 0;
2300 movs r3, #0
9305 str r3, [sp, #20]
//--- stm32f7xx_hal_flash_ex.c -- 585 ------------------------
///* Check the parameters */
//assert_param(IS_FLASH_SECTOR(Sector));
//assert_param(IS_VOLTAGERANGE(VoltageRange));
//if(VoltageRange == FLASH_VOLTAGE_RANGE_1)
F89D3003 ldrb.w r3, [sp, #3]
2B00 cmp r3, #0
D102 bne 0x00000158
//--- stm32f7xx_hal_flash_ex.c -- 590 ------------------------
//{
//tmp_psize = FLASH_PSIZE_BYTE;
2300 movs r3, #0
9305 str r3, [sp, #20]
E012 b 0x0000017E
//--- stm32f7xx_hal_flash_ex.c -- 592 ------------------------
//}
//else if(VoltageRange == FLASH_VOLTAGE_RANGE_2)
F89D3003 ldrb.w r3, [sp, #3]
2B01 cmp r3, #1
D103 bne 0x00000168
//--- stm32f7xx_hal_flash_ex.c -- 594 ------------------------
//{
//tmp_psize = FLASH_PSIZE_HALF_WORD;
F44F7380 mov.w r3, #0x100
9305 str r3, [sp, #20]
E00A b 0x0000017E
//--- stm32f7xx_hal_flash_ex.c -- 596 ------------------------
//}
//else if(VoltageRange == FLASH_VOLTAGE_RANGE_3)
F89D3003 ldrb.w r3, [sp, #3]
2B02 cmp r3, #2
D103 bne 0x00000178
//--- stm32f7xx_hal_flash_ex.c -- 598 ------------------------
//{
//tmp_psize = FLASH_PSIZE_WORD;
F44F7300 mov.w r3, #0x200
9305 str r3, [sp, #20]
E002 b 0x0000017E
//--- stm32f7xx_hal_flash_ex.c -- 600 ------------------------
//}
//else
//{
//tmp_psize = FLASH_PSIZE_DOUBLE_WORD;
F44F7340 mov.w r3, #0x300
9305 str r3, [sp, #20]
//--- stm32f7xx_hal_flash_ex.c -- 604 ------------------------
//}
///* Need to add offset of 4 when sector higher than FLASH_SECTOR_11 */
//if(Sector > FLASH_SECTOR_11)
9B01 ldr r3, [sp, #4]
2B0B cmp r3, #11
D902 bls 0x0000018A
//--- stm32f7xx_hal_flash_ex.c -- 608 ------------------------
//{
//Sector += 4;
9B01 ldr r3, [sp, #4]
3304 adds r3, #4
9301 str r3, [sp, #4]
//--- stm32f7xx_hal_flash_ex.c -- 610 ------------------------
//}
///* If the previous operation is completed, proceed to erase the sector */
//FLASH->CR &= CR_PSIZE_MASK;
4B18 ldr r3, =0x40023C00
691B ldr r3, [r3, #16]
4A17 ldr r2, =0x40023C00
F4237340 bic r3, r3, #0x300
6113 str r3, [r2, #16]
//--- stm32f7xx_hal_flash_ex.c -- 614 ------------------------
//FLASH->CR |= tmp_psize;
4B15 ldr r3, =0x40023C00
691A ldr r2, [r3, #16]
4914 ldr r1, =0x40023C00
9B05 ldr r3, [sp, #20]
4313 orrs r3, r2
610B str r3, [r1, #16]
//--- stm32f7xx_hal_flash_ex.c -- 615 ------------------------
//CLEAR_BIT(FLASH->CR, FLASH_CR_SNB);
4B12 ldr r3, =0x40023C00
691B ldr r3, [r3, #16]
4A11 ldr r2, =0x40023C00
F02303F8 bic r3, r3, #0xF8
6113 str r3, [r2, #16]
23F8 movs r3, #0xF8
9304 str r3, [sp, #16]
//--- cmsis_gcc.h -- 531 -------------------------------------
//{
//uint32_t result;
//#if (__CORTEX_M >= 0x03U) || (__CORTEX_SC >= 300U)
//__ASM volatile ("rbit %0, %1" : "=r" (result) : "r" (value) );
9B04 ldr r3, [sp, #16]
FA93F3A3 rbit r3, r3
9303 str r3, [sp, #12]
//--- cmsis_gcc.h -- 544 -------------------------------------
//s--;
//}
//result <<= s; /* shift when v's highest bits are zero */
//#endif
//return(result);
9B03 ldr r3, [sp, #12]
//--- stm32f7xx_hal_flash_ex.c -- 612 ------------------------
///* If the previous operation is completed, proceed to erase the sector */
//FLASH->CR &= CR_PSIZE_MASK;
//FLASH->CR |= tmp_psize;
//CLEAR_BIT(FLASH->CR, FLASH_CR_SNB);
//FLASH->CR |= FLASH_CR_SER | (Sector << POSITION_VAL(FLASH_CR_SNB));
FAB3F383 clz r3, r3
9A01 ldr r2, [sp, #4]
FA02F303 lsl.w r3, r2, r3
F0430202 orr r2, r3, #2
4B08 ldr r3, =0x40023C00
691B ldr r3, [r3, #16]
4907 ldr r1, =0x40023C00
4313 orrs r3, r2
610B str r3, [r1, #16]
//--- stm32f7xx_hal_flash_ex.c -- 617 ------------------------
//FLASH->CR |= FLASH_CR_STRT;
4B05 ldr r3, =0x40023C00
691B ldr r3, [r3, #16]
4A04 ldr r2, =0x40023C00
F4433380 orr r3, r3, #0x10000
6113 str r3, [r2, #16]
//--- cmsis_gcc.h -- 429 -------------------------------------
//It completes when all explicit memory accesses before this instruction complete.
//*/
//__attribute__((always_inline)) __STATIC_INLINE void __DSB(void)
//{
//__ASM volatile ("dsb 0xF":::"memory");
F3BF8F4F dsb sy
//--- stm32f7xx_hal_flash_ex.c -- 618 ------------------------
///* Data synchronous Barrier (DSB) Just after the write operation
//This will force the CPU to respect the sequence of instruction (no optimization).*/
//__DSB();
//}
BF00 nop
B006 add sp, sp, #24
4770 bx lr
BF00 nop
40023C00 .word 0x40023C00
So I'm not convinced that this much prefetch or speculative execution is taking place.