2021-10-25 02:09 AM
Hi, I have been running some memory benchmarks to work out how long it takes to program the flash at runtime. The procedure is to erase a flash page, then write to it, using the flashErase and flashProgram functions provided by the library I am using: https://github.com/GrumpyOldPizza/arduino-STM32L4/blob/master/cores/stm32l4/STM32.h
I confirmed that the procedure works by reading the flash back and checking that it contains what I programmed.
However, I am observing flash erase and programming times roughly one order of magnitude shorter than those given in the datasheet (section 6.3.10 of the STM32L432KC datasheet). I wanted to ask why there is such a large difference and whether it is to be expected. Programming a single page takes around 1 ms, whereas the datasheet states it should take around 20.91 ms. The same holds for erasing, which took around 1 ms but according to the datasheet should take 22.02 ms.
Thanks.
2021-10-25 06:11 AM
I wouldn't expect the typical time to be off by an order of magnitude. How specifically are you measuring these? Code would be helpful.
2021-10-25 06:19 AM
// save some unsigned ints
const PROGMEM unsigned int update_diff[] = {65000, 32796, 16843, 10, 11234};

void flash_to_flash(unsigned int number_update_blocks, unsigned int words_per_block)
{
    unsigned int destinations[number_update_blocks];

    // Create random destinations in flash
    srand(micros());
    for (int block = 0; block < number_update_blocks; block++) {
        destinations[block] = (((rand() % 16000) & (~7U)) + 0x08000000 + 0x00020000) & (~2047U);
    }

    unsigned int erase_starts[number_update_blocks];
    unsigned int erase_ends[number_update_blocks];
    unsigned int program_starts[number_update_blocks];
    unsigned int program_ends[number_update_blocks];

    for (unsigned int block = 0; block < number_update_blocks; block++) {
        erase_starts[block] = micros();
        STM32.flashErase(destinations[block], words_per_block * 4);
        erase_ends[block] = micros();
        program_starts[block] = micros();
        STM32.flashProgram(destinations[block], update_diff + block * words_per_block, words_per_block * 4);
        program_ends[block] = micros();
    }

    Serial.println("flash_to_flash");
    for (unsigned int block = 0; block < number_update_blocks; block++)
    {
        Serial.print("Block ,");
        Serial.print(block + 1);
        Serial.print(", of ,");
        Serial.print(number_update_blocks);
        Serial.print(", at ,");
        Serial.print(words_per_block);
        Serial.print(",words per block took ,");
        Serial.print(erase_ends[block] - erase_starts[block]);
        Serial.print(", to erase and ,");
        Serial.print(program_ends[block] - program_starts[block]);
        Serial.print(", to write");
        Serial.println();
    }
    Serial.println();
    Serial.println();
}
Here is a snippet of the code. There are aspects you can ignore, such as the number_update_blocks function parameter (assume it is just 1). The key measurements are inside the second for loop: I record the time before the erase begins and after the erase ends, and likewise for programming the memory. This gives us the time to erase a page and write a page if we use 500 as the words_per_block function parameter, which gives us 500 words (corresponding to one page).
2021-10-25 06:49 AM
The code within STM32.flashErase and STM32.flashProgram would be relevant here. Code otherwise seems okay except for:
> which gives us 500 words (corresponding to one page).
There are 2 kB per page, which is 512 words.
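To make the arithmetic concrete, here is a minimal sketch in plain C. The helper names are hypothetical and not part of the library; the 2048-byte page size comes from this thread, and the rounding expression mirrors the `(count + 2047) & ~2047` line in the library's flashErase.

```c
#include <stdint.h>

#define PAGE_SIZE_BYTES 2048u  /* flash page size discussed in this thread */
#define WORD_SIZE_BYTES 4u     /* one 32-bit word */

/* Number of 32-bit words that fit in one flash page. */
static uint32_t words_per_page(void)
{
    return PAGE_SIZE_BYTES / WORD_SIZE_BYTES;
}

/* Round a byte count up to a whole number of pages, using the same
 * masking idiom as flashErase: (count + 2047) & ~2047. */
static uint32_t round_up_to_page(uint32_t count)
{
    return (count + (PAGE_SIZE_BYTES - 1u)) & ~(PAGE_SIZE_BYTES - 1u);
}
```

With these helpers, 500 words is 2000 bytes, which rounds up to 2048 bytes, so an erase request for 500 words should still cover one full page; 512 words is exactly one page.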
2021-10-25 06:54 AM
Hi, yes, thanks. The code for STM32.flashErase and STM32.flashProgram can be found here: https://github.com/GrumpyOldPizza/arduino-STM32L4/blob/master/cores/stm32l4/STM32.h. I have also included flashErase and the functions it uses below.
Thanks!
bool STM32Class::flashErase(uint32_t address, uint32_t count)
{
    if (address & 2047) {
        return false;
    }
    count = (count + 2047) & ~2047;
    if ((address < FLASHSTART) || ((address + count) > FLASHEND)) {
        return false;
    }
    stm32l4_flash_unlock();
    stm32l4_flash_erase(address, count);
    stm32l4_flash_lock();
    return true;
}

bool stm32l4_flash_erase(uint32_t address, uint32_t count)
{
    bool success = true;
    const uint32_t flash_base = FLASH_BASE;
    uint32_t primask, flash_acr;

    if (FLASH->CR & FLASH_CR_LOCK)
    {
        return false;
    }

    do
    {
        primask = __get_PRIMASK();
        __disable_irq();

        flash_acr = FLASH->ACR;
        FLASH->ACR = flash_acr & ~(FLASH_ACR_ICEN | FLASH_ACR_DCEN);

        {
            FLASH->CR = FLASH_CR_PER | ((((address - flash_base) / 2048) << 3) & FLASH_CR_PNB);
        }

        stm32l4_flash_do_erase();

        FLASH->ACR = (flash_acr & ~(FLASH_ACR_ICEN | FLASH_ACR_DCEN)) | (FLASH_ACR_ICRST | FLASH_ACR_DCRST);
        FLASH->ACR = flash_acr;

        __set_PRIMASK(primask);

        if (FLASH->SR & (FLASH_SR_PROGERR | FLASH_SR_SIZERR | FLASH_SR_PGAERR | FLASH_SR_PGSERR | FLASH_SR_WRPERR | FLASH_SR_MISERR | FLASH_SR_FASTERR | FLASH_SR_RDERR))
        {
            success = false;
            break;
        }

        address += 2048;
        count -= 2048;
    }
    while (count);

    FLASH->SR = (FLASH_SR_PROGERR | FLASH_SR_SIZERR | FLASH_SR_PGAERR | FLASH_SR_PGSERR | FLASH_SR_WRPERR | FLASH_SR_MISERR | FLASH_SR_FASTERR | FLASH_SR_RDERR);

    return success;
}

bool stm32l4_flash_unlock(void)
{
    uint32_t primask;

    if (!(FLASH->CR & FLASH_CR_LOCK))
    {
        return true;
    }

    primask = __get_PRIMASK();
    __disable_irq();

    FLASH->KEYR = 0x45670123;
    FLASH->KEYR = 0xcdef89ab;

    __set_PRIMASK(primask);

    return !!(FLASH->CR & FLASH_CR_LOCK);
}

void stm32l4_flash_lock(void)
{
    FLASH->CR |= FLASH_CR_LOCK;
}

static __attribute__((optimize("O3"), section(".rodata2"), long_call)) void stm32l4_flash_do_erase(void)
{
    uint32_t flash_sr;

    FLASH->CR |= FLASH_CR_STRT;

    do
    {
        flash_sr = FLASH->SR;
    }
    while (flash_sr & FLASH_SR_BSY);

    FLASH->CR = 0;
}
2021-10-25 06:58 AM
I didn't see in your original post that you linked the code. Thank you for pointing that out and including it inline.
Since you're using 500 words per page, the code is likely hitting this and returning immediately:
    if (address & 2047) {
        return false;
    }
It would be easy enough to determine by looking at the return value.
And on flashProgram, possibly this:
    if ((address & 7) || (count & 7)) {
        return false;
    }
2021-10-25 07:07 AM
Hi, I do not think this is it: I just tried with 512 words and still see the same issue (around 1 ms to erase and 1.8 ms to write). Additionally, I did not get a false return value. In the line below from the flash_to_flash function I already make sure that the destination is always the beginning of a page (by ANDing with the complement of 2047, which clears the 11 LSBs). Likewise, when writing I make sure to write at least 4 words. Thanks.
destinations[block] = (((rand() % 16000) & (~7U)) + 0x08000000 + 0x00020000) & (~2047U);
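For anyone who wants to check the masking on a PC, here is a small sketch of the same expression with the random value passed in as a parameter so the result is deterministic. The function and macro names are made up for illustration; only the constants come from the line above.

```c
#include <stdint.h>

#define FLASH_START   0x08000000u  /* base address used in the expression above */
#define REGION_OFFSET 0x00020000u  /* offset used in the expression above */
#define PAGE_BYTES    2048u        /* page size discussed in this thread */

/* Same arithmetic as the destinations[] line, with r standing in for rand(). */
static uint32_t destination_from_random(uint32_t r)
{
    return (((r % 16000u) & ~7u) + FLASH_START + REGION_OFFSET) & ~(PAGE_BYTES - 1u);
}

/* Page number relative to the start of flash, as computed in stm32l4_flash_erase. */
static uint32_t page_index(uint32_t address)
{
    return (address - FLASH_START) / PAGE_BYTES;
}
```

Every result has its low 11 bits cleared, so the `address & 2047` check in flashErase cannot fail for these destinations. Because the random part is below 16000, the expression can only produce eight distinct pages (page indices 64 to 71), and the `& (~7U)` step has no effect once the final `& (~2047U)` is applied.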
2021-10-25 08:04 AM
You're right. I'm not sure. There's a lot of code complexity here that I don't want to dig into. It could be that it just completes 20x faster than expected, but I find that hard to believe.
Perhaps isolate the issue and simplify the code to erasing/writing a specific known page of flash.
2021-10-25 12:16 PM
You cannot measure the FLASH erase/programming times with micros(), millis(), HAL ticks or OS ticks. All of those are based on software counters, which are incremented in a timer (typically SysTick) interrupt. During FLASH erase/program operations, reads and code execution from the FLASH are stalled. The timer interrupt doesn't execute and the ticks are not incremented. The interrupt still becomes pending and executes once, immediately after the flash operation is complete, which is why you measure 1 tick (ms). In addition, you are disabling interrupts globally, which would prevent the tick increments even if the flash were not stalled.
I recommend using DWT->CYCCNT or some other hardware timer for such purposes.
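A minimal sketch of the DWT approach, with some assumptions stated up front: the helper names are hypothetical, the register and mask names are the CMSIS-Core ones for Cortex-M4 (`CoreDebug->DEMCR`, `DWT->CTRL`, `DWT->CYCCNT`), and the CMSIS device header is assumed to be included already (the on-target part is skipped otherwise). The core clock passed to the conversion is also an assumption; use whatever your board actually runs at.

```c
#include <stdint.h>

/* Elapsed cycles between two CYCCNT samples. Unsigned subtraction gives the
 * right answer across a single 32-bit wraparound of the counter. */
static uint32_t elapsed_cycles(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert a cycle count to microseconds for a given core clock in Hz. */
static uint32_t cycles_to_us(uint32_t cycles, uint32_t cpu_hz)
{
    return (uint32_t)(((uint64_t)cycles * 1000000u) / cpu_hz);
}

#if defined(DWT) && defined(CoreDebug)
/* On-target only: compiled when the CMSIS core header has been included. */
static void cycle_counter_start(void)
{
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;  /* enable the trace/DWT block */
    DWT->CYCCNT = 0;                                 /* reset the cycle counter */
    DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;             /* start counting core cycles */
}

static uint32_t cycle_counter_read(void)
{
    return DWT->CYCCNT;
}
#endif
```

In the benchmark, one would call cycle_counter_start() once, sample cycle_counter_read() immediately before and after STM32.flashErase (and again around STM32.flashProgram), then pass the difference to cycles_to_us(). Assuming an 80 MHz core clock, the 22.02 ms datasheet erase time would correspond to about 1,761,600 cycles, and the 32-bit counter wraps roughly every 53 seconds, so a single erase or program fits comfortably.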
2021-10-25 12:36 PM
Thanks a lot for your help. I’ll investigate further.