
STM32F767ZI Nucleo board runs slower from Flash on ITCM if ART is on

BSchm.0
Associate II

I noticed that the ART accelerator and prefetch are not enabled.

Why does the for loop get slower if I switch both on?

  /* USER CODE BEGIN SysInit */
 
#if defined(FLASH_ART_ON)
  // enable ART for Flash: (3<<8) sets PRFTEN (bit 8) and ARTEN (bit 9)
  unsigned int FLASH_ACR_REG=0;
  volatile unsigned int *FLASH_ACR   = (volatile unsigned int *)0x40023C00;
  FLASH_ACR_REG = *FLASH_ACR;
  //FLASH_ACR_REG = 0; //zero wait states (broken)
  FLASH_ACR_REG = FLASH_ACR_REG | (3<<8);
  *FLASH_ACR = FLASH_ACR_REG;
#endif
  // Benchmark: count CPU cycles for 1000 loop iterations
  int i=0;
  volatile int z=0;   // explicit type; implicit int is not valid since C99
  volatile unsigned int *DWT_CYCCNT   = (volatile unsigned int *)0xE0001004;
  CcountFLASHstart = *DWT_CYCCNT;
  for (;i<1000;i++){
#if defined(REGISTER_BM)  //internal register benchmark
	  z=i;
#else //C7 peripheral to global variable (SRAM)
	  CcountFLASHstop = *DWT_CYCCNT;
#endif
  }
  CcountFLASHstop = *DWT_CYCCNT;
 
  /* USER CODE END SysInit */
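
To compare the runs with and without ART, the two DWT_CYCCNT samples have to be turned into a cycle count. Here is a minimal sketch of that arithmetic, assuming 32-bit counter samples; the helper names `cycles_elapsed` and `cycles_per_iteration` are made up for illustration and are not part of the original code:

```c
#include <stdint.h>

/* Cycles between two DWT_CYCCNT samples. Unsigned subtraction wraps
   modulo 2^32, so the result stays correct across one counter overflow. */
static uint32_t cycles_elapsed(uint32_t start, uint32_t stop)
{
    return stop - start;
}

/* Average cycles per loop iteration, rounded down.
   Returns 0 when iterations is 0 to avoid a division by zero. */
static uint32_t cycles_per_iteration(uint32_t start, uint32_t stop,
                                     uint32_t iterations)
{
    if (iterations == 0u) {
        return 0u;
    }
    return cycles_elapsed(start, stop) / iterations;
}
```

With the benchmark above this would be called as `cycles_per_iteration(CcountFLASHstart, CcountFLASHstop, 1000)`.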

7 REPLIES
BSchm.0
Associate II
  • The ITCM RAM 16KB is connected to the ITCM bus (I-Cache 8KB)
  • The ACCEL/CACHE is connected to ITCM bus too
  • The CubeMX-generated linker file uses SRAM with DTCM RAM as the RAM start address

/*
*****************************************************************************
**
 
**  File        : LinkerScript.ld
**
**  Abstract    : Linker script for STM32F767ZITx Device with
**                2048KByte FLASH, 512KByte RAM
**
**                Set heap size, stack size and stack location according
**                to application requirements.
**
**                Set memory bank area and size if external memory is used.
**
**  Target      : STMicroelectronics STM32
**
**
**  Distribution: The file is distributed as is, without any warranty
**                of any kind.
**
**  (c)Copyright Ac6.
**  You may use this file as-is or modify it according to the needs of your
**  project. Distribution of this file (unmodified or modified) is not
**  permitted. Ac6 permit registered System Workbench for MCU users the
**  rights to distribute the assembled, compiled & linked contents of this
**  file as part of an application binary file, provided that it is built
**  using the System Workbench for MCU toolchain.
**
*****************************************************************************
*/
 
/* Entry Point */
ENTRY(Reset_Handler)
 
/* Highest address of the user mode stack */
_estack = 0x20080000;    /* end of RAM */
/* Generate a link error if heap and stack don't fit into RAM */
_Min_Heap_Size = 0x2000;      /* required amount of heap  */
_Min_Stack_Size = 0x400; /* required amount of stack */
 
/* Specify the memory areas */
MEMORY
{
RAM (xrw)      : ORIGIN = 0x20000000, LENGTH = 512K
FLASH (rx)      : ORIGIN = 0x8000000, LENGTH = 2048K
}
 
/* Define output sections */
SECTIONS
{
  /* The startup code goes first into FLASH */
  .isr_vector :
  {
    . = ALIGN(4);
    KEEP(*(.isr_vector)) /* Startup code */
    . = ALIGN(4);
  } >FLASH
 
  /* The program code and other data goes into FLASH */
  .text :
  {
    . = ALIGN(4);
    *(.text)           /* .text sections (code) */
    *(.text*)          /* .text* sections (code) */
    *(.glue_7)         /* glue arm to thumb code */
    *(.glue_7t)        /* glue thumb to arm code */
    *(.eh_frame)
 
    KEEP (*(.init))
    KEEP (*(.fini))
 
    . = ALIGN(4);
    _etext = .;        /* define a global symbols at end of code */
  } >FLASH
 
  /* Constant data goes into FLASH */
  .rodata :
  {
    . = ALIGN(4);
    *(.rodata)         /* .rodata sections (constants, strings, etc.) */
    *(.rodata*)        /* .rodata* sections (constants, strings, etc.) */
    . = ALIGN(4);
  } >FLASH
 
  .ARM.extab   : { *(.ARM.extab* .gnu.linkonce.armextab.*) } >FLASH
  .ARM : {
    __exidx_start = .;
    *(.ARM.exidx*)
    __exidx_end = .;
  } >FLASH
 
  .preinit_array     :
  {
    PROVIDE_HIDDEN (__preinit_array_start = .);
    KEEP (*(.preinit_array*))
    PROVIDE_HIDDEN (__preinit_array_end = .);
  } >FLASH
  .init_array :
  {
    PROVIDE_HIDDEN (__init_array_start = .);
    KEEP (*(SORT(.init_array.*)))
    KEEP (*(.init_array*))
    PROVIDE_HIDDEN (__init_array_end = .);
  } >FLASH
  .fini_array :
  {
    PROVIDE_HIDDEN (__fini_array_start = .);
    KEEP (*(SORT(.fini_array.*)))
    KEEP (*(.fini_array*))
    PROVIDE_HIDDEN (__fini_array_end = .);
  } >FLASH
 
  /* used by the startup to initialize data */
  _sidata = LOADADDR(.data);
 
  /* Initialized data sections goes into RAM, load LMA copy after code */
  .data : 
  {
    . = ALIGN(4);
    _sdata = .;        /* create a global symbol at data start */
    *(.data)           /* .data sections */
    *(.data*)          /* .data* sections */
 
    . = ALIGN(4);
    _edata = .;        /* define a global symbol at data end */
  } >RAM AT> FLASH
 
  
  /* Uninitialized data section */
  . = ALIGN(4);
  .bss :
  {
    /* This is used by the startup in order to initialize the .bss section */
    _sbss = .;         /* define a global symbol at bss start */
    __bss_start__ = _sbss;
    *(.bss)
    *(.bss*)
    *(COMMON)
 
    . = ALIGN(4);
    _ebss = .;         /* define a global symbol at bss end */
    __bss_end__ = _ebss;
  } >RAM
 
  /* User_heap_stack section, used to check that there is enough RAM left */
  ._user_heap_stack :
  {
    . = ALIGN(8);
    PROVIDE ( end = . );
    PROVIDE ( _end = . );
    . = . + _Min_Heap_Size;
    . = . + _Min_Stack_Size;
    . = ALIGN(8);
  } >RAM
 
  
 
  /* Remove information from the standard libraries */
  /DISCARD/ :
  {
    libc.a ( * )
    libm.a ( * )
    libgcc.a ( * )
  }
 
  .ARM.attributes 0 : { *(.ARM.attributes) }
}
 
 

Who is using the DTCM RAM?

1) Is access only fast if your code and data are inside the first 16KB?

If 1) is true, how do I get all my IRQ handlers into DTCM RAM through the linker file?

BSchm.0
Associate II

On the STM32F4xx family you can enable the data and instruction caches in FLASH_ACR. You can also reset both flash caches there, with one bit for each.

It looks like on the F7 they have replaced the separate on/off and reset bits with ARTEN and ARTRST, which do the same job.

Does anyone have an idea why the loop gets slower if ART is enabled?
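
For reference, the `(3<<8)` in the snippet corresponds to two separate bits. The sketch below spells them out, assuming the FLASH_ACR layout as I read it in the STM32F76x reference manual (PRFTEN bit 8, ARTEN bit 9, ARTRST bit 11); please check it against your RM. The `ACR_*` names and helper functions are made up here and are not the CMSIS definitions:

```c
#include <stdint.h>

/* Assumed FLASH_ACR bit layout (verify against the reference manual). */
#define ACR_LATENCY_MASK 0x0000000Fu  /* LATENCY[3:0]: flash wait states */
#define ACR_PRFTEN       (1u << 8)    /* prefetch enable */
#define ACR_ARTEN        (1u << 9)    /* ART accelerator enable */
#define ACR_ARTRST       (1u << 11)   /* ART accelerator reset */

/* Return the register value with prefetch and ART switched on;
   the latency field and all other bits are left untouched. */
static uint32_t acr_enable_art(uint32_t acr)
{
    return acr | ACR_PRFTEN | ACR_ARTEN;
}

/* Return the register value with prefetch and ART switched off. */
static uint32_t acr_disable_art(uint32_t acr)
{
    return acr & ~(ACR_PRFTEN | ACR_ARTEN);
}

/* Non-zero if the ART enable bit is set in the given register value. */
static int acr_art_is_on(uint32_t acr)
{
    return (acr & ACR_ARTEN) != 0u;
}
```

Reading FLASH_ACR back after the write and passing it to `acr_art_is_on` would show whether the enable bit really took effect before the benchmark starts.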

And do you run this code from ITCM or from AXIM?

By changing the code you change alignment of instructions and possibly constant data.

JW

ITCM, of course (as mentioned in the headline 😉), because on AXIM I always have the 7 wait states.

Maybe the accelerator is enabled by default. If it works 100%, it makes no sense to switch it off at reset. Perhaps the enable bit is documented wrongly and is actually a disable bit. Can you add the code snippet and confirm that it gets slower when enabled?

/* USER CODE BEGIN SysInit */
 
#if defined(FLASH_ART_ON)
  // enable ART for Flash: (3<<8) sets PRFTEN (bit 8) and ARTEN (bit 9)
  unsigned int FLASH_ACR_REG=0;
  volatile unsigned int *FLASH_ACR   = (volatile unsigned int *)0x40023C00;
  FLASH_ACR_REG = *FLASH_ACR;
  //FLASH_ACR_REG = 0; //zero wait states latency (broken)
  FLASH_ACR_REG = FLASH_ACR_REG | (3<<8);
  *FLASH_ACR = FLASH_ACR_REG;
#endif
  // Benchmark: count CPU cycles for 1000 loop iterations
  int i=0;
  volatile int z=0;   // explicit type; implicit int is not valid since C99
  volatile unsigned int *DWT_CYCCNT   = (volatile unsigned int *)0xE0001004;
  CcountFLASHstart = *DWT_CYCCNT;
  for (;i<1000;i++){
#if defined(REGISTER_BM)  //internal register benchmark
	  z=i;
#else //C7 peripheral to global variable (SRAM)
	  CcountFLASHstop = *DWT_CYCCNT;
#endif
  }
  CcountFLASHstop = *DWT_CYCCNT;

/* Private user code ---------------------------------------------------------*/
/* USER CODE BEGIN 0 */
 
#if defined(TRACE)
// enables trace and the DWT cycle counter
void trace(void){
  volatile uint32_t *DWT_CONTROL = (volatile uint32_t *) 0xE0001000;
  volatile uint32_t *DWT_CYCCNT = (volatile uint32_t *) 0xE0001004;
  volatile uint32_t *DEMCR = (volatile uint32_t *) 0xE000EDFC;
  volatile uint32_t *LAR  = (volatile uint32_t *) 0xE0001FB0;   // <-- added lock access register
 
  *DEMCR = *DEMCR | 0x01000000;     // enable trace (TRCENA, bit 24)
  *LAR = 0xC5ACCE55;                // <-- added unlock access to DWT (ITM, etc.) registers
  *DWT_CYCCNT = 0;                  // clear DWT cycle counter
  *DWT_CONTROL = *DWT_CONTROL | 1;  // enable DWT cycle counter
}
#endif

But your linker script does not indicate any ITCM FLASH in use...

Post disasm.

JW

The linker script was made by CubeMX code generator

You are right: Flash memory on the ITCM interface is mapped to 0x00200000-0x003FFFFF.

But if I change the entry to FLASH (rx) : ORIGIN = 0x00200000, LENGTH = 2048K

then the debugger load fails, because OpenOCD cannot program the ELF at the mirror address, only at the physical address.

How does the code get from FLASH to ITCM via the linker? It does not: it is mirrored or remapped from 0x8000000 -> 0x00200000, so the first 16KB of FLASH is in ITCM.

Vector table 👍

IRQ handler 👎  
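
The mirroring described above is a fixed offset between the two views of the same flash. A small sketch of the address translation, assuming the two base addresses quoted in this thread and the 2048K size from the linker script; the `ALIAS_*` names and both functions are made up for illustration:

```c
#include <stdint.h>

#define ALIAS_AXIM_BASE  0x08000000u  /* flash as seen on the AXIM interface */
#define ALIAS_ITCM_BASE  0x00200000u  /* same flash on the ITCM interface */
#define ALIAS_FLASH_SIZE 0x00200000u  /* 2048K, from the linker script */

/* Non-zero if addr lies inside the AXIM view of the flash. */
static int in_axim_flash(uint32_t addr)
{
    return addr >= ALIAS_AXIM_BASE &&
           (addr - ALIAS_AXIM_BASE) < ALIAS_FLASH_SIZE;
}

/* Translate an AXIM flash address to its ITCM alias.
   Addresses outside the AXIM flash range are returned unchanged. */
static uint32_t axim_to_itcm(uint32_t addr)
{
    if (!in_axim_flash(addr)) {
        return addr;
    }
    return addr - ALIAS_AXIM_BASE + ALIAS_ITCM_BASE;
}
```

This only shows how the two address ranges line up; it does not by itself make a debugger accept an image linked at the ITCM addresses.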

Can you share your linker script with ITCM enabled?

Is it perhaps possible to link all IRQ handlers into the first 16KB of FLASH, using a section attribute in the C code?
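
A rough sketch of what the C side of that could look like with GCC, assuming a section name `.fast_irq` that I made up; the linker script would still need a matching input-section line to place it, and whether that placement helps performance is not confirmed anywhere in this thread:

```c
#include <stdint.h>

/* Hypothetical section name. The linker script must place it, for example
   with KEEP(*(.fast_irq)) inside an output section that goes to >FLASH
   right after .isr_vector. */
#define FAST_IRQ __attribute__((section(".fast_irq"), used))

static volatile uint32_t tick_count;

/* Handler tagged for the dedicated section; the body is just an example. */
FAST_IRQ void SysTick_Handler(void)
{
    tick_count++;
}

/* Read back the number of handler invocations so far. */
uint32_t ticks(void)
{
    return tick_count;
}
```

The map file produced by the linker would show whether the handlers really ended up at the intended addresses.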

/* Specify the memory areas */
MEMORY
{
RAM (xrw)      : ORIGIN = 0x20000000, LENGTH = 512K
FLASH (rx)      : ORIGIN = 0x8000000, LENGTH = 2048K
}

Error in final launch sequence
Failed to execute MI command:
load /home/one/eclipse-workspace/FreeRTOSF767ZI/build/FreeRTOSF767ZI.elf
Error message from debugger back end:
Load failed
Failed to execute MI command:
load /home/one/eclipse-workspace/FreeRTOSF767ZI/build/FreeRTOSF767ZI.elf
Error message from debugger back end:
Load failed
Load failed

> Can you share your linker script with ITCM enabled?

I'm not using 'F7, sorry.

JW