The TIM appears to misbehave on the STM32H750, while the same code works fine on the STM32F407. Please check the results on your controller.

ignatyy · ‎2019-12-30

#include "stm32h7xx_hal.h"

extern void SystemClock_Config_16MHz(void);

uint32_t DBG32[10];

//======================
void delay(uint32_t wait){
  while(wait--);
}

//==================
void Test_TIM(){
  HAL_Init();

  SystemClock_Config_16MHz(); // my quartz = 16MHz => pll1=400MHz

  TIM_HandleTypeDef TimHandle;

  __HAL_RCC_TIM2_CLK_ENABLE();

  TimHandle.Instance = TIM2;
  TimHandle.Init.Period = (uint32_t)-1;
  TimHandle.Init.Prescaler = 0; // == 200MHz
  TimHandle.Init.ClockDivision = 0;
  TimHandle.Init.CounterMode = TIM_COUNTERMODE_UP;
  TIM_Base_SetConfig(TimHandle.Instance, &TimHandle.Init);

  __HAL_TIM_ENABLE(&TimHandle);

  // --------
  __disable_irq();
  TIM2->CNT = 0;

  DBG32[0]=TIM2->CNT;
  delay(1);
  DBG32[1]=TIM2->CNT;
  delay(1);
  DBG32[2]=TIM2->CNT;
  delay(1);
  DBG32[3]=TIM2->CNT;
  delay(1);
  DBG32[4]=TIM2->CNT;
  delay(3);
  DBG32[5]=TIM2->CNT;
  delay(3);
  DBG32[6]=TIM2->CNT;
  delay(3);
  DBG32[7]=TIM2->CNT;
  delay(3);
  DBG32[8]=TIM2->CNT;

  while(1);

  // DBG32[0..8]=2,42,94,134,172,236,284,310,342
  // delta= DBG32[i]-DBG32[i-1]== 40,52,40,38, 64,48,26,32 !! nonsense !!
  // in stm32f407 == all OK, but in stm32h750 == bug !!!
}
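The delta arithmetic in the comments above can be checked on a PC. This is only a sketch using the posted counter values; the helper names `compute_deltas` and `delta_range` are hypothetical and not part of the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: out[i] = samples[i + 1] - samples[i] for n samples.
 * Unsigned subtraction stays correct even if the counter wraps once. */
void compute_deltas(const uint32_t *samples, size_t n, uint32_t *out)
{
    assert(n >= 1);
    for (size_t i = 0; i + 1 < n; i++) {
        out[i] = samples[i + 1] - samples[i];
    }
}

/* Hypothetical helper: smallest and largest of n deltas.
 * The gap between them is the observed spread (jitter). */
void delta_range(const uint32_t *deltas, size_t n, uint32_t *min, uint32_t *max)
{
    assert(n >= 1);
    *min = deltas[0];
    *max = deltas[0];
    for (size_t i = 1; i < n; i++) {
        if (deltas[i] < *min) *min = deltas[i];
        if (deltas[i] > *max) *max = deltas[i];
    }
}
```

Fed with 2, 42, 94, 134, 172, 236, 284, 310, 342, this reproduces the deltas 40, 52, 40, 38, 64, 48, 26, 32, with a minimum of 26 and a maximum of 64 timer ticks.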

Tesla DeLorean · ‎2019-12-30

It's not as if the superscalar CPU is running time backward.

Make the loop iterator volatile so the compiler doesn't fold/remove the loop.
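A minimal sketch of that suggestion, assuming the same `delay` signature as in the posted code. Copying the argument into a `volatile` local is one way to do it; declaring the parameter itself `volatile` would work as well.

```c
#include <stdint.h>

/* Busy-wait for `wait` loop iterations.
 * The counter is volatile, so every decrement is a real memory access
 * and the compiler is not allowed to fold or remove the loop. */
void delay(uint32_t wait)
{
    volatile uint32_t n = wait;
    while (n--) {
        /* intentionally empty */
    }
}
```

This only guarantees that the loop survives optimization; it says nothing about how many CPU cycles each iteration takes.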

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..

ignatyy · ‎2019-12-30

Thanks for the answer. The assembly listing shows that the compiler does not remove anything, and step-by-step debugging confirms this. The reason is something else.

Tesla DeLorean · ‎2019-12-30

>>The reason is something else.

Like superscalar, cache-line width, branch prediction?

What's your issue here? That your software delay doesn't produce consistent numbers, based on where and how the function is called?

Executing from FLASH or ITCM RAM?

GNU tools?

The CM4F and CM7 are decidedly different architectures.


waclawek.jan · ‎2019-12-30

High execution speed in the Cortex-M7 is achieved by a combination of caching, pipelining, branch prediction and - as Clive said above - superscalar execution. Each of these makes the execution time of individual instructions depend heavily on context, so it is far from constant - and of course these latencies add up.

This is not simply a clocked-up microcontroller.

JW

ignatyy · ‎2019-12-30

Caching is disabled.

DBG32 [5..8] the code is absolutely identical, yet the results differ by a factor of 2!

waclawek.jan · ‎2019-12-30

> Caching is disabled.

You've excluded one of the many sources of latency/jitter. There are more: FLASH latency, prefetch vs. fetch match, ART if you are executing through it, and AXI latencies/arbitration. Access to TIM goes through the AXI-to-AHB and AHB-to-APB bridges, which may run on different clocks, so there are resynchronisations.

> DBG32 [5..8] the code is absolutely identical,

It's not; for example, it is running from different addresses. And, as I've told you above, most of the latency sources are context dependent (i.e. they depend on history, mutual relationships etc.).

What you see is normal. High processing power comes at the cost of loss of control. Accept it.

JW

RMcCa · ‎2019-12-30

Did you turn the optimization all the way off?

I wonder what the purpose of this code is. I wouldn't rely on reading the timer counter register on the fly like that, and I have always tried to avoid using blocking delays. The vast complement of timers, DMAs, DMAMUX and NVIC on the H7 is what gives it the real-time behavior, not the other way around.

It's not an 8-bit PIC or AVR.
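One common way to avoid a blocking delay is to poll a free-running counter and compare elapsed ticks. This is only a sketch: `interval_elapsed` is a hypothetical helper, and on the target `now` is assumed to come from a free-running 32-bit counter such as TIM2->CNT with the period set to 0xFFFFFFFF, as in the test code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true once at least `interval` ticks have passed
 * between `start` and `now`. Unsigned subtraction makes the comparison
 * correct across a single wrap of the 32-bit counter. */
bool interval_elapsed(uint32_t start, uint32_t now, uint32_t interval)
{
    return (uint32_t)(now - start) >= interval;
}
```

The main loop would record `start` once, keep doing other work, and call `interval_elapsed(start, TIM2->CNT, interval)` on each pass instead of spinning in `delay()`.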

ignatyy · ‎2020-01-03

I am posting part of the assembly listing. Executing the subroutine "delay (1)" should take up to 6 CPU cycles (this is the worst case!), plus 8 CPU cycles for the call and for reading/writing TIM. The total should be no more than 14 CPU cycles. TIM2 counts at 200 MHz while the CPU runs at 400 MHz, so one timer tick is 2 CPU cycles, and the value DBG32[1] = 46 corresponds to a real 92 CPU cycles.

#include "stm32h7xx_hal.h"
 
extern void SystemClock_Config_16MHz(void);
 
uint32_t DBG32[20];
 
 
//======================
void delay(uint32_t wait){
  while(wait--);
}
 
//==================
// iar optimize= none
//===============
void Test_TIM(){
  
  //SCB_DisableICache();
  //SCB_DisableDCache();
  
  HAL_Init(); 
   
  SystemClock_Config_16MHz(); // my quartz = 16MHz =>  pll1=400MHz
  
  TIM_HandleTypeDef    TimHandle;
   
  __HAL_RCC_TIM2_CLK_ENABLE();
  
  TimHandle.Instance           = TIM2;
  TimHandle.Init.Period        = (uint32_t)-1; 
  TimHandle.Init.Prescaler     =0;// == 200MHz;
  TimHandle.Init.ClockDivision = 0;
  TimHandle.Init.CounterMode   = TIM_COUNTERMODE_UP;
  TIM_Base_SetConfig(TimHandle.Instance, &TimHandle.Init);
      
  __HAL_TIM_ENABLE(&TimHandle);
  
  // -------- 
  __disable_irq();
  TIM2->CNT =0;
  
  // -- delay(1) ----
  DBG32[0]=TIM2->CNT; 
   delay(1);
  DBG32[1]=TIM2->CNT; 
   delay(1);
  DBG32[2]=TIM2->CNT; 
   delay(1);
  DBG32[3]=TIM2->CNT;
   delay(1);  
  DBG32[4]=TIM2->CNT;
  
  // -- delay(10) ----
   delay(10);
  DBG32[5]=TIM2->CNT; 
   delay(10);
  DBG32[6]=TIM2->CNT; 
   delay(10);
  DBG32[7]=TIM2->CNT;
   delay(10);  
  DBG32[8]=TIM2->CNT;
  
  //  delta= DBG32[i]-DBG32[i-1]
  for(uint8_t i=1; i<=8; i++) DBG32[i-1]= DBG32[i] - DBG32[i-1]; 
  DBG32[8]= 0;
  
  while(1);
  
  // result: delta= DBG32[i]-DBG32[i-1]== 42,46,32,40, 40,40,48,56 !! nonsense !!
   
 
  //========================
  // part of listing asm
  //=========================
  
  //    void delay(uint32_t wait){
  //      while(wait--);
  /*
delay: // 5 == CPU cycles!!
??delay_0:
        MOVS     R1,R0
        SUBS     R0,R1,#+1
        CMP      R1,#+0
        BNE.N    ??delay_0
        BX       LR               ;; return
  
  //------- 
  // delay(1);
  // DBG32[5]=TIM2->CNT; 
  //-------
  
    // -- 8 == CPU cycles ---
    // delay(1);
        MOVS     R0,#+1
        BL       delay
    //  DBG32[1]=TIM2->CNT; 
        LDR      R0,[R5, #+0]
        STR      R0,[R4, #+4]
    
    // sum of CPU cycles = 6+8 = 14 !!!
    // ! ! !   value DBG32[1]=46 === 92 CPU cycles ! ! !
   ??? question: where is the pipeline speed-up ???
     
          */
}

Question: where is the superscalar pipeline speed-up?
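The conversion behind the numbers in this post can be sketched as follows. It assumes TIM2 counts at 200 MHz and the core runs at 400 MHz, as the code comments indicate; `ticks_to_cpu_cycles` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert timer ticks to CPU cycles.
 * cycles = ticks * cpu_hz / tim_hz, computed in 64 bits to avoid overflow. */
uint64_t ticks_to_cpu_cycles(uint32_t ticks, uint32_t tim_hz, uint32_t cpu_hz)
{
    assert(tim_hz != 0);
    return ((uint64_t)ticks * cpu_hz) / tim_hz;
}
```

Under those assumptions, 46 ticks correspond to 92 CPU cycles, while the expected 14 CPU cycles would show up as only 7 ticks.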