STM32H7 CPU runs too slowly occasionally

zhuij.1
Associate II

The test code is below:

#include <rtthread.h>
#include <rthw.h>
 
#define CPU_USAGE_CALC_TICK    100
#define CPU_USAGE_LOOP        100
 
static rt_uint8_t  cpu_usage_major = 0, cpu_usage_minor = 0;
static rt_uint32_t total_count = 0;
static rt_uint32_t _count = 0;
 
//static rt_uint32_t loop;
 
static void cpu_usage_idle_hook(void)
{
    rt_uint32_t count = 0;          /* was uninitialized before the first loop */
    rt_tick_t tick;
    volatile rt_uint32_t loop;

    if (total_count == 0)
    {
        /* get total count: iterations completed in CPU_USAGE_CALC_TICK ticks */
        rt_enter_critical();
        tick = rt_tick_get();
        while (rt_tick_get() - tick < CPU_USAGE_CALC_TICK)
        {
            count++;
            loop = 0;
            while (loop < CPU_USAGE_LOOP) loop++;
        }
        total_count = count;
        rt_exit_critical();
    }

    count = 0;
    /* get CPU usage: repeat the same measurement */
    rt_enter_critical();
    tick = rt_tick_get();
    while (rt_tick_get() - tick < CPU_USAGE_CALC_TICK)
    {
        count++;
        loop = 0;
        while (loop < CPU_USAGE_LOOP) loop++;
    }
    _count = count;
    rt_exit_critical();

    /* calculate major and minor */
    if (count < total_count)
    {
        count = total_count - count;
        cpu_usage_major = (count * 100) / total_count;
        cpu_usage_minor = ((count * 100) % total_count) * 100 / total_count;
    }
    else
    {
        //total_count = count;

        /* no CPU usage */
        cpu_usage_major = 0;
        cpu_usage_minor = 0;
    }
}

The same loop runs at different speeds: total_count = 54580, but _count = 13633, which is a reduction of about 75% in CPU performance.
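As a check on that figure, the major/minor arithmetic from the hook can be run on the two counts reported here. This is only a host-side sketch: the helper name `cpu_usage_percent` is hypothetical and not part of RT-Thread, and it uses standard `stdint.h` types in place of the `rt_` types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: same integer arithmetic as the idle hook above.
 * Writes the whole-percent part to *major and the hundredths to *minor. */
void cpu_usage_percent(uint32_t total_count, uint32_t count,
                       uint8_t *major, uint8_t *minor)
{
    assert(total_count != 0);   /* the hook only divides after calibration */

    if (count < total_count) {
        uint32_t busy = total_count - count;
        *major = (uint8_t)((busy * 100u) / total_count);
        *minor = (uint8_t)(((busy * 100u) % total_count) * 100u / total_count);
    } else {
        /* no CPU usage */
        *major = 0;
        *minor = 0;
    }
}
```

With total_count = 54580 and count = 13633 this gives major = 75 and minor = 2, so the hook would report 75.02% load even though the second loop is identical to the first.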

The assembly code is as follows:

[Two disassembly screenshots: 0693W00000Y9b8LQAR.png, 0693W00000Y9b9TQAR.png]

4 REPLIES
Bubbles
ST Employee

Hi @zhuij.1,

The core runs exactly as fast as you configured it. What you are interpreting as a drop in performance is just a matter of what the CPU cycles are being used for.

BR,

J


zhuij.1
Associate II

The test code above always runs under the same clock configuration (480 MHz). _count and total_count are sometimes about the same (both ≈ 60000) and sometimes very different (_count ≈ 13000, total_count ≈ 60000). In the bad case, adding several nop instructions before the _count measurement brings _count back up to the value of total_count. We also checked the system frequency on the MCO output while the problem was occurring, and it was normal.

In addition, we do not have this problem when using the STM32F407 in our products.

[Screenshot: 0693W00000Y9elGQAR.png]

Danish1
Lead II

What you might be seeing could be an effect of the instruction alignment, particularly if you don't have the 'H7 caches set as they should be.

Do you know if you're executing directly from FLASH, or has the code been copied into ITCM RAM?

Although the processor can run at 480 MHz, FLASH memory is very much slower, needing up to 4 additional AXI-bus cycles (so 5 in total) to read a memory location. To speed things up, the FLASH memory is read in groups of 256 bits, so that's 16 16-bit processor instructions.

The memory system will prefetch the next 16 instructions, so normally there isn't a delay when moving from one group to the next.

But there can be a delay if the processor has to branch (e.g. in the tight loops in your code), particularly if the new instruction is in a different 256-bit group. That delay will be the entire access time of the FLASH, 4 AXI-bus cycles.

Adding a few NOPs between the two timing-tests alters the relative alignment of those tests, so one might be optimally aligned (minimal fetches from FLASH) while the other might have to make many more fetches giving slower execution.
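To make the alignment point concrete, here is a small host-side sketch that counts how many 256-bit (32-byte) FLASH lines a piece of code touches. The helper name `flash_lines_spanned`, the addresses, and the 24-byte loop size are all made up for illustration; the real loop addresses would have to be read from the map file or the disassembly.

```c
#include <stdint.h>

#define FLASH_LINE_BYTES 32u   /* 256-bit FLASH read width, as described above */

/* Hypothetical helper: number of 256-bit FLASH lines touched by a code
 * region that starts at address `start` and is `size` bytes long (size > 0). */
uint32_t flash_lines_spanned(uint32_t start, uint32_t size)
{
    uint32_t first = start / FLASH_LINE_BYTES;
    uint32_t last  = (start + size - 1u) / FLASH_LINE_BYTES;
    return last - first + 1u;
}
```

For example, an assumed 24-byte loop at 0x08000100 fits in one line, while the same loop at 0x08000110 straddles two, so every pass through the loop may need a second FLASH read. Shifting the code by a few NOPs can move it from one case to the other.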

To minimise this problem, the H7 incorporates a cache, so a recently-fetched instruction shouldn't need to be re-fetched from FLASH. But to benefit from this, you might have to enable it.
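A minimal sketch of enabling it, assuming a project that uses ST's CMSIS device header (`stm32h7xx.h`, which pulls in `core_cm7.h`). `SCB_EnableICache()` is the CMSIS-Core call for the Cortex-M7 instruction cache; the wrapper name `enable_icache_early` is hypothetical, and where you call it depends on your startup code.

```c
#include "stm32h7xx.h"   /* assumption: ST CMSIS device header for the H7 */

/* Hypothetical wrapper: call once early in startup, before any timing
 * measurements such as the CPU-usage idle hook are taken. */
void enable_icache_early(void)
{
    SCB_EnableICache();   /* CMSIS-Core: invalidates, then enables the I-cache */
}
```

The data cache has a separate call (`SCB_EnableDCache()`), but it needs more care if DMA is in use, so it is left out of this sketch.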

Hope this helps,

Danish

zhuij.1
Associate II

Thank you very much. The program executes from FLASH; after we enabled the instruction cache, the problem went away.