H7 power dissipation and lifetime, how to manage?

regjoe · ‎2025-07-09

Hello,

I came across this document here https://www.st.com/resource/en/application_note/dm00622045-stm32h7-series-lifetime-estimates-stmicroelectronics.pdf

We have a H75x custom board running the MCU in VOS1 at 400MHz with external 1.2V core power supply.

According to this description here

According to Figure 2, when VOS1, VDD = 3.3 V, VCORE = 1.2 V and operation ratio of 100%. Some examples are
illustrated such as:
• Tj = 105°C the lifetime estimation is > 10 years
• Tj = 125°C the lifetime estimation is 4 years
• Tj = 140°C the lifetime estimation is 2 years
In the same conditions and for an operation ratio of 20%, the lifetime estimation is as following:
• Tj = 125°C the lifetime estimation is 20 years
• Tj = 140°C the lifetime estimation is 10 years

the power consumption, heat dissipation and lifetime are influenced by the "operation ratio". What does operation ratio mean, and how can I influence or control it? Does it mean that only part of the chip silicon is powered/used? Or is it the time spent in the idle loop, assuming the MCU runs at the lowest VOS level and/or clock frequency while idle?
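For what it's worth, the figures quoted above are consistent with the lifetime scaling inversely with the operation ratio: 4 years / 0.2 = 20 years at Tj = 125°C, and 2 years / 0.2 = 10 years at Tj = 140°C. A minimal sketch of that arithmetic follows. The inverse-proportional model is an assumption read off these four data points, not a formula taken from the application note, and `scaled_lifetime_years` is a hypothetical helper name.

```c
/* Assumption: lifetime scales as 1 / operation_ratio. This matches the
 * quoted data points but is not taken from the application note itself.
 * Returns a negative value for a ratio outside (0, 1]. */
static double scaled_lifetime_years(double years_at_full_ratio,
                                    double operation_ratio)
{
    if (operation_ratio <= 0.0 || operation_ratio > 1.0) {
        return -1.0; /* invalid ratio */
    }
    return years_at_full_ratio / operation_ratio;
}
```

Calling `scaled_lifetime_years(4.0, 0.2)` reproduces the 20-year figure quoted for Tj = 125°C. Whether intermediate ratios follow the same rule would have to be checked against Figure 2 of the application note.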

Thanks,

Jochen

AScha.3 · ‎2025-07-09

>Never heard about failing systems due to running at CPU maximum clock speed. Is this only STM business?

No, I think it's just the chosen chip technology; tiny structures are more sensitive to high temperature and migration.

>If all depends on junction temperature, how can we measure it?

A good indication is to measure at the center of the package top (with a thermal camera you can "see" the chip).

I run my H743 at VOS2 with 200 MHz (I don't need more speed), with 2x SDMMC + USB, and it sits about 4 °C above ambient.

>what must be done to make the chip junction temperature rise to a temperature > 100°C.

First question is: what's your ambient temperature?

At work, we have a customer in Indonesia. Summer is warm there, and a big extruder stands in a hall with a sheet-metal roof.

You cannot touch the metal of the control cabinet, and inside it sits the 10 kW inverter with the control CPU.

Here the ambient might be 90 °C, so if the CPU draws high current (= highest speed), the lifetime might be a problem. Even if the CPU only heats up by 20 °C, it might get critical. But at 200 MHz, no problem (so far, 4 years or so).

Just look at the datasheet to see what this kind of chip process does at high temperature:

At 105 °C you get 550 mA! It cannot survive THAT for many years.

In this table you see the strong effect of the ambient, so an extra 40 °C makes a big difference.

But if the ambient is not so hot and the speed is not at max, no problem:

my H743 with all the peripherals I need runs at 83 mA, about 4 °C above ambient.

So the real current consumption and the resulting on-chip heat indicate whether it will become a problem or not.

Reducing the on-chip loss with an external core supply or (on parts that have one) the SMPS can also help to keep the chip cool.
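The link between supply current and chip heating can be put into numbers with the usual first-order estimate Tj = Ta + P × θJA. The sketch below is only that estimate: `junction_temp_c` is a hypothetical helper, it assumes all current is drawn from a single supply and fully dissipated on chip (internal LDO case), and θJA must come from the datasheet for the actual package and board.

```c
/* First-order junction temperature estimate: Tj = Ta + P * theta_ja.
 * Assumes the whole supply power (V * I) is dissipated inside the package.
 * theta_ja_c_per_w is package- and board-dependent (see the datasheet). */
static double junction_temp_c(double ambient_c, double supply_v,
                              double supply_a, double theta_ja_c_per_w)
{
    double power_w = supply_v * supply_a;
    return ambient_c + power_w * theta_ja_c_per_w;
}
```

As an illustration only, with an assumed θJA of 40 °C/W, 3.3 V, the 550 mA mentioned above and a 90 °C ambient, `junction_temp_c(90.0, 3.3, 0.550, 40.0)` gives roughly 163 °C, which shows why high current at high ambient is the critical combination.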


regjoe · ‎2025-07-09

@AScha.3 wrote:
Just look at the datasheet to see what this kind of chip process does at high temperature:
In this table you see the strong effect of the ambient, so an extra 40 °C makes a big difference.

This table shows the effect of ambient? Do you mean that current @ Tj=25°C is, for example, measured at Ta=0°C and Tj=105°C is measured at Ta=60°C? The higher the ambient temperature, the higher Tj and the more current is drawn by the chip? How do you explain this effect?

If so, I misunderstood this table. I thought that Tj rises due to higher MCU activity (which raises the question of which activities draw the most current).

AScha.3 · ‎2025-07-09

>How do you explain this effect?

Just look at the datasheet of a single MOSFET, or any transistor... junction temperature changes almost all parameters.

And, right, its own activity produces loss = heat, so its parameters also change with activity or switching.

hot -> worse on-resistance

hot -> resistance + switching level -> even worse

switching loss from 25 °C to 175 °C: increased by about 300%!

So switching at the same speed produces up to 300% more heat if the chip is hot!
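The feedback described here (heat raises the current, current raises the heat) can be sketched as a fixed-point iteration on Tj = Ta + θJA × V × I(Tj). Everything numeric in this sketch is an assumption: the current model is a straight line between two points, whereas real leakage grows roughly exponentially, and `current_model_t`, `current_at` and `settle_junction_temp` are hypothetical names.

```c
/* Hypothetical linear model of supply current vs. junction temperature,
 * interpolated between two datasheet-style points. Only a local
 * approximation: real leakage grows roughly exponentially. */
typedef struct {
    double t_lo_c, i_lo_a;   /* current at the lower temperature point */
    double t_hi_c, i_hi_a;   /* current at the higher temperature point */
} current_model_t;

static double current_at(const current_model_t *m, double tj_c)
{
    double slope = (m->i_hi_a - m->i_lo_a) / (m->t_hi_c - m->t_lo_c);
    return m->i_lo_a + slope * (tj_c - m->t_lo_c);
}

/* Iterate Tj = Ta + theta_ja * V * I(Tj) until it settles.
 * Returns 0 and writes *tj_out on convergence, or -1 if the temperature
 * keeps climbing (thermal runaway within this simplified model). */
static int settle_junction_temp(const current_model_t *m, double ambient_c,
                                double supply_v, double theta_ja_c_per_w,
                                double *tj_out)
{
    double tj = ambient_c;
    for (int i = 0; i < 1000; ++i) {
        double next = ambient_c
                    + theta_ja_c_per_w * supply_v * current_at(m, tj);
        double diff = next - tj;
        if (diff < 0.0) {
            diff = -diff;
        }
        tj = next;
        if (diff < 0.001) {
            *tj_out = tj;
            return 0;
        }
        if (tj > 1000.0) {
            return -1; /* diverging */
        }
    }
    return -1;
}
```

With made-up points of 0.2 A at 25 °C and 0.55 A at 105 °C, 3.3 V and an assumed θJA of 20 °C/W, the iteration settles a few degrees above the value obtained without feedback. Raising θJA far enough makes the loop gain exceed 1, and the function reports divergence instead.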

This depends a lot on the chip process used. Maybe have a look at the Arm web pages on how to make a chip:

(I did.) Let's say we want to make a new M33 core: Arm gives you 3 tested options, made by TSMC:

- a low-loss process, small chip area, but 100 MHz max. speed

- or a fast process, high idle loss, high power loss, but up to 500 MHz

- or a medium one... in between.

So if you want the fastest CPU, it will produce high loss and also be more sensitive to high temperature.

See the new U5xx series from ST: very low power loss, but 160 MHz max.

It's all the sum of many decisions and compromises: what you prefer, and which downside you get with it.

So, regarding ">This table shows the effect of ambient?": yes, plus the effect of self-heating, which adds to the ambient.

At xx MHz the core will always have the same loss; running math or a wait loop changes (almost) nothing. But if the chip/process is more sensitive to ambient, it will heat up much more than at cool temperatures.


regjoe · ‎2025-07-09

[...] achieved with my typical application:
Tj is at about 15°C above Ta
single core running at 400 MHz
audio streaming via ethernet at about 25 Mbps mostly via DMA
CPU awake about 20% (checked quite precisely with CYCCNT)

This is very interesting. So you are able to measure the CPU load? This implies that the CPU is inactive for 80% of the runtime. How is this achieved? I guess that the CPU is inactive while waiting for an event, e.g. a DMA-finished IRQ? Is the CPU paused/stopped/halted by an instruction, or does this happen automatically? IMHO, with an RTOS, waiting for an event means executing useless instructions in an idle loop. Am I wrong?

I always thought that CPU is always executing code, depending on CPU clock frequency. It is only halted if waiting for code (inserting wait states e.g. if code is fetched from slow external memory) or put into a sleep mode.

regjoe · ‎2025-07-09

@AScha.3 wrote:
So if you want the fastest cpu, it will produce high loss and be more sensitive to high temperature also.
At xx MHz the core will always have the same loss; running math or a wait loop changes (almost) nothing.
But if the chip/process is more sensitive to ambient, it will heat up much more than at cool temperatures.

Your explanation helped me a lot to understand why there are so many MCUs on the market. I'm currently looking for an MCU for a new design, and I'll keep an eye on the U5 family.

Ozone · ‎2025-07-10

You didn't specify your use case and requirements in detail.

But guessing from the performance level a H7 would deliver, you could consider a Cortex A device running Linux.
Most of those processors support dynamic clock throttling, in addition to "power" and "efficiency" cores.

Some devices scale really well, and get only slightly warm to the touch under average loads.

regjoe · ‎2025-07-10

Regarding the U5, I'm thinking of a future design: a cheap, low-power, non-critical consumer system with graphics.

In the current design we settled on the proven and well-known H75x. It's a security system, soft real-time, and must be able to run from battery. I'm currently doing HW tests and the low-level SW layer for the application SW guys. In this thread I'd like to focus on efficiency, in order to find out how HW/SW may improve things here.

Ozone · ‎2025-07-10

Well, I don't know exact details, but a Cortex A & Linux would be a very different environment.
And "well-known H75" suggests you have some experience with it.

> ... must be able to run from battery.

This sounds somewhat softer than the requirement for an ultra-low power application. Supposedly only occasionally, and for limited periods.

Anyway, you mentioned "industrial".
IMHO the lifespan requirements depend on the industry, and 10 years is a very long time in electronics and consumer goods production. Not so much for chemical facilities, water / waste water recycling, powerplants and such.

Flash retention duration is another factor you might consider. The specified values are similar to the MCU lifespans mentioned. But you might cover both with updates and maintenance.

LCE · ‎2025-07-10

@regjoe wrote:
[...] achieved with my typical application:
Tj is at about 15°C above Ta
single core running at 400 MHz
audio streaming via ethernet at about 25 Mbps mostly via DMA
CPU awake about 20% (checked quite precisely with CYCCNT)
This is very interesting. So you are able to measure the CPU load? This implies that the CPU is inactive for 80% of the runtime. How is this achieved? I guess that the CPU is inactive while waiting for an event, e.g. a DMA-finished IRQ? Is the CPU paused/stopped/halted by an instruction, or does this happen automatically? IMHO, with an RTOS, waiting for an event means executing useless instructions in an idle loop. Am I wrong?
I always thought that CPU is always executing code, depending on CPU clock frequency. It is only halted if waiting for code (inserting wait states e.g. if code is fetched from slow external memory) or put into a sleep mode.

I'm not using any OS, just a "simple" state machine in main.

There I have some states with higher priority, checked more often.

After running once through all states, the CPU goes to sleep:

in main:
		/* +++++++++++++++++++++++++++++++++++++++++++++++++++++ */
		/* MSTATE_IDLE:
		 * 	CPU enters SLEEP mode to make room for DMA transfers
		 *	CPU wakes on interrupt
		 *		sleep time is measured using cycle counter and SysTick
		 */
			case MSTATE_IDLE:
			{
				CpuSleepMode();

				u8MainState = MSTATE_LWIP;
				break;
			}
...

void CpuSleepMode(void)
{
	/* for sleep statistics */
	/* NOTE:
	 *	__disable_irq() sets PRIMASK, which only defers execution of the ISR;
	 *	a pending interrupt still wakes the core from WFI, and its ISR
	 *	runs after __enable_irq() below
	 */
	__disable_irq();
	u32CycAwakeSumLst = DWT->CYCCNT - u32CycAwakeStart;
	u64CycAwakeSumAll += (uint64_t)u32CycAwakeSumLst;

	/* ensure that all instructions done before entering SLEEP mode */
	__DSB();
	__ISB();

	/* request Wait For Interrupt -> SLEEP */
	__WFI();

	/* end sleep statistics */
	u32CycAwakeStart = DWT->CYCCNT;
	__enable_irq();
}

DMA (ETH, SAI1&4, ADC3) is still running without CPU interference, until any interrupt occurs (e.g. DMA (half-) transfer complete, and SysTick at least every 1 ms) and wakes the CPU (there are several sleep modes).

I was actually quite surprised about the 80% sleeping and checked the time measurement, but I'm pretty sure now it's okay. I'm basically summing all "awake cycles" and compare this sum to SysTick since reset.
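The bookkeeping described here (sum the awake cycles, compare with the uptime from SysTick) can be sketched on its own, independent of the hardware. This is a minimal sketch under assumptions: `cycles_elapsed` and `awake_percent` are hypothetical helper names, the uptime is taken as a millisecond tick count, and the core clock is assumed to be constant.

```c
#include <stdint.h>

/* Elapsed cycles between two reads of a free-running 32-bit counter such
 * as DWT->CYCCNT. Unsigned subtraction stays correct across one wrap. */
static uint32_t cycles_elapsed(uint32_t start, uint32_t now)
{
    return now - start;
}

/* Awake share in percent: accumulated awake cycles versus the total
 * number of core cycles implied by a millisecond uptime counter. */
static double awake_percent(uint64_t awake_cycles, uint32_t uptime_ms,
                            uint32_t core_hz)
{
    uint64_t total_cycles = (uint64_t)uptime_ms * (core_hz / 1000u);
    if (total_cycles == 0u) {
        return 0.0; /* no uptime yet, avoid dividing by zero */
    }
    return 100.0 * (double)awake_cycles / (double)total_cycles;
}
```

At 400 MHz a 32-bit cycle counter wraps about every 10.7 s, so each awake interval has to be shorter than that for the subtraction to stay valid. With a SysTick wake-up at least every 1 ms, as described above, that condition is easily met.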

regjoe · ‎2025-07-10

@Ozone wrote:
> ... must be able to run from battery.
This sounds somewhat softer than the requirement for an ultra-low power application. Supposedly only occasionally, and for limited periods.

Yes, but our system must run the "full-fledged" SW when running on battery. And batteries are expensive: the fewer Ah, the better.

@Ozone wrote:
Anyway, you mentioned "industrial".
IMHO the lifespan requirements depend on the industry, and 10 years is a very long time in electronics and consumer goods production. Not so much for chemical facilities, water / waste water recycling, powerplants and such.

In our biz 10 years is absolute minimum lifetime. First generation based on Renesas uC @ 12MHz (max. 16MHz), 300k boards out. Second generation based on STM32F4 @ 168MHz (max. 180MHz), 100k boards out. No serious problems, never heard about dying CPUs or worn out external NOR flash. Next generation now is based on H75x @ 400MHz. This is the first time I ever heard about limited uC lifetime. This made me wonder if newer ST uCs are less reliable if running at nominal CPU speed.

What I've learned now is that high junction temperature is the main reason for limited lifetime. I'll keep an eye on this and try to measure it via the internal ADC. I did that on a Microchip E70 before, but for unknown reasons the result was sometimes unrealistically high or low. I had to calibrate and correct the results.
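For the internal temperature sensor, STM32 parts store two factory calibration readings, and the temperature is obtained by linear interpolation between them. The sketch below shows only that interpolation. `temp_cal_t` and `sensor_temp_c` are hypothetical names, and the calibration temperatures, the addresses of the calibration words and the reference conditions they were taken under are assumptions to be checked in the datasheet and reference manual of the exact part.

```c
#include <stdint.h>

/* Two-point factory calibration of the internal temperature sensor.
 * On target, the raw values would be read from the calibration words
 * documented for the part; here they are plain parameters. */
typedef struct {
    uint16_t cal1_raw;    /* ADC reading recorded at cal1_temp_c */
    uint16_t cal2_raw;    /* ADC reading recorded at cal2_temp_c */
    float    cal1_temp_c;
    float    cal2_temp_c;
} temp_cal_t;

/* Linear interpolation between the two calibration points. The raw
 * reading must use the same resolution and reference voltage as the
 * calibration values, otherwise it has to be rescaled first. */
static float sensor_temp_c(const temp_cal_t *cal, uint16_t raw)
{
    float span_raw = (float)cal->cal2_raw - (float)cal->cal1_raw;
    float span_c   = cal->cal2_temp_c - cal->cal1_temp_c;

    if (span_raw == 0.0f) {
        return cal->cal1_temp_c; /* degenerate calibration data */
    }
    return cal->cal1_temp_c
         + span_c * ((float)raw - (float)cal->cal1_raw) / span_raw;
}
```

Note that this sensor reports the die temperature at the sensor's location, and the result is only as good as the match between the measurement conditions and the calibration conditions. A mismatch there is one possible source of unrealistic readings.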

Second, I'll try to measure the CPU load in order to get an idea whether there is any possibility to free the CPU from heavy load, which could extend battery time. I don't know if and how this can be achieved; maybe someone can help me out. But I'm afraid that the CPU is 100% occupied due to the RTOS.
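With an RTOS, the same sleep-and-measure approach shown earlier in this thread can usually be moved into the idle hook, so that the idle task sleeps instead of spinning. The sketch below is a host-runnable stand-in, not target code: `read_cycle_counter`, `wait_for_interrupt`, `mask_interrupts` and `unmask_interrupts` are hypothetical wrappers (on target they would map to DWT->CYCCNT, WFI and PRIMASK handling), and `idle_hook_body` is a hypothetical name for the function an idle hook would call, for example FreeRTOS's `vApplicationIdleHook()` if that is the RTOS in use.

```c
#include <stdint.h>

/* Host-side stand-ins for the hardware access (assumptions, see text). */
static uint32_t fake_cycle_counter;
static uint32_t read_cycle_counter(void) { return fake_cycle_counter; }
static void wait_for_interrupt(void) { fake_cycle_counter += 5000u; }
static void mask_interrupts(void) { /* PRIMASK = 1 on target */ }
static void unmask_interrupts(void) { /* PRIMASK = 0 on target */ }

static uint32_t awake_start;        /* counter value at the last wake-up */
static uint64_t awake_cycles_total; /* everything executed between sleeps */

/* Called whenever no task is ready: account the awake time since the
 * last wake-up, sleep, then restart the awake measurement. Interrupts
 * are masked around the sleep so that the waking ISR runs after the
 * counter has been re-read and is counted as awake time. */
static void idle_hook_body(void)
{
    mask_interrupts();
    awake_cycles_total += (uint32_t)(read_cycle_counter() - awake_start);
    wait_for_interrupt();
    awake_start = read_cycle_counter();
    unmask_interrupts();
}
```

Only awake cycles are accumulated, so the result does not depend on whether the cycle counter keeps running while the core sleeps; the total time should come from an independent source such as the RTOS tick. Dividing the accumulated awake cycles by the total cycles then gives the CPU load.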