2024-11-21 04:32 AM
Hello ST Forum,
We are using an STM32H750I in a current project. It communicates over USART1 with an STM32H750V at a baud rate of 512 kBaud. The STM32H7 runs at 480 MHz from an 8 MHz HSE (external crystal). The USART1 clock is taken from PCLK2.
We have now noticed the following phenomenon. Every now and then we have problems with the UART transmission between the two processors when starting the system. We connected a logic analyzer and found that in these cases framing errors and sometimes incorrect characters appear on the UART. This happens particularly when the system was very cold beforehand, for example when the device had been standing outside for a while. To provoke the problem, we cooled the CPU with ice spray and then started the system. The incorrect transmissions are clearly visible.
Since the UART is supplied via the PLL and we classified the "cold" PLL as a potential source of error, I routed PLL1Q to MCO1 while we recorded the UART. The nominal frequency here should be 16 MHz. You can see that the MCO1 frequency also fluctuates, especially around the framing errors. That would mean that the PLL frequency is not stable. You can see on the right side of the image that the MCO1 frequency sometimes increases to 18 MHz. This also explains the “framing errors”.
However, this phenomenon only occurs when the CPU is running at full load. If you switch off some of the processing and peripherals, such as the FPU, other UARTs and so on, it is no longer as pronounced. Of course, cooling down with cold spray is not a normal use case, but as I said, it also sometimes occurs when the device has been outside at 5°C. So our questions to you:
- Are you familiar with such behavior?
- Is it possible that because the CPU is running at full load, the temperature increase is too extreme and the PLL therefore adjusts too much?
Interestingly, the phenomenon does not occur at all if you use the HSI as the source for the USART1. Here, however, we are concerned that this may become too inaccurate over time. Or do you think this is a suitable workaround?
Brief additional information:
- If you cool down only the 8 MHz crystal, these errors do not occur. They really only appear when the PLL is the clock source of the UART.
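As a rough sanity check of the MCO1 observation, the sketch below estimates how far the receiver's sampling point moves when a clock runs at 18 MHz instead of 16 MHz. It assumes that the USART kernel clock drifts by the same ratio as PLL1Q (this was not measured), that the frame is 8N1 (10 bits on the wire), and that only one side drifts. The function names are made up for illustration.

```c
#include <stdint.h>

/* Relative clock error in parts per million: (actual - nominal) / nominal. */
int32_t clock_error_ppm(uint32_t nominal_hz, uint32_t actual_hz)
{
    int64_t diff = (int64_t)actual_hz - (int64_t)nominal_hz;
    return (int32_t)((diff * 1000000) / (int64_t)nominal_hz);
}

/*
 * Accumulated sampling offset at the centre of the last bit of a frame,
 * in thousandths of a bit time (first-order estimate).
 * For a 10-bit frame the stop bit is sampled 9.5 bit times after the
 * start edge, so the offset is 9.5 * error.
 */
int32_t stop_bit_offset_millibits(int32_t error_ppm, uint32_t bits_per_frame)
{
    /* (bits_per_frame - 0.5) bit times, kept in integers as half bits */
    int64_t half_bits = 2 * (int64_t)bits_per_frame - 1;
    return (int32_t)((half_bits * error_ppm) / 2000);
}
```

For 16 MHz nominal and 18 MHz measured, clock_error_ppm() gives 125000 ppm (12.5 %), and stop_bit_offset_millibits(125000, 10) gives 1187, so the stop bit would be sampled more than one full bit time away from its centre. The sampling point has to stay well within half a bit time, so a drift of this size would be expected to produce framing errors.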
Best regards and thank you in advance,
Eric
2024-11-25 04:39 AM
Hey @NEdom.1,
I really appreciate your reply and all the suggestions from the people here in the thread. Thanks a lot. Yes, you are right, the USART is definitely not the right choice. As the hardware has to stay, we have to work around it somehow, either with different baud rates or with additional CRC checking and retries. Interestingly, if the system runs for some time, the USART becomes very stable. It really seems to be a startup issue. We are still investigating and I will share all results.
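For the CRC option, a minimal sketch of a frame check could look like this. It uses CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF) purely as an example; the frame layout (payload followed by two CRC bytes, high byte first) and the function names are assumptions, not our actual protocol.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-16/CCITT-FALSE: polynomial 0x1021, init 0xFFFF, no reflection, no final XOR. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u) {
                crc = (uint16_t)((uint16_t)(crc << 1) ^ 0x1021u);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}

/*
 * Returns 1 if the frame (payload + 2 CRC bytes, high byte first) is intact,
 * 0 otherwise. The caller would request a retransmission on 0.
 */
int frame_crc_ok(const uint8_t *frame, size_t len)
{
    if (len < 2u) {
        return 0;
    }
    uint16_t received = (uint16_t)(((uint16_t)frame[len - 2u] << 8) | frame[len - 1u]);
    return crc16_ccitt(frame, len - 2u) == received;
}
```

The receiver would drop any frame for which frame_crc_ok() returns 0 and ask for a retry. A CRC only detects corrupted payloads; it does not remove the framing errors themselves.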
Best regards,
Eric
2024-11-25 04:43 AM
Hey @Ozone,
It really looks like what you describe. Once the system has been running for 1 or 2 minutes, it becomes very stable. If you then reset the system, the USART baud rate is perfectly stable. The issue only starts to occur after the system has cooled down for some hours. So it must be related to the rising temperature or the temperature gradient. We are still in the middle of our investigations. Some PLL configurations seem to be much more robust, but there is nothing we are 100% sure about at the moment. It would be very interesting to know whether ST is aware of this issue or limitation.
Best regards,
Eric
2024-11-25 10:31 PM
While I have a basic understanding of the cause, I can't really come up with a solution.
As soon as you power the device up and current flows, power is dissipated and the silicon warms up. Eventually, the die temperature reaches an equilibrium between heat generation and dissipation, usually some degrees above ambient temperature. And basically all analogue semiconductor parameters drift with temperature.
I would try either lower baud rates, or a baud rate that leaves a better baud rate error tolerance.
I use UART/RS232 mostly for communication with PC hosts, and hardly ever go beyond 115200 baud.
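To compare candidate baud rates, the divider error can be estimated with a few lines of C. This is only a sketch: it assumes 16x oversampling with the USART prescaler set to 1, so that the divider is simply kernel clock / baud rate, and it assumes a PCLK2 of 120 MHz, which is not stated in this thread. The function names are hypothetical.

```c
#include <stdint.h>

/* Divider for 16x oversampling, rounded to nearest: USARTDIV = f_ck / baud. */
uint32_t usart_div16(uint32_t kernel_hz, uint32_t baud)
{
    return (kernel_hz + baud / 2u) / baud;
}

/* Deviation of the achievable baud rate from the requested one, in ppm. */
int32_t baud_error_ppm(uint32_t kernel_hz, uint32_t baud)
{
    uint32_t div = usart_div16(kernel_hz, baud);
    /* Actual baud rate in milli-baud to keep some resolution in integer math. */
    int64_t actual_x1000 = ((int64_t)kernel_hz * 1000) / (int64_t)div;
    int64_t diff_x1000 = actual_x1000 - (int64_t)baud * 1000;
    return (int32_t)((diff_x1000 * 1000) / (int64_t)baud);
}
```

Under these assumptions, 512000 baud from 120 MHz gives a divider of 234 and an error of about 1600 ppm (0.16 %), while 500000 baud divides exactly. The 64 MHz HSI also divides exactly to 512000 baud (divider 125). The divider error is small compared with the drift reported above, so choosing an exact divider only recovers a little margin.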
2024-11-26 01:35 AM
Hey @AScha.3,
Our hardware guy checked the MCO with a high-bandwidth DSO. He told me that he can definitely force a drift on the MCO when he applies ice spray. This is how we provoke the temperature issue, as tracking it during boot time is almost impossible. But he also mentioned that he sometimes sees slight drifts during runtime as well, without any ice spray. I think this is somewhat normal, as the PLL needs to readjust from time to time. So it really looks like the PLL is the root cause, and we won't get around this at such a high baud rate.
@AScha.3, what do you mean by the Vos setting?
In parallel, I enabled the RCC and CSS interrupts to track the PLLRDY flags and problems with the HSE. I can see that the RCC interrupt triggers during boot and shows PLLRDY, which is as expected. But when the UART fails, neither interrupt triggers. So it seems there is no way to detect the PLL drift.
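One idea that does not rely on the PLLRDY or CSS flags would be to measure the PLL-derived clock against an independent oscillator: a free-running timer clocked from the PLL captures its counter on each edge of a slow reference signal, and software compares the tick count per reference period with the expected value. The sketch below only shows the comparison logic. Which timer and which reference (for example HSI or LSE) can actually be wired this way is an assumption that has to be checked against the reference manual, and the function names are hypothetical.

```c
#include <stdint.h>

/* Ticks between two captures of a free-running 32-bit counter (handles wraparound). */
uint32_t capture_delta(uint32_t previous, uint32_t current)
{
    return current - previous;
}

/*
 * Returns 1 if the measured number of PLL-derived ticks per reference period
 * deviates from the expected number by more than tolerance_ppm, else 0.
 * tolerance_ppm must be larger than the inaccuracy of the reference itself.
 */
int period_out_of_tolerance(uint32_t measured_ticks,
                            uint32_t expected_ticks,
                            uint32_t tolerance_ppm)
{
    uint64_t diff = (measured_ticks > expected_ticks)
                        ? (uint64_t)(measured_ticks - expected_ticks)
                        : (uint64_t)(expected_ticks - measured_ticks);
    return (diff * 1000000u) > ((uint64_t)expected_ticks * tolerance_ppm);
}
```

With a 2 % tolerance (20000 ppm), a jump from 16 MHz to 18 MHz (12.5 %) would be flagged. This measurement averages over one reference period, so very short excursions could still go unnoticed.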
Any additional ideas here?
Best regards,
Eric