HAL_I2C_GetError() reporting error codes

KGryn.1 · 2024-05-20

Hello,

We have noticed I2C communication errors on the STM32 in our device. The device includes an STM32 (master) and TI's BQ76952 battery monitor (slave).

Using the same device and the same software, we have observed three situations that occur randomly:

After boot, write commands return no errors and further communication is correct.
After boot, the first few write commands return an error and further communication is correct. However, some devices lose communication after a longer period (for example, two months), and we don't know why. Communication is restored only after rebooting the STM32.
Sometimes after boot, the first few write commands return an error and the I2C peripheral locks up. In that case all communication fails until the STM32 is rebooted.

HAL_I2C_GetError() reports the following error codes: 0x02 (ARLO) and 0x04 (ACKF).
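`HAL_I2C_GetError()` returns a bitmask, so several errors can be reported at once (0x06 would be ARLO and ACKF together). A minimal host-side sketch for logging it; the constant values are copied from the `HAL_I2C_ERROR_*` defines in the STM32 HAL I2C header (the HAL names the ACKF bit `HAL_I2C_ERROR_AF`), and the helper `i2c_error_str()` is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Values as in the STM32 HAL I2C header. On target, drop these and
 * include the HAL header instead to avoid redefinition. */
#define HAL_I2C_ERROR_NONE      0x00000000U
#define HAL_I2C_ERROR_BERR      0x00000001U  /* bus error              */
#define HAL_I2C_ERROR_ARLO      0x00000002U  /* arbitration lost       */
#define HAL_I2C_ERROR_AF        0x00000004U  /* ACK failure (ACKF)     */
#define HAL_I2C_ERROR_OVR       0x00000008U  /* overrun/underrun       */
#define HAL_I2C_ERROR_DMA       0x00000010U  /* DMA transfer error     */
#define HAL_I2C_ERROR_TIMEOUT   0x00000020U  /* timeout                */
#define HAL_I2C_ERROR_SIZE      0x00000040U  /* size management error  */

/* Hypothetical helper: writes e.g. "ARLO|AF" into buf for logging.
 * Bits not listed in the table are ignored. */
const char *i2c_error_str(uint32_t err, char *buf, size_t len)
{
    static const struct { uint32_t bit; const char *name; } names[] = {
        { HAL_I2C_ERROR_BERR, "BERR" }, { HAL_I2C_ERROR_ARLO, "ARLO" },
        { HAL_I2C_ERROR_AF, "AF" },     { HAL_I2C_ERROR_OVR, "OVR" },
        { HAL_I2C_ERROR_DMA, "DMA" },   { HAL_I2C_ERROR_TIMEOUT, "TIMEOUT" },
        { HAL_I2C_ERROR_SIZE, "SIZE" },
    };

    if (len == 0)
        return buf;
    buf[0] = '\0';
    if (err == HAL_I2C_ERROR_NONE) {
        snprintf(buf, len, "NONE");
        return buf;
    }
    for (size_t i = 0; i < sizeof names / sizeof names[0]; ++i) {
        if (err & names[i].bit) {
            size_t used = strlen(buf);  /* always < len, so len - used >= 1 */
            snprintf(buf + used, len - used, "%s%s", used ? "|" : "", names[i].name);
        }
    }
    return buf;
}
```

Logging the decoded string after each failed transfer makes it easier to see whether ARLO and ACKF arrive together or in separate transfers.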

There are only two devices on the I2C bus: the STM32 (master) and the BQ76952 (slave). The I2C clock is 100 kHz. We're using the HAL library for I2C communication.
It happens in two different configurations, one with an STM32L432 and one with an STM32G0B1.

Looking forward to your reply.

Andrew Neil · 2024-05-21

Have you used an oscilloscope to see what's happening on the wires? Also check that power supplies are clean & stable before attempting to start comms.

KGryn.1 · 2024-05-21

Hello,

Thank you for the reply.

We'll do further electrical measurements and post them later, but from previous measurements I remember no difference in the I2C signals between startup when the problem occurs and a working system where it does not.

What can be done to eliminate these errors, 0x02 (ARLO) and 0x04 (ACKF)? Or, if that's not possible, how can communication be unblocked without restarting the STM32?

The STM32 and the I2C communication are started only once the power supply's 3.3 V output has its power-good flag active, and there is a long delay (>1 s) between power-good going active and the start of communication.

These errors occur most often in our programming setup, when the STM32 is forced to reset while I2C is running.

Andrew Neil · 2024-05-21

On recovering from I2C errors:

https://community.st.com/t5/stm32-mcus-products/stm32h745i-i2c-recovery-after-addressing-nonexistent-device/m-p/660263/highlight/true#M240602

@KGryn.1 wrote:
These errors occur most often in our programming setup, when the STM32 is forced to reset while I2C is running.

So you could be in the middle of an I2C transaction when this reset occurs? Could well "confuse" the slave...

KGryn.1 · 2024-05-21

Thanks for the link. We tried the nine-clock-pulse tip some time ago, but with no positive results.

However, we wrote our own implementation, and maybe it was faulty. Do you have working code for this I2C recovery method?
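A minimal sketch of the nine-clock bus-clear procedure, under stated assumptions: SCL and SDA have been switched to open-drain GPIO outputs before the I2C peripheral is initialised, and the `i2c_pins_t` callbacks, `i2c_bus_clear()` and the simulated slave are all hypothetical. On target the callbacks would wrap `HAL_GPIO_WritePin()` / `HAL_GPIO_ReadPin()` and a delay of roughly 5 µs for 100 kHz; the simulation only lets the logic be checked on a PC.

```c
/* Hypothetical pin callbacks. Writing 1 releases the open-drain line (the
 * pull-up takes it high); writing 0 drives it low. */
typedef struct {
    void (*scl)(void *ctx, int level);
    void (*sda)(void *ctx, int level);
    int  (*scl_read)(void *ctx);
    int  (*sda_read)(void *ctx);
    void (*half_period)(void *ctx);   /* roughly 5 us at 100 kHz */
    void *ctx;
} i2c_pins_t;

/* Clock SCL until the slave releases SDA (at most max_pulses times), then
 * send a STOP so the slave's state machine returns to idle.
 * Returns 0 on success, -1 if SCL is held low, -2 if SDA stays low. */
int i2c_bus_clear(const i2c_pins_t *p, int max_pulses)
{
    p->sda(p->ctx, 1);
    p->scl(p->ctx, 1);
    p->half_period(p->ctx);
    if (!p->scl_read(p->ctx))
        return -1;                       /* SCL stuck low: clocking cannot help */

    for (int i = 0; i < max_pulses && !p->sda_read(p->ctx); ++i) {
        p->scl(p->ctx, 0); p->half_period(p->ctx);
        p->scl(p->ctx, 1); p->half_period(p->ctx);
    }
    if (!p->sda_read(p->ctx))
        return -2;                       /* slave still holds SDA */

    /* STOP: drive SDA low while SCL is low, raise SCL, then release SDA. */
    p->scl(p->ctx, 0); p->half_period(p->ctx);
    p->sda(p->ctx, 0); p->half_period(p->ctx);
    p->scl(p->ctx, 1); p->half_period(p->ctx);
    p->sda(p->ctx, 1); p->half_period(p->ctx);
    return p->sda_read(p->ctx) ? 0 : -2;
}

/* ---- Host-side model for checking the logic (not target code) ---- */
typedef struct {
    int m_scl, m_sda;   /* master outputs, 1 = released                        */
    int slave_hold;     /* slave keeps SDA low for this many SCL falling edges */
    int scl_shorted;    /* models SCL shorted to GND                           */
    int stop_seen;      /* set when SDA rises while SCL is high                */
} sim_bus_t;

static int sim_scl_line(const sim_bus_t *b) { return b->m_scl && !b->scl_shorted; }
static int sim_sda_line(const sim_bus_t *b) { return b->m_sda && b->slave_hold == 0; }

static void sim_scl(void *c, int lv)
{
    sim_bus_t *b = c;
    int was_high = sim_scl_line(b);
    b->m_scl = lv;
    if (was_high && !sim_scl_line(b) && b->slave_hold > 0)
        b->slave_hold--;                 /* slave moves on to its next bit */
}

static void sim_sda(void *c, int lv)
{
    sim_bus_t *b = c;
    int was_high = sim_sda_line(b);
    b->m_sda = lv;
    if (!was_high && sim_sda_line(b) && sim_scl_line(b))
        b->stop_seen = 1;
}

static int  sim_scl_read(void *c) { return sim_scl_line(c); }
static int  sim_sda_read(void *c) { return sim_sda_line(c); }
static void sim_delay(void *c)    { (void)c; }
```

Because the loop stops as soon as SDA reads high, a larger `max_pulses` (such as 20) only costs extra clocks when the slave is still holding the line. If SCL itself reads low, for example because of a short to ground, clocking cannot help and the function returns -1.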

Saket_Om · 2024-05-22

Hello @KGryn.1

Did you add pull-up resistors to the SDA and SCL lines? The typical values are between 4.7 kΩ and 10 kΩ.
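As a rough check on the pull-up value, the I2C-bus specification (NXP UM10204) bounds R_p from below by the sink current and from above by the maximum rise time and the bus capacitance C_b. A worked example, assuming a 3.3 V bus, the standard-mode limits (V_OL = 0.4 V at I_OL = 3 mA, t_r ≤ 1000 ns) and a hypothetical C_b of 100 pF (the real value would have to be measured or estimated for this board):

```latex
\begin{aligned}
R_{p(\min)} &= \frac{V_{DD} - V_{OL(\max)}}{I_{OL}}
             = \frac{3.3\,\mathrm{V} - 0.4\,\mathrm{V}}{3\,\mathrm{mA}} \approx 967\,\Omega \\
R_{p(\max)} &= \frac{t_r}{0.8473\, C_b}
             = \frac{1000\,\mathrm{ns}}{0.8473 \times 100\,\mathrm{pF}} \approx 11.8\,\mathrm{k\Omega}
\end{aligned}
```

The factor 0.8473 is ln(0.7/0.3), the number of RC time constants for the line to rise from 0.3 V_DD to 0.7 V_DD. With the same assumed 100 pF, fast mode (t_r = 300 ns) gives about 3.5 kΩ, and a 10 kΩ pull-up meets the standard-mode limit only while C_b stays below about 118 pF.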

If your question is answered, please close this topic by clicking "Accept as Solution".

Thanks
Omar

KGryn.1 · 2024-05-22

Hello,

Today we continued testing and measurements.

Yes, there are pull-ups on SDA and SCL. We started testing with 10k and also tried lower values down to 4k7, with no difference.
We noticed that the slave sometimes pulls SDA down to GND at reset (not on every reset). We implemented the clock-pulse method before starting the HAL. Nine clock pulses improve startup (fewer errors), but we also tested more than nine: 20 pulses gave better results. Is there any disadvantage to sending more than 9 pulses, or is there an upper limit? Both setups included the start and stop sequences. If 20 pulses is acceptable, we'll keep it.
During normal operation, the HAL has trouble handling communication faults, e.g. a short between SDA and SCL (for example, environmental contamination between the pull-ups). Do you have tips for handling this issue?
BTW, the slave IC (BQ) has an external reset pin, but we can't use it because it causes the loss of some startup configuration that we must keep; we verified this in tests.

Regards

Saket_Om · 2024-05-22

Hello @KGryn.1

>HAL during normal operation has a problem with handling communication faults

To recover I2C communication after encountering errors such as 0x02 (ARLO) or 0x04 (ACKF), you can perform an abort, de-initialization, and re-initialization of the I2C peripheral. This process will reset the I2C hardware block within the STM32 microcontroller and clear any error conditions, allowing you to attempt communication again.
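Applied to the symptom in the first post (a few failed writes right after boot, then normal traffic), one option is a bounded retry before the full abort, `HAL_I2C_DeInit()`, `HAL_I2C_Init()` sequence. A sketch of the decision logic only, assuming that a lone ACK failure is worth retrying while ARLO, BERR or a timeout is not; the policy, the `i2c_action_t` enum and `i2c_next_action()` are hypothetical, and the error constants mirror the HAL's `HAL_I2C_ERROR_*` values.

```c
#include <stdint.h>

/* Values as in the STM32 HAL I2C header (use the header itself on target). */
#define HAL_I2C_ERROR_BERR    0x00000001U
#define HAL_I2C_ERROR_ARLO    0x00000002U
#define HAL_I2C_ERROR_AF      0x00000004U
#define HAL_I2C_ERROR_TIMEOUT 0x00000020U

typedef enum {
    I2C_ACT_NONE,    /* transfer succeeded                                */
    I2C_ACT_RETRY,   /* try the same transfer again                       */
    I2C_ACT_REINIT   /* abort, HAL_I2C_DeInit(), HAL_I2C_Init(), then retry */
} i2c_action_t;

/* Hypothetical policy. 'attempt' is the number of retries already made.
 * Bus error, arbitration loss or timeout reset the peripheral at once;
 * other errors (e.g. a plain NACK) are retried up to max_retries times. */
i2c_action_t i2c_next_action(uint32_t hal_err, unsigned attempt, unsigned max_retries)
{
    const uint32_t fatal = HAL_I2C_ERROR_BERR | HAL_I2C_ERROR_ARLO | HAL_I2C_ERROR_TIMEOUT;

    if (hal_err == 0U)
        return I2C_ACT_NONE;
    if ((hal_err & fatal) != 0U || attempt >= max_retries)
        return I2C_ACT_REINIT;
    return I2C_ACT_RETRY;
}
```

On target, the caller would run the transfer in a loop, pass `HAL_I2C_GetError()` in after each failure, increment `attempt` on `I2C_ACT_RETRY`, and perform the abort/DeInit/Init sequence on `I2C_ACT_REINIT`.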


Thanks
Omar

Tesla DeLorean · 2024-05-22

1) Both seem rather weak. At higher speeds, 1K5 or 2K7 are probably more common, to get a sharper rising edge, at the cost of higher current consumption, obviously. Weak pull-ups are used in low-power designs and in those with low bus capacitance.

2) This is the kind of situation that clocking out should remedy, or it indicates that the I2C, SPI or HDQ mode has not been clearly established in the IC. The 9 clocks come from the address decode plus the NACK: you're trying to get slaves to release any perceived ownership or selection of the bus.

3) The pull-up should go to the power rail; I'm not sure how you'd drag that level down without a lot of current flowing, and it definitely shouldn't act as a potential divider, since one end is fixed. Yes, STM32s are not going to be happy seeing LOW levels, as that indicates the bus is busy.

Are you operating at different supply levels, or is everything at 3V3?

There are a couple of I2C threads on TI's forum about the BQ76952, including some issues with I2C during the IC's early startup.


KGryn.1 · 2024-05-23

Hello,

For today's tests we kept the 10k pull-ups and the 3.3 V supply, mainly because the signal edges look fine on the scope.

Apart from the pull-up value, can you comment on the differences and disadvantages of sending more than 9 clock pulses at the beginning of the transmission? As before, I'm asking because the setup with 20 clock pulses (or generally at least 2 × 9 pulses) works well so far, so the question is whether it's acceptable for other reasons too.

Regarding my point 3:

"HAL during normal operation has a problem with handling communication faults e.g. short between SDA and SCL (e.g. environmental pollution between pull-ups). Do you have tips for handling this issue? "

This happens when the SCL line is externally shorted to ground. The STM32 detects this as arbitration lost (ARLO error), as in a multi-master setup. This leaves the BUSY flag set while the peripheral waits for a STOP condition from the other master. With a single master that STOP never comes, so all subsequent transmissions are blocked. According to the reference manual, clearing the PE bit is the only way to reset this flag if there is no other master that can generate a STOP condition.

As a workaround, we toggle the PE bit to 0 and back to 1 if we detect a transmission error and the BUSY flag is still set after the transfer.
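The PE toggle described above corresponds to the software-reset sequence given in the STM32L4/G0 reference manuals: write PE = 0, check that PE reads 0 (which keeps PE low for the required minimum of three APB clock cycles), then write PE = 1. Those manuals also state that configuration registers are not affected while BUSY, NACKF, ARLO and related status bits return to their reset values; the reset does not help if the slave itself is holding SDA low. A minimal sketch, assuming the bit positions of those parts (CR1.PE = bit 0, ISR.BUSY = bit 15); the function name and the pointer-to-register parameter are hypothetical so the logic can be checked on a host. On target you would pass `&hi2c.Instance->CR1`, `hi2c.Instance->ISR` and `HAL_I2C_GetError(&hi2c)`, using the CMSIS `I2C_CR1_PE` / `I2C_ISR_BUSY` defines instead of the local ones.

```c
#include <stdint.h>

#define I2C_CR1_PE_BIT    (1U << 0)    /* CR1.PE on STM32L4/G0 I2C   */
#define I2C_ISR_BUSY_BIT  (1U << 15)   /* ISR.BUSY on STM32L4/G0 I2C */

/* Hypothetical helper: if the last transfer failed and BUSY is still set,
 * perform the reference-manual software reset (PE=0, check PE=0, PE=1).
 * Returns 1 if a reset was performed, 0 otherwise. */
int i2c_soft_reset_if_stuck(volatile uint32_t *cr1, uint32_t isr, uint32_t hal_err)
{
    if (hal_err == 0U || (isr & I2C_ISR_BUSY_BIT) == 0U)
        return 0;                        /* nothing stuck: leave the peripheral alone */

    *cr1 &= ~I2C_CR1_PE_BIT;             /* write PE = 0 */
    while ((*cr1 & I2C_CR1_PE_BIT) != 0U) {
        /* check PE = 0; the read-back also covers the 3 APB-cycle minimum */
    }
    *cr1 |= I2C_CR1_PE_BIT;              /* write PE = 1; other CR1 bits are kept */
    return 1;
}
```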

Could this method cause any disadvantages? As before, it resolved our issues, but we're not sure whether it's acceptable in other respects and cases.

Regards