I am reporting a hardware bug in the I2C peripheral of the newer STM32 families (STM32F7, STM32H7, STM32L0, STM32L4, and STM32L4+) when used in I2C multi-master mode (including many SMBus host applications), along with a few possible workarounds. The STM32F1, STM32F2, STM32F4, and STM32L1 families are unaffected because they use an older I2C peripheral, which has its own flaws (such as being unable to acknowledge its own address if it lost arbitration during that address). I reported the bug to ST's technical support on Dec 19, 2017. As of today (Apr 5, 2018), ST has confirmed that it is a hardware bug and is in the process of updating the family errata sheets, but has not yet confirmed a final workaround. I'm sharing this with the community in the hope of saving someone else the trouble I went through.
Most people using the I2C peripheral will NOT be affected by this bug. It affects I2C multi-master mode, in particular when using DMA transfers, which are unfortunately the lowest-power and most CPU-efficient way of using the STM32 I2C peripheral. An example application would be an SMBus host.
The issue is that a START request (made by setting the START bit for a new master transmission) becomes latched in the peripheral if, during a bus collision, the device is itself addressed and switches to slave receiver mode to receive a message. This latched request cannot be aborted! Immediately after the STOP condition is detected, the I2C peripheral will try to send a START + address (which will likely be incorrect), even if the START bit was cleared by setting the ADDRCF bit. There is no way to abort this transfer without resetting the peripheral. See workarounds below.
The reference manual for the STM32L4 series (RM0394, p.1037 and p.1071) indicates that setting ADDRCF clears a pending START bit. It does clear the START bit, but it does NOT clear the pending START request inside the peripheral. The documentation needs to be updated to make that clear.
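The failure mode can be captured in a few lines of host-side C. This is a minimal behavioral sketch of what is observed on the bus, assuming a hypothetical `i2c_model` type and `model_*` functions invented purely for illustration; it is not ST's internal logic and does not touch real registers.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical model of the observed behavior (not ST's RTL).
 * Only the collision path described in this post is modeled. */
typedef struct {
    bool start_bit;     /* CR2.START as software reads it */
    bool start_latched; /* internal START request, invisible to software */
    bool slave_active;  /* peripheral has switched to slave receiver */
} i2c_model;

/* Software sets CR2.START to request a master transmission. */
static void model_set_start(i2c_model *m) {
    m->start_bit = true;
    m->start_latched = true;
}

/* Another master addresses us while our START is pending: no ARLO is
 * flagged, the peripheral becomes a slave receiver and sets ADDR. */
static void model_addressed_during_collision(i2c_model *m) {
    m->slave_active = true;
}

/* Software writes ICR.ADDRCF: the visible START bit is cleared,
 * but the internal request is left untouched (the bug). */
static void model_write_addrcf(i2c_model *m) {
    assert(m->slave_active); /* ADDRCF is written in response to ADDR */
    m->start_bit = false;
}

/* STOP detected at the end of the slave reception. Returns true if the
 * peripheral then drives START + the old slave address onto the bus. */
static bool model_stop_detected(i2c_model *m) {
    m->slave_active = false;
    if (m->start_latched) {
        m->start_latched = false; /* request is consumed by the spurious START */
        return true;
    }
    return false;
}
```

Calling `model_set_start`, `model_addressed_during_collision` and `model_write_addrcf` in that order leaves `start_bit` false but `start_latched` true, so the next `model_stop_detected` returns true: that is the spurious START.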
Steps to reproduce
I have attached a demo project that reproduces this bug, but here is a more verbose explanation:
- Set up the I2C peripheral with OAR2=something and ADDRIE=1 (so that it can be addressed as a slave), then start a master transmission by configuring the DMA and setting STOPIE=1, TXDMA=1, AUTOEND=1, RELOAD=0, NBYTES=N, START=1.
- At the same time, or a few microseconds before, have another I2C master address this device as a slave. When this bus collision happens during the address phase, it is not detected as an arbitration loss (ARLO). Instead, the I2C peripheral is automatically configured as a slave and the ADDR flag is set.
- When the ADDR flag is set, reconfigure this device for slave reception by configuring the DMA and setting STOPIE=1, SBC=1, RXDMA=1, AUTOEND=0, RELOAD=1, NBYTES=N+1 (anything greater than before). Then set ADDRCF=1 to clear the pending ADDR flag and acknowledge the address, which starts the reception. This also clears the START bit... but the START request is still pending in hardware!!! (This appears to be the root cause of the bug.)
- After the transfer is complete and the STOP condition is received, the I2C peripheral exhibits the bug: a START condition is asserted on the bus and the original slave address set in step 1 is transmitted, even though the START bit remained cleared throughout the preceding reception.
- In addition, if we try to work around the unexpected START + address transmission by setting up the TX DMA, it turns out that NBYTES=N+1 from the slave reception was latched into hardware, so overwriting it with NBYTES=N has no effect.
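The register settings from steps 1 and 3 are sketched below as plain C functions that compute CR1/CR2 values. The bit positions are written out by hand following the I2C_CR1/I2C_CR2 layout in RM0394 so the snippet compiles on a host (on the target you would use the CMSIS device-header macros such as `I2C_CR2_START` instead). The `step*` helper names are hypothetical, and the DMA channel setup and the actual register writes are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions following the I2C_CR1/I2C_CR2 layout in RM0394, redefined
 * locally so this compiles on a host. On target, use the CMSIS macros. */
#define CR1_ADDRIE      (1u << 3)
#define CR1_STOPIE      (1u << 5)
#define CR1_TXDMAEN     (1u << 14)
#define CR1_RXDMAEN     (1u << 15)
#define CR1_SBC         (1u << 16)

#define CR2_SADD_MASK   0x3FFu
#define CR2_START       (1u << 13)
#define CR2_NBYTES_POS  16u
#define CR2_NBYTES_MASK (0xFFu << CR2_NBYTES_POS)
#define CR2_RELOAD      (1u << 24)
#define CR2_AUTOEND     (1u << 25)

/* Step 1: enable address-match and STOP interrupts plus TX DMA. */
static uint32_t step1_cr1(uint32_t cr1) {
    return cr1 | CR1_ADDRIE | CR1_STOPIE | CR1_TXDMAEN;
}

/* Step 1: master write of `nbytes` bytes to 7-bit address `addr7`
 * with AUTOEND=1, RELOAD=0, START=1 (RD_WRN=0 selects a write). */
static uint32_t step1_cr2(uint8_t addr7, uint32_t nbytes) {
    assert(nbytes <= 0xFFu); /* NBYTES is an 8-bit field */
    return (((uint32_t)addr7 << 1) & CR2_SADD_MASK) /* 7-bit address in SADD[7:1] */
         | (nbytes << CR2_NBYTES_POS)
         | CR2_AUTOEND
         | CR2_START;
}

/* Step 3 (on ADDR): slave reception with byte control and RX DMA. */
static uint32_t step3_cr1(uint32_t cr1) {
    return cr1 | CR1_STOPIE | CR1_SBC | CR1_RXDMAEN;
}

/* Step 3 (on ADDR): NBYTES = `nbytes`, RELOAD=1, AUTOEND=0. START is masked
 * out so this read-modify-write never writes START=1 back. Afterwards the
 * caller writes ICR.ADDRCF to release the address phase. */
static uint32_t step3_cr2(uint32_t cr2, uint32_t nbytes) {
    assert(nbytes <= 0xFFu);
    cr2 &= ~(CR2_NBYTES_MASK | CR2_AUTOEND | CR2_START);
    return cr2 | (nbytes << CR2_NBYTES_POS) | CR2_RELOAD;
}
```

Per the last step above, on the affected parts the NBYTES value written by `step3_cr2` stays latched after the reception, so rewriting CR2 with the original NBYTES afterwards is not enough to recover the master transfer.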
If you've read this far, you're probably affected by the bug and hoping for an easy fix. Good luck with that. Here are the four possible workarounds I have discovered so far:
- Disable the I2C peripheral when a bus collision occurs during a pending transfer. Unfortunately, the ARLO bit is not always set when this happens, so detecting it will be application-specific. More importantly, disabling the I2C peripheral means that it may miss another master's START condition during the time it takes to reconfigure and re-enable the peripheral. In a real-world application where interrupt latency is non-zero, there may also not be enough time to disable the peripheral before STOPF is handled.
- Use byte-wise interrupt transfers instead of DMA. The downsides are higher CPU usage and power consumption, and this may not be possible in applications that do not support clock stretching.
- Accept that the peripheral sends out START + ADDR immediately after the STOP (giving up on waiting the SMBus 50 µs bus idle time between different masters' transmissions), and set the STOP bit so that a START+ADDR+STOP is sent on the bus. This may be illegal for certain I2C-based protocols. If you're using DMA, there is no point trying to proceed with your original message, because the peripheral seems to latch the previous reception's NBYTES.
- Use separate I2C peripherals for the master and slave roles; then one can safely keep receiving as a slave while the master transmitter is being reset (workaround #1). But who has spare I/O pins?
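Workarounds #1 and #3 come down to a couple of register writes. The sketch below assumes a hypothetical `i2c_regs` struct standing in for the CMSIS `I2C_TypeDef` so that it compiles on a host; the helper names are also hypothetical. The reset follows the reference manual's software-reset sequence (write PE=0, check PE=0, write PE=1), which keeps PE low for the required 3 APB clock cycles. When to call either helper is application-specific, as described in the workarounds.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side stand-in for the CMSIS I2C_TypeDef; only CR1 and CR2 are
 * modeled. On target, pass I2C1, I2C2, ... from the device header. */
typedef struct {
    volatile uint32_t CR1;
    volatile uint32_t CR2;
} i2c_regs;

#define CR1_PE   (1u << 0)  /* peripheral enable */
#define CR2_STOP (1u << 14) /* master: generate STOP after the current byte */

/* Workaround #1: software reset, the only way found to discard the
 * latched START request. Sequence: write PE=0, check PE=0, write PE=1. */
static void i2c_soft_reset(i2c_regs *i2c) {
    i2c->CR1 &= ~CR1_PE;        /* disable: performs the software reset */
    while (i2c->CR1 & CR1_PE) { /* read back until PE reads 0 */
    }
    i2c->CR1 |= CR1_PE;         /* re-enable; other CR1 bits are untouched */
}

/* Workaround #3: after the unwanted START + address has gone out, request
 * a STOP so the bus only sees START+ADDR+STOP. */
static void i2c_terminate_spurious_start(i2c_regs *i2c) {
    assert(i2c->CR1 & CR1_PE);  /* STOP can only be generated while enabled */
    i2c->CR2 |= CR2_STOP;
}
```

While PE is low the peripheral cannot see another master's START, so the window between the two CR1 writes should be kept as short as possible.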
Demo Project to Reproduce
I have attached a project that can be run easily on a Nucleo development board if you feel like recreating this issue or characterizing your own firmware's susceptibility to this bug.
I am using a Nucleo-64 board with the MCU replaced by an STM32L443RC and the board configured for MCO:
- Replace U5 with STM32L443RC
- Short SB16, SB50, SB54
- Open SB55, R35, R37
In order to connect the I2C1 and I2C2 ports together, the following jumpers must be added:
- Jumper CN10.21/PA9/I2C1_SCL with CN10.30/PB13/I2C2_SCL
- Jumper CN10.33/PA10/I2C1_SDA with CN10.28/PB14/I2C2_SDA
Connect a logic analyzer to decode the debug output on SPI:
- SPI_NCS: CN10.18/PB11/QSPI_NCS
- SPI_DAT: CN10.24/PB1/QSPI_IO0
- SPI_CLK: CN10.25/PB10/QSPI_CLK
Logic Analyzer Capture
See attached screenshot (001_Pending_START_ADDR.png) for an illustration of the problem.