2017-02-08 04:19 AM
If the master sends the address but doesn't send the last clock pulse to receive the ACK from the slave, the slave will not release the SDA line.
The problem is that the ADDR flag only gets set AFTER the ACK was sent. If that clock pulse got lost due to an error, the slave will keep the SDA line low and not trigger any interrupt. As far as I've investigated, there is no way to find out that the STM32 is pulling the SDA line low if a pulse gets lost during address reception. (The slave can't pull it all the way low, because there is another resistor in line to show whether the slave or the master is pulling the line down.) The master already resets itself via a timeout, but the slave doesn't, as no interrupt is triggered. Is there a way to fix this other than blindly software-resetting the I2C peripheral if it doesn't receive any data for 2ms?
(There are multiple I2C devices in the system, which get new data every 2ms.) The only flag set in the I2C slave was the busy bit.
Maybe there is a way to find out that the address matched but the ACK has not been sent yet? That would be perfect in this case, as I could reset it superfast if there is no further clock pulse.
This one shows a pin triggered by the ADDR-Interrupt (yellow), as you can see, the interrupt gets triggered right after clk-pulse 9.
Just looking at it, it seems like the master notices that the clock pulse was not sent correctly (first image), as it tries to send the last bit again. The address byte is supposed to be 0b01010010. If the master took all the clock pulses as valid (even the little one), this one would be 0b01010011. Nevertheless, the slave problem needs to be solved on its own, as it could crash the bus.
Regards,
Max
2017-02-08 11:39 AM
What are you talking about? Is the STM32F4xx the Master or the Slave?
The shown waveforms are entirely inadequate for I2C, use stronger pullups or lower bitrate.
JW
2017-02-08 01:30 PM
Hey Jan,
I got 2 STM32F405, one master, one slave.
I know that the pullups could be smaller, but I started with 10k and noticed this error after a while.
Not sure if you understood the error, so I will explain again: If the master sends its 9 pulses (8 for the address, 1 for the ACK), but the slave only receives 8 of them, the slave will pull the SDA line low until it gets its 9th pulse. (Which will obviously never occur, since the master already sent 9 of them.) At the same time, there is no indication in any of the slave's registers that the slave is pulling the SDA line low right now. So I don't know how to let the slave know that it is locking the bus right now and needs to reset. Only a blind reset after a timeout with no received data would work.
I can use smaller pullups and reduce the chance of that error occurring. However, if a pulse gets lost in 200hrs of usage, the bus will just lock and not free itself. So I really want to find a solution to free the bus if that error occurs. I implemented a timeout today, which will reset the slave I2C if it didn't get any new data for 4ms, but that way the slave doesn't know for certain that it was the one who locked the bus. (At the end of this project there will be 8 devices which receive data every 2ms, so if any of those locks, all slaves will do a software reset, no matter if they were the ones locking the bus or not.)
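To make the timeout idea concrete, here is a minimal sketch in C of such a slave-side guard, assuming a free-running millisecond tick is available. All names (`i2c_slave_guard_t`, `guard_note_rx`, `guard_poll`, the `reset_i2c` hook) are made up for illustration and are not part of any ST library; on the STM32F4 the hook might, for example, toggle the SWRST bit in I2C_CR1 and re-initialise the peripheral.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guard state: when data last arrived, and how long to wait. */
typedef struct {
    uint32_t last_rx_ms;  /* tick value at the last received data */
    uint32_t timeout_ms;  /* e.g. 4 for the 4ms timeout described above */
} i2c_slave_guard_t;

/* Call from the receive path whenever new data has arrived. */
void guard_note_rx(i2c_slave_guard_t *g, uint32_t now_ms)
{
    assert(g != NULL);
    g->last_rx_ms = now_ms;
}

/* Unsigned subtraction keeps the comparison correct across tick wraparound. */
bool guard_expired(const i2c_slave_guard_t *g, uint32_t now_ms)
{
    assert(g != NULL);
    return (uint32_t)(now_ms - g->last_rx_ms) >= g->timeout_ms;
}

/* Call periodically. Runs the (assumed) reset hook once the timeout has
 * elapsed and restarts the timer. Returns true if a reset was requested. */
bool guard_poll(i2c_slave_guard_t *g, uint32_t now_ms, void (*reset_i2c)(void))
{
    if (!guard_expired(g, now_ms)) {
        return false;
    }
    if (reset_i2c != NULL) {
        reset_i2c();  /* blind reset: the slave cannot tell who holds SDA */
    }
    g->last_rx_ms = now_ms;
    return true;
}
```

As noted above, this stays a blind reset: every slave on the bus that sees no data for the timeout period will fire it, whether or not it is the one holding SDA.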
Regards,
Max
2017-02-08 02:21 PM
Fix the pullups/speed issue first. This is of paramount importance: software should never be used as a primary fix for incorrect hardware design, only as a last-resort safeguard measure - and as such, it does not really matter whether all the slaves reset in such case, or only one/few of them.
Nevertheless, if you insist on resolving hardware-imposed conflicts like these: primarily, it's the master who is in charge of the bus, so it's the master who should attempt to resolve conflicts. It should time out, find the bus being locked, then perform the standard-mandated sequence for bus release (up to 9 pulses and, upon SDA release, a STOP). According to the standard, slaves are bound to restart their internal state machine upon STOP no matter in what phase it happens. I am not sure how compliant ST's I2C incarnations are in this regard and I doubt you'll find this properly documented, so you might want to experiment with this a bit.
As we are talking about last-resort safeguards now, a timeout on slaves might be in place, too - a longer one than the master's. It's up to you what the slaves' behaviour upon timeout is going to be, whether a plain I2C reset or a reset of the whole chip.
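As a rough illustration of that release sequence, here is a hardware-independent sketch in C. The pin hooks (`set_scl`, `set_sda`, `read_sda`, `delay_half_period`) and the small simulated bus are assumptions made up for this example, not an ST API; on real hardware the hooks would toggle the SCL/SDA pins switched to open-drain GPIO outputs. The simulated slave simply releases SDA after a given number of falling SCL edges, which is a simplification of real slave behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pin hooks: "high" means released (pulled up), false = driven low. */
typedef struct {
    void (*set_scl)(void *ctx, bool high);
    void (*set_sda)(void *ctx, bool high);
    bool (*read_sda)(void *ctx);           /* actual level of the SDA line */
    void (*delay_half_period)(void *ctx);
    void *ctx;
} i2c_pins_t;

/* Clock SCL up to 9 times until SDA is released, then generate a STOP.
 * Returns the number of pulses used (0..9), or -1 if SDA is still low. */
int i2c_bus_clear(const i2c_pins_t *p)
{
    int pulses = 0;
    assert(p != NULL);

    p->set_sda(p->ctx, true);              /* master itself lets go of SDA */
    p->set_scl(p->ctx, true);
    p->delay_half_period(p->ctx);

    while (!p->read_sda(p->ctx) && pulses < 9) {
        p->set_scl(p->ctx, false);
        p->delay_half_period(p->ctx);
        p->set_scl(p->ctx, true);
        p->delay_half_period(p->ctx);
        pulses++;
    }
    if (!p->read_sda(p->ctx)) {
        return -1;                         /* still stuck: escalate (timeout, reset) */
    }

    /* STOP condition: SDA rises while SCL is high. */
    p->set_scl(p->ctx, false);
    p->delay_half_period(p->ctx);
    p->set_sda(p->ctx, false);
    p->delay_half_period(p->ctx);
    p->set_scl(p->ctx, true);
    p->delay_half_period(p->ctx);
    p->set_sda(p->ctx, true);
    p->delay_half_period(p->ctx);
    return pulses;
}

/* --- Simulated bus for trying the routine on a host (illustrative only) --- */
typedef struct {
    bool scl;          /* current SCL level */
    bool sda_master;   /* true = master has released SDA */
    int  pulses_left;  /* falling SCL edges until the slave releases SDA */
    int  stops_seen;   /* STOP conditions observed on the bus */
} sim_bus_t;

static bool sim_sda_level(const sim_bus_t *b)
{
    return b->sda_master && b->pulses_left <= 0;  /* wired-AND of both sides */
}

static void sim_set_scl(void *ctx, bool high)
{
    sim_bus_t *b = ctx;
    if (b->scl && !high && b->pulses_left > 0) {
        b->pulses_left--;                  /* slave counts a falling edge */
    }
    b->scl = high;
}

static void sim_set_sda(void *ctx, bool high)
{
    sim_bus_t *b = ctx;
    bool before = sim_sda_level(b);
    b->sda_master = high;
    if (!before && sim_sda_level(b) && b->scl) {
        b->stops_seen++;                   /* SDA rose while SCL was high */
    }
}

static bool sim_read_sda(void *ctx) { return sim_sda_level(ctx); }
static void sim_delay(void *ctx) { (void)ctx; }

i2c_pins_t sim_pins(sim_bus_t *b)
{
    i2c_pins_t p = { sim_set_scl, sim_set_sda, sim_read_sda, sim_delay, b };
    return p;
}
```

For the lost-ACK-clock case one extra pulse should be enough to free the slave; the limit of 9 covers a slave stuck anywhere inside a byte. A return value of -1 means some other fault is holding SDA, and clocking alone will not help.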
JW
2017-02-08 02:36 PM
Jan is right first time... I think the data rate is way too high.
10K pullups on 100 kHz I2C are fine.
Your data and clock signals are borderline; you need to slow your clock by 50%.
2017-02-08 02:52 PM
Hey Jan,
pullups are fixed already, I just wanted to find a faster solution to the slave locking the bus, if that problem occurs again (even with fixed pullups it's possible, and, you know, Murphy's law... )
I didn't think about letting the master fix the issue by sending more pulses, because I thought since the SDA line is low, the master will get an error when trying to send a command on the bus. I will try to program the master to give more pulses tomorrow. This would unlock the bus way faster than the slave timeout, which would be the solution I was looking for.
I've never heard about that sequence for bus release, do you have any more information/research stuff about errors like that, that I could look into? (Gonna search for all of that tomorrow, but if you know a good page for that...)
Anyways, thanks for your reply and help, I will keep you updated on whether sending more pulses with the STM32 works when the slave locks the bus.
Regards,
Max
2017-02-08 05:24 PM
pullups are fixed already,
Don't forget that adding processors/boards/nodes to the bus increases its capacitance so you may need to reassess this later. There are bus redriver/bridging ICs available should the pullup become unbearably low.
Murphy's law...
Rudimentary Murphology teaches us that Murphy's laws cannot be proven by observation ('if you watch, it won't happen'). By extension, any event you build guards against won't happen ever; conversely, any future mishap is of the 'I would never have thought of it even in my wildest dreams' nature (wild dreamers clearly have an advantage as developers of high reliability systems).
I didn't think about letting the master fix the issue by sending more pulses, because I thought since the SDA line is low, the master will get an error when trying to send a command on the bus.
You need to switch the pins to GPIO output (still OD) and bit-bang the needed sequence 'manually'. It's unlikely you can recover with the master's I2C module continuing where it left off, so you'll need to reset that module and make appropriate operations in the software too, probably also messaging the slaves. Remember, this is an exception already.
I've never heard about that sequence for bus release,
The usual reason for it is the master's I2C machine being unexpectedly reset amidst an ongoing communication. See
http://www.nxp.com/documents/user_manual/UM10204.pdf
rev.6, chapter 3.1.16 Bus clear. You have already studied the Specification thoroughly, just missing this bit by mistake, haven't you? ;)
But still, don't you believe the scenario you discovered is the only way to a stuck SDA, once considering Murphy... And you may also encounter a stuck SCL (say, a bug in the slave's software, leaving it in a clock-stretching state infinitely), in which case you are doomed.
The keyword you are looking for is watchdog.
JW
2017-02-09 01:48 AM
Thank you Jan, that was exactly what I was looking for.
It's obviously not the only way to get a stuck SDA, but it is one way I want to be prepared for.
The problem with a watchdog is that it will cause a delay, which can be avoided (or at least shortened) by having the master send the 9 extra pulses as described in your post.
Regards,
Max
2017-02-09 01:50 AM
This would help reduce the chance of the error occurring, but it would not solve the problem if it still occurs at lower speeds. Jan's second post helped fix the error: sending up to 9 more clock pulses will free the slave in this situation.