2019-02-14 11:07 AM
Our setup has 6 VL53L1X sensors connected via TCA9545A multiplexers to an STM32F303 host. We started with the STSW-IMG007 API, but later switched to the STSW-IMG009 ULD API.
First, the statement that the ULD API has "only 4 files" is misleading. The code does not compile with only the files included in the API package. The example in X-CUBE-53L1A1 links to the "fat" driver folder to get access to about 6 additional .h headers, and then excludes that folder from compilation.
Question 1: Is this the way the ULD driver is supposed to be used?
Second, and most important, the sensors stop responding after some random time. Usually the VL53L1_RdByte() function returns VL53L1_ERROR_CONTROL_INTERFACE. Sometimes it happens after 20-30 min, and sometimes after as little as 2 min. Note that we observed the same problem with the "fat" driver as well, which was the reason for switching to ULD. We also tried Arduino boards with both Arduino and Pololu drivers and VL53L0X sensors. The result is always the same: they work for a while and then get "stuck" at some point.
Question 2: What could be wrong, and how can we ensure reliable long-term operation?
2019-02-25 03:28 PM
The right answer is that the ULD driver should provide the missing files.
The X-cube however should give you a choice of the X driver or the ULD. (But I like your term 'fat'.)
This is our mistake and both are being fixed.
Your other issue is an I2C issue. I2C is VERY susceptible to noise. If a bit of noise triggers a transition on the clock line, you get an unintended extra bit, and you will end up with the famous 'bus stuck low' problem. By design the clock should be high when not transmitting, and if you ever see the clock line stuck low, it's due to noise on the line.
Fixes for the 'bus stuck low' are the traditional ones for noise. Decoupling caps, shorter wires, playing with the driver strengths, more ground between traces are all good ideas.
However, if you ever find you are stuck this way, you can clear the interface by forcing the clock through 8 '0' bits.
This is not a VL53 error but an error common to all I2C. Because the VL53 needs to be at the edge of your design, we see it a lot.
2019-02-25 03:57 PM
I hadn't thought of checking the state of the I2C lines when the problem occurs, thanks for the idea. I'll do it ASAP and post the result.
2019-02-28 12:40 PM
To test the "I2C issue" theory, we removed the multiplexer from the schematic and connected two VL53L1X sensors directly to the STM32F303. The code was modified to assign a different address to one of the sensors. The I2C lines were monitored with an oscilloscope.
After about 30 min one sensor stopped working. The other sensor continued working just fine, and the oscilloscope confirmed normal bus operation, with no "stuck" lines.
In debugging we traced the problem to the HAL_I2C_Master_Transmit() call, which was failing with the HAL_I2C_ERROR_AF ("no ACK") error code when called for the faulty sensor.
All this does not look like an I2C problem to me.
2019-02-28 01:26 PM
The "no ACK" occurs when none of the slaves on the I2C bus respond to the address transmitted by the master.
Clearly that sensor could be dead, but the more likely scenario is that a glitch on the line changed the address. If no chip were addressed, then the bus would not be held by the slave. (No 'bus stuck low' problem.)
If I'm correct the sensor would work just fine if you simply retried the call. Is that possible?
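As an illustration of the "simply retry the call" idea, here is a minimal sketch of a bounded retry wrapper. The callback type, the function names, and the fake device are all made up for this example; on the STM32 the callback would wrap HAL_I2C_Master_Transmit() and return nonzero on any HAL error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transfer callback: returns 0 on success (ACK), nonzero on error.
 * On an STM32 this would wrap HAL_I2C_Master_Transmit(). */
typedef int (*i2c_xfer_fn)(void *ctx);

/* Try the transfer up to max_attempts times.
 * Returns the number of attempts used on success, or -1 if every attempt failed. */
static int i2c_xfer_with_retry(i2c_xfer_fn xfer, void *ctx, int max_attempts)
{
    assert(xfer != NULL && max_attempts > 0);
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        if (xfer(ctx) == 0) {
            return attempt;
        }
    }
    return -1;
}

/* Host-side stand-in for a flaky device (not real hardware):
 * fails as many times as the counter in ctx says, then succeeds. */
static int fake_flaky_xfer(void *ctx)
{
    int *failures_left = (int *)ctx;
    if (*failures_left > 0) {
        --*failures_left;
        return 1;
    }
    return 0;
}
```

If the sensor answers on the second or third attempt, the failure was probably a transient glitch. If every attempt fails, the sensor itself is likely in a bad state and a retry alone will not help.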
But before I believe that a sensor would just stop working, may I ask you to try one more thing?
Shorten the wires as much as you can and, if practical, twist the ground and the clock line together.
By putting space between the clock and data, you might find the sensor works a lot better.
Here is the code we use to clear the bus if it ever gets stuck - assuming you are still using the STM32.
(You might need to do some tweaking here depending on how you have configured your MCU.)
Even if you don't use it for this issue, it illustrates the general solution to all I2C glitch bus hangs.
// Needs the HAL header for your MCU family and a running HAL tick (HAL_Delay).
void _I2cFailRecover(void){
    GPIO_InitTypeDef GPIO_InitStruct = {0};   // zero-init so unused fields (e.g. Speed) are defined
    int i, nRetry = 0;

    // We can't infer the bus state from the SDA and SCL levels alone
    // (we may be in a data or NAK bit, so SCL=SDA=1).
    // By setting SDA high and toggling SCL at least 10 times, every agent,
    // whatever its state, should end up seeing a "stop" and the bus
    // gets back to a known idle I2C state.

    // Enable I/O. Here PB8 is toggled as SCL and PB9 is read back as SDA.
    __GPIOB_CLK_ENABLE();
    HAL_GPIO_WritePin(GPIOB, GPIO_PIN_8, GPIO_PIN_SET);
    HAL_GPIO_WritePin(GPIOB, GPIO_PIN_9, GPIO_PIN_SET);
    GPIO_InitStruct.Pin = GPIO_PIN_8 | GPIO_PIN_9;
    GPIO_InitStruct.Mode = GPIO_MODE_OUTPUT_OD;
    GPIO_InitStruct.Pull = GPIO_PULLUP;
    HAL_GPIO_Init(GPIOB, &GPIO_InitStruct);

    // TODO: we could do this faster by not using the 1 ms HAL delay for clock timing
    do{
        for( i = 0; i < 10; i++ ){
            HAL_GPIO_WritePin(GPIOB, GPIO_PIN_8, GPIO_PIN_RESET);
            HAL_Delay(1);
            HAL_GPIO_WritePin(GPIOB, GPIO_PIN_8, GPIO_PIN_SET);
            HAL_Delay(1);
        }
    }while( HAL_GPIO_ReadPin(GPIOB, GPIO_PIN_9) == 0 && nRetry++ < 7 );

    if( HAL_GPIO_ReadPin(GPIOB, GPIO_PIN_9) == 0 ){
        // SDA is still low: the bus is still in a bad state.
        // Warn the user by blinking the LED on PA5, and stay here forever.
        __GPIOA_CLK_ENABLE();
        GPIO_InitStruct.Pin = GPIO_PIN_5;
        GPIO_InitStruct.Mode = GPIO_MODE_OUTPUT_PP;
        GPIO_InitStruct.Pull = GPIO_NOPULL;
        HAL_GPIO_Init(GPIOA, &GPIO_InitStruct);
        do{
            // Two short flashes, then a long pause
            HAL_GPIO_WritePin(GPIOA, GPIO_PIN_5, GPIO_PIN_SET);
            HAL_Delay(33);
            HAL_GPIO_WritePin(GPIOA, GPIO_PIN_5, GPIO_PIN_RESET);
            HAL_Delay(33);
            HAL_GPIO_WritePin(GPIOA, GPIO_PIN_5, GPIO_PIN_SET);
            HAL_Delay(33);
            HAL_GPIO_WritePin(GPIOA, GPIO_PIN_5, GPIO_PIN_RESET);
            HAL_Delay(33*20);
        }while(1);
    }
}
2019-02-28 02:37 PM
Repeated calls to the same sensor continue returning the ACK error. But the sensor is not dead, since restarting the application resolves the problem, temporarily. Furthermore, we bought a couple dozen of them and tried them all in various combinations. It is completely random which one will stop first, but it is almost guaranteed that if we leave the application running overnight, by morning they will all have stopped responding, one by one.
For testing, the sensor boards are connected to the MCU by short 3" flat cables, although the SCL and SDA lines do run next to each other.
Also, "resetting" the I2C bus does not look like a very practical solution, considering that the sensors fail one at a time while the others keep working. For this reason, in our recovery code we simply pull XSHUT of the faulty sensor low and then reinitialize it. This approach works, but since in our application the sensor array is used in a safety-critical collision avoidance module, we were hoping for a more reliable solution.
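For anyone following along, the XSHUT recovery described above can be sketched roughly as below. This is not our production code: every name in it (sensor_ops_t, sensor_recover, the stub_* functions) is hypothetical, the hardware access is hidden behind function pointers so the sequence can be exercised on a PC, and the 10 ms delays are placeholders to be checked against the datasheet. One thing to keep in mind is that a sensor released from XSHUT comes back at its default I2C address, so its address has to be assigned again while no other device on the bus is using the default.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical board hooks; none of these names come from the ST drivers. */
typedef struct {
    void (*set_xshut)(int sensor, int level);       /* drive this sensor's XSHUT pin */
    void (*delay_ms)(uint32_t ms);                  /* blocking delay */
    int  (*wait_boot)(int sensor);                  /* 0 once the sensor reports it has booted */
    int  (*set_address)(int sensor, uint8_t addr);  /* move it off the default address */
    int  (*init)(int sensor);                       /* driver init and restart of ranging */
} sensor_ops_t;

/* Power-cycle one sensor through XSHUT and bring it back up.
 * Returns 0 on success, or a negative code naming the step that failed. */
int sensor_recover(const sensor_ops_t *ops, int sensor, uint8_t addr)
{
    assert(ops != NULL);
    ops->set_xshut(sensor, 0);      /* hold the sensor in hardware standby */
    ops->delay_ms(10);              /* placeholder delay, check the datasheet */
    ops->set_xshut(sensor, 1);      /* release it; it restarts at the default address */
    ops->delay_ms(10);              /* placeholder delay, check the datasheet */
    if (ops->wait_boot(sensor) != 0)         return -1;
    if (ops->set_address(sensor, addr) != 0) return -2;
    if (ops->init(sensor) != 0)              return -3;
    return 0;
}

/* Host-side stubs so the sequence can be run without hardware. */
static int stub_xshut_level = -1;
static int stub_boot_result = 0;
static void stub_set_xshut(int s, int level) { (void)s; stub_xshut_level = level; }
static void stub_delay_ms(uint32_t ms) { (void)ms; }
static int  stub_wait_boot(int s) { (void)s; return stub_boot_result; }
static int  stub_set_address(int s, uint8_t a) { (void)s; (void)a; return 0; }
static int  stub_init(int s) { (void)s; return 0; }
```

Returning a distinct code per step makes it possible to log which stage of the recovery failed, which matters when the failures are as rare as the ones described here.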
2019-03-01 07:52 AM
Clearly you are on the edge of something. It works for a while and then does not. I'm still betting it's the I2C interface. As justification, we have shipped 700 million devices similar to these with almost no returns. (However, I must admit that a lot of them go into only a relatively small number of designs.)
If I'm right and it's the I2C interface, let us attempt to make it worse. Instead of ranging, can you try continuously polling for the chip ID or re-initializing the chip? By continuously pounding the I2C and having the sensor basically do nothing except respond, you will at least eliminate the I2C possibility.
The other thing you might check is the voltage regulator. When ranging, this chip uses a lot of power. If it's starved for power, you might run into trouble. To be safe, a 40 mA regulator is suggested.
I would also look at the voltage on VCC and compare it to the I2C voltages.
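A minimal sketch of the chip-ID polling test suggested above, assuming a hypothetical read_id callback that performs the 16-bit ID read over I2C and returns 0 on success. The expected ID is passed in as a parameter; for the VL53L1X the 16-bit read at the model ID register is commonly reported as 0xEACC, but please confirm that against the datasheet rather than trusting this snippet. The fake reader is only there so the loop can run on a PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: reads the 16-bit sensor ID, returns 0 on success. */
typedef int (*read_id_fn)(void *ctx, uint16_t *id);

typedef struct {
    uint32_t reads;       /* total read attempts */
    uint32_t bus_errors;  /* reads where the transfer itself failed (e.g. no ACK) */
    uint32_t bad_ids;     /* transfer succeeded but the data was wrong */
} id_poll_stats_t;

/* Hammer the bus with ID reads and count the two kinds of failure separately. */
static id_poll_stats_t poll_sensor_id(read_id_fn read_id, void *ctx,
                                      uint16_t expected_id, uint32_t iterations)
{
    id_poll_stats_t st = {0, 0, 0};
    assert(read_id != NULL);
    for (uint32_t i = 0; i < iterations; ++i) {
        uint16_t id = 0;
        ++st.reads;
        if (read_id(ctx, &id) != 0) {
            ++st.bus_errors;
        } else if (id != expected_id) {
            ++st.bad_ids;
        }
    }
    return st;
}

/* Host-side stand-in (not real hardware): every 100th read fails. */
static int fake_read_id(void *ctx, uint16_t *id)
{
    uint32_t *count = (uint32_t *)ctx;
    ++*count;
    if (*count % 100u == 0u) {
        return 1;
    }
    *id = 0xEACCu;
    return 0;
}
```

Keeping bus errors and corrupted data in separate counters helps tell a sensor that stopped answering apart from one that answers with garbage.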
2019-03-01 08:46 AM
Yes, I know that this sensor has been on the market for quite a while, and that's what makes it strange. I did a web search for "sensor stops responding" and found several threads describing similar behavior, all of them without any definite resolution. But all those threads mentioned either Adafruit or Pololu drivers, so they are not applicable to our situation, since we are working with STM libraries.
The sensors are powered from a dedicated 3.3 V 600 mA regulator, which should be sufficient. The host is on a NUCLEO-144 board.
Since the error is random I cannot give a definite procedure. However, if this helps, the majority of the observed cases occurred under the following conditions:
Very often this is enough to get one or more sensors stuck.
I will try your suggestion for address reading and post the results.
2019-03-21 03:42 PM
Per your suggestion I replaced the actual ranging request with just an address read (verified with laser tag that there are no IR emissions). The setup ran without errors for a couple of days, so I was getting ready to report that there is some fundamental problem with the sensors. And then we got two errors in the space of several minutes, for no apparent reason. This made the whole test ambiguous again.
Out of frustration I replaced the 10k I2C pullups on the sensor boards with 1k resistors. Everything (including ranging) worked flawlessly for 5 days. So it seems you were right from the start: it was indeed an I2C problem. Although I cannot imagine how 10k would not be sufficient for 3" wiring.
Anyway, we are moving from breadboard to prototype now. I'll try to post an update here with the final results. Thanks for your help!
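For reference, here is the back-of-the-envelope pull-up arithmetic, as a sketch only. It uses the usual RC approximation for the 30%-70% rise time and the I2C limits of 0.4 V at 3 mA for the low level. The function names are made up, and the bus capacitance has to be estimated or measured; I have not measured ours.

```c
#include <assert.h>

/* 30%-70% rise time of an RC-limited open-drain line:
 * t_r = ln(0.7/0.3) * R * C, roughly 0.8473 * R * C. */
static double i2c_rise_time_s(double r_ohm, double c_farad)
{
    assert(r_ohm > 0.0 && c_farad > 0.0);
    return 0.8473 * r_ohm * c_farad;
}

/* Smallest pull-up a driver can still pull down to a valid low level:
 * R_min = (VDD - VOL_max) / IOL. */
static double i2c_min_pullup_ohm(double vdd, double vol_max, double iol)
{
    assert(iol > 0.0 && vdd > vol_max);
    return (vdd - vol_max) / iol;
}
```

With an assumed 50 pF on the bus (a guess, not a measurement from this setup), 10k gives a rise time of roughly 424 ns, which is over the 300 ns fast-mode limit although within the 1000 ns standard-mode limit, while 1k gives roughly 42 ns. The lower bound for a 3.3 V bus comes out near 967 ohms, so 1k sits just above it. If several boards each carry their own pull-ups on the same bus segment, those resistors act in parallel, so the effective value should be checked too. A weaker pull-up also leaves the line at a higher impedance, which may be part of why 10k was more sensitive to noise.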
2019-03-21 04:07 PM