2021-05-04 12:37 AM
Hello everyone!
I have a lis3dh and an lps22hb on a custom board with an F411 MCU, and I use FIFO mode on both sensors. After an interrupt on the INT1 (lis3dh) or DRDY (lps22hb) pin, the MCU reads the samples using DMA. A mutex guards access to the SPI bus.
After powering the board on, everything works fine for weeks or even months, but sometimes after an MCU reboot (caused by the IWDG, a hard fault, or a fw update) the sensors stop working, and when I read a set of registers I receive strange data. Please refer to the pictures below. Each row of the table corresponds to what was read from the sensor at 30-second intervals.
I've read the datasheets and application notes many times and am aware of the reboot, reset and other features and commands that might be tried in desperation against this undefined behavior. Only powering the device off helps.
Can anyone help me?
2021-05-04 02:25 AM
Hi @Community member ,
this is indeed a strange behaviour.
first thing I would like to ask you is: is there something different you do at the restart with respect to normal start-up?
it is common that some portions of memory are not cleared properly at start-up, and sometimes the problems arise only when restarting.
you wrote you are aware of reboot and reset practices, but there is something that could be different (try to check the MCU memory and see if it is always the same).
focusing on the data in the registers, there seems to be a problem in the communication.
the WHO_AM_I register cannot change its value, so the problem is not in the sensor itself, but in the communication.
are you able to intercept this SPI communication with an oscilloscope to check if all the lines behave correctly?
(I suspect there could be a synchronization problem)
last bit of advice I can give you is to "patch" it without fixing it.
if you read a wrong WHO_AM_I at the restart (maybe checking twice is better in your case, but you know better than I do how often the problem occurs) you can reboot the system.
(I would consider this a last resort, only if we can't find the real problem)
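A minimal sketch of that "check twice, then reboot" idea, for illustration only. The function name `should_reboot` is hypothetical, and the WHO_AM_I values are assumed to have been read already by whatever SPI routine the firmware uses:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Decide whether to reboot: only when every WHO_AM_I read was wrong.
 * `ids` holds the values read back, `expected` is the datasheet ID
 * (0x33 for the lis3dh, 0xB1 for the lps22hb). Hypothetical helper. */
bool should_reboot(const uint8_t *ids, size_t count, uint8_t expected)
{
    for (size_t i = 0; i < count; ++i) {
        if (ids[i] == expected) {
            return false; /* at least one good read: the sensor answers */
        }
    }
    return count > 0; /* no reads at all is not evidence of a fault */
}
```

On the target, the caller would invoke the CMSIS `NVIC_SystemReset()` when this returns true. Note that it only helps if an MCU reset is enough to recover the sensor.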
hope this helps
Niccolò
2021-05-04 03:40 AM
Hi, thanks for the quick response!
>>> ... is there something different you do at the restart with respect to normal start-up?
> Sorry, I'm not sure I fully understand the question, so I'll just describe what happens after a system reboot (__NVIC_SystemReset):
1. I initialize CSs, INT1 and DRDY pins and deselect the slaves.
2. I initialize SPI bus and then DMA.
3. Then I read the WHO_AM_I registers, and if I don't get the right values I start sending messages every 30 seconds that contain the register values, to figure out what's going on.
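As a rough sketch of step 3, this is how the WHO_AM_I check could look on the wire. The helper names (`build_read_frame`, `whoami_ok`) are made up for illustration and the actual SPI/DMA transfer is left out. The register address 0x0F, the read bit in the MSB of the first byte, and the IDs 0x33 (lis3dh) and 0xB1 (lps22hb) are taken from the datasheets and are worth double-checking there:

```c
#include <stdbool.h>
#include <stdint.h>

#define SENSOR_REG_WHO_AM_I 0x0Fu /* same address on lis3dh and lps22hb */
#define SPI_READ_BIT        0x80u /* MSB of the first byte set = read */
#define LIS3DH_ID           0x33u
#define LPS22HB_ID          0xB1u

/* Fill a 2-byte TX frame for a single-register read (hypothetical helper). */
void build_read_frame(uint8_t reg, uint8_t tx[2])
{
    tx[0] = (uint8_t)(reg | SPI_READ_BIT); /* command byte: read + address */
    tx[1] = 0x00u;                         /* dummy byte clocks the answer out */
}

/* The sensor's answer arrives in the second received byte. */
bool whoami_ok(const uint8_t rx[2], uint8_t expected_id)
{
    return rx[1] == expected_id;
}
```

The first received byte is clocked in while the command is still being sent, so only the second byte is compared.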
I think it's important to point out that the boards are powered constantly, and therefore an MCU reboot doesn't affect the sensors.
>>> ... you wrote you are aware of reboot and reset practices, but there is something that could be different
> I simply meant the sensor reboot and reset operations, which for the lps22hb are described on page 39 of the datasheet (DocID027083 Rev 6). I do the reset operation after every MCU reboot, so I don't have to worry about the sensor's current configuration when I add new features that change the register settings.
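For illustration, a hedged sketch of such a reset with a bounded poll, so that a sensor that never finishes the reset is detected instead of hanging the firmware. It assumes CTRL_REG2 is at 0x11, SWRESET is bit 2 and self-clears, and the register's reset default is 0x10; please verify all three against the datasheet. The `sensor_bus_t` callbacks and the `fake_*` functions are hypothetical stand-ins for the real SPI routines:

```c
#include <stdbool.h>
#include <stdint.h>

#define LPS22HB_CTRL_REG2 0x11u
#define LPS22HB_SWRESET   0x04u /* bit 2; assumed to self-clear when done */

/* Hypothetical bus abstraction; on target these would wrap SPI transfers. */
typedef struct {
    uint8_t (*read_reg)(void *ctx, uint8_t reg);
    void (*write_reg)(void *ctx, uint8_t reg, uint8_t value);
    void *ctx;
} sensor_bus_t;

/* Request a software reset and poll until SWRESET clears.
 * Returns false if the bit is still set after max_polls reads. */
bool lps22hb_soft_reset(const sensor_bus_t *bus, unsigned max_polls)
{
    uint8_t ctrl2 = bus->read_reg(bus->ctx, LPS22HB_CTRL_REG2);
    bus->write_reg(bus->ctx, LPS22HB_CTRL_REG2,
                   (uint8_t)(ctrl2 | LPS22HB_SWRESET));
    for (unsigned i = 0; i < max_polls; ++i) {
        ctrl2 = bus->read_reg(bus->ctx, LPS22HB_CTRL_REG2);
        if ((ctrl2 & LPS22HB_SWRESET) == 0u) {
            return true;
        }
    }
    return false;
}

/* Host-side stand-in for the sensor, for trying the logic off-target. */
typedef struct {
    uint8_t ctrl2;
    unsigned busy_reads; /* reads that still report SWRESET set */
} fake_lps22hb_t;

uint8_t fake_read_reg(void *ctx, uint8_t reg)
{
    fake_lps22hb_t *f = (fake_lps22hb_t *)ctx;
    if (reg != LPS22HB_CTRL_REG2) {
        return 0xFFu;
    }
    if ((f->ctrl2 & LPS22HB_SWRESET) != 0u) {
        if (f->busy_reads == 0u) {
            f->ctrl2 = 0x10u; /* assumed reset default of CTRL_REG2 */
        } else {
            f->busy_reads--;
        }
    }
    return f->ctrl2;
}

void fake_write_reg(void *ctx, uint8_t reg, uint8_t value)
{
    fake_lps22hb_t *f = (fake_lps22hb_t *)ctx;
    if (reg == LPS22HB_CTRL_REG2) {
        f->ctrl2 = value;
    }
}
```

A `false` return after the MCU reboot would be a useful data point in itself: it would mean the reset command is not reaching the sensor or the reply is being corrupted on the way back.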
>> ... try to check the MCU memory and see if it is always the same
> Could you please kindly clarify what part of MCU memory I should examine?
>>> focusing on the data in the registers, there seems to be a problem in the communication.
the WHO_AM_I register cannot change its value, so the problem is not in the sensor itself, but in the communication.
are you able to intercept these SPI communication with an oscilloscope to check if all the lines behave correctly.
> The affected devices are up and running right now, but unfortunately they are far away from where I am, so all I can do is update their FW. BTW, I tried disabling one of the sensors (never selecting it while the system runs) and still got the same behavior. It would be interesting, though, to capture what's on the data line when no sensor is selected; I'll do that.
>>> last bit of advice I can give you is to "patch" it without fixing it...
> I've spent so much time on this problem that I'd happily add any kludge just to leave it behind and move on to further data analysis. But the thing is that once the sensors stop working, only a physical power OFF solves the issue, and because of the distance mentioned earlier (thousands of miles) between me and the devices, I have to solve it remotely.
Anyway, I'm glad that someone answered my call for help.
Regarding the register values that I'm getting from the sensors: if you take a closer look, you'll see that they repeat. Is there a chance that, because of the system reboot (which probably interrupts a DMA transfer), the sensor stays in a state where it only expects to transfer data and ignores read/write commands? Maybe, because of the mechanism that increments the address while the FIFO buffer is being read, I just continue the FIFO read that was in progress before the MCU reboot?
Thanks!
2021-05-04 07:07 AM
Hi @Community member ,
don't worry, we're here to help you =)
ok, I did not understand that the board is always powered, so after your MCU reboot you can't power down the sensors before restarting.
anyway, I'll try to be as orderly as possible
a) these 3 steps seem right to me. only thing I would ask is what do you mean by initializing the pins, but it should be right anyway, because the sensors work the first time.
b) setting the BOOT bit is also good practice, but my concern was about another problem: usually in my firmware I set flags and modify things that can change the behaviour of the program itself through if-else statements and so on. maybe some of the initialization is not performed correctly because of flags like these. (I know it may sound silly, but I had a problem caused by this). so, are you sure that the initialization is the same at power-on and at reboot?
c) I'm talking about the memory dump I usually do with STM32CubeProgrammer, the program I use to load the fw on the MCU, but the more I think about the possibility of having memory not allocated and so on the more I realize it should not cause this type of problem, so don't worry about this.
d) yes, checking what happens on the data line without selecting any device should help us.
e) yes, I understand that without being able to power down the whole system, this problem is pretty difficult to solve. what did you try to "further reset" the system? I mean, did you try to reinitialize the SPI bus and DMA, for example?
f) regarding the register values, I did not see the repetition at first, and it could be interesting to know how this pattern changes. for example, can you try to read at a higher rate, to understand whether it changes at every read, every second, or whatever? I don't think that the bit for auto-incrementing through the FIFO buffer could be the problem, because it needs a read to be performed on the FIFO in order to increment, and you are not reading it.
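To make point f) concrete, here is a small sketch for comparing register dumps taken at a higher rate. Both function names are hypothetical. The first counts how many positions changed between two consecutive dumps, the second counts neighbouring registers that hold the same value within one dump:

```c
#include <stddef.h>
#include <stdint.h>

/* Number of positions whose value differs between two dumps. */
size_t dump_count_changed(const uint8_t *prev, const uint8_t *curr, size_t len)
{
    size_t changed = 0;
    for (size_t i = 0; i < len; ++i) {
        if (prev[i] != curr[i]) {
            changed++;
        }
    }
    return changed;
}

/* Number of adjacent register pairs in one dump that hold the same value. */
size_t dump_count_adjacent_repeats(const uint8_t *dump, size_t len)
{
    size_t repeats = 0;
    for (size_t i = 1; i < len; ++i) {
        if (dump[i] == dump[i - 1]) {
            repeats++;
        }
    }
    return repeats;
}
```

Logging these two counts for every read would show whether the pattern changes at every read or only from time to time.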
not being sure where the problem is (I think it's in the communication, but I could be wrong) I'm adding tags for SPI and MCU to the topic, so that someone from other sections can help you as well
Niccolò
2021-05-04 08:42 AM
Hi, @niccolo.ruffini ,
Thank you for the active collaboration!
a) By pin initialization I meant the GPIO configuration process; as you said, everything is probably fine there.
b) I think it's a good point. You mean, for example, RTC initialization: if your LSE doesn't work, you switch the module clock to the LSI based on a flag that is set when the external oscillator is absent or broken, right? But in my case there are sometimes hundreds of reboots before the sensors stop working. So if it is an SPI module issue, how can I make sure that the problem arises during peripheral initialization? Adding extra checks before sending/receiving anything over the bus would probably help; I just have to understand which flags or registers I should check. Anyway, I once ran a test with the board that is with me right now: I set the board to reboot every time the DMA started transferring data to/from SPI, so the board kept rebooting the whole night, and even then there was no trouble with the sensors.
c) Ok.
d) So I started sending and receiving bytes without selecting any device, and found that I receive not 0xFF but a variety of values. Then I increased the communication baud rate (f(pclk)/256 -> f(pclk)/16), and the values started to come with an 'F' or a zero in them, like 0xFF, 0x1F, 0x00, 0xF0.
e) Yeah, as I mentioned earlier, I have the ability to reboot the device remotely (and therefore the SPI and DMA modules too), and I did it many times to see whether anything changed.
f) I meant that, for example, if you look at the lps22hb values in the table above, at least two of the registers' values are equal. In point d) I mentioned the results with the higher SPI baud rate.
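A small sketch related to point d), with hypothetical function names. The first helper gives the SCK frequency for a given prescaler, and the second counts received bytes that are not 0xFF. It assumes MISO is pulled high whenever no slave drives it, which may not hold on every board:

```c
#include <stddef.h>
#include <stdint.h>

/* SCK frequency for a given peripheral clock and baud-rate prescaler. */
uint32_t spi_sck_hz(uint32_t pclk_hz, uint32_t prescaler)
{
    return pclk_hz / prescaler;
}

/* Count received bytes that are not 0xFF, i.e. not an idle-high MISO.
 * Assumes a pull-up keeps MISO high when no device is selected. */
size_t count_non_idle_bytes(const uint8_t *rx, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; ++i) {
        if (rx[i] != 0xFFu) {
            count++;
        }
    }
    return count;
}
```

With an example pclk of 48 MHz (an arbitrary value, not taken from this board), `spi_sck_hz(48000000u, 256u)` and `spi_sck_hz(48000000u, 16u)` give the two bit rates being compared. A faster clock samples a slow disturbance on MISO more finely, which would fit with runs of identical bits showing up in the received bytes.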
Thanks!
2021-05-04 09:44 AM
@niccolo.ruffini
point d) continuation:
So basically there is some signal on the bus that corrupts the data during a transfer. The picture below illustrates what I said earlier about the data received at the higher baud rate:
I sample either the leading edges, the trailing edges, or the body of the pulses. But I don't understand what the source of these pulses is. I don't select any device, and there's nothing in the code that could use the SPI bus except the disabled sensor-related routines. Could you please advise whether a sensor (lis3dh or lps22hb) could, under any condition, drive the MISO line even if it isn't selected by the master?
2021-05-05 12:55 AM
@Community member
regarding the continuation of point d), can you provide the clock and CS data too? (I would also check MOSI, to be sure it is not changing for some strange reason)
the sensors should respond only if they are selected by the CS and there is a clock
Niccolò
2021-05-05 01:05 AM
@Community member
b) I asked a colleague for help and he suggested not setting the BOOT bit. instead, you should dump all the registers of the sensors before rebooting and write them back after the reboot. this is safer than setting the BOOT bit, because the hw inside the sensor may be susceptible to the variation and timing of the other lines.
anyway, if you manage to reproduce the problem on the board you have, it is very good, so let's hope it fails =P
f) ok, but we are not sure what happens between one read and the next (it would be nice to check)
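A minimal sketch of the dump-and-write-back idea from point b). All names here (`cfg_bus_t`, `reg_snapshot_t`, `snapshot_save`, `snapshot_restore`, the `fake_regs_*` functions) are hypothetical, and the sketch leaves open where the snapshot is kept so that it survives the MCU reboot:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bus abstraction; on target these would wrap SPI transfers. */
typedef struct {
    uint8_t (*read_reg)(void *ctx, uint8_t reg);
    void (*write_reg)(void *ctx, uint8_t reg, uint8_t value);
    void *ctx;
} cfg_bus_t;

/* One saved register: its address and the value read before the reboot. */
typedef struct {
    uint8_t addr;
    uint8_t value;
} reg_snapshot_t;

/* Read every listed register and store its current value. */
void snapshot_save(const cfg_bus_t *bus, reg_snapshot_t *regs, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        regs[i].value = bus->read_reg(bus->ctx, regs[i].addr);
    }
}

/* Write every stored value back to its register. */
void snapshot_restore(const cfg_bus_t *bus, const reg_snapshot_t *regs,
                      size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        bus->write_reg(bus->ctx, regs[i].addr, regs[i].value);
    }
}

/* Host-side stand-in: ctx points to a 128-byte register file. */
uint8_t fake_regs_read(void *ctx, uint8_t reg)
{
    return ((uint8_t *)ctx)[reg & 0x7Fu];
}

void fake_regs_write(void *ctx, uint8_t reg, uint8_t value)
{
    ((uint8_t *)ctx)[reg & 0x7Fu] = value;
}
```

Only writable configuration registers belong in the list (for the lis3dh, for example, CTRL_REG1 at 0x20 and FIFO_CTRL_REG at 0x2E), since writing to read-only or reserved addresses would be pointless or unsafe.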
Niccolò
2021-05-05 01:45 AM
Hi @niccolo.ruffini,
The picture in my previous message is just an attempt to illustrate what is happening on the MISO line, not a capture of the data line of the affected device.
I will capture the SPI communication (on the device that is with me right now) with a logic analyzer and share the results a little later.
>> the sensors should respond only if it selected by the CS and there is a clock
Yes, as I understand there's a tri-state buffer on the sensors' data pins and as long as CS is HIGH the pins are high impedance. So there's no sensor's condition which would break the rule of going Hi-Z if CS goes high, right?
2021-05-05 01:49 AM
Hi @Community member ,
ok, sorry, I misunderstood the picture.
>>>So there's no sensor's condition which would break the rule of going Hi-Z if CS goes high, right?
Niccolò