IIS3DWB sample losses problem

VSavenia · 2022-09-01

Hi, I have been working with the IIS3DWB since 2020. A week ago my colleague and I found something that looks like a design flaw on ST's side: what appear to be sample losses during FIFO buffering. I am not 100% sure, so below I will try to provide as much information as possible. There will be a lot of detail because the question isn't an easy one.

Background:

My team and I designed a vibration sensor (hereafter VTS) based on the IIS3DWB MEMS accelerometer (spoiler: it's great, but…) and we are happy with it. Until recently we thought everything was OK and data reading worked as it should. I ran a lot of tests to check for sample losses, using 8192 samples per axis because of the internal µC memory limit (that amount was enough to compute the FFT and then acceleration RMS, velocity RMS, etc. with the required frequency resolution and accuracy). But that was only the first stage of VTS production. We then added 8 MB of RAM to store 1 minute, 2 minutes, etc. of raw data and pass it on to a "math processing unit", and in these large raw-data files we found something that looks like sample losses.

A couple of remarks before we dive in:

1. For tests we use a TIRA TV51110 vibration shaker, a control system for the shaker, and a reference analogue sensor (verified by the laser method) as the feedback sensor, to ensure high-quality tests.

2. I guarantee that the data in the final CSV file we analyze is bit-for-bit identical to what was read from the IIS3DWB. Why am I so confident? We compute an MD5 hash of the data as received from the IIS3DWB, before writing it to the file (we set up a file system inside the VTS). After all the required data has been written, we compute the MD5 hash of the file and compare the two. If they match, we transfer the file from the VTS to a PC, compute the MD5 hash of the received file on the PC side, and compare it with the file hash on the VTS side.

Here you could rightly say that the problem might lie between the IIS3DWB and the µC, on the SPI line during the transaction, but as you will see below, it doesn't.

3. To transfer data from the IIS3DWB we run SPI at 8 MHz (we also tried 4 MHz, 10 MHz, …).

4. You could rightly ask how we found this in millions of data points, and the answer is: a waterfall chart. We load our CSV file into PTC Mathcad, compute a 3D "waterfall" chart (you will see the pictures), and with this chart you can find a needle in a haystack.

5. The datasheet and application note have been read a hundred times, and the init/setup routine follows the docs exactly. I can share the source files if needed, no problem, but you will see that this doesn't look like a setup mistake (10-20 samples lost in hundreds of thousands, with no sign of repeatability, doesn't sound like a code mistake, right?).

P.S. Timestamp batching is disabled in the "release" configuration because it produces more sample losses.

6. SETTINGS: the IIS3DWB stores data in the FIFO in continuous mode at 26.67 kHz ODR (ROUNDING on), 3-axis mode, timestamp batching to FIFO off, BDU off, ±16 g full scale, IRQ open-drain, active low, DRDY pulsed mode, default bandwidth.
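Remark 4 relies on spotting glitches by eye in a waterfall chart. For single-tone shaker runs such as the 100 Hz test described below, a complementary automated check is possible. This is a swapped-in technique, not the method used in this thread: an undistorted sinusoid sampled at fs satisfies x[n] = 2cos(ω)·x[n−1] − x[n−2] with ω = 2π·f0/fs, so a single dropped sample at index k leaves two large residuals, at k and k+1. The function name, the precomputed `two_cos_w` parameter and the threshold are hypothetical; the threshold would have to be tuned above the sensor's noise floor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: flag samples that break the sine recurrence
 *     x[n] = two_cos_w * x[n-1] - x[n-2],  two_cos_w = 2*cos(2*pi*f0/fs),
 * which an undistorted tone of frequency f0 sampled at fs obeys exactly.
 * The record mean is subtracted first so that a DC offset (e.g. gravity)
 * is largely cancelled. Writes up to max_out flagged indices to `out` and
 * returns the total number of flagged indices. */
size_t find_sine_glitches(const double *x, size_t len, double two_cos_w,
                          double threshold, size_t *out, size_t max_out)
{
    double mean = 0.0;
    size_t found = 0;

    assert(two_cos_w > -2.0 && two_cos_w < 2.0);
    if (len < 3) return 0;
    for (size_t i = 0; i < len; i++) mean += x[i];
    mean /= (double)len;

    for (size_t n = 2; n < len; n++) {
        double r = (x[n] - mean) - two_cos_w * (x[n - 1] - mean) + (x[n - 2] - mean);
        if (r < 0.0) r = -r;                 /* |residual| */
        if (r > threshold) {
            if (found < max_out) out[found] = n;
            found++;
        }
    }
    return found;
}
```

For the 100 Hz run, `two_cos_w` would be computed once as 2·cos(2π·100/26667). One limitation worth knowing: a sample dropped near a peak of the sine barely changes the waveform, so its residual is small and it can be missed; drops near zero crossings give the largest residual.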

Okay, let's dive into the details. We ran a lot of tests before calling these glitches "sample losses", and here they are:

1. We loaded raw data with 262144 samples (almost 10 s of signal) into Mathcad for processing. The signal was a 100 Hz sine with 10 g amplitude; we built a waterfall chart and counted 17 glitches. We were like "WTF?? OK, let's find out".

2. First we thought this was shaker behavior, so we captured the signal from the reference analogue sensor, and there everything was OK. Then we started looking for a bug in our driver (a code mistake, etc.) or incorrect operation of the IIS3DWB.

3. We read the DS and the AppNote and reviewed the code to make sure we configured the IIS3DWB as it should be; everything was OK, nothing was missed. We have traps for the FIFO overrun (ovr) and full (full_ia) flags and never hit them. The WTM level is 128 samples. The AcquireRawData procedure consists of simple steps: switch from bypass to continuous mode, wait for the WTM interrupt, read the current FIFO level, burst-read currFifoLevelNum words, parse, and wait for the next interrupt. When the required amount of data has been acquired, switch back to bypass mode and wait for a new request.

4. To rule out FIFO overflow we have the flag traps, but in addition we put the acceleration data into one output buffer and, into a second buffer, 1/0 flags marking the beginning of each FIFO read, so we could see the FIFO level at the moment the data was read.

And here is what we found: the glitches were inside the data packages read from the FIFO, not at their edges (which could have suggested various hypotheses); FIFO fullness was at ~25% of capacity (the watermark is set to 128); and the reading procedure works perfectly at the watermark level we set.

5. Then we tested timestamp batching to the FIFO to make sure there is a 37.5 µs difference between samples, and this confirmed our theory: some differences between consecutive samples were != 3 (as I understand it, 3 LSB corresponds to 37.5 µs), so we started looking more and more into the "sample losses" side. But then we looked at the waterfall chart again and noticed something interesting: with tstamp_en we get significantly more glitches (I don't know why) and a very noisy signal at low frequencies. We turned it off to continue our research and started testing other IIS3DWB modes and features.

6. We switched from 3-axis mode to the 1-axis mode IIS3DWB_ONLY_Y_ON_ALL_OUT_REG (this is important!!! because with IIS3DWB_ONLY_Y_ON_ONE_OUT_REG nothing changes), and there was light at the end of the tunnel: in a set of 262144 samples we found nothing, the signal was perfect, but…

7. We recorded a set of 4 million samples (~8 MB raw data file) and found 3 glitches. Yes, it's much better than before, but we still have sample losses, and we want the data to be crystal clear (i.e. without sample losses). At this point we were sure it was not an SPI issue, a code mistake or shaker behavior: everything pointed to the problem lying within the sensor.

8. To check SPI we switched from 8 MHz to 4 MHz: nothing changed. We switched to 10 MHz: the glitches were still there. Then we went back to 8 MHz. Activating BDU also changed nothing. After all the previous tests it didn't look like an SPI bug, because in that case we would see the same picture regardless of whether timestamp batching is on or how many axes are enabled, wouldn't we?

9. We looked at Figure 9 in the DS and found this phrase: "...The value is expressed as a 32-bit word and the bit resolution is 12.5 μs." And we were like: "Why 12.5? Do they actually have three A/D converters, one for each channel X, Y, Z, or just one, multiplexed at 26.67k × 3?"
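The timing figures in items 1, 5 and 9 can be cross-checked with simple arithmetic, using only the numbers quoted in the post (26.667 kHz ODR, 12.5 µs timestamp resolution, 262144 samples):

```latex
\begin{aligned}
T_{\mathrm{ODR}} &= \frac{1}{26.667\ \mathrm{kHz}} \approx 37.5\ \mu\mathrm{s},
  &\qquad \frac{T_{\mathrm{ODR}}}{12.5\ \mu\mathrm{s}} &= 3\ \text{LSB per sample},\\
\frac{1}{12.5\ \mu\mathrm{s}} &= 80\ \mathrm{kHz} \approx 3 \times 26.667\ \mathrm{kHz},
  &\qquad \frac{262144}{26.667\ \mathrm{kHz}} &\approx 9.83\ \mathrm{s}.
\end{aligned}
```

The timestamp tick being exactly one third of the ODR period is what prompts the multiplexing question; the arithmetic alone does not show how many converters the sensor actually contains.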

I think there is a problem in the sensor design. Here is a link to an ADI article (https://www.analog.com/ru/analog-dialogue/articles/demystifying-data-acquisition-systems.html); it is very close to what we are seeing.

And to finish my post, I want to ask you a couple of questions:

Q.1. How can we fix this issue and get a raw signal without sample losses?

Q.2. How many A/D converters are actually installed? How many sample-and-hold elements are actually installed?

Sorry if I missed something, there is a lot of info; I will try to provide any additional information if required.

Regards,

Vladislav Savenia

VSavenia · 2022-09-01

@Eleon BORLINI Heeeelp please:D

VSavenia · 2022-09-01

I can share these .csv files with raw data and the Mathcad files too; I have a bunch of them :)

Eleon BORLINI · 2022-09-02

Hi @VSavenia ,

I'm submitting your findings directly to our ASIC design team, if only for the huge amount of detail in this post ;)

By the way, did you reproduce this issue on all the samples you tested? And did you try acquiring data directly, bypassing the FIFO completely? You might be missing some samples in your procedure; this would be consistent with the fact that with a single axis you are almost clean.
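As a rough feasibility check for reading without the FIFO, assume each data-ready pulse is serviced with one SPI burst of one address byte plus six data bytes (X, Y, Z, 16 bits each), and ignore chip-select and interrupt latency. The bus time per sample at the SPI clocks mentioned in the original post is then:

```latex
t_{\mathrm{SPI}} = \frac{(1 + 6) \times 8\ \mathrm{bit}}{f_{\mathrm{SCK}}}
= \begin{cases}
14\ \mu\mathrm{s}, & f_{\mathrm{SCK}} = 4\ \mathrm{MHz}\\
7\ \mu\mathrm{s}, & f_{\mathrm{SCK}} = 8\ \mathrm{MHz}\\
5.6\ \mu\mathrm{s}, & f_{\mathrm{SCK}} = 10\ \mathrm{MHz}
\end{cases}
\quad < \quad T_{\mathrm{ODR}} = \frac{1}{26.667\ \mathrm{kHz}} \approx 37.5\ \mu\mathrm{s}
```

Under these assumptions the bus itself leaves headroom; the practical cost is that the µC would have to service an interrupt roughly every 37.5 µs without ever missing one.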

I'll keep you posted.

-Eleon

VSavenia · 2022-09-03

Hi @Eleon BORLINI !

Yes, I reproduced this issue on at least three devices; the results are the same.

I had the idea of getting data directly, without the FIFO, but there were two things: first, we are not interested in that operating mode; second, as an engineer I was of course interested in running this test, but my team and I had already spent a lot of time on what we did, we were limited in time, and we decided to test the main hypotheses :)

The procedure for reading data out of the FIFO in 3-axis mode is the same as in 1-axis mode; just setting the IIS3DWB_ONLY_Y_ON_ALL_OUT_REG mode is enough to start getting "better data". I can share the code with you if needed, and you will see that the procedure is OK.
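The AcquireRawData procedure from item 3 of the original post (described here as the same for both axis modes) can be sketched roughly as follows. This is not the author's code: the hook names are hypothetical stand-ins for the board's SPI and IRQ routines, and the FIFO word layout (one tag byte with TAG_SENSOR in bits 7..3, then X, Y, Z as little-endian 16-bit values, accelerometer tag = 2) is an assumption taken from the IIS3DWB datasheet and ST's driver that should be verified against them. The `marks` buffer reproduces the 1/0 "start of read" flags from item 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_WORD_BYTES 7      /* assumed: 1 tag byte + 6 data bytes */
#define FIFO_TAG_XL     0x02u  /* assumed accelerometer TAG_SENSOR value */

typedef struct { int16_t x, y, z; } xl_sample_t;

/* Hypothetical board hooks; these are not ST driver functions. */
typedef struct {
    void   (*set_continuous)(int on);   /* 1: bypass -> continuous, 0: back to bypass */
    void   (*wait_wtm_irq)(void);       /* block until the watermark interrupt */
    size_t (*fifo_level)(void);         /* number of words currently in the FIFO */
    void   (*burst_read)(uint8_t *dst, size_t words);
} fifo_hooks_t;

static int16_t le16(const uint8_t *p) { return (int16_t)(uint16_t)(p[0] | (p[1] << 8)); }

/* Parse `words` raw FIFO words, keep accelerometer words only, and set
 * marks[0] = 1 (all others 0) so each read boundary stays visible. */
size_t parse_fifo_burst(const uint8_t *raw, size_t words,
                        xl_sample_t *out, uint8_t *marks)
{
    size_t n = 0;
    assert(words == 0 || (raw != NULL && out != NULL && marks != NULL));
    for (size_t i = 0; i < words; i++) {
        const uint8_t *w = raw + i * FIFO_WORD_BYTES;
        if ((w[0] >> 3) != FIFO_TAG_XL) continue;   /* skip timestamp/temperature words */
        out[n].x = le16(w + 1);
        out[n].y = le16(w + 3);
        out[n].z = le16(w + 5);
        marks[n] = (n == 0) ? 1 : 0;
        n++;
    }
    return n;
}

/* One acquisition request: continuous mode, drain on every watermark IRQ,
 * then back to bypass. `scratch` must hold scratch_words * FIFO_WORD_BYTES bytes. */
size_t acquire_raw_data(const fifo_hooks_t *h, uint8_t *scratch, size_t scratch_words,
                        xl_sample_t *out, uint8_t *marks, size_t wanted)
{
    size_t got = 0;
    h->set_continuous(1);
    while (got < wanted) {
        h->wait_wtm_irq();
        size_t words = h->fifo_level();
        if (words > scratch_words) words = scratch_words;
        if (words > wanted - got) words = wanted - got;   /* never overrun `out` */
        h->burst_read(scratch, words);
        got += parse_fifo_burst(scratch, words, out + got, marks + got);
    }
    h->set_continuous(0);
    return got;
}
```

Capping each read at `wanted - got` words leaves any surplus in the FIFO, to be discarded when the mode returns to bypass, which mirrors the "switch back to bypass once enough data is acquired" step.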

Please also note that when I turn on tstamp_fifo_batch the data gets worse: that single change is enough to get worse data. The axis mode behaves the same way in reverse (switching to 1-axis mode alone makes the data better). This is strange.
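The timestamp check from item 5 of the original post (consecutive timestamps should differ by 3 LSB, i.e. 37.5 µs / 12.5 µs) can be run over a whole recording instead of being read off a chart. A minimal sketch, assuming timestamps are batched so that there is exactly one timestamp per accelerometer sample; the function name and the rounding policy are illustrative choices, not anything from ST's driver:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expected step for 26.667 kHz ODR with a 12.5 us timestamp LSB (from the post). */
#define TS_STEP_LSB 3u

/* Count samples that appear to be missing in a sequence of 32-bit timestamps.
 * Unsigned subtraction is modulo 2^32, so counter wrap-around is handled.
 * A delta larger than `step` is rounded to the nearest whole number of steps k
 * and counted as k - 1 lost samples; smaller deltas are ignored as jitter. */
size_t count_missing_samples(const uint32_t *ts, size_t len, uint32_t step)
{
    size_t missing = 0;
    assert(step > 0);
    for (size_t i = 1; i < len; i++) {
        uint32_t d = ts[i] - ts[i - 1];
        if (d > step) {
            uint64_t k = ((uint64_t)d + step / 2) / step;   /* 64-bit: no overflow */
            missing += (size_t)(k - 1);
        }
    }
    return missing;
}
```

Rounding to the nearest step keeps a one-LSB jitter (a delta of 4) from being reported as a loss, while a delta of 6 counts as one missing sample.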

Regards

VSavenia · 2022-09-22

@Eleon BORLINI Eleon, any info?

Eleon BORLINI · 2022-09-30

Hi @VSavenia ,

our designers confirmed to me that it should be a problem related to missing samples, probably due to the digital communication management (at FW level).

-Eleon

VSavenia · 2022-10-03

Thank you for your efforts, Eleon!

I think it's time to publish an Errata Sheet.

dns13 · 2024-12-09

Has this been fixed by now? I can't find anything about it in the docs. Still no Errata Sheet.