Cristea.Catalin
Associate III

Dear Sir/Madam, I want to use the LSM6DSOX to recognize some gestures, and there are many things I still need to understand. I bought the STEVAL-MKI109V3 and a sensor board so I can use Unico to "see" the data, etc.

Questions:

1. When do I use FSM and when do I use Decision Trees? Can I use both?

2. Do you call "ML" only the use of decision trees, and NOT the FSM?

3. What I want is to detect gestures as if the sensor were in a pen and I wrote letters, say E, M, V, etc. (not necessarily all of them, but as many as I can). Can the ML in this sensor do that? Decision trees? The FSM?

4. I tried to (almost) reproduce one of your videos for my purpose, so I did multiple captures while moving the sensor in the shape of an E, and many while moving it like an M. I used an ODR of 52 Hz: I started a capture, moved the sensor in the shape of the letter, then stopped the capture, again and again. Each movement took just over 1 s, so I got about 80 samples, of which I chose to use 52.

QA: Should the many E captures go in ONE file (E.txt), since Unico appends new captures to the same file, or in MANY files (E1.txt, E2.txt, etc.)? It seems I can load them either way, but I fear that with multiple captures per file only the first n samples of the first capture may be used.

QB: Can I choose any values as the "Decision Tree #1 Results"? If my cases are "E" and "M" then I don't want either of them to be zero, so I chose 1 (E) and 2 (M), so that I should get 0 when nothing is recognized and 1 or 2 when an E or an M is recognized.

In fact I didn't get that at all: the data always showed 2 when not moving, and sometimes 1; certainly not 0 when not moving, 1 when moving an E, and 2 when moving an M.

Maybe one always needs an "everything else" case?

There's something else that confuses me: all the examples I saw were about "repeating" or constant "activities" that get recognized: glance (stays glanced), running (for a while), etc., so it doesn't matter WHEN the machine starts... machining.

So my question, when I want to recognize one-off gestures: how does the "machine" know when my gesture starts? Or does it continuously use a sliding window on which it runs the whole algorithm?

I think if you help me with these I may make good progress and have more questions afterwards; thank you :)

Eleon BORLINI
ST Employee

Hi @Cristea.Catalin​ ,

About the difference between the FSM and the ML, I suggest you have a look at the example on GitHub, which can show you practically when it is better to use the FSM and when the MLC.

Probably, for a fine gesture recognition such as the pattern resulting from writing a letter, the most relevant tool among the three ones is the ML (which is the same as the decision tree).

I believe that your procedure of collecting many gestures ("E" and "M") is the correct one. The dataset can be uploaded either way, but probably the suggested approach is to use a single .txt file for a single movement, and then label them separately, as shown in the application note AN5259 from p.22.
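If you have already appended several captures into one .txt, you could also split it offline before labeling. Below is a minimal Python sketch of this idea (not an ST tool); it assumes a fixed number of samples per capture and a single header line, so adapt both assumptions to your actual log format:

```python
# Hypothetical helper: split one appended log into one file per capture,
# so that each movement can be labeled separately in Unico.
SAMPLES_PER_CAPTURE = 52   # assumption: every capture has the same length

def split_log(path, prefix="E"):
    with open(path) as f:
        lines = f.readlines()
    header, data = lines[0], lines[1:]   # assumption: first line is the column header
    for i in range(len(data) // SAMPLES_PER_CAPTURE):
        chunk = data[i * SAMPLES_PER_CAPTURE:(i + 1) * SAMPLES_PER_CAPTURE]
        with open(f"{prefix}{i + 1}.txt", "w") as out:
            out.write(header)
            out.writelines(chunk)

# Example: split_log("E.txt") would produce E1.txt, E2.txt, ...
```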

Coming to your last question:

>> So my question, when I want to recognize one-off gestures: how does the "machine" know when my gesture starts? Or does it continuously use a sliding window on which it runs the whole algorithm?

This depends on the features you set as relevant. For example, if you are in a steady state and then you start moving, and the MLC is enabled and configured as you want, it will recognize, for example, whether the energy is above a certain threshold, then check whether the peak-to-peak is below another threshold, and so on (see the decision tree generated by your process for the relevant nodes). As reported in the above app note, p.9: "All the features are computed within a defined time window, which is also called 'window length' since it is expressed as the number of samples. The size of the window has to be determined by the user and is very important for the machine learning processing, since all the statistical parameters in the decision tree will be evaluated in this time window. It is not a moving window: features are computed just once for every WL sample (where WL is the size of the window)."

In your case, for a one-shot movement recognition, I believe you should set the window length below the total duration of the movement, otherwise you will miss the movement itself.
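To make the windowing behavior concrete, here is a rough Python sketch (only an illustration of the concept, not the MLC implementation) of features such as mean, energy and peak-to-peak computed once per window of WL samples, on consecutive non-overlapping windows:

```python
import numpy as np

WL = 52  # window length in samples, e.g. ~1 s of data at ODR = 52 Hz

def window_features(acc_norm):
    """Compute a few MLC-style features once per non-overlapping window.

    acc_norm: sequence of accelerometer norm values, one per sample.
    """
    features = []
    for i in range(len(acc_norm) // WL):               # one evaluation per complete window
        w = np.asarray(acc_norm[i * WL:(i + 1) * WL])  # consecutive windows, no sliding
        features.append({
            "mean": float(np.mean(w)),
            "energy": float(np.sum(w ** 2)),
            "peak_to_peak": float(np.max(w) - np.min(w)),
        })
    return features
```

Note that the windows are consecutive: a gesture that happens to start in the middle of a window is split across two windows, which is why the window length and the moment the movement starts matter so much in your case.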

If my reply answered your question, please click on Select as Best at the bottom of this post. This will help other users with the same issue to find the answer faster. 

-Eleon

Cristea.Catalin
Associate III

Thank you Eleon,

It will take me some time to go through the information you pointed out. I do have some, hopefully quick, questions:

* When you say "among the three ones" do you mean "two"?

* One comment that I already wonder about is that p.22 of AN5259 suggests one file per "movement" (capture?); I don't get that from there. In fact it is again an example of a never-ending "class" = "walking" that should already be happening at start/stop, and I don't think that works when you want to capture the shapes of letters. How would that go? I start moving like an E before I click Start, do many Es as I go down the page, and then click Stop while still doing Es? The capture will start somewhere in the middle of an E. Then when I select the window length in Unico, what do I select? Won't all samples after that length be discarded from the capture?

Thanks,

Cat

Sent from my computer on my desk ;)
Cristea.Catalin
Associate III

I read more of the docs you suggested; thank you.

While I think I understand better what's happening, I found ZERO examples of recognizing even the smallest sequence of specific gestures, like move down then move right to recognize an "L", for example.

NONE of the features in AN5259 section 1.3 can have ANY directionality; they are all about magnitudes and zero crossings. So how can they be used to detect "movement down"?

Another BIG problem is that I need the DT to start at the right time. Not AFTER the letter movement has started. Not LONG BEFORE. JUST BEFORE IT STARTS. There's ONLY ONE movement, unlike ALL your examples (walk, run, etc.).

I don't see how this would be done.

Please understand that I don't need to limit myself to what Unico provides, but I need help as to HOW this would be done, and how to test it with your MKI109V3.

I suspect I could use the microcontroller to "tell" the chip to start NOW based on a GPIO, OR I could use a tree with a very small window (so it's fast/short) to detect an energetic gesture that means "start now". Could the LSM6DSOX be configured to start one tree based on the result of another?

Could I use 1 fast/short tree to trigger the start of all the other trees in parallel, to recognize one of 15 gestures? (If there are max 16 trees?)

If all of this is true, then what I need is something like this:

At start, check if moving down. If No, return. If yes, check if moving right. If No, return. If yes, continue. Check if movement stopped. If No, return. If yes return "L". STOP.
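In Python-style pseudocode, the flow I have in mind is something like this (only a sketch of the idea, NOT Unico FSM syntax; the stroke_direction() check and its thresholds are placeholders for exactly the decision I don't know how to implement):

```python
# Pseudocode of the sequence I mean, not actual Unico FSM commands.

def stroke_direction(block):
    """Placeholder: classify one block of (ax, ay) samples (in g, gravity removed)."""
    if not block:
        return "none"
    ax = sum(s[0] for s in block) / len(block)
    ay = sum(s[1] for s in block) / len(block)
    if ay < -0.2:          # placeholder threshold: net acceleration downwards
        return "down"
    if ax > 0.2:           # placeholder threshold: net acceleration to the right
        return "right"
    return "none"

def recognize_L(blocks):
    """blocks: consecutive groups of samples, one group per 'stroke' window."""
    it = iter(blocks)
    if stroke_direction(next(it, [])) != "down":    # check if moving down; if no, return
        return None
    if stroke_direction(next(it, [])) != "right":   # check if moving right; if no, return
        return None
    if stroke_direction(next(it, [])) != "none":    # check that the movement stopped
        return None
    return "L"
```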

Can this be done? Am I missing something? Is there a better way?

Thanks,

Cat

Cristea.Catalin
Associate III

I don't know why my "REPLY" above was changed into an "ANSWER"; it is NOT an answer, it is clarifying why the previous answer(s) have not been accepted.

If possible, please change all my posts from "ANSWER" to "REPLY".

It doesn't make sense for the forum of such a great company to be so limited and closed-minded.

Also, I tried to delete my previous post, thinking I had done it the wrong way, and I couldn't.

Posters should be able to delete their own posts.

Cristea.Catalin
Associate III

I think I was wrong about the lack of directionality, as acceleration and/or angle change do "describe" movement direction.

But I need to start sampling before/as that first movement starts, then recognize that it happened, then check that the next movement in a different direction happens, and if it does, go on until all the movements that make up an L or an M are recognized.

None of the examples I found seem to check for sequences, or they don't explain anything. I have not found any FSM example/webinar yet that actually shows how to do it; just ready-made C and UCF files. Can you point me to any that explain how it's done?

Eleon BORLINI
ST Employee

Hi @Cristea.Catalin​ ,

Don't worry, answers and replies are basically the same thing. You can also edit one of your previous comments.

>> I think I was wrong about the lack of directionality, as acceleration and/or angle change do "describe" movement direction.

That's correct. If you move on a plane and, for example, you are writing an "L", you can know the (relative) direction. But you have to pay attention to the reciprocal orientation between the letter (an "L") and the reference system of the LSM6DSOX accelerometer.

[Images: orientation of the letter "L" with respect to the LSM6DSOX accelerometer reference system]

>> * When you say "among the three ones" do you mean "two"?

Yes, you are right: FSM and ML (or Decision Tree).

>> * One comment that I already wonder about is that p.22 of AN5259 suggests one file per "movement" (capture?); I don't get that from there. In fact it is again an example of a never-ending "class" = "walking" that should already be happening at start/stop, and I don't think that works when you want to capture the shapes of letters. How would that go? I start moving like an E before I click Start, do many Es as I go down the page, and then click Stop while still doing Es? The capture will start somewhere in the middle of an E. Then when I select the window length in Unico, what do I select? Won't all samples after that length be discarded from the capture?

I believe that the main issue is the one you already pointed out in one of your comments, i.e. the fact that you are trying to capture and recognize, in real time, one single characteristic movement. A typical pattern recognition implemented in the ML is a repetitive gesture, with a fixed directionality, which helps classify the context. Note also that you have a limited number of labels that can be implemented in the MLC (and there are at least 26 letters), and the same holds for the FSM. The FSM could be tested for recognizing, for example, the "L" sign in a pre-determined orientation, but, as anticipated before, you also have only a small set of categorization labels.

I would implement the fine gesture recognition at least with STM32.AI (running on the MCU) and probably implement the pattern recognition with a neural network (which should be less sensitive to the orientation issue) in post-processing.
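Just to give an idea of what I mean by post-processing with a neural network, here is a minimal sketch assuming TensorFlow/Keras on the processing side (the layer sizes and the three classes are placeholders, not a validated model):

```python
import tensorflow as tf

WL = 52          # samples per gesture window, e.g. ~1 s at 52 Hz
N_CLASSES = 3    # e.g. "E", "M", "no gesture"

# Small 1-D convolutional classifier working on raw accelerometer windows (x, y, z).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(WL, 3)),
    tf.keras.layers.Conv1D(16, 5, activation="relu"),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Conv1D(32, 5, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(N_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, ...) with x_train shaped (n_gestures, WL, 3)
```

A network like this looks at the whole window at once, so it should be less dependent on the exact orientation and on hand-crafted thresholds than a decision tree.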

-Eleon

Cristea.Catalin
Associate III

Thank you Eleon,

I do not plan to recognize all letters; it was just the easiest way to explain what I want.

Indeed, I would use an MCU to get more; but I do want to be able to use the built-in features to the maximum.

SO let's concentrate on that, please:

  1. Can the MLC be used to classify specific sequences, as needed for my application? I see no way to do it with Unico, but a decision tree should be able to "go" from one movement to another IF timing can somehow be involved: I need to check for a movement to the right AFTER the down movement has ended. HOW?
  2. How do I start the machine at the right time?
  3. I think FSM can, but:
    1. Is there an easy way to set the thresholds/masks from sampled movements, perhaps in some GUI way? Learning these conditions from captured movements? I don't see how to do it in Unico, but I see no reason why it couldn't be done. The "language" you propose is very cumbersome.
    2. Intuitively, what can the MLC do FOR the FSM, and how do you use them together? It seems you have some examples that I'll go through, but the explanations I found so far were not very intuitive. It would be nice to have a paragraph or two explaining WHAT and HOW to use them together.

Thanks again,

Cat

Eleon BORLINI
ST Employee

Hi Cat @Cristea.Catalin​ ,

I understand the problem and it is not easy to solve in "IoT" mode.

If you want to go deep into the MCU AI capabilities, I suggest you post in the STM32 Machine Learning & AI section of the community, where experts could help you much better than I can on this topic.

On the sensor side, using the FSM is probably a good solution, but for a complex movement you might have to couple different movements in sequence: when, for example, you want to recognize a gesture, you can implement the Motion-Stationary detection FSM, which "triggers" the FourD position recognition for recognizing a specific position in space, and then again the FourD position recognition for recognizing another position.

Otherwise you could use a sequence of MLCs in a similar way.

You could think of detecting the starting position with one MLC (for example, at the beginning of the gesture you want to recognize, you are at rest), and then the specific gesture with another MLC. You should manage them at the application level.
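Just to make the "manage them at application level" idea concrete, here is a rough host-side sketch in Python (the two read functions are placeholders for whatever register reads your driver provides, they are not actual driver calls):

```python
import time

# Placeholder helpers, NOT real driver functions: replace them with reads of the
# LSM6DSOX FSM/MLC output registers through your own I2C/SPI driver.
def read_fsm_motion_started():
    raise NotImplementedError("replace with a read of your 'motion started' FSM output")

def read_mlc_gesture_class():
    raise NotImplementedError("replace with a read of your MLC result register")

def wait_for_gesture(timeout_s=2.0):
    # Stage 1: wait at application level until the first machine flags "motion started".
    while not read_fsm_motion_started():
        time.sleep(0.01)
    # Stage 2: within a timeout, poll the second machine for the gesture class.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        label = read_mlc_gesture_class()
        if label != 0:      # assumption: 0 means "nothing recognized"
            return label
        time.sleep(0.02)
    return None             # motion started, but no known gesture within the timeout
```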

-Eleon

Cristea.Catalin
Associate III

Thank you Eleon,

From what I can see so far any AI/ML only applies to the decision tree (DT), not to the FSM.

From all the examples it seems that the ML/DT can't do specific sequences, as all the decisions are evaluated at one point in time (I might be wrong?), even though it's not quite clear to me how activities can be distinguished that way.

Anyway, it seems the FSM is what I need (even if I may include a DT in it), and I wish there were some ML way to arrive at the FSM design rather than manually entering commands and registers that are worse than assembly.

I found a "Wrist navigation gestures" example that should be helpful but it's not explained at all. I think the README should actually explain what each line in the FSM (in UNICO) does; THAT would be really helpful!

Thanks