
Model Zoo Hand Posture and Motion Detection

ganbolede
Associate

Hello,

I am trying to implement a gesture recognition application.

I am using 53L8A1 to collect data.

I found that the model zoo supports Hand Posture recognition using a 2D CNN. The model zoo uses a minimum distance, max_distance and a background distance. The normalization is done as follows:
X1_norm[zheaders["distance_mm"]] = (X1[zheaders["distance_mm"]] - 295) / 196
X1_norm[zheaders["signal_per_spad"]] = (X1[zheaders["signal_per_spad"]] - 281) / 452

I would like to know how the numbers 295 and 196 for the distance, and 281 and 452 for the signal per SPAD, are calculated.
I am also trying to detect dynamic gestures that involve motion.
Is it possible to do this with a 3D CNN using an input of 8x8x2xNumber of Frames, where the 2 channels are the distance and the signal per SPAD?

1 ACCEPTED SOLUTION

labussiy
ST Employee

Hello,

 

Actually, these values were calculated on the public ST dataset. They are used to normalize the data between -1 and 1.

Ideally, they should be recomputed for each new dataset.

Another approach could be to use the min/max distance as the normalization parameters.
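
As an illustration of both options mentioned above, here is a minimal sketch of how an offset/scale pair could be recomputed for a new dataset. It is not the script ST used: the thread only says the constants were calculated on the public ST dataset, not by which statistic. The function names `fit_norm_params` and `normalize`, the `mode` argument and the synthetic data are all assumptions made for this example. Note that only the min/max variant guarantees values inside [-1, 1]; the mean/std variant gives zero mean and unit variance instead.

```python
import numpy as np


def fit_norm_params(values, mode="meanstd"):
    """Return (offset, scale) such that (values - offset) / scale is normalized.

    Hypothetical helper, not part of the ST model zoo.
    """
    values = np.asarray(values, dtype=np.float64)
    if mode == "meanstd":
        # Standardization: zero mean, unit variance (not bounded to [-1, 1]).
        offset, scale = values.mean(), values.std()
    elif mode == "minmax":
        # Midpoint and half-range map [min, max] exactly onto [-1, 1].
        lo, hi = values.min(), values.max()
        offset, scale = (lo + hi) / 2.0, (hi - lo) / 2.0
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    if scale == 0:
        raise ValueError("all values are identical; scale would be zero")
    return float(offset), float(scale)


def normalize(values, offset, scale):
    """Apply the same (x - offset) / scale form used in the question."""
    return (np.asarray(values, dtype=np.float64) - offset) / scale


# Synthetic stand-in for a recorded dataset: 100 frames of 8x8 zones.
rng = np.random.default_rng(0)
distance_mm = rng.uniform(100.0, 500.0, size=(100, 8, 8))

d_offset, d_scale = fit_norm_params(distance_mm, mode="minmax")
distance_norm = normalize(distance_mm, d_offset, d_scale)
```

The same two calls would be repeated for the signal per SPAD channel. The parameters should be fitted on the training set only and then reused unchanged at inference time.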

 

Regarding your question about a model to detect dynamic gestures: to be honest, I have never tried a 3D CNN. Why not try an RNN (Recurrent Neural Network)?

Also, if you are planning to deploy your model on STM32, you should check which topologies are supported by STM32Cube.AI.

Our community relies on fruitful exchanges and good quality content. You can thank and reward helpful and positive contributions by marking them as 'Accept as Solution'. When marking a solution, make sure it answers your original question or issue that you raised.

ST Employees that act as moderators have the right to accept the solution, judging by their expertise. This helps other community members identify useful discussions and refrain from raising the same question. If you notice any false behavior or abuse of the action, do not hesitate to 'Report Inappropriate Content'


2 REPLIES 2
liaifat85
Senior III

With 8x8x2xNumber of Frames, you can represent an 8x8 grid (reflecting spatial resolution in each frame) across two channels (distance and signal per SPAD) over a time series of frames (Number of Frames).
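
The layout described above can be sketched with NumPy as follows. This is only an illustration: `stack_frames` and `to_8x8x2xN` are hypothetical helper names, and the zero-filled arrays stand in for real sensor frames. The sketch builds the clip frames-first, as (Number of Frames, 8, 8, 2), because channels-last 3D convolution layers such as Keras `Conv3D` expect the channel axis last; the literal 8x8x2xN order from the question is then just a transpose.

```python
import numpy as np


def stack_frames(distance_frames, signal_frames):
    """Combine per-frame 8x8 maps into one clip of shape (T, 8, 8, 2).

    Channel 0 is the distance, channel 1 is the signal per SPAD.
    """
    d = np.asarray(distance_frames, dtype=np.float32)
    s = np.asarray(signal_frames, dtype=np.float32)
    if d.shape != s.shape or d.ndim != 3 or d.shape[1:] != (8, 8):
        raise ValueError("expected two arrays of shape (T, 8, 8)")
    return np.stack([d, s], axis=-1)


def to_8x8x2xN(clip):
    """Reorder a (T, 8, 8, 2) clip into the 8x8x2xN layout from the question."""
    return np.transpose(clip, (1, 2, 3, 0))


# Placeholder data: 16 frames of already-normalized 8x8 maps.
num_frames = 16
distance = np.zeros((num_frames, 8, 8))
signal = np.zeros((num_frames, 8, 8))

clip = stack_frames(distance, signal)   # shape (16, 8, 8, 2)
clip_q = to_8x8x2xN(clip)               # shape (8, 8, 2, 16)
```

A fixed number of frames per clip keeps the input shape static, which tends to simplify deployment on a microcontroller.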
