Model Zoo Hand Posture and Motion Detection

ganbolede · ‎2024-11-09

Hello,

I am trying to implement a gesture recognition application.

I am using 53L8A1 to collect data.

I found that the model zoo supports Hand Posture using a 2D CNN. In the model zoo, it uses the minimum distance, max_distance, and background distance. The normalization is done as follows:
X1_norm[zheaders["distance_mm"]] = (X1[zheaders["distance_mm"]] - 295) / 196
X1_norm[zheaders["signal_per_spad"]] = (X1[zheaders["signal_per_spad"]] - 281) / 452

I would like to know how the values 295 and 196 for the distance, and 281 and 452 for the signal per SPAD, were calculated.
I am also trying to detect dynamic gestures with motion.
Is it possible to do this with a 3D CNN using an 8x8x2xNumber of Frames input, where the 2 channels are the distance and the signal per SPAD?

labussiy · ‎2024-11-18

Hello,

Actually, these values were calculated on the public ST dataset; they are used to normalize the data between -1 and 1.

In a perfect world, we should update them for each new dataset.
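The thread does not say which statistics produced 295/196 and 281/452. As one possible way to recompute such constants on a new dataset, the sketch below assumes they act as a per-channel mean and standard deviation. Note that this kind of standardization centers the data but does not strictly bound it to [-1, 1]. The function name `fit_offset_scale` and the synthetic data are illustrative only and do not come from the model zoo.

```python
import numpy as np

def fit_offset_scale(values, valid=None):
    """Return (offset, scale) so that (values - offset) / scale has mean 0 and std 1.

    Assumption: the constants are a mean and a standard deviation.
    `valid` is an optional boolean mask selecting the zones to include.
    """
    values = np.asarray(values, dtype=np.float64)
    if valid is not None:
        values = values[np.asarray(valid, dtype=bool)]
    offset = float(values.mean())
    scale = float(values.std())
    if scale == 0.0:
        scale = 1.0  # avoid dividing by zero on constant data
    return offset, scale

# Synthetic stand-in for a recorded dataset: 1000 frames of 8x8 distances in mm.
rng = np.random.default_rng(0)
distance_mm = rng.uniform(100.0, 500.0, size=(1000, 8, 8))

offset, scale = fit_offset_scale(distance_mm)
distance_norm = (distance_mm - offset) / scale
```

The same function could be applied to the signal-per-SPAD channel to obtain its own pair of constants.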

Another approach could be to use the min/max distance as the normalization parameters.
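A minimal sketch of the min/max variant, assuming the goal is to map the range [min, max] linearly onto [-1, 1]: the offset is then the midpoint of the range and the scale is half its width. Read this way, an offset of 295 and a scale of 196 would correspond to a range of 99 mm to 491 mm, although the thread does not confirm that interpretation. The constants `MIN_DISTANCE_MM` and `MAX_DISTANCE_MM` and both function names are hypothetical.

```python
import numpy as np

def minmax_offset_scale(min_value, max_value):
    """Return (offset, scale) mapping [min_value, max_value] onto [-1, 1]."""
    if max_value <= min_value:
        raise ValueError("max_value must be greater than min_value")
    offset = (min_value + max_value) / 2.0  # midpoint of the range
    scale = (max_value - min_value) / 2.0   # half-width of the range
    return offset, scale

def normalize(values, offset, scale, clip=True):
    """Apply (values - offset) / scale, optionally clipping to [-1, 1]."""
    out = (np.asarray(values, dtype=np.float32) - offset) / scale
    return np.clip(out, -1.0, 1.0) if clip else out

# Hypothetical working range for the application, in millimetres.
MIN_DISTANCE_MM = 100.0
MAX_DISTANCE_MM = 400.0

offset, scale = minmax_offset_scale(MIN_DISTANCE_MM, MAX_DISTANCE_MM)

# Small example: the last value lies beyond the range and is clipped.
frame = np.array([[100.0, 250.0], [400.0, 550.0]])
frame_norm = normalize(frame, offset, scale)
```

Clipping is a design choice here: readings beyond the configured range (for example, the background) saturate at the boundary instead of producing values far outside [-1, 1].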

Regarding your question on the model to detect dynamic gestures, I will be very honest: I have never tried to use a 3D CNN. Why not try an RNN (Recurrent Neural Network)?

Also, if you are planning to deploy your model on STM32, you should check which topologies are supported by STM32Cube.AI.


liaifat85 · ‎2024-11-09

With 8x8x2xNumber of Frames, you can represent an 8x8 grid (reflecting spatial resolution in each frame) across two channels (distance and signal per SPAD) over a time series of frames (Number of Frames).
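As a rough illustration of this layout, the sketch below stacks the two normalized channels and cuts the recording into fixed-length windows of frames. It assumes the frames axis is placed before the 8x8 grid and the channels last, which is the order many frameworks expect for 3D convolutions; `np.moveaxis` can reorder the axes if an 8x8x2xN layout is needed. The function name `make_windows`, the window length, and the stride are hypothetical choices.

```python
import numpy as np

def make_windows(distance_norm, signal_norm, num_frames, stride=1):
    """Stack two (T, 8, 8) channels and cut sliding windows of num_frames frames.

    Returns an array of shape (num_windows, num_frames, 8, 8, 2).
    """
    distance_norm = np.asarray(distance_norm, dtype=np.float32)
    signal_norm = np.asarray(signal_norm, dtype=np.float32)
    if distance_norm.shape != signal_norm.shape or distance_norm.shape[1:] != (8, 8):
        raise ValueError("expected two arrays of shape (T, 8, 8)")

    frames = np.stack([distance_norm, signal_norm], axis=-1)  # (T, 8, 8, 2)
    starts = range(0, frames.shape[0] - num_frames + 1, stride)
    windows = [frames[s:s + num_frames] for s in starts]
    if not windows:
        return np.empty((0, num_frames, 8, 8, 2), dtype=np.float32)
    return np.stack(windows)

# Synthetic stand-in for 20 already-normalized frames.
rng = np.random.default_rng(0)
distance_norm = rng.uniform(-1.0, 1.0, size=(20, 8, 8))
signal_norm = rng.uniform(-1.0, 1.0, size=(20, 8, 8))

windows = make_windows(distance_norm, signal_norm, num_frames=10, stride=5)

# For a recurrent model, each frame can be flattened to 8 * 8 * 2 = 128 features.
rnn_input = windows.reshape(windows.shape[0], windows.shape[1], -1)
```

With 20 frames, a window of 10, and a stride of 5, this yields three overlapping windows; the flattened view gives one sequence of 128-feature vectors per window.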
