Pose estimation is the process of identifying how a human body, or its individual limbs, is configured in a given scene. Hand pose estimation is an important research topic with a variety of applications in human-computer interaction (HCI), such as gesture recognition, animation synthesis and robot control. However, capturing hand motion is challenging because of the hand's high flexibility. Many sensor-based and vision-based methods have been proposed for this task.
In sensor-based systems, specialized hardware is used for hand motion capture. Vision-based hand pose estimation methods are generally divided into two categories: appearance-based methods and model-based methods. In appearance-based approaches, various features are extracted from the input images to estimate the hand pose. Usually a large number of training samples are used in advance to learn a mapping function from the features to the hand poses. Given the learned mapping function, the hand pose can be estimated efficiently. In model-based approaches, the hand pose is estimated by aligning a projected 3D hand model with the hand features extracted from the inputs. The model must therefore provide the hand state at every time instant. These methods are computationally expensive, which makes real-time implementation impractical.
Hand pose estimation using (color/depth) images consists of three steps:
The features needed for pose estimation depend on the model used and on the intended hand gesture analysis; typical features are fingertip positions, the number of fingers, the palm position and joint angles.
In this paper, a model-based, markerless scheme for dynamic hand pose estimation is presented. Motion capture is the process of recording a live motion event and translating it into usable mathematical terms by tracking a number of key points in space over time and combining them to obtain a single 3D representation of the performance. The inputs of the scheme are the sequences of depth images, color images and skeleton data obtained from Kinect (a new tool for markerless motion capture) at 30 frames per second. The proposed scheme exploits both temporal and spatial features of the input sequences. It focuses on localizing the index and thumb fingertips and on computing the joint angles of the robot arm, so that the arm mimics the user's arm movements in 3D space in an uncontrolled environment. The RoboTECH II ST240 is used as the real robot arm model. Depth and skeleton data are used to determine the angles of the robot joints. Three approaches for identifying the tips of the thumb and index finger from the available data are presented, each with its own limitations. These approaches use concepts such as thresholding, edge detection, convex hull construction, skin modeling and background subtraction. Finally, a comparison of the tracked trajectories of the user's wrist and the robot end effector shows an average error of about 0.43 degrees, which is adequate for the purposes of this research.
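To make the step from skeleton data to joint angles concrete, the following is a minimal sketch and not the authors' implementation. It assumes each skeleton joint is available as a 3D point `(x, y, z)`, and it computes the angle at a middle joint (for example the elbow) from its two neighbours. The function names `joint_angle_deg` and `mean_abs_error_deg` are hypothetical, and the error measure shown (mean absolute difference between two angle sequences) is only one plausible way to compare user and robot trajectories.

```python
import math


def joint_angle_deg(a, b, c):
    """Return the angle at joint b, in degrees, formed by the 3D points a-b-c.

    Hypothetical helper: a, b, c are (x, y, z) tuples, e.g. shoulder,
    elbow and wrist positions taken from one skeleton frame.
    """
    u = [ai - bi for ai, bi in zip(a, b)]  # vector from b to a
    v = [ci - bi for ci, bi in zip(c, b)]  # vector from b to c
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("coincident joints: angle is undefined")
    cos_angle = sum(ui * vi for ui, vi in zip(u, v)) / (norm_u * norm_v)
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))


def mean_abs_error_deg(user_angles, robot_angles):
    """Mean absolute difference between two equally long angle sequences."""
    if len(user_angles) != len(robot_angles) or not user_angles:
        raise ValueError("sequences must be non-empty and of equal length")
    total = sum(abs(u - r) for u, r in zip(user_angles, robot_angles))
    return total / len(user_angles)


# Example: a right angle at the elbow (assumed coordinates, in metres).
shoulder, elbow, wrist = (0.0, 0.3, 2.0), (0.0, 0.0, 2.0), (0.25, 0.0, 2.0)
elbow_angle = joint_angle_deg(shoulder, elbow, wrist)
```

Applied to every frame of the 30 frames-per-second sequence, such a function would yield one angle series per joint, which could then be compared with the angles actually reached by the robot.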
The key contribution of this work is estimating the hand pose for every input frame and updating the robot arm according to the estimated pose. The thumb and index fingertips, detected with the presented approaches, form part of the feature vector. User movements are translated into the corresponding Move instruction for the robot. The features required by a Move instruction are the rotation values around the joints in the different directions and the opening between the index finger and the thumb.
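As an illustration of how the per-frame features might be packed into a Move instruction, here is a short sketch under stated assumptions. The `MoveCommand` structure, the `finger_opening` normalization and the `max_span` parameter are all hypothetical; they are not the RoboTECH II ST240 command format, which is not described here.

```python
import math
from dataclasses import dataclass


@dataclass
class MoveCommand:
    """Hypothetical container for one per-frame Move instruction."""
    joint_rotations_deg: tuple  # rotation value for each robot joint
    gripper_opening: float      # 0.0 = closed, 1.0 = fully open


def finger_opening(thumb_tip, index_tip, max_span):
    """Normalized thumb-to-index distance, clipped to [0, 1].

    thumb_tip and index_tip are (x, y, z) fingertip positions;
    max_span is an assumed calibration constant in the same unit.
    """
    if max_span <= 0:
        raise ValueError("max_span must be positive")
    distance = math.dist(thumb_tip, index_tip)
    return max(0.0, min(1.0, distance / max_span))


def build_move(joint_rotations_deg, thumb_tip, index_tip, max_span=0.10):
    """Combine joint rotations and finger opening into one command."""
    opening = finger_opening(thumb_tip, index_tip, max_span)
    return MoveCommand(tuple(joint_rotations_deg), opening)


# Example frame with assumed values (angles in degrees, positions in metres).
command = build_move([15.0, 42.5, 90.0], (0.0, 0.0, 0.0), (0.03, 0.04, 0.0))
```

Normalizing the opening keeps the command independent of the user's hand size, at the cost of requiring a calibration value for `max_span`.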
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.