Human action prediction from hand movement for human-robot collaboration

Graduate Thesis uoadl:2959112

Unit:
Department of Informatics and Telecommunications
Informatics
Deposit date:
2021-08-02
Year:
2021
Author:
Soulounias Nikolaos
Supervisors info:
Maria Dagioglou, National Centre for Scientific Research (NCSR) "Demokritos", Collaborating Researcher
Panagiotis Stamatopoulos, National and Kapodistrian University of Athens (NKUA), Assistant Professor
Original Title:
Human action prediction from hand movement for human-robot collaboration
Languages:
English
Translated title:
Human action prediction from hand movement for human-robot collaboration
Summary:
Human-Robot Collaboration can combine the speed and precision of a robot with the cognitive abilities of a human. In a Human-Robot Collaboration scenario, the human and the robot agent share the same workspace and coordinate their movements in order to achieve a common goal. The robot's ability to predict the human's action before its completion is therefore crucial for safe and fluent collaboration between the agents. For example, in a production line the human agent's actions involve grasping different objects; in this case, we would like the robot to predict which object will be grasped.

This thesis focuses on the task of predicting the object to be grasped based solely on the human hand movement, without considering the object's properties. The core assumption of our work relies on behavioral findings showing that the hand gradually conforms to the properties of the target object throughout the entire grasping movement, not only at its end. Such results were obtained using body markers and a high-end, multi-camera motion capture system. In a Human-Robot Collaboration setup, however, a single RGB-D camera would be preferable, at the cost of increased noise in the hand pose information. The goal of this thesis was to explore whether it is possible to predict the size of an object based solely on human arm and hand kinematics obtained through a single RGB-D camera.

In our work, we used a dataset of movements towards three objects of the same cubical shape but different sizes, collected with an RGB-D visual sensor. The OpenPose framework was chosen for human arm and hand pose estimation; the keypoint detections were 2D. The movements were preprocessed to filter out noisy frames, identify the grasping movement and select a part of it, and kinematic features were then engineered from this partial grasping movement. The investigated features were the fingertip aperture, the wrist coordinates, the wrist instantaneous speed and the dispersion of the wrist coordinates. Both "traditional" Machine Learning models and a Deep Learning model were then trained: the former included Random Forest, Gradient Boosting, Extra Trees, Support Vector Machine and Gaussian Process models, while the latter was a simple Convolutional Neural Network. The models were evaluated for different combinations of kinematic features, different training/cross-validation/testing partition strategies and different movement completion percentages. The timing of the predictions and the performance of the best models were also evaluated in a simple Human-Robot Collaboration application using real robot data.
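For illustration only, the following minimal Python sketch shows how such kinematic features could be computed from 2D OpenPose keypoints; the array layout, frame rate and keypoint indices (wrist = 0, thumb tip = 4, index tip = 8, following the OpenPose hand model) are assumptions, not the thesis code.

    import numpy as np

    # Assumed layout: keypoints has shape (frames, joints, 2), holding the 2D
    # pixel coordinates of the OpenPose arm/hand keypoints for one movement.

    def fingertip_aperture(keypoints, thumb_tip=4, index_tip=8):
        # Euclidean distance between thumb and index fingertips, per frame.
        return np.linalg.norm(keypoints[:, thumb_tip] - keypoints[:, index_tip], axis=1)

    def wrist_speed(keypoints, fps=30.0, wrist=0):
        # Instantaneous wrist speed: frame-to-frame displacement times frame rate.
        disp = np.diff(keypoints[:, wrist], axis=0)
        return np.linalg.norm(disp, axis=1) * fps

    def wrist_dispersion(keypoints, wrist=0):
        # Dispersion of the wrist trajectory: per-axis standard deviation.
        return keypoints[:, wrist].std(axis=0)

    def truncate(movement, completion=0.4):
        # Keep only the first `completion` fraction of the grasping movement,
        # e.g. 0.4 for a prediction at 40% movement completion.
        n = max(1, int(round(len(movement) * completion)))
        return movement[:n]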

Due to the dataset's size, the top-performing "traditional" Machine Learning models consistently outperformed the Deep Learning model. Furthermore, the models' accuracy was higher when they were trained with movements of all the participants than when they had to predict the movements of a participant who did not belong to the training set. In the former case, the best accuracy was achieved by the Extra Trees model: 66.27%, 80.14%, 86.01%, 91.88% and 93.84% for 20%, 40%, 60%, 80% and 100% of the movement, respectively. In the latter case, the best accuracy was achieved by the Support Vector Machine model: 50.83%, 61.67%, 66.58%, 64.93% and 69.08% for the same movement completion percentages. The reason for this gap was that a Machine Learning model could not generalize from the training-set participants to a testing-set participant whose movement pattern differed significantly from theirs. Finally, the Machine Learning models learned to identify the small object as non-large and the large object as non-small; the medium class, in contrast, was mainly responsible for the models' suboptimal performance.
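For illustration, a minimal scikit-learn sketch of the two partition strategies contrasted above (mixed-participant cross-validation versus holding a whole participant out); the synthetic data, hyperparameters and group layout are placeholders, not the thesis code or results.

    import numpy as np
    from sklearn.ensemble import ExtraTreesClassifier
    from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
    from sklearn.svm import SVC

    # Placeholder data: 240 partial movements, 12 kinematic features,
    # 3 object-size classes, 8 participants with 30 movements each.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(240, 12))
    y = rng.integers(0, 3, size=240)
    groups = np.repeat(np.arange(8), 30)

    # Mixed-participant evaluation: every participant's movements can appear
    # in both the training and the validation folds.
    mixed = cross_val_score(ExtraTreesClassifier(n_estimators=200, random_state=0), X, y, cv=5)

    # Participant-held-out evaluation: each fold tests on one participant the
    # model has never seen, probing generalization to new movement patterns.
    held_out = cross_val_score(SVC(), X, y, groups=groups, cv=LeaveOneGroupOut())

    print(mixed.mean(), held_out.mean())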
Main subject category:
Technology - Computer science
Keywords:
Human-Robot Collaboration, Machine Learning, Computer Vision, Skeleton-based Human Action Prediction, Hand Skeletal Data, Kinematic Features, Prediction
Index:
Yes
Number of index pages:
11
Contains images:
Yes
Number of references:
67
Number of pages:
111
thesis_final.pdf (12 MB)