Mapping of skeleton keypoints to avatar motions in signing space

Postgraduate Thesis uoadl:3396590

Unit:
Specialization in Big Data and Artificial Intelligence
Informatics
Deposit date:
2024-04-16
Year:
2024
Author:
Karamanidis Dimitrios
Supervisors info:
Χαρίλαος Παπαγεωργίου, Research Director, Athena Research Center (Ε.Κ. ΑΘΗΝΑ)
Original Title:
Mapping of skeleton keypoints to avatar motions in signing space
Languages:
English
Translated title:
Mapping of skeleton keypoints to avatar motions in signing space
Summary:
Sign Language constitutes the primary means of communication for deaf and hard-of-hearing
individuals. Sign Language Representation is a complex task that involves labor-intensive
manual processes. To address this challenge, we propose an automated
method that maps skeleton keypoints to avatar motions by leveraging advanced deep
learning approaches. This mapping can be achieved by extracting accurate 3D body joint
coordinates from monocular videos using state-of-the-art human pose estimation
algorithms. In our study, we investigate approaches that detect 2D body joints in
videos and subsequently lift them into 3D space, and we evaluate them on a small
synthetic dataset of five videos featuring the Paula avatar. Our work focuses on arm
motions, emphasizing the keypoints of the shoulders, elbows, and wrists, in
recognition of the significance of their movements for sign language understanding.
Because the evaluated methods were trained on generic datasets rather than on
sign-language-specific ones, we had to make certain adjustments to ensure that the
skeleton keypoints are consistent. We provide a comprehensive analysis of the benefits
and drawbacks of each method and report distinct performance patterns along the
different axes. Notably, the approach that uses MediaPipe's BlazePose as the 2D
detector and VideoPose3D for 3D reconstruction outperforms its competitors, achieving
an average Mean Per Joint Position Error (MPJPE) of 72.2 mm.
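For readers unfamiliar with the metric, MPJPE is the Euclidean distance between each predicted joint and its ground-truth position, averaged over all joints and frames. The following is a minimal sketch of that standard definition, together with a per-axis mean absolute error of the kind a per-axis analysis might use. The function names, the data layout (a list of frames, each a list of (x, y, z) tuples in millimetres), and the sample values are assumptions made for illustration; the record does not describe the thesis's actual evaluation code or any alignment step it may apply beforehand.

```python
import math


def mpjpe(predicted, ground_truth):
    """Mean Euclidean distance over all frames and joints (same unit as input).

    Assumed layout: predicted[frame][joint] == (x, y, z).
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("frame counts differ")
    total, count = 0.0, 0
    for pred_frame, gt_frame in zip(predicted, ground_truth):
        if len(pred_frame) != len(gt_frame):
            raise ValueError("joint counts differ")
        for pred_joint, gt_joint in zip(pred_frame, gt_frame):
            total += math.dist(pred_joint, gt_joint)
            count += 1
    if count == 0:
        raise ValueError("no joints to compare")
    return total / count


def per_axis_error(predicted, ground_truth):
    """Mean absolute error along x, y and z, returned as a 3-tuple."""
    sums, count = [0.0, 0.0, 0.0], 0
    for pred_frame, gt_frame in zip(predicted, ground_truth):
        for pred_joint, gt_joint in zip(pred_frame, gt_frame):
            for axis in range(3):
                sums[axis] += abs(pred_joint[axis] - gt_joint[axis])
            count += 1
    if count == 0:
        raise ValueError("no joints to compare")
    return tuple(s / count for s in sums)


# Illustrative values only: one frame, two joints, coordinates in mm.
sample_gt = [[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]]
sample_pred = [[(3.0, 4.0, 0.0), (0.0, 0.0, 0.0)]]
sample_mpjpe = mpjpe(sample_pred, sample_gt)
sample_axes = per_axis_error(sample_pred, sample_gt)
```

In this toy case the first joint is 5 mm off and the second is exact, so the MPJPE is 2.5 mm, while the per-axis errors show that the deviation lies only in x and y.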
Main subject category:
Technology - Computer science
Keywords:
human pose estimation, 3D reconstruction, sign language representation, avatar motion
Index:
Yes
Number of index pages:
3
Contains images:
Yes
Number of references:
57
Number of pages:
55
thesis_ds1200004.pdf (3 MB)