American Sign Language Recognition via Sensor glove data analysis with deep learning - An ARM Implementation

Postgraduate Thesis uoadl:3239312

Unit:
Specialization in Integrated Circuit Technology
Informatics
Deposit date:
2022-10-25
Year:
2022
Author:
Barmpakos Theodoros
Supervisors info:
Elias Manolakos, Professor, Department of Informatics and Telecommunications, National and Kapodistrian University of Athens
Original Title:
American Sign Language Recognition via Sensor glove data analysis with deep learning - An ARM Implementation
Languages:
English
Translated title:
American Sign Language Recognition via Sensor glove data analysis with deep learning - An ARM Implementation
Summary:
Deep Learning (DL), and especially Convolutional Neural Networks (CNNs), has been widely used to solve a large variety of problems in computer vision, including Sign Language Recognition (SLR). There have been many efforts towards designing camera-based systems that can translate signer gestures into text or even speech. However, such systems are very sensitive to factors such as light intensity, background color, and motion occlusion.

In this thesis, we present the design of a simple end-to-end embedded system that translates continuous American Sign Language (ASL) into text based on inputs received from an instrumented low-cost sensor glove that we built using flex sensors and an IMU device. As a proof of concept, we first generated a limited dataset of 20 random ASL sentences drawn from a 20-word vocabulary, manually pre-labeling the time series data into 21 classes and simultaneously separating gesture from non-gesture (transition class) movement periods by means of an external button. Subsequently, a sliding window technique was used to extract overlapping labeled samples (time windows) for continuous SLR. After standardization, the data samples are fed to a simple 3-layer 1D CNN (conv1d - conv1d - fully connected) for classification.
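As a rough illustration of the windowing and standardization steps, the following Python sketch extracts overlapping windows from a multichannel glove recording; the window length, stride, and majority-vote labeling rule are illustrative assumptions, not the values tuned in the thesis.

```python
import numpy as np

def extract_windows(signal, labels, window_size=50, stride=10):
    """Slice a multichannel time series into overlapping windows.

    signal : (T, C) array of flex-sensor/IMU channels
    labels : (T,) per-timestep integer class labels
    window_size, stride : assumed values, not those used in the thesis
    """
    windows, window_labels = [], []
    for start in range(0, len(signal) - window_size + 1, stride):
        end = start + window_size
        windows.append(signal[start:end])
        # Assumption: a window inherits the majority label of its timesteps
        window_labels.append(np.argmax(np.bincount(labels[start:end])))
    return np.stack(windows), np.array(window_labels)

def standardize(train_windows, test_windows):
    """Per-channel z-score standardization using training statistics only."""
    mean = train_windows.mean(axis=(0, 1), keepdims=True)
    std = train_windows.std(axis=(0, 1), keepdims=True)
    return (train_windows - mean) / std, (test_windows - mean) / std
```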

The convolutional layers perform automated feature extraction, while the fully connected layer performs the classification. Our CNN achieves 93.40% accuracy on the test set (unseen data). For all practical purposes, the accuracy is effectively 100%, since vocabulary gestures are never confused with each other; errors occur only in windows at the transition from a gesture to a non-gesture movement and vice versa. The CNN was trained and its hyperparameters tuned using the Python-based ATOM framework. Its accuracy was compared with, and found to be slightly higher than, that of other popular machine learning methods, such as Random Forests, Support Vector Machines, and Extreme Gradient Boosted Trees (XGBoost).
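For reference, a minimal Keras sketch of the stated conv1d - conv1d - fully connected topology might look as follows; the filter counts, kernel sizes, and optimizer are assumed placeholders, since the tuned hyperparameters are reported in the thesis itself.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn(window_size, n_channels, n_classes=21):
    """Sketch of the 3-layer 1D CNN: conv1d - conv1d - fully connected.

    Filter counts and kernel sizes are illustrative assumptions.
    """
    model = keras.Sequential([
        layers.Input(shape=(window_size, n_channels)),
        layers.Conv1D(32, kernel_size=5, activation="relu"),
        layers.Conv1D(64, kernel_size=5, activation="relu"),
        layers.Flatten(),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```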

Finally, we have developed an all-software implementation of the designed CNN (inference part) for the ARM Cortex-A9 processor on the Zybo development board. Using the Xilinx SDK and the Eigen library, we designed a real-time embedded system that achieves an operating frequency much higher than the sampling frequency. Optimization, training, and testing of the CNN were performed on a PC using ATOM and the Keras library with a TensorFlow GPU backend.
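One common way to map such a network onto Eigen's matrix routines is to unroll each 1D convolution into a single matrix product (im2col); the NumPy sketch below illustrates that idea under this assumption and is not the thesis's actual C++ code.

```python
import numpy as np

def conv1d_infer(x, weights, bias):
    """Valid 1D convolution with ReLU expressed as one matrix product.

    x       : (T, C_in) input window
    weights : (K, C_in, C_out) kernel tensor
    bias    : (C_out,) bias vector

    Unrolling each output position's receptive field into a row turns the
    convolution into a single GEMM, which maps directly onto a dense
    matrix-matrix product in a library such as Eigen (a sketch only).
    """
    K, C_in, C_out = weights.shape
    T_out = x.shape[0] - K + 1
    # im2col: each row is a flattened receptive field of length K * C_in
    patches = np.stack([x[t:t + K].ravel() for t in range(T_out)])
    W = weights.reshape(K * C_in, C_out)        # matching flattened layout
    return np.maximum(patches @ W + bias, 0.0)  # GEMM + bias + ReLU
```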
Main subject category:
Technology - Computer science
Keywords:
SLR, sensor glove, ARM, CNN, Machine Learning
Index:
Yes
Number of index pages:
5
Contains images:
Yes
Number of references:
73
Number of pages:
111
thesis_2022_10_pergamos.pdf (3 MB)