Abstract:
Monitoring and analysis of human sentiments is currently one of the
most active research topics in the field of human-computer interaction,
with many applications. However, to become practical in daily
life, sentiment recognition techniques should analyze data collected in
an unobtrusive way. For this reason, analyzing audio signals of human
speech (as opposed to, say, biometrics) is considered key to potential
emotion recognition systems. In this work, we expand upon previous
efforts to analyze speech signals using computer vision techniques on
their spectrograms. In particular, we utilize ORB descriptors on
keypoints distributed on a regular grid over the spectrogram to obtain
an intermediate representation. First, a technique similar to
Bag-of-Visual-Words (BoVW) is used: a visual vocabulary is created
by clustering keypoint descriptors, but a soft candidacy score, rather
than a hard assignment to the nearest visual word, is used to construct
the histogram descriptors of the signal.
Furthermore, a technique which takes into account the temporal structure
of the spectrograms is examined, allowing for effective model
regularization. Both of these techniques are evaluated on several
popular emotion recognition datasets, with results indicating an
improvement over the simple BoVW method.
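
As an illustration of the soft histogram idea only, the following minimal sketch builds a soft-assignment histogram from a set of descriptors and a visual vocabulary. The abstract does not specify how the soft candidacy score is defined, so the softmax over negative squared Euclidean distances, the `beta` sharpness parameter, and the function name `soft_histogram` are all assumptions made for this example; descriptors are treated as plain real-valued vectors rather than actual ORB output.

```python
import math


def soft_histogram(descriptors, vocabulary, beta=1.0):
    """Soft-assignment histogram over a visual vocabulary.

    Hypothetical scoring rule (not taken from the paper): each descriptor
    spreads one unit of mass over all visual words via a softmax of the
    negative squared Euclidean distances, scaled by `beta`.
    """
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        # Squared Euclidean distance from this descriptor to every word.
        dists = [sum((a - b) ** 2 for a, b in zip(d, w)) for w in vocabulary]
        # Shift by the minimum distance for numerical stability.
        nearest = min(dists)
        weights = [math.exp(-beta * (x - nearest)) for x in dists]
        total = sum(weights)
        for k, weight in enumerate(weights):
            hist[k] += weight / total
    # Normalise by the number of descriptors so the bins sum to 1.
    n = len(descriptors)
    return [h / n for h in hist] if n else hist


# Toy data: two visual words and three 2-D descriptors.
vocabulary = [(0.0, 0.0), (1.0, 1.0)]
descriptors = [(0.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
histogram = soft_histogram(descriptors, vocabulary)
```

With a hard assignment, the third descriptor, which is equidistant from both words, would have to be given to one bin arbitrarily; here it contributes equally to both. Larger values of `beta` move the result toward the hard-assignment BoVW histogram.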
Authors:
Pikramenos, George
Smyrnis, Georgios
Vernikos, Ioannis
Konidaris, Thomas
Spyrou, Evaggelos
Perantonis, Stavros