Incorporating Trainable Filterbanks in Deep Neural Networks for Music Transcription

Graduate Thesis uoadl:3395209

Unit:
Department of Informatics and Telecommunications
Informatics
Deposit date:
2024-04-02
Year:
2024
Author:
PRIMENTA AIKATERINI-MARIA
Supervisors info:
Ioannis Panagakis, Associate Professor, Department of Informatics and Telecommunications, National and Kapodistrian University of Athens
Original Title:
Incorporating Trainable Filterbanks in Deep Neural Networks for Music Transcription
Languages:
English
Translated title:
Incorporating Trainable Filterbanks in Deep Neural Networks for Music Transcription
Summary:
In recent years, Automatic Music Transcription, the process of converting audio
recordings into symbolic representations without human intervention, has witnessed
significant advancements and has been applied across various domains in the music
field. Many existing approaches utilize Deep Neural Networks and rely on learning their
input features directly from representations like log-mel spectrograms. This leads to
challenges such as a high number of trainable parameters, limited adaptability and slow
convergence. In this thesis, we tackle these challenges by proposing a new method to
enhance piano transcription systems through the incorporation of trainable filterbanks for
feature extraction. Drawing inspiration from SincNet, a Convolutional Neural Network
architecture that implements parameterized sinc-based filterbanks, we aim to improve
the accuracy and efficiency of an existing high-resolution piano transcription system. Our
proposed framework achieves an Average Precision Score of 89%, which is comparable
to, though lower than, that of the original method. However, it outperforms the original method
in onset and offset detection accuracy. The implementation of our
proposed method is available at
https://github.com/marikaitiprim/MusicTranscription-BScThesis.
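
The sinc-based filterbank idea mentioned in the summary can be illustrated with a minimal NumPy sketch. It is not taken from the thesis or its repository: the function names, the 16 kHz sample rate, the 251-tap kernel and the example cutoffs are all assumptions chosen for illustration. Each band-pass filter is fully determined by two cutoff frequencies, which is why a SincNet-style layer needs far fewer parameters per filter than an unconstrained convolution kernel. No training is shown here; in a trainable layer the cutoffs would be the learned quantities.

```python
import numpy as np


def sinc_bandpass(f_low_hz, f_high_hz, kernel_size=251, sample_rate=16000):
    """Windowed band-pass FIR kernel defined only by its two cutoffs.

    All defaults are illustrative assumptions, not values from the thesis.
    """
    # Normalise the cutoffs to cycles per sample.
    f1 = f_low_hz / sample_rate
    f2 = f_high_hz / sample_rate
    # Symmetric time axis centred on zero (kernel_size is assumed odd).
    n = np.arange(kernel_size) - (kernel_size - 1) / 2
    # The difference of two ideal low-pass filters is a band-pass filter.
    # Note that np.sinc(x) computes sin(pi * x) / (pi * x).
    h = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    # A Hamming window smooths the truncation of the ideal response.
    return h * np.hamming(kernel_size)


def sinc_filterbank(cutoffs_hz, kernel_size=251, sample_rate=16000):
    """Stack one kernel per (low, high) cutoff pair into a 2-D array."""
    return np.stack(
        [sinc_bandpass(lo, hi, kernel_size, sample_rate) for lo, hi in cutoffs_hz]
    )


# Hypothetical two-band filterbank.
bank = sinc_filterbank([(1000.0, 2000.0), (2000.0, 4000.0)])
```

Each row of `bank` can be convolved with a waveform, for example with `np.convolve(signal, bank[0], mode="same")`, to obtain one band-limited channel.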
Main subject category:
Technology - Computer science
Keywords:
Automatic Piano Transcription, Audio Signal Processing, Deep Neural Networks, Filterbanks, Log-Mel Spectrogram
Index:
Yes
Number of index pages:
4
Contains images:
Yes
Number of references:
44
Number of pages:
36
bsc-thesis_PRIMENTA.pdf (1 MB)