Interactive Topic Modeling Curation Platform

Postgraduate Thesis uoadl:2884484

Unit:
Track / Specialization: Information and Data Management (ΔΕΔ)
Informatics
Deposit date:
2019-11-01
Year:
2019
Author:
Koulalis Antonios
Supervisors info:
Ioannis Ioannidis, Professor, Informatics and Telecommunications, National and Kapodistrian University of Athens (ΕΚΠΑ)
Original Title:
Περιβάλλον Επιμέλειας των αποτελεσμάτων της Αυτόματης Θεματικής Κατηγοριοποίησης Κειμένων (literally: Curation Environment for the Results of Automatic Topic Categorization of Texts)
Languages:
English
Greek
Translated title:
Interactive Topic Modeling Curation Platform
Summary:
Topic modeling is a machine learning technique used to classify a large collection of documents into categories, each characterized by a set of words. These so-called topics need to be manually curated to be more easily understandable. The contribution of curators, who undertake to make the output of the various topic modeling methods readable, is therefore required. This process is called data curation, and it is laborious and time-consuming. A curator is required to study huge volumes of data, usually stored in spreadsheets, before proceeding with their evaluation, correction and general curation, so the process can take a long time. In addition, a shortage of curators increases the amount of work that each of them has to undertake. The absence of a tool that would reduce their working time and give them access to other useful data curations makes their work even more difficult. Long working hours cause curator fatigue and increase the likelihood of errors, thereby reducing the quality of the results. In this thesis, we present a user-friendly web application that addresses the above problems and facilitates the work of curators. In addition to exploring the data, the user is able to evaluate, edit and generally curate them. Actions such as categorization, merging, splitting, data labeling and many more can be executed easily and quickly through our application. Rich visualizations complement these functionalities, and together they make up an essential tool for data curators.
Main subject category:
Technology - Computer science
Keywords:
data evaluation, data exploration, evaluation of statistical model data, scientific web applications, interactive visualizations, topic modeling curation
Index:
Yes
Number of index pages:
4
Contains images:
Yes
Number of references:
33
Number of pages:
62
Koulalhs_EKPA_diplwmatikh_2019.pdf (3 MB)