Supervisors info:
Ioannis Ioannidis, Professor, Department of Informatics and Telecommunications, National and Kapodistrian University of Athens (NKUA)
Summary:
Topic modelling is a machine learning technique used to group a large collection of documents into categories, each characterised by a set of words. These categories, called topics, need to be manually curated to become more easily understandable. The contribution of curators, who undertake to make the output of the various topic modelling methods readable, is therefore required. This process is called data curation, and it is laborious and time-consuming. A curator must study huge volumes of data, usually stored in spreadsheets, before proceeding with their evaluation, correction and general curation, so the process can take a long time. In addition, a shortage of curators increases the amount of work that each of them has to undertake. The absence of a tool that would reduce their working time and give them access to other useful data curations makes their work even more difficult. Long working hours cause curator fatigue and increase the likelihood of errors, thereby reducing the quality of the results. In this thesis, we present a user-friendly web application that addresses the above problems and facilitates the work of curators. In addition to exploring the data, the user is able to evaluate, edit and generally curate them. Actions such as categorization, merging, splitting, data labeling and many more can be performed easily and quickly through our application. Rich visualizations complement these functionalities, and together they make up an essential tool for data curators.
Keywords:
data evaluation, data exploration, evaluation of statistical model data, scientific web applications, interactive visualizations, topic modelling curation