Summary:
In this thesis I present Clusterix, a web-based visual analytics tool that
supports clustering tasks by placing the analyst at the center of the
workflow. Clusterix provides the facilities to:
- load and preview CSV data;
- select the columns to be used by the clustering algorithm and adjust their
weights;
- select and run one or more clustering algorithms (K-Means, Hierarchical
Clustering) with varying parameters;
- view and interact with the results in a browser environment;
- modify the parameters or input data to correct the clustering output.
Such an iterative, visual analytics approach allows users to quickly determine
the best clustering algorithm and parameters for their data, and to correct any
errors in the clustering output. Clusterix has been applied to heterogeneous
data sets, in particular to clustering author affiliations in publications
for a recommendation system on InspireHEP, the largest High Energy Physics
library in the world, based at CERN.
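The workflow listed above (load CSV data, select and weight columns, run a
clustering algorithm with varying parameters) can be illustrated with a minimal
sketch. This is not the Clusterix implementation. The function names, the
sample data, and the choice to apply weights by scaling each column are
assumptions made for illustration, and only K-Means is shown.

```python
import csv
import io
import math


def load_columns(csv_text, columns):
    """Parse CSV text and return the selected columns as rows of floats."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [[float(row[c]) for c in columns] for row in reader]


def apply_weights(rows, weights):
    """Scale each column by its weight (an assumed way to weight columns)."""
    return [[v * w for v, w in zip(row, weights)] for row in rows]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids.

    Deterministic seeding keeps the sketch reproducible, but it is a
    simplification: real tools usually use random or k-means++ seeding.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical sample data standing in for an uploaded CSV file.
SAMPLE_CSV = """name,x,y
a,1.0,1.0
b,1.2,0.8
c,8.0,8.0
d,8.2,7.9
"""

rows = load_columns(SAMPLE_CSV, ["x", "y"])   # select columns
weighted = apply_weights(rows, [1.0, 0.5])    # modify weights
# Run the same algorithm with varying parameters (here, the cluster count k).
results = {k: kmeans(weighted, k) for k in (2, 3)}
```

Comparing the label lists in `results` for different values of `k` mirrors, in
a very reduced form, the iterative comparison that the analyst performs
visually in the browser.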
Keywords:
visualization, clustering, machine learning, diagram, data analysis