Abstract:
Micro-blogging and social-media platforms are now prominent forums for
disseminating information, opinions and commentaries. Among these,
Twitter has a base of more than 330M users who continually
produce and consume information snippets. Users collectively create a
voluminous and multilingual corpus in a very broad range of topics on a
daily basis. The discourse generated in the blogosphere is often of
prime interest and importance to individuals, organizations, and
companies. These actors would benefit from periodically receiving an
overall assessment of the “sentiments” expressed on specific issues,
obtained by automatically classifying tweets written in different
languages and combining the results with big-data analytics. In this paper, we propose a
scalable service platform that employs multilingual sentiment analysis
to classify streamed tweets and yields analytics for selected topics in
real time. We discuss the main components of our Spark-enabled platform
as we seek to offer an effective big-data service that can: 1)
dynamically handle voluminous as well as high-rate tweet traffic through
a multicomponent application exploiting the latest software
developments, 2) accurately identify messages originating from non-genuine
user accounts, and 3) utilize the Spark machine-learning library (MLlib)
to successfully classify streamed multilingual messages in real time,
using multiple potentially distributed executors. To empower our service
platform, we have adopted training sets and developed sentiment analysis
(SA) models for English, French, and Greek that help classify streamed
tweets with high accuracy. In experiments with our distributed
analytical platform, we observe both accurate and real-time
classification of tweets written in the above European languages.
Authors:
Karageorgou, Ioanna
Liakos, Panagiotis
Delis, Alex