Sentiment Analysis and Text Mining in TripAdvisor reviews

Postgraduate Thesis uoadl:2836933

Specialization: Digital Communication Media and Interaction Environments
Library of Political Science and Public Administration - Communication and Mass Media - Turkish Studies and Modern Asian Studies
Deposit date:
Author:
Papadopoulou Evangelia
Supervisors info:
Konstantinos Mourlas, Assistant Professor, Department of Communication and Mass Media, National and Kapodistrian University of Athens (NKUA)
Original Title:
Ανάλυση Συναισθήματος και Εξόρυξη Κειμένου σε αξιολογήσεις στο TripAdvisor
Translated title:
Sentiment Analysis and Text Mining in TripAdvisor reviews
The subject of this thesis is sentiment analysis of data from the TripAdvisor social platform using an unsupervised, lexicon-based method. Sentiment analysis, also called opinion mining, is the field that analyzes the opinions, feelings, assessments, and attitudes toward entities and their features that are expressed in written text. It has grown rapidly as a research field in recent years, owing to the great influence of social networks and social media platforms on everyday life and because it provides an automated way to analyze the written information that abounds in online sources.
Sentiment is recognized in two categories, positive and negative. For the purposes of this work, a total of about 49,748 reviews by TripAdvisor users of five key tourist sites in Athens are used.
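As a rough illustration of two-category, lexicon-based recognition, the sketch below sums word scores from a small dictionary and labels a review by the sign of the total. The lexicon entries, the function name `polarity`, and the tie-breaking rule are assumptions made for this example; they are not taken from the thesis or from the NRC Emotion Lexicon.

```python
import re

# Toy lexicon for illustration only; NOT entries from the NRC Emotion Lexicon.
TOY_LEXICON = {
    "amazing": 1, "beautiful": 1, "great": 1, "friendly": 1,
    "crowded": -1, "dirty": -1, "rude": -1, "overpriced": -1,
}

def polarity(review, lexicon=TOY_LEXICON):
    """Label a review 'positive' or 'negative' from summed word scores."""
    tokens = re.findall(r"[a-z']+", review.lower())
    score = sum(lexicon.get(tok, 0) for tok in tokens)
    # A score of 0 is labeled 'positive' here; that choice is arbitrary.
    return "positive" if score >= 0 else "negative"

print(polarity("Amazing view and friendly staff"))
print(polarity("Crowded, dirty and overpriced"))
```

No labeled training data is needed for this kind of scoring, which is what makes the approach unsupervised.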
The first part of the analysis covers data collection through web scraping, followed by pre-processing of the data to handle the specific terms that occur in them and can make their analysis difficult. It also presents various ways of extracting information from the data, in particular the classical Bag-of-Words method with its term-frequency and tf-idf (term frequency - inverse document frequency) variants, and the vector representations of words known as word vectors, using the Python language, the pandas library, and the NRC Emotion Lexicon. All of these approaches are evaluated on the dataset.
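The Bag-of-Words weighting mentioned here can be sketched in plain Python as follows. This is a minimal version under stated assumptions: term frequency is the count divided by document length, and idf is log(N / df). Other tf-idf variants exist, and the abstract does not say which one the thesis uses. The sample reviews and the names `tokenize` and `tf_idf` are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf = count / document length; idf = log(N / document frequency).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid dividing by zero for empty documents
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights

# Hypothetical reviews, invented for this example.
reviews = [
    "the view was great",
    "the museum was crowded",
    "the guide was great",
]
weights = tf_idf(reviews)
```

Terms that occur in every review, such as "the" and "was" above, receive a weight of zero, while a term unique to one review, such as "crowded", receives the highest weight.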
The work concludes that lexicon-based sentiment analysis techniques handle the problem very well, offering quick implementations and reliable performance.
Three different methods are applied: (a) the frequency of use of individual words, measured with the TF-IDF method, (b) the occurrence of selected polarity words drawn from a dictionary, and (c) the K-means clustering technique. Together they produce significant information on users' views of these five tourist sites in Athens.
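For method (c), a minimal sketch of K-means (Lloyd's algorithm) in plain Python is given below. It assumes each review has already been turned into a numeric vector, for example of term weights. The example vectors, the function name `kmeans`, and the choice to start the centroids at the first k points are simplifications made for illustration, not details from the thesis.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm on equal-length numeric tuples.

    Centroids start at the first k points (a simplification; real
    implementations usually use random or k-means++ initialization).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Hypothetical 2-D review vectors, invented for this example.
vectors = [(0.9, 0.1), (0.1, 0.9), (0.8, 0.2), (0.2, 0.8)]
labels, centroids = kmeans(vectors, k=2)
```

Reviews that share a label have similar vectors, so inspecting the highest-weighted terms of each centroid gives a quick view of what each group of reviews is about.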
The results of our study show that we can classify reviews written in English based on their emotional polarity in a very effective way. This research can help minimize the time needed to search for relevant information as well as help in the development of tourism in Athens and Greece in general.
Main subject category:
Social, Political and Economic sciences
Other subject categories:
Technology - Computer science
Sentiment Analysis, Text Mining, TripAdvisor
Number of index pages:
Contains images:
Number of references:
Number of pages: