Pergamos - Library and Information Center of National and Kapodistrian University of Athens

Unit:

Specialization in Big Data and Artificial Intelligence
Informatics

Deposit date:

2020-09-10

Year:

2020

Author:

Nikolaidou Konstantina

Supervisors info:

Vassilis Katsouros, Research Director, "Athena" Research Center

Original Title:

Image Informed Neural Machine Translation

Languages:

English
Greek

Translated title:

Image Informed Neural Machine Translation

Summary:

Neural Machine Translation is one of the most important tasks in Natural Language Processing. Complex architectures have achieved strong performance using text alone as input. The growing volume and accessibility of data have increased the need to exploit them in order to achieve better and more human-like results. Multimodal Machine Translation uses additional modalities, such as images and speech, to ground and enhance the translation task, on the assumption that they contain alternative representations of the input data. We investigate several variations of a single system for integrating visual features into a neural machine translation system. To this end, we use the state-of-the-art Transformer architecture. We perform the task for three target languages (French, German and Czech), with English as the source language. We use the Multi30K dataset, a large-scale multilingual multimodal dataset that is publicly available for multimodal machine translation and cross-lingual captioning. We evaluate on three different test sets and compare the systems using the BLEU, METEOR and TER scores. We further investigate whether the additional modalities are actually used by masking color words in the source sets. The examined systems perform similarly, with small differences in their scores, and the multimodal ones give more natural translations in many cases. Inspection of the predicted texts revealed several inconsistencies and biases in the dataset. The results of the masking experiment suggest that the multimodal systems very likely do use the additional modalities when the text is not sufficient, while the text-only model outputs predictions based on training biases.
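
The color-word masking probe mentioned in the summary can be illustrated with a minimal Python sketch. This is not the author's code: the color vocabulary `COLOR_WORDS`, the placeholder `MASK_TOKEN`, and the function name `mask_color_words` are all assumptions made for illustration, since this record does not state which words were masked or which placeholder was used.

```python
import re

# Hypothetical color vocabulary; the list actually used in the thesis
# is not given in this record.
COLOR_WORDS = {
    "red", "green", "blue", "yellow", "black", "white",
    "brown", "orange", "pink", "purple", "gray", "grey",
}

# Assumed placeholder token; the thesis may use a different one.
MASK_TOKEN = "[MASK]"

# Match whole words only, case-insensitively, so "reddish" is left alone.
_COLOR_PATTERN = re.compile(
    r"\b(" + "|".join(sorted(re.escape(w) for w in COLOR_WORDS)) + r")\b",
    re.IGNORECASE,
)


def mask_color_words(sentence: str, mask: str = MASK_TOKEN) -> str:
    """Replace every whole-word color term in a source sentence with the mask."""
    return _COLOR_PATTERN.sub(mask, sentence)


if __name__ == "__main__":
    print(mask_color_words("A man in a red shirt rides a Blue bike."))
```

With the color removed from the source text, a text-only system can only guess it from training statistics, whereas a multimodal system could in principle recover it from the image. A plain word list like this one also masks ambiguous words such as "orange" used as a noun, so a real experiment would likely need a more careful selection.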

Main subject category:

Science

Keywords:

Multimodality, Neural Networks, Neural Machine Translation

Index:

Yes

Number of index pages:

Contains images:

Yes

Number of references:

Number of pages:

File: