Project: ArisToCAT - Assessing The Comprehensibility of Automatic Translations

2017-01-01 – 2020-12-31


Machine translation (MT) systems cannot guarantee that the text they produce is fluent and coherent in both syntax and semantics. Erroneous words and syntax occur frequently in machine-translated text, leaving the reader to guess parts of the intended message. This project (i) analyzes eye movement data to investigate to what extent the lack of predictability in machine-translated texts impairs comprehension, and (ii) tries to automatically estimate the comprehensibility of machine-translated text.

To tackle the first research objective, we will collect and analyze eye movements of participants reading Dutch machine-translated text. In a first experiment, we investigate the impact of different categories of MT errors (syntactic versus semantic, function words versus content words, short-distance versus long-distance triggers of errors) on comprehension. In a second experiment, the participants read six short machine-translated texts of approximately 300-400 words for comprehension.

To tackle the second research objective, an MT comprehensibility estimation system for Dutch will be built. The system takes a machine-translated sentence as input and tries to detect the MT errors that seriously hamper comprehension. We start with a basic system incorporating baseline features such as sentence length and word frequency, and gradually add features derived from language models of increasing complexity, namely n-gram, dependency and neural language models.
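As an illustration only, the baseline features mentioned above (sentence length, word frequency) and a first n-gram feature could be sketched as follows. This is a minimal sketch and not the project's implementation: the function names, the whitespace tokenizer, the add-one smoothing and the two-sentence toy corpus are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Naive whitespace tokenizer (assumption); a real system would use a Dutch tokenizer.
    return sentence.lower().split()


def train_counts(corpus):
    # Count unigrams and bigrams, with "<s>" marking the start of each sentence.
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + tokenize(sentence)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def baseline_features(sentence, unigrams, bigrams):
    # Hypothetical feature set: length, mean log word frequency, bigram surprisal.
    tokens = tokenize(sentence)
    if not tokens:
        return {"length": 0, "mean_log_freq": 0.0,
                "mean_surprisal": 0.0, "max_surprisal": 0.0}

    vocab_size = len(unigrams) + 1      # one extra slot for unseen words
    total = sum(unigrams.values())      # includes "<s>" counts, a simplification

    # Add-one smoothed log relative frequency of each word.
    log_freqs = [math.log((unigrams[t] + 1) / (total + vocab_size)) for t in tokens]

    # Add-one smoothed bigram surprisal in bits: -log2 P(word | previous word).
    padded = ["<s>"] + tokens
    surprisals = [
        -math.log2((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
        for prev, word in zip(padded, padded[1:])
    ]

    return {
        "length": len(tokens),
        "mean_log_freq": sum(log_freqs) / len(tokens),
        "mean_surprisal": sum(surprisals) / len(tokens),
        "max_surprisal": max(surprisals),
    }


# Toy corpus, invented for the example.
corpus = ["de kat zit op de mat", "de hond zit op de bank"]
unigrams, bigrams = train_counts(corpus)

fluent = baseline_features("de kat zit op de mat", unigrams, bigrams)
scrambled = baseline_features("mat de op kat zit de", unigrams, bigrams)
print(fluent["mean_surprisal"], scrambled["mean_surprisal"])
```

The two test sentences contain the same words, so length and mean word frequency cannot tell them apart; only the bigram surprisal reacts to the scrambled word order. That is the kind of gap the richer language-model features are meant to close.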