Predicting syntactic equivalence between source and target sentences

Bram Vanroy (UGent) , Arda Tezcan (UGent) and Lieve Macken (UGent)
Abstract
The translation difficulty of a text is influenced by many different factors. Some of these are specific to the source text and related to readability while others more directly involve translation and the relation between the source and the target text. One such factor is syntactic equivalence, which can be calculated on the basis of a source sentence and its translation. When the expected syntactic form of the target sentence is dissimilar to its source, translating said source sentence proves more difficult for a translator. The degree of syntactic equivalence between a word-aligned source and target sentence can be derived from the crossing alignment links, averaged by the number of alignments, either at word or at sequence level. However, when predicting the translatability of a source sentence, its translation is not available. Therefore, we train machine learning systems on a parallel English-Dutch corpus to predict the expected syntactic equivalence of an English source sentence without having access to its Dutch translation. We use traditional machine learning systems (Random Forest Regression and Support Vector Regression) combined with syntactic sentence-level features as well as recurrent neural networks that utilise word embeddings and accurate morpho-syntactic features.
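The crossing-based measure mentioned in the abstract can be illustrated at word level with a minimal sketch. It assumes that a word alignment is given as a list of (source index, target index) pairs and that two links cross when their source order and target order disagree; the function names `count_crossing_links` and `cross_value` are hypothetical, and the paper's exact definition, in particular at sequence level, may differ.

```python
from itertools import combinations


def count_crossing_links(alignments):
    """Count the pairs of alignment links that cross each other.

    `alignments` is a list of (source_index, target_index) pairs.
    Assumption: two links cross when the order of their source
    indices is the opposite of the order of their target indices.
    """
    crossings = 0
    for (s1, t1), (s2, t2) in combinations(alignments, 2):
        # A negative product means the source and target orders disagree.
        if (s1 - s2) * (t1 - t2) < 0:
            crossings += 1
    return crossings


def cross_value(alignments):
    """Number of crossing links averaged by the number of alignment links."""
    if not alignments:
        return 0.0
    return count_crossing_links(alignments) / len(alignments)


# Illustrative alignments only, not data from the paper.
monotone = [(0, 0), (1, 1), (2, 2)]   # same word order on both sides
reordered = [(0, 2), (1, 0), (2, 1)]  # first source word moves to the end

print(cross_value(monotone))
print(cross_value(reordered))
```

Under these assumptions a translation that keeps the source word order scores zero, and the score grows as more alignment links cross.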
Keywords
computational linguistics, natural language processing, translatability, translation studies, translation difficulty, deep learning, lt3

Downloads

  • Vanroy et al - 2019 - Predicting syntactic equivalence between source and target sentences.pdf
    • full text (Published version) | open access | PDF | 601.52 KB

Citation

MLA
Vanroy, Bram, et al. “Predicting Syntactic Equivalence between Source and Target Sentences.” COMPUTATIONAL LINGUISTICS IN THE NETHERLANDS JOURNAL, vol. 9, 2019, pp. 101–16.
APA
Vanroy, B., Tezcan, A., & Macken, L. (2019). Predicting syntactic equivalence between source and target sentences. COMPUTATIONAL LINGUISTICS IN THE NETHERLANDS JOURNAL, 9, 101–116.
Chicago author-date
Vanroy, Bram, Arda Tezcan, and Lieve Macken. 2019. “Predicting Syntactic Equivalence between Source and Target Sentences.” COMPUTATIONAL LINGUISTICS IN THE NETHERLANDS JOURNAL 9: 101–16.
Chicago author-date (all authors)
Vanroy, Bram, Arda Tezcan, and Lieve Macken. 2019. “Predicting Syntactic Equivalence between Source and Target Sentences.” COMPUTATIONAL LINGUISTICS IN THE NETHERLANDS JOURNAL 9: 101–116.
Vancouver
1. Vanroy B, Tezcan A, Macken L. Predicting syntactic equivalence between source and target sentences. COMPUTATIONAL LINGUISTICS IN THE NETHERLANDS JOURNAL. 2019;9:101–16.
IEEE
[1] B. Vanroy, A. Tezcan, and L. Macken, “Predicting syntactic equivalence between source and target sentences,” COMPUTATIONAL LINGUISTICS IN THE NETHERLANDS JOURNAL, vol. 9, pp. 101–116, 2019.
BibTeX
@article{8639492,
  abstract     = {{The translation difficulty of a text is influenced by many different factors. Some of these are specific to the source text and related to readability while others more directly involve translation and the relation between the source and the target text. One such factor is syntactic equivalence, which can be calculated on the basis of a source sentence and its translation. When the expected syntactic form of the target sentence is dissimilar to its source, translating said source sentence proves more difficult for a translator. The degree of syntactic equivalence between a word-aligned source and target sentence can be derived from the crossing alignment links, averaged by the number of alignments, either at word or at sequence level. However, when predicting the translatability of a source sentence, its translation is not available. Therefore, we train machine learning systems on a parallel English-Dutch corpus to predict the expected syntactic equivalence of an English source sentence without having access to its Dutch translation. We use traditional machine learning systems (Random Forest Regression and Support Vector Regression) combined with syntactic sentence-level features as well as recurrent neural networks that utilise word embeddings and accurate morpho-syntactic features.}},
  author       = {{Vanroy, Bram and Tezcan, Arda and Macken, Lieve}},
  issn         = {{2211-4009}},
  journal      = {{COMPUTATIONAL LINGUISTICS IN THE NETHERLANDS JOURNAL}},
  keywords     = {{computational linguistics,natural language processing,translatability,translation studies,translation difficulty,deep learning,lt3}},
  language     = {{eng}},
  pages        = {{101--116}},
  title        = {{Predicting syntactic equivalence between source and target sentences}},
  url          = {{https://clinjournal.org/clinj/article/view/95}},
  volume       = {{9}},
  year         = {{2019}},
}