Integrated sequence tagging for medieval Latin using deep representation learning
- Author
- Mike Kestemont and Jeroen De Gussem (UGent)
- Abstract
- In this paper we consider two sequence tagging tasks for medieval Latin: part-of-speech tagging and lemmatization. These are both basic yet foundational preprocessing steps in applications such as text re-use detection. Nevertheless, they are generally complicated by the considerable orthographic variation which is typical of medieval Latin. In Digital Classics, these tasks are traditionally solved in a (i) cascaded and (ii) lexicon-dependent fashion. For example, a lexicon is used to generate all the potential lemma-tag pairs for a token, and next, a context-aware PoS tagger is used to select the most appropriate lemma-tag pair. Apart from the problems with out-of-lexicon items, error percolation is a major downside of such approaches. In this paper we explore the possibility of solving these tasks elegantly using a single, integrated approach. For this, we make use of a layered neural network architecture from the field of deep representation learning.
- Keywords
- Computer Science - Computation and Language, Computer Science - Learning, Statistics - Machine Learning
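
The abstract contrasts the traditional cascaded, lexicon-dependent pipeline with a single joint model. As a rough illustration of what such an integrated setup can look like, the sketch below is a minimal PyTorch example, not the authors' implementation: the class name `JointTaggerLemmatizer`, the single-token left/right context window, and all dimensions are invented for illustration. It encodes the focus token at the character level, adds embeddings of the neighbouring tokens, and feeds one shared representation into two prediction heads, so the PoS tag and the lemma are predicted jointly rather than in a cascade.

```python
# Hypothetical sketch of an *integrated* tagger-lemmatizer (not the authors' code):
# one shared encoder, two prediction heads, trained jointly.
import torch
import torch.nn as nn

class JointTaggerLemmatizer(nn.Module):
    def __init__(self, n_chars, n_context_words, n_pos_tags, n_lemmas,
                 char_dim=32, word_dim=64, hidden_dim=128):
        super().__init__()
        # Character-level encoder for the focus token: robust to the
        # orthographic variation typical of medieval Latin.
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.char_rnn = nn.LSTM(char_dim, hidden_dim, batch_first=True,
                                bidirectional=True)
        # Embeddings for the surrounding context tokens (left and right).
        self.word_emb = nn.Embedding(n_context_words, word_dim, padding_idx=0)
        # Shared representation feeding both tasks.
        self.shared = nn.Linear(2 * hidden_dim + 2 * word_dim, hidden_dim)
        # Two heads: one over PoS tags, one over lemma labels.
        self.pos_head = nn.Linear(hidden_dim, n_pos_tags)
        self.lemma_head = nn.Linear(hidden_dim, n_lemmas)

    def forward(self, focus_chars, left_word, right_word):
        # focus_chars: (batch, max_token_len) character ids of the focus token
        # left_word, right_word: (batch,) ids of the neighbouring tokens
        _, (h, _) = self.char_rnn(self.char_emb(focus_chars))
        token_repr = torch.cat([h[0], h[1]], dim=-1)          # (batch, 2*hidden)
        context = torch.cat([self.word_emb(left_word),
                             self.word_emb(right_word)], dim=-1)
        shared = torch.relu(self.shared(torch.cat([token_repr, context], dim=-1)))
        return self.pos_head(shared), self.lemma_head(shared)

# Joint training with toy data: one loss per head, summed.
model = JointTaggerLemmatizer(n_chars=60, n_context_words=5000,
                              n_pos_tags=14, n_lemmas=3000)
pos_logits, lemma_logits = model(torch.randint(1, 60, (8, 15)),
                                 torch.randint(1, 5000, (8,)),
                                 torch.randint(1, 5000, (8,)))
loss = nn.CrossEntropyLoss()(pos_logits, torch.randint(0, 14, (8,))) \
     + nn.CrossEntropyLoss()(lemma_logits, torch.randint(0, 3000, (8,)))
loss.backward()
```

Because both heads share the encoder and are trained with a summed loss, the lemma prediction does not depend on a lexicon lookup or on a previously assigned tag, which is the error-percolation problem the abstract mentions.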
Downloads
- 1603.01597.pdf: full text (Published version), open access, 910.16 KB
Citation
Please use this URL to cite or link to this publication: http://hdl.handle.net/1854/LU-8528273
- MLA
- Kestemont, Mike, and Jeroen De Gussem. “Integrated Sequence Tagging for Medieval Latin Using Deep Representation Learning.” JOURNAL OF DATA MINING AND DIGITAL HUMANITIES, 2017, doi:10.46298/jdmdh.1398.
- APA
- Kestemont, M., & De Gussem, J. (2017). Integrated sequence tagging for medieval Latin using deep representation learning. JOURNAL OF DATA MINING AND DIGITAL HUMANITIES. https://doi.org/10.46298/jdmdh.1398
- Chicago author-date
- Kestemont, Mike, and Jeroen De Gussem. 2017. “Integrated Sequence Tagging for Medieval Latin Using Deep Representation Learning.” JOURNAL OF DATA MINING AND DIGITAL HUMANITIES. https://doi.org/10.46298/jdmdh.1398.
- Chicago author-date (all authors)
- Kestemont, Mike, and Jeroen De Gussem. 2017. “Integrated Sequence Tagging for Medieval Latin Using Deep Representation Learning.” JOURNAL OF DATA MINING AND DIGITAL HUMANITIES. doi:10.46298/jdmdh.1398.
- Vancouver
- 1. Kestemont M, De Gussem J. Integrated sequence tagging for medieval Latin using deep representation learning. JOURNAL OF DATA MINING AND DIGITAL HUMANITIES. 2017;
- IEEE
- [1] M. Kestemont and J. De Gussem, “Integrated sequence tagging for medieval Latin using deep representation learning,” JOURNAL OF DATA MINING AND DIGITAL HUMANITIES, 2017.
@article{8528273,
  abstract   = {{In this paper we consider two sequence tagging tasks for medieval Latin: part-of-speech tagging and lemmatization. These are both basic, yet foundational preprocessing steps in applications such as text re-use detection. Nevertheless, they are generally complicated by the considerable orthographic variation which is typical of medieval Latin. In Digital Classics, these tasks are traditionally solved in a (i) cascaded and (ii) lexicon-dependent fashion. For example, a lexicon is used to generate all the potential lemma-tag pairs for a token, and next, a context-aware PoS-tagger is used to select the most appropriate tag-lemma pair. Apart from the problems with out-of-lexicon items, error percolation is a major downside of such approaches. In this paper we explore the possibility to elegantly solve these tasks using a single, integrated approach. For this, we make use of a layered neural network architecture from the field of deep representation learning.}},
  articleno  = {{jdmdh.1398}},
  author     = {{Kestemont, Mike and De Gussem, Jeroen}},
  issn       = {{2416-5999}},
  journal    = {{JOURNAL OF DATA MINING AND DIGITAL HUMANITIES}},
  keywords   = {{Computer Science - Computation and Language, Computer Science - Learning, Statistics - Machine Learning}},
  language   = {{eng}},
  pages      = {{17}},
  title      = {{Integrated sequence tagging for medieval Latin using deep representation learning}},
  url        = {{http://doi.org/10.46298/jdmdh.1398}},
  year       = {{2017}},
}