Ghent University Academic Bibliography

Integrated sequence tagging for medieval Latin using deep representation learning

Mike Kestemont and Jeroen De Gussem (UGent) (2017) JOURNAL OF DATA MINING & DIGITAL HUMANITIES.
abstract
In this paper we consider two sequence tagging tasks for medieval Latin: part-of-speech tagging and lemmatization. These are both basic, yet foundational preprocessing steps in applications such as text re-use detection. Nevertheless, they are generally complicated by the considerable orthographic variation which is typical of medieval Latin. In Digital Classics, these tasks are traditionally solved in a (i) cascaded and (ii) lexicon-dependent fashion. For example, a lexicon is used to generate all the potential lemma-tag pairs for a token, and next, a context-aware PoS-tagger is used to select the most appropriate tag-lemma pair. Apart from the problems with out-of-lexicon items, error percolation is a major downside of such approaches. In this paper we explore the possibility to elegantly solve these tasks using a single, integrated approach. For this, we make use of a layered neural network architecture from the field of deep representation learning.
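As an illustration only, the traditional cascaded, lexicon-dependent pipeline that the abstract contrasts with the integrated approach can be sketched as follows. This is a toy sketch and not the authors' method or code: the lexicon entries, the transition scores, and the function name `tag_cascaded` are all hypothetical, and the "context-aware" step is reduced to looking at the previous tag. It shows how an orthographic variant such as "michi" (for "mihi") falls outside the lexicon and leaves the next token without usable context.

```python
from typing import Dict, List, Optional, Tuple

Candidate = Tuple[str, str]  # (lemma, PoS tag)

# Hypothetical toy lexicon: token -> all potential lemma-tag pairs.
LEXICON: Dict[str, List[Candidate]] = {
    "amor": [("amor", "NOUN"), ("amo", "VERB")],
    "mihi": [("ego", "PRON")],
    "est": [("sum", "VERB")],
}

# Hypothetical context scores: plausibility of a tag given the previous tag.
TRANSITION: Dict[Tuple[str, str], float] = {
    ("<s>", "NOUN"): 0.6,
    ("<s>", "VERB"): 0.2,
    ("NOUN", "PRON"): 0.5,
    ("PRON", "VERB"): 0.7,
}


def tag_cascaded(tokens: List[str]) -> List[Optional[Candidate]]:
    """Step 1: look up candidates in the lexicon.
    Step 2: pick the candidate whose tag best fits the previous tag.
    Tokens missing from the lexicon yield None."""
    previous_tag = "<s>"
    result: List[Optional[Candidate]] = []
    for token in tokens:
        candidates = LEXICON.get(token.lower())
        if not candidates:
            # Out-of-lexicon item: nothing to select from, and the
            # following token loses its left context as well.
            result.append(None)
            previous_tag = "<unk>"
            continue
        best = max(
            candidates,
            key=lambda cand: TRANSITION.get((previous_tag, cand[1]), 0.0),
        )
        result.append(best)
        previous_tag = best[1]
    return result


if __name__ == "__main__":
    print(tag_cascaded(["amor", "mihi", "est"]))
    print(tag_cascaded(["amor", "michi", "est"]))  # spelling variant
```

With the standard spelling every token receives a lemma-tag pair, whereas the variant spelling produces a gap in the output. An integrated approach of the kind the abstract proposes would instead predict tag and lemma jointly from the token and its context, without this lookup step.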
type
journalArticle (original)
publication status
in press
journal title
JOURNAL OF DATA MINING & DIGITAL HUMANITIES
issue title
Special Issue on Computer-Aided Processing of Intertextuality in Ancient Languages
article number
3835
ISSN
2416-5999
language
English
UGent publication?
yes
classification
A2
id
8528273
handle
http://hdl.handle.net/1854/LU-8528273
date created
2017-08-06 18:57:08
date last changed
2018-01-25 09:27:49
@article{8528273,
  abstract     = {In this paper we consider two sequence tagging tasks for medieval Latin:
part-of-speech tagging and lemmatization. These are both basic, yet
foundational preprocessing steps in applications such as text re-use detection.
Nevertheless, they are generally complicated by the considerable orthographic
variation which is typical of medieval Latin. In Digital Classics, these tasks
are traditionally solved in a (i) cascaded and (ii) lexicon-dependent fashion.
For example, a lexicon is used to generate all the potential lemma-tag pairs
for a token, and next, a context-aware PoS-tagger is used to select the most
appropriate tag-lemma pair. Apart from the problems with out-of-lexicon items,
error percolation is a major downside of such approaches. In this paper we
explore the possibility to elegantly solve these tasks using a single,
integrated approach. For this, we make use of a layered neural network
architecture from the field of deep representation learning.},
  articleno    = {3835},
  author       = {Kestemont, Mike and De Gussem, Jeroen},
  issn         = {2416-5999},
  journal      = {JOURNAL OF DATA MINING \& DIGITAL HUMANITIES},
  language     = {eng},
  title        = {Integrated sequence tagging for medieval Latin using deep representation learning},
  year         = {2017},
}

Chicago
Kestemont, Mike, and Jeroen De Gussem. 2017. “Integrated Sequence Tagging for Medieval Latin Using Deep Representation Learning.” Journal of Data Mining & Digital Humanities.
APA
Kestemont, M., & De Gussem, J. (2017). Integrated sequence tagging for medieval Latin using deep representation learning. JOURNAL OF DATA MINING & DIGITAL HUMANITIES.
Vancouver
1. Kestemont M, De Gussem J. Integrated sequence tagging for medieval Latin using deep representation learning. JOURNAL OF DATA MINING & DIGITAL HUMANITIES. 2017.
MLA
Kestemont, Mike, and Jeroen De Gussem. “Integrated Sequence Tagging for Medieval Latin Using Deep Representation Learning.” JOURNAL OF DATA MINING & DIGITAL HUMANITIES (2017): n. pag. Print.