
An automatic part-of-speech tagger for Middle Low German

Mariya Koleva (UGent) , Melissa Farasyn (UGent) , Bart Desmet (UGent) , Anne Breitbarth (UGent) and Veronique Hoste (UGent)
Abstract
Syntactically annotated corpora are highly important for enabling large-scale diachronic and diatopic language research. Such corpora have recently been developed for a variety of historical languages, or are still under development. One of those under development is the fully tagged and parsed Corpus of Historical Low German (CHLG), which is aimed at facilitating research into the highly under-researched diachronic syntax of Low German. The present paper reports on a crucial step in creating the corpus, viz. the creation of a part-of-speech tagger for Middle Low German (MLG). Having been transmitted in several non-standardised written varieties, MLG poses a challenge to standard POS taggers, which usually rely on normalized spelling. We outline the major issues faced in the creation of the tagger and present our solutions to them.
Keywords
historical linguistics, part-of-speech tagging, conditional random fields, feature selection, normalization, LT3

Downloads

  • KolevaEtAl. finalised 16.03.17.pdf (full text, open access, PDF, 403.36 KB)

Citation


MLA
Koleva, Mariya, et al. “An Automatic Part-of-Speech Tagger for Middle Low German.” International Journal of Corpus Linguistics, vol. 22, no. 1, 2017, pp. 108–41, doi:10.1075/ijcl.22.1.05kol.
APA
Koleva, M., Farasyn, M., Desmet, B., Breitbarth, A., & Hoste, V. (2017). An automatic part-of-speech tagger for Middle Low German. International Journal of Corpus Linguistics, 22(1), 108–141. https://doi.org/10.1075/ijcl.22.1.05kol
Chicago author-date
Koleva, Mariya, Melissa Farasyn, Bart Desmet, Anne Breitbarth, and Veronique Hoste. 2017. “An Automatic Part-of-Speech Tagger for Middle Low German.” International Journal of Corpus Linguistics 22 (1): 108–41. https://doi.org/10.1075/ijcl.22.1.05kol.
Chicago author-date (all authors)
Koleva, Mariya, Melissa Farasyn, Bart Desmet, Anne Breitbarth, and Veronique Hoste. 2017. “An Automatic Part-of-Speech Tagger for Middle Low German.” International Journal of Corpus Linguistics 22 (1): 108–141. doi:10.1075/ijcl.22.1.05kol.
Vancouver
1. Koleva M, Farasyn M, Desmet B, Breitbarth A, Hoste V. An automatic part-of-speech tagger for Middle Low German. International Journal of Corpus Linguistics. 2017;22(1):108–41.
IEEE
[1] M. Koleva, M. Farasyn, B. Desmet, A. Breitbarth, and V. Hoste, “An automatic part-of-speech tagger for Middle Low German,” International Journal of Corpus Linguistics, vol. 22, no. 1, pp. 108–141, 2017.
BibTeX
@article{8516445,
  abstract     = {{Syntactically annotated corpora are highly important for enabling large-scale diachronic and diatopic language research. Such corpora have recently been developed for a variety of historical languages, or are still under development. One of those under development is the fully tagged and parsed Corpus of Historical Low German (CHLG), which is aimed at facilitating research into the highly under-researched diachronic syntax of Low German. The present paper reports on a crucial step in creating the corpus, viz. the creation of a part-of-speech tagger for Middle Low German (MLG). Having been transmitted in several non-standardised written varieties, MLG poses a challenge to standard POS taggers, which usually rely on normalized spelling. We outline the major issues faced in the creation of the tagger and present our solutions to them.}},
  author       = {{Koleva, Mariya and Farasyn, Melissa and Desmet, Bart and Breitbarth, Anne and Hoste, Veronique}},
  issn         = {{1384-6655}},
  journal      = {{International Journal of Corpus Linguistics}},
  keywords     = {{historical linguistics,part-of-speech tagging,conditional random fields,feature selection,normalization,LT3}},
  language     = {{eng}},
  number       = {{1}},
  pages        = {{108--141}},
  title        = {{An automatic part-of-speech tagger for Middle Low German}},
  url          = {{http://doi.org/10.1075/ijcl.22.1.05kol}},
  volume       = {{22}},
  year         = {{2017}},
}
