DBBErt : part-of-speech tagging of pre-modern Greek text

Colin Swaelens (UGent) , Ilse De Vos (UGent) and Els Lefever (UGent)
Abstract
This contribution presents DBBErt, a machine-learning approach to linguistic annotation for pre-Modern Greek, which provides a part-of-speech and fine-grained morphological analysis of Greek tokens. To this end, transformer-based language models were built on both pre-Modern and Modern Greek text and further fine-tuned on annotated treebanks. The experimental results look very promising on a gold standard of Byzantine book epigrams, with an F-score of 83% for coarse-grained part-of-speech-tagging and of 69% for fine-grained morphological analysis. The resulting pipeline and models will be added to the CLARIN infrastructure to stimulate further research in NLP for Ancient and Medieval Greek.
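The F-scores quoted above are reported per tagging task. As a minimal sketch of how such a score can be computed from gold and predicted tags, the following assumes token-aligned tag sequences and a macro-averaged F1; the abstract does not state which averaging the authors used, and the function names and toy tags are hypothetical illustrations, not part of the DBBErt pipeline.

```python
from collections import Counter


def per_tag_f1(gold, pred):
    """Return a dict mapping each tag to its F1 score.

    Assumes gold and pred are token-aligned lists of tag strings.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted tag was wrong for this token
            fn[g] += 1  # gold tag was missed for this token

    scores = {}
    for tag in set(gold) | set(pred):
        # F1 = 2TP / (2TP + FP + FN)
        denom = 2 * tp[tag] + fp[tag] + fn[tag]
        scores[tag] = 2 * tp[tag] / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of the per-tag F1 scores (an assumed averaging choice)."""
    scores = per_tag_f1(gold, pred)
    return sum(scores.values()) / len(scores) if scores else 0.0


# Toy example with hypothetical coarse-grained tags.
gold_tags = ["NOUN", "VERB", "NOUN", "ADJ"]
pred_tags = ["NOUN", "NOUN", "NOUN", "ADJ"]
print(per_tag_f1(gold_tags, pred_tags))
print(macro_f1(gold_tags, pred_tags))
```

A micro-averaged or support-weighted F1 would give different numbers on imbalanced tag sets, so the averaging scheme matters when comparing against the reported 83% and 69%.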

Downloads

  • CLARIN2023 ConferenceProceedings DBBErt.pdf
    • full text (Published version), open access, PDF, 315.21 KB

Citation

Please use this url to cite or link to this publication:

MLA
Swaelens, Colin, et al. “DBBErt : Part-of-Speech Tagging of Pre-Modern Greek Text.” CLARIN Annual Conference Proceedings, edited by Krister Lindén et al., 2023, pp. 155–58.
APA
Swaelens, C., De Vos, I., & Lefever, E. (2023). DBBErt : part-of-speech tagging of pre-modern Greek text. In K. Lindén, J. Niemi, & T. Kontino (Eds.), CLARIN Annual Conference Proceedings (pp. 155–158).
Chicago author-date
Swaelens, Colin, Ilse De Vos, and Els Lefever. 2023. “DBBErt : Part-of-Speech Tagging of Pre-Modern Greek Text.” In CLARIN Annual Conference Proceedings, edited by Krister Lindén, Jyrki Niemi, and Thalassia Kontino, 155–58.
Chicago author-date (all authors)
Swaelens, Colin, Ilse De Vos, and Els Lefever. 2023. “DBBErt : Part-of-Speech Tagging of Pre-Modern Greek Text.” In CLARIN Annual Conference Proceedings, edited by Krister Lindén, Jyrki Niemi, and Thalassia Kontino, 155–158.
Vancouver
1. Swaelens C, De Vos I, Lefever E. DBBErt : part-of-speech tagging of pre-modern Greek text. In: Lindén K, Niemi J, Kontino T, editors. CLARIN Annual Conference Proceedings. 2023. p. 155–8.
IEEE
[1] C. Swaelens, I. De Vos, and E. Lefever, “DBBErt : part-of-speech tagging of pre-modern Greek text,” in CLARIN Annual Conference Proceedings, Leuven, Belgium, 2023, pp. 155–158.
BibTeX
@inproceedings{01HCZ1CC2Z2J3W68CA430RSFQH,
  abstract     = {{This contribution presents DBBErt, a machine-learning approach to linguistic annotation for pre-Modern Greek, which provides a part-of-speech and fine-grained morphological analysis of Greek tokens. To this end, transformer-based language models were built on both pre-Modern and Modern Greek text and further fine-tuned on annotated treebanks. The experimental results look very promising on a gold standard of Byzantine book epigrams, with an F-score of 83% for coarse-grained part-of-speech-tagging and of 69% for fine-grained morphological analysis. The resulting pipeline and models will be added to the CLARIN infrastructure to stimulate further research in NLP for Ancient and Medieval Greek.}},
  author       = {{Swaelens, Colin and De Vos, Ilse and Lefever, Els}},
  booktitle    = {{CLARIN Annual Conference Proceedings}},
  editor       = {{Lindén, Krister and Niemi, Jyrki and Kontino, Thalassia}},
  issn         = {{2773-2177}},
  language     = {{eng}},
  location     = {{Leuven, Belgium}},
  pages        = {{155--158}},
  title        = {{DBBErt : part-of-speech tagging of pre-modern Greek text}},
  year         = {{2023}},
}