DBBErt : part-of-speech tagging of pre-modern Greek text
- Author
- Colin Swaelens (UGent) , Ilse De Vos (UGent) and Els Lefever (UGent)
- Abstract
- This contribution presents DBBErt, a machine-learning approach to linguistic annotation for pre-Modern Greek, which provides a part-of-speech and fine-grained morphological analysis of Greek tokens. To this end, transformer-based language models were built on both pre-Modern and Modern Greek text and further fine-tuned on annotated treebanks. The experimental results look very promising on a gold standard of Byzantine book epigrams, with an F-score of 83% for coarse-grained part-of-speech tagging and of 69% for fine-grained morphological analysis. The resulting pipeline and models will be added to the CLARIN infrastructure to stimulate further research in NLP for Ancient and Medieval Greek.
Downloads
- CLARIN2023 ConferenceProceedings DBBErt.pdf
- full text (Published version)
- open access
- 315.21 KB
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-01HCZ1CC2Z2J3W68CA430RSFQH
- MLA
- Swaelens, Colin, et al. “DBBErt : Part-of-Speech Tagging of Pre-Modern Greek Text.” CLARIN Annual Conference Proceedings, edited by Krister Lindén et al., 2023, pp. 155–58.
- APA
- Swaelens, C., De Vos, I., & Lefever, E. (2023). DBBErt : part-of-speech tagging of pre-modern Greek text. In K. Lindén, J. Niemi, & T. Kontino (Eds.), CLARIN Annual Conference Proceedings (pp. 155–158).
- Chicago author-date
- Swaelens, Colin, Ilse De Vos, and Els Lefever. 2023. “DBBErt : Part-of-Speech Tagging of Pre-Modern Greek Text.” In CLARIN Annual Conference Proceedings, edited by Krister Lindén, Jyrki Niemi, and Thalassia Kontino, 155–58.
- Chicago author-date (all authors)
- Swaelens, Colin, Ilse De Vos, and Els Lefever. 2023. “DBBErt : Part-of-Speech Tagging of Pre-Modern Greek Text.” In CLARIN Annual Conference Proceedings, ed. by Krister Lindén, Jyrki Niemi, and Thalassia Kontino, 155–158.
- Vancouver
- 1. Swaelens C, De Vos I, Lefever E. DBBErt : part-of-speech tagging of pre-modern Greek text. In: Lindén K, Niemi J, Kontino T, editors. CLARIN Annual Conference Proceedings. 2023. p. 155–8.
- IEEE
- [1] C. Swaelens, I. De Vos, and E. Lefever, “DBBErt : part-of-speech tagging of pre-modern Greek text,” in CLARIN Annual Conference Proceedings, Leuven, Belgium, 2023, pp. 155–158.
@inproceedings{01HCZ1CC2Z2J3W68CA430RSFQH,
  abstract  = {{This contribution presents DBBErt, a machine-learning approach to linguistic annotation for pre-Modern Greek, which provides a part-of-speech and fine-grained morphological analysis of Greek tokens. To this end, transformer-based language models were built on both pre-Modern and Modern Greek text and further fine-tuned on annotated treebanks. The experimental results look very promising on a gold standard of Byzantine book epigrams, with an F-score of 83\% for coarse-grained part-of-speech tagging and of 69\% for fine-grained morphological analysis. The resulting pipeline and models will be added to the CLARIN infrastructure to stimulate further research in NLP for Ancient and Medieval Greek.}},
  author    = {Swaelens, Colin and De Vos, Ilse and Lefever, Els},
  booktitle = {{CLARIN Annual Conference Proceedings}},
  editor    = {Lindén, Krister and Niemi, Jyrki and Kontino, Thalassia},
  issn      = {{2773-2177}},
  language  = {{eng}},
  location  = {{Leuven, Belgium}},
  pages     = {{155--158}},
  title     = {{DBBErt : part-of-speech tagging of pre-modern Greek text}},
  year      = {{2023}},
}