- Author
- Colin Swaelens (UGent) , Ilse De Vos (UGent) and Els Lefever (UGent)
- Abstract
- In this paper, we explore the feasibility of developing a part-of-speech tagger for non-normalised Byzantine Greek epigrams. To this end, we compared three different transformer-based models with embedding representations, which were then fine-tuned on a fine-grained part-of-speech tagging task. To train the language models, we compiled two data sets: the first consisting of Ancient and Byzantine Greek texts, the second of Ancient, Byzantine and Modern Greek. This allowed us to ascertain whether Modern Greek contributes to the modelling of Byzantine Greek. For the supervised task of part-of-speech tagging, we collected a training set of existing, annotated (Ancient) Greek texts. For evaluation, a gold standard containing 10,000 tokens of unedited Byzantine Greek poems was manually annotated and validated through an inter-annotator agreement study. The experimental results look very promising, with the BERT model trained on all Greek data achieving the best performance for fine-grained part-of-speech tagging.
- Keywords
- Byzantine Greek, Part-of-speech tagging, Morphological analysis, Computational linguistics, Natural language processing, Machine learning, Neural networks, Language models
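The abstract notes that the 10,000-token gold standard was validated through an inter-annotator agreement study. As a hedged illustration only (not the authors' actual code or data), agreement between two annotators over the same tokens is commonly quantified with Cohen's kappa; the tags and token counts below are invented for the example:

```python
from collections import Counter

def cohens_kappa(ann1, ann2):
    """Cohen's kappa for two annotators labelling the same token sequence."""
    assert len(ann1) == len(ann2) and len(ann1) > 0
    n = len(ann1)
    # Observed agreement: fraction of tokens given identical tags.
    p_o = sum(a == b for a, b in zip(ann1, ann2)) / n
    # Expected chance agreement, from each annotator's tag distribution.
    c1, c2 = Counter(ann1), Counter(ann2)
    p_e = sum(c1[t] * c2[t] for t in c1) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical part-of-speech tags for ten tokens from two annotators.
a1 = ["NOUN", "VERB", "NOUN", "ADJ", "NOUN", "VERB", "DET", "NOUN", "ADJ", "VERB"]
a2 = ["NOUN", "VERB", "NOUN", "NOUN", "NOUN", "VERB", "DET", "NOUN", "ADJ", "VERB"]
print(round(cohens_kappa(a1, a2), 3))  # → 0.853
```

Kappa corrects raw percentage agreement for agreement expected by chance, which matters for skewed tag distributions such as noun-heavy poetic text.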
Downloads
- (...).pdf: full text (Published version) | UGent only | 1.28 MB
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-01HHJ697YYWMESFKMWT0HDTCZP
- MLA
- Swaelens, Colin, et al. “Linguistic Annotation of Byzantine Book Epigrams.” LANGUAGE RESOURCES AND EVALUATION, 2024, doi:10.1007/s10579-023-09703-x.
- APA
- Swaelens, C., De Vos, I., & Lefever, E. (2024). Linguistic annotation of Byzantine book epigrams. LANGUAGE RESOURCES AND EVALUATION. https://doi.org/10.1007/s10579-023-09703-x
- Chicago author-date
- Swaelens, Colin, Ilse De Vos, and Els Lefever. 2024. “Linguistic Annotation of Byzantine Book Epigrams.” LANGUAGE RESOURCES AND EVALUATION. https://doi.org/10.1007/s10579-023-09703-x.
- Chicago author-date (all authors)
- Swaelens, Colin, Ilse De Vos, and Els Lefever. 2024. “Linguistic Annotation of Byzantine Book Epigrams.” LANGUAGE RESOURCES AND EVALUATION. doi:10.1007/s10579-023-09703-x.
- Vancouver
- 1. Swaelens C, De Vos I, Lefever E. Linguistic annotation of Byzantine book epigrams. LANGUAGE RESOURCES AND EVALUATION. 2024;
- IEEE
- [1] C. Swaelens, I. De Vos, and E. Lefever, “Linguistic annotation of Byzantine book epigrams,” LANGUAGE RESOURCES AND EVALUATION, 2024.
@article{01HHJ697YYWMESFKMWT0HDTCZP,
  abstract = {{In this paper, we explore the feasibility of developing a part-of-speech tagger for non-normalised Byzantine Greek epigrams. To this end, we compared three different transformer-based models with embedding representations, which were then fine-tuned on a fine-grained part-of-speech tagging task. To train the language models, we compiled two data sets: the first consisting of Ancient and Byzantine Greek texts, the second of Ancient, Byzantine and Modern Greek. This allowed us to ascertain whether Modern Greek contributes to the modelling of Byzantine Greek. For the supervised task of part-of-speech tagging, we collected a training set of existing, annotated (Ancient) Greek texts. For evaluation, a gold standard containing 10,000 tokens of unedited Byzantine Greek poems was manually annotated and validated through an inter-annotator agreement study. The experimental results look very promising, with the BERT model trained on all Greek data achieving the best performance for fine-grained part-of-speech tagging.}},
  author   = {{Swaelens, Colin and De Vos, Ilse and Lefever, Els}},
  issn     = {{1574-020X}},
  journal  = {{LANGUAGE RESOURCES AND EVALUATION}},
  keywords = {{Byzantine Greek, Part-of-speech tagging, Morphological analysis, Computational linguistics, Natural language processing, Machine learning, Neural networks, Language models}},
  language = {{eng, gre}},
  pages    = {{26}},
  title    = {{Linguistic annotation of Byzantine book epigrams}},
  url      = {{https://doi.org/10.1007/s10579-023-09703-x}},
  year     = {{2024}},
}