
Unlocking domain knowledge: model adaptation for non-normative Dutch
- Author
- Florian Debaene (UGent) , Aaron Maladry (UGent) , Pranaydeep Singh (UGent) , Els Lefever (UGent) and Veronique Hoste (UGent)
- Abstract
- This study examines the adaptation of transformer models to two non-normative Dutch language variants: early modern Dutch and contemporary social media Dutch. Both share linguistic features that set them apart from standard Dutch, including spelling inconsistencies, semantic shifts and out-of-domain vocabulary. To address this, we explore two domain adaptation techniques to adapt models to these language variants: (1) continued full-model pre-training and (2) training specialized adapters integrated into existing models. We evaluate these adaptation techniques on sentiment and emotion detection in early modern Dutch comedies and farces and on emotion and irony detection in Dutch tweets. Our results show that both adaptation methods significantly improve performance on historical and social media Dutch tasks, with the greatest gains occurring when domain-relevant datasets are used. The effectiveness of model adaptation is task-dependent and sensitive to the selection of pre-training data, emphasizing domain relevance over data quantity for optimizing downstream performance. We hypothesize that contemporary Dutch encoder models already capture informal language but lack historical Dutch exposure, making adaptation more impactful for the latter. Additionally, we compare adapted encoder models to generative decoder models, which are state-of-the-art in many NLP tasks. While generative models fail to match the performance of our adapted models for historical Dutch, fine-tuned generative models outperform adapted models on social media Dutch tasks. This suggests that task-specific fine-tuning remains crucial for effective generative modelling. Finally, we release two pre-training corpora for Dutch encoder adaptation and two novel task-specific datasets for early modern Dutch on Hugging Face.
Downloads
- publisher version.pdf: full text (Published version) | open access | 404.49 KB
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-01JN3S41CMSR4AN0QX11R9D1EC
- MLA
- Debaene, Florian, et al. “Unlocking Domain Knowledge: Model Adaptation for Non-Normative Dutch.” COMPUTATIONAL LINGUISTICS IN THE NETHERLANDS JOURNAL, 2025.
- APA
- Debaene, F., Maladry, A., Singh, P., Lefever, E., & Hoste, V. (2025). Unlocking domain knowledge: model adaptation for non-normative Dutch. COMPUTATIONAL LINGUISTICS IN THE NETHERLANDS JOURNAL.
- Chicago author-date
- Debaene, Florian, Aaron Maladry, Pranaydeep Singh, Els Lefever, and Veronique Hoste. 2025. “Unlocking Domain Knowledge: Model Adaptation for Non-Normative Dutch.” COMPUTATIONAL LINGUISTICS IN THE NETHERLANDS JOURNAL.
- Vancouver
- 1. Debaene F, Maladry A, Singh P, Lefever E, Hoste V. Unlocking domain knowledge: model adaptation for non-normative Dutch. COMPUTATIONAL LINGUISTICS IN THE NETHERLANDS JOURNAL. 2025.
- IEEE
- [1] F. Debaene, A. Maladry, P. Singh, E. Lefever, and V. Hoste, “Unlocking domain knowledge: model adaptation for non-normative Dutch,” COMPUTATIONAL LINGUISTICS IN THE NETHERLANDS JOURNAL, 2025.
@article{01JN3S41CMSR4AN0QX11R9D1EC,
  abstract = {{This study examines the adaptation of transformer models to two non-normative Dutch language variants: early modern Dutch and contemporary social media Dutch. Both share linguistic features that set them apart from standard Dutch, including spelling inconsistencies, semantic shifts and out-of-domain vocabulary. To address this, we explore two domain adaptation techniques to adapt models to these language variants: (1) continued full-model pre-training and (2) training specialized adapters integrated into existing models. We evaluate these adaptation techniques on sentiment and emotion detection in early modern Dutch comedies and farces and on emotion and irony detection in Dutch tweets. Our results show that both adaptation methods significantly improve performance on historical and social media Dutch tasks, with the greatest gains occurring when domain-relevant datasets are used. The effectiveness of model adaptation is task-dependent and sensitive to the selection of pre-training data, emphasizing domain relevance over data quantity for optimizing downstream performance. We hypothesize that contemporary Dutch encoder models already capture informal language but lack historical Dutch exposure, making adaptation more impactful for the latter. Additionally, we compare adapted encoder models to generative decoder models, which are state-of-the-art in many NLP tasks. While generative models fail to match the performance of our adapted models for historical Dutch, fine-tuned generative models outperform adapted models on social media Dutch tasks. This suggests that task-specific fine-tuning remains crucial for effective generative modelling. Finally, we release two pre-training corpora for Dutch encoder adaptation and two novel task-specific datasets for early modern Dutch on Hugging Face.}},
  author = {{Debaene, Florian and Maladry, Aaron and Singh, Pranaydeep and Lefever, Els and Hoste, Veronique}},
  issn = {{2211-4009}},
  journal = {{COMPUTATIONAL LINGUISTICS IN THE NETHERLANDS JOURNAL}},
  language = {{eng}},
  title = {{Unlocking domain knowledge: model adaptation for non-normative Dutch}},
  year = {{2025}},
}