- Author
- Pranaydeep Singh (UGent), Orphée De Clercq (UGent) and Els Lefever (UGent)
- Abstract
- Although language modeling has been trending upwards steadily, models available for low-resourced languages are limited to large multilingual models such as mBERT and XLM-RoBERTa, which come with significant overheads for deployment vis-a-vis their model size, inference speeds, etc. We attempt to tackle this problem by proposing a novel methodology to apply knowledge distillation techniques to filter language-specific information from a large multilingual model into a small, fast monolingual model that can often outperform the teacher model. We demonstrate the viability of this methodology on two downstream tasks each for six languages. We further dive into the possible modifications to the basic setup for low-resourced languages by exploring ideas to tune the final vocabulary of the distilled models. Lastly, we perform a detailed ablation study to understand the different components of the setup better and find out what works best for the two under-resourced languages, Swahili and Slovene.
- Keywords
- knowledge distillation, low-resource NLP, sustainable NLP, language modeling
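The abstract above describes distilling language-specific knowledge from a large multilingual teacher (e.g., XLM-RoBERTa) into a small, fast monolingual student. As a rough illustration of that general setup only, and not the paper's actual training recipe, the sketch below runs one Hinton-style soft-target distillation step on masked-language-modelling logits; the student architecture, temperature, loss weighting, and the Swahili example sentence are all assumed placeholders.

```python
# Illustrative sketch of soft-target knowledge distillation from a multilingual
# teacher into a smaller student. Hyperparameters and model sizes are assumptions,
# not the values used in the paper.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForMaskedLM, BertConfig, BertForMaskedLM

# Large multilingual teacher (frozen during distillation).
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
teacher = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")
teacher.eval()

# Small randomly initialised student that shares the teacher's vocabulary,
# so the two output distributions are directly comparable.
student_config = BertConfig(
    vocab_size=teacher.config.vocab_size,
    hidden_size=384,
    num_hidden_layers=6,
    num_attention_heads=6,
    intermediate_size=1536,
)
student = BertForMaskedLM(student_config)

T = 2.0  # softmax temperature (assumed value)
batch = tokenizer(["Habari ya asubuhi"], return_tensors="pt")  # toy Swahili input

with torch.no_grad():
    teacher_logits = teacher(**batch).logits  # [batch, seq_len, vocab]

student_logits = student(**batch).logits

# KL divergence between temperature-softened teacher and student distributions.
kd_loss = F.kl_div(
    F.log_softmax(student_logits / T, dim=-1),
    F.softmax(teacher_logits / T, dim=-1),
    reduction="batchmean",
) * (T * T)

kd_loss.backward()  # an optimiser step over the student's parameters would follow
```

In a full training loop this distillation loss would typically be combined with a masked-language-modelling loss on monolingual data, and the student vocabulary could be pruned to the target language, which is the kind of vocabulary tuning the abstract mentions.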
Downloads
- electronics Distillation.pdf: full text (Published version) | open access | 796.63 KB
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-01GTVN7TTFWD95ZSMMV19CWY5H
- MLA
- Singh, Pranaydeep, et al. “Distilling Monolingual Models from Large Multilingual Transformers.” ELECTRONICS, vol. 12, no. 4, 2023, doi:10.3390/electronics12041022.
- APA
- Singh, P., De Clercq, O., & Lefever, E. (2023). Distilling monolingual models from large multilingual transformers. ELECTRONICS, 12(4). https://doi.org/10.3390/electronics12041022
- Chicago author-date
- Singh, Pranaydeep, Orphée De Clercq, and Els Lefever. 2023. “Distilling Monolingual Models from Large Multilingual Transformers.” ELECTRONICS 12 (4). https://doi.org/10.3390/electronics12041022.
- Chicago author-date (all authors)
- Singh, Pranaydeep, Orphée De Clercq, and Els Lefever. 2023. “Distilling Monolingual Models from Large Multilingual Transformers.” ELECTRONICS 12 (4). doi:10.3390/electronics12041022.
- Vancouver
- 1. Singh P, De Clercq O, Lefever E. Distilling monolingual models from large multilingual transformers. ELECTRONICS. 2023;12(4).
- IEEE
- [1] P. Singh, O. De Clercq, and E. Lefever, “Distilling monolingual models from large multilingual transformers,” ELECTRONICS, vol. 12, no. 4, 2023.
@article{01GTVN7TTFWD95ZSMMV19CWY5H,
abstract = {{Although language modeling has been trending upwards steadily, models available for low-resourced languages are limited to large multilingual models such as mBERT and XLM-RoBERTa, which come with significant overheads for deployment vis-a-vis their model size, inference speeds, etc. We attempt to tackle this problem by proposing a novel methodology to apply knowledge distillation techniques to filter language-specific information from a large multilingual model into a small, fast monolingual model that can often outperform the teacher model. We demonstrate the viability of this methodology on two downstream tasks each for six languages. We further dive into the possible modifications to the basic setup for low-resourced languages by exploring ideas to tune the final vocabulary of the distilled models. Lastly, we perform a detailed ablation study to understand the different components of the setup better and find out what works best for the two under-resourced languages, Swahili and Slovene.}},
articleno = {{1022}},
author = {{Singh, Pranaydeep and De Clercq, Orphée and Lefever, Els}},
issn = {{2079-9292}},
journal = {{ELECTRONICS}},
keywords = {{knowledge distillation,low-resource NLP,sustainable NLP,language modeling}},
language = {{eng}},
number = {{4}},
pages = {{17}},
title = {{Distilling monolingual models from large multilingual transformers}},
url = {{https://doi.org/10.3390/electronics12041022}},
volume = {{12}},
year = {{2023}},
}