
Distilling monolingual models from large multilingual transformers

Pranaydeep Singh (UGent), Orphée De Clercq (UGent) and Els Lefever (UGent)
(2023) ELECTRONICS. 12(4).
Abstract
Although language modeling has been trending upwards steadily, models available for low-resourced languages are limited to large multilingual models such as mBERT and XLM-RoBERTa, which come with significant overheads for deployment vis-a-vis their model size, inference speeds, etc. We attempt to tackle this problem by proposing a novel methodology to apply knowledge distillation techniques to filter language-specific information from a large multilingual model into a small, fast monolingual model that can often outperform the teacher model. We demonstrate the viability of this methodology on two downstream tasks each for six languages. We further dive into the possible modifications to the basic setup for low-resourced languages by exploring ideas to tune the final vocabulary of the distilled models. Lastly, we perform a detailed ablation study to understand the different components of the setup better and find out what works best for the two under-resourced languages, Swahili and Slovene.
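
The abstract describes distilling knowledge from a large multilingual teacher (e.g., mBERT or XLM-RoBERTa) into a small, fast monolingual student. As a rough illustration of that general idea only, the PyTorch sketch below shows the standard temperature-scaled soft-label distillation loss; it is not the authors' actual training setup, and the function name, temperature, batch size, and vocabulary size are illustrative assumptions.

import torch
import torch.nn.functional as F

def soft_label_distillation_loss(student_logits: torch.Tensor,
                                 teacher_logits: torch.Tensor,
                                 temperature: float = 2.0) -> torch.Tensor:
    # KL divergence between temperature-softened teacher and student distributions.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Toy usage: random logits stand in for teacher/student outputs over a
# hypothetical shared vocabulary; in practice these come from forward passes.
teacher_logits = torch.randn(8, 30_000)                       # teacher output, no grad needed
student_logits = torch.randn(8, 30_000, requires_grad=True)   # student output
loss = soft_label_distillation_loss(student_logits, teacher_logits)
loss.backward()

In a full training loop, such a soft-label term is typically combined with a hard-label objective (e.g., masked language modeling) on monolingual data; the paper's specific setup and vocabulary-tuning strategy should be taken from the article itself.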
Keywords
knowledge distillation, low-resource NLP, sustainable NLP, language modeling

Downloads

  • electronics Distillation.pdf: full text (Published version), open access, PDF, 796.63 KB

Citation

Please use this url to cite or link to this publication:

MLA
Singh, Pranaydeep, et al. “Distilling Monolingual Models from Large Multilingual Transformers.” ELECTRONICS, vol. 12, no. 4, 2023, doi:10.3390/electronics12041022.
APA
Singh, P., De Clercq, O., & Lefever, E. (2023). Distilling monolingual models from large multilingual transformers. ELECTRONICS, 12(4). https://doi.org/10.3390/electronics12041022
Chicago author-date
Singh, Pranaydeep, Orphée De Clercq, and Els Lefever. 2023. “Distilling Monolingual Models from Large Multilingual Transformers.” ELECTRONICS 12 (4). https://doi.org/10.3390/electronics12041022.
Chicago author-date (all authors)
Singh, Pranaydeep, Orphée De Clercq, and Els Lefever. 2023. “Distilling Monolingual Models from Large Multilingual Transformers.” ELECTRONICS 12 (4). doi:10.3390/electronics12041022.
Vancouver
1. Singh P, De Clercq O, Lefever E. Distilling monolingual models from large multilingual transformers. ELECTRONICS. 2023;12(4).
IEEE
[1] P. Singh, O. De Clercq, and E. Lefever, “Distilling monolingual models from large multilingual transformers,” ELECTRONICS, vol. 12, no. 4, 2023.
BibTeX
@article{01GTVN7TTFWD95ZSMMV19CWY5H,
  abstract     = {{Although language modeling has been trending upwards steadily, models available for low-resourced languages are limited to large multilingual models such as mBERT and XLM-RoBERTa, which come with significant overheads for deployment vis-a-vis their model size, inference speeds, etc. We attempt to tackle this problem by proposing a novel methodology to apply knowledge distillation techniques to filter language-specific information from a large multilingual model into a small, fast monolingual model that can often outperform the teacher model. We demonstrate the viability of this methodology on two downstream tasks each for six languages. We further dive into the possible modifications to the basic setup for low-resourced languages by exploring ideas to tune the final vocabulary of the distilled models. Lastly, we perform a detailed ablation study to understand the different components of the setup better and find out what works best for the two under-resourced languages, Swahili and Slovene.}},
  articleno    = {{1022}},
  author       = {{Singh, Pranaydeep and De Clercq, Orphée and Lefever, Els}},
  issn         = {{2079-9292}},
  journal      = {{ELECTRONICS}},
  keywords     = {{knowledge distillation,low-resource NLP,sustainable NLP,language modeling}},
  language     = {{eng}},
  number       = {{4}},
  pages        = {{17}},
  title        = {{Distilling monolingual models from large multilingual transformers}},
  url          = {{http://doi.org/10.3390/electronics12041022}},
  volume       = {{12}},
  year         = {{2023}},
}
