Language technologies for low resource languages : sociolinguistic and multilingual insights
- Author
- A. Seza Doğruöz (UGent) and Sunayana Sitaram
- Abstract
- There is a growing interest in building language technologies (LTs) for low resource languages (LRLs). However, there are flaws in the planning, data collection and development phases mostly due to the assumption that LRLs are similar to High Resource Languages (HRLs) but only smaller in size. In our paper, we first provide examples of failed LTs for LRLs and provide the reasons for these failures. Second, we discuss the problematic issues with the data for LRLs. Finally, we provide recommendations for building better LTs for LRLs through insights from sociolinguistics and multilingualism. Our goal is not to solve all problems around LTs for LRLs but to raise awareness about the existing issues, provide recommendations toward possible solutions and encourage collaboration across academic disciplines for developing LTs that actually serve the needs and preferences of the LRL communities.
- Keywords
- Low Resource Languages, Computational Linguistics, LT3, Multilingualism
Downloads
- Dogruoz Sitaram LREC 2022.pdf
- full text (Published version)
- open access
- 132.59 KB
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-8756694
- MLA
- Doğruöz, A. Seza, and Sunayana Sitaram. “Language Technologies for Low Resource Languages : Sociolinguistic and Multilingual Insights.” Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages, edited by Maite Melero et al., European Language Resources Association (ELRA), 2022, pp. 92–97.
- APA
- Doğruöz, A. S., & Sitaram, S. (2022). Language technologies for low resource languages : sociolinguistic and multilingual insights. In M. Melero, S. Sakti, & C. Soria (Eds.), Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages (pp. 92–97). Marseille, France: European Language Resources Association (ELRA).
- Chicago author-date
- Doğruöz, A. Seza, and Sunayana Sitaram. 2022. “Language Technologies for Low Resource Languages : Sociolinguistic and Multilingual Insights.” In Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages, edited by Maite Melero, Sakriani Sakti, and Claudia Soria, 92–97. Marseille, France: European Language Resources Association (ELRA).
- Chicago author-date (all authors)
- Doğruöz, A. Seza, and Sunayana Sitaram. 2022. “Language Technologies for Low Resource Languages : Sociolinguistic and Multilingual Insights.” In Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages, edited by Maite Melero, Sakriani Sakti, and Claudia Soria, 92–97. Marseille, France: European Language Resources Association (ELRA).
- Vancouver
- 1. Doğruöz AS, Sitaram S. Language technologies for low resource languages : sociolinguistic and multilingual insights. In: Melero M, Sakti S, Soria C, editors. Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages. Marseille, France: European Language Resources Association (ELRA); 2022. p. 92–7.
- IEEE
- [1] A. S. Doğruöz and S. Sitaram, “Language technologies for low resource languages : sociolinguistic and multilingual insights,” in Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages, Marseille, France, 2022, pp. 92–97.
@inproceedings{8756694,
  abstract     = {{There is a growing interest in building language technologies (LTs) for low resource languages (LRLs). However, there are flaws in the planning, data collection and development phases mostly due to the assumption that LRLs are similar to High Resource Languages (HRLs) but only smaller in size. In our paper, we first provide examples of failed LTs for LRLs and provide the reasons for these failures. Second, we discuss the problematic issues with the data for LRLs. Finally, we provide recommendations for building better LTs for LRLs through insights from sociolinguistics and multilingualism. Our goal is not to solve all problems around LTs for LRLs but to raise awareness about the existing issues, provide recommendations toward possible solutions and encourage collaboration across academic disciplines for developing LTs that actually serve the needs and preferences of the LRL communities.}},
  author       = {{Doğruöz, A. Seza and Sitaram, Sunayana}},
  booktitle    = {{Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages}},
  editor       = {{Melero, Maite and Sakti, Sakriani and Soria, Claudia}},
  isbn         = {{9791095546917}},
  keywords     = {{Low Resource Languages,Computational Linguistics,LT3,Multilingualism}},
  language     = {{eng}},
  location     = {{Marseille, France}},
  pages        = {{92--97}},
  publisher    = {{European Language Resources Association (ELRA)}},
  title        = {{Language technologies for low resource languages : sociolinguistic and multilingual insights}},
  url          = {{http://www.lrec-conf.org/proceedings/lrec2022/workshops/SIGUL/index.html}},
  year         = {{2022}},
}