Word level language identification in online multilingual communication
- Author
- Dong Nguyen and A. Seza Doğruöz (UGent)
- Abstract
- Multilingual speakers switch between languages in online and spoken communication. Analyses of large scale multilingual data require automatic language identification at the word level. For our experiments with multilingual online discussions, we first tag the language of individual words using language models and dictionaries. Secondly, we incorporate context to improve the performance. We achieve an accuracy of 98%. Besides word level accuracy, we use two new metrics to evaluate this task.
- Keywords
- multilingual, Turkish, Dutch, code-switching, automatic language identification, social media, Lt3
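The two steps named in the abstract (word-level tagging with language models and dictionaries, then adding context) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the models, so the character-bigram scoring with add-one smoothing, the neighbour-averaging context step, the toy word lists, and every name below (`TRAIN`, `log_score`, `tag_word`, `tag_with_context`, `weight`) are assumptions made for illustration only.

```python
import math
from collections import Counter

# Hypothetical toy word lists standing in for dictionaries / training text.
# A real system would use large monolingual corpora.
TRAIN = {
    "tr": ["ve", "bir", "için", "çok", "güzel", "değil", "teşekkürler"],
    "nl": ["en", "een", "voor", "heel", "mooi", "niet", "bedankt"],
}


def char_ngrams(word, n=2):
    """Return the character n-grams of a word padded with boundary markers."""
    padded = "^" + word.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


# Per-language n-gram counts and the size of the shared n-gram vocabulary.
COUNTS = {lang: Counter(g for w in words for g in char_ngrams(w))
          for lang, words in TRAIN.items()}
VOCAB_SIZE = len(set().union(*COUNTS.values()))


def log_score(word, lang):
    """Add-one smoothed log-probability of the word's n-grams for one language."""
    counts = COUNTS[lang]
    total = sum(counts.values())
    return sum(math.log((counts[g] + 1) / (total + VOCAB_SIZE))
               for g in char_ngrams(word))


def avg_score(word, lang):
    """Length-normalised score, so long neighbours do not dominate short words."""
    return log_score(word, lang) / len(char_ngrams(word))


def tag_word(word):
    """Step 1: dictionary lookup first, character n-gram model as fallback."""
    hits = [lang for lang, words in TRAIN.items() if word.lower() in words]
    if len(hits) == 1:
        return hits[0]
    return max(TRAIN, key=lambda lang: log_score(word, lang))


def tag_with_context(words, weight=0.5):
    """Step 2: add a weighted share of each neighbour's scores before deciding."""
    scores = [{lang: avg_score(w, lang) for lang in TRAIN} for w in words]
    tags = []
    for i in range(len(words)):
        combined = dict(scores[i])
        for j in (i - 1, i + 1):
            if 0 <= j < len(words):
                for lang in TRAIN:
                    combined[lang] += weight * scores[j][lang]
        tags.append(max(combined, key=combined.get))
    return tags
```

Calling `tag_word("mooi")` resolves through the dictionary lookup, while an unseen form such as `"güzeller"` falls through to the n-gram scores. `tag_with_context` takes a tokenised post and lets each word's immediate neighbours influence its label, which is one simple way to exploit the tendency of adjacent words to share a language.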
Downloads
- nguyen dogruoz EMNLP 2013.pdf
- full text (Published version) | open access | 239.84 KB
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-8694791
- MLA
- Nguyen, Dong, and A. Seza Doğruöz. “Word Level Language Identification in Online Multilingual Communication.” Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, edited by David Yarowsky et al., Association for Computational Linguistics (ACL), 2013, pp. 857–62.
- APA
- Nguyen, D., & Doğruöz, A. S. (2013). Word level language identification in online multilingual communication. In D. Yarowsky, T. Baldwin, A. Korhonen, K. Livescu, & S. Bethard (Eds.), Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (pp. 857–862). Seattle, Washington, USA: Association for Computational Linguistics (ACL).
- Chicago author-date
- Nguyen, Dong, and A. Seza Doğruöz. 2013. “Word Level Language Identification in Online Multilingual Communication.” In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, edited by David Yarowsky, Timothy Baldwin, Anna Korhonen, Karen Livescu, and Steven Bethard, 857–62. Seattle, Washington, USA: Association for Computational Linguistics (ACL).
- Chicago author-date (all authors)
- Nguyen, Dong, and A. Seza Doğruöz. 2013. “Word Level Language Identification in Online Multilingual Communication.” In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, edited by David Yarowsky, Timothy Baldwin, Anna Korhonen, Karen Livescu, and Steven Bethard, 857–862. Seattle, Washington, USA: Association for Computational Linguistics (ACL).
- Vancouver
- 1. Nguyen D, Doğruöz AS. Word level language identification in online multilingual communication. In: Yarowsky D, Baldwin T, Korhonen A, Livescu K, Bethard S, editors. Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Seattle, Washington, USA: Association for Computational Linguistics (ACL); 2013. p. 857–62.
- IEEE
- [1] D. Nguyen and A. S. Doğruöz, “Word level language identification in online multilingual communication,” in Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, 2013, pp. 857–862.
@inproceedings{8694791,
abstract = {{Multilingual speakers switch between languages in online and spoken communication. Analyses of large scale multilingual data require automatic language identification at the word level. For our experiments with multilingual online discussions, we first tag the language of individual words using language models and dictionaries. Secondly, we incorporate context to improve the performance. We achieve an accuracy of 98%. Besides word level accuracy, we use two new metrics to evaluate this task.}},
author = {{Nguyen, Dong and Doğruöz, A. Seza}},
booktitle = {{Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing}},
editor = {{Yarowsky, David and Baldwin, Timothy and Korhonen, Anna and Livescu, Karen and Bethard, Steven}},
isbn = {{9781937284978}},
keywords = {{multilingual,Turkish,Dutch,code-switching,automatic language identification,social media,Lt3}},
language = {{eng}},
location = {{Seattle, Washington, USA}},
pages = {{857--862}},
publisher = {{Association for Computational Linguistics (ACL)}},
title = {{Word level language identification in online multilingual communication}},
url = {{https://www.aclweb.org/anthology/D13-1.pdf}},
year = {{2013}},
}