
Word level language identification in online multilingual communication

Abstract
Multilingual speakers switch between languages in online and spoken communication. Analyses of large scale multilingual data require automatic language identification at the word level. For our experiments with multilingual online discussions, we first tag the language of individual words using language models and dictionaries. Secondly, we incorporate context to improve the performance. We achieve an accuracy of 98%. Besides word level accuracy, we use two new metrics to evaluate this task.
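The first step named in the abstract, tagging each word in isolation with per-language models, can be illustrated with a minimal sketch. This is not the authors' implementation: the character trigram model with add-one smoothing, the toy word lists, and the names `CharNgramModel`, `MODELS`, and `tag_words` are all assumptions made for illustration. The dictionary lookup and the context step mentioned in the abstract are not covered here.

```python
import math
from collections import Counter


def char_ngrams(word, n=3):
    """Return the character n-grams of a word, padded at both ends."""
    padded = "^" * (n - 1) + word.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramModel:
    """Simplified n-gram frequency model with add-one smoothing (illustrative)."""

    def __init__(self, words, n=3):
        self.n = n
        self.counts = Counter(g for w in words for g in char_ngrams(w, n))
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # one extra slot for unseen n-grams

    def log_prob(self, word):
        # Sum of smoothed log relative frequencies of the word's n-grams.
        denom = self.total + self.vocab
        return sum(math.log((self.counts[g] + 1) / denom)
                   for g in char_ngrams(word, self.n))


# Toy training data (hypothetical); a real system would use large corpora.
DUTCH = ["de", "het", "een", "niet", "maar", "huis", "goed", "ik", "ook", "wel"]
TURKISH = ["bir", "ve", "bu", "için", "çok", "güzel", "değil", "ben", "evet", "var"]

MODELS = {"nl": CharNgramModel(DUTCH), "tr": CharNgramModel(TURKISH)}


def tag_words(words, models):
    """Assign each word the language whose model scores it highest."""
    return [max(models, key=lambda lang: models[lang].log_prob(w)) for w in words]


print(tag_words("ik ben niet güzel".split(), MODELS))
```

Each word is scored independently, so short or ambiguous words can be mislabeled; that limitation is what a context step, as described in the abstract, is meant to address.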
Keywords
multilingual, Turkish, Dutch, code-switching, automatic language identification, social media, Lt3

Downloads

  • nguyen dogruoz EMNLP 2013.pdf
    • full text (Published version) | open access | PDF | 239.84 KB

Citation


MLA
Nguyen, Dong, and A. Seza Doğruöz. “Word Level Language Identification in Online Multilingual Communication.” Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, edited by David Yarowsky et al., Association for Computational Linguistics (ACL), 2013, pp. 857–62.
APA
Nguyen, D., & Doğruöz, A. S. (2013). Word level language identification in online multilingual communication. In D. Yarowsky, T. Baldwin, A. Korhonen, K. Livescu, & S. Bethard (Eds.), Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (pp. 857–862). Seattle, Washington, USA: Association for Computational Linguistics (ACL).
Chicago author-date
Nguyen, Dong, and A. Seza Doğruöz. 2013. “Word Level Language Identification in Online Multilingual Communication.” In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, edited by David Yarowsky, Timothy Baldwin, Anna Korhonen, Karen Livescu, and Steven Bethard, 857–62. Seattle, Washington, USA: Association for Computational Linguistics (ACL).
Chicago author-date (all authors)
Nguyen, Dong, and A. Seza Doğruöz. 2013. “Word Level Language Identification in Online Multilingual Communication.” In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, edited by David Yarowsky, Timothy Baldwin, Anna Korhonen, Karen Livescu, and Steven Bethard, 857–862. Seattle, Washington, USA: Association for Computational Linguistics (ACL).
Vancouver
1. Nguyen D, Doğruöz AS. Word level language identification in online multilingual communication. In: Yarowsky D, Baldwin T, Korhonen A, Livescu K, Bethard S, editors. Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Seattle, Washington, USA: Association for Computational Linguistics (ACL); 2013. p. 857–62.
IEEE
[1] D. Nguyen and A. S. Doğruöz, “Word level language identification in online multilingual communication,” in Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA, 2013, pp. 857–862.
BibTeX
@inproceedings{8694791,
  abstract     = {{Multilingual speakers switch between languages in online and spoken communication. Analyses of large scale multilingual data require automatic language identification at the word level. For our experiments with multilingual online discussions, we first tag the language of individual words using language models and dictionaries. Secondly, we incorporate context to improve the performance. We achieve an accuracy of 98%. Besides word level accuracy, we use two new metrics to evaluate this task.}},
  author       = {{Nguyen, Dong and Doğruöz, A. Seza}},
  booktitle    = {{Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing}},
  editor       = {{Yarowsky, David and Baldwin, Timothy and Korhonen, Anna and Livescu, Karen and Bethard, Steven}},
  isbn         = {{9781937284978}},
  keywords     = {{multilingual,Turkish,Dutch,code-switching,automatic language identification,social media,Lt3}},
  language     = {{eng}},
  location     = {{Seattle, Washington, USA}},
  pages        = {{857--862}},
  publisher    = {{Association for Computational Linguistics (ACL)}},
  title        = {{Word level language identification in online multilingual communication}},
  url          = {{https://www.aclweb.org/anthology/D13-1.pdf}},
  year         = {{2013}},
}