Why gender and age prediction from tweets is hard : lessons from a crowdsourcing experiment
- Author
- Dong Nguyen, Dolf Trieschnigg, A. Seza Doğruöz (UGent) , Rilana Gravel, Mariët Theune, Theo Meder and Franciska de Jong
- Abstract
- There is a growing interest in automatically predicting the gender and age of authors from texts. However, most research so far ignores that language use is related to the social identity of speakers, which may be different from their biological identity. In this paper, we combine insights from sociolinguistics with data collected through an online game, to underline the importance of approaching age and gender as social variables rather than static biological variables. In our game, thousands of players guessed the gender and age of Twitter users based on tweets alone. We show that more than 10% of the Twitter users do not employ language that the crowd associates with their biological sex. It is also shown that older Twitter users are often perceived to be younger. Our findings highlight the limitations of current approaches to gender and age prediction from texts.
- Keywords
- social media, Dutch, twitter, gender, crowdsourcing, Lt3, computational sociolinguistics
Downloads
- Dogruoz COLING 2014.pdf
- full text (Published version) | open access | 890.68 KB
Citation
Please use this URL to cite or link to this publication: http://hdl.handle.net/1854/LU-8694786
- MLA
- Nguyen, Dong, et al. “Why Gender and Age Prediction from Tweets Is Hard : Lessons from a Crowdsourcing Experiment.” Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics : Technical Papers, Dublin City University and Association for Computational Linguistics, 2014, pp. 1950–61.
- APA
- Nguyen, D., Trieschnigg, D., Doğruöz, A. S., Gravel, R., Theune, M., Meder, T., & de Jong, F. (2014). Why gender and age prediction from tweets is hard : lessons from a crowdsourcing experiment. Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics : Technical Papers, 1950–1961. Dublin, Ireland: Dublin City University and Association for Computational Linguistics.
- Chicago author-date
- Nguyen, Dong, Dolf Trieschnigg, A. Seza Doğruöz, Rilana Gravel, Mariët Theune, Theo Meder, and Franciska de Jong. 2014. “Why Gender and Age Prediction from Tweets Is Hard : Lessons from a Crowdsourcing Experiment.” In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics : Technical Papers, 1950–61. Dublin, Ireland: Dublin City University and Association for Computational Linguistics.
- Chicago author-date (all authors)
- Nguyen, Dong, Dolf Trieschnigg, A. Seza Doğruöz, Rilana Gravel, Mariët Theune, Theo Meder, and Franciska de Jong. 2014. “Why Gender and Age Prediction from Tweets Is Hard : Lessons from a Crowdsourcing Experiment.” In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics : Technical Papers, 1950–1961. Dublin, Ireland: Dublin City University and Association for Computational Linguistics.
- Vancouver
- 1. Nguyen D, Trieschnigg D, Doğruöz AS, Gravel R, Theune M, Meder T, et al. Why gender and age prediction from tweets is hard : lessons from a crowdsourcing experiment. In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics : technical papers. Dublin, Ireland: Dublin City University and Association for Computational Linguistics; 2014. p. 1950–61.
- IEEE
- [1] D. Nguyen et al., “Why gender and age prediction from tweets is hard : lessons from a crowdsourcing experiment,” in Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics : technical papers, Dublin, Ireland, 2014, pp. 1950–1961.
@inproceedings{8694786,
abstract = {{There is a growing interest in automatically predicting the gender and age of authors from texts. However, most research so far ignores that language use is related to the social identity of speakers, which may be different from their biological identity. In this paper, we combine insights from sociolinguistics with data collected through an online game, to underline the importance of approaching age and gender as social variables rather than static biological variables. In our game, thousands of players guessed the gender and age of Twitter users based on tweets alone. We show that more than 10% of the Twitter users do not employ language that the crowd associates with their biological sex. It is also shown that older Twitter users are often perceived to be younger. Our findings highlight the limitations of current approaches to gender and age prediction from texts.}},
author = {{Nguyen, Dong and Trieschnigg, Dolf and Doğruöz, A. Seza and Gravel, Rilana and Theune, Mariët and Meder, Theo and de Jong, Franciska}},
booktitle = {{Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics : technical papers}},
isbn = {{9781941643266}},
keywords = {{social media,Dutch,twitter,gender,crowdsourcing,Lt3,computational sociolinguistics}},
language = {{eng}},
location = {{Dublin, Ireland}},
pages = {{1950--1961}},
publisher = {{Dublin City University and Association for Computational Linguistics}},
title = {{Why gender and age prediction from tweets is hard : lessons from a crowdsourcing experiment}},
url = {{https://www.aclweb.org/anthology/C14-1.pdf}},
year = {{2014}},
}