Abstract
Stereotypes have been studied extensively in social psychology and, especially with recent advances in technology, in computational linguistics. They have gained further attention recently because of a notable rise in their dissemination due to demographic changes and world events. This paper focuses on ethnic stereotypes related to immigration and presents the StereoHoax corpus, a multilingual dataset of 17,814 tweets in French, Italian, and Spanish. The corpus includes conversational threads reporting on and responding to racial hoaxes about immigrants, which we define as false claims of unlawful actions attributed to specific ethnic groups. This work describes the data collection process and the fine-grained annotation scheme we used, which is based mainly on the Stereotype Content Model as adapted to the study of immigrants by Bosco et al. (2023). Quantitative and qualitative analyses show the distribution and correlation of annotated categories across languages, revealing, for instance, intercultural differences in the expression of stereotypes through forms of discredit. To validate our data, we performed four machine learning experiments using pre-trained BERT-like models in order to lay a foundation for automatic stereotype detection research. Leveraging the StereoHoax corpus, we gained crucial insights into the importance of context, especially for the detection of implicit stereotypes. Overall, we believe that the StereoHoax corpus will prove to be a valuable resource for the automatic detection of stereotypes regarding immigrants and for the study of the linguistic and psychological patterns associated with their dissemination.
Keywords
Racial Hoax, Immigration, Social psychology, Natural language processing, Stereotype detection, Corpus analysis, SEXISM IDENTIFICATION, MORAL DISENGAGEMENT, DEHUMANIZATION, IMMIGRANTS, PREJUDICE, LANGUAGE, WARMTH

Downloads

  • s10579-024-09791-3.pdf
    • full text (Published version) | open access | PDF | 2.20 MB

Citation

MLA
Schmeisser-Nieto, Wolfgang S., et al. “Stereohoax : A Multilingual Corpus of Racial Hoaxes and Social Media Reactions Annotated for Stereotypes.” LANGUAGE RESOURCES AND EVALUATION, 2025, doi:10.1007/s10579-024-09791-3.
APA
Schmeisser-Nieto, W. S., Cignarella, A. T., Bourgeade, T., Frenda, S., Ariza-Casabona, A., Laurent, M., … D’Errico, F. (2025). Stereohoax : a multilingual corpus of racial hoaxes and social media reactions annotated for stereotypes. LANGUAGE RESOURCES AND EVALUATION. https://doi.org/10.1007/s10579-024-09791-3
Chicago author-date
Schmeisser-Nieto, Wolfgang S., Alessandra Teresa Cignarella, Tom Bourgeade, Simona Frenda, Alejandro Ariza-Casabona, Mario Laurent, Paolo Giovanni Cicirelli, et al. 2025. “Stereohoax : A Multilingual Corpus of Racial Hoaxes and Social Media Reactions Annotated for Stereotypes.” LANGUAGE RESOURCES AND EVALUATION. https://doi.org/10.1007/s10579-024-09791-3.
Chicago author-date (all authors)
Schmeisser-Nieto, Wolfgang S., Alessandra Teresa Cignarella, Tom Bourgeade, Simona Frenda, Alejandro Ariza-Casabona, Mario Laurent, Paolo Giovanni Cicirelli, Andrea Marra, Giuseppe Corbelli, Farah Benamara, Cristina Bosco, Veronique Moriceau, Marinella Paciello, Viviana Patti, Mariona Taule, and Francesca D’Errico. 2025. “Stereohoax : A Multilingual Corpus of Racial Hoaxes and Social Media Reactions Annotated for Stereotypes.” LANGUAGE RESOURCES AND EVALUATION. doi:10.1007/s10579-024-09791-3.
Vancouver
1. Schmeisser-Nieto WS, Cignarella AT, Bourgeade T, Frenda S, Ariza-Casabona A, Laurent M, et al. Stereohoax : a multilingual corpus of racial hoaxes and social media reactions annotated for stereotypes. LANGUAGE RESOURCES AND EVALUATION. 2025;
IEEE
[1] W. S. Schmeisser-Nieto et al., “Stereohoax : a multilingual corpus of racial hoaxes and social media reactions annotated for stereotypes,” LANGUAGE RESOURCES AND EVALUATION, 2025.
@article{01JHT9G0V7Q5J5EBMZY5SJJSZM,
  abstract     = {{Stereotypes have been studied extensively in the fields of social psychology and, especially with the recent advances in technology, in computational linguistics. Stereotypes have also gained even more attention nowadays because of a notable rise in their dissemination due to demographic changes and world events. This paper focuses on ethnic stereotypes related to immigration and presents the StereoHoax corpus, a multilingual dataset of 17,814 tweets in French, Italian, and Spanish. The corpus includes conversational threads reporting on and responding to racial hoaxes about immigrants, which we define as false claims of unlawful actions attributed to specific ethnic groups. This work describes the data collection process and the fine-grained annotation scheme we used, which is based mainly on the Stereotype Content Model adapted to the study applied to immigrants of Bosco et al. (2023). Quantitative and qualitative analyses show the distribution and correlation of annotated categories across languages, revealing, for instance, intercultural differences in the expression of stereotypes through forms of discredit. To validate our data, we performed four machine learning experiments using pre-trained BERT-like models in order to lay a foundation for automatic stereotype detection research. Leveraging the StereoHoax corpus, we gained crucial insights into the importance of context, especially in relation to the detection of implicit stereotypes. Overall, we believe that the StereoHoax corpus will prove to be a valuable resource for the automatic detection of stereotypes regarding immigrants and the study of the linguistic and psychological patterns associated with their dissemination.}},
  author       = {{Schmeisser-Nieto, Wolfgang S. and Cignarella, Alessandra Teresa and Bourgeade, Tom and Frenda, Simona and Ariza-Casabona, Alejandro and Laurent, Mario and Cicirelli, Paolo Giovanni and Marra, Andrea and Corbelli, Giuseppe and Benamara, Farah and Bosco, Cristina and Moriceau, Veronique and Paciello, Marinella and Patti, Viviana and Taule, Mariona and D'Errico, Francesca}},
  issn         = {{1574-020X}},
  journal      = {{LANGUAGE RESOURCES AND EVALUATION}},
  keywords     = {{Racial Hoax,Immigration,Social psychology,Natural language processing,Stereotype detection,Corpus analysis,SEXISM IDENTIFICATION,MORAL DISENGAGEMENT,DEHUMANIZATION,IMMIGRANTS,PREJUDICE,LANGUAGE,WARMTH}},
  language     = {{eng}},
  pages        = {{39}},
  title        = {{Stereohoax : a multilingual corpus of racial hoaxes and social media reactions annotated for stereotypes}},
  url          = {{http://doi.org/10.1007/s10579-024-09791-3}},
  year         = {{2025}},
}
