The iron(ic) melting pot : reviewing human evaluation in humour, irony and sarcasm generation

Author
Tyler Loakman, Aaron Maladry and Chenghua Lin
Abstract
Human evaluation is often considered to be the gold standard method of evaluating a Natural Language Generation system. However, whilst its importance is accepted by the community at large, the quality of its execution is often brought into question. In this position paper, we argue that the generation of more esoteric forms of language - humour, irony and sarcasm - constitutes a subdomain where the characteristics of selected evaluator panels are of utmost importance, and every effort should be made to report demographic characteristics wherever possible, in the interest of transparency and replicability. We support these claims with an overview of each language form and an analysis of examples in terms of how their interpretation is affected by different participant variables. We additionally perform a critical survey of recent works in NLG to assess how well evaluation procedures are reported in this subdomain, and note a severe lack of open reporting of evaluator demographic information, and a significant reliance on crowdsourcing platforms for recruitment.
Keywords
PERCEPTIONS

Downloads

  • 2023.findings-emnlp.444.pdf
    • full text (Published version) | open access | PDF | 212.10 KB

Citation

MLA
Loakman, Tyler, et al. “The Iron(Ic) Melting Pot : Reviewing Human Evaluation in Humour, Irony and Sarcasm Generation.” Findings of the Association for Computational Linguistics : EMNLP 2023, edited by Houda Bouamor et al., Association for Computational Linguistics, 2023, pp. 6676–89, doi:10.18653/v1/2023.findings-emnlp.444.
APA
Loakman, T., Maladry, A., & Lin, C. (2023). The iron(ic) melting pot : reviewing human evaluation in humour, irony and sarcasm generation. In H. Bouamor, J. Pino, & K. Bali (Eds.), Findings of the Association for Computational Linguistics : EMNLP 2023 (pp. 6676–6689). https://doi.org/10.18653/v1/2023.findings-emnlp.444
Chicago author-date
Loakman, Tyler, Aaron Maladry, and Chenghua Lin. 2023. “The Iron(Ic) Melting Pot : Reviewing Human Evaluation in Humour, Irony and Sarcasm Generation.” In Findings of the Association for Computational Linguistics : EMNLP 2023, edited by Houda Bouamor, Juan Pino, and Kalika Bali, 6676–89. Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.findings-emnlp.444.
Chicago author-date (all authors)
Loakman, Tyler, Aaron Maladry, and Chenghua Lin. 2023. “The Iron(Ic) Melting Pot : Reviewing Human Evaluation in Humour, Irony and Sarcasm Generation.” In Findings of the Association for Computational Linguistics : EMNLP 2023, edited by Houda Bouamor, Juan Pino, and Kalika Bali, 6676–6689. Association for Computational Linguistics. doi:10.18653/v1/2023.findings-emnlp.444.
Vancouver
1. Loakman T, Maladry A, Lin C. The iron(ic) melting pot : reviewing human evaluation in humour, irony and sarcasm generation. In: Bouamor H, Pino J, Bali K, editors. Findings of the Association for Computational Linguistics : EMNLP 2023. Association for Computational Linguistics; 2023. p. 6676–89.
IEEE
[1] T. Loakman, A. Maladry, and C. Lin, “The iron(ic) melting pot : reviewing human evaluation in humour, irony and sarcasm generation,” in Findings of the Association for Computational Linguistics : EMNLP 2023, Singapore, 2023, pp. 6676–6689.
@inproceedings{01HSV550GJ9TW02GKEAJMNWPHM,
  abstract     = {{Human evaluation is often considered to be the gold standard method of evaluating a Natural Language Generation system. However, whilst its importance is accepted by the community at large, the quality of its execution is often brought into question. In this position paper, we argue that the generation of more esoteric forms of language - humour, irony and sarcasm - constitutes a subdomain where the characteristics of selected evaluator panels are of utmost importance, and every effort should be made to report demographic characteristics wherever possible, in the interest of transparency and replicability. We support these claims with an overview of each language form and an analysis of examples in terms of how their interpretation is affected by different participant variables. We additionally perform a critical survey of recent works in NLG to assess how well evaluation procedures are reported in this subdomain, and note a severe lack of open reporting of evaluator demographic information, and a significant reliance on crowdsourcing platforms for recruitment.}},
  author       = {{Loakman, Tyler and Maladry, Aaron and Lin, Chenghua}},
  booktitle    = {{Findings of the Association for Computational Linguistics : EMNLP 2023}},
  editor       = {{Bouamor, Houda and Pino, Juan and Bali, Kalika}},
  isbn         = {{9798891760615}},
  keywords     = {{PERCEPTIONS}},
  language     = {{eng}},
  location     = {{Singapore}},
  pages        = {{6676--6689}},
  publisher    = {{Association for Computational Linguistics}},
  title        = {{The iron(ic) melting pot : reviewing human evaluation in humour, irony and sarcasm generation}},
  url          = {{http://doi.org/10.18653/v1/2023.findings-emnlp.444}},
  year         = {{2023}},
}