Abstract
The Iranic language family includes many underrepresented languages and dialects that remain largely unexplored in modern NLP research. We introduce APARSIN, a multi-variety benchmark covering 14 Iranic languages, dialects, and accents, designed for sentiment analysis and machine translation. The dataset includes both high and low-resource varieties, several of which are endangered, capturing linguistic variation across them. We evaluate a set of instruction-tuned Large Language Models (LLMs) on these tasks and analyze their performance across the varieties. Our results highlight substantial performance gaps between standard Persian and other Iranic languages and dialects, demonstrating the need for more inclusive multilingual and dialectally diverse NLP benchmarks.

Downloads

  • (...).pdf: full text (Published version) | UGent only | PDF | 2.23 MB

Citation

MLA
Jafari, Sadegh, et al. “APARSIN: A Multi-Variety Sentiment and Translation Benchmark for Iranic Languages.” The Proceedings of the First Workshop on NLP and LLMs for the Iranian Language Family, Association for Computational Linguistics (ACL), 2026, pp. 83–97, doi:10.18653/v1/2026.silkroadnlp-1.9.
APA
Jafari, S., Azin, T., Roodi, F., Dehghani Tafti, Z., Ghadrdan, M., Vatankhahan Esfahani, E., … Hoste, V. (2026). APARSIN: a multi-variety sentiment and translation benchmark for Iranic languages. The Proceedings of the First Workshop on NLP and LLMs for the Iranian Language Family, 83–97. https://doi.org/10.18653/v1/2026.silkroadnlp-1.9
Chicago author-date
Jafari, Sadegh, Tara Azin, Farhad Roodi, Zahra Dehghani Tafti, Mehrdad Ghadrdan, Elham Vatankhahan Esfahani, Aylin Naebzadeh, et al. 2026. “APARSIN: A Multi-Variety Sentiment and Translation Benchmark for Iranic Languages.” In The Proceedings of the First Workshop on NLP and LLMs for the Iranian Language Family, 83–97. Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2026.silkroadnlp-1.9.
Chicago author-date (all authors)
Jafari, Sadegh, Tara Azin, Farhad Roodi, Zahra Dehghani Tafti, Mehrdad Ghadrdan, Elham Vatankhahan Esfahani, Aylin Naebzadeh, Mohammadhadi Shahhosseini, Ghafoor Khan, Kazem Forghani Forghani, Danial Namazi, Mohammad Hossein Hashemi, Farhan Farsi, Mohammad Osoolian, Maede Mohammadi, Mohammad Erfan Zare, Muhammad Hasnain Khan, Muhammad Hussain, Nooreen Zaki, Joma Mohammadi, Shayan Bali, Mohammad Javad Ranjbar, Els Lefever, and Veronique Hoste. 2026. “APARSIN: A Multi-Variety Sentiment and Translation Benchmark for Iranic Languages.” In The Proceedings of the First Workshop on NLP and LLMs for the Iranian Language Family, 83–97. Association for Computational Linguistics (ACL). doi:10.18653/v1/2026.silkroadnlp-1.9.
Vancouver
1. Jafari S, Azin T, Roodi F, Dehghani Tafti Z, Ghadrdan M, Vatankhahan Esfahani E, et al. APARSIN: a multi-variety sentiment and translation benchmark for Iranic languages. In: The Proceedings of the First Workshop on NLP and LLMs for the Iranian Language Family. Association for Computational Linguistics (ACL); 2026. p. 83–97.
IEEE
[1] S. Jafari et al., “APARSIN: a multi-variety sentiment and translation benchmark for Iranic languages,” in The Proceedings of the First Workshop on NLP and LLMs for the Iranian Language Family, Rabat, Morocco, 2026, pp. 83–97.
@inproceedings{01KJ5BTHMHWYPEH5GGFFWJYP7W,
  abstract     = {{The Iranic language family includes many underrepresented languages and dialects that remain largely unexplored in modern NLP research. We introduce APARSIN, a multi-variety benchmark covering 14 Iranic languages, dialects, and accents, designed for sentiment analysis and machine translation. The dataset includes both high and low-resource varieties, several of which are endangered, capturing linguistic variation across them. We evaluate a set of instruction-tuned Large Language Models (LLMs) on these tasks and analyze their performance across the varieties. Our results highlight substantial performance gaps between standard Persian and other Iranic languages and dialects, demonstrating the need for more inclusive multilingual and dialectally diverse NLP benchmarks.}},
  author       = {{Jafari, Sadegh and Azin, Tara and Roodi, Farhad and Dehghani Tafti, Zahra and Ghadrdan, Mehrdad and Vatankhahan Esfahani, Elham and Naebzadeh, Aylin and Shahhosseini, Mohammadhadi and Khan, Ghafoor and Forghani, Kazem Forghani and Namazi, Danial and Hossein Hashemi, Mohammad and Farsi, Farhan and Osoolian, Mohammad and Mohammadi, Maede and Erfan Zare, Mohammad and Hasnain Khan, Muhammad and Hussain, Muhammad and Zaki, Nooreen and Mohammadi, Joma and Bali, Shayan and Javad Ranjbar, Mohammad and Lefever, Els and Hoste, Veronique}},
  booktitle    = {{The Proceedings of the First Workshop on NLP and LLMs for the Iranian Language Family}},
  isbn         = {{9798891763715}},
  language     = {{eng}},
  location     = {{Rabat, Morocco}},
  pages        = {{83--97}},
  publisher    = {{Association for Computational Linguistics (ACL)}},
  title        = {{APARSIN: a multi-variety sentiment and translation benchmark for Iranic languages}},
  url          = {{https://doi.org/10.18653/v1/2026.silkroadnlp-1.9}},
  year         = {{2026}},
}
