Crowdsourced and AI-generated age-of-acquisition (AoA) norms for vocabulary in print : extending the Kuperman et al. (2012) norms
- Author
- Clarence Green, Anthony Pak-Hin Kong, Marc Brysbaert (UGent) and Kathleen Keogh
- Abstract
- This paper revisits the age-of-acquisition (AoA) norms of Kuperman et al. (2012). Three studies were conducted. Study 1 reports a crowdsourcing 'megastudy' obtaining 790,024 estimates from participants with the age they could first read and write 11,074 early acquired words from Kuperman et al. (2012). The study aimed to differentiate between oral language receptive AoA and print-based AoA. The results correlate well with the original estimates, offering, as hypothesized, higher AoAs for reading/writing. These are released as supplements to the original norms. Study 2 explored the potential of large language models (LLMs), specifically GPT-4o, to replicate these crowdsourced AoA estimates. The findings indicated a strong correlation between AI-generated estimates and human judgments, showing the utility of AI in estimating AoA and developing norms for psycholinguistic and educational research in lieu of crowdsourcing. Study 3 leveraged AI to extend estimates to all well-known words in Kuperman et al. (2012) and the English Crowdsourcing Project (ECP). Study 3 also investigated a trained model fine-tuned on 2000 ratings from Kuperman et al. (2012). Fine-tuning increased alignment with human ratings, though comparisons with untrained models suggested that fine-tuning is not essential in English for obtaining useful AoA estimates. Both trained and untrained AI-generated norms correlated highly with human ratings and performed well in accounting for word processing times and accuracy in regressions. Uses and limitations of the AI estimates are discussed. All resources are made available in the Open Science Framework and can be used freely for research and education.
- Keywords
- Age of acquisition, Large language model, AI, Word norms, Crowdsourcing, Vocabulary, RATINGS, ENGLISH, IMAGEABILITY, WORDS, FREQUENCY, TEXT
Downloads
- Green et al 2025 AI estimates of AoA English.pdf
- full text (Published version) | open access | 4.21 MB
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-01KHGCYAPB96SQ8ZTZWYCEE4ND
- MLA
- Green, Clarence, et al. “Crowdsourced and AI-Generated Age-of-Acquisition (AoA) Norms for Vocabulary in Print : Extending the Kuperman et al. (2012) Norms.” BEHAVIOR RESEARCH METHODS, vol. 57, no. 11, 2025, doi:10.3758/s13428-025-02843-8.
- APA
- Green, C., Kong, A. P.-H., Brysbaert, M., & Keogh, K. (2025). Crowdsourced and AI-generated age-of-acquisition (AoA) norms for vocabulary in print : extending the Kuperman et al. (2012) norms. BEHAVIOR RESEARCH METHODS, 57(11). https://doi.org/10.3758/s13428-025-02843-8
- Chicago author-date
- Green, Clarence, Anthony Pak-Hin Kong, Marc Brysbaert, and Kathleen Keogh. 2025. “Crowdsourced and AI-Generated Age-of-Acquisition (AoA) Norms for Vocabulary in Print : Extending the Kuperman et al. (2012) Norms.” BEHAVIOR RESEARCH METHODS 57 (11). https://doi.org/10.3758/s13428-025-02843-8.
- Chicago author-date (all authors)
- Green, Clarence, Anthony Pak-Hin Kong, Marc Brysbaert, and Kathleen Keogh. 2025. “Crowdsourced and AI-Generated Age-of-Acquisition (AoA) Norms for Vocabulary in Print : Extending the Kuperman et al. (2012) Norms.” BEHAVIOR RESEARCH METHODS 57 (11). doi:10.3758/s13428-025-02843-8.
- Vancouver
- 1. Green C, Kong AP-H, Brysbaert M, Keogh K. Crowdsourced and AI-generated age-of-acquisition (AoA) norms for vocabulary in print : extending the Kuperman et al. (2012) norms. BEHAVIOR RESEARCH METHODS. 2025;57(11).
- IEEE
- [1] C. Green, A. P.-H. Kong, M. Brysbaert, and K. Keogh, “Crowdsourced and AI-generated age-of-acquisition (AoA) norms for vocabulary in print : extending the Kuperman et al. (2012) norms,” BEHAVIOR RESEARCH METHODS, vol. 57, no. 11, 2025.
@article{01KHGCYAPB96SQ8ZTZWYCEE4ND,
abstract = {{This paper revisits the age-of-acquisition (AoA) norms of Kuperman et al. (2012). Three studies were conducted. Study 1 reports a crowdsourcing 'megastudy' obtaining 790,024 estimates from participants with the age they could first read and write 11,074 early acquired words from Kuperman et al. (2012). The study aimed to differentiate between oral language receptive AoA and print-based AoA. The results correlate well with the original estimates, offering, as hypothesized, higher AoAs for reading/writing. These are released as supplements to the original norms. Study 2 explored the potential of large language models (LLMs), specifically GPT-4o, to replicate these crowdsourced AoA estimates. The findings indicated a strong correlation between AI-generated estimates and human judgments, showing the utility of AI in estimating AoA and developing norms for psycholinguistic and educational research in lieu of crowdsourcing. Study 3 leveraged AI to extend estimates to all well-known words in Kuperman et al. (2012) and the English Crowdsourcing Project (ECP). Study 3 also investigated a trained model fine-tuned on 2000 ratings from Kuperman et al. (2012). Fine-tuning increased alignment with human ratings, though comparisons with untrained models suggested that fine-tuning is not essential in English for obtaining useful AoA estimates. Both trained and untrained AI-generated norms correlated highly with human ratings and performed well in accounting for word processing times and accuracy in regressions. Uses and limitations of the AI estimates are discussed. All resources are made available in the Open Science Framework and can be used freely for research and education.}},
articleno = {{304}},
author = {{Green, Clarence and Kong, Anthony Pak-Hin and Brysbaert, Marc and Keogh, Kathleen}},
issn = {{1554-351X}},
journal = {{BEHAVIOR RESEARCH METHODS}},
keywords = {{Age of acquisition,Large language model,AI,Word norms,Crowdsourcing,Vocabulary,RATINGS,ENGLISH,IMAGEABILITY,WORDS,FREQUENCY,TEXT}},
language = {{eng}},
number = {{11}},
pages = {{27}},
title = {{Crowdsourced and AI-generated age-of-acquisition (AoA) norms for vocabulary in print : extending the Kuperman et al. (2012) norms}},
url = {{http://doi.org/10.3758/s13428-025-02843-8}},
volume = {{57}},
year = {{2025}},
}