Word knowledge in the crowd: measuring vocabulary size and word prevalence in a massive online experiment

Emmanuel Keuleers (UGent), Michaël Stevens (UGent), Pawel Mandera (UGent) and Marc Brysbaert (UGent)
Abstract
We use the results of a large online experiment on word knowledge in Dutch to investigate variables influencing vocabulary size in a large population and to examine the effect of word prevalence (the percentage of a population knowing a word) as a measure of word occurrence. Nearly 300,000 participants were presented with about 70 word stimuli (selected from a list of 53,000 words) in an adapted lexical decision task. We identify age, education and multilingualism as the most important factors influencing vocabulary size. The results suggest that the accumulation of vocabulary throughout life and in multiple languages mirrors the logarithmic growth of number of types with number of tokens observed in text corpora (Herdan's law). Moreover, the vocabulary that multilinguals acquire in related languages seems to increase their L1 vocabulary size and outweighs the loss caused by decreased exposure to L1. In addition, we show that corpus word frequency and prevalence are complementary measures of word occurrence covering a broad range of language experiences. Prevalence is shown to be the strongest independent predictor of word processing times in the Dutch Lexicon Project, making it an important variable for psycholinguistic research.
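As an illustration of the prevalence measure defined in the abstract, the following minimal sketch computes, for each word, the percentage of judgments in which the word was marked as known. The function name `word_prevalence`, the `(word, known)` input layout, and the sample data are assumptions made for this example only; they are not taken from the study, whose actual scoring procedure may differ (for instance, by correcting for guessing).

```python
from collections import defaultdict


def word_prevalence(responses):
    """Return {word: percentage of judgments marking the word as known}.

    `responses` is an iterable of (word, known) pairs, one pair per
    participant judgment. This layout is a hypothetical simplification,
    not the data format used in the study.
    """
    known = defaultdict(int)
    total = defaultdict(int)
    for word, is_known in responses:
        total[word] += 1
        known[word] += int(bool(is_known))
    return {word: 100.0 * known[word] / total[word] for word in total}


# Made-up judgments for two Dutch words (illustrative only).
sample = [
    ("fiets", True), ("fiets", True), ("fiets", True), ("fiets", False),
    ("zwoerd", True), ("zwoerd", False),
]
prevalence = word_prevalence(sample)
```

In this toy sample, three of four judgments mark "fiets" as known and one of two marks "zwoerd" as known, so the two words would receive different prevalence values even if their corpus frequencies happened to be similar.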
Keywords
FREQUENCY, ENGLISH LEXICON PROJECT, RECOGNITION, RATINGS, MEMORY, LEMMAS, NORMS, Prevalence, Herdan's law, Bilingualism, Frequency, Ageing, Crowdsourcing

Citation

MLA
Keuleers, Emmanuel et al. “Word Knowledge in the Crowd: Measuring Vocabulary Size and Word Prevalence in a Massive Online Experiment.” QUARTERLY JOURNAL OF EXPERIMENTAL PSYCHOLOGY 68.8 (2015): 1665–1692. Print.
APA
Keuleers, E., Stevens, M., Mandera, P., & Brysbaert, M. (2015). Word knowledge in the crowd: measuring vocabulary size and word prevalence in a massive online experiment. QUARTERLY JOURNAL OF EXPERIMENTAL PSYCHOLOGY, 68(8), 1665–1692.
Chicago author-date
Keuleers, Emmanuel, Michaël Stevens, Pawel Mandera, and Marc Brysbaert. 2015. “Word Knowledge in the Crowd: Measuring Vocabulary Size and Word Prevalence in a Massive Online Experiment.” Quarterly Journal of Experimental Psychology 68 (8): 1665–1692.
Vancouver
1. Keuleers E, Stevens M, Mandera P, Brysbaert M. Word knowledge in the crowd: measuring vocabulary size and word prevalence in a massive online experiment. QUARTERLY JOURNAL OF EXPERIMENTAL PSYCHOLOGY. 2015;68(8):1665–92.
IEEE
[1] E. Keuleers, M. Stevens, P. Mandera, and M. Brysbaert, “Word knowledge in the crowd: measuring vocabulary size and word prevalence in a massive online experiment,” QUARTERLY JOURNAL OF EXPERIMENTAL PSYCHOLOGY, vol. 68, no. 8, pp. 1665–1692, 2015.
@article{5878579,
  abstract     = {We use the results of a large online experiment on word knowledge in Dutch to investigate variables influencing vocabulary size in a large population and to examine the effect of word prevalence -the percentage of a population knowing a word- as a measure of word occurrence. Nearly 300,000 participants were presented with about 70 word stimuli (selected from a list of 53,000 words) in an adapted lexical decision task. We identify age, education and multilingualism as the most important factors influencing vocabulary size. The results suggest that the accumulation of vocabulary throughout life and in multiple languages mirrors the logarithmic growth of number of types with number of tokens observed in text corpora (Herdan's law). Moreover, the vocabulary that multilinguals acquire in related languages seems to increase their L1 vocabulary size and outweighs the loss caused by decreased exposure to L1. In addition, we show that corpus word frequency and prevalence are complementary measures of word occurrence covering a broad range of language experiences. Prevalence is shown to be the strongest independent predictor of word processing times in the Dutch Lexicon Project, making it an important variable for psycholinguistic research.},
  author       = {Keuleers, Emmanuel and Stevens, Michaël and Mandera, Pawel and Brysbaert, Marc},
  issn         = {1747-0218},
  journal      = {QUARTERLY JOURNAL OF EXPERIMENTAL PSYCHOLOGY},
  keywords     = {FREQUENCY,ENGLISH LEXICON PROJECT,RECOGNITION,RATINGS,MEMORY,LEMMAS,NORMS,Prevalence,Herdan's law,Bilingualism,Frequency,Ageing,Crowdsourcing},
  language     = {eng},
  number       = {8},
  pages        = {1665--1692},
  title        = {Word knowledge in the crowd: measuring vocabulary size and word prevalence in a massive online experiment},
  url          = {http://dx.doi.org/10.1080/17470218.2015.1022560},
  volume       = {68},
  year         = {2015},
}
