Ghent University Academic Bibliography

SUBTLEX-CH: Chinese word and character frequencies based on film subtitles

Qing Cai and Marc Brysbaert (UGent) (2010) PLOS ONE. 5(6).
abstract
Background: Word frequency is the most important variable in language research. However, despite the growing interest in the Chinese language, there are only a few sources of word frequency available to researchers, and the quality is less than what researchers in other languages are used to. Methodology: Following recent work by New, Brysbaert, and colleagues in English, French and Dutch, we assembled a database of word and character frequencies based on a corpus of film and television subtitles (46.8 million characters, 33.5 million words). In line with what has been found in the other languages, the new word and character frequencies explain significantly more of the variance in Chinese word naming and lexical decision performance than measures based on written texts. Conclusions: Our results confirm that word frequencies based on subtitles are a good estimate of daily language exposure and capture much of the variance in word processing efficiency. In addition, our database is the first to include information about the contextual diversity of the words and to provide good frequency estimates for multi-character words and the different syntactic roles in which the words are used. The word frequencies are freely available for research purposes.
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-1045459
author
Qing Cai and Marc Brysbaert
organization
year
2010
type
journalArticle (original)
publication status
published
subject
keyword
LEXICON, NORMS, ENGLISH
journal title
PLOS ONE
PLoS One
volume
5
issue
6
article number
e10729
Web of Science type
Article
Web of Science id
000278284700001
JCR category
BIOLOGY
JCR impact factor
4.411 (2010)
JCR rank
12/84 (2010)
JCR quartile
1 (2010)
ISSN
1932-6203
DOI
10.1371/journal.pone.0010729
language
English
UGent publication?
yes
classification
A1
copyright statement
I have retained and own the full copyright for this publication
id
1045459
handle
http://hdl.handle.net/1854/LU-1045459
date created
2010-09-24 10:11:10
date last changed
2016-12-21 15:42:16
@article{1045459,
  abstract     = {Background: Word frequency is the most important variable in language research. However, despite the growing interest in the Chinese language, there are only a few sources of word frequency available to researchers, and the quality is less than what researchers in other languages are used to. Methodology: Following recent work by New, Brysbaert, and colleagues in English, French and Dutch, we assembled a database of word and character frequencies based on a corpus of film and television subtitles (46.8 million characters, 33.5 million words). In line with what has been found in the other languages, the new word and character frequencies explain significantly more of the variance in Chinese word naming and lexical decision performance than measures based on written texts. Conclusions: Our results confirm that word frequencies based on subtitles are a good estimate of daily language exposure and capture much of the variance in word processing efficiency. In addition, our database is the first to include information about the contextual diversity of the words and to provide good frequency estimates for multi-character words and the different syntactic roles in which the words are used. The word frequencies are freely available for research purposes.},
  articleno    = {e10729},
  author       = {Cai, Qing and Brysbaert, Marc},
  issn         = {1932-6203},
  journal      = {PLOS ONE},
  keyword      = {LEXICON,NORMS,ENGLISH},
  language     = {eng},
  number       = {6},
  title        = {SUBTLEX-CH: Chinese word and character frequencies based on film subtitles},
  url          = {http://dx.doi.org/10.1371/journal.pone.0010729},
  volume       = {5},
  year         = {2010},
}

Chicago
Cai, Qing, and Marc Brysbaert. 2010. “SUBTLEX-CH: Chinese Word and Character Frequencies Based on Film Subtitles.” PLOS ONE 5 (6).
APA
Cai, Q., & Brysbaert, M. (2010). SUBTLEX-CH: Chinese word and character frequencies based on film subtitles. PLOS ONE, 5(6).
Vancouver
1. Cai Q, Brysbaert M. SUBTLEX-CH: Chinese word and character frequencies based on film subtitles. PLOS ONE. 2010;5(6).
MLA
Cai, Qing, and Marc Brysbaert. “SUBTLEX-CH: Chinese Word and Character Frequencies Based on Film Subtitles.” PLOS ONE 5.6 (2010): n. pag. Print.