Ghent University Academic Bibliography

Towards an improved methodology for automated readability prediction

Philip van Oosten (UGent), Dries Tanghe (UGent) and Veronique Hoste (UGent) (2010) LREC 2010 : seventh conference on international language resources and evaluation. p. 775-782
abstract
Since the first half of the 20th century, readability formulas have been widely employed to automatically predict the readability of an unseen text. In this article, the formulas and the text characteristics they are composed of are evaluated in the context of large Dutch and English corpora. We describe the behaviour of the formulas and the text characteristics by means of correlation matrices and a principal component analysis, and test the methodological validity of the formulas by means of collinearity tests. Both the correlation matrices and the principal component analysis show that the formulas described in this paper strongly correspond, regardless of the language for which they were designed. Furthermore, the collinearity test reveals shortcomings in the methodology that was used to create some of the existing readability formulas. All of this leads us to conclude that a new readability prediction method is needed. We finally make suggestions to come to a cleaner methodology and present web applications that will help us collect data to compile a new gold standard for readability prediction.
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-1055826
author
Philip van Oosten, Dries Tanghe and Veronique Hoste
year
2010
type
conference
publication status
published
in
LREC 2010 : seventh conference on international language resources and evaluation
editor
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner and Daniel Tapias
pages
775 - 782
publisher
European Language Resources Association (ELRA)
place of publication
Paris, France
conference name
7th Conference on International Language Resources and Evaluation (LREC 2010)
conference location
Valletta, Malta
conference start
2010-05-19
conference end
2010-05-21
ISBN
9782951740860
language
English
UGent publication?
no
classification
C1
copyright statement
I have transferred the copyright for this publication to the publisher
id
1055826
handle
http://hdl.handle.net/1854/LU-1055826
alternative location
http://www.lrec-conf.org/proceedings/lrec2010/pdf/286_Paper.pdf
date created
2010-10-08 13:25:13
date last changed
2017-01-02 09:52:19
@inproceedings{1055826,
  abstract     = {Since the first half of the 20th century, readability formulas have been widely employed to automatically predict the readability of an unseen text. In this article, the formulas and the text characteristics they are composed of are evaluated in the context of large Dutch and English corpora. We describe the behaviour of the formulas and the text characteristics by means of correlation matrices and a principal component analysis, and test the methodological validity of the formulas by means of collinearity tests. Both the correlation matrices and the principal component analysis show that the formulas described in this paper strongly correspond, regardless of the language for which they were designed. Furthermore, the collinearity test reveals shortcomings in the methodology that was used to create some of the existing readability formulas. All of this leads us to conclude that a new readability prediction method is needed. We finally make suggestions to come to a cleaner methodology and present web applications that will help us collect data to compile a new gold standard for readability prediction.},
  author       = {van Oosten, Philip and Tanghe, Dries and Hoste, Veronique},
  booktitle    = {LREC 2010 : seventh conference on international language resources and evaluation},
  editor       = {Calzolari, Nicoletta and Choukri, Khalid and Maegaard, Bente and Mariani, Joseph and Odijk, Jan and Piperidis, Stelios and Rosner, Mike and Tapias, Daniel},
  isbn         = {9782951740860},
  language     = {eng},
  location     = {Valletta, Malta},
  pages        = {775--782},
  publisher    = {European Language Resources Association (ELRA)},
  title        = {Towards an improved methodology for automated readability prediction},
  url          = {http://www.lrec-conf.org/proceedings/lrec2010/pdf/286\_Paper.pdf},
  year         = {2010},
}

Chicago
van Oosten, Philip, Dries Tanghe, and Veronique Hoste. 2010. “Towards an Improved Methodology for Automated Readability Prediction.” In LREC 2010 : Seventh Conference on International Language Resources and Evaluation, ed. Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, and Daniel Tapias, 775–782. Paris, France: European Language Resources Association (ELRA).
APA
van Oosten, P., Tanghe, D., & Hoste, V. (2010). Towards an improved methodology for automated readability prediction. In N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, M. Rosner, et al. (Eds.), LREC 2010 : seventh conference on international language resources and evaluation (pp. 775–782). Presented at the 7th Conference on International Language Resources and Evaluation (LREC 2010), Paris, France: European Language Resources Association (ELRA).
Vancouver
1. van Oosten P, Tanghe D, Hoste V. Towards an improved methodology for automated readability prediction. In: Calzolari N, Choukri K, Maegaard B, Mariani J, Odijk J, Piperidis S, et al., editors. LREC 2010 : seventh conference on international language resources and evaluation. Paris, France: European Language Resources Association (ELRA); 2010. p. 775–82.
MLA
van Oosten, Philip, Dries Tanghe, and Veronique Hoste. “Towards an Improved Methodology for Automated Readability Prediction.” LREC 2010 : Seventh Conference on International Language Resources and Evaluation. Ed. Nicoletta Calzolari et al. Paris, France: European Language Resources Association (ELRA), 2010. 775–782. Print.