La construcción del Corpus Oral y Sonoro del Español Rural - Anotado y Parseado (COSER-AP) : avances en el etiquetado de partes del discurso

Abstract
This article presents progress on the construction of the ‘Annotated and Parsed Audible Corpus of Spoken Rural Spanish’ (COSER-AP). It describes the methodology for building a treebank used to evaluate the accuracy of three state-of-the-art part-of-speech taggers: spaCy, Stanza and UDPipe. When oral data are tagged, accuracy ranges from 0.90 to 0.93, and no regional variety differs significantly in accuracy from the others. Among the grammatical categories, interjections, proper nouns, adjectives, and auxiliaries have the lowest F values. Finally, the article discusses examples of the polyfunctionality of some grammatical categories and the fuzzy boundaries between them, such as the passive participle and the ambiguity between adverbs and subordinate conjunctions, which might affect accuracy.
Keywords
Corpus, POS tagging, spoken Spanish, COSER, polyfunctionality
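
The abstract reports an overall tagging accuracy and per-category F values. As a rough illustration of how such figures are commonly computed from a gold-standard tag sequence and a tagger's output, here is a minimal sketch. The function names and the toy tag sequences are hypothetical and are not taken from the article; the article's own evaluation procedure may differ.

```python
from collections import Counter


def tagging_accuracy(gold, pred):
    """Share of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold)


def per_tag_f1(gold, pred):
    """F1 score for every tag that occurs in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # tagger proposed p where it did not belong
            fn[g] += 1  # tagger missed an instance of g
    scores = {}
    for tag in set(gold) | set(pred):
        predicted = tp[tag] + fp[tag]
        actual = tp[tag] + fn[tag]
        precision = tp[tag] / predicted if predicted else 0.0
        recall = tp[tag] / actual if actual else 0.0
        total = precision + recall
        scores[tag] = 2 * precision * recall / total if total else 0.0
    return scores


# Toy data (hypothetical): one adverb is mistagged as a subordinate conjunction.
gold_tags = ["PRON", "VERB", "ADV", "INTJ", "NOUN"]
pred_tags = ["PRON", "VERB", "SCONJ", "INTJ", "NOUN"]

accuracy = tagging_accuracy(gold_tags, pred_tags)
f1_by_tag = per_tag_f1(gold_tags, pred_tags)
```

With these toy sequences, the single ADV/SCONJ confusion lowers the overall accuracy and drives the F1 of both affected tags to zero, which is the kind of category-level effect that a per-tag breakdown makes visible.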

Downloads

  • (...).pdf: full text (Published version) | UGent only | PDF | 1.17 MB

Citation

MLA
Bonilla, Johnatan E., et al. “La Construcción Del Corpus Oral y Sonoro Del Español Rural - Anotado y Parseado (COSER-AP) : Avances En El Etiquetado de Partes Del Discurso.” REVISTA INTERNACIONAL DE LINGÜÍSTICA IBEROAMERICANA, vol. 20, no. 2(40), 2022, pp. 77–96.
APA
Bonilla, J. E., Bouzouita, M., & Segundo Diaz, R. L. (2022). La construcción del Corpus Oral y Sonoro del Español Rural - Anotado y Parseado (COSER-AP) : avances en el etiquetado de partes del discurso. REVISTA INTERNACIONAL DE LINGÜÍSTICA IBEROAMERICANA, 20(2(40)), 77–96.
Chicago author-date
Bonilla, Johnatan E., Miriam Bouzouita, and Rosa Lilia Segundo Diaz. 2022. “La Construcción Del Corpus Oral y Sonoro Del Español Rural - Anotado y Parseado (COSER-AP) : Avances En El Etiquetado de Partes Del Discurso.” REVISTA INTERNACIONAL DE LINGÜÍSTICA IBEROAMERICANA 20 (2(40)): 77–96.
Vancouver
1. Bonilla JE, Bouzouita M, Segundo Diaz RL. La construcción del Corpus Oral y Sonoro del Español Rural - Anotado y Parseado (COSER-AP) : avances en el etiquetado de partes del discurso. REVISTA INTERNACIONAL DE LINGÜÍSTICA IBEROAMERICANA. 2022;20(2(40)):77–96.
IEEE
[1] J. E. Bonilla, M. Bouzouita, and R. L. Segundo Diaz, “La construcción del Corpus Oral y Sonoro del Español Rural - Anotado y Parseado (COSER-AP) : avances en el etiquetado de partes del discurso,” REVISTA INTERNACIONAL DE LINGÜÍSTICA IBEROAMERICANA, vol. 20, no. 2(40), pp. 77–96, 2022.
BibTeX
@article{01GK9JMJF23PMN5HXG8SZYD1C4,
  abstract     = {{This article presents the advances made for the construction of the ‘Annotated and Parsed Audible Corpus of Spoken Rural Spanish’. The methodology for building a treebank to evaluate the accuracy of state-of-the-art part of speech taggers: spaCy, Stanza and UDPipe is presented. It is shown that, when oral data is tagged the accuracy is 0.90-0.93; none of the regional varieties presents a significant difference in accuracy over the others. Regarding the grammatical categories, interjections, proper nouns, adjectives, and auxiliaries have the lowest F value. Finally, some examples of the polyfunctionality of some grammatical categories and the fuzzy boundaries between them are discussed, like the passive participle, and the ambiguity between adverbs and subordinate conjunctions that might affect accuracy.}},
  author       = {{Bonilla, Johnatan E. and Bouzouita, Miriam and Segundo Diaz, Rosa Lilia}},
  issn         = {{1579-9425}},
  journal      = {{REVISTA INTERNACIONAL DE LINGÜÍSTICA IBEROAMERICANA}},
  keywords     = {{Corpus,POS tagging,spoken Spanish,COSER,polyfunctionality}},
  language     = {{spa}},
  number       = {{2(40)}},
  pages        = {{77--96}},
  title        = {{La construcción del Corpus Oral y Sonoro del Español Rural - Anotado y Parseado (COSER-AP) : avances en el etiquetado de partes del discurso}},
  volume       = {{20}},
  year         = {{2022}},
}