
Querying large treebanks: benchmarking GrETEL indexing

Author
Bram Vanroy, Vincent Vandeghinste, Liesbeth Augustinus
Abstract
The amount of data that is available for research grows rapidly, yet technology to efficiently interpret and excavate these data lags behind. For instance, when using large treebanks for linguistic research, the speed of a query leaves much to be desired. GrETEL Indexing, or GrInding, tackles this issue. The idea behind GrInding is to pre-process the treebank at hand so that the search space is as small as possible before the actual treebank search starts. We recursively divide the treebank into smaller parts, called subtree-banks, which are then converted into database files. All subtree-banks are organized according to their linguistic dependency pattern, and labeled as such. Additionally, general patterns are linked to more specific ones. By doing so, we create millions of databases, and given a linguistic structure, we know in which databases that structure can occur, leading to a significant efficiency boost. We present the results of a benchmark experiment, testing the effect of the GrInding procedure on the SoNaR-500 treebank.
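The abstract describes GrInding only at a high level. Below is a minimal Python sketch of the general idea under our own simplifying assumptions; it is not the authors' implementation. We assume each tree is a toy `Node` carrying a dependency relation and a category, that a subtree-bank is keyed by the sorted (relation, category) pairs of a node's daughters, and that in-memory sets of sentence IDs stand in for the database files. Where the abstract mentions precomputed links from general to specific patterns, this sketch simply scans all bank keys for ones that contain the query pattern. All names (`Node`, `build_index`, `candidate_banks`, `candidates`, `TREEBANK`) are hypothetical.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from dataclasses import dataclass, field

# Hypothetical pattern key: sorted (relation, category) pairs of a node's daughters.
Pattern = tuple[tuple[str, str], ...]


@dataclass
class Node:
    rel: str                                   # dependency relation to the parent, e.g. "su"
    cat: str                                   # category or POS tag, e.g. "np"
    children: list[Node] = field(default_factory=list)


def pattern_of(node: Node) -> Pattern:
    """Order-independent label for the subtree-bank a node belongs to."""
    return tuple(sorted((c.rel, c.cat) for c in node.children))


def build_index(treebank: dict[str, Node]) -> dict[Pattern, set[str]]:
    """Split the treebank into subtree-banks, one per daughter pattern.

    Every non-terminal node, at any depth, adds its sentence ID to the bank
    labelled with that node's pattern (sets stand in for database files).
    """
    index: dict[Pattern, set[str]] = defaultdict(set)
    for sent_id, root in treebank.items():
        stack = [root]
        while stack:
            node = stack.pop()
            if node.children:
                index[pattern_of(node)].add(sent_id)
                stack.extend(node.children)
    return dict(index)


def contains(specific: Pattern, general: Pattern) -> bool:
    """True if `general` is a sub-multiset of `specific`."""
    have = Counter(specific)
    return all(have[pair] >= n for pair, n in Counter(general).items())


def candidate_banks(index: dict[Pattern, set[str]], query_pairs) -> list[Pattern]:
    """All bank keys at least as specific as the query pattern (linear scan)."""
    query = tuple(sorted(query_pairs))
    return [key for key in index if contains(key, query)]


def candidates(index: dict[Pattern, set[str]], query_pairs) -> set[str]:
    """Union of sentence IDs from every bank that can hold the query pattern."""
    result: set[str] = set()
    for key in candidate_banks(index, query_pairs):
        result |= index[key]
    return result


# Toy treebank with two sentences (illustrative data only).
TREEBANK = {
    "s1": Node("--", "smain", [
        Node("su", "np", [Node("det", "det"), Node("hd", "noun")]),
        Node("hd", "verb"),
        Node("obj1", "np", [Node("hd", "noun")]),
    ]),
    "s2": Node("--", "smain", [
        Node("su", "np", [Node("hd", "pron")]),
        Node("hd", "verb"),
    ]),
}

if __name__ == "__main__":
    index = build_index(TREEBANK)
    print(len(index))  # 5 subtree-banks
    print(sorted(candidates(index, [("su", "np"), ("hd", "verb")])))  # ['s1', 's2']
    print(sorted(candidates(index, [("obj1", "np")])))  # ['s1']
```

The candidate set only narrows the search space: a full structural query would still have to be evaluated over the sentences in the selected banks, since deeper parts of the query structure are not captured by a single daughter pattern.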
Keywords
treebank querying, computational linguistics, treebanks, corpora, lt3

Downloads

  • 10.querying-large-treebanks[1].pdf: full text (Published version), open access, PDF, 627.54 KB

Citation

Please use this url to cite or link to this publication:

MLA
Vanroy, Bram, et al. “Querying Large Treebanks: Benchmarking GrETEL Indexing.” Computational Linguistics in the Netherlands Journal, vol. 7, 2017, pp. 145–66.
APA
Vanroy, B., Vandeghinste, V., & Augustinus, L. (2017). Querying large treebanks: Benchmarking GrETEL indexing. Computational Linguistics in the Netherlands Journal, 7, 145–166.
Chicago author-date
Vanroy, Bram, Vincent Vandeghinste, and Liesbeth Augustinus. 2017. “Querying Large Treebanks: Benchmarking GrETEL Indexing.” Computational Linguistics in the Netherlands Journal 7: 145–66.
Chicago author-date (all authors)
Vanroy, Bram, Vincent Vandeghinste, and Liesbeth Augustinus. 2017. “Querying Large Treebanks: Benchmarking GrETEL Indexing.” Computational Linguistics in the Netherlands Journal 7: 145–166.
Vancouver
1. Vanroy B, Vandeghinste V, Augustinus L. Querying large treebanks: benchmarking GrETEL indexing. Computational Linguistics in the Netherlands Journal. 2017;7:145–66.
IEEE
[1] B. Vanroy, V. Vandeghinste, and L. Augustinus, “Querying large treebanks: benchmarking GrETEL indexing,” Computational Linguistics in the Netherlands Journal, vol. 7, pp. 145–166, 2017.
@article{8534144,
  abstract     = {{The amount of data that is available for research grows rapidly, yet technology to efficiently interpret and excavate these data lags behind. For instance, when using large treebanks for linguistic research, the speed of a query leaves much to be desired. GrETEL Indexing, or GrInding, tackles this issue. The idea behind GrInding is to pre-process the treebank at hand so that the search space is as small as possible before the actual treebank search starts. We recursively divide the treebank into smaller parts, called subtree-banks, which are then converted into database files. All subtree-banks are organized according to their linguistic dependency pattern, and labeled as such. Additionally, general patterns are linked to more specific ones. By doing so, we create millions of databases, and given a linguistic structure, we know in which databases that structure can occur, leading to a significant efficiency boost. We present the results of a benchmark experiment, testing the effect of the GrInding procedure on the SoNaR-500 treebank.}},
  author       = {{Vanroy, Bram and Vandeghinste, Vincent and Augustinus, Liesbeth}},
  issn         = {{2211-4009}},
  journal      = {{Computational Linguistics in the Netherlands Journal}},
  keywords     = {{treebank querying,computational linguistics,treebanks,corpora,lt3}},
  language     = {{eng}},
  location     = {{Leuven, Belgium}},
  pages        = {{145--166}},
  title        = {{Querying large treebanks: benchmarking GrETEL indexing}},
  url          = {{https://www.clinjournal.org/clinj/article/view/75}},
  volume       = {{7}},
  year         = {{2017}},
}