
Triple storage for random-access versioned querying of RDF archives

Ruben Taelman (UGent), Miel Vander Sande (UGent), Joachim Van Herwegen (UGent), Erik Mannens (UGent) and Ruben Verborgh (UGent)
Abstract
When publishing Linked Open Datasets on the Web, most attention is typically directed to their latest version. Nevertheless, useful information is present in or between previous versions. In order to exploit this historical information in dataset analysis, we can maintain history in RDF archives. Existing approaches either require much storage space, or they expose an insufficiently expressive or efficient interface with respect to querying demands. In this article, we introduce an RDF archive indexing technique that is able to store datasets with a low storage overhead, by compressing consecutive versions and adding metadata for reducing lookup times. We introduce algorithms based on this technique for efficiently evaluating queries at a certain version, between any two versions, and for versions. Using the BEAR RDF archiving benchmark, we evaluate our implementation, called OSTRICH. Results show that OSTRICH introduces a new tradeoff regarding storage space, ingestion time, and querying efficiency. By processing and storing more metadata during ingestion time, it significantly lowers the average lookup time for versioning queries. OSTRICH performs better for many smaller dataset versions than for few larger dataset versions. Furthermore, it enables efficient offsets in query result streams, which facilitates random access in results. Our storage technique reduces query evaluation time for versioned queries through a preprocessing step during ingestion, which only in some cases increases storage space when compared to other approaches. This allows data owners to store and query multiple versions of their dataset efficiently, lowering the barrier to historical dataset publication and analysis. (C) 2018 Elsevier B.V. All rights reserved.
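The three query types named in the abstract (at a certain version, between two versions, and for versions) can be illustrated with a deliberately naive sketch. This is not OSTRICH and does not reflect its indexing technique; `ToyArchive`, `ingest`, `at`, `between`, and `versions` are hypothetical names chosen for illustration, and the store simply keeps one snapshot plus a changeset per later version.

```python
def _matches(triple, pattern):
    """A pattern is a (subject, predicate, object) tuple; None is a wildcard."""
    return all(p is None or p == t for p, t in zip(pattern, triple))


class ToyArchive:
    """Hypothetical toy archive: an initial snapshot plus per-version changesets."""

    def __init__(self, snapshot):
        self.snapshot = frozenset(snapshot)  # version 0
        self.deltas = []  # deltas[i] turns version i into version i + 1

    def ingest(self, additions=(), deletions=()):
        """Append a new version and return its version number."""
        self.deltas.append((frozenset(additions), frozenset(deletions)))
        return len(self.deltas)

    def at(self, version, pattern=(None, None, None), offset=0):
        """Query at a certain version: replay changesets, then match."""
        triples = set(self.snapshot)
        for additions, deletions in self.deltas[:version]:
            triples = (triples - deletions) | additions
        matches = sorted(t for t in triples if _matches(t, pattern))
        return matches[offset:]  # naive offset: computed after a full scan

    def between(self, start, end, pattern=(None, None, None)):
        """Query between two versions: (added, removed) matching triples."""
        before = set(self.at(start, pattern))
        after = set(self.at(end, pattern))
        return sorted(after - before), sorted(before - after)

    def versions(self, pattern=(None, None, None)):
        """Query for versions: each matching triple with the versions holding it."""
        result = {}
        for version in range(len(self.deltas) + 1):
            for triple in self.at(version, pattern):
                result.setdefault(triple, []).append(version)
        return result


archive = ToyArchive([("a", "knows", "b")])
archive.ingest(additions=[("a", "knows", "c")])  # version 1
archive.ingest(deletions=[("a", "knows", "b")])  # version 2
print(archive.at(1))
print(archive.between(0, 2))
print(archive.versions(("a", None, None)))
```

Every call in this sketch replays all changesets and scans all triples, so lookup cost grows with the number of versions and an offset saves no work. Those are the costs that, according to the abstract, the article's approach reduces by storing additional metadata during ingestion.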
Keywords
SEMANTIC WEB, DBPEDIA, Linked data, RDF archiving, Semantic data versioning, Storage, Indexing

Citation

Chicago
Taelman, Ruben, Miel Vander Sande, Joachim Van Herwegen, Erik Mannens, and Ruben Verborgh. 2019. “Triple Storage for Random-access Versioned Querying of RDF Archives.” Journal of Web Semantics 54: 4–28.
APA
Taelman, R., Vander Sande, M., Van Herwegen, J., Mannens, E., & Verborgh, R. (2019). Triple storage for random-access versioned querying of RDF archives. Journal of Web Semantics, 54, 4–28.
Vancouver
1. Taelman R, Vander Sande M, Van Herwegen J, Mannens E, Verborgh R. Triple storage for random-access versioned querying of RDF archives. Journal of Web Semantics. Amsterdam: Elsevier Science Bv; 2019;54:4–28.
MLA
Taelman, Ruben et al. “Triple Storage for Random-access Versioned Querying of RDF Archives.” Journal of Web Semantics 54 (2019): 4–28. Print.
BibTeX
@article{8607531,
  abstract     = {When publishing Linked Open Datasets on the Web, most attention is typically directed to their latest version. Nevertheless, useful information is present in or between previous versions. In order to exploit this historical information in dataset analysis, we can maintain history in RDF archives. Existing approaches either require much storage space, or they expose an insufficiently expressive or efficient interface with respect to querying demands. In this article, we introduce an RDF archive indexing technique that is able to store datasets with a low storage overhead, by compressing consecutive versions and adding metadata for reducing lookup times. We introduce algorithms based on this technique for efficiently evaluating queries at a certain version, between any two versions, and for versions. Using the BEAR RDF archiving benchmark, we evaluate our implementation, called OSTRICH. Results show that OSTRICH introduces a new tradeoff regarding storage space, ingestion time, and querying efficiency. By processing and storing more metadata during ingestion time, it significantly lowers the average lookup time for versioning queries. OSTRICH performs better for many smaller dataset versions than for few larger dataset versions. Furthermore, it enables efficient offsets in query result streams, which facilitates random access in results. Our storage technique reduces query evaluation time for versioned queries through a preprocessing step during ingestion, which only in some cases increases storage space when compared to other approaches. This allows data owners to store and query multiple versions of their dataset efficiently, lowering the barrier to historical dataset publication and analysis. (C) 2018 Elsevier B.V. All rights reserved.},
  author       = {Taelman, Ruben and Vander Sande, Miel and Van Herwegen, Joachim and Mannens, Erik and Verborgh, Ruben},
  issn         = {1570-8268},
  journal      = {JOURNAL OF WEB SEMANTICS},
  language     = {eng},
  pages        = {4--28},
  publisher    = {Elsevier Science Bv},
  title        = {Triple storage for random-access versioned querying of RDF archives},
  url          = {http://dx.doi.org/10.1016/j.websem.2018.08.001},
  volume       = {54},
  year         = {2019},
}
