
Triple storage for random-access versioned querying of RDF archives
- Author
- Ruben Taelman (UGent), Miel Vander Sande (UGent), Joachim Van Herwegen (UGent), Erik Mannens (UGent) and Ruben Verborgh (UGent)
- Abstract
- When publishing Linked Open Datasets on the Web, most attention is typically directed to their latest version. Nevertheless, useful information is present in or between previous versions. In order to exploit this historical information in dataset analysis, we can maintain history in RDF archives. Existing approaches either require much storage space, or they expose an insufficiently expressive or efficient interface with respect to querying demands. In this article, we introduce an RDF archive indexing technique that is able to store datasets with a low storage overhead, by compressing consecutive versions and adding metadata for reducing lookup times. We introduce algorithms based on this technique for efficiently evaluating queries at a certain version, between any two versions, and for versions. Using the BEAR RDF archiving benchmark, we evaluate our implementation, called OSTRICH. Results show that OSTRICH introduces a new tradeoff regarding storage space, ingestion time, and querying efficiency. By processing and storing more metadata during ingestion time, it significantly lowers the average lookup time for versioning queries. OSTRICH performs better for many smaller dataset versions than for few larger dataset versions. Furthermore, it enables efficient offsets in query result streams, which facilitates random access in results. Our storage technique reduces query evaluation time for versioned queries through a preprocessing step during ingestion, which only in some cases increases storage space when compared to other approaches. This allows data owners to store and query multiple versions of their dataset efficiently, lowering the barrier to historical dataset publication and analysis. (C) 2018 Elsevier B.V. All rights reserved.
- Keywords
- SEMANTIC WEB, DBPEDIA, Linked data, RDF archiving, Semantic data versioning, Storage, Indexing
Downloads
- (...).pdf: full text, UGent only, 2.08 MB
Citation
Please use this URL to cite or link to this publication: http://hdl.handle.net/1854/LU-8607531
- MLA
- Taelman, Ruben, et al. “Triple Storage for Random-Access Versioned Querying of RDF Archives.” JOURNAL OF WEB SEMANTICS, vol. 54, Elsevier Science Bv, 2019, pp. 4–28, doi:10.1016/j.websem.2018.08.001.
- APA
- Taelman, R., Vander Sande, M., Van Herwegen, J., Mannens, E., & Verborgh, R. (2019). Triple storage for random-access versioned querying of RDF archives. JOURNAL OF WEB SEMANTICS, 54, 4–28. https://doi.org/10.1016/j.websem.2018.08.001
- Chicago author-date
- Taelman, Ruben, Miel Vander Sande, Joachim Van Herwegen, Erik Mannens, and Ruben Verborgh. 2019. “Triple Storage for Random-Access Versioned Querying of RDF Archives.” JOURNAL OF WEB SEMANTICS 54: 4–28. https://doi.org/10.1016/j.websem.2018.08.001.
- Chicago author-date (all authors)
- Taelman, Ruben, Miel Vander Sande, Joachim Van Herwegen, Erik Mannens, and Ruben Verborgh. 2019. “Triple Storage for Random-Access Versioned Querying of RDF Archives.” JOURNAL OF WEB SEMANTICS 54: 4–28. doi:10.1016/j.websem.2018.08.001.
- Vancouver
- 1. Taelman R, Vander Sande M, Van Herwegen J, Mannens E, Verborgh R. Triple storage for random-access versioned querying of RDF archives. JOURNAL OF WEB SEMANTICS. 2019;54:4–28.
- IEEE
- [1] R. Taelman, M. Vander Sande, J. Van Herwegen, E. Mannens, and R. Verborgh, “Triple storage for random-access versioned querying of RDF archives,” JOURNAL OF WEB SEMANTICS, vol. 54, pp. 4–28, 2019.
@article{8607531,
  abstract = {{When publishing Linked Open Datasets on the Web, most attention is typically directed to their latest version. Nevertheless, useful information is present in or between previous versions. In order to exploit this historical information in dataset analysis, we can maintain history in RDF archives. Existing approaches either require much storage space, or they expose an insufficiently expressive or efficient interface with respect to querying demands. In this article, we introduce an RDF archive indexing technique that is able to store datasets with a low storage overhead, by compressing consecutive versions and adding metadata for reducing lookup times. We introduce algorithms based on this technique for efficiently evaluating queries at a certain version, between any two versions, and for versions. Using the BEAR RDF archiving benchmark, we evaluate our implementation, called OSTRICH. Results show that OSTRICH introduces a new tradeoff regarding storage space, ingestion time, and querying efficiency. By processing and storing more metadata during ingestion time, it significantly lowers the average lookup time for versioning queries. OSTRICH performs better for many smaller dataset versions than for few larger dataset versions. Furthermore, it enables efficient offsets in query result streams, which facilitates random access in results. Our storage technique reduces query evaluation time for versioned queries through a preprocessing step during ingestion, which only in some cases increases storage space when compared to other approaches. This allows data owners to store and query multiple versions of their dataset efficiently, lowering the barrier to historical dataset publication and analysis. (C) 2018 Elsevier B.V. All rights reserved.}},
  author = {{Taelman, Ruben and Vander Sande, Miel and Van Herwegen, Joachim and Mannens, Erik and Verborgh, Ruben}},
  issn = {{1570-8268}},
  journal = {{JOURNAL OF WEB SEMANTICS}},
  keywords = {{SEMANTIC WEB,DBPEDIA,Linked data,RDF archiving,Semantic data versioning,Storage,Indexing}},
  language = {{eng}},
  pages = {{4--28}},
  publisher = {{Elsevier Science Bv}},
  title = {{Triple storage for random-access versioned querying of RDF archives}},
  url = {{http://doi.org/10.1016/j.websem.2018.08.001}},
  volume = {{54}},
  year = {{2019}},
}