Parallel RDF generation from heterogeneous big data

Abstract
To unlock the value of increasingly available data in high volumes, we need flexible ways to integrate data across different sources. While semantic integration can be provided through RDF generation, current generators do not scale sufficiently in terms of volume. Generators are limited by memory constraints. Therefore, we developed the RMLStreamer, a generator that parallelizes the ingestion and mapping tasks of RDF generation across multiple instances. In this paper, we analyze which aspects are parallelizable and introduce an approach for parallel RDF generation. We describe how we implemented the proposed approach within the RMLStreamer and how the resulting scaling behavior compares to other RDF generators. Through parallel ingestion, the RMLStreamer ingests data at a 50% faster rate than existing generators.
Keywords
RDF generation, big data, linked data, semantic web

Citation

MLA
Haesendonck, Gerald, et al. “Parallel RDF Generation from Heterogeneous Big Data.” PROCEEDINGS OF THE INTERNATIONAL WORKSHOP ON SEMANTIC BIG DATA (SBD 2019), edited by S. Groppe and L. Gruenwald, ACM Press, 2019, doi:10.1145/3323878.3325802.
APA
Haesendonck, G., Maroy, W., Heyvaert, P., Verborgh, R., & Dimou, A. (2019). Parallel RDF generation from heterogeneous big data. In S. Groppe & L. Gruenwald (Eds.), PROCEEDINGS OF THE INTERNATIONAL WORKSHOP ON SEMANTIC BIG DATA (SBD 2019). https://doi.org/10.1145/3323878.3325802
Chicago author-date
Haesendonck, Gerald, Wouter Maroy, Pieter Heyvaert, Ruben Verborgh, and Anastasia Dimou. 2019. “Parallel RDF Generation from Heterogeneous Big Data.” In PROCEEDINGS OF THE INTERNATIONAL WORKSHOP ON SEMANTIC BIG DATA (SBD 2019), edited by S. Groppe and L. Gruenwald. ACM Press. https://doi.org/10.1145/3323878.3325802.
Chicago author-date (all authors)
Haesendonck, Gerald, Wouter Maroy, Pieter Heyvaert, Ruben Verborgh, and Anastasia Dimou. 2019. “Parallel RDF Generation from Heterogeneous Big Data.” In PROCEEDINGS OF THE INTERNATIONAL WORKSHOP ON SEMANTIC BIG DATA (SBD 2019), edited by S. Groppe and L. Gruenwald. ACM Press. doi:10.1145/3323878.3325802.
Vancouver
1. Haesendonck G, Maroy W, Heyvaert P, Verborgh R, Dimou A. Parallel RDF generation from heterogeneous big data. In: Groppe S, Gruenwald L, editors. PROCEEDINGS OF THE INTERNATIONAL WORKSHOP ON SEMANTIC BIG DATA (SBD 2019). ACM Press; 2019.
IEEE
[1] G. Haesendonck, W. Maroy, P. Heyvaert, R. Verborgh, and A. Dimou, “Parallel RDF generation from heterogeneous big data,” in PROCEEDINGS OF THE INTERNATIONAL WORKSHOP ON SEMANTIC BIG DATA (SBD 2019), Amsterdam, the Netherlands, 2019.
@inproceedings{8619808,
  abstract     = {{To unlock the value of increasingly available data in high volumes, we need flexible ways to integrate data across different sources. While semantic integration can be provided through RDF generation, current generators do not scale sufficiently in terms of volume. Generators are limited by memory constraints. Therefore, we developed the RMLStreamer, a generator that parallelizes the ingestion and mapping tasks of RDF generation across multiple instances. In this paper, we analyze which aspects are parallelizable and introduce an approach for parallel RDF generation. We describe how we implemented the proposed approach within the RMLStreamer and how the resulting scaling behavior compares to other RDF generators. Through parallel ingestion, the RMLStreamer ingests data at a 50% faster rate than existing generators.}},
  articleno    = {{1}},
  author       = {{Haesendonck, Gerald and Maroy, Wouter and Heyvaert, Pieter and Verborgh, Ruben and Dimou, Anastasia}},
  booktitle    = {{PROCEEDINGS OF THE INTERNATIONAL WORKSHOP ON SEMANTIC BIG DATA (SBD 2019)}},
  editor       = {{Groppe, S. and Gruenwald, L.}},
  isbn         = {{9781450367660}},
  keywords     = {{RDF generation,big data,linked data,semantic web}},
  language     = {{eng}},
  location     = {{Amsterdam, the Netherlands}},
  pages        = {{6}},
  publisher    = {{ACM Press}},
  title        = {{Parallel RDF generation from heterogeneous big data}},
  url          = {{http://doi.org/10.1145/3323878.3325802}},
  year         = {{2019}},
}
