Abstract
Data processing pipelines are a crucial component of any data-centric system today. Machine learning, data integration, and knowledge graph publishing are examples where data processing pipelines are needed. Furthermore, most production systems require data pipelines that support continuous operation and streaming-based capabilities for low-latency computations over large volumes of data. However, creating and maintaining data processing pipelines is challenging, and a lot of effort is usually spent on ad-hoc scripting, which limits reusability across systems. Existing solutions are not interoperable out-of-the-box and do not allow for easy integration of different execution environments (e.g., Java, Python, JavaScript, Rust) while maintaining streaming operation. For example, combining Python, JavaScript and Java-based libraries natively in a single pipeline is not straightforward. An interoperable and declarative mechanism could allow for continuous communication and integrated execution of data processing functions across different execution environments. We introduce RDF-Connect, a declarative framework based on semantic standards that enables instantiating pipelines with data processing functions across execution environments communicating through well-known communication protocols. We describe its architecture and demonstrate its use in an RDF knowledge graph creation, validation, and publishing use case. The declarative nature of our approach facilitates reusability and maintainability of data processing pipelines. We currently support JavaScript and JVM-based environments, but we aim to extend RDF-Connect support to other rich ecosystems such as Python and to lower-level languages such as Rust, to take advantage of system-level performance gains.

Downloads

  • DS812.pdf: full text (published version), open access, PDF, 841.86 KB

Citation

MLA
Vercruysse, Arthur, et al. “RDF-Connect : A Declarative Framework for Streaming and Cross-Environment Data Processing Pipelines.” SOFLIM2KG-SEMIIM 2024 : Joint Proceedings SOFLIM2KG and SEMIIM 2024 : Joint Proceedings of the 1st Software Lifecycle Management for Knowledge Graphs Workshop and the 3rd International Workshop on Semantic Industrial Information Modelling (SOFLIM2KG-SEMIIM 2024) Co-Located with 23rd International Semantic Web Conference (ISWC 2024), vol. 3830, 2024.
APA
Vercruysse, A., Pots, J., Rojas Melendez, J. A., & Colpaert, P. (2024). RDF-Connect : a declarative framework for streaming and cross-environment data processing pipelines. SOFLIM2KG-SEMIIM 2024 : Joint Proceedings SOFLIM2KG and SEMIIM 2024 : Joint Proceedings of the 1st Software Lifecycle Management for Knowledge Graphs Workshop and the 3rd International Workshop on Semantic Industrial Information Modelling (SOFLIM2KG-SEMIIM 2024) Co-Located with 23rd International Semantic Web Conference (ISWC 2024), 3830.
Chicago author-date
Vercruysse, Arthur, Jens Pots, Julian Andres Rojas Melendez, and Pieter Colpaert. 2024. “RDF-Connect : A Declarative Framework for Streaming and Cross-Environment Data Processing Pipelines.” In SOFLIM2KG-SEMIIM 2024 : Joint Proceedings SOFLIM2KG and SEMIIM 2024 : Joint Proceedings of the 1st Software Lifecycle Management for Knowledge Graphs Workshop and the 3rd International Workshop on Semantic Industrial Information Modelling (SOFLIM2KG-SEMIIM 2024) Co-Located with 23rd International Semantic Web Conference (ISWC 2024). Vol. 3830.
Vancouver
1. Vercruysse A, Pots J, Rojas Melendez JA, Colpaert P. RDF-Connect : a declarative framework for streaming and cross-environment data processing pipelines. In: SOFLIM2KG-SEMIIM 2024 : Joint Proceedings SOFLIM2KG and SEMIIM 2024 : Joint Proceedings of the 1st Software Lifecycle Management for Knowledge Graphs Workshop and the 3rd International Workshop on Semantic Industrial Information Modelling (SOFLIM2KG-SEMIIM 2024) co-located with 23rd International Semantic Web Conference (ISWC 2024). 2024.
IEEE
[1] A. Vercruysse, J. Pots, J. A. Rojas Melendez, and P. Colpaert, “RDF-Connect : a declarative framework for streaming and cross-environment data processing pipelines,” in SOFLIM2KG-SEMIIM 2024 : Joint Proceedings SOFLIM2KG and SEMIIM 2024 : Joint Proceedings of the 1st Software Lifecycle Management for Knowledge Graphs Workshop and the 3rd International Workshop on Semantic Industrial Information Modelling (SOFLIM2KG-SEMIIM 2024) co-located with 23rd International Semantic Web Conference (ISWC 2024), Baltimore, USA, 2024, vol. 3830.
@inproceedings{01J84X94FCJTDFAAH6QZ2DQXAD,
  abstract     = {{Data processing pipelines are a crucial component of any data-centric system today. Machine learning, data
integration, and knowledge graph publishing are examples where data processing pipelines are needed. Furthermore, most production systems require data pipelines that support continuous operation and streaming-based
capabilities for low-latency computations over large volumes of data. However, creating and maintaining data
processing pipelines is challenging, and a lot of effort is usually spent on ad-hoc scripting, which limits reusability
across systems. Existing solutions are not interoperable out-of-the-box and do not allow for easy integration
of different execution environments (e.g., Java, Python, JavaScript, Rust) while maintaining streaming
operation. For example, combining Python, JavaScript and Java-based libraries natively in a single pipeline is
not straightforward. An interoperable and declarative mechanism could allow for continuous communication
and integrated execution of data processing functions across different execution environments. We introduce
RDF-Connect, a declarative framework based on semantic standards that enables instantiating pipelines with
data processing functions across execution environments communicating through well-known communication
protocols. We describe its architecture and demonstrate its use in an RDF knowledge graph creation, validation,
and publishing use case. The declarative nature of our approach facilitates reusability and maintainability of
data processing pipelines. We currently support JavaScript and JVM-based environments, but we aim to extend
RDF-Connect support to other rich ecosystems such as Python and to lower-level languages such as Rust, to take
advantage of system-level performance gains.}},
  author       = {{Vercruysse, Arthur and Pots, Jens and Rojas Melendez, Julian Andres and Colpaert, Pieter}},
  booktitle    = {{SOFLIM2KG-SEMIIM 2024 : Joint Proceedings SOFLIM2KG and SEMIIM 2024 : Joint Proceedings of the 1st Software Lifecycle Management for Knowledge Graphs Workshop and the 3rd International Workshop on Semantic Industrial Information Modelling (SOFLIM2KG-SEMIIM 2024) co-located with 23rd International Semantic Web Conference (ISWC 2024)}},
  issn         = {{1613-0073}},
  language     = {{eng}},
  location     = {{Baltimore, USA}},
  pages        = {{15}},
  title        = {{RDF-Connect : a declarative framework for streaming and cross-environment data processing pipelines}},
  url          = {{https://ceur-ws.org/Vol-3830/}},
  volume       = {{3830}},
  year         = {{2024}},
}