
RDF-Connect : a declarative framework for streaming and cross-environment data processing pipelines
(2024)
SOFLIM2KG-SEMIIM 2024 : Joint Proceedings SOFLIM2KG and SEMIIM 2024 : Joint Proceedings of the 1st Software Lifecycle Management for Knowledge Graphs Workshop and the 3rd International Workshop on Semantic Industrial Information Modelling (SOFLIM2KG-SEMIIM 2024) co-located with 23th International Semantic Web Conference (ISWC 2024).
In CEUR Workshop Proceedings
3830.
- Author
- Arthur Vercruysse (UGent) , Jens Pots (UGent) , Julian Andres Rojas Melendez (UGent) and Pieter Colpaert (UGent)
- Abstract
- Data processing pipelines are a crucial component of any data-centric system today. Machine learning, data integration, and knowledge graph publishing are examples where data processing pipelines are needed. Furthermore, most production systems require data pipelines that support continuous operation and streaming-based capabilities for low-latency computations over large volumes of data. However, creation and maintenance of data processing pipelines is challenging, and a lot of effort is usually spent on ad-hoc scripting, which limits reusability across systems. Existing solutions are not interoperable out-of-the-box and do not allow for easy integration of different execution environments (e.g., Java, Python, JavaScript, Rust, etc.) while maintaining a streaming operation. For example, combining Python, JavaScript and Java-based libraries natively in a single pipeline is not straightforward. An interoperable and declarative mechanism could allow for continuous communication and integrated execution of data processing functions across different execution environments. We introduce RDF-Connect, a declarative framework based on semantic standards that enables instantiating pipelines with data processing functions across execution environments communicating through well-known communication protocols. We describe its architecture and demonstrate its use for an RDF knowledge graph creation, validation and publishing use case. The declarative nature of our approach facilitates reusability and maintainability of data processing pipelines. We currently support JavaScript and JVM-based environments, but we aim to extend RDF-Connect support to other rich ecosystems such as Python and to lower-level languages such as Rust, to take advantage of system-level performance gains.
Downloads
- DS812.pdf: full text (Published version), open access, 841.86 KB
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-01J84X94FCJTDFAAH6QZ2DQXAD
- MLA
- Vercruysse, Arthur, et al. “RDF-Connect : A Declarative Framework for Streaming and Cross-Environment Data Processing Pipelines.” SOFLIM2KG-SEMIIM 2024 : Joint Proceedings SOFLIM2KG and SEMIIM 2024 : Joint Proceedings of the 1st Software Lifecycle Management for Knowledge Graphs Workshop and the 3rd International Workshop on Semantic Industrial Information Modelling (SOFLIM2KG-SEMIIM 2024) Co-Located with 23th International Semantic Web Conference (ISWC 2024), vol. 3830, 2024.
- APA
- Vercruysse, A., Pots, J., Rojas Melendez, J. A., & Colpaert, P. (2024). RDF-Connect : a declarative framework for streaming and cross-environment data processing pipelines. SOFLIM2KG-SEMIIM 2024 : Joint Proceedings SOFLIM2KG and SEMIIM 2024 : Joint Proceedings of the 1st Software Lifecycle Management for Knowledge Graphs Workshop and the 3rd International Workshop on Semantic Industrial Information Modelling (SOFLIM2KG-SEMIIM 2024) Co-Located with 23th International Semantic Web Conference (ISWC 2024), 3830.
- Chicago author-date
- Vercruysse, Arthur, Jens Pots, Julian Andres Rojas Melendez, and Pieter Colpaert. 2024. “RDF-Connect : A Declarative Framework for Streaming and Cross-Environment Data Processing Pipelines.” In SOFLIM2KG-SEMIIM 2024 : Joint Proceedings SOFLIM2KG and SEMIIM 2024 : Joint Proceedings of the 1st Software Lifecycle Management for Knowledge Graphs Workshop and the 3rd International Workshop on Semantic Industrial Information Modelling (SOFLIM2KG-SEMIIM 2024) Co-Located with 23th International Semantic Web Conference (ISWC 2024). Vol. 3830.
- Vancouver
- 1. Vercruysse A, Pots J, Rojas Melendez JA, Colpaert P. RDF-Connect : a declarative framework for streaming and cross-environment data processing pipelines. In: SOFLIM2KG-SEMIIM 2024 : Joint Proceedings SOFLIM2KG and SEMIIM 2024 : Joint Proceedings of the 1st Software Lifecycle Management for Knowledge Graphs Workshop and the 3rd International Workshop on Semantic Industrial Information Modelling (SOFLIM2KG-SEMIIM 2024) co-located with 23th International Semantic Web Conference (ISWC 2024). 2024.
- IEEE
- [1] A. Vercruysse, J. Pots, J. A. Rojas Melendez, and P. Colpaert, “RDF-Connect : a declarative framework for streaming and cross-environment data processing pipelines,” in SOFLIM2KG-SEMIIM 2024 : Joint Proceedings SOFLIM2KG and SEMIIM 2024 : Joint Proceedings of the 1st Software Lifecycle Management for Knowledge Graphs Workshop and the 3rd International Workshop on Semantic Industrial Information Modelling (SOFLIM2KG-SEMIIM 2024) co-located with 23th International Semantic Web Conference (ISWC 2024), Baltimore, USA, 2024, vol. 3830.
@inproceedings{01J84X94FCJTDFAAH6QZ2DQXAD,
  abstract = {{Data processing pipelines are a crucial component of any data-centric system today. Machine learning, data integration, and knowledge graph publishing are examples where data processing pipelines are needed. Furthermore, most production systems require data pipelines that support continuous operation and streaming-based capabilities for low-latency computations over large volumes of data. However, creation and maintenance of data processing pipelines is challenging, and a lot of effort is usually spent on ad-hoc scripting, which limits reusability across systems. Existing solutions are not interoperable out-of-the-box and do not allow for easy integration of different execution environments (e.g., Java, Python, JavaScript, Rust, etc.) while maintaining a streaming operation. For example, combining Python, JavaScript and Java-based libraries natively in a single pipeline is not straightforward. An interoperable and declarative mechanism could allow for continuous communication and integrated execution of data processing functions across different execution environments. We introduce RDF-Connect, a declarative framework based on semantic standards that enables instantiating pipelines with data processing functions across execution environments communicating through well-known communication protocols. We describe its architecture and demonstrate its use for an RDF knowledge graph creation, validation and publishing use case. The declarative nature of our approach facilitates reusability and maintainability of data processing pipelines. We currently support JavaScript and JVM-based environments, but we aim to extend RDF-Connect support to other rich ecosystems such as Python and to lower-level languages such as Rust, to take advantage of system-level performance gains.}},
  author = {{Vercruysse, Arthur and Pots, Jens and Rojas Melendez, Julian Andres and Colpaert, Pieter}},
  booktitle = {{SOFLIM2KG-SEMIIM 2024 : Joint Proceedings SOFLIM2KG and SEMIIM 2024 : Joint Proceedings of the 1st Software Lifecycle Management for Knowledge Graphs Workshop and the 3rd International Workshop on Semantic Industrial Information Modelling (SOFLIM2KG-SEMIIM 2024) co-located with 23th International Semantic Web Conference (ISWC 2024)}},
  issn = {{1613-0073}},
  language = {{eng}},
  location = {{Baltimore, USA}},
  pages = {{15}},
  title = {{RDF-Connect : a declarative framework for streaming and cross-environment data processing pipelines}},
  url = {{https://ceur-ws.org/Vol-3830/}},
  volume = {{3830}},
  year = {{2024}},
}