A novel approach to assess and improve syntactic interoperability in data integration
- Author
- Rihem Nasfi (UGent) , Antoon Bronselaer (UGent) and Guy De Tré (UGent)
- Abstract
- Data integration is essential to enrich a database with external information. One effective approach is to match shared identifiers across diverse databases. However, a lack of syntactic interoperability, which refers to the ability to match data based on their syntax, can pose challenges. In this paper, we present a novel method to evaluate and enhance syntactic interoperability, considering associated costs. First, we introduce the linking index and completeness index as generic measures of fine-grained syntactic interoperability. Second, we analyze the data consistency level of the identifiers using a rule-based framework for data quality assessment. Third, we propose a data integration strategy that strikes a balance between fixing data inconsistencies and the resulting benefits, as measured by the linking and completeness indices. The approach is illustrated through two use cases: bibliographic databases and clinical trial registries. The results demonstrate that standardizing identifiers' representations can significantly improve syntactic interoperability in certain scenarios, while in others the standardization process does not yield improvements, hence discouraging integration decisions. By conducting a cost-benefit analysis of improving data interoperability, this analysis enables data integrators to make informed decisions regarding the feasibility and advantages of proceeding with data integration.
- Keywords
- Library and Information Sciences, Management Science and Operations Research, Computer Science Applications, Media Technology, Information Systems, Relational databases, Interoperability, Data quality
Citation
Please use this URL to cite or link to this publication: http://hdl.handle.net/1854/LU-01HF6JBSHA3244PFGSK4XH3X33
- MLA
- Nasfi, Rihem, et al. “A Novel Approach to Assess and Improve Syntactic Interoperability in Data Integration.” INFORMATION PROCESSING & MANAGEMENT, vol. 60, no. 6, 2023, doi:10.1016/j.ipm.2023.103522.
- APA
- Nasfi, R., Bronselaer, A., & De Tré, G. (2023). A novel approach to assess and improve syntactic interoperability in data integration. INFORMATION PROCESSING & MANAGEMENT, 60(6). https://doi.org/10.1016/j.ipm.2023.103522
- Chicago author-date
- Nasfi, Rihem, Antoon Bronselaer, and Guy De Tré. 2023. “A Novel Approach to Assess and Improve Syntactic Interoperability in Data Integration.” INFORMATION PROCESSING & MANAGEMENT 60 (6). https://doi.org/10.1016/j.ipm.2023.103522.
- Chicago author-date (all authors)
- Nasfi, Rihem, Antoon Bronselaer, and Guy De Tré. 2023. “A Novel Approach to Assess and Improve Syntactic Interoperability in Data Integration.” INFORMATION PROCESSING & MANAGEMENT 60 (6). doi:10.1016/j.ipm.2023.103522.
- Vancouver
- 1. Nasfi R, Bronselaer A, De Tré G. A novel approach to assess and improve syntactic interoperability in data integration. INFORMATION PROCESSING & MANAGEMENT. 2023;60(6).
- IEEE
- [1] R. Nasfi, A. Bronselaer, and G. De Tré, “A novel approach to assess and improve syntactic interoperability in data integration,” INFORMATION PROCESSING & MANAGEMENT, vol. 60, no. 6, 2023.
@article{01HF6JBSHA3244PFGSK4XH3X33,
  abstract     = {{Data integration is essential to enrich a database with external information. One effective approach is to match shared identifiers across diverse databases. However, a lack of syntactic interoperability, which refers to the ability to match data based on their syntax, can pose challenges. In this paper, we present a novel method to evaluate and enhance syntactic interoperability, considering associated costs. First, we introduce the linking index and completeness index as generic measures of fine-grained syntactic interoperability. Second, we analyze the data consistency level of the identifiers using a rule-based framework for data quality assessment. Third, we propose a data integration strategy that strikes a balance between fixing data inconsistencies and the resulting benefits, as measured by the linking and completeness indices. The approach is illustrated through two use cases: bibliographic databases and clinical trial registries. The results demonstrate that standardizing identifiers' representations can significantly improve syntactic interoperability in certain scenarios, while in others the standardization process does not yield improvements, hence discouraging integration decisions. By conducting a cost-benefit analysis of improving data interoperability, this analysis enables data integrators to make informed decisions regarding the feasibility and advantages of proceeding with data integration.}},
  articleno    = {{103522}},
  author       = {{Nasfi, Rihem and Bronselaer, Antoon and De Tré, Guy}},
  issn         = {{0306-4573}},
  journal      = {{INFORMATION PROCESSING \& MANAGEMENT}},
  keywords     = {{Library and Information Sciences,Management Science and Operations Research,Computer Science Applications,Media Technology,Information Systems,Relational databases,Interoperability,Data quality}},
  language     = {{eng}},
  number       = {{6}},
  pages        = {{23}},
  title        = {{A novel approach to assess and improve syntactic interoperability in data integration}},
  url          = {{http://doi.org/10.1016/j.ipm.2023.103522}},
  volume       = {{60}},
  year         = {{2023}},
}