Ghent University Academic Bibliography

Automatic discovery of high-level provenance using semantic similarity

Tom De Nies, Sam Coppens, Davy Van Deursen (UGent), Erik Mannens (UGent) and Rik Van de Walle (UGent) (2012) LECTURE NOTES IN COMPUTER SCIENCE. 7525. p. 97-110
abstract
As interest in provenance grows among the Semantic Web community, it is recognized as a useful tool across many domains. However, existing automatic provenance collection techniques are not universally applicable. Most existing methods either rely on (low-level) observed provenance, or require that the user discloses formal workflows. In this paper, we propose a new approach for automatic discovery of provenance, at multiple levels of granularity. To accomplish this, we detect entity derivations, relying on clustering algorithms, linked data and semantic similarity. The resulting derivations are structured in compliance with the Provenance Data Model (PROV-DM). While the proposed approach is purposely kept general, allowing adaptation in many use cases, we provide an implementation for one of these use cases, namely discovering the sources of news articles. With this implementation, we were able to detect 73% of the original sources of 410 news stories, at 68% precision. Lastly, we discuss possible improvements and future work.
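As an illustration only, and not the authors' implementation, the core idea in the abstract (detecting entity derivations through similarity and expressing them in PROV terms) can be sketched as below. The sketch makes several assumptions that do not come from the paper: plain bag-of-words cosine similarity stands in for semantic similarity over linked data, no clustering step is included, and the function names, the `ex:` identifiers and the 0.5 threshold are hypothetical. The output strings follow the PROV-N `wasDerivedFrom(generated, used)` form.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased token counts; a crude stand-in for semantic annotation."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def discover_derivations(docs, threshold=0.5):
    """Return PROV-N style derivation statements for similar documents.

    docs is a list of (identifier, timestamp, text) tuples. A later
    document is assumed to derive from an earlier one when their
    similarity reaches the threshold (a hypothetical value).
    """
    vectors = {doc_id: bag_of_words(text) for doc_id, _, text in docs}
    ordered = sorted(docs, key=lambda d: d[1])  # oldest first
    statements = []
    for i, (later_id, _, _) in enumerate(ordered):
        for earlier_id, _, _ in ordered[:i]:
            if cosine(vectors[later_id], vectors[earlier_id]) >= threshold:
                statements.append(f"wasDerivedFrom({later_id}, {earlier_id})")
    return statements


if __name__ == "__main__":
    # Made-up example documents, not data from the paper.
    docs = [
        ("ex:wire", 1, "Flood hits city centre, hundreds evacuated overnight"),
        ("ex:story", 2, "Hundreds evacuated overnight as flood hits city centre"),
        ("ex:sports", 3, "Local team wins the cup final"),
    ]
    for statement in discover_derivations(docs):
        print(statement)
```

Ordering by timestamp keeps the direction of each derivation unambiguous: only an earlier document can be reported as the source of a later one.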
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-3232929
author
Tom De Nies, Sam Coppens, Davy Van Deursen, Erik Mannens and Rik Van de Walle
year
2012
type
conference (proceedingsPaper)
publication status
published
keyword
Similarity, Semantic Web, Linked Data, Provenance, News, Data Model
in
LECTURE NOTES IN COMPUTER SCIENCE
editor
P Groth and J Frew
volume
7525
issue title
Provenance and Annotation of Data and Processes
pages
97 - 110
publisher
Springer
conference name
4th International Provenance and Annotation Workshop (IPAW - 2012)
conference location
Santa Barbara, California
conference start
2012-06-19
conference end
2012-06-21
Web of Science type
Proceedings Paper
Web of Science id
000345094300008
ISSN
0302-9743
ISBN
9783642342226
DOI
10.1007/978-3-642-34222-6_8
language
English
UGent publication?
yes
classification
P1
copyright statement
I have transferred the copyright for this publication to the publisher
id
3232929
handle
http://hdl.handle.net/1854/LU-3232929
date created
2013-06-03 11:31:51
date last changed
2016-12-19 15:36:32
BibTeX
@inproceedings{3232929,
  abstract     = {As interest in provenance grows among the Semantic Web community, it is recognized as a useful tool across many domains. However, existing automatic provenance collection techniques are not universally applicable. Most existing methods either rely on (low-level) observed provenance, or require that the user discloses formal workflows. In this paper, we propose a new approach for automatic discovery of provenance, at multiple levels of granularity. To accomplish this, we detect entity derivations, relying on clustering algorithms, linked data and semantic similarity. The resulting derivations are structured in compliance with the Provenance Data Model (PROV-DM). While the proposed approach is purposely kept general, allowing adaptation in many use cases, we provide an implementation for one of these use cases, namely discovering the sources of news articles. With this implementation, we were able to detect 73\% of the original sources of 410 news stories, at 68\% precision. Lastly, we discuss possible improvements and future work.},
  author       = {De Nies, Tom and Coppens, Sam and Van Deursen, Davy and Mannens, Erik and Van de Walle, Rik},
  booktitle    = {Provenance and Annotation of Data and Processes},
  series       = {Lecture Notes in Computer Science},
  editor       = {Groth, P and Frew, J},
  isbn         = {9783642342226},
  issn         = {0302-9743},
  keyword      = {Similarity,Semantic Web,Linked Data,Provenance,News,Data Model},
  language     = {eng},
  location     = {Santa Barbara, California},
  pages        = {97--110},
  publisher    = {Springer},
  title        = {Automatic discovery of high-level provenance using semantic similarity},
  url          = {http://dx.doi.org/10.1007/978-3-642-34222-6\_8},
  volume       = {7525},
  year         = {2012},
}

Chicago
De Nies, Tom, Sam Coppens, Davy Van Deursen, Erik Mannens, and Rik Van de Walle. 2012. “Automatic Discovery of High-level Provenance Using Semantic Similarity.” In Lecture Notes in Computer Science, ed. P Groth and J Frew, 7525:97–110. Springer.
APA
De Nies, T., Coppens, S., Van Deursen, D., Mannens, E., & Van de Walle, R. (2012). Automatic discovery of high-level provenance using semantic similarity. In P. Groth & J. Frew (Eds.), LECTURE NOTES IN COMPUTER SCIENCE (Vol. 7525, pp. 97–110). Presented at the 4th International Provenance and Annotation Workshop (IPAW - 2012), Springer.
Vancouver
1. De Nies T, Coppens S, Van Deursen D, Mannens E, Van de Walle R. Automatic discovery of high-level provenance using semantic similarity. In: Groth P, Frew J, editors. LECTURE NOTES IN COMPUTER SCIENCE. Springer; 2012. p. 97–110.
MLA
De Nies, Tom, Sam Coppens, Davy Van Deursen, et al. “Automatic Discovery of High-level Provenance Using Semantic Similarity.” Lecture Notes in Computer Science. Ed. P Groth & J Frew. Vol. 7525. Springer, 2012. 97–110. Print.