Reconstructing human-generated provenance through similarity-based clustering
- Author
- Tom De Nies (UGent), Erik Mannens (UGent) and Rik Van de Walle (UGent)
- Abstract
- In this paper, we revisit our method for reconstructing the primary sources of documents, which make up an important part of their provenance. Our method is based on the assumption that if two documents are semantically similar, there is a high chance that they also share a common source. We previously evaluated this assumption on an excerpt from a news archive, achieving 68.2% precision and 73% recall when reconstructing the primary sources of all articles. However, since we could not release this dataset to the public, it made our results hard to compare to others. In this work, we extend the flexibility of our method by adding a new parameter, and re-evaluate it on the human-generated dataset created for the 2014 Provenance Reconstruction Challenge. The extended method achieves up to 86% precision and 59% recall, and is now directly comparable to any approach that uses the same dataset.
- Keywords
- provenance, reconstruction, similarity, clustering
Downloads
- 2016 - Tom De Nies et al. - IPAW2016 - Reconstructing Human-Generated Provenance Through Similarity-Based Clustering.pdf (full text, open access, 460.86 KB)
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-8503548
- MLA
- De Nies, Tom, et al. “Reconstructing Human-Generated Provenance through Similarity-Based Clustering.” Lecture Notes in Computer Science, Springer Nature, 2016, pp. 191–94, doi:10.1007/978-3-319-40593-3_19.
- APA
- De Nies, T., Mannens, E., & Van de Walle, R. (2016). Reconstructing human-generated provenance through similarity-based clustering. Lecture Notes in Computer Science, 191–194. https://doi.org/10.1007/978-3-319-40593-3_19
- Chicago author-date
- De Nies, Tom, Erik Mannens, and Rik Van de Walle. 2016. “Reconstructing Human-Generated Provenance through Similarity-Based Clustering.” In Lecture Notes in Computer Science, 191–94. Springer Nature. https://doi.org/10.1007/978-3-319-40593-3_19.
- Chicago author-date (all authors)
- De Nies, Tom, Erik Mannens, and Rik Van de Walle. 2016. “Reconstructing Human-Generated Provenance through Similarity-Based Clustering.” In Lecture Notes in Computer Science, 191–194. Springer Nature. doi:10.1007/978-3-319-40593-3_19.
- Vancouver
- 1. De Nies T, Mannens E, Van de Walle R. Reconstructing human-generated provenance through similarity-based clustering. In: Lecture Notes in Computer Science. Springer Nature; 2016. p. 191–4.
- IEEE
- [1] T. De Nies, E. Mannens, and R. Van de Walle, “Reconstructing human-generated provenance through similarity-based clustering,” in Lecture Notes in Computer Science, McLean, VA, USA, 2016, pp. 191–194.
@inproceedings{8503548,
  abstract     = {{In this paper, we revisit our method for reconstructing the primary sources of documents, which make up an important part of their provenance. Our method is based on the assumption that if two documents are semantically similar, there is a high chance that they also share a common source. We previously evaluated this assumption on an excerpt from a news archive, achieving 68.2% precision and 73% recall when reconstructing the primary sources of all articles. However, since we could not release this dataset to the public, it made our results hard to compare to others. In this work, we extend the flexibility of our method by adding a new parameter, and re-evaluate it on the human-generated dataset created for the 2014 Provenance Reconstruction Challenge. The extended method achieves up to 86% precision and 59% recall, and is now directly comparable to any approach that uses the same dataset.}},
  author       = {{De Nies, Tom and Mannens, Erik and Van de Walle, Rik}},
  booktitle    = {{Lecture Notes in Computer Science}},
  isbn         = {{978-3-319-40592-6}},
  issn         = {{0302-9743}},
  keywords     = {{provenance,reconstruction,similarity,clustering}},
  language     = {{eng}},
  location     = {{McLean, VA, USA}},
  pages        = {{191--194}},
  publisher    = {{Springer Nature}},
  title        = {{Reconstructing human-generated provenance through similarity-based clustering}},
  url          = {{http://doi.org/10.1007/978-3-319-40593-3_19}},
  year         = {{2016}},
}