Reconstructing human-generated provenance through similarity-based clustering

Tom De Nies (UGent) , Erik Mannens (UGent) and Rik Van de Walle (UGent)
Abstract
In this paper, we revisit our method for reconstructing the primary sources of documents, which make up an important part of their provenance. Our method is based on the assumption that if two documents are semantically similar, there is a high chance that they also share a common source. We previously evaluated this assumption on an excerpt from a news archive, achieving 68.2% precision and 73% recall when reconstructing the primary sources of all articles. However, because we could not release this dataset to the public, our results were hard to compare with those of others. In this work, we extend the flexibility of our method by adding a new parameter, and re-evaluate it on the human-generated dataset created for the 2014 Provenance Reconstruction Challenge. The extended method achieves up to 86% precision and 59% recall, and is now directly comparable to any approach that uses the same dataset.
Keywords
provenance, reconstruction, similarity, clustering

Downloads

  • 2016 - Tom De Nies et al. - IPAW2016 - Reconstructing Human-Generated Provenance Through Similarity-Based Clustering.pdf (full text | open access | PDF | 460.86 KB)

Citation

MLA
De Nies, Tom, et al. “Reconstructing Human-Generated Provenance through Similarity-Based Clustering.” Lecture Notes in Computer Science, Springer Nature, 2016, pp. 191–94, doi:10.1007/978-3-319-40593-3_19.
APA
De Nies, T., Mannens, E., & Van de Walle, R. (2016). Reconstructing human-generated provenance through similarity-based clustering. Lecture Notes in Computer Science, 191–194. https://doi.org/10.1007/978-3-319-40593-3_19
Chicago author-date
De Nies, Tom, Erik Mannens, and Rik Van de Walle. 2016. “Reconstructing Human-Generated Provenance through Similarity-Based Clustering.” In Lecture Notes in Computer Science, 191–94. Springer Nature. https://doi.org/10.1007/978-3-319-40593-3_19.
Chicago author-date (all authors)
De Nies, Tom, Erik Mannens, and Rik Van de Walle. 2016. “Reconstructing Human-Generated Provenance through Similarity-Based Clustering.” In Lecture Notes in Computer Science, 191–194. Springer Nature. doi:10.1007/978-3-319-40593-3_19.
Vancouver
1. De Nies T, Mannens E, Van de Walle R. Reconstructing human-generated provenance through similarity-based clustering. In: Lecture Notes in Computer Science. Springer Nature; 2016. p. 191–4.
IEEE
[1] T. De Nies, E. Mannens, and R. Van de Walle, “Reconstructing human-generated provenance through similarity-based clustering,” in Lecture Notes in Computer Science, McLean, VA, USA, 2016, pp. 191–194.
BibTeX
@inproceedings{8503548,
  abstract     = {{In this paper, we revisit our method for reconstructing the primary sources of documents, which make up an important part of their provenance. Our method is based on the assumption that if two documents are semantically similar, there is a high chance that they also share a common source. We previously evaluated this assumption on an excerpt from a news archive, achieving 68.2% precision and 73% recall when reconstructing the primary sources of all articles. However, since we could not release this dataset to the public, it made our results hard to compare to others. In this work, we extend the flexibility of our method by adding a new parameter, and re-evaluate it on the human-generated dataset created for the 2014 Provenance Reconstruction Challenge. The extended method achieves up to 86% precision and 59% recall, and is now directly comparable to any approach that uses the same dataset.}},
  author       = {{De Nies, Tom and Mannens, Erik and Van de Walle, Rik}},
  booktitle    = {{Lecture Notes in Computer Science}},
  isbn         = {{978-3-319-40592-6}},
  issn         = {{0302-9743}},
  keywords     = {{provenance,reconstruction,similarity,clustering}},
  language     = {{eng}},
  location     = {{McLean, VA, USA}},
  pages        = {{191--194}},
  publisher    = {{Springer Nature}},
  title        = {{Reconstructing human-generated provenance through similarity-based clustering}},
  url          = {{http://doi.org/10.1007/978-3-319-40593-3_19}},
  year         = {{2016}},
}
