Abstract
The task of assessing the similarity of research papers is of interest in a variety of application contexts. It is a challenging task, however, as the full text of the papers is often not available, and similarity needs to be determined based on the papers' abstracts and some additional features such as authors, keywords, and journal. Our work explores the possibility of adapting language modeling techniques to this end. The basic strategy we pursue is to augment the information contained in the abstract by interpolating the corresponding language model with language models for the authors, keywords, and journal of the paper. This strategy is then extended by finding topics and additionally interpolating with the resulting topic models. These topics are found using an adaptation of Latent Dirichlet Allocation (LDA), in which the keywords that were provided by the authors are used to guide the process.
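The interpolation idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the models are estimated, which weights are used, or how two models are compared. The sketch assumes maximum-likelihood unigram models, linear interpolation with hand-picked weights, a toy background model standing in for the author and journal models, and Kullback-Leibler divergence as the comparison measure. All function names, weights, and example texts are hypothetical.

```python
from collections import Counter
import math


def unigram_model(tokens):
    """Maximum-likelihood unigram model: word -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(models, weights):
    """Linearly interpolate unigram models; the weights must sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    vocabulary = set().union(*models)  # union of the dictionaries' keys
    return {
        word: sum(w * m.get(word, 0.0) for m, w in zip(models, weights))
        for word in vocabulary
    }


def kl_divergence(p, q):
    """KL(p || q); q must assign nonzero probability to every word in p."""
    return sum(pw * math.log(pw / q[word]) for word, pw in p.items() if pw > 0)


# Toy background model (assumption): covers the whole vocabulary, so every
# interpolated model below gives each word a nonzero probability.
background = unigram_model(
    "language models topic models research papers similarity fuzzy logic".split()
)


def paper_model(abstract, keywords):
    """Augment the abstract model with keyword and background models."""
    models = [
        unigram_model(abstract.split()),
        unigram_model(keywords.split()),
        background,
    ]
    return interpolate(models, [0.6, 0.2, 0.2])  # hypothetical weights


paper_a = paper_model("language models research papers", "language models similarity")
paper_b = paper_model("topic models research papers", "topic models")
paper_c = paper_model("fuzzy logic", "fuzzy logic")

# A lower divergence is read here as a higher similarity.
divergence_ab = kl_divergence(paper_a, paper_b)
divergence_ac = kl_divergence(paper_a, paper_c)
print(divergence_ab < divergence_ac)
```

With these toy inputs, the paper that shares abstract vocabulary with `paper_a` ends up closer to it than the unrelated one. The topic models mentioned in the abstract would enter as one more entry in the list passed to `interpolate`; how they are derived with the keyword-guided LDA variant is outside the scope of this sketch.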
Keywords
Research paper similarity, Language models, Latent Dirichlet Allocation, Document similarity

Downloads

  • article iswc spim2011.pdf
    • full text | open access | PDF | 360.70 KB

Citation

Chicago
Hurtado Martín, Germán, Steven Schockaert, Chris Cornelis, and Helga Naessens. 2011. “Finding Similar Research Papers Using Language Models.” In 2nd Workshop on Semantic Personalized Information Management : Retrieval and Recommendation, Proceedings, 106–113. Ghent, Belgium: University College Ghent.
APA
Hurtado Martín, G., Schockaert, S., Cornelis, C., & Naessens, H. (2011). Finding similar research papers using language models. 2nd workshop on semantic personalized information management : retrieval and recommendation, Proceedings (pp. 106–113). Presented at the 2nd Workshop on Semantic Personalized Information Management : Retrieval and Recommendation (SPIM-2011), Ghent, Belgium: University College Ghent.
Vancouver
1. Hurtado Martín G, Schockaert S, Cornelis C, Naessens H. Finding similar research papers using language models. 2nd workshop on semantic personalized information management : retrieval and recommendation, Proceedings. Ghent, Belgium: University College Ghent; 2011. p. 106–13.
MLA
Hurtado Martín, Germán, Steven Schockaert, Chris Cornelis, et al. “Finding Similar Research Papers Using Language Models.” 2nd Workshop on Semantic Personalized Information Management : Retrieval and Recommendation, Proceedings. Ghent, Belgium: University College Ghent, 2011. 106–113. Print.
BibTeX
@inproceedings{1986265,
  abstract     = {The task of assessing the similarity of research papers is of interest in a variety of application contexts. It is a challenging task, however, as the full text of the papers is often not available, and similarity needs to be determined based on the papers' abstract, and some additional features such as authors, keywords, and journal. Our work explores the possibility of adapting language modeling techniques to this end. The basic strategy we pursue is to augment the information contained in the abstract by interpolating the corresponding language model with language models for the authors, keywords and journal of the paper. This strategy is then extended by finding topics and additionally interpolating with the resulting topic models. These topics are found using an adaptation of Latent Dirichlet Allocation (LDA), in which the keywords that were provided by the authors are used to guide the process.},
  author       = {Hurtado Mart{\'i}n, Germ{\'a}n and Schockaert, Steven and Cornelis, Chris and Naessens, Helga},
  booktitle    = {2nd workshop on semantic personalized information management : retrieval and recommendation, Proceedings},
  keyword      = {Research paper similarity,Language models,Latent Dirichlet Allocation,Document similarity},
  language     = {eng},
  location     = {Bonn, Germany},
  pages        = {106--113},
  publisher    = {University College Ghent},
  title        = {Finding similar research papers using language models},
  year         = {2011},
}