Ghent University Academic Bibliography

Finding similar research papers using language models

Germán Hurtado Martín, Steven Schockaert (UGent), Chris Cornelis (UGent) and Helga Naessens (UGent) (2011) 2nd workshop on semantic personalized information management : retrieval and recommendation, Proceedings. p.106-113
abstract
The task of assessing the similarity of research papers is of interest in a variety of application contexts. It is a challenging task, however, because the full text of the papers is often not available, so similarity needs to be determined from the papers' abstracts and some additional features such as authors, keywords, and journal. Our work explores the possibility of adapting language modeling techniques to this end. The basic strategy we pursue is to augment the information contained in the abstract by interpolating the corresponding language model with language models for the authors, keywords and journal of the paper. This strategy is then extended by finding topics and additionally interpolating with the resulting topic models. These topics are found using an adaptation of Latent Dirichlet Allocation (LDA), in which the keywords that were provided by the authors are used to guide the process.
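The interpolation strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy token lists, the interpolation weights, and the use of KL divergence with a probability floor to compare two papers are all assumptions made for illustration, since the abstract does not specify the estimation details or the comparison measure.

```python
from collections import Counter
import math


def unigram_model(tokens):
    """Maximum-likelihood unigram model: token -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()} if total else {}


def interpolate(models, weights):
    """Linear interpolation: P(w) = sum_i weights[i] * P_i(w).

    Assumes every model is a proper distribution and the weights sum to 1.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    mixed = Counter()
    for model, weight in zip(models, weights):
        for tok, prob in model.items():
            mixed[tok] += weight * prob
    return dict(mixed)


def kl_divergence(p, q, epsilon=1e-9):
    """KL(p || q), flooring unseen probabilities in q at epsilon.

    The floor is a crude stand-in for proper smoothing (an assumption).
    """
    return sum(
        pw * math.log(pw / max(q.get(tok, 0.0), epsilon))
        for tok, pw in p.items()
        if pw > 0
    )


# Hypothetical toy data: tokens from an abstract, plus text associated
# with the paper's keywords (e.g. other abstracts sharing those keywords).
abstract_a = unigram_model("language models for document similarity".split())
keywords_a = unigram_model("language models retrieval similarity".split())
abstract_b = unigram_model("topic models for document clustering".split())
keywords_b = unigram_model("topic models clustering".split())

# Hypothetical weights: 0.8 for the abstract, 0.2 for the keyword model.
paper_a = interpolate([abstract_a, keywords_a], [0.8, 0.2])
paper_b = interpolate([abstract_b, keywords_b], [0.8, 0.2])

# Lower divergence means the two paper models are more similar.
print(kl_divergence(paper_a, paper_b))
```

Author, journal, and topic models would enter the same way, as further entries in the `models` list with their own weights.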
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-1986265
author
Germán Hurtado Martín, Steven Schockaert, Chris Cornelis and Helga Naessens
organization
year
2011
type
conference
publication status
published
subject
keyword
Research paper similarity, Language models, Latent Dirichlet Allocation, Document similarity
in
2nd workshop on semantic personalized information management : retrieval and recommendation, Proceedings
pages
106 - 113
publisher
University College Ghent
place of publication
Ghent, Belgium
conference name
2nd Workshop on Semantic Personalized Information Management : Retrieval and Recommendation (SPIM-2011)
conference location
Bonn, Germany
conference start
2011-10-23
conference end
2011-10-24
language
English
UGent publication?
yes
classification
C1
copyright statement
I have transferred the copyright for this publication to the publisher
id
1986265
handle
http://hdl.handle.net/1854/LU-1986265
date created
2012-01-13 15:12:01
date last changed
2017-01-02 09:53:20
@inproceedings{1986265,
  abstract     = {The task of assessing the similarity of research papers is of interest in a variety of application contexts. It is a challenging task, however, as the full text of the papers is often not available, and similarity needs to be determined based on the papers' abstract, and some additional features such as authors, keywords, and journal. Our work explores the possibility of adapting language modeling techniques to this end. The basic strategy we pursue is to augment the information contained in the abstract by interpolating the corresponding language model with language models for the authors, keywords and journal of the paper. This strategy is then extended by finding topics and additionally interpolating with the resulting topic models. These topics are found using an adaptation of Latent Dirichlet Allocation (LDA), in which the keywords that were provided by the authors are used to guide the process.},
  author       = {Hurtado Mart{\'i}n, Germ{\'a}n and Schockaert, Steven and Cornelis, Chris and Naessens, Helga},
  booktitle    = {2nd workshop on semantic personalized information management : retrieval and recommendation, Proceedings},
  keyword      = {Research paper similarity,Language models,Latent Dirichlet Allocation,Document similarity},
  language     = {eng},
  location     = {Bonn, Germany},
  pages        = {106--113},
  publisher    = {University College Ghent},
  title        = {Finding similar research papers using language models},
  year         = {2011},
}

Chicago
Hurtado Martín, Germán, Steven Schockaert, Chris Cornelis, and Helga Naessens. 2011. “Finding Similar Research Papers Using Language Models.” In 2nd Workshop on Semantic Personalized Information Management : Retrieval and Recommendation, Proceedings, 106–113. Ghent, Belgium: University College Ghent.
APA
Hurtado Martín, G., Schockaert, S., Cornelis, C., & Naessens, H. (2011). Finding similar research papers using language models. 2nd workshop on semantic personalized information management : retrieval and recommendation, Proceedings (pp. 106–113). Presented at the 2nd Workshop on Semantic Personalized Information Management : Retrieval and Recommendation (SPIM-2011), Ghent, Belgium: University College Ghent.
Vancouver
1. Hurtado Martín G, Schockaert S, Cornelis C, Naessens H. Finding similar research papers using language models. 2nd workshop on semantic personalized information management : retrieval and recommendation, Proceedings. Ghent, Belgium: University College Ghent; 2011. p. 106–13.
MLA
Hurtado Martín, Germán, Steven Schockaert, Chris Cornelis, et al. “Finding Similar Research Papers Using Language Models.” 2nd Workshop on Semantic Personalized Information Management : Retrieval and Recommendation, Proceedings. Ghent, Belgium: University College Ghent, 2011. 106–113. Print.