Classifying token frequencies using angular Minkowski p-distance

Oliver Urs Lenz (UGent) and Chris Cornelis (UGent)
Abstract
Angular Minkowski p-distance is a dissimilarity measure that is obtained by replacing Euclidean distance in the definition of cosine dissimilarity with other Minkowski p-distances. Cosine dissimilarity is frequently used with datasets containing token frequencies, and angular Minkowski p-distance may potentially be an even better choice for certain tasks. In a case study based on the 20-newsgroups dataset, we evaluate classification performance for classical weighted nearest neighbours, as well as fuzzy rough nearest neighbours. In addition, we analyse the relationship between the hyperparameter p, the dimensionality m of the dataset, the number of neighbours k, the choice of weights and the choice of classifier. We conclude that it is possible to obtain substantially higher classification performance with angular Minkowski p-distance with suitable values for p than with classical cosine dissimilarity.
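The definition summarised in the abstract can be sketched in code. The following is a minimal illustration under one plausible reading, not the authors' implementation: each vector is rescaled to unit p-norm, and the Minkowski p-distance is then taken between the rescaled vectors. The function names `angular_minkowski` and `cosine_dissimilarity` are hypothetical, and the exact normalisation used in the paper is an assumption here.

```python
import numpy as np


def angular_minkowski(x, y, p=2.0):
    """Sketch of an angular Minkowski p-distance (assumed definition).

    Assumption: both vectors are rescaled to unit p-norm, then compared
    with the ordinary Minkowski p-distance. Neither vector may be all zeros.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Rescale each vector so that its p-norm equals 1.
    x_unit = x / np.linalg.norm(x, ord=p)
    y_unit = y / np.linalg.norm(y, ord=p)
    # Minkowski p-distance between the rescaled vectors.
    return np.linalg.norm(x_unit - y_unit, ord=p)


def cosine_dissimilarity(x, y):
    """Classical cosine dissimilarity: 1 minus the cosine of the angle."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return 1.0 - (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))


# Toy token-frequency vectors (made-up counts, for illustration only).
a = [3, 0, 1, 2]
b = [1, 1, 0, 4]

d_euclid = angular_minkowski(a, b, p=2)
d_manhattan = angular_minkowski(a, b, p=1)
```

Under this reading, the case p = 2 recovers cosine dissimilarity up to a monotone transformation: the squared distance equals twice the cosine dissimilarity, so both induce the same nearest-neighbour ranking. Other values of p change the ranking, which is the degree of freedom the case study explores.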
Keywords
cosine dissimilarity, fuzzy rough sets, minkowski distance, nearest neighbours

Downloads

  • lenz-2023-classifying-accepted.pdf
    • full text (Accepted manuscript) | open access | PDF | 395.31 KB
  • (...).pdf
    • full text (Published version) | UGent only | PDF | 564.22 KB

Citation

MLA
Lenz, Oliver Urs, and Chris Cornelis. “Classifying Token Frequencies Using Angular Minkowski P-Distance.” ROUGH SETS, IJCRS 2023, edited by Andrea Campagner et al., vol. 14481, Springer, 2023, pp. 402–13, doi:10.1007/978-3-031-50959-9_28.
APA
Lenz, O. U., & Cornelis, C. (2023). Classifying token frequencies using angular Minkowski p-distance. In A. Campagner, O. U. Lenz, S. Xia, D. Ślęzak, J. Wąs, & J. Yao (Eds.), ROUGH SETS, IJCRS 2023 (Vol. 14481, pp. 402–413). https://doi.org/10.1007/978-3-031-50959-9_28
Chicago author-date
Lenz, Oliver Urs, and Chris Cornelis. 2023. “Classifying Token Frequencies Using Angular Minkowski P-Distance.” In ROUGH SETS, IJCRS 2023, edited by Andrea Campagner, Oliver Urs Lenz, Shuyin Xia, Dominik Ślęzak, Jarosław Wąs, and JingTao Yao, 14481:402–13. Cham, Switzerland: Springer. https://doi.org/10.1007/978-3-031-50959-9_28.
Chicago author-date (all authors)
Lenz, Oliver Urs, and Chris Cornelis. 2023. “Classifying Token Frequencies Using Angular Minkowski P-Distance.” In ROUGH SETS, IJCRS 2023, edited by Andrea Campagner, Oliver Urs Lenz, Shuyin Xia, Dominik Ślęzak, Jarosław Wąs, and JingTao Yao, 14481:402–413. Cham, Switzerland: Springer. doi:10.1007/978-3-031-50959-9_28.
Vancouver
1. Lenz OU, Cornelis C. Classifying token frequencies using angular Minkowski p-distance. In: Campagner A, Lenz OU, Xia S, Ślęzak D, Wąs J, Yao J, editors. ROUGH SETS, IJCRS 2023. Cham, Switzerland: Springer; 2023. p. 402–13.
IEEE
[1] O. U. Lenz and C. Cornelis, “Classifying token frequencies using angular Minkowski p-distance,” in ROUGH SETS, IJCRS 2023, Krakow, Poland, 2023, vol. 14481, pp. 402–413.
BibTeX
@inproceedings{01HNFJ05PN4BD4W1J6CFYAJQPE,
  abstract     = {{Angular Minkowski p-distance is a dissimilarity measure that is obtained by replacing Euclidean distance in the definition of cosine dissimilarity with other Minkowski p-distances. Cosine dissimilarity is frequently used with datasets containing token frequencies, and angular Minkowski p-distance may potentially be an even better choice for certain tasks. In a case study based on the 20-newsgroups dataset, we evaluate classification performance for classical weighted nearest neighbours, as well as fuzzy rough nearest neighbours. In addition, we analyse the relationship between the hyperparameter p, the dimensionality m of the dataset, the number of neighbours k, the choice of weights and the choice of classifier. We conclude that it is possible to obtain substantially higher classification performance with angular Minkowski p-distance with suitable values for p than with classical cosine dissimilarity.}},
  author       = {{Lenz, Oliver Urs and Cornelis, Chris}},
  booktitle    = {{ROUGH SETS, IJCRS 2023}},
  editor       = {{Campagner, Andrea and Lenz, Oliver Urs and Xia, Shuyin and Ślęzak, Dominik and Wąs, Jarosław and Yao, JingTao}},
  isbn         = {{9783031509582}},
  issn         = {{0302-9743}},
  keywords     = {{cosine dissimilarity,fuzzy rough sets,minkowski distance,nearest neighbours}},
  language     = {{eng}},
  location     = {{Krakow, Poland}},
  pages        = {{402--413}},
  publisher    = {{Springer}},
  title        = {{Classifying token frequencies using angular Minkowski p-distance}},
  url          = {{http://doi.org/10.1007/978-3-031-50959-9_28}},
  volume       = {{14481}},
  year         = {{2023}},
}
