Polar encoding : a simple baseline approach for classification with missing values

Oliver Urs Lenz (UGent) , Daniel Peralta (UGent) and Chris Cornelis (UGent)
Abstract
We propose polar encoding, a representation of categorical and numerical [0,1]-valued attributes with missing values to be used in a classification context. We argue that this is a good baseline approach, because it can be used with any classification algorithm, preserves missingness information, is very simple to apply and offers good performance. In particular, unlike the existing missing-indicator approach, it does not require imputation, ensures that missing values are equidistant from non-missing values, and lets decision tree algorithms choose how to split missing values, thereby providing a practical realisation of the "missingness incorporated in attributes" (MIA) proposal. Furthermore, we show that categorical and [0,1]-valued attributes can be viewed as special cases of a single attribute type, corresponding to the classical concept of barycentric coordinates, and that this offers a natural interpretation of polar encoding as a fuzzified form of one-hot encoding. In an experiment on twenty real-life datasets with missing values, we show that, in terms of the resulting classification performance, polar encoding performs better than the state-of-the-art strategies "multiple imputation by chained equations" (MICE) and "multiple imputation with denoising autoencoders" (MIDAS) and, depending on the classifier, about as well as or better than mean/mode imputation with missing-indicators.
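The abstract does not spell out the encoding itself. The following is a minimal Python sketch, assuming (as we read the construction) that an observed [0,1]-valued value x becomes the pair (x, 1 - x), that categorical values are one-hot encoded, and that a missing value becomes the all-zero vector in both cases; consult the paper for the exact definition. The helper names (polar_encode, one_hot, mean_with_indicator, goes_left) are hypothetical. The sketch illustrates two claims from the abstract: under Manhattan distance a missing value lies at distance 1 from every observed value (whereas with mean imputation plus a missing-indicator the distance depends on x), and a decision tree can choose the branch that receives missing values by choosing which of the two coordinates to split on.

```python
import math
from typing import Optional, Sequence, Tuple

# Hypothetical helpers illustrating our reading of polar encoding; names are not from the paper.

def polar_encode(x: Optional[float]) -> Tuple[float, float]:
    """Assumed encoding: observed x in [0, 1] -> (x, 1 - x); missing (None or NaN) -> (0, 0)."""
    if x is None or math.isnan(x):
        return (0.0, 0.0)
    if not 0.0 <= x <= 1.0:
        raise ValueError("polar encoding expects values rescaled to [0, 1]")
    return (x, 1.0 - x)

def one_hot(value: Optional[str], categories: Sequence[str]) -> Tuple[float, ...]:
    """One-hot encode a categorical value; a missing value (None) becomes the all-zero vector."""
    if value is not None and value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return tuple(1.0 if value == c else 0.0 for c in categories)

def mean_with_indicator(x: Optional[float], mean: float) -> Tuple[float, float]:
    """Baseline for comparison: mean imputation plus a binary missing-indicator."""
    return (mean, 1.0) if x is None else (x, 0.0)

def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance between two encoded vectors."""
    return sum(abs(p - q) for p, q in zip(a, b))

def goes_left(encoded: Sequence[float], coord: int, threshold: float) -> bool:
    """A single axis-aligned decision-tree test: encoded[coord] <= threshold."""
    return encoded[coord] <= threshold

if __name__ == "__main__":
    missing = polar_encode(None)
    for x in (0.0, 0.25, 0.9, 1.0):
        # Polar: distance to the missing value is x + (1 - x) = 1 for every observed x.
        # Mean + indicator (mean assumed to be 0.5): distance is |x - 0.5| + 1, which varies with x.
        print(x, manhattan(missing, polar_encode(x)),
              manhattan(mean_with_indicator(None, 0.5), mean_with_indicator(x, 0.5)))

    # A missing category is also at distance 1 from every one-hot vector.
    cats = ["a", "b", "c"]
    print([manhattan(one_hot(None, cats), one_hot(c, cats)) for c in cats])

    # Testing x <= 0.5 on the first coordinate sends missing values left together with the low
    # values; testing 1 - x <= 0.5 on the second coordinate sends them left with the high values.
    for name, coord in (("first", 0), ("second", 1)):
        left = [v for v in (0.2, 0.8, None) if goes_left(polar_encode(v), coord, 0.5)]
        print(name, left)  # first: [0.2, None], second: [0.8, None]
```

The choice of Manhattan distance matters here: under Euclidean distance the gap between (0, 0) and (x, 1 - x) is sqrt(x² + (1 - x)²), which is not constant in x.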
Keywords
Barycentric coordinates, classification, decision trees, fuzzy partitions, missing values, missingness incorporated in attributes (MIA), nearest neighbors, one-hot encoding, prediction, regression, algorithm, variables

Citation

MLA
Lenz, Oliver Urs, et al. “Polar Encoding : A Simple Baseline Approach for Classification with Missing Values.” IEEE TRANSACTIONS ON FUZZY SYSTEMS, vol. 32, no. 5, 2024, pp. 3084–93, doi:10.1109/tfuzz.2024.3367419.
APA
Lenz, O. U., Peralta, D., & Cornelis, C. (2024). Polar encoding : a simple baseline approach for classification with missing values. IEEE TRANSACTIONS ON FUZZY SYSTEMS, 32(5), 3084–3093. https://doi.org/10.1109/tfuzz.2024.3367419
Chicago author-date
Lenz, Oliver Urs, Daniel Peralta, and Chris Cornelis. 2024. “Polar Encoding : A Simple Baseline Approach for Classification with Missing Values.” IEEE TRANSACTIONS ON FUZZY SYSTEMS 32 (5): 3084–93. https://doi.org/10.1109/tfuzz.2024.3367419.
Chicago author-date (all authors)
Lenz, Oliver Urs, Daniel Peralta, and Chris Cornelis. 2024. “Polar Encoding : A Simple Baseline Approach for Classification with Missing Values.” IEEE TRANSACTIONS ON FUZZY SYSTEMS 32 (5): 3084–3093. doi:10.1109/tfuzz.2024.3367419.
Vancouver
1. Lenz OU, Peralta D, Cornelis C. Polar encoding : a simple baseline approach for classification with missing values. IEEE TRANSACTIONS ON FUZZY SYSTEMS. 2024;32(5):3084–93.
IEEE
[1] O. U. Lenz, D. Peralta, and C. Cornelis, “Polar encoding : a simple baseline approach for classification with missing values,” IEEE TRANSACTIONS ON FUZZY SYSTEMS, vol. 32, no. 5, pp. 3084–3093, 2024.
@article{01HXY129BH9295AW7VYKD6NZE7,
  abstract     = {{We propose polar encoding, a representation of categorical and numerical [0,1]-valued attributes with missing values to be used in a classification context. We argue that this is a good baseline approach, because it can be used with any classification algorithm, preserves missingness information, is very simple to apply and offers good performance. In particular, unlike the existing missing-indicator approach, it does not require imputation, ensures that missing values are equidistant from non-missing values, and lets decision tree algorithms choose how to split missing values, thereby providing a practical realisation of the "missingness incorporated in attributes" (MIA) proposal. Furthermore, we show that categorical and [0,1]-valued attributes can be viewed as special cases of a single attribute type, corresponding to the classical concept of barycentric coordinates, and that this offers a natural interpretation of polar encoding as a fuzzified form of one-hot encoding. In an experiment on twenty real-life datasets with missing values, we show that, in terms of the resulting classification performance, polar encoding performs better than the state-of-the-art strategies "multiple imputation by chained equations" (MICE) and "multiple imputation with denoising autoencoders" (MIDAS) and, depending on the classifier, about as well as or better than mean/mode imputation with missing-indicators.}},
  author       = {{Lenz, Oliver Urs and Peralta, Daniel and Cornelis, Chris}},
  issn         = {{1063-6706}},
  journal      = {{IEEE TRANSACTIONS ON FUZZY SYSTEMS}},
  keywords     = {{Barycentric coordinates,classification,decision trees,fuzzy partitions,missing values,missingness incorporated in attributes (MIA),nearest neighbors,one-hot encoding,PREDICTION,REGRESSION,ALGORITHM,VARIABLES}},
  language     = {{eng}},
  number       = {{5}},
  pages        = {{3084--3093}},
  title        = {{Polar encoding : a simple baseline approach for classification with missing values}},
  url          = {{http://doi.org/10.1109/tfuzz.2024.3367419}},
  volume       = {{32}},
  year         = {{2024}},
}