
Polar encoding : a simple baseline approach for classification with missing values
- Author
- Oliver Urs Lenz (UGent), Daniel Peralta (UGent) and Chris Cornelis (UGent)
- Abstract
- We propose polar encoding, a representation of categorical and numerical [0,1]-valued attributes with missing values to be used in a classification context. We argue that this is a good baseline approach, because it can be used with any classification algorithm, preserves missingness information, is very simple to apply and offers good performance. In particular, unlike the existing missing-indicator approach, it does not require imputation, ensures that missing values are equidistant from non-missing values, and lets decision tree algorithms choose how to split missing values, thereby providing a practical realisation of the "missingness incorporated in attributes" (MIA) proposal. Furthermore, we show that categorical and [0,1]-valued attributes can be viewed as special cases of a single attribute type, corresponding to the classical concept of barycentric coordinates, and that this offers a natural interpretation of polar encoding as a fuzzified form of one-hot encoding. With an experiment based on twenty real-life datasets with missing values, we show that, in terms of the resulting classification performance, polar encoding performs better than the state-of-the-art strategies "multiple imputation by chained equations" (MICE) and "multiple imputation with denoising autoencoders" (MIDAS) and — depending on the classifier — about as well or better than mean/mode imputation with missing-indicators.
- Keywords
- Barycentric coordinates, classification, decision trees, fuzzy partitions, missing values, missingness incorporated in attributes (MIA), nearest neighbors, one-hot encoding, PREDICTION, REGRESSION, ALGORITHM, VARIABLES
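The encoding described in the abstract can be illustrated with a minimal Python sketch. It assumes, going by the abstract rather than the paper's exact definitions, that a [0,1]-valued attribute x is encoded as the pair (x, 1 - x), that a categorical attribute is encoded as its one-hot vector, and that a missing value becomes the all-zero vector in both cases. The function names `polar_encode_numeric` and `polar_encode_categorical` are hypothetical and do not come from the publication.

```python
import numpy as np


def polar_encode_numeric(x):
    """Assumed encoding: x in [0, 1] -> (x, 1 - x); missing (NaN) -> (0, 0)."""
    x = np.asarray(x, dtype=float)
    encoded = np.stack([x, 1.0 - x], axis=-1)
    # Rows whose original value was missing are set to the zero vector.
    encoded[np.isnan(x)] = 0.0
    return encoded


def polar_encode_categorical(values, categories):
    """Assumed encoding: one-hot vector per value; missing (None) -> all zeros."""
    encoded = np.zeros((len(values), len(categories)))
    for row, value in enumerate(values):
        if value is not None:
            encoded[row, categories.index(value)] = 1.0
    return encoded


numeric = polar_encode_numeric([0.0, 0.25, 1.0, np.nan])
categorical = polar_encode_categorical(["a", None, "c"], ["a", "b", "c"])

# Manhattan distance from the missing row to each non-missing row.
distances = np.abs(numeric[:3] - numeric[3]).sum(axis=1)
```

Under this assumed encoding, the Manhattan distance from a missing value to any non-missing value is x + (1 - x) = 1, which is one way to read the abstract's claim that missing values are equidistant from non-missing values. No imputation step is needed, and the result is a plain numeric array that can be passed to any classifier.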
Downloads
- (...).pdf
- full text (Published version), UGent only, 1.15 MB
- lenz-2024-polar-accepted.pdf
- full text (Accepted manuscript), open access, 338.76 KB
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-01HXY129BH9295AW7VYKD6NZE7
- MLA
- Lenz, Oliver Urs, et al. “Polar Encoding : A Simple Baseline Approach for Classification with Missing Values.” IEEE TRANSACTIONS ON FUZZY SYSTEMS, vol. 32, no. 5, 2024, pp. 3084–93, doi:10.1109/tfuzz.2024.3367419.
- APA
- Lenz, O. U., Peralta, D., & Cornelis, C. (2024). Polar encoding : a simple baseline approach for classification with missing values. IEEE TRANSACTIONS ON FUZZY SYSTEMS, 32(5), 3084–3093. https://doi.org/10.1109/tfuzz.2024.3367419
- Chicago author-date
- Lenz, Oliver Urs, Daniel Peralta, and Chris Cornelis. 2024. “Polar Encoding : A Simple Baseline Approach for Classification with Missing Values.” IEEE TRANSACTIONS ON FUZZY SYSTEMS 32 (5): 3084–93. https://doi.org/10.1109/tfuzz.2024.3367419.
- Chicago author-date (all authors)
- Lenz, Oliver Urs, Daniel Peralta, and Chris Cornelis. 2024. “Polar Encoding : A Simple Baseline Approach for Classification with Missing Values.” IEEE TRANSACTIONS ON FUZZY SYSTEMS 32 (5): 3084–3093. doi:10.1109/tfuzz.2024.3367419.
- Vancouver
- 1. Lenz OU, Peralta D, Cornelis C. Polar encoding : a simple baseline approach for classification with missing values. IEEE TRANSACTIONS ON FUZZY SYSTEMS. 2024;32(5):3084–93.
- IEEE
- [1] O. U. Lenz, D. Peralta, and C. Cornelis, “Polar encoding : a simple baseline approach for classification with missing values,” IEEE TRANSACTIONS ON FUZZY SYSTEMS, vol. 32, no. 5, pp. 3084–3093, 2024.
@article{01HXY129BH9295AW7VYKD6NZE7,
  abstract = {{We propose polar encoding, a representation of categorical and numerical [0,1]-valued attributes with missing values to be used in a classification context. We argue that this is a good baseline approach, because it can be used with any classification algorithm, preserves missingness information, is very simple to apply and offers good performance. In particular, unlike the existing missing-indicator approach, it does not require imputation, ensures that missing values are equidistant from non-missing values, and lets decision tree algorithms choose how to split missing values, thereby providing a practical realisation of the "missingness incorporated in attributes" (MIA) proposal. Furthermore, we show that categorical and [0,1]-valued attributes can be viewed as special cases of a single attribute type, corresponding to the classical concept of barycentric coordinates, and that this offers a natural interpretation of polar encoding as a fuzzified form of one-hot encoding. With an experiment based on twenty real-life datasets with missing values, we show that, in terms of the resulting classification performance, polar encoding performs better than the state-of-the-art strategies "multiple imputation by chained equations" (MICE) and "multiple imputation with denoising autoencoders" (MIDAS) and --- depending on the classifier --- about as well or better than mean/mode imputation with missing-indicators.}},
  author = {{Lenz, Oliver Urs and Peralta, Daniel and Cornelis, Chris}},
  issn = {{1063-6706}},
  journal = {{IEEE TRANSACTIONS ON FUZZY SYSTEMS}},
  keywords = {{Barycentric coordinates,classification,decision trees,fuzzy partitions,missing values,missingness incorporated in attributes (MIA),nearest neighbors,one-hot encoding,PREDICTION,REGRESSION,ALGORITHM,VARIABLES}},
  language = {{eng}},
  number = {{5}},
  pages = {{3084--3093}},
  title = {{Polar encoding : a simple baseline approach for classification with missing values}},
  url = {{http://doi.org/10.1109/tfuzz.2024.3367419}},
  volume = {{32}},
  year = {{2024}},
}