
Revised conditional t-SNE : looking beyond the nearest neighbors
- Author
- Edith Heiter (UGent) , Bo Kang (UGent) , Ruth Seurinck (UGent) and Jefrey Lijffijt (UGent)
- Abstract
- Conditional t-SNE (ct-SNE) is a recent extension to t-SNE that allows removal of known cluster information from the embedding, to obtain a visualization revealing structure beyond label information. This is useful, for example, when one wants to factor out unwanted differences between a set of classes. We show that ct-SNE fails in many realistic settings, namely if the data is well clustered over the labels in the original high-dimensional space. We introduce a revised method by conditioning the high-dimensional similarities instead of the low-dimensional similarities and storing within- and across-label nearest neighbors separately. This also enables the use of recently proposed speedups for t-SNE, improving the scalability. From experiments on synthetic data, we find that our proposed method resolves the considered problems and improves the embedding quality. On real data containing batch effects, the expected improvement is not always there. We argue revised ct-SNE is preferable overall, given its improved scalability. The results also highlight new open questions, such as how to handle distance variations between clusters.
Downloads
- (...).pdf: full text (Published version), UGent only, 17.58 MB
- (...).pdf: full text (Accepted manuscript), UGent only (changes to open access on 2024-04-01), 3.85 MB
Citation
Please use this url to cite or link to this publication: http://hdl.handle.net/1854/LU-01GY79Q9GWP7XXJBX5HGSQ338F
- MLA
- Heiter, Edith, et al. “Revised Conditional T-SNE : Looking beyond the Nearest Neighbors.” Advances in Intelligent Data Analysis XXI : 21st International Symposium on Intelligent Data Analysis, IDA 2023, Proceedings, edited by Bruno Crémilleux et al., vol. 13876, Springer, 2023, pp. 169–81, doi:10.1007/978-3-031-30047-9_14.
- APA
- Heiter, E., Kang, B., Seurinck, R., & Lijffijt, J. (2023). Revised conditional t-SNE : looking beyond the nearest neighbors. In B. Crémilleux, S. Hess, & S. Nijssen (Eds.), Advances in Intelligent Data Analysis XXI : 21st International Symposium on Intelligent Data Analysis, IDA 2023, Proceedings (Vol. 13876, pp. 169–181). https://doi.org/10.1007/978-3-031-30047-9_14
- Chicago author-date
- Heiter, Edith, Bo Kang, Ruth Seurinck, and Jefrey Lijffijt. 2023. “Revised Conditional T-SNE : Looking beyond the Nearest Neighbors.” In Advances in Intelligent Data Analysis XXI : 21st International Symposium on Intelligent Data Analysis, IDA 2023, Proceedings, edited by Bruno Crémilleux, Sibylle Hess, and Siegfried Nijssen, 13876:169–81. Cham: Springer. https://doi.org/10.1007/978-3-031-30047-9_14.
- Chicago author-date (all authors)
- Heiter, Edith, Bo Kang, Ruth Seurinck, and Jefrey Lijffijt. 2023. “Revised Conditional T-SNE : Looking beyond the Nearest Neighbors.” In Advances in Intelligent Data Analysis XXI : 21st International Symposium on Intelligent Data Analysis, IDA 2023, Proceedings, edited by Bruno Crémilleux, Sibylle Hess, and Siegfried Nijssen, 13876:169–181. Cham: Springer. doi:10.1007/978-3-031-30047-9_14.
- Vancouver
- 1. Heiter E, Kang B, Seurinck R, Lijffijt J. Revised conditional t-SNE : looking beyond the nearest neighbors. In: Crémilleux B, Hess S, Nijssen S, editors. Advances in Intelligent Data Analysis XXI : 21st International Symposium on Intelligent Data Analysis, IDA 2023, Proceedings. Cham: Springer; 2023. p. 169–81.
- IEEE
- [1] E. Heiter, B. Kang, R. Seurinck, and J. Lijffijt, “Revised conditional t-SNE : looking beyond the nearest neighbors,” in Advances in Intelligent Data Analysis XXI : 21st International Symposium on Intelligent Data Analysis, IDA 2023, Proceedings, Louvain-la-Neuve, Belgium, 2023, vol. 13876, pp. 169–181.
@inproceedings{01GY79Q9GWP7XXJBX5HGSQ338F,
  abstract     = {{Conditional t-SNE (ct-SNE) is a recent extension to t-SNE that allows removal of known cluster information from the embedding, to obtain a visualization revealing structure beyond label information. This is useful, for example, when one wants to factor out unwanted differences between a set of classes. We show that ct-SNE fails in many realistic settings, namely if the data is well clustered over the labels in the original high-dimensional space. We introduce a revised method by conditioning the high-dimensional similarities instead of the low-dimensional similarities and storing within- and across-label nearest neighbors separately. This also enables the use of recently proposed speedups for t-SNE, improving the scalability. From experiments on synthetic data, we find that our proposed method resolves the considered problems and improves the embedding quality. On real data containing batch effects, the expected improvement is not always there. We argue revised ct-SNE is preferable overall, given its improved scalability. The results also highlight new open questions, such as how to handle distance variations between clusters.}},
  author       = {{Heiter, Edith and Kang, Bo and Seurinck, Ruth and Lijffijt, Jefrey}},
  booktitle    = {{Advances in Intelligent Data Analysis XXI : 21st International Symposium on Intelligent Data Analysis, IDA 2023, Proceedings}},
  editor       = {{Crémilleux, Bruno and Hess, Sibylle and Nijssen, Siegfried}},
  isbn         = {{9783031300462}},
  issn         = {{0302-9743}},
  language     = {{eng}},
  location     = {{Louvain-la-Neuve, Belgium}},
  pages        = {{169--181}},
  publisher    = {{Springer}},
  title        = {{Revised conditional t-SNE : looking beyond the nearest neighbors}},
  url          = {{http://dx.doi.org/10.1007/978-3-031-30047-9_14}},
  volume       = {{13876}},
  year         = {{2023}},
}