Abstract
Non-hierarchical clustering methods are frequently based on the idea of forming groups around 'objects'. The main exponent of this class of methods is the k-means method, where these objects are points. However, clusters in a data set may often be due to certain relationships between the measured variables. For instance, we can find linear structures such as straight lines and planes, around which the observations are grouped in a natural way. These structures are not well represented by points. We present a method that searches for linear groups in the presence of outliers. The method is based on the idea of impartial trimming. We search for the 'best' subsample containing a proportion 1-alpha of the data and the best k affine subspaces fitting those non-discarded observations, measuring discrepancies through orthogonal distances. The population version of the sample problem is also considered. We prove the existence of solutions for the sample and population problems together with their consistency. A feasible algorithm for solving the sample problem is described as well. Finally, some examples showing how the proposed method works in practice are provided.
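The idea in the abstract (keep a proportion 1-alpha of the data, fit k affine subspaces to it, and measure fit by orthogonal distances) can be illustrated with a minimal Python sketch. This is not the algorithm from the paper, which is not reproduced on this page; it is an assumed k-means-style alternating scheme with random restarts. The function names, the parameters `dim`, `n_starts`, `n_iter` and `seed`, and the choice of fitting each subspace by PCA are all illustrative assumptions.

```python
import numpy as np


def fit_affine_subspace(points, dim):
    """Orthogonal least-squares fit of a `dim`-dimensional affine subspace (PCA)."""
    center = points.mean(axis=0)
    # Rows of vt are the principal directions of the centered points.
    _, _, vt = np.linalg.svd(points - center, full_matrices=False)
    return center, vt[:dim]


def orthogonal_sq_dist(points, center, basis):
    """Squared orthogonal distance of each point to the affine subspace."""
    diff = points - center
    residual = diff - diff @ basis.T @ basis
    return np.sum(residual ** 2, axis=1)


def all_distances(x, subspaces):
    """n-by-k matrix of squared orthogonal distances to every subspace."""
    return np.column_stack([orthogonal_sq_dist(x, c, b) for c, b in subspaces])


def trimmed_linear_clustering(x, k, alpha, dim=1, n_starts=20, n_iter=50, seed=0):
    """Assumed sketch: alternate between trimming/assigning and refitting."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    n_keep = n - int(np.floor(n * alpha))  # size of the non-discarded subsample
    best = None
    for _ in range(n_starts):
        # Random start: each subspace passes through dim + 1 random points.
        subspaces = [
            fit_affine_subspace(x[rng.choice(n, dim + 1, replace=False)], dim)
            for _ in range(k)
        ]
        for _ in range(n_iter):
            d = all_distances(x, subspaces)
            labels = d.argmin(axis=1)
            # Impartial trimming: keep the n_keep points closest to any subspace.
            keep = np.argsort(d.min(axis=1))[:n_keep]
            updated = []
            for j in range(k):
                members = keep[labels[keep] == j]
                # Refit only if the group has enough points to define a subspace.
                if len(members) > dim:
                    updated.append(fit_affine_subspace(x[members], dim))
                else:
                    updated.append(subspaces[j])
            subspaces = updated
        # Evaluate the trimmed sum of squared orthogonal distances.
        d = all_distances(x, subspaces)
        nearest = d.min(axis=1)
        keep = np.argsort(nearest)[:n_keep]
        objective = float(nearest[keep].sum())
        if best is None or objective < best[0]:
            labels = np.full(n, -1)  # -1 marks trimmed observations
            labels[keep] = d.argmin(axis=1)[keep]
            best = (objective, labels, subspaces)
    return best
```

Calling `trimmed_linear_clustering(x, k=2, alpha=0.1)` on an n-by-p array `x` returns the trimmed objective, a label vector in which `-1` marks discarded observations, and the fitted `(center, basis)` pairs. Like k-means, a scheme of this kind may converge to a local optimum, which is why the sketch uses several random starts.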
Keywords
FEATURES, IDENTIFICATION, FIXED-POINT CLUSTERS, COMPUTER VISION, DATA VISUALIZATION, SELF-CONSISTENCY, PRINCIPAL CURVES, REGRESSION, FAST ALGORITHM, METHODOLOGY

Citation

MLA
Garcia-Escudero, Luis Angel et al. “Robust Linear Clustering.” JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY 71.1 (2009): 301–318. Print.
APA
Garcia-Escudero, L. A., Gordaliza, A., San Martin, R., Van Aelst, S., & Zamar, R. (2009). Robust linear clustering. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 71(1), 301–318.
Chicago author-date
Garcia-Escudero, Luis Angel, Alfonso Gordaliza, Roberto San Martin, Stefan Van Aelst, and Ruben Zamar. 2009. “Robust Linear Clustering.” Journal of the Royal Statistical Society Series B-Statistical Methodology 71 (1): 301–318.
Vancouver
1. Garcia-Escudero LA, Gordaliza A, San Martin R, Van Aelst S, Zamar R. Robust linear clustering. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY. 2009;71(1):301–18.
IEEE
[1] L. A. Garcia-Escudero, A. Gordaliza, R. San Martin, S. Van Aelst, and R. Zamar, “Robust linear clustering,” JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, vol. 71, no. 1, pp. 301–318, 2009.
BibTeX
@article{778224,
  abstract     = {Non-hierarchical clustering methods are frequently based on the idea of forming groups around 'objects'. The main exponent of this class of methods is the k-means method, where these objects are points. However, clusters in a data set may often be due to certain relationships between the measured variables. For instance, we can find linear structures such as straight lines and planes, around which the observations are grouped in a natural way. These structures are not well represented by points. We present a method that searches for linear groups in the presence of outliers. The method is based on the idea of impartial trimming. We search for the 'best' subsample containing a proportion 1-alpha of the data and the best k affine subspaces fitting to those non-discarded observations by measuring discrepancies through orthogonal distances. The population version of the sample problem is also considered. We prove the existence of solutions for the sample and population problems together with their consistency. A feasible algorithm for solving the sample problem is described as well. Finally, some examples showing how the method proposed works in practice are provided.},
  author       = {Garcia-Escudero, Luis Angel and Gordaliza, Alfonso and San Martin, Roberto and Van Aelst, Stefan and Zamar, Ruben},
  issn         = {1369-7412},
  journal      = {JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY},
  keywords     = {FEATURES,IDENTIFICATION,FIXED-POINT CLUSTERS,COMPUTER VISION,DATA VISUALIZATION,SELF-CONSISTENCY,PRINCIPAL CURVES,REGRESSION,FAST ALGORITHM,METHODOLOGY},
  language     = {eng},
  number       = {1},
  pages        = {301--318},
  title        = {Robust linear clustering},
  url          = {http://dx.doi.org/10.1111/j.1467-9868.2008.00682.x},
  volume       = {71},
  year         = {2009},
}
