
# Advanced learning in massive fusion databases : nonlinear regression, clustering, dimensionality reduction and information retrieval

Author
Geert Verdoolaege (UGent) and Guido Van Oost (UGent)
Abstract
Advanced data mining techniques can help fusion experiments both to advance physical insight and to solve current engineering challenges in a fast-track approach to the realization of fusion power. We present a research program concerning several operations that are useful for detecting structures of interest in massive fusion databases. We account for measurement uncertainty through a probabilistic approach, and we exploit the information residing in the temporal structure of signals (e.g. the $D_\alpha$ signal for plasma regime identification), explicitly taking into account nonstationarity and transient behavior. To this end, we adopt a multiscale wavelet representation, modeling the wavelet coefficients through appropriate probability distributions. We integrate data from multiple diagnostics, optionally capturing signal dependencies by multivariate distributions. Our framework is concerned with the following tasks: regression, for learning a generally nonlinear relation between physical variables and for extrapolation toward reactor-relevant conditions (e.g. confinement prediction); clustering of objects (e.g. discharges) into physically meaningful groups; dimensionality reduction, to uncover the fundamental degrees of freedom driving certain aspects of plasma behavior, a scheme that is also useful for data visualization and as a preprocessing step for various machine learning algorithms, in order to mitigate issues related to high data dimensionality; and information retrieval, by searching a database for plasma conditions or phenomena that are similar to a given query. To accomplish this program, we employ the language of information geometry, i.e. the study of probabilistic manifolds using differential geometry. We present the details of our mathematical framework and show results of an example clustering application for plasma regime identification.
Keywords
plasma confinement, pattern recognition, information geometry, Nuclear fusion
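
The pipeline outlined in the abstract (multiscale wavelet representation, a probability model for the wavelet coefficients, and a distance taken from information geometry) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Haar wavelet, the independent zero-mean Gaussian model per scale, and the function names `haar_details`, `scale_sigmas` and `fisher_rao_distance` are all hypothetical choices. For a zero-mean Gaussian with standard deviation $\sigma$, the Fisher-Rao geodesic distance between two models is $\sqrt{2}\,|\ln(\sigma_2/\sigma_1)|$, and squared distances add across independent scales.

```python
import math
import random


def haar_details(signal, levels):
    """Haar detail coefficients, one list per scale (finest scale first)."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx) - 1, 2)]
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return details


def scale_sigmas(signal, levels=3):
    """Maximum-likelihood std of a zero-mean Gaussian fitted at each scale (assumed model)."""
    return [
        math.sqrt(sum(d * d for d in det) / len(det))
        for det in haar_details(signal, levels)
    ]


def fisher_rao_distance(sigmas_a, sigmas_b):
    """Geodesic distance between two products of independent zero-mean Gaussians."""
    return math.sqrt(
        2.0 * sum(math.log(sb / sa) ** 2 for sa, sb in zip(sigmas_a, sigmas_b))
    )


# Synthetic stand-ins for two diagnostic signals (not real fusion data).
rng = random.Random(0)
quiet = [rng.gauss(0.0, 1.0) for _ in range(256)]
active = [rng.gauss(0.0, 4.0) for _ in range(256)]

print(fisher_rao_distance(scale_sigmas(quiet), scale_sigmas(active)))
```

A matrix of such pairwise distances between signals could then be passed to any distance-based clustering or retrieval method. The distributions and wavelets actually used by the authors may differ from this sketch.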

• verdoolaege eps 2011.pdf | full text | open access | PDF | 10.20 MB

## Citation

Chicago
Verdoolaege, Geert, and Guido Van Oost. 2011. “Advanced Learning in Massive Fusion Databases : Nonlinear Regression, Clustering, Dimensionality Reduction and Information Retrieval.” In Europhysics Conference Abstracts. Vol. 35G. Mulhouse, France: European Physical Society (EPS).
APA
Verdoolaege, Geert, & Van Oost, G. (2011). Advanced learning in massive fusion databases : nonlinear regression, clustering, dimensionality reduction and information retrieval. EUROPHYSICS CONFERENCE ABSTRACTS (Vol. 35G). Presented at the 38th EPS conference on Plasma Physics, Mulhouse, France: European Physical Society (EPS).
Vancouver
1. Verdoolaege G, Van Oost G. Advanced learning in massive fusion databases : nonlinear regression, clustering, dimensionality reduction and information retrieval. EUROPHYSICS CONFERENCE ABSTRACTS. Mulhouse, France: European Physical Society (EPS); 2011.
MLA
Verdoolaege, Geert, and Guido Van Oost. “Advanced Learning in Massive Fusion Databases : Nonlinear Regression, Clustering, Dimensionality Reduction and Information Retrieval.” Europhysics Conference Abstracts. Vol. 35G. Mulhouse, France: European Physical Society (EPS), 2011. Print.
BibTeX
@inproceedings{1934376,
abstract     = {Advanced data mining techniques can help fusion experiments both to advance physical insight and to solve current engineering challenges in a fast-track approach to the realization of fusion power. We present a research program concerning several operations that are useful for detecting structures of interest in massive fusion databases. We account for measurement uncertainty through a probabilistic approach, and we exploit the information residing in the temporal structure of signals (e.g. the $D_\alpha$ signal for plasma regime identification), explicitly taking into account nonstationarity and transient behavior. To this end, we adopt a multiscale wavelet representation, modeling the wavelet coefficients through appropriate probability distributions. We integrate data from multiple diagnostics, optionally capturing signal dependencies by multivariate distributions. Our framework is concerned with the following tasks: regression, for learning a generally nonlinear relation between physical variables and for extrapolation toward reactor-relevant conditions (e.g. confinement prediction); clustering of objects (e.g. discharges) into physically meaningful groups; dimensionality reduction, to uncover the fundamental degrees of freedom driving certain aspects of plasma behavior, a scheme that is also useful for data visualization and as a preprocessing step for various machine learning algorithms, in order to mitigate issues related to high data dimensionality; and information retrieval, by searching a database for plasma conditions or phenomena that are similar to a given query.
To accomplish this program, we employ the language of information geometry, i.e. the study of probabilistic manifolds using differential geometry. We present the details of our mathematical framework and show results of an example clustering application for plasma regime identification.},
articleno    = {P5.053},
author       = {Verdoolaege, Geert and Van Oost, Guido},
booktitle    = {EUROPHYSICS CONFERENCE ABSTRACTS},
isbn         = {9782914771689},
issn         = {0378-2271},
keywords     = {plasma confinement,pattern recognition,information geometry,Nuclear fusion},
language     = {eng},
location     = {Strasbourg, France},
pages        = {4},
publisher    = {European Physical Society (EPS)},
title        = {Advanced learning in massive fusion databases : nonlinear regression, clustering, dimensionality reduction and information retrieval},
url          = {http://ocs.ciemat.es/EPS2011PAP/pdf/P5.053.pdf},
volume       = {35G},
year         = {2011},
}