Project: Formalizing Subjective Interestingness in Exploratory Data Mining
2015-09-01 – 2019-04-30
- Abstract
Research labs, enterprises and governments accumulate data at a high and rapidly increasing rate. Often these data are collected for no specific purpose, or they turn out to be useful for unanticipated purposes: companies constantly look for new ways to monetize their customer databases; governments mine various databases to detect tax fraud; security agencies mine and cross-associate numerous heterogeneous information streams from publicly accessible and classified databases to understand and detect security threats. The objective in such Exploratory Data Mining (EDM) tasks is typically ill-defined: it is unclear how to formalize how interesting a pattern extracted from the data is. As a result, EDM is often a slow process of trial and error.
During this fellowship we aim to develop the mathematical principles of what makes a pattern interesting in a genuinely subjective sense. Crucial to this endeavour will be research into automatic mechanisms that model, and duly take into account, the prior beliefs and expectations of the user for whom the EDM patterns are intended. This relieves users of the complex task of formalizing for themselves what makes a pattern interesting to them.
This project will represent a radical change in how EDM research is done. Currently, researchers typically imagine a specific purpose for the patterns, try to formalize the interestingness of such patterns given that purpose, and design an algorithm to mine them. Given the variety of users, however, this strategy has led to a multitude of algorithms, so users need to be data mining experts to understand which algorithm applies to their situation. To resolve this, we will develop a theoretically solid framework for designing EDM systems that model the user's beliefs and expectations as much as the data itself, so as to maximize the amount of useful information transmitted to the user. This will ultimately bring the power of EDM within reach of non-experts.
- Incorporating topological priors into low-dimensional visualizations through topological regularization (Journal Article, A1, open access)
- FAIRRET: a framework for differentiable fairness regularization terms (Conference Paper, C1, open access)
- Inherent limitations of AI fairness (Journal Article, A1, open access)
- GREASE: graph imbalance reduction by adding sets of edges (Journal Article, A1, open access)
- Topological data analysis of thoracic radiographic images shows improved radiomics-based lung tumor histology prediction (Journal Article, A2, open access)
- Gaussian embedding of temporal networks (Journal Article, A1, open access)
- ReCon: reducing congestion in job recommendation using optimal transport (Conference Paper, P1, open access)
- An efficient graph-based peer selection method for financial statements (Journal Article, A2, open access)
- Maximal fairness
- An empirical evaluation of network representation learning methods (Journal Article, A1, open access)