Author
Toon Boeckling, Antoon Bronselaer, and Guy De Tré
Abstract
Since their introduction in 1976, edit rules have been a standard tool in statistical analysis. Edit rules are a compact representation of non-permitted combinations of values in a dataset. In this paper, we propose a technique to find edit rules automatically using the concept of T-dependence. We first generalize the traditional notion of lift to that of T-lift, where stochastic independence is generalized to T-dependence. A combination of values is declared an edit rule under a t-norm T if there is a strong negative correlation under T-dependence. We show several interesting properties of this approach. In particular, we show that under the minimum t-norm, edit rules can be computed efficiently using frequent pattern trees. Experimental results show a weak to medium correlation between the rank orders of edit rules obtained under T_M and T_P (the minimum and product t-norms, respectively), indicating that the semantics of these kinds of dependencies are different.
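
The idea of T-lift described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the formula, so the code assumes that T-lift is the joint support of a value combination divided by the t-norm of the individual value supports, which reduces to classical lift under the product t-norm. The function names, the toy rows, and the attribute names `age` and `marital` are all hypothetical.

```python
import math
from typing import Callable, Dict, List

Row = Dict[str, str]
TNorm = Callable[[float, float], float]


def t_min(a: float, b: float) -> float:
    """Minimum t-norm (T_M)."""
    return min(a, b)


def t_prod(a: float, b: float) -> float:
    """Product t-norm (T_P)."""
    return a * b


def support(rows: List[Row], pattern: Dict[str, str]) -> float:
    """Fraction of rows that match every attribute=value pair in `pattern`."""
    if not rows:
        return 0.0
    hits = sum(all(row.get(k) == v for k, v in pattern.items()) for row in rows)
    return hits / len(rows)


def t_lift(rows: List[Row], pattern: Dict[str, str], t_norm: TNorm) -> float:
    """ASSUMED definition: joint support / t-norm of the marginal supports.

    With t_prod this is the classical lift. The paper's exact definition
    may differ; this only illustrates the generalization.
    """
    joint = support(rows, pattern)
    combined = 1.0  # 1 is the neutral element of every t-norm
    for attr, value in pattern.items():
        combined = t_norm(combined, support(rows, {attr: value}))
    if combined == 0.0:
        return math.nan  # undefined when a value never occurs
    return joint / combined


# Hypothetical toy dataset: no row combines age=child with marital=married.
rows: List[Row] = (
    [{"age": "adult", "marital": "married"}] * 4
    + [{"age": "adult", "marital": "single"}] * 3
    + [{"age": "child", "marital": "single"}] * 3
)

for pattern in ({"age": "child", "marital": "married"},
                {"age": "adult", "marital": "married"}):
    print(pattern,
          "T_M:", t_lift(rows, pattern, t_min),
          "T_P:", t_lift(rows, pattern, t_prod))
```

Under this assumed definition, a value far below 1 signals the strong negative correlation that marks a candidate edit rule. In the toy data the combination of `child` and `married` never occurs, so its value is 0 under both t-norms, while the combination of `adult` and `married` is scored differently by T_M and T_P.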
Keywords
Data Quality, Pattern Mining, Consistency, Triangular Norms

Downloads

  • EUSFLAT 2019 paper 35.pdf
    • full text (Published version) | open access | PDF | 2.33 MB

Citation

Please use this URL to cite or link to this publication:

MLA
Boeckling, Toon, et al. “Mining Data Quality Rules Based on T-Dependence.” Proceedings of the 11th Conference of the European Society for Fuzzy Logic and Technology (EUSFLAT 2019), vol. 1, Atlantis, 2019, pp. 184–91.
APA
Boeckling, T., Bronselaer, A., & De Tré, G. (2019). Mining data quality rules based on T-dependence. In Proceedings of the 11th Conference of the European Society for Fuzzy Logic and Technology (EUSFLAT 2019) (Vol. 1, pp. 184–191). Prague, Czech Republic: Atlantis.
Chicago author-date
Boeckling, Toon, Antoon Bronselaer, and Guy De Tré. 2019. “Mining Data Quality Rules Based on T-Dependence.” In Proceedings of the 11th Conference of the European Society for Fuzzy Logic and Technology (EUSFLAT 2019), 1:184–91. Atlantis.
Chicago author-date (all authors)
Boeckling, Toon, Antoon Bronselaer, and Guy De Tré. 2019. “Mining Data Quality Rules Based on T-Dependence.” In Proceedings of the 11th Conference of the European Society for Fuzzy Logic and Technology (EUSFLAT 2019), 1:184–191. Atlantis.
Vancouver
1. Boeckling T, Bronselaer A, De Tré G. Mining data quality rules based on T-dependence. In: Proceedings of the 11th Conference of the European Society for Fuzzy Logic and Technology (EUSFLAT 2019). Atlantis; 2019. p. 184–91.
IEEE
[1] T. Boeckling, A. Bronselaer, and G. De Tré, “Mining data quality rules based on T-dependence,” in Proceedings of the 11th Conference of the European Society for Fuzzy Logic and Technology (EUSFLAT 2019), Prague, Czech Republic, 2019, vol. 1, pp. 184–191.
@inproceedings{8628444,
  abstract     = {Since their introduction in 1976, edit rules have been a standard tool in statistical analysis. Basically, edit rules are a compact representation of non-permitted combinations of values in a dataset. In this paper, we propose a technique to automatically find edit rules by use of the concept of T-dependence. We first generalize the traditional notion of lift, to that of T-lift, where stochastic independence is generalized to T-dependence. A combination of values is declared as an edit rule under a t-norm T if there is a strong negative correlation under T-dependence. We show several interesting properties of this approach. In particular, we show that under the minimum t-norm, edit rules can be computed efficiently by use of frequent pattern trees. Experimental results show that there is a weak to medium correlation in the rank order of edit rules obtained under T_M and T_P, indicating that the semantics of these kinds of dependencies are different.},
  author       = {Boeckling, Toon and Bronselaer, Antoon and De Tré, Guy},
  booktitle    = {Proceedings of the 11th Conference of the European Society for Fuzzy Logic and Technology (EUSFLAT 2019)},
  isbn         = {9789462527706},
  issn         = {2589-6644},
  keywords     = {Data Quality,Pattern Mining,Consistency,Triangular Norms},
  language     = {eng},
  location     = {Prague, Czech Republic},
  pages        = {184--191},
  publisher    = {Atlantis},
  title        = {Mining data quality rules based on T-dependence},
  url          = {http://dx.doi.org/10.2991/eusflat-19.2019.28},
  volume       = {1},
  year         = {2019},
}
