
Discovering non-metadata contaminant features in intrusion detection datasets

Laurens D'hooge (UGent) , Miel Verkerken (UGent) , Tim Wauters (UGent) , Bruno Volckaert (UGent) and Filip De Turck (UGent)
Abstract
Most newly proposed detection methods in intrusion detection incorporate machine learning models to distinguish between benign and malicious traffic. The models are validated on a handful of academic datasets and ranked based on their classification performance. This paper aims to demonstrate that, unbeknownst to the authors of these models, some features in these datasets heavily bias the results and obscure a realistic, reliable estimate of the separability of the datasets. It proposes a methodology to estimate the contaminating influence of a dataset's features based on the concept of blind generalization. The novel methodology is subsequently used to assess the features of six widely adopted intrusion detection datasets. In each dataset, several features show a pattern where, regardless of the attack class used for training, the models blindly generalize towards all available attack classes with nearly identical classification metrics. These features provide undeserved boosts in the baseline classification scores for each dataset. By themselves, some contaminant features even push these baselines upwards of 90% balanced accuracy.
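
The blind-generalization check described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the one-feature threshold classifier, the function names, the attack class names, and the toy values are all assumptions chosen for illustration. The idea shown is to train on benign traffic plus a single attack class using one feature, then score the model on attack classes it never saw.

```python
from typing import Dict, List, Tuple


def balanced_accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Mean of the recall on benign (0) and attack (1) samples.

    Assumes both classes are present in y_true.
    """
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


def predict(value: float, threshold: float, sign: int) -> int:
    """Flag a sample as attack (1) when it lies on the chosen side of the threshold."""
    return 1 if sign * (value - threshold) > 0 else 0


def fit_stump(values: List[float], labels: List[int]) -> Tuple[float, int]:
    """Hypothetical stand-in model: the single-feature threshold and direction
    that maximize balanced accuracy on the training samples."""
    unique = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(unique, unique[1:])] or unique
    best_score, best_thr, best_sign = -1.0, candidates[0], 1
    for thr in candidates:
        for sign in (1, -1):
            preds = [predict(v, thr, sign) for v in values]
            score = balanced_accuracy(labels, preds)
            if score > best_score:
                best_score, best_thr, best_sign = score, thr, sign
    return best_thr, best_sign


def blind_generalization(
    benign_train: List[float],
    benign_test: List[float],
    attacks: Dict[str, List[float]],
) -> Dict[Tuple[str, str], float]:
    """Train on benign + one attack class, test on benign + every other class."""
    scores = {}
    for train_cls, train_vals in attacks.items():
        values = benign_train + train_vals
        labels = [0] * len(benign_train) + [1] * len(train_vals)
        thr, sign = fit_stump(values, labels)
        for test_cls, test_vals in attacks.items():
            if test_cls == train_cls:
                continue  # only unseen attack classes count as "blind"
            x = benign_test + test_vals
            y = [0] * len(benign_test) + [1] * len(test_vals)
            preds = [predict(v, thr, sign) for v in x]
            scores[(train_cls, test_cls)] = balanced_accuracy(y, preds)
    return scores


# Toy values for one feature (invented for illustration, not taken from any dataset).
benign_train = [1.0, 2.0, 3.0, 2.0]
benign_test = [2.0, 1.0, 3.0]
attacks = {
    "dos": [10.0, 12.0, 11.0],
    "scan": [10.0, 13.0, 15.0],
    "bruteforce": [11.0, 14.0, 12.0],
}
scores = blind_generalization(benign_train, benign_test, attacks)
```

In this toy setup every train/test pair receives the same high score, which is the kind of pattern the abstract associates with a contaminant feature. On real data, each feature of a dataset would presumably be scored in turn with a proper classifier and held-out splits.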

Downloads

  • 8173 acc.pdf: full text (Accepted manuscript), open access, PDF, 11.32 MB
  • (...).pdf: full text (Published version), UGent only, PDF, 11.99 MB

Citation

MLA
D’hooge, Laurens, et al. “Discovering Non-Metadata Contaminant Features in Intrusion Detection Datasets.” 2022 19TH ANNUAL INTERNATIONAL CONFERENCE ON PRIVACY, SECURITY & TRUST (PST), IEEE, 2022, doi:10.1109/PST55820.2022.9851974.
APA
D’hooge, L., Verkerken, M., Wauters, T., Volckaert, B., & De Turck, F. (2022). Discovering non-metadata contaminant features in intrusion detection datasets. 2022 19TH ANNUAL INTERNATIONAL CONFERENCE ON PRIVACY, SECURITY & TRUST (PST). Presented at the 19th Annual International Conference on Privacy, Security and Trust (PST), Fredericton, NB, Canada. https://doi.org/10.1109/PST55820.2022.9851974
Chicago author-date
D’hooge, Laurens, Miel Verkerken, Tim Wauters, Bruno Volckaert, and Filip De Turck. 2022. “Discovering Non-Metadata Contaminant Features in Intrusion Detection Datasets.” In 2022 19TH ANNUAL INTERNATIONAL CONFERENCE ON PRIVACY, SECURITY & TRUST (PST). IEEE. https://doi.org/10.1109/PST55820.2022.9851974.
Chicago author-date (all authors)
D’hooge, Laurens, Miel Verkerken, Tim Wauters, Bruno Volckaert, and Filip De Turck. 2022. “Discovering Non-Metadata Contaminant Features in Intrusion Detection Datasets.” In 2022 19TH ANNUAL INTERNATIONAL CONFERENCE ON PRIVACY, SECURITY & TRUST (PST). IEEE. doi:10.1109/PST55820.2022.9851974.
Vancouver
1. D’hooge L, Verkerken M, Wauters T, Volckaert B, De Turck F. Discovering non-metadata contaminant features in intrusion detection datasets. In: 2022 19TH ANNUAL INTERNATIONAL CONFERENCE ON PRIVACY, SECURITY & TRUST (PST). IEEE; 2022.
IEEE
[1] L. D’hooge, M. Verkerken, T. Wauters, B. Volckaert, and F. De Turck, “Discovering non-metadata contaminant features in intrusion detection datasets,” in 2022 19TH ANNUAL INTERNATIONAL CONFERENCE ON PRIVACY, SECURITY & TRUST (PST), Fredericton, NB, Canada, 2022.
@inproceedings{8769860,
  abstract     = {{Most newly proposed detection methods in intrusion detection incorporate machine learning models to distinguish between benign and malicious traffic. The models are validated on a handful of academic datasets and ranked based on their classification performance. This article aims to demonstrate that unbeknownst to the new models' authors, there are features in these datasets which heavily bias the results and obscure a realistic, reliable estimate of the separability of the datasets. This paper proposes a methodology to estimate the contaminating influence of a dataset's features based on the concept of blind generalization. The novel methodology is subsequently used to assess the features of six widely adopted intrusion detection datasets. In each dataset, several features show a pattern where regardless of training attack class, the models blindly generalize towards all available attack classes with nearly identical classification metrics. These features provide undeserved boosts in the baseline classification scores for each dataset. By themselves, some contaminant features even push these baselines upwards of 90% accuracy (balanced).}},
  author       = {{D'hooge, Laurens and Verkerken, Miel and Wauters, Tim and Volckaert, Bruno and De Turck, Filip}},
  booktitle    = {{2022 19TH ANNUAL INTERNATIONAL CONFERENCE ON PRIVACY, SECURITY \& TRUST (PST)}},
  isbn         = {{9781665473989}},
  issn         = {{1712-364X}},
  language     = {{eng}},
  location     = {{Fredericton, NB, Canada}},
  pages        = {{11}},
  publisher    = {{IEEE}},
  title        = {{Discovering non-metadata contaminant features in intrusion detection datasets}},
  url          = {{http://doi.org/10.1109/PST55820.2022.9851974}},
  year         = {{2022}},
}
