Effectiveness of Data Preprocessing Techniques for Uneven Classification

Authors

  • Sadia Kausar
  • Mohammed Ibrahim Ali
  • Shaik Zayd Ahmed Farooqui
  • Mohammed Misban Uddin

Abstract

Machine learning in cybersecurity is challenging for many reasons, one of which is severe class imbalance. A variety of dataset preprocessing methods have been introduced over the years. These methods modify the training dataset by oversampling, undersampling, or a combination of both, in order to improve the predictive performance of classifiers trained on it. They are used only occasionally in cybersecurity, and a thorough, unbiased benchmark comparing their performance across a range of cybersecurity problems is lacking. This paper presents a comparison of 16 preprocessing methods on six cybersecurity datasets together with 17 public imbalanced datasets from other domains. We evaluate the methods under numerous hyperparameter configurations, using an AutoML system to train classifiers on the preprocessed datasets. This reduces the potential bias from individual hyperparameter or classifier choices. The evaluation also relies on performance metrics chosen to reflect how cybersecurity systems operate in practice. The key takeaways of our study are: 1) There is usually a data preprocessing method that improves classification performance. 2) The baseline approach of doing nothing outperformed most of the methods in the benchmark. 3) Oversampling methods generally outperform undersampling methods. 4) The standard SMOTE algorithm delivers the most noticeable performance gains, whereas more complex methods provide only incremental improvements, often at the cost of worse computational performance.

Keywords: AI, security, classification, imbalanced classification
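
To make the three families of preprocessing mentioned in the abstract concrete, the following is a minimal sketch in plain Python of random oversampling, random undersampling, and a SMOTE-style interpolation step. It is an illustration only: the function names (`random_oversample`, `random_undersample`, `smote_like`), the parameter `k`, and the use of Euclidean distance are assumptions made here, and the abstract does not state which implementations the benchmark actually used.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of smaller classes until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def random_undersample(X, y, seed=0):
    """Keep a random subset of each class so that every class has as many
    samples as the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        pool = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(pool, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def smote_like(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic minority points. Each one lies on the line
    segment between a random minority sample and one of its k nearest
    minority neighbours (hypothetical simplification of SMOTE).
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(X_min)
        # Rank the other minority points by Euclidean distance to the base.
        neighbours = sorted(
            (p for p in X_min if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic
```

In this sketch the resampling would be applied to the training split only, so that the test data keeps its original class distribution. Oversampling and the SMOTE-style step enlarge the minority class, while undersampling discards majority samples, which is one plausible reason the two families behave differently.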

Published

28-04-2025

How to Cite

Effectiveness of Data Preprocessing Techniques for Uneven Classification. (2025). International Journal of Marketing Management, 13(2), 194-203. https://ijmm.in/index.php/ijmm/article/view/262