Riaz, S.; Arshad, A.; Jiao, L. Rough Noise-Filtered Easy Ensemble for Software Fault Prediction. Preprints 2018, 2018050248. https://doi.org/10.20944/preprints201805.0248.v1
APA Style
Riaz, S., Arshad, A., & Jiao, L. (2018). Rough Noise-Filtered Easy Ensemble for Software Fault Prediction. Preprints. https://doi.org/10.20944/preprints201805.0248.v1
Chicago/Turabian Style
Riaz, S., Ali Arshad, and Licheng Jiao. 2018. "Rough Noise-Filtered Easy Ensemble for Software Fault Prediction." Preprints. https://doi.org/10.20944/preprints201805.0248.v1
Abstract
Software fault prediction is an important research topic in software quality assurance. Data-driven approaches provide robust mechanisms for software fault prediction. However, the prediction performance of a model depends heavily on the quality of the dataset, and many software datasets suffer from class imbalance. Under-sampling is a popular data pre-processing method for dealing with the class imbalance problem, and Easy Ensemble (EE) presents a robust approach that achieves a high classification rate and addresses the bias towards majority-class samples. However, class imbalance is not the only issue that harms classifier performance: noisy examples and irrelevant features may further reduce predictive accuracy. In this paper, we propose a two-stage data pre-processing approach that incorporates feature selection and a new Rough set Easy Ensemble scheme. In the feature selection stage, we eliminate irrelevant features with a feature ranking algorithm. In the second stage, we build the new Rough set Easy Ensemble by applying a Rough K nearest neighbor rule filter (RK) before executing Easy Ensemble (EE); we call the resulting scheme RKEE for short. RK can remove noisy examples from both the minority and the majority class. We perform an experimental evaluation on real-world software projects, such as the NASA and Eclipse datasets, to demonstrate the effectiveness of the proposed approach. Furthermore, this paper comprehensively investigates the factors that influence our approach, such as the impact of Rough set theory on the noise filter and the relationship between model performance and imbalance ratio. Comprehensive experiments indicate that the proposed approach delivers strong and significant performance in terms of area under the curve (AUC).
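The order of the pre-processing steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every function name in it is hypothetical. Several pieces are assumptions made only for illustration: the feature ranking uses a simple class-mean gap, a plain k nearest neighbor agreement rule stands in for the Rough K nearest neighbor filter (the rough set approximations are not reproduced), a nearest-centroid learner stands in for whatever base classifier the paper uses, and label 1 is taken to be the faulty minority class.

```python
import math
import random

def rank_features(X, y, keep):
    """Return indices of the `keep` features with the largest scaled class-mean gap.
    Assumes both classes are present. (Illustrative ranking, not the paper's.)"""
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        pos = [v for v, label in zip(col, y) if label == 1]
        neg = [v for v, label in zip(col, y) if label == 0]
        gap = abs(sum(pos) / len(pos) - sum(neg) / len(neg))
        scores.append((gap / std, j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:keep])

def knn_noise_filter(X, y, k=3):
    """Return indices of samples whose label matches most of their k nearest
    neighbours. Applied to both classes; a plain stand-in for the RK filter."""
    kept = []
    for i, xi in enumerate(X):
        others = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        votes = sum(1 for _, j in others[:k] if y[j] == y[i])
        if 2 * votes > k:
            kept.append(i)
    return kept

def easy_ensemble(X, y, n_subsets=5, seed=0):
    """Under-sample the majority class n_subsets times and fit one
    nearest-centroid learner (assumed base learner) per balanced subset."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == 1]
    majority = [x for x, label in zip(X, y) if label == 0]
    centroid = lambda pts: [sum(c) / len(pts) for c in zip(*pts)]
    models = []
    for _ in range(n_subsets):
        sample = rng.sample(majority, min(len(minority), len(majority)))
        models.append((centroid(sample), centroid(minority)))
    return models

def predict(models, x):
    """Label x as faulty (1) if it is, summed over all learners, closer to the
    minority centroid than to the sampled majority centroid."""
    score = sum(math.dist(x, c0) - math.dist(x, c1) for c0, c1 in models)
    return 1 if score > 0 else 0

def rkee_sketch(X, y, keep=1, k=3):
    """Stage 1: feature selection. Stage 2: noise filter, then Easy Ensemble."""
    cols = rank_features(X, y, keep)
    Xs = [[row[j] for j in cols] for row in X]
    idx = knn_noise_filter(Xs, y, k)
    models = easy_ensemble([Xs[i] for i in idx], [y[i] for i in idx])
    return cols, idx, models
```

Calling `rkee_sketch(X, y)` on a list of feature rows `X` and 0/1 labels `y` returns the selected feature indices, the indices of the samples that survive the filter, and the fitted ensemble; `predict` then expects a sample restricted to the selected features. Filtering before under-sampling matters in this design because a mislabeled example that lands in a small balanced subset would otherwise have a large effect on that subset's learner.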
Keywords
software fault prediction; data preprocessing; feature selection; rough set theory; class imbalance; noise filter; easy ensemble
Subject
Computer Science and Mathematics, Computer Science
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.