Barulina, M.; Okunkov, S.; Ulitin, I.; Sanbaev, A. Sensitivity of Modern Deep Learning Neural Networks to Unbalanced Datasets in Multiclass Classification Problems. Appl. Sci.2023, 13, 8614.
Barulina, M.; Okunkov, S.; Ulitin, I.; Sanbaev, A. Sensitivity of Modern Deep Learning Neural Networks to Unbalanced Datasets in Multiclass Classification Problems. Appl. Sci. 2023, 13, 8614.
Barulina, M.; Okunkov, S.; Ulitin, I.; Sanbaev, A. Sensitivity of Modern Deep Learning Neural Networks to Unbalanced Datasets in Multiclass Classification Problems. Appl. Sci.2023, 13, 8614.
Barulina, M.; Okunkov, S.; Ulitin, I.; Sanbaev, A. Sensitivity of Modern Deep Learning Neural Networks to Unbalanced Datasets in Multiclass Classification Problems. Appl. Sci. 2023, 13, 8614.
Abstract
One of the critical problems in multiclass classification tasks is the imbalance of the dataset. This is especially true when using contemporary pre-trained neural networks, where, in fact, the last layers of the neural network are retrained. Therefore, the large datasets with highly unbalanced classes are not good for models’ training since the use of such a dataset leads to overfitting and, accordingly, poor metrics on test and validation datasets. In this paper the sensitivity to a dataset imbalance of Xception, ViT-384, ViT-224, VGG19, ResNet34, ResNet50, ResNet101, Inception_v3, DenseNet201, DenseNet161, DeIT was studied using a highly imbalanced dataset of 20,971 images sorted into 7 classes. It is shown that the best metrics were obtained when using a cropped dataset with augmentation of missing images in classes up to 15% of the initial number. So, the metrics can be increased by 2-6% compared to the metrics of the models on the initial unbalanced data set. Moreover, the metrics of the rare classes' classification also improved significantly – the TruePositive value can be increased by 0.3 and more. As result, the best approach to train considered networks on an initially unbalanced dataset was formulated.
Keywords
deep learning; unbalanced dataset; augmentation; multiclass classification; metrics boosting method; sota algorithm; visual transformer; ResNet; Xception; Inception
Subject
Computer Science and Mathematics, Computer Vision and Graphics
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.