Over-sampling imbalanced datasets using the Covariance Matrix
DOI: https://doi.org/10.4108/eai.13-7-2018.163982
Keywords: Imbalanced datasets, Oversampling, Covariance Matrix, Attribute Dependency
Abstract
INTRODUCTION: Nowadays, many machine learning tasks involve learning from imbalanced datasets, which leads to misclassification of the minority class. One state-of-the-art approach to "solving" this problem at the data level is the Synthetic Minority Over-sampling Technique (SMOTE), which uses the K-Nearest Neighbors (KNN) algorithm to select instances and generate new ones.
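For reference, the KNN-based generation step that SMOTE performs can be sketched as follows. This is a minimal pure-Python illustration of the standard SMOTE interpolation: pick a minority instance, pick one of its k nearest minority neighbours, and place a new point at a random position on the segment between them. The function names `knn_indices` and `smote` and the default `k=5` are illustrative choices, not taken from this paper.

```python
import math
import random


def knn_indices(points, i, k):
    """Return indices of the k nearest neighbours of points[i] (excluding itself)."""
    dists = [(math.dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    dists.sort()  # sort by Euclidean distance, ties broken by index
    return [j for _, j in dists[:k]]


def smote(minority, n_new, k=5, seed=None):
    """Generate n_new synthetic rows by interpolating between a random minority
    instance and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(knn_indices(minority, i, k))
        gap = rng.random()  # uniform in [0, 1): position along the segment
        x, nb = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic
```

Because each new point is a convex combination of two existing minority instances, every generated attribute value stays within the observed range of that attribute.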
OBJECTIVES: This paper presents SMOTE-Cov, a modified SMOTE that uses the Covariance Matrix instead of KNN to balance datasets with continuous attributes and a binary class.
METHODS: We implemented two variants: SMOTE-CovI, which generates new values within the interval of each attribute, and SMOTE-CovO, which allows some values to fall outside the interval of the attributes.
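The abstract does not spell out how SMOTE-Cov uses the covariance matrix, so the sketch below is a hypothetical illustration under one plausible assumption: synthetic instances are drawn from a multivariate normal distribution parameterised by the minority class's mean vector and covariance matrix, sampled as mu + L z, where L is the Cholesky factor of the covariance and z is standard normal. The `inside` flag loosely mirrors the two variants: when True, each generated value is clipped to that attribute's observed [min, max] interval (CovI-like); when False, values may fall outside it (CovO-like). The Gaussian model, the clipping rule, and all names here are assumptions, not the authors' algorithm.

```python
import math
import random


def mean_and_cov(rows):
    """Sample mean vector and (n - 1)-normalised covariance matrix."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[c] for r in rows) / n for c in range(d)]
    cov = [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / (n - 1)
            for b in range(d)] for a in range(d)]
    return mu, cov


def cholesky(m, jitter=1e-9):
    """Lower-triangular L with L @ L.T ~= m; a small diagonal jitter aids stability."""
    d = len(m)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(max(m[i][i] + jitter - s, 0.0))
            else:
                L[i][j] = (m[i][j] - s) / L[j][j] if L[j][j] > 0 else 0.0
    return L


def smote_cov(minority, n_new, inside=True, seed=None):
    """Hypothetical covariance-based oversampler (assumed Gaussian model).
    inside=True clips each attribute to its observed [min, max] interval."""
    rng = random.Random(seed)
    mu, cov = mean_and_cov(minority)
    L = cholesky(cov)
    d = len(mu)
    lo = [min(r[c] for r in minority) for c in range(d)]
    hi = [max(r[c] for r in minority) for c in range(d)]
    out = []
    for _ in range(n_new):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # x = mu + L z has covariance L L^T = cov
        x = [mu[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(d)]
        if inside:
            x = [min(max(v, a), b) for v, a, b in zip(x, lo, hi)]
        out.append(x)
    return out
```

Clipping is the simplest way to keep values inside the attribute intervals; rejection sampling is an alternative that avoids piling probability mass on the interval bounds. Unlike the KNN sketch, this approach models dependencies between attributes through the off-diagonal covariance terms.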
RESULTS: The results show that our approach performs similarly to state-of-the-art approaches.
CONCLUSION: In this paper, a new algorithm is proposed to generate synthetic instances of the minority class, using the Covariance Matrix.
Copyright (c) 2022 EAI Endorsed Transactions on Energy Web
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
This is an open-access article distributed under the terms of the Creative Commons Attribution CC BY 4.0 license, which permits unlimited use, distribution, and reproduction in any medium so long as the original work is properly cited.