Over-sampling imbalanced datasets using the Covariance Matrix

Authors

DOI:

https://doi.org/10.4108/eai.13-7-2018.163982

Keywords:

Imbalanced datasets, Oversampling, Covariance Matrix, Attribute Dependency

Abstract

INTRODUCTION: Many machine learning tasks involve learning from imbalanced datasets, which leads to misclassification of the minority class. One of the state-of-the-art approaches to “solve” this problem at the data level is the Synthetic Minority Over-sampling Technique (SMOTE), which uses the K-Nearest Neighbors (KNN) algorithm to select and generate new instances.
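For readers unfamiliar with the baseline, the following is a minimal sketch of SMOTE-style interpolation, assuming Euclidean distance and uniform random choice of a neighbor. The function name `smote_sketch` and its parameters are hypothetical and chosen for illustration; this is not the reference SMOTE implementation nor code from the paper.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority points by interpolating between
    a minority instance and one of its k nearest minority neighbors.
    Illustrative only; requires at least two minority instances."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbors than other points

    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]  # indices of the k nearest neighbors

    base = rng.integers(0, n, size=n_new)              # seed instances
    neigh = nn[base, rng.integers(0, k, size=n_new)]   # one neighbor each
    gap = rng.random((n_new, 1))                       # position on the segment
    return X_min[base] + gap * (X_min[neigh] - X_min[base])
```

Each synthetic point lies on the line segment between two existing minority instances, so the generated values never leave the range already covered by the minority class.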

OBJECTIVES: This paper presents SMOTE-Cov, a modified SMOTE that uses the Covariance Matrix instead of KNN to balance datasets with continuous attributes and a binary class.

METHODS: We implemented two variants: SMOTE-CovI, which generates new values within the interval of each attribute, and SMOTE-CovO, which allows some values to fall outside the interval of the attributes.
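The abstract does not state how the Covariance Matrix is used to generate instances, so the following is only a sketch under stated assumptions: it fits a multivariate normal to the minority class (mean and covariance) and samples from it, and it approximates the "inside the interval" behavior by clipping each attribute to its observed minimum and maximum. The function name `cov_oversample` and the `clip_to_range` flag are hypothetical; the authors' SMOTE-CovI and SMOTE-CovO may work differently.

```python
import numpy as np

def cov_oversample(X_min, n_new, clip_to_range=True, seed=0):
    """Sample n_new synthetic minority points from a Gaussian fitted to
    the minority class. ASSUMPTION: Gaussian sampling and clipping are
    illustrative choices, not the method described in the paper."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)

    mean = X_min.mean(axis=0)
    # rowvar=False: rows are instances, columns are attributes.
    cov = np.atleast_2d(np.cov(X_min, rowvar=False))

    # Sampling with the full covariance preserves attribute dependencies.
    samples = rng.multivariate_normal(mean, cov, size=n_new)

    if clip_to_range:
        # Interval-respecting variant: keep every attribute within the
        # range observed in the minority class.
        samples = np.clip(samples, X_min.min(axis=0), X_min.max(axis=0))
    return samples
```

With `clip_to_range=True` the output stays within each attribute's observed interval, loosely mirroring SMOTE-CovI; with `clip_to_range=False` some values may fall outside it, loosely mirroring SMOTE-CovO.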

RESULTS: The results show that our approach performs similarly to the state-of-the-art approaches.

CONCLUSION: In this paper, a new algorithm is proposed to generate synthetic instances of the minority class, using the Covariance Matrix.

Published

15-04-2020

How to Cite

1. Leguen-deVarona I, Madera J, Martínez-López Y, Carlos Hernández-Nieto J. Over-sampling imbalanced datasets using the Covariance Matrix. EAI Endorsed Trans Energy Web [Internet]. 2020 Apr. 15 [cited 2024 May 6];7(27):e2. Available from: https://publications.eai.eu/index.php/ew/article/view/887