Over-sampling imbalanced datasets using the Covariance Matrix

Ireimis Leguen-deVarona; Julio Madera; Yoan Martínez-López; José Carlos Hernández-Nieto

doi:10.4108/eai.13-7-2018.163982

Authors

Ireimis Leguen-deVarona University of Camagüey
Julio Madera University of Camagüey
Yoan Martínez-López University of Camagüey
José Carlos Hernández-Nieto University of Camagüey

Keywords:

Imbalanced datasets, Oversampling, Covariance Matrix, Attribute Dependency

Abstract

INTRODUCTION: Nowadays, many machine learning tasks involve learning from imbalanced datasets, which leads to misclassification of the minority class. One of the state-of-the-art approaches to "solve" this problem at the data level is the Synthetic Minority Over-sampling Technique (SMOTE), which in turn uses the K-Nearest Neighbors (KNN) algorithm to select and generate new instances.
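
For readers unfamiliar with the KNN-based generation step, the following is a minimal sketch of the standard SMOTE idea: pick a minority instance, pick one of its k nearest minority neighbors, and interpolate between them. The function name `smote`, its parameters, and the brute-force distance computation are illustrative assumptions, not code from the paper.

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority rows by KNN interpolation (sketch)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority instances")
    k = min(k, n - 1)

    # Brute-force pairwise Euclidean distances between minority instances.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # an instance is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # For each synthetic row: a random base instance and one of its neighbors.
    base = rng.integers(0, n, size=n_new)
    nbr = neighbors[base, rng.integers(0, k, size=n_new)]

    # Interpolate at a random position on the segment base -> neighbor.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[nbr] - X_min[base])
```

Because each synthetic row is a convex combination of two existing minority rows, every generated value stays within the observed range of its attribute.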

OBJECTIVES: This paper presents SMOTE-Cov, a modified SMOTE that uses the Covariance Matrix instead of KNN to balance datasets with continuous attributes and a binary class.

METHODS: We implemented two variants: SMOTE-CovI, which generates new values within the interval of each attribute, and SMOTE-CovO, which allows some values to fall outside the interval of the attributes.
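
The abstract does not spell out the sampling procedure, so the following is only a rough sketch of how covariance-based over-sampling could look. It assumes that synthetic instances are drawn from a multivariate Gaussian fitted to the minority class, and that the "within the interval" variant is obtained by clipping each attribute to its observed minimum and maximum. Both assumptions, along with the names `oversample_cov` and `keep_in_range`, are hypothetical and may differ from the authors' SMOTE-CovI and SMOTE-CovO.

```python
import numpy as np

def oversample_cov(X_min, n_new, keep_in_range=True, seed=0):
    """Draw n_new synthetic minority rows from a fitted Gaussian (sketch).

    keep_in_range=True  -> values clipped to each attribute's observed interval
                           (assumed analogue of the CovI variant)
    keep_in_range=False -> values may fall outside that interval
                           (assumed analogue of the CovO variant)
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)

    # Mean vector and covariance matrix of the minority class attributes.
    mean = X_min.mean(axis=0)
    cov = np.atleast_2d(np.cov(X_min, rowvar=False))

    # Assumption: Gaussian sampling preserves the attribute dependencies
    # captured by the covariance matrix.
    samples = rng.multivariate_normal(mean, cov, size=n_new)

    if keep_in_range:
        # Assumption: clipping is one simple way to stay inside the interval.
        samples = np.clip(samples, X_min.min(axis=0), X_min.max(axis=0))
    return samples

# Usage: balance a toy minority class of 6 rows up to 20 rows.
X_minority = np.array([[1.0, 2.0], [1.5, 2.4], [0.9, 1.7],
                       [1.2, 2.1], [1.8, 2.9], [1.1, 1.9]])
X_balanced = np.vstack([X_minority, oversample_cov(X_minority, 14)])
```

Unlike neighbor interpolation, this approach uses the dependencies between attributes of the whole minority class at once, which matches the "Attribute Dependency" keyword of the paper.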

RESULTS: The results show that our approach performs similarly to the state-of-the-art approaches.

CONCLUSION: In this paper, a new algorithm is proposed to generate synthetic instances of the minority class, using the Covariance Matrix.

Published

15-04-2020

Issue

Vol. 7 No. 27 (2020): EAI Endorsed Transactions on Energy Web

Section

Research articles

License

This work is licensed under a Creative Commons Attribution 3.0 Unported License.

This is an open-access article distributed under the terms of the Creative Commons Attribution CC BY 4.0 license, which permits unlimited use, distribution, and reproduction in any medium so long as the original work is properly cited.

How to Cite

Leguen-deVarona I, Madera J, Martínez-López Y, Carlos Hernández-Nieto J. Over-sampling imbalanced datasets using the Covariance Matrix. EAI Endorsed Trans Energy Web [Internet]. 2020 Apr. 15 [cited 2026 Jul. 16];7(27):e2. Available from: https://publications.eai.eu/index.php/ew/article/view/887
