On the Choice of Appropriate Combination of Classifier and Decomposition Scheme for Multiclass Imbalanced Data Classification: A Comparative Analysis Sayantan

Date

2019-07

Publisher

Indian Statistical Institute, Kolkata

Abstract

Classifying a multiclass data set with an imbalanced distribution of class representatives is a challenging problem that is prevalent in many real-world applications. In this study, we present a comparative analysis of different decomposition techniques, namely OneVsAll (OVA), OneVsOne (OVO), Error Correcting Output Codes (ECOC), All-and-One (A&O) and One-Against-Lower-Order (OALO), for dealing with multiclass imbalance. While OVA and OVO have been used extensively in the multiclass imbalance domain, our work is the first to explore the remaining binarization approaches in this field. We have examined the performance of these decomposition methods with two types of learning: an algorithmic approach, and a hybrid approach that combines data-level and algorithmic solutions to the binary class imbalance classification problem. For the algorithmic approach we have used Hellinger Distance Decision Trees (HDDT), and for the hybrid method we propose Balanced Ensemble Models (BEM), which combine both sampling and algorithm-level modifications. We analyze how effectively the decomposition methods, when applied to our approach, can counter the challenges of multiclass imbalance. A detailed experimental study, supported by statistical analysis, has been carried out to determine which combination of classifier (HDDT or our proposed ensemble method) and decomposition scheme works best to produce satisfactory classification performance on a multiclass imbalanced data set. From our research we conclude that the ECOC decomposition strategy, when applied to our proposed BEM, outperforms all the other algorithms in dealing with the multiclass imbalance problem.
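
To make the decomposition schemes named in the abstract concrete, the following is a minimal Python sketch of how OVA and OVO split a multiclass problem into binary tasks, and how an ECOC prediction is decoded by nearest codeword. It is an illustration only and not the dissertation's implementation: the function names, the class labels, and the 7-bit code matrix are all assumptions chosen for the example, and A&O and OALO are not covered.

```python
from itertools import combinations


def ova_tasks(classes):
    """OVA: one binary task per class (that class vs. all the others)."""
    return [(c, [o for o in classes if o != c]) for c in classes]


def ovo_tasks(classes):
    """OVO: one binary task per unordered pair of classes."""
    return list(combinations(classes, 2))


def ecoc_decode(code_matrix, outputs):
    """ECOC: pick the class whose codeword is nearest in Hamming distance
    to the vector of binary classifier outputs."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    return min(code_matrix, key=lambda c: hamming(code_matrix[c], outputs))


# Hypothetical class labels, for illustration only.
classes = ["a", "b", "c", "d"]

# Hypothetical 7-bit code matrix (one codeword per class). Every pair of
# codewords differs in 4 positions, so one wrong binary output can be corrected.
code_matrix = {
    "a": (1, 1, 1, 1, 1, 1, 1),
    "b": (0, 0, 0, 0, 1, 1, 1),
    "c": (0, 0, 1, 1, 0, 0, 1),
    "d": (0, 1, 0, 1, 0, 1, 0),
}

n_ova = len(ova_tasks(classes))   # K binary tasks
n_ovo = len(ovo_tasks(classes))   # K * (K - 1) / 2 binary tasks

# Codeword of class "c" with its last bit flipped by a (simulated) classifier error.
noisy_outputs = (0, 0, 1, 1, 0, 0, 0)
decoded = ecoc_decode(code_matrix, noisy_outputs)
```

In this sketch each OVA task pits one class against the pooled remainder, which tends to worsen the imbalance of every binary subproblem, whereas OVO tasks only ever compare two original classes; the ECOC decoder shows how redundancy in the codewords can absorb a single binary classifier's mistake.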

Description

Dissertation under the supervision of Prof.

Keywords

Multiclass imbalanced data classification, Decision tree

Citation

54p.
