Publication Date:
2011-11-09
Content Type:
Journal Article
Pages:
199-211
DOI:
10.3233/HIS-2011-0140
Authors:
Hernán Ahumada, Guillermo L. Grinblat, Lucas C. Uzal, Alejandro Ceccatto, Pablo M. Granitto
Affiliation (all authors):
CIFASIS, French Argentine International Center for Information and Systems Sciences, UPCAM (France) / UNR-CONICET (Argentina), Bv 27 de Febrero 210 Bis, 2000 Rosario, Argentina
Journal:
International Journal of Hybrid Intelligent Systems
Journal Issue:
Volume 8, Number 4 / 2011
Description:
In many classification problems, particularly in critical real-world applications, one class has far fewer samples than the others, a situation usually known as the class imbalance problem. In this work we discuss and evaluate the use of the REPMAC algorithm to solve imbalanced problems. Using a clustering method, REPMAC recursively splits the majority class into several subsets, creating a decision tree, until the resulting sub-problems are balanced or easy to solve. We couple REPMAC with two diverse clustering methods and three different classifiers and evaluate the new method on several benchmark datasets spanning a wide range of feature counts, sample sizes, and degrees of imbalance. We also apply our method to a real-world problem, the identification of weed seeds. We find that the good performance of REPMAC is almost independent of the classifier or the clustering method coupled to it, which suggests that its success is mostly due to its use of an appropriate strategy for coping with imbalanced problems.
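The recursive idea in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the 2-means clustering with a farthest-point start, the `max_ratio` balance threshold, the `max_depth` limit, the nearest-centroid leaf model, and routing a sample by its nearest cluster centroid are all assumptions made for illustration, and the "easy to solve" stopping test is omitted. All function names are hypothetical.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def two_means(points, iters=10):
    """Assumed clustering step: 2-means with a deterministic farthest-point start."""
    c0 = points[0]
    c1 = max(points, key=lambda p: dist2(p, c0))
    left, right = list(points), []
    for _ in range(iters):
        left = [p for p in points if dist2(p, c0) <= dist2(p, c1)]
        right = [p for p in points if dist2(p, c0) > dist2(p, c1)]
        if not left or not right:
            break  # degenerate split, e.g. all points identical
        c0, c1 = centroid(left), centroid(right)
    return (c0, left), (c1, right)

def make_leaf(majority, minority):
    """Assumed leaf model: a nearest-centroid classifier for one sub-problem."""
    return {"leaf": True, "maj": centroid(majority), "min": centroid(minority)}

def build(majority, minority, max_ratio=2.0, depth=0, max_depth=5):
    """Recursively split the majority class until each sub-problem is roughly balanced."""
    balanced = len(majority) <= max_ratio * len(minority)
    if balanced or depth >= max_depth or len(majority) < 2:
        return make_leaf(majority, minority)
    (c0, left), (c1, right) = two_means(majority)
    if not left or not right:
        return make_leaf(majority, minority)
    # Each majority subset is paired with the full minority class.
    return {
        "leaf": False, "c0": c0, "c1": c1,
        "left": build(left, minority, max_ratio, depth + 1, max_depth),
        "right": build(right, minority, max_ratio, depth + 1, max_depth),
    }

def predict(node, x):
    """Route x to the nearest majority cluster, then classify: 1 = minority, 0 = majority."""
    while not node["leaf"]:
        go_left = dist2(x, node["c0"]) <= dist2(x, node["c1"])
        node = node["left"] if go_left else node["right"]
    return 1 if dist2(x, node["min"]) < dist2(x, node["maj"]) else 0
```

Calling `build(majority, minority)` on two lists of equal-length numeric tuples returns a nested dictionary tree, and `predict(tree, x)` labels a new point. Any clustering method or classifier could be substituted for the two assumed here, which is in line with the description's finding that performance depends little on those choices.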
Print ISSN:
1448-5869
Electronic ISSN:
1875-8819
Topics:
Computer Science