ISSN:
0885-6125
Keywords:
Decision trees; noise; induction; unbiased attribute selection; information-based measures
Source:
Springer Online Journal Archives 1860-2000
Topics:
Computer Science
Notes:
Abstract: A fresh look is taken at the problem of bias in information-based attribute selection measures, used in the induction of decision trees. The approach uses statistical simulation techniques to demonstrate that the usual measures such as information gain, gain ratio, and a new measure recently proposed by Lopez de Mantaras (1991) are all biased in favour of attributes with large numbers of values. It is concluded that approaches which utilise the chi-square distribution are preferable because they compensate automatically for differences between attributes in the number of levels they take.
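The bias described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' procedure: the function names, the sample size of 100, the 200 trials, and the choice of a two-class problem are all assumptions made for illustration. It generates attributes that are statistically independent of the class, so any measured information gain is pure sampling artefact, and reports how the average gain changes with the number of attribute values.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(attr, cls):
    """Information gain of the class labels given a discrete attribute."""
    n = len(cls)
    groups = {}
    for a, c in zip(attr, cls):
        groups.setdefault(a, []).append(c)
    # Weighted entropy of the class within each attribute value.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(cls) - conditional


def mean_gain_of_random_attribute(n_values, n_samples=100, n_trials=200, seed=0):
    """Average gain of an attribute drawn independently of a binary class.

    All parameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        cls = [rng.randrange(2) for _ in range(n_samples)]
        attr = [rng.randrange(n_values) for _ in range(n_samples)]
        total += information_gain(attr, cls)
    return total / n_trials


if __name__ == "__main__":
    # The attribute carries no information about the class, yet the
    # average measured gain grows with the number of attribute values.
    for k in (2, 5, 10, 20):
        print(k, round(mean_gain_of_random_attribute(k), 4))
```

One standard way to see why a chi-square approach compensates: when attribute and class are independent, 2N ln 2 times the gain (in bits) is approximately chi-square distributed with (k - 1)(c - 1) degrees of freedom for k attribute values and c classes, so a significance test against that distribution adjusts for k, whereas comparing raw gains does not.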
Type of Medium:
Electronic Resource
URL:
http://dx.doi.org/10.1023/A:1022694010754