Publication Date:
2019-06-28
Description:
A criterion, based on Bayes' theorem, is described that defines the optimal set of classes (a classification) for a given set of examples. This criterion is transformed into an equivalent minimum message length criterion with an intuitive information interpretation. This criterion does not require that the number of classes be specified in advance, this is determined by the data. The minimum message length criterion includes the message length required to describe the classes, so there is a built in bias against adding new classes unless they lead to a reduction in the message length required to describe the data. Unfortunately, the search space of possible classifications is too large to search exhaustively, so heuristic search methods, such as simulated annealing, are applied. Tutored learning and probabilistic prediction in particular cases are an important indirect result of optimal class discovery. Extensions to the basic class induction program include the ability to combine category and real value data, hierarchical classes, independent classifications and deciding for each class which attributes are relevant.
Keywords:
NUMERICAL ANALYSIS
Type:
NASA-TM-101174
,
NAS 1.15:101174
Format:
application/pdf
Permalink