Saturday, April 24, 2010

Paper Summary - A New Technique for Combining Multiple Classifiers using The Dempster-Shafer Theory of Evidence

Al-Ani, A. & Deriche, M. (2002). A new technique for combining multiple classifiers using the Dempster-Shafer theory of evidence. Journal of Artificial Intelligence Research, 17, 333–361.

This paper describes a new technique, based on Dempster-Shafer (DS) theory, for combining classifiers. The basic premise is that different applications call for different types of features, and with different features the same classifier is not always best; depending on the features, one classifier may outperform the others. The authors propose that combining classifiers is an efficient way to achieve the best classification results.

Prior work defines two problems: how to determine which classifiers to use, and how to combine the classifier outputs to get the best results. This paper addresses the second problem.


They categorize the output of classification algorithms into 3 levels:

  • the abstract level - the classifier outputs a unique label

  • the rank level - the classifier ranks all labels, with the label at the top as its first choice

  • the measurement level - the classifier attributes to each class a value reflecting the degree of confidence that the input belongs to that class



They state that the measurement level contains the 'highest amount of information', and this is the level they use in their work.
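As a concrete illustration (my own, not from the paper) of the three output levels for a single input in a three-class problem:

    # Measurement level: a confidence value for each class.
    measurement = {"class_a": 0.62, "class_b": 0.30, "class_c": 0.08}

    # Rank level: classes ordered from first choice to last.
    rank = sorted(measurement, key=measurement.get, reverse=True)  # ['class_a', 'class_b', 'class_c']

    # Abstract level: a single unique label.
    abstract = rank[0]  # 'class_a'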

Two combination scenarios are mentioned:

  • all classifiers use the same representation of the input pattern

  • each classifier uses its own representation



Relating to the second scenario, they cite a study which found that forming a joint probability distribution using the sum rule gave the best results. They also cite a study that used weighted sums, and another that used a cost function to minimize the mean squared error (MSE) in conjunction with a neural network (NN). In that NN study, a number of NNs were used to produce a linear combination, and the NN outputs were then combined using Dempster-Shafer theory. They mention a few other approaches, and the rest of the paper discusses their own approach.

They combine classifier results built from a number of different feature sets: each feature set is used to train a separate classifier. For some input x, each classifier produces a vector that conveys its degree of confidence, for each class, that the input belongs to that class.
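A minimal sketch of that setup, assuming scikit-learn-style classifiers; the feature sets and estimators below are made-up stand-ins for the paper's representations, not its actual experiments:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    n, n_classes = 200, 3
    y = rng.integers(0, n_classes, size=n)

    # Two hypothetical feature sets (different representations of the same samples).
    X_feat1 = rng.normal(size=(n, 5)) + y[:, None]
    X_feat2 = rng.normal(size=(n, 8)) + 0.5 * y[:, None]

    # One classifier trained per feature set.
    clf1 = LogisticRegression(max_iter=1000).fit(X_feat1, y)
    clf2 = KNeighborsClassifier().fit(X_feat2, y)

    # For an input x, each classifier produces a confidence vector over the classes
    # (the measurement-level output that is then combined).
    conf1 = clf1.predict_proba(X_feat1[:1])[0]
    conf2 = clf2.predict_proba(X_feat2[:1])[0]
    print(conf1, conf2)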

They then discuss DS theory. DS is said to represent uncertainties better than probabilistic techniques such as Bayesian methods. For classifier combination, they stress this is important since there usually exists "a certain level of uncertainty associated with the performance of each classifier". They state that other classifier combination methods using DS theory do not accurately estimate the evidence of the classifiers, and they argue that their approach, which uses gradient descent learning, minimizes the MSE between the combined output and the target output over the training set.
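The paper's evidence estimation is more involved than this, but as a rough sketch of the general idea of fitting combination parameters by gradient descent on the MSE between combined and target outputs (using a simple weighted-sum combiner rather than the authors' DS-based one; everything here is illustrative):

    import numpy as np

    def combine(confs, w):
        """Weighted-sum combination of per-classifier confidence vectors.
        confs: (n_classifiers, n_samples, n_classes), w: (n_classifiers,)."""
        return np.tensordot(w, confs, axes=1)              # (n_samples, n_classes)

    def fit_weights(confs, targets, lr=0.1, steps=500):
        """Gradient descent on the MSE between combined output and one-hot targets."""
        w = np.full(confs.shape[0], 1.0 / confs.shape[0])
        for _ in range(steps):
            err = combine(confs, w) - targets               # (n_samples, n_classes)
            grad = 2.0 * np.einsum('knc,nc->k', confs, err) / err.size
            w -= lr * grad
        return w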

They then go into detail about the math behind DS and about their approach.

Note the DS Belief and Plausibility formulas from Wikipedia:

Belief:
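    Bel(A) = \sum_{B \subseteq A} m(B)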

Plausibility:
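    Pl(A) = \sum_{B \cap A \neq \emptyset} m(B)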

Note DS Rule of combination:
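    m_{1,2}(A) = \frac{1}{1 - K} \sum_{B \cap C = A \neq \emptyset} m_1(B)\, m_2(C), \qquad m_{1,2}(\emptyset) = 0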


where:
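    K = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C)

Here m, m_1 and m_2 are basic probability assignments (mass functions) over subsets of the frame of discernment, and K measures the conflict between the two sources of evidence.

To make the rule concrete, here is a minimal Python sketch of Dempster's rule of combination for two mass functions over a small frame (my own illustration, not the paper's implementation; the class names and masses are made up):

    def ds_combine(m1, m2):
        """Dempster's rule of combination for two mass functions.
        m1, m2: dicts mapping frozenset (subset of the frame) -> mass."""
        combined, conflict = {}, 0.0
        for b, mb in m1.items():
            for c, mc in m2.items():
                inter = b & c
                if inter:
                    combined[inter] = combined.get(inter, 0.0) + mb * mc
                else:
                    conflict += mb * mc          # accumulates K, the conflicting mass
        # Normalize by 1 - K (assumes the two sources are not totally conflicting).
        return {a: v / (1.0 - conflict) for a, v in combined.items()}

    # Example: two classifiers' mass assignments over classes {a, b}.
    frame = frozenset({"a", "b"})
    m1 = {frozenset({"a"}): 0.6, frozenset({"b"}): 0.1, frame: 0.3}
    m2 = {frozenset({"a"}): 0.5, frozenset({"b"}): 0.2, frame: 0.3}
    print(ds_combine(m1, m2))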



I need to return to this to describe their method. It is detailed and involves a lot of math.

Why am I reviewing this document? Well, this is a little off topic, but I thought any exposure to methods that use DS would help me understand it better.
