The Influence of Scoring Parameters on Precision-Based AdaBoost

Date
2019-12
Authors
Movahedan Peymanhagh, Marjan
Publisher
Faculty of Graduate Studies and Research, University of Regina
Abstract

Many problems in data science involve classification, i.e., dividing a dataset into different predefined classes. Such classification problems are often solved with machine learning techniques. Combining classifiers in an ensemble is an effective way to improve on the accuracy of the individual classifiers' predictions. AdaBoost, a popular ensemble-based approach, combines the votes of classifiers by weighting their predictions according to each classifier's overall accuracy. Precision-based AdaBoost modifies this weighting scheme by scoring the classifiers on their accuracy in predicting specific classes rather than on their total accuracy. Although this method appears to perform well in practice, there is no known theoretical justification or formal derivation of the chosen scoring parameters, and no formal guarantees exist on their performance. Furthermore, the precision-based approach has so far focused only on two-class classification problems.
This thesis proposes a theoretical justification for the precision-based idea with a provably effective choice of scoring parameters, together with a guarantee on their performance. A modified algorithm, called PrAdaBoost, is then presented using our formally derived class-specific weight coefficients. An empirical evaluation on 23 UCI datasets confirms the effectiveness of PrAdaBoost compared to the well-known and popular AdaBoost.M1 method and to the most successful previously proposed precision-based variant. We also extend the precision-based idea to the general multi-class setting and formally derive suitable scoring parameters in this setting as well. The results of another empirical evaluation of PrAdaBoost, on 10 UCI datasets with more than two classes, confirm the superiority of PrAdaBoost over the popular multi-class boosting method SAMME. The experimental analysis also reveals meaningful relationships between the performance of PrAdaBoost and certain properties of the datasets.
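For readers unfamiliar with the weighting scheme the abstract refers to, the following is a minimal sketch of standard two-class AdaBoost.M1 with decision stumps, written in plain NumPy. It illustrates the baseline mechanism only: each round, the classifier weight alpha is computed from the weighted overall error, alpha = 0.5 * ln((1 - err) / err). The thesis's PrAdaBoost replaces this single overall-accuracy coefficient with formally derived class-specific coefficients, which are not reproduced here; all function names below are illustrative choices, not code from the thesis.

```python
import numpy as np

def stump_predict(X, feature, threshold, polarity):
    # A decision stump: predict +1/-1 by thresholding one feature.
    pred = np.ones(len(X))
    if polarity == 1:
        pred[X[:, feature] < threshold] = -1
    else:
        pred[X[:, feature] >= threshold] = -1
    return pred

def fit_adaboost(X, y, n_rounds=10):
    """Minimal AdaBoost.M1 with stumps; labels y must be in {-1, +1}."""
    n = len(X)
    w = np.full(n, 1.0 / n)              # uniform example weights to start
    ensemble = []                         # list of (alpha, stump_params)
    for _ in range(n_rounds):
        best, best_err = None, np.inf
        # Exhaustive search for the stump with lowest weighted error.
        for f in range(X.shape[1]):
            for thr in np.unique(X[:, f]):
                for pol in (1, -1):
                    pred = stump_predict(X, f, thr, pol)
                    err = w[pred != y].sum()
                    if err < best_err:
                        best_err, best = err, (f, thr, pol)
        err = max(best_err, 1e-10)        # clamp to avoid log(0)
        # Classifier vote weight based on OVERALL weighted accuracy;
        # precision-based variants score per class instead.
        alpha = 0.5 * np.log((1 - err) / err)
        pred = stump_predict(X, *best)
        # Re-weight examples: misclassified points gain weight.
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        ensemble.append((alpha, best))
    return ensemble

def predict(ensemble, X):
    # Final prediction is the sign of the alpha-weighted vote.
    agg = sum(a * stump_predict(X, *p) for a, p in ensemble)
    return np.sign(agg)
```

On a linearly separable toy set such as X = [[0],[1],[2],[3]] with y = [-1,-1,+1,+1], a single round already finds a perfect stump; the alpha formula then assigns it a very large vote weight.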

Description
A Thesis Submitted to the Faculty of Graduate Studies and Research In Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Science, University of Regina. xi, 76 p.