|dc.description||A Thesis Submitted to the Faculty of Graduate Studies and Research in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Computer Science, University of Regina, ix, 166 p.||en_US
|dc.description.abstract||Bayesian inference and rough set theory provide two approaches to data analysis.
There are close connections between the two theories as they both use probabilities to
express uncertainties and knowledge about data. Several proposals have been made
to apply Bayesian approaches to rough sets. This thesis draws results from two probabilistic
rough set models, namely, decision-theoretic rough set models (DTRSM) and
confirmation-theoretic rough set models (CTRSM), to propose a new Bayesian rough
set model (BRSM) for cost-sensitive ternary classification. I argue that although the
two classes of models share many similarities in terms of making use of Bayes’ theorem
and a pair of thresholds to produce three regions, their semantic interpretations and
hence intended applications are different. By integrating the two, I propose a unified
model of Bayesian rough sets and apply the model to develop ternary classification.
In developing the Bayesian rough set model, I focus on three fundamental issues,
namely, the interpretation and calculation of a pair of thresholds, the estimation of
probabilities, and the interpretation of the three regions used by rough set theory.
Email spam filtering is used as a real-world application to show the usefulness of the
proposed model. Instead of treating email spam as a binary classification problem, I
argue that a three-way decision approach gives users a more meaningful way to handle
their incoming emails with caution. A three-way spam filtering system produces three
email folders instead of two. A folder for suspected spam is added so that users can further examine suspicious emails, thereby reducing the misclassification
rate. In contrast to other ternary email spam filtering methods, my approach
focuses on issues that have received less attention in previous work, namely, the computation of
the thresholds required to define the three email categories and the interpretation of the
cost-sensitive characteristics of spam filtering. Instead of having users supply the
thresholds based on an intuitive sense of their tolerance for errors, I systematically
calculate the thresholds based on the decision-theoretic rough set model.
The cost of making a decision is interpreted as the loss function in Bayesian decision
theory. The final decision is the one for which the
overall cost is minimum. Experimental results on several benchmark datasets show
that the new approach reduces the rate at which legitimate email is misclassified as
spam and performs better from a cost perspective.
Finally, I propose and investigate two extensions of the basic model. One concerns
multi-class classification and the other concerns multi-stage ternary classification.
These two extensions make the model more applicable to solving real-world
|dc.description.uri||A Thesis Submitted to the Faculty of Graduate Studies and Research In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy *, University of Regina. *, * p.||en
|dc.publisher||Faculty of Graduate Studies and Research, University of Regina||en_US
|dc.title||A Cost-Sensitive Approach to Ternary Classification||en_US
|thesis.degree.name||Doctor of Philosophy (PhD)||en_US
|thesis.degree.grantor||University of Regina||en
|thesis.degree.department||Department of Computer Science||en_US