Cost-Sensitive and Semi-Supervised Learning for Fraud
Given the magnitude of e-auction transactions, it is challenging to safeguard consumers from dishonest sellers, such as shill bidders. Shill Bidding (SB) is a predominant auction fraud that is driven by modern-day technologies and clever scammers. The difficulty of identifying the behavior of sophisticated fraudsters and the scarcity of training data hinder research on SB detection. The first part of this thesis aims to address these two problems. We first define two new SB patterns and implement other existing SB patterns. Next, we develop a reliable SB dataset. This development requires crawling commercial auctions and bidder history, preprocessing the raw data, and detecting outliers. Because labeling the multi-dimensional SB dataset is difficult, the second part of the thesis investigates the Semi-Supervised Classification (SSC) approach, which requires labeling only a few SB samples. To label them, we combine two data clustering methods and define an anomaly detection approach based on the SB scores of bidders in combination with the Three Sigma Rule. Our experimental analysis of several SSC models demonstrates that using unlabeled SB data together with a few labeled data improves the predictive performance of the supervised SB models. The SSC models are able to accurately differentiate between normal and shill bidders. Additionally, the learning curve of the models shows that the smaller the size of the labeled SB data, the more effective the model is. Nevertheless, SSC models treat the misclassification errors of all classes alike. This means that misclassifying a fraudster as a normal bidder is penalized the same as misclassifying a normal bidder as a fraudster. The third part of the thesis examines this serious problem based on MetaCost learning.
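The anomaly detection step mentioned above can be illustrated with a minimal sketch of the Three Sigma Rule applied to per-bidder SB scores. The abstract does not specify how the scores are computed or whether the rule is applied on one side or both, so the function name `three_sigma_outliers`, the one-sided test on high scores, and the use of the population standard deviation are assumptions made only for illustration.

```python
from statistics import mean, pstdev


def three_sigma_outliers(scores, k=3.0):
    """Return the bidder ids whose score exceeds mean + k * std.

    `scores` maps a bidder id to a numeric SB score (hypothetical input;
    the thesis's actual scoring scheme is not given in the abstract).
    """
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    threshold = mu + k * sigma
    # One-sided test: only unusually HIGH scores are flagged (assumption).
    return {bidder for bidder, s in scores.items() if s > threshold}


if __name__ == "__main__":
    # Twenty low-scoring bidders and one bidder with a much higher score.
    demo = {f"b{i}": 0.1 for i in range(20)}
    demo["suspect"] = 0.9
    print(three_sigma_outliers(demo))
```

Note that with very few bidders no single score can lie three standard deviations from the mean, so a rule of this kind is only meaningful on reasonably large samples.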
Moreover, we propose an ensemble that combines cost-sensitive and semi-supervised classification to deal with the problem of imbalanced data without modifying the original SB training dataset, and to minimize the misclassification errors of the fraud class. We develop several ensemble SB models that are able to reduce the incorrect predictions of the fraud class.
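To make the idea of unequal misclassification costs concrete, the sketch below shows the minimum-expected-cost decision rule that cost-sensitive methods such as MetaCost build on: a sample is assigned to the class whose prediction has the lowest expected cost, given an estimated fraud probability. This is not the thesis's ensemble or a full MetaCost implementation (which also relabels the training data and retrains the classifier). The function name and the cost values (10 for a missed fraudster, 1 for a false alarm) are illustrative assumptions.

```python
NORMAL, FRAUD = 0, 1


def min_expected_cost_class(p_fraud, cost_fn=10.0, cost_fp=1.0):
    """Pick the class with the lowest expected misclassification cost.

    p_fraud : estimated probability that the bidder is a shill bidder
    cost_fn : cost of labeling a fraudster as normal (assumed value)
    cost_fp : cost of labeling a normal bidder as a fraudster (assumed value)
    Correct predictions are assumed to cost nothing.
    """
    # Predicting NORMAL is wrong only if the bidder is actually a fraudster.
    cost_if_normal = p_fraud * cost_fn
    # Predicting FRAUD is wrong only if the bidder is actually normal.
    cost_if_fraud = (1.0 - p_fraud) * cost_fp
    return FRAUD if cost_if_fraud < cost_if_normal else NORMAL


if __name__ == "__main__":
    for p in (0.05, 0.2, 0.6):
        print(p, min_expected_cost_class(p))
```

With these assumed costs the decision threshold on the fraud probability moves from 0.5 down to cost_fp / (cost_fp + cost_fn) = 1/11, so borderline bidders are flagged as fraud more readily, while the training data itself is left unchanged.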