Auction Shill Detection Framework Based on SVM
Abstract
Online auctioning has attracted serious in-auction fraud, such as shill bidding, given the
huge amount of money involved and the anonymity of users. Because shill bidding is
difficult both to detect and to prove, very few researchers have succeeded in
designing online shill detection systems that can be adopted by auction
sites. We introduce an efficient SVM-based, two-phase In-Auction Fraud Detection
(IAFD) model. This supervised model is first trained offline to classify bidders as ‘Normal’ or
‘Suspicious’. For this phase, we identify a set of the most relevant fraud
classification features rather than uncertain or general features such as feedback ratings.
The model can then be launched online at the end of the bidding period, before the
auction is finalized, to detect suspicious bidders and redirect them for further investigation.
This benefits other legitimate bidders, who might otherwise be victimized if
an infected auction is finalized and payment is made. We propose a robust process to build
the optimal IAFD model, which comprises data cleaning, scaling, clustering, labeling
and sampling, as well as learning via SVM. Since labeled auction data are
unavailable, we apply hierarchical clustering and our own labeling technique to generate
a high-quality training dataset. We use a hybrid of over-sampling and under-sampling,
which proved more effective at handling highly imbalanced
fraud datasets. Numerous pre-processing and classification experiments are carried out
using different functions in the Weka toolkit, first to verify their applicability to
the training dataset and second to determine how these functions affect model performance.
Once the final model has been built with the relevant functions,
it is tested on commercial auction data from eBay to detect shill bidders. The
classification results exhibit excellent performance in terms of detection and false alarm
rates. Moreover, our model outperforms other SVM-based fraud detection systems.
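The hierarchical clustering step used to structure the unlabeled auction data can be illustrated with a minimal agglomerative (bottom-up) sketch in plain Python. The abstract does not state the linkage criterion, distance measure, or number of clusters, so average linkage over Euclidean distance and the function names used here (agglomerative, average_linkage, euclidean) are assumptions for illustration only. The authors' own labeling technique, which assigns ‘Normal’ or ‘Suspicious’ to the resulting clusters, is not reproduced.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, points):
    """Mean pairwise distance between members of two clusters.
    Assumption: the abstract does not name the linkage criterion."""
    total = sum(euclidean(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative(points, n_clusters):
    """Bottom-up hierarchical clustering: start with one cluster per bidder
    and repeatedly merge the closest pair until n_clusters (>= 1) remain.
    Returns one integer cluster id per input point."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest average linkage.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]], points),
        )
        clusters[a].extend(clusters[b])
        del clusters[b]  # safe: combinations always yields a < b
    labels = [0] * len(points)
    for cluster_id, members in enumerate(clusters):
        for i in members:
            labels[i] = cluster_id
    return labels
```

For example, with five bidders described by two hypothetical, already scaled fraud features, [(0.1, 0.0), (0.0, 0.1), (0.9, 1.0), (1.0, 0.9), (0.95, 0.95)], agglomerative(points, 2) places the first two bidders in one cluster and the remaining three in the other. The naive loop recomputes every pairwise linkage on each merge, which is acceptable for a sketch but too slow for large auction datasets.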
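The hybrid sampling step can be sketched as follows, assuming the ‘Suspicious’ bidders form the minority class. The abstract does not name the specific over-sampling and under-sampling methods, so this sketch substitutes SMOTE-style interpolation for over-sampling and random under-sampling of the majority class. The function names and parameters (hybrid_resample, smote_like, oversample_factor, majority_per_minority, k) are hypothetical.

```python
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_like(minority, n_new, k, rng):
    """Create n_new synthetic minority samples by interpolating between a
    random minority sample and one of its k nearest minority neighbours.
    Assumption: SMOTE-style interpolation stands in for the unnamed method.
    Requires at least two minority samples."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority samples, nearest first.
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: _sq_dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def hybrid_resample(majority, minority, oversample_factor=2.0,
                    majority_per_minority=1.0, k=2, seed=0):
    """Over-sample the minority class, then randomly under-sample the
    majority class (a list) to majority_per_minority times the new minority size."""
    rng = random.Random(seed)
    n_new = int(len(minority) * (oversample_factor - 1))
    new_minority = list(minority) + smote_like(minority, n_new, k, rng)
    n_keep = min(len(majority), int(len(new_minority) * majority_per_minority))
    new_majority = rng.sample(majority, n_keep)  # random under-sampling
    return new_majority, new_minority
```

With 20 ‘Normal’ and 3 ‘Suspicious’ instances and the default parameters, hybrid_resample returns 6 instances of each class: the 3 original suspicious bidders plus 3 synthetic ones, and a random subset of 6 normal bidders. Combining both directions moderates each side: fewer synthetic points are needed than with over-sampling alone, and fewer majority instances are discarded than with under-sampling alone.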