A Social Media Spam Detection Ensemble Methodology Utilizing Multiple Reputation Approaches
Abstract
Social media has grown steadily in popularity over the past several years, and that popularity has made it a target for spammers. Having long targeted email systems, spammers have turned their attention to social media messaging systems, where spam is a growing problem. Spam clutters user message feeds and can skew the results of social media analysis systems. Traditional approaches to spam detection are less effective on individual social media messages because of message context, character limits, special characters and the like. This thesis proposes alternative methodologies for identifying and removing spam from social media systems. The proposed Spam Sent User History (SSUH) and User Reputation Information (URI) methods are alternatives to standard text classification of individual messages. By learning to categorize spammer behavior, they estimate the likelihood that an incoming message originates from a spammer. This is accomplished through k-means, Bayesian and threshold analysis of the user data associated with the incoming message. The SSUH and URI methods are used in concert as an ensemble approach to social media message spam detection. The results in this thesis show that this methodology provides increased accuracy over traditional spam detection approaches.
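
The general idea of scoring the sender rather than the message text, and combining two user-level signals into one decision, can be sketched as below. This is only an illustration under stated assumptions: the abstract does not define the SSUH or URI features, so the `UserRecord` fields, both scoring functions, the threshold values and the "flag if either signal fires" rule are all hypothetical. Only the threshold style of analysis is shown; the k-means and Bayesian analyses are not.

```python
from dataclasses import dataclass


@dataclass
class UserRecord:
    """Hypothetical per-user data; these fields are assumptions, not the thesis's."""
    messages_sent: int  # total messages previously seen from this user
    spam_sent: int      # how many of those were labeled spam
    followers: int
    following: int


def ssuh_score(user: UserRecord) -> float:
    """Assumed history signal: fraction of the user's past messages labeled spam."""
    if user.messages_sent == 0:
        return 0.0  # no history, so no evidence of spamming
    return user.spam_sent / user.messages_sent


def uri_score(user: UserRecord) -> float:
    """Assumed reputation signal: share of the user's connections that are outgoing."""
    total = user.followers + user.following
    if total == 0:
        return 0.5  # neutral value when nothing is known
    return user.following / total


def ensemble_is_spam(user: UserRecord,
                     ssuh_threshold: float = 0.5,
                     uri_threshold: float = 0.9) -> bool:
    """Flag the sender if either signal meets its (illustrative) threshold."""
    votes = [
        ssuh_score(user) >= ssuh_threshold,
        uri_score(user) >= uri_threshold,
    ]
    return any(votes)
```

In this sketch an incoming message is judged entirely by its sender's record, so short or oddly formatted message text does not affect the outcome. A different combination rule, such as requiring both signals to agree, would trade recall for precision.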