A Social Media Spam Detection Ensemble Methodology Utilizing Multiple Reputation Approaches

Santidhanyaroj, Pitiphat

Files

Santidhanyaroj_Pitiphat_200326051_MASC_SSE_Fall2016.pdf (7.27 MB)

Date

2016-07

Authors

Santidhanyaroj, Pitiphat

Publisher

Faculty of Graduate Studies and Research, University of Regina

Abstract

Over the past several years, social media has become increasingly popular, and this popularity has made it a target for spammers. In addition to email systems, spammers have turned their attention to social media messaging systems, where spam has become a growing problem. Spam clutters user message feeds and can also affect the outcome of social media analysis systems. Traditional approaches to spam detection are less effective for individual social media messages because of message context, character limits, special characters and the like. This thesis proposes alternative methodologies for identifying and removing spam from social media systems. The Spam Sent User History (SSUH) and User Reputation Information (URI) methods proposed in this thesis are alternatives to standard text classification of individual messages. Through the learned categorization of spammer behavior, the likelihood that an incoming message originates from a spammer can be determined. This is accomplished through k-means, Bayesian and threshold analysis of the user data associated with the incoming message. The proposed SSUH and URI methods are used in concert as an ensemble approach to social media message spam detection. The results in this thesis show that this methodology provides increased accuracy over traditional spam detection approaches.

Description

A Thesis Submitted to the Faculty of Graduate Studies and Research In Partial Fulfillment of the Requirements for the Degree of Master of Applied Science in Software Systems Engineering, University of Regina. viii, 89 p.

URI

https://hdl.handle.net/10294/7634

Collections

Master's Theses
