POISED: Spotting Twitter Spam Off the Beaten Paths

Submitted

Shirin Nilizadeh, Francois Labreche, Alireza Sedighian, Ali Zand, José Fernandez, Gianluca Stringhini, Christopher Kruegel, and Giovanni Vigna

Cybercriminals have found in online social networks a propitious medium to spread spam and malicious content. Existing techniques for detecting spam include predicting the trustworthiness of accounts and analyzing the content of their messages. However, advanced attackers can still successfully evade these defenses.

Online social networks bring together people who have personal connections or share common interests, and these people form communities. In this paper, we first show that users within a networked community share some topics of interest. Moreover, content shared on these social networks tends to propagate according to the interests of people. Dissemination paths may emerge where some communities post similar messages, based on the interests of those communities. Spam and other malicious content, on the other hand, follow different spreading patterns.



We follow this insight and present POISED, a system that leverages the differences in propagation between benign and malicious messages on social networks to identify spam and other unwanted content. We test our system on a dataset of 1.3M tweets collected from 64K users, and we show that our approach is effective in detecting malicious messages, reaching 91% precision and 93% recall on our dataset. We also show that POISED's detection is more comprehensive than that of previous systems by comparing it to three state-of-the-art spam detection systems previously proposed by the research community. POISED significantly outperforms each of these systems. Moreover, through simulations, we show that POISED is effective in the early detection of spam messages and that it is resilient against two well-known adversarial machine learning attacks.
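
As a rough illustration of the idea of comparing how messages spread across communities, the sketch below computes, for one message, the share of its posters that fall into each community. It is a minimal sketch under stated assumptions, not POISED's actual feature set: the function name `community_spread`, the user-to-community mapping, and the toy data are all hypothetical, and the abstract does not say which spreading pattern indicates spam.

```python
from collections import Counter


def community_spread(posters, community_of):
    """Return, per community, the fraction of a message's posters in it.

    `posters` is a list of user names that posted the same message;
    `community_of` maps a user name to a community label (both hypothetical).
    Users without a known community are ignored.
    """
    counts = Counter(community_of[u] for u in posters if u in community_of)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {community: n / total for community, n in counts.items()}


# Hypothetical toy data: four users in three communities.
community_of = {"ann": "sports", "bob": "sports", "cat": "music", "dan": "news"}

# One message posted only inside a single community ...
focused = community_spread(["ann", "bob"], community_of)
# ... and one posted by users from three different communities.
scattered = community_spread(["ann", "cat", "dan"], community_of)
```

Distributions like `focused` and `scattered` could then serve as inputs to a classifier; how such profiles are actually built and used is a matter for the paper itself.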



Methods:

Large-scale Twitter dataset, Network science, Natural language processing, Crowd-sourced ground-truth labeling, Machine learning, Simulation of adversarial machine learning attacks.

Prevalence and Identification of Auto Scam on Craigslist

Shirin Nilizadeh, Darya Orlova, Azadeh Nematzadeh, Apu Kapadia, and Minaxi Gupta

Craigslist ads are viewed by millions of Internet users each month, making the site an attractive target for fraudsters and miscreants. Unsurprisingly, it has even been labeled a "cesspool of crime." In this project, we take a first look at automobile scams on Craigslist. Focusing on the U.S. market, we find that scammers exploit the fact that posting ads on Craigslist is free. They post a large number of ads for the same vehicle in many cities over a short period of time, either manually or by using readily available automatic ad-posting software. Interestingly, scam ads often advertise relatively new vehicles of popular makes and list them at tempting prices. They use special characters extensively to attract attention and randomize the ad body to escape automatic detection. Fortunately, our study finds many features that distinguish scam ads from legitimate ones. Using these features, we show that an SVM-based classifier can differentiate between scam and trustworthy ads with 99% accuracy.
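
Two of the signals mentioned above, heavy use of special characters and the same vehicle being advertised in many cities, can be sketched as simple features. This is a minimal sketch under stated assumptions, not the study's actual feature extraction: the function names, the `vehicle_id` and `city` fields, and the toy ads are hypothetical, and treating ASCII punctuation as "special characters" is a simplification.

```python
import string
from collections import defaultdict


def special_char_ratio(text):
    """Fraction of characters in `text` that are ASCII punctuation."""
    if not text:
        return 0.0
    special = sum(1 for ch in text if ch in string.punctuation)
    return special / len(text)


def cities_per_vehicle(ads):
    """Count the distinct cities in which each vehicle is advertised.

    Each ad is a dict with hypothetical keys "vehicle_id" and "city".
    """
    cities = defaultdict(set)
    for ad in ads:
        cities[ad["vehicle_id"]].add(ad["city"])
    return {vehicle: len(seen) for vehicle, seen in cities.items()}


# Hypothetical toy ads: vehicle "v1" appears in three cities, "v2" in one.
ads = [
    {"vehicle_id": "v1", "city": "Austin"},
    {"vehicle_id": "v1", "city": "Boston"},
    {"vehicle_id": "v1", "city": "Denver"},
    {"vehicle_id": "v2", "city": "Austin"},
]
city_counts = cities_per_vehicle(ads)
title_ratio = special_char_ratio("**A**")
```

Feature values of this kind would be assembled into a vector per ad and passed to a classifier such as an SVM; the feature set and thresholds used in the study are not specified here.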

Methods:

Web mining, Data mining, Text mining, Machine learning.