Shirin Nilizadeh, Apu Kapadia, and Yong-Yeol Ahn
In Proceedings of the 21st ACM Conference on Computer and Communications Security (CCS ’14), Arizona, USA, 2014.
Online social network providers have become treasure troves of information for marketers and researchers. To profit from their data while honoring the privacy of their customers, social networking services share ‘anonymized’ social network datasets, where, for example, identities of users are removed from the social network graph. However, by using external information such as a reference social graph (from the same network or another network with similar users), researchers have shown how such datasets can be de-anonymized. These approaches use ‘network alignment’ techniques to map nodes from the reference graph into the anonymized graph and are often sensitive to larger network sizes, the number of seeds, and noise — which may be added to preserve privacy.
We propose a divide-and-conquer approach to strengthen the power of such algorithms. Our approach partitions the networks into ‘communities’ and performs a two-stage mapping: first at the community level, and then for the entire network. Through extensive simulation on real-world social network datasets, we show how such community-aware network alignment improves de-anonymization performance under high levels of noise, large network sizes, and a low number of seeds. Even when nodes cannot be explicitly mapped, the community structure can be mapped between both networks, thus reducing the anonymity of users. For example, for our (real-world) Twitter dataset with 90,000 nodes, 20% noise, and 16 seeds, the state-of-the-art technique reduces anonymity by 0 bits, whereas our approach reduces anonymity by 9.71 bits (with 40% of nodes mapped).
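To give a feel for what a reduction "in bits" can mean, the following is a minimal sketch under a simplifying assumption that is not taken from the paper: every node in a candidate set is treated as equally likely, so anonymity is the base-2 logarithm of the candidate set size. The function names and the uniform-candidate model are hypothetical and chosen only for illustration; the paper's exact metric may differ.

```python
import math


def anonymity_bits(candidates: int) -> float:
    """Anonymity of a user hidden among `candidates` equally likely nodes.

    Assumption (not from the paper): a uniform distribution over candidates,
    so the entropy is simply log2 of the candidate set size.
    """
    if candidates < 1:
        raise ValueError("candidate set must contain at least one node")
    return math.log2(candidates)


def anonymity_reduction(total_nodes: int, community_size: int) -> float:
    """Bits of anonymity lost when a user is narrowed from the whole
    network down to a single mapped community (hypothetical model)."""
    if community_size > total_nodes:
        raise ValueError("community cannot be larger than the network")
    return anonymity_bits(total_nodes) - anonymity_bits(community_size)


if __name__ == "__main__":
    # Narrowing a user from 1024 candidates to a community of 32
    # removes 10 - 5 = 5 bits under the uniform assumption.
    print(anonymity_reduction(1024, 32))
```

Under this model, a user hidden among 90,000 equally likely nodes starts with roughly 16.46 bits of anonymity, and mapping only the user's community (without mapping the node itself) already removes some of those bits.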
The Workshop on Surveillance & Technology (SAT) held with the Privacy Enhancing Technologies Symposium (PETS), June 29th, 2015.
In June 2013, Edward Snowden, a computer analyst who formerly worked for a government security contractor, provided the media with top-secret documents from the National Security Agency (NSA) about the mass collection of telephone and Internet communications. Since then, mass media have been reporting on the ongoing disclosures, which have raised questions about the proper scope of government monitoring of the Internet. A recent survey by PEN American Center has shown that these revelations have had an impact on people’s views about government surveillance. In this paper, we examine the reactions of Twitter users to these revelations. Using a sample of tweets written from June to November 2013, we performed sentiment and text analysis and compared the results with an analysis of all tweets in the same time period. We identified the words most commonly used for expressing opinions about these disclosures. We also examine how men and women responded differently to these events. By analyzing Twitter users’ profiles and their lists, we also identified communities of people who have been more active and concerned about the topic. This research helps us to develop general hypotheses about how people respond to the security environment of the Internet.
Modern technologies are radically altering the privacy of everyday communication as well as people’s perception of privacy. This research seeks to understand how revelations about the security of the Internet affect how people communicate. People may view the Internet as safer or riskier than before based on various information. For example, in 2013, it was revealed and widely publicized that the National Security Agency was collecting emails, telephone calls, and other forms of data from Americans and citizens of other countries. This research will measure the effects of such revelations on people’s communication patterns on Twitter, and help develop a more mature understanding of how online communities are shaped by their social and political environments. This understanding will ultimately allow us to better model how social and political trends affect how people present themselves in online environments.
Data Mining, Twitter user privacy, Network analysis: retweet network, community detection algorithm, Natural language processing: Topic modeling and sentiment analysis, Data Analysis