Work‎ > ‎Research‎ > ‎

De-anonymizing social networks

posted Sep 27, 2013, 1:31 PM by Botao Hu   [ updated Sep 27, 2013, 1:45 PM ]
Main Author and Experiment Conductor
Sep 2012 - Dec 2012
CS244W course project
Coauthored with Danqi Chen and Xie Shuo.
Supervised by Jure Leskovec 
Stanford University, CA, USA

The problem of de-anonymizing social networks is to identify the same users between two anonymized social networks. Network de-anonymization task is of multifold significance, with user profile enrichment as one of its most promising applications. After the de-anonymization and alignment, we can aggregate and enrich user profile information from different online networking services and make the bundled profiles available for end-users as well as third-party applications.
In our project, we aim to develop effective algorithms for de-anonymizing real-world social networks. Specifically, we focus on two tasks: one is to align the networks of Flickr and Instagram and the other is to align Flickr and Twitter. Our work is motivated by the two parts of information that network data is composed of: network structure and node attributes. Preliminary tests have shown that de-anonymizing algorithm based merely on node attributes, e.g. user names, is computationally efficient but not satisfactorily accurate. On the other hand, algorithms that rely on network structures, which bring in more relationship information, may contribute to the precision of de-anonymization. However, not only may the structure of the real-world social networks be quite different, but also the computation costs will be intractably high since the maximum common subgraph-isomorphism is a NP-hard problem. Hence it is very difficult to align two networks merely based on their structures without any auxiliary information. In view of these facts, we decide to develop approaches that can combine network structure information and node attributes to do the alignment.

In the end we developed various approaches including greedy-based approaches and network alignment methods. We also carried out a series of experiments to verify their performances on the real-world social network datasets. Our results show that we could identify nearly 70% of the users based on both the user names and the network structure.
This paper is organized as follows. We first give the problem formulation and discuss some related work. Then we introduce our collected datasets and present our observations and analysis of the real-world social networks. Based on our findings, we later introduce our algorithms and demonstrate how they are applied in practice. Finally, we show our experimental results.

Botao Hu,
Sep 27, 2013, 1:46 PM
Botao Hu,
Sep 27, 2013, 1:43 PM