
Research

PUBLICATIONS

  • Botao Hu, Nathan N. Liu, and Weizhu Chen. Learning from Click Model and Latent Factor Model for Relevance Prediction Challenge, Workshop on Web Search Click Data (WSCD), 2012. Oral.
  • Si Shen, Botao Hu, Weizhu Chen, and Qiang Yang. Personalized Click Model through Collaborative Filtering, In Proceedings of the 5th International Conference on Web Search and Data Mining (WSDM), 2012. Oral.
  • Botao Hu, Yuchen Zhang, Gang Wang, Weizhu Chen, and Qiang Yang. Characterizing Search Intent Diversity into Click Models, In Proceedings of the 20th International World Wide Web Conference (WWW), 2011. Oral.
  • Dakan Wang, Gang Wang, Pinyan Lu, Yajun Wang, Zheng Chen, and Botao Hu. Is Pay-Per-Click Efficient? An Empirical Analysis of Click Values, In Proceedings of the 20th International World Wide Web Conference (WWW), 2011. Poster.
  • Yuchen Zhang, Dong Wang, Gang Wang, Weizhu Chen, Zhihua Zhang, Botao Hu, and Li Zhang. Learning Click Model via Probit Bayesian Inference, In Proceedings of the 19th ACM Conference on Information and Knowledge Management (CIKM), 2010. Oral.
  • Dong Wang, Weizhu Chen, Gang Wang, Yuchen Zhang, and Botao Hu. Explore Click Models for Search Ranking, In Proceedings of the 19th ACM Conference on Information and Knowledge Management (CIKM), 2010. Poster.

De-anonymizing social networks

posted Sep 27, 2013, 1:31 PM by Botao Hu   [ updated Sep 27, 2013, 1:45 PM ]

Main Author and Experiment Conductor
Sep 2012 - Dec 2012
CS244W course project
Coauthored with Danqi Chen and Xie Shuo.
Supervised by Jure Leskovec 
Stanford University, CA, USA

The problem of de-anonymizing social networks is to identify the same users across two anonymized social networks. The task matters for several reasons, and user profile enrichment is one of its most promising applications. After de-anonymization and alignment, we can aggregate and enrich user profile information from different online networking services and make the bundled profiles available to end users as well as third-party applications.
 
In our project, we aim to develop effective algorithms for de-anonymizing real-world social networks. Specifically, we focus on two tasks: aligning the networks of Flickr and Instagram, and aligning those of Flickr and Twitter. Our work is motivated by the two kinds of information that make up network data: network structure and node attributes. Preliminary tests showed that a de-anonymization algorithm based solely on node attributes, e.g. user names, is computationally efficient but not satisfactorily accurate. Algorithms that rely on network structure bring in more relationship information and may improve the precision of de-anonymization. However, the structures of real-world social networks can differ considerably, and the computational cost is intractably high because the maximum common subgraph isomorphism problem is NP-hard. Hence it is very difficult to align two networks based on structure alone, without any auxiliary information. In view of these facts, we decided to develop approaches that combine network structure and node attributes to perform the alignment.

In the end, we developed several approaches, including greedy approaches and network alignment methods. We also carried out a series of experiments to evaluate their performance on real-world social network datasets. Our results show that we could identify nearly 70% of the users based on both user names and network structure.
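To make the idea of combining user names with network structure concrete, here is a minimal sketch of one possible greedy alignment. It is not the project's actual algorithm: the scoring rule, the weight `alpha`, the `threshold`, and all function names are illustrative assumptions. Each round scores every unmatched pair by a weighted sum of name similarity and neighbourhood agreement with pairs already matched, then commits the best pair.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """String similarity in [0, 1] between two user names (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def structure_score(u, v, adj_a, adj_b, matched):
    """Fraction of u's already-matched neighbours whose partner is a neighbour of v."""
    partners = [matched[n] for n in adj_a[u] if n in matched]
    if not partners:
        return 0.0
    return sum(1 for p in partners if p in adj_b[v]) / len(partners)


def greedy_align(names_a, names_b, adj_a, adj_b, alpha=0.5, threshold=0.4):
    """Hypothetical greedy matcher: commit the best-scoring unmatched pair each
    round and stop when no remaining pair reaches the threshold."""
    matched = {}
    used_b = set()
    while True:
        best = None
        for u, name_u in names_a.items():
            if u in matched:
                continue
            for v, name_v in names_b.items():
                if v in used_b:
                    continue
                score = (alpha * name_similarity(name_u, name_v)
                         + (1 - alpha) * structure_score(u, v, adj_a, adj_b, matched))
                if best is None or score > best[0]:
                    best = (score, u, v)
        if best is None or best[0] < threshold:
            break
        _, u, v = best
        matched[u] = v
        used_b.add(v)
    return matched
```

In this sketch, a user whose names share nothing across the two networks can still be matched once enough of their neighbours are aligned. Each round scans all remaining pairs, so this form is only practical for small graphs.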
 
This paper is organized as follows. We first give the problem formulation and discuss some related work. Then we introduce our collected datasets and present our observations and analysis of the real-world social networks. Based on our findings, we later introduce our algorithms and demonstrate how they are applied in practice. Finally, we show our experimental results.

Learning from Click Model and Latent Factor Model for Relevance Prediction Challenge

posted Mar 1, 2012, 10:07 AM by Botao Hu   [ updated Mar 4, 2012, 11:55 AM ]

Main Author and Experiment Conductor
Jun 2011 - Jul 2011
Coauthored with Nathan Liu, Weizhu Chen.
Supervised by Qiang Yang.
Hong Kong University of Science and Technology, Hong Kong
In Proceedings of WSCD 2012

We officially won the Relevance Prediction Challenge, received the cash prize, and were invited to WSCD 2012.

Accurately interpreting user click behavior in search logs is a key but challenging problem for search relevance. In this paper, we describe our solution to the Relevance Prediction Challenge, which achieved first place among eligible teams. Our solution has three stages: feature generation, feature augmentation, and learning a ranking function. In the first stage, we extract features for query-document pairs, as well as for individual queries and documents, from the click log data. In the second stage, we derive additional features using click model techniques, which correct for different biases, and latent factor models, which discover correlations between different queries or documents. In the final stage, we apply supervised learning to the limited labelled data to build a model that predicts relevance from the features generated in the previous two stages.
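As an illustration of the first stage only, the sketch below derives simple per query-document features from a click log. The log layout (a list of `(query, [(doc, clicked), ...])` sessions), the feature names, and the function name are assumptions made for this example, not the challenge's data format or the features actually used.

```python
from collections import defaultdict


def ctr_features(sessions):
    """Count impressions and clicks per (query, doc) pair and derive a raw CTR.

    `sessions` is an assumed format: an iterable of (query, results) where
    results is a list of (doc, clicked) tuples.
    """
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for query, results in sessions:
        for doc, clicked in results:
            impressions[(query, doc)] += 1
            clicks[(query, doc)] += int(clicked)
    return {
        pair: {
            "impressions": shown,
            "clicks": clicks[pair],
            "ctr": clicks[pair] / shown,
        }
        for pair, shown in impressions.items()
    }
```

A raw click-through rate like this ignores position bias, which is the kind of bias the click-model features of the second stage are meant to correct.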

Personalized Click Model through Collaborative Filtering

posted Oct 23, 2011, 5:20 PM by Botao Hu   [ updated Dec 5, 2011, 5:58 AM ]

Main Author and Experiment Conductor
Jun 2011 - Jul 2011
Coauthored with Si Shen.
Supervised by Weizhu Chen, Qiang Yang.
Hong Kong University of Science and Technology, Hong Kong
In Proceedings of WSDM 2012

Click modeling aims to interpret users' search click data in order to predict their clicking behavior. Existing models characterize well the position bias of documents and snippets in relation to users' mainstream click behavior. Yet current approaches depict users' search actions only in a general setting, implicitly assuming that all users act in the same way, even though a user motivated by some individual interest is more likely to click on a link than others. In light of this, we put forward a novel personalized click model to describe user-oriented click preferences, which applies and extends matrix / tensor factorization from the view of collaborative filtering to connect users, queries and documents. Our model serves as a generalized personalization framework that can be incorporated into previously proposed click models and, in many cases, into their future extensions. Despite the sparsity of search click data, our personalized model demonstrates its advantage over the best click models previously discussed in the Web-search literature, as supported by our large-scale experiments on a real dataset. An additional benefit is the model's ability to gain insights into queries and documents through latent feature vectors, and hence to handle rare and even new query-document pairs much better than previous click models.
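The general idea of using latent factors for user-specific click preferences can be sketched with a plain logistic matrix factorization. This is a deliberately simplified stand-in, not the model from the paper: it covers only users and documents (no queries, no position bias), and the function names, hyperparameters, and training loop are assumptions for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_factors(clicks, n_users, n_docs, rank=2, lr=0.1, epochs=2000, seed=0):
    """Fit user and document latent vectors by SGD on the logistic loss.

    `clicks` is a list of (user, doc, clicked) with clicked in {0, 1}.
    """
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_users)]
    D = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_docs)]
    for _ in range(epochs):
        for user, doc, clicked in clicks:
            p = sigmoid(sum(a * b for a, b in zip(U[user], D[doc])))
            err = clicked - p  # gradient of the log-likelihood w.r.t. the logit
            for k in range(rank):
                u_k, d_k = U[user][k], D[doc][k]
                U[user][k] += lr * err * d_k
                D[doc][k] += lr * err * u_k
    return U, D


def click_probability(U, D, user, doc):
    """Predicted click probability from the inner product of latent vectors."""
    return sigmoid(sum(a * b for a, b in zip(U[user], D[doc])))
```

Because predictions depend only on latent vectors, a user-document pair never seen together in the log still receives a score, which is the property that helps with rare pairs.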

Characterizing Search Intent Diversity into Click Models

posted Oct 23, 2011, 4:47 PM by Botao Hu   [ updated Oct 26, 2011, 5:48 AM ]

First Author and Experiment Conductor
Mar 2010 - May 2010
Coauthored with Yuchen Zhang.
Supervised by Gang Wang, Weizhu Chen, Qiang Yang.
Microsoft Research Asia, Beijing
In Proceedings of WWW 2011

Modeling a user's click-through behavior in click logs is a challenging task due to the well-known position bias problem. Recent advances in click models have adopted the examination hypothesis which distinguishes document relevance from position bias. In this paper, we revisit the examination hypothesis and observe that user clicks cannot be completely explained by relevance and position bias. Specifically, users with different search intents may submit the same query to the search engine but expect different search results. Thus, there might be a bias between user search intent and the query formulated by the user, which can lead to the diversity in user clicks. This bias has not been considered in previous works such as UBM, DBN and CCM. In this paper, we propose a new intent hypothesis as a complement to the examination hypothesis. This hypothesis is used to characterize the bias between the user search intent and the query in each search session. This hypothesis is very general and can be applied to most of the existing click models to improve their capacities in learning unbiased relevance. Experimental results demonstrate that after adopting the intent hypothesis, click models can better interpret user clicks and achieve a significant NDCG improvement.
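For readers unfamiliar with the examination hypothesis, it can be written as follows, where $C_i$ and $E_i$ denote the click and examination events at rank $i$ and $r_{d_i}$ is the relevance of the document shown there. The notation is chosen for this note. The second formula is only an illustrative way to express a per-session intent bias $\mu_s$; its exact form is an assumption and may differ from the formulation in the paper.

```latex
% Examination hypothesis: a click requires examination, and an examined
% result is clicked with probability equal to its relevance.
P(C_i = 1 \mid E_i = 0) = 0, \qquad
P(C_i = 1 \mid E_i = 1) = r_{d_i}
\;\Longrightarrow\;
P(C_i = 1) = P(E_i = 1)\, r_{d_i}

% Illustrative intent-bias variant (assumed form): a per-session factor \mu_s
% scales the click probability of examined results.
P(C_i = 1 \mid E_i = 1) = \mu_s\, r_{d_i}
```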

Learning Click Model via Probit Bayesian Inference

posted Oct 23, 2011, 4:24 PM by Botao Hu   [ updated Dec 4, 2011, 9:46 PM ]

Experiment Conductor
Mar 2010 - May 2010
Coauthored with Yuchen Zhang, Dong Wang.
Supervised by Gang Wang, Weizhu Chen.
Microsoft Research Asia, Beijing
In Proceedings of CIKM 2010

Recent advances have established click models as an effective approach to interpreting click data; representative models include UBM, DBN, and CCM. After formulating knowledge of user search behavior as a set of model assumptions, each click model develops its own inference method to estimate its parameters. The inference method plays a critical role in the accuracy of click interpretation, and we observe that different inference methods for the same click model can lead to significant differences in accuracy. In this paper, we propose a novel Bayesian inference approach for click models. This approach treats click models under a unified framework and has the following advantages:

1. This approach can be widely applied to existing click models, and we demonstrate how to infer DBN, CCM and UBM through it. The inference method is based on the Bayesian framework, which is more flexible in characterizing the uncertainty in clicks and generalizes better. As a result, it not only outperforms the inference methods originally developed for these click models, but also provides a valid comparison among different models;
2. In contrast to previous click models, which are designed exclusively for position bias, this approach can incorporate richer information, such as BM25 and PageRank scores, into click models. This lets the models interpret click-through data more accurately. Experimental results show that click models integrated with this additional information achieve significantly better performance on click perplexity and search ranking;
3. Because of the incremental nature of Bayesian learning, this approach scales to large and constantly growing log data.
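Click perplexity, one of the metrics mentioned above, is commonly computed per rank position from the predicted click probabilities and the observed clicks. The sketch below uses the standard definition. The function name and the clamping constant `eps` are assumptions added for numerical safety and are not taken from the paper.

```python
import math


def click_perplexity(clicks, predicted, eps=1e-12):
    """Perplexity of predicted click probabilities at a single rank position.

    `clicks` holds observed 0/1 clicks and `predicted` the model's click
    probabilities for the same impressions. Lower is better; 1.0 is perfect
    and 2.0 corresponds to always predicting 0.5.
    """
    total = 0.0
    for c, q in zip(clicks, predicted):
        q = min(max(q, eps), 1.0 - eps)  # avoid log2(0)
        total += c * math.log2(q) + (1 - c) * math.log2(1.0 - q)
    return 2 ** (-total / len(clicks))
```

For example, `click_perplexity([1, 0], [0.5, 0.5])` evaluates to 2.0, and more confident correct predictions move the value toward 1.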
