Projects


Circus - detecting events in social circles

posted Sep 27, 2013, 2:12 PM by Botao Hu

Research and Development
Jul 2013 - Sep 2013
Supervised by Jiong Wang
Twitter Inc.

This project aims to detect circle events such as "#sigir2013" in the tweet stream. A circle event:
  • is small, usually too small to be detected by trends
  • has burst recently, within a 1-2 day time window
  • involves users whose friendship network forms a social circle
  • shows strong interaction among those users
I built a “replay” version of the detector, which successfully finds event circles in historical tweets.

Compared with traditional community detection algorithms, which rely on connection structure only, this project focuses on interest-driven communities. Only communities whose members are all involved in a bursty event are extracted. Therefore, through this project, we can obtain an interest graph -- the correspondence between an interest/event and the engaged community.

Compared with traditional interest mining methods, which link an interest to an individual, the correspondence between an interest and a group of engaged people tends to have higher signal quality. An individual's behavior may be recorded sparsely, noisily, or even with bias, whereas consistent group behavior is less affected by individual bias and tends to reveal a strong group interest and an inherent interest connection within the group.

Another contribution is that I investigated different information-spreading patterns:
  • bursty events with a dense interaction graph -- circle event, e.g. #sigir2013
  • non-bursty events with a dense interaction graph -- group chat, or link farm, e.g. #sixsigma
  • bursty events with a loose interaction graph -- breaking news, e.g. #syriacrisis
  • non-bursty events with a loose interaction graph -- meme, or TV show, e.g. #morningtweet and #numb3rs
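
The four patterns above amount to a two-by-two split on burstiness and interaction density. The following Python sketch illustrates that split only; it is not the project's actual detector. The two measures (`burst_ratio`, `interaction_density`), the function `classify`, and both thresholds are hypothetical choices made for this illustration.

```python
def burst_ratio(daily_counts, window=2):
    """Share of all tweets that fall in the busiest `window`-day span."""
    total = sum(daily_counts)
    if total == 0:
        return 0.0
    peak = max(sum(daily_counts[i:i + window]) for i in range(len(daily_counts)))
    return peak / total


def interaction_density(users, interactions):
    """Fraction of user pairs that interacted at least once (undirected)."""
    users = set(users)
    possible = len(users) * (len(users) - 1) // 2
    if possible == 0:
        return 0.0
    pairs = {
        frozenset((a, b))
        for a, b in interactions
        if a != b and a in users and b in users
    }
    return len(pairs) / possible


def classify(daily_counts, users, interactions,
             burst_threshold=0.6, density_threshold=0.3):
    """Map (bursty?, dense?) to one of the four spreading patterns.

    Both thresholds are illustrative assumptions, not tuned values.
    """
    bursty = burst_ratio(daily_counts) >= burst_threshold
    dense = interaction_density(users, interactions) >= density_threshold
    labels = {
        (True, True): "circle event",
        (False, True): "group chat or link farm",
        (True, False): "breaking news",
        (False, False): "meme or TV show",
    }
    return labels[(bursty, dense)]
```

For example, a hashtag whose tweets concentrate in two days and whose participants mostly reply to one another would be labeled a circle event, while steady volume among users who rarely interact would be labeled a meme or TV show.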

A Commercial Software of Data Mining in Large-scale Graphs and Social Networks

posted Oct 23, 2011, 9:13 PM by Botao Hu   [ updated Nov 1, 2011, 3:59 PM ]

Development Conductor and Core Developer
Jun 2011 - Oct 2011
Supervised by Qiang Yang
Hong Kong University of Science and Technology, Hong Kong

This project aims to build a graph mining library that efficiently handles large-scale graphs (at least 5M nodes and 10M edges). It implements not only the basic statistics and operations on graphs, but also state-of-the-art algorithms for five graph mining problems: Link-based Object Ranking, Link-based Object Classification, Predicting Link Existence, Link Cardinality Estimation, and Group Detection/Community Detection.

I devoted approximately 4 months to this project. It involved more than 50k lines of code, of which at least 30k were written by me independently.

SIGMA - Large Scale Machine Learning Toolkit

posted Oct 23, 2011, 9:13 PM by Botao Hu   [ updated Oct 24, 2011, 10:50 PM ]

Core Developer
Mar 2011 - Jun 2011
Mentored by Weizhu Chen
Microsoft Research Asia, Beijing

The goal of this project is to provide a set of parallel machine learning functionalities that meet the requirements of research work and applications with large-scale data and features. The toolkit includes, but is not limited to, classification, clustering, ranking, and statistical analysis, and runs them in parallel on hundreds of machines and thousands of CPU cores. We also provide an SDK for researchers and developers to create their own algorithms and contribute them to the toolkit.



CNML (Chinese News Markup Language) Management System - an XML Application in News Domain for Xinhua News Agency

posted Oct 23, 2011, 9:13 PM by Botao Hu   [ updated Oct 24, 2011, 5:45 AM ]

Core Developer
Nov 2010 - Jan 2011
Supervised by Juanzi Li
Tsinghua University, Beijing

CNML is an XML application in the news domain. It comprises the Chinese national standard "Chinese News Markup Language (CNML)" (GB/T20092-2006) and a project named CNML Management System. This software has been applied in many business systems of Xinhua News Agency, such as text, image and multimedia editing systems. I was in charge of developing the XML Schema parser and the CNML C++/Java API generator. We won the second prize of the Wang Xuan Award for Science and Technology Achievement in News Domain.

ClickBoost - A Commercial Large-scale Framework of Click Models for Bing.com

posted Oct 23, 2011, 9:12 PM by Botao Hu   [ updated Oct 24, 2011, 5:47 AM ]

Project Originator and Core Developer
Mar 2010 - Jun 2010
Mentored by Gang Wang
Microsoft Research Asia, Beijing

The probit click model framework is based on the paper "Learning Click Models via Probit Bayesian Inference" by Y. Zhang et al. The authors propose a novel inference approach that can be widely applied to existing click models. The approach is based on the Bayesian framework: it replaces each probability variable in a click model with a new Gaussian-distributed variable through a probit link function, so that both the prior and the posterior distributions in Bayesian learning can be approximated by Gaussians.
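
As a minimal illustration of the probit link only (not the paper's inference algorithm and not the ClickBoost code), the Python sketch below maps a real-valued variable x to a probability p = Φ(x), where Φ is the standard normal CDF. When x follows a Gaussian N(mean, variance), the expected probability has the closed form Φ(mean / sqrt(1 + variance)), which is one reason Gaussian approximations are convenient here. The function names are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal CDF Phi(x); maps a real value to a probability."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_probability(mean, variance):
    """E[Phi(x)] for x ~ N(mean, variance).

    Uses the closed form Phi(mean / sqrt(1 + variance)).
    """
    return normal_cdf(mean / math.sqrt(1.0 + variance))
```

With zero variance the expectation reduces to `normal_cdf(mean)`; as the variance grows, the expected probability is pulled toward 0.5, reflecting greater uncertainty about the underlying variable.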

The framework is a class library packaged as a dynamic-link library (DLL). It provides functions and modules for data extraction from raw logs, data trimming and filtering, click model training and testing, evaluation, and so on. It is implemented on the Scope cloud system, a MapReduce computing system deployed over the network file system Cosmos. The cloud implementation lets click model training and testing run in parallel and handle large-scale click log data. The framework can also run on a local machine for debugging.

Take Note and Sync - An iPad Reader Application with Note-Taking

posted Oct 23, 2011, 9:11 PM by Botao Hu   [ updated Oct 24, 2011, 11:18 PM ]

Team Leader and Project Manager
Mar 2011 - Jun 2011
Tutored by Xin Zou
Tsinghua University, Beijing

An iPad application for improving the quality of your academic life. You can read papers on your iPad, comment on them, and share your comments with your coworkers in sync.

Key features:
   1. Reading papers, and sharing comments with co-authors
You can read papers in our app and add comments to them. Further, if you are in a seminar group, you can share your comments with others. Sharing is not a one-time action but a collaboration function: after your co-author adds new comments, your device shows them in sync.
   2. Personal Paper Library Synchronization
We store your papers (PDF) and the comments from you and your co-authors in the cloud. Since managing files on a mobile device is inconvenient, you can manage them through the web on your home PC. Unlike most iOS apps, such as iPod Music, which require you to carefully maintain the files on your device, our app automatically syncs your library to your devices. It's cool.


Snake Jigsaw - A Prototype of a Novel Casual Game

posted Oct 23, 2011, 9:11 PM by Botao Hu   [ updated Oct 24, 2011, 10:13 PM ]

Creator and Developer.
Mar 2011
Coauthored with Yi Yang
Tsinghua University, Beijing

A novel casual game that combines two simple casual games, Jigsaw and Snake. My partner, Yi YANG, and I worked in a pair-programming style and finished it in 3 days. I hope that you will enjoy our interesting idea.

Migrating 265.com into Google.cn and Redesign

posted Oct 23, 2011, 9:09 PM by Botao Hu   [ updated Oct 24, 2011, 10:14 PM ]

Project Conductor and Core Developer
Jul 2009 - Nov 2009
Mentored by Steven Ge and Teresa Chai
Google Inc., Beijing

265.com, a well-known navigation website in China, offers links to Chinese resources such as downloads, games and more. After 265.com was acquired by Google Inc., I was in charge of migrating, redesigning and developing the whole 265.com website to meet the quality standards of a Google product, from the design of the user interface to the development of the backend service on Google infrastructure.

Play with Trees 2007 - An International Algorithm Contest

posted Oct 23, 2011, 8:32 AM by Botao Hu   [ updated Oct 24, 2011, 10:16 PM ]

Organizer and problem creator
Mar 2007 - May 2007
Fuzhou No.1 High School, Fuzhou

PT 2007 is an international algorithm contest on graph theory, organized by me and my friend Thanh Vy. The problem set is entirely about trees in graph theory. I believe the problems were very intriguing. It was a truly memorable experience from my high school years.



