Topic 3: Live Text Search

Our research aims to improve the usability and applications of retrieval over the textual elements of Web 2.0, especially news and user-generated content (UGC). We highlight how our basic research enables downstream projects on news, brand and product tracking, and mobile applications.

1. SANDS Differential News Portal

Web users can easily get news, but understanding how opinions on the same news differ across regions and people is harder. Our SANDS news portal, which stands for SAme News, Different Stories, is a bilingual news portal that aggregates individual news stories by news topic. SANDS gives a holistic news experience by combining social media (Twitter, Weibo) with official news (e.g., chinadaily) and news portals from mainland China, Singapore Chinese media and Western media. SANDS detects hot topics and differences in opinion between stakeholders to give users a comparative view (from both Chinese and English publishers) of a single news event. We calculate emotion scores for each news event, estimating the degree of happiness, anger, sadness, fear and surprise. By integrating time- and location-based analyses, SANDS helps us learn how different users feel about the same news topic.
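As a rough illustration of what a per-event emotion score could look like, the sketch below counts lexicon hits for the five emotions named above and normalizes them into proportions. The toy lexicon, the function name emotion_scores and the whitespace tokenization are all assumptions made for illustration; the scoring method SANDS actually uses is not described here, and Chinese text would need a word segmenter rather than whitespace splitting.

```python
from collections import Counter

# Hypothetical toy lexicon; a real system would use a much larger resource.
EMOTION_LEXICON = {
    "happy": "happiness", "celebrate": "happiness",
    "furious": "anger", "outrage": "anger",
    "mourn": "sadness", "tragic": "sadness",
    "afraid": "fear", "panic": "fear",
    "shocked": "surprise", "unexpected": "surprise",
}
EMOTIONS = ("happiness", "anger", "sadness", "fear", "surprise")


def emotion_scores(texts):
    """Return the share of lexicon hits per emotion over all texts of one event."""
    counts = Counter()
    for text in texts:
        for token in text.lower().split():
            # Strip surrounding punctuation before the lexicon lookup.
            emotion = EMOTION_LEXICON.get(token.strip(".,!?;:\"'"))
            if emotion:
                counts[emotion] += 1
    total = sum(counts.values())
    # With no lexicon hits at all, report zeros rather than dividing by zero.
    return {e: (counts[e] / total if total else 0.0) for e in EMOTIONS}


scores = emotion_scores(["Fans celebrate, happy crowds!", "Residents panic."])
```

Normalizing to proportions makes events with different numbers of stories comparable, at the cost of hiding how many hits the estimate rests on.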

2. Product and Brand Comparison and Reputation Tracking

Product comparison system:

Brand Microblog Messages Analytics System:

Product comparison helps sellers track market trends, and savvy buyers now commonly check user and professional critic reviews to weigh the pros and cons of specific products. Our product comparison system uses automated methods to deliver two key features: 1) helping users decide which product to buy among competitors, and 2) analyzing why a product is selling well.

Our system shows prospective customers an evaluation of the attributes of different products. From user reviews obtained on the social web, we comparatively evaluate equivalent attributes across two or more products, so that users can select the right product for their needs. For our analytics, we also mine user comments to highlight the reasons why a product sells well. We assume that a few key properties drive users' purchasing decisions, and that analyzing comments in bulk lets us ascertain the primary ones.
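The attribute-level comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions: it presumes that (product, attribute, polarity) mentions have already been extracted from reviews, and the function names attribute_scores and compare are hypothetical, not part of the system described here.

```python
from collections import defaultdict


def attribute_scores(mentions):
    """Average polarity per (product, attribute) pair.

    `mentions` is an iterable of (product, attribute, polarity) tuples,
    assumed to come from an upstream review-mining step.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [polarity sum, mention count]
    for product, attribute, polarity in mentions:
        entry = sums[(product, attribute)]
        entry[0] += polarity
        entry[1] += 1
    return {key: total / n for key, (total, n) in sums.items()}


def compare(mentions, product_a, product_b):
    """For attributes mentioned for both products, return (score_a, score_b)."""
    scores = attribute_scores(mentions)
    attrs_a = {a for (p, a) in scores if p == product_a}
    attrs_b = {a for (p, a) in scores if p == product_b}
    return {a: (scores[(product_a, a)], scores[(product_b, a)])
            for a in sorted(attrs_a & attrs_b)}


mentions = [
    ("X", "battery", 1), ("X", "battery", -1), ("X", "screen", 1),
    ("Y", "battery", 1), ("Y", "price", -1),
]
shared = compare(mentions, "X", "Y")
```

Restricting the comparison to attributes mentioned for both products keeps it like-for-like; attributes reviewed for only one product are left out rather than scored against nothing.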

Our second subproject, BrandinBlog (brand-in-blog), is an analytics system for corporate reputation in microblog (Weibo) messages. The system tracks real-time regional messages relevant to selected commercial brands and presents the popularity, sentiments and moods of the messages. The analysis is based on a sentiment lexicon constructed from a large microblog corpus. We have collected over 500K brand-related microblog messages over the past year (2012-13), covering the brands Toshiba, Fujitsu, Sony, NEC, Hitachi and Panasonic. We analyze the messages’ sentiment, and present both the data and aggregate analyses through our interface, which shows:

(1) Popularity / sentiment / emotions of the messages, broken down by regions in China.

(2) Drill-down views by time range, brand and/or region.

(3) Trends over time.

(4) Word clouds (“wordles”) of popular words used in messages, to reveal popular topics.
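The aggregation behind views (1) and (2) can be sketched as below. This is an illustrative sketch only: the toy lexicon, the message fields (brand, region, date, text) and the function names are assumptions, and English tokens are used for readability, whereas Weibo messages would need a Chinese word segmenter before any lexicon lookup.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy lexicon mapping words to polarity.
SENTIMENT_LEXICON = {"great": 1, "love": 1, "reliable": 1,
                     "broken": -1, "slow": -1, "bad": -1}


def message_sentiment(text):
    """Sum of lexicon polarities over whitespace-separated tokens."""
    return sum(SENTIMENT_LEXICON.get(tok.strip(".,!?"), 0)
               for tok in text.lower().split())


def summarize(messages, start, end, brands=None, regions=None):
    """Popularity (message count) and mean sentiment per (brand, region)."""
    groups = defaultdict(list)
    for m in messages:
        if not (start <= m["date"] <= end):
            continue  # outside the selected time range
        if brands is not None and m["brand"] not in brands:
            continue
        if regions is not None and m["region"] not in regions:
            continue
        groups[(m["brand"], m["region"])].append(message_sentiment(m["text"]))
    return {key: {"popularity": len(s), "mean_sentiment": sum(s) / len(s)}
            for key, s in groups.items()}


messages = [
    {"brand": "Sony", "region": "Beijing", "date": date(2013, 1, 5),
     "text": "Love it, great screen!"},
    {"brand": "Sony", "region": "Beijing", "date": date(2013, 1, 6),
     "text": "So slow."},
    {"brand": "NEC", "region": "Shanghai", "date": date(2012, 6, 1),
     "text": "bad"},
]
january = summarize(messages, date(2013, 1, 1), date(2013, 1, 31))
```

Calling summarize once per time bucket (for example, per week) would give the points needed for a trend line per brand and region.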

3. Mobile App Recommendation

Our final focus area is the recommendation of mobile applications, or “apps”. This area is increasingly important both as a consumer and as a source of user-generated content (UGC), the data that drives our text-centric and other NExT projects. We examine how to rank both mobile applications and their downloads, in order to forecast demand and understand its drivers. Our methods draw on social signals from UGC, such as Twitter, to make more accurate recommendations.

Representative Publications (over all subprojects in Live Text Search):

(1) Bo Zhou, Yiqun Liu, Min Zhang, Yijiang Jin, Shaoping Ma (2011) Incorporating web browsing activities into anchor texts for web search. Information Retrieval, Volume 14, Issue 3, pp. 290-314.

(2) Hadi Amiri, Yang Bao, Anqi Cui, Anindya Datta, Fang Fang, Xiaoying Xu (2011) NUSIS at TREC 2011 Microblog Track: Refining Query Results with Hashtags. Proceedings of the 20th Text Retrieval Conference (TREC).

(3) Fang Fang, Nargis Pervin, Anindya Datta, Kaushik Dutta and Debra Vandermeer (2011) Detecting Twitter Trends in Real-Time. Proceedings of 21st Annual Workshop on Information Technologies and Systems (WITS).

(4) Anqi Cui, Liner Yang, Dejun Hou, Min-Yen Kan, Yiqun Liu, Min Zhang and Shaoping Ma (2012) PrEV: Preservation Explorer and Vault for Web 2.0 User-Generated Content. In Proceedings of the Theory and Practice of Digital Libraries (TPDL 2012), Paphos, Cyprus. pp. 101-112. Lecture Notes in Computer Science, Volume 7489/2012.

(5) Chao Wang, Yiqun Liu, Min Zhang, Shaoping Ma, Meihong Zheng, Jing Qian, Kuo Zhang (to appear) Incorporating Vertical Results into Search Click Models. The 36th ACM SIGIR conference (SIGIR 2013).

(6) Jovian Lin, Kazunari Sugiyama, Min-Yen Kan and Tat-Seng Chua (to appear) Addressing Cold-Start in App Recommendation: Latent User Models Constructed from Twitter Followers. The 36th ACM SIGIR conference (SIGIR 2013).

Representative Patents:

(1) Qi Liu, Yang Liu, Chunyang Liu, Maosong Sun. A Method and a Device for Translation Retrieval. 201210438968.3. China. Filing date: 2012.11.06.

(2) Peng Li, Yang Liu, Ping Xue, Maosong Sun. A Method and a Device for Bilingual Word Alignment. 201310003841.3. China. Filing date: 2013.01.06.