Sub-Project 2.2: Generation of Structured Knowledge from Heterogeneous User-Generated Content

As the volume and variety of user-generated content (UGC) on the web grow, the way people seek and consume information and knowledge is changing. The sheer volume of UGC on any given topic makes it difficult and time-consuming for users to grasp and follow the evolution of knowledge, even in their own domains of interest. To improve the aggregation, communication, and corroboration of the insights and knowledge of the crowd, this research explores techniques to automatically analyse, organize, and summarize large amounts of UGC on a specific topic, so as to support both macro-level and micro-level information access and knowledge creation.

Our system comprises three major components, each of which is also a research topic:

1) Construction of dynamic topic-specific knowledge structures by leveraging a wide variety of UGC sources, including structured knowledge from Wikipedia or blogs, semi-structured knowledge from community question answering (cQA) sites and forums, and unstructured but live information from Twitter. The structure is dynamic and can be regenerated from the latest UGC sources.

2) Organization and visualization of unstructured UGC based on the knowledge structure, to support complex information needs (see Figure 2.3).

3) Browsing, querying, and question answering on any topic based on the resulting knowledge structures (see Figure 2.4).


Figure 2.3: Knowledge structures extracted from a collection of reviews on MAC Cosmetics. Major MAC products and attributes, such as Mascara and Gel, are automatically organized either as a graph-based structure (a) or as a hierarchical tree structure (b) derived from (a).
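
As an illustration of how a tree like (b) might be derived from a graph like (a), the following minimal sketch grows a maximum spanning tree from a chosen root, so that each node attaches to the already-placed node it is most strongly related to. This is only one plausible approach and is not taken from the publications below; the function names, the "Brush" node, and all edge weights are hypothetical values made up for the example.

```python
import heapq
from collections import defaultdict


def graph_to_hierarchy(edges, root):
    """Derive a tree (node -> parent map) from a weighted undirected graph.

    Uses Prim's algorithm on negated weights, i.e. a maximum spanning tree
    grown outward from `root`. Nodes not connected to `root` are left out.
    """
    adjacency = defaultdict(list)
    for a, b, weight in edges:
        adjacency[a].append((b, weight))
        adjacency[b].append((a, weight))

    parent = {root: None}
    # heapq is a min-heap, so weights are negated to pop the strongest edge first.
    frontier = [(-w, root, nbr) for nbr, w in adjacency[root]]
    heapq.heapify(frontier)
    while frontier:
        _, src, dst = heapq.heappop(frontier)
        if dst in parent:
            continue  # already attached through a stronger edge
        parent[dst] = src
        for nbr, w in adjacency[dst]:
            if nbr not in parent:
                heapq.heappush(frontier, (-w, dst, nbr))
    return parent


def children_of(parent):
    """Invert a node -> parent map into a parent -> sorted children map."""
    tree = defaultdict(list)
    for child, par in parent.items():
        if par is not None:
            tree[par].append(child)
    return {node: sorted(kids) for node, kids in tree.items()}


# Hypothetical relatedness graph; the weights are invented for illustration.
edges = [
    ("MAC", "Mascara", 0.9),
    ("MAC", "Gel", 0.8),
    ("Mascara", "Gel", 0.3),
    ("Mascara", "Brush", 0.7),
    ("Gel", "Brush", 0.2),
]
parent = graph_to_hierarchy(edges, root="MAC")
hierarchy = children_of(parent)
```

In this toy graph, "Brush" ends up under "Mascara" rather than "Gel" because its edge to "Mascara" carries the higher weight. Regenerating the hierarchy from fresh UGC would amount to recomputing the edge weights and rerunning the derivation.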


Figure 2.4: User interface for browsing and searching product-related community question-answer pairs, with knowledge hierarchies serving as guides and overviews.

Representative Publications:

[1] Xingwei Zhu, Zhaoyan Ming, Tat-Seng Chua, Xiaoyan Zhu: Topic Hierarchy Construction for the Organization of Multi-source User Generated Contents. To appear in ACM SIGIR 2013.

[2] Jianxing Yu, Zheng-Jun Zha, Tat-Seng Chua: Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews. EMNLP-CoNLL 2012: 391-401.

[3] Zhaoyan Ming, Kai Wang, Tat-Seng Chua: Prototype Hierarchy Based Clustering for the Categorization and Navigation of Web Collections. International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2010): 2-9.

[4] Kai Wang, Zhaoyan Ming, Tat-Seng Chua: A Syntactic Tree Matching Approach to Finding Similar Questions in Community-based QA Services. International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2009): 187-194.