Sub-Project 2.1: SocialSense: Mining the Perception of Organizations, People and Other Entities from Social Media

The rich UGC sources contain information about organizations, people, locations, and other entities. In particular, UGC posts about organizations provide important and timely indicators of the spontaneous and often genuine views of their users and customers. It is thus invaluable for organizations to keep track of such views, both to get live feedback from their users and to perform live analytics on the data in order to discover market insights and foresights and provide better services. This sub-project focuses on discovering the public perception of organizations from live social media by answering several questions:

(1) What are people saying about the organization (current and emerging discussions)?

(2) Who are these people (organization users)?

(3) Where are they talking about the organization (locations of posts)?

Figure 2.1 presents the architecture of SocialSense. Given an organization, the first challenge is to gather a representative distribution of relevant posts about it, considering that most social media platforms use an unknown sampling methodology and limit the amount of data that can be crawled. We address this challenge by eliciting data using different strategies, including fixed and dynamic keywords, known users, and automatically identified key users. We further develop sub-optimal solutions to identify other relevant keywords and key users in order to expand the gathering of representative data.

The second challenge is to determine which topics about the organization are emerging and evolving. In particular, how early can we predict the emerging topics (alerts) before they become viral? We address this challenge by learning the organization's topics online through time, from which we propose a sparse coding algorithm that can quickly identify the emerging topics, keep track of the evolving topics, and purge the trivial ones as time passes.

The third challenge is to identify the user community of the organization. In particular, who are the key influencers, and who shares the same interests as the organization (interest communities)? This challenge is addressed by identifying the active users who regularly tweet about the organization, initiate major discussions, and have many followers within the organization.

The last challenge is to discriminate between the user communities and topics of different organizations that share the same acronym. We address this challenge by utilizing the context of the target organization, mined from the content of the known relevant data and from the user community. Figure 2.2 presents some sample analytics that SocialSense generates. In addition to organizations, the same approach can be applied to mining people, topics, and other entities.
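As an illustration of the third challenge, the three signals named above (regular posting, initiating discussions, and having many followers) can be combined into a simple ranking score. The following is only a minimal sketch under stated assumptions, not the method used in SocialSense: the `Post` record, the `is_thread_start` flag, the equal default weights, the normalization, and the reading of "followers within the organization" as followers who themselves post about the organization are all hypothetical choices made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Post:
    user: str
    day: int               # day index within the observation window
    is_thread_start: bool  # hypothetical flag: the post started a discussion


def score_key_users(posts, followers, window_days, weights=(1.0, 1.0, 1.0)):
    """Rank users by posting regularity, initiated discussions, and followers.

    posts       -- iterable of Post records already judged relevant
    followers   -- dict mapping a user to the set of users who follow them
    window_days -- length of the observation window in days
    weights     -- assumed weights for the three signals (illustrative only)
    """
    active_days = defaultdict(set)
    initiated = defaultdict(int)
    for p in posts:
        active_days[p.user].add(p.day)
        if p.is_thread_start:
            initiated[p.user] += 1

    # Assumption: the community is everyone who posted about the organization.
    community = set(active_days)
    # Count only followers who are themselves part of that community.
    in_community = {
        u: len(followers.get(u, set()) & community) for u in community
    }

    # Normalize each count by its maximum (guarding against division by zero).
    max_init = max(initiated.values(), default=0) or 1
    max_foll = max(in_community.values(), default=0) or 1

    w_reg, w_init, w_foll = weights
    scores = {}
    for u in community:
        regularity = len(active_days[u]) / window_days
        scores[u] = (
            w_reg * regularity
            + w_init * initiated[u] / max_init
            + w_foll * in_community[u] / max_foll
        )
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Calling `score_key_users(posts, followers, window_days=7)` returns a list of `(user, score)` pairs sorted from most to least influential under these assumed weights. In practice the weights would need tuning against known key users.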

Figure 2.1: The general architecture of SocialSense

Figure 2.2: What is contained in a topic: relevant posts, user community and sentiments

Representative Publications:

[1] Yan Chen, Hadi Amiri, Zhoujun Li and Tat-Seng Chua: Emerging Topic Detection for Organizations from Microblogs. To appear in ACM SIGIR 2013.

[2] Yan Chen, Jichang Zhao, Xia Hu, Xiaoming Zhang, Zhoujun Li and Tat-Seng Chua: From Interest to Function: Location Estimation in Social Media. AAAI 2013.

[3] Hadi Amiri and Tat-Seng Chua: Mining Sentiment Terminology through Time. International Conference on Information and Knowledge Management (CIKM 2012): 2060-2064.