Topic 6: TsingNUS: A Location-based System Towards Live City

Today’s smartphones (e.g., iPhone, HTC) are equipped with GPS receivers that can capture a user’s location. This has led to the development of location-based services that provide users with location-aware experiences. For example, AT&T Location Information Services (http://www.wireless.att.com/lbs/) can be used to counter credit card fraud by verifying whether a cardholder is at the same location as a point-of-sale transaction. As another example, members of FourSquare (https://foursquare.com) can find out whether their friends have checked in to certain locations. In this project, we develop a location-based system, TsingNUS, that exploits users’ locations to improve the quality of their lives in a city. TsingNUS goes beyond traditional location-aware applications, which are based solely on user locations: it facilitates continuous and spatial-keyword search for a variety of query types, including range, k nearest neighbors (kNN), reverse kNN, top-k, and preference queries. As an example, TsingNUS allows a user to continuously search for the nearest “gas stations” while driving. TsingNUS also facilitates location-specific advertisements. For example, a coffee shop such as Starbucks can utilize user profiles in Facebook to disseminate product information to potential customers who are interested in its products (i.e., users whose profiles contain keywords such as starbucks, mocha, and coffee) and are spatially close to its service areas.

Figure 6.1: Architecture of TsingNUS

Location-based Data Integration

We first crawl location-based user-generated content (UGC) from the Web, including Foursquare, Twitter, etc. We then integrate the UGC data with existing points of interest (POIs). If a UGC item matches an existing POI, we extract its structured data and link it to that POI, which enriches the POI with detailed structured information. If a UGC item matches no existing POI, we treat it as a new POI. The main challenge is location-aware data integration, which links location-based data from two different sources. Given two sets of location-based objects (each with a textual description and a location), location-aware data integration identifies all similar pairs that refer to the same entity. The similarity between two objects is quantified by combining their textual similarity and the distance between them. We have developed a filter-and-refine framework and devised several efficient algorithms for this task. The data integration component provides users with abundant information and a better search experience. We have more than 17 million POIs in China and 16 million POIs in the USA. With these data, users do not need to search again and again in search engines to acquire details. Instead, our system gives users not only a map showing the POI results, but also detailed information when they click on a result.
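The filter-and-refine idea can be illustrated with a minimal sketch. This is not the TsingNUS algorithm: the similarity function (a weighted mix of Jaccard similarity and normalized distance), the grid-based filter, the field names, and the parameters alpha, max_dist and threshold are all assumptions made for illustration, and coordinates are assumed to be planar, in metres.

```python
import math
from collections import defaultdict

def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def combined_similarity(p, q, alpha=0.5, max_dist=500.0):
    """Assumed form: alpha * textual similarity + (1 - alpha) * spatial proximity."""
    dist = math.hypot(p["x"] - q["x"], p["y"] - q["y"])
    if dist > max_dist:
        return 0.0  # too far apart to be the same entity
    text_sim = jaccard(set(p["name"].lower().split()),
                       set(q["name"].lower().split()))
    return alpha * text_sim + (1 - alpha) * (1 - dist / max_dist)

def match_pairs(ugc, pois, threshold=0.7, alpha=0.5, max_dist=500.0):
    """Return (ugc_index, poi_index, similarity) for all pairs above threshold."""
    # Filter step: bucket POIs into grid cells of side max_dist, so that any
    # POI within max_dist of a UGC object lies in one of the 3x3 nearby cells.
    grid = defaultdict(list)
    for j, q in enumerate(pois):
        grid[(int(q["x"] // max_dist), int(q["y"] // max_dist))].append(j)

    matches = []
    for i, p in enumerate(ugc):
        cx, cy = int(p["x"] // max_dist), int(p["y"] // max_dist)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), []):
                    # Refine step: compute the exact similarity of the candidate.
                    s = combined_similarity(p, pois[j], alpha, max_dist)
                    if s >= threshold:
                        matches.append((i, j, s))
    return matches
```

UGC objects that appear in no returned pair would be treated as new POIs, as described above. A production system would replace the grid with stronger pruning (for example, textual signatures), which this sketch does not attempt.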

User-friendly Search Interface

We provide three unique location-aware search interfaces to help users easily find relevant answers: location-aware instant search, location-aware similarity search, and direction-aware search.

Figure 6.2: Interface of TsingNUS

As a start, we have investigated direction-aware spatial-keyword search, which prioritizes answers that are “ahead” of the user as the user moves. Moreover, our search-as-you-type feature enables answers to be continuously refined as each additional letter is typed. For effective spatial-keyword search, we have also developed novel spatial-textual similarity metrics. To support these features efficiently, we have developed several novel data structures (i.e., indexing methods) that seamlessly integrate textual descriptions and spatial information to index spatial data. Our experimental studies have shown that the system is efficient. For example, a 20-nearest-neighbor search over a dataset of 5 million POIs completes within 5 ms.
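To make the combination of direction-aware and search-as-you-type behaviour concrete, here is a brute-force sketch. It scans all POIs rather than using the indexing structures mentioned above, and the function name, the tuple layout (name, x, y), the heading convention (degrees counter-clockwise from the positive x-axis), and the 45-degree half-angle are illustrative assumptions, not details of TsingNUS.

```python
import math

def direction_aware_search(pois, user_xy, heading_deg, prefix,
                           k=3, half_angle_deg=45.0):
    """Return up to k nearest POI names that (a) contain a word starting with
    `prefix` and (b) lie within half_angle_deg of the user's heading."""
    ux, uy = user_xy
    prefix = prefix.lower()
    hits = []
    for name, x, y in pois:
        # Keyword filter: prefix match supports search-as-you-type.
        if not any(word.startswith(prefix) for word in name.lower().split()):
            continue
        dx, dy = x - ux, y - uy
        dist = math.hypot(dx, dy)
        if dist > 0:
            bearing = math.degrees(math.atan2(dy, dx))
            # Smallest absolute angle between bearing and heading, in [0, 180].
            diff = abs((bearing - heading_deg + 180.0) % 360.0 - 180.0)
            if diff > half_angle_deg:
                continue  # the POI is not "ahead" of the user
        hits.append((dist, name))
    hits.sort()
    return [name for _, name in hits[:k]]
```

Calling the function again after each keystroke with the longer prefix narrows the result list, and changing heading_deg as the user turns changes which POIs count as ahead.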

Location-aware Search Engine

To efficiently support the three search paradigms, we devise effective indexing structures and search algorithms. As an example, to support moving queries, in addition to finding the top-k answers of a moving query, we also calculate a safe region: if a new query is issued from a location that falls inside the safe region, the existing answer set can be used directly to answer it. Before the client issues a new query at another location, the system first checks whether the new location is still in the safe region. If so, it reuses the answer set; otherwise, the client issues a query with the new location to the server.
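The safe-region check can be sketched as follows, under a simplifying assumption: the safe region is a circle around the last query point with radius (d_{k+1} - d_k) / 2, where d_k and d_{k+1} are the distances to the k-th and (k+1)-th nearest POIs. Moving by less than this radius cannot bring any outside POI closer than a current answer, so the kNN answer set stays valid (its internal order may change). The actual safe regions computed by TsingNUS may be shaped differently and be less conservative; the names below are hypothetical.

```python
import math

def knn_with_safe_radius(pois, q, k):
    """pois: list of (name, x, y). Returns (k nearest names, safe radius)."""
    ranked = sorted((math.hypot(x - q[0], y - q[1]), name) for name, x, y in pois)
    answers = [name for _, name in ranked[:k]]
    if len(ranked) <= k:
        return answers, math.inf  # every POI is an answer wherever the user is
    # Half the gap between the k-th and (k+1)-th nearest distances.
    radius = (ranked[k][0] - ranked[k - 1][0]) / 2.0
    return answers, radius

class CachingClient:
    """Reuses the cached answer set while the user stays in the safe region."""

    def __init__(self, pois, k):
        self.pois, self.k = pois, k  # `pois` stands in for the server's data
        self.anchor = None           # location of the last server query
        self.answers, self.radius = [], 0.0
        self.server_calls = 0

    def query(self, q):
        if self.anchor is not None:
            moved = math.hypot(q[0] - self.anchor[0], q[1] - self.anchor[1])
            if moved < self.radius:
                return self.answers  # still inside the safe region
        # Outside the safe region: ask the "server" and cache the new region.
        self.server_calls += 1
        self.answers, self.radius = knn_with_safe_radius(self.pois, q, self.k)
        self.anchor = q
        return self.answers
```

In this sketch the server call is a local function; in a client-server deployment only the branch that increments server_calls would involve network traffic, which is the saving the safe region is meant to provide.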