Pete Bohman Adam Kunk. Real-Time Search  Definition: A search mechanism capable of finding information in an online fashion as it is produced. Technology.


1 Pete Bohman Adam Kunk

2 Real-Time Search  Definition: A search mechanism capable of finding information in an online fashion as it is produced. Technology belonging to the real-time web that enables users to receive information as soon as it is published.

3 Real-Time Search  In terms of real-time search, what does “online” mean? Online means that a constant stream of input data is handled as it enters the system, in contrast to batch processing  Bing Social Search
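
The online versus batch distinction above can be illustrated with a small sketch. This is not from the slides: the word-count task and the function names `batch_counts` and `online_counts` are assumptions chosen only to show that an online algorithm updates its state per post, while a batch job waits for the whole collection.

```python
from typing import Iterable, Iterator


def batch_counts(posts: list[str]) -> dict[str, int]:
    """Batch style: the full collection must exist before processing starts."""
    counts: dict[str, int] = {}
    for post in posts:
        for word in post.lower().split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def online_counts(stream: Iterable[str]) -> Iterator[dict[str, int]]:
    """Online style: state is updated and exposed after every incoming post."""
    counts: dict[str, int] = {}
    for post in stream:
        for word in post.lower().split():
            counts[word] = counts.get(word, 0) + 1
        yield dict(counts)  # snapshot usable immediately, before the stream ends


if __name__ == "__main__":
    for snapshot in online_counts(["quake now", "quake again"]):
        print(snapshot)
```

Both functions end with the same counts; the difference is that the online version has an answer available after each post arrives.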

4 Real-Time Search Input Data  Example of what kind of input data is considered for real-time search systems:  twittervision

5 Real-Time Content  Microblogging: an entirely new type of data
1. Short temporal life span
2. Little to no context
3. Simple ideas, fast reporting of events
4. Metadata: time, location, social links
5. Less factual, more opinionated
6. Static posts
7. Furious input rate
8. Often no hyperlink structure, few traditional ranking factors
Current search engines don’t take full advantage of this new data type

6 Real-Time vs. Conventional Search  Conventional search ranking: relevance, authority  Real-time search ranking: relevance, temporal immediacy, popularity

7 Real-Time vs. Conventional Search  Conventional search input Crawl the web periodically and update index ○ Web documents evolve Incapable of crawling and indexing the entire web in real-time  Real-time search input Stream of data. No need to poll since the posts are static  What can we do with real-time search engines?

8 User Query Analysis  Collecta real-time search engine  Analyzed ~1 million queries  Continuous queries: monitor events by frequently resubmitting the same query  Different query categories:
Conventional  | Real-Time
Shopping      | Commerce
Entertainment | Travel
Adult         | Economy

9 Crowdsourcing Real-Time Data  Crowd sourcing of first hand reports

10 Value of Real-Time Search  The estimated value of real-time search is around $33 million  Value derived from the types of queries entered in real-time search systems  AdWords was used to determine the worth of keywords appearing in queries

11 Applications of Real-Time Search  TwitterStand: Real-time news reports  Example: Coverage of Michael Jackson’s death

12 Applications of Real-Time Search  Real-time alert systems Leverages tweet metadata (time, location) to raise alerts Earthquake localization based on tweets
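
A minimal sketch of a metadata-driven alert, under stated assumptions: the `GeoTweet` type, the `detect_event` function, the 60-second window, and the threshold of 3 are all hypothetical. Localization here is a plain centroid of the matching tweets, which is a deliberately simple stand-in and not the estimation method used in the earthquake detection work cited in the references.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeoTweet:
    text: str
    timestamp: float  # seconds
    lat: float
    lon: float


def detect_event(
    tweets: list[GeoTweet],
    keyword: str,
    now: float,
    window: float = 60.0,   # assumed look-back window in seconds
    threshold: int = 3,     # assumed minimum number of matching tweets
) -> Optional[tuple[float, float]]:
    """Return an estimated (lat, lon) if enough recent tweets mention keyword."""
    recent = [
        t for t in tweets
        if keyword in t.text.lower() and 0 <= now - t.timestamp <= window
    ]
    if len(recent) < threshold:
        return None  # not enough evidence to raise an alert
    lat = sum(t.lat for t in recent) / len(recent)
    lon = sum(t.lon for t in recent) / len(recent)
    return (lat, lon)  # centroid of the reporting tweets
```

Only the time and location metadata plus a keyword match are used, which mirrors the slide's point that alerts can be raised without any link structure.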

13 Twitter Real-Time Alerts USGS Twitter Earthquake Detector

14 Difficulties of Real-Time Search  Two factors: Efficient indexing in order to provide for fast results Effective ranking in order to return relevant results

15 Indexing: RDBMS  RDBMS Indexing Indexes built on columns commonly used in queries Improves the speed of retrieval operations
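
As a small illustration of column indexing, the sketch below uses Python's standard `sqlite3` module. The table `tweets`, the index `idx_tweets_author`, and the sample rows are hypothetical; the point is only that the query planner picks up an index built on a column used in the WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets (id INTEGER PRIMARY KEY, author TEXT, body TEXT)"
)
conn.executemany(
    "INSERT INTO tweets (author, body) VALUES (?, ?)",
    [("alice", "quake felt downtown"), ("bob", "traffic is bad")],
)
# Index on the column that queries filter by.
conn.execute("CREATE INDEX idx_tweets_author ON tweets (author)")


def query_plan(connection: sqlite3.Connection, sql: str, params=()) -> str:
    """Return SQLite's textual query plan for the given statement."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(str(row[-1]) for row in rows)


plan = query_plan(conn, "SELECT body FROM tweets WHERE author = ?", ("alice",))
print(plan)  # the plan text should mention idx_tweets_author
```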

16 Indexing: Conventional Search  Conventional Search (Inverted) Indexing Non structured data If a document does not exist in the index, it will not appear in query results
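
A minimal inverted index sketch, assuming whitespace tokenization and AND semantics for multi-term queries (both are simplifications; `build_inverted_index` and `search` are illustrative names). It also shows the slide's point that a document absent from the index cannot appear in results.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each term to the set of document ids that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids of documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    docs = {1: "real time search", 2: "web search engine"}
    idx = build_inverted_index(docs)
    print(search(idx, "search"))       # both documents contain the term
    print(search(idx, "real search"))  # only document 1 contains both terms
```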

17 Indexing: Real-Time Search  Index stream of data  Map keywords to tweets containing those keywords  Challenge: processing the stream in a timely manner ○ 5,000 tweets per second

18 TI Indexing  Not feasible to index every incoming tweet immediately  Selective indexing based on results that are most likely to appear in queries Distinguished tweets indexed in real-time Noisy tweets indexed by batch process

19 TI Tweet Classification  Observation: users are only interested in the top-K results for a query  Distinguished tweet: a tweet that belongs in the top-K result set of a previous query  Noisy tweet: a tweet that does not appear in the top-K results for any of the system’s previous queries
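
The distinguished versus noisy split can be sketched as follows. This is a simplified illustration, not the TI implementation: the class `SelectiveIndexer`, the term-overlap score, and the per-query min-heap are assumptions made so the example is self-contained. A tweet that would enter the top-K of any tracked query is indexed immediately; everything else waits in a queue for a batch pass.

```python
import heapq


class SelectiveIndexer:
    def __init__(self, queries: list[str], k: int) -> None:
        self.k = k
        # Per tracked query: min-heap of (score, tweet_id) holding its top-K.
        self.top_k: dict[str, list[tuple[float, int]]] = {q: [] for q in queries}
        self.index: dict[str, list[int]] = {}        # term -> tweet ids
        self.batch_queue: list[tuple[int, str]] = []  # noisy tweets

    @staticmethod
    def score(query: str, text: str) -> float:
        """Assumed relevance: fraction of query terms found in the tweet."""
        q_terms = set(query.lower().split())
        if not q_terms:
            return 0.0
        return len(q_terms & set(text.lower().split())) / len(q_terms)

    def _add_to_index(self, tweet_id: int, text: str) -> None:
        for term in set(text.lower().split()):
            self.index.setdefault(term, []).append(tweet_id)

    def ingest(self, tweet_id: int, text: str) -> bool:
        """Index now if distinguished, else queue. Returns True if distinguished."""
        distinguished = False
        for query, heap in self.top_k.items():
            s = self.score(query, text)
            if s == 0:
                continue
            if len(heap) < self.k:
                heapq.heappush(heap, (s, tweet_id))
                distinguished = True
            elif s > heap[0][0]:
                heapq.heapreplace(heap, (s, tweet_id))  # evict current weakest
                distinguished = True
        if distinguished:
            self._add_to_index(tweet_id, text)
        else:
            self.batch_queue.append((tweet_id, text))
        return distinguished

    def flush_batch(self) -> int:
        """Batch pass: index all queued noisy tweets, return how many."""
        flushed = len(self.batch_queue)
        for tweet_id, text in self.batch_queue:
            self._add_to_index(tweet_id, text)
        self.batch_queue.clear()
        return flushed
```

The work per incoming tweet grows with the number of tracked queries, which is why the size of the query set has to be limited.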

20 TI Indexing  Must limit the size of the query set  1.6 billion Twitter queries per day

21 Query set optimization  Observation: 20% of queries represent 80% of user requests  Therefore, Zipf’s distribution is used to statistically limit the number of queries that tweets are compared against
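
One way to picture the query set reduction is a frequency cutoff over a query log. This is a hedged sketch: the function `frequent_query_set` and the coverage parameter are assumptions, and the slides do not say how TI actually selects the set beyond relying on the skewed (Zipf-like) query distribution.

```python
from collections import Counter


def frequent_query_set(query_log: list[str], coverage: float = 0.8) -> list[str]:
    """Smallest prefix of queries, by frequency, covering the given share of requests."""
    counts = Counter(query_log)
    total = sum(counts.values())
    selected: list[str] = []
    covered = 0
    for query, n in counts.most_common():
        if covered >= coverage * total:
            break  # target share of requests already covered
        selected.append(query)
        covered += n
    return selected


if __name__ == "__main__":
    log = ["a"] * 8 + ["b"] + ["c"]
    print(frequent_query_set(log))  # one query already covers 80% of this log
```

With a heavily skewed log, a small set of queries covers most requests, so incoming tweets only need to be compared against that small set.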

22 Real-Time Search Ranking  How does ranking differ from traditional web ranking? Typical web search engines rank based on links to a site, and links from a site (PageRank) Microblogging data contains social networking links ○ Followers ○ Friends ○ Re-tweets

23 Real-Time Search Ranking  Ranking is not necessary in an RDBMS  In an RDBMS, data is strictly defined, including algebraic operators  Results are complete, not subjective

24 TI Ranking  Ranking function comprised of: 1) User’s PageRank ○ Combination of user weight (defaulted to 1) and how many followers they have (popularity) 2) Timestamp (self-explanatory) 3) Similarity between tweet and the query

25 TI Ranking  Ranking function also comprised of: 4) Popularity of the topic Determined by large tweet trees  Popularity of tree is equal to the sum of the U-PageRank values of all tweets in the tree Tweet Tree Structure
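
The four ranking components can be combined in a sketch like the one below. The slides list the components but not how they are weighted or combined, so the linear combination, the equal default weights, the exponential freshness decay, and the cosine-style set similarity are all assumptions. Only the tree popularity rule (sum of the U-PageRank values of all tweets in the tree) comes from the slide. The names `Tweet`, `tree_popularity`, `similarity`, `freshness`, and `ti_style_score` are illustrative.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Tweet:
    text: str
    author_rank: float  # stand-in for the author's U-PageRank, assumed precomputed
    timestamp: float    # seconds
    replies: list["Tweet"] = field(default_factory=list)


def tree_popularity(root: Tweet) -> float:
    """Sum of author-rank values over every tweet in the tree rooted at root."""
    return root.author_rank + sum(tree_popularity(r) for r in root.replies)


def similarity(query: str, text: str) -> float:
    """Assumed measure: cosine similarity over binary term sets."""
    q, t = set(query.lower().split()), set(text.lower().split())
    if not q or not t:
        return 0.0
    return len(q & t) / math.sqrt(len(q) * len(t))


def freshness(timestamp: float, now: float, half_life: float = 3600.0) -> float:
    """Assumed decay: the score halves every half_life seconds."""
    return 0.5 ** (max(0.0, now - timestamp) / half_life)


def ti_style_score(
    tweet: Tweet,
    query: str,
    now: float,
    root: Tweet,
    weights: tuple[float, float, float, float] = (1.0, 1.0, 1.0, 1.0),
) -> float:
    """Weighted sum of user rank, freshness, similarity, and tree popularity."""
    w_user, w_time, w_sim, w_tree = weights
    return (
        w_user * tweet.author_rank
        + w_time * freshness(tweet.timestamp, now)
        + w_sim * similarity(query, tweet.text)
        + w_tree * tree_popularity(root)
    )
```

Setting all weights except the time weight to zero reduces this to a pure time ranking, which is the baseline the comparison slide refers to.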

26 TI Ranking Comparison TI Rank Vs. Time Rank

27 What are others doing?

28  Facebook Real-Time Feed

29 Implications  New type of data not currently searchable through existing search engines New search tools developed for new data New user search behavior ○ Continuous search results (non-static) Advertisers ○ Chance for more targeted advertisements

30 Conclusion  TI makes use of two concepts in their real-time search of Twitter: Selective Indexing ○ Form of partial indexing, can’t afford to index every incoming tweet due to large volume of input Ranking ○ Ranking is a known technique, but microblogging applications provide new ranking algorithms

31 Conclusion  Real-time search engines must provide: Online algorithms to handle constant input Relevant search results

32 References
TI: An Efficient Indexing Mechanism for Real-Time Search on Tweets. http://www.comp.nus.edu.sg/~ooibc/sigmod11ti.pdf
Real Time Search User Behavior. http://faculty.ist.psu.edu/jjansen/academic/jansen_real_time_search.pdf
TwitterRank: Finding Topic-Sensitive Influential Twitterers. http://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1503&context=sis_research
Earthquake Shakes Twitter Users: Real-time Event Detection by Social Sensors. http://ymatsuo.com/papers/www2010.pdf
TwitterStand: News in Tweets. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.148.1477&rep=rep1&type=pdf
Learning Effective Ranking Functions for Newsgroup Search. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.92.5556&rep=rep1&type=pdf
TwitterSearch: A Comparison of Microblog Search and Web Search. http://www.stanford.edu/~dramage/papers/twitter-wsdm11.pdf
TwitterVision. http://twittervision.com/
Bing Social. http://www.bing.com/social
Real time search on the web: Queries, topics, and economic value. http://collecta.com/RealTimeSearch.pdf

33 Discussion Questions  1) What do you think is the most innovative technique in the TI approach that led to real-time microblog search results?

34 Discussion Questions  2) Given the partial indexing optimization provided in the paper, how do you think Google could optimize their indexing algorithm in order to capture the newest content on the web?

35 Discussion Questions  3) TI makes use of a ranking function in order to select tweets based on various user characteristics. What would you change about the ranking function, if anything?

