Real-Time Search
Pete Bohman, Adam Kunk


Pete Bohman Adam Kunk

Real-Time Search  Definition: A search mechanism capable of finding information in an online fashion as it is produced. Technology belonging to real-time web that enables users to receive information as soon as it is published

Real-Time Search  In terms of real-time search, what does “online” mean? Online means that a constant stream of input data is handled as it enters the system, contrary to batch processing  Bing Social Search Bing Social Search

Real-Time Search Input Data
• Example of the kind of input data considered by real-time search systems:
  ○ TwitterVision

Real-Time Content  Microblogging - Entirely new type of data 1. Short temporal life span 2. Little to no context 3. Simple ideas, fast reporting of events 4. Metadata: time, location, social links 5. Less factual, more opinionated 6. Static posts 7. Furious input rate 8. Often no hyperlink structure, few traditional ranking factors  Current search engines don’t take full advantage of this new data type

Real-Time vs. Conventional Search  Conventional Search Ranking Relevance Authority  Real-Time Search Ranking Relevance Temporal immediacy Popularity

Real-Time vs. Conventional Search  Conventional search input Crawl the web periodically and update index ○ Web documents evolve Incapable of crawling and indexing the entire web in real-time  Real-time search input Stream of data. No need to poll since the posts are static  What can we do with real-time search engines?

User Query Analysis  Collecta real-time search engine  Analyzed ~1 Million queries Continuous Queries ○ Monitor events by frequently resubmitting the same query Different query categories ConventionalReal-Time ShoppingCommerce EntertainmentTravel AdultEconomy

Crowdsourcing Real-Time Data
• Crowdsourcing of first-hand reports

Value of Real-Time Search
• The estimated value of real-time search is around $33 million
  ○ The value is derived from the types of queries entered in real-time search systems
  ○ AdWords was used to determine the worth of the keywords appearing in queries

Applications of Real-Time Search  TwitterStand: Real-time news reports Example: Coverage of MJ’s death

Applications of Real-Time Search  Real-time alert systems Leverages tweet metadata (time, location) to raise alerts Earthquake localization based on tweets

Twitter Real-Time Alerts
• USGS Twitter Earthquake Detector

Difficulties of Real-Time Search
• Two factors:
  ○ Efficient indexing, in order to provide fast results
  ○ Effective ranking, in order to return relevant results

Indexing: RDBMS
• RDBMS indexing
  ○ Indexes are built on columns commonly used in queries
  ○ Improves the speed of retrieval operations
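A column index of this kind can be tried with Python's built-in `sqlite3` module. The table, column, and index names below are hypothetical; the sketch only shows that the query planner picks up an index built on a column that queries filter on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, author TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO tweets (author, body) VALUES (?, ?)",
    [("alice", "hello"), ("bob", "earthquake!"), ("alice", "real-time search")],
)
# Index the column that queries filter on most often.
conn.execute("CREATE INDEX idx_tweets_author ON tweets (author)")

# Ask SQLite how it would execute the query; the last column describes the plan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT body FROM tweets WHERE author = ?", ("alice",)
).fetchall()
plan_text = " ".join(row[-1] for row in plan)

rows = conn.execute(
    "SELECT body FROM tweets WHERE author = ? ORDER BY id", ("alice",)
).fetchall()
```

Here `plan_text` names `idx_tweets_author`, indicating an index search rather than a full table scan.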

Indexing: Conventional Search
• Conventional search (inverted) indexing
  ○ Unstructured data
  ○ If a document does not exist in the index, it will not appear in query results

Indexing: Real-Time Search
• Index a stream of data
  ○ Map keywords to the tweets containing those keywords
• Challenge
  ○ Processing the stream in a timely manner: 5,000 tweets per second
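The keyword-to-tweet mapping is an inverted index that is updated as each tweet arrives. A minimal sketch, assuming whitespace tokenization and tweet IDs that increase with time (both are simplifications made here, not details from the slides):

```python
from collections import defaultdict


class InvertedIndex:
    """Maps each keyword to the IDs of the tweets that contain it."""

    def __init__(self):
        self.postings = defaultdict(list)

    def add(self, tweet_id, text):
        # Index each distinct term once per tweet, as soon as the tweet arrives.
        for term in set(text.lower().split()):
            self.postings[term].append(tweet_id)

    def search(self, query):
        """Return IDs of tweets containing every query term, highest ID first."""
        terms = query.lower().split()
        if not terms:
            return []
        result = set(self.postings.get(terms[0], []))
        for term in terms[1:]:
            result &= set(self.postings.get(term, []))
        return sorted(result, reverse=True)


index = InvertedIndex()
index.add(1, "Earthquake in Tokyo")
index.add(2, "Tokyo traffic is slow")
index.add(3, "Another earthquake near Tokyo")
```

For example, `index.search("earthquake tokyo")` intersects the two posting lists and returns the matching tweet IDs with the most recent one first.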

TI Indexing  Not feasible to index every incoming tweet immediately  Selective indexing based on results that are most likely to appear in queries Distinguished tweets indexed in real-time Noisy tweets indexed by batch process

TI Tweet Classification
• Observation
  ○ Users are only interested in the top-K results for a query
• Distinguished tweet
  ○ A tweet that belongs in the top-K result set of a previous query
• Noisy tweet
  ○ A tweet that does not appear in the top-K results of any of the system’s previous queries
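The distinguished/noisy split can be sketched as below. This is a simplified reading of the classification described above, not the TI implementation: the class `SelectiveIndexer`, the term-containment matching rule, and the pluggable `score` callback are all assumptions of this sketch.

```python
import heapq


class SelectiveIndexer:
    """Index a tweet immediately only if it enters some cached query's top-K."""

    def __init__(self, queries, k, score):
        self.k = k
        self.score = score  # hypothetical scoring callback: (query, tweet) -> float
        # One min-heap of the best K scores seen so far per cached query.
        self.top_scores = {q: [] for q in queries}
        self.realtime_index = []  # distinguished tweets, searchable at once
        self.batch_queue = []     # noisy tweets, indexed later in bulk

    def _enters_top_k(self, query, tweet):
        if not set(query.split()) <= set(tweet.lower().split()):
            return False  # tweet does not match this query at all
        heap, s = self.top_scores[query], self.score(query, tweet)
        if len(heap) < self.k:
            heapq.heappush(heap, s)
            return True
        if s > heap[0]:
            heapq.heapreplace(heap, s)
            return True
        return False

    def ingest(self, tweet):
        # Check every query (no short-circuit) so each top-K heap stays current.
        hits = [self._enters_top_k(q, tweet) for q in self.top_scores]
        if any(hits):
            self.realtime_index.append(tweet)
            return "distinguished"
        self.batch_queue.append(tweet)
        return "noisy"


# Toy score for illustration only: longer tweets score higher.
indexer = SelectiveIndexer(["earthquake"], k=1, score=lambda q, t: len(t))
labels = [
    indexer.ingest("earthquake now"),
    indexer.ingest("lunch time"),
    indexer.ingest("earthquake"),
    indexer.ingest("huge earthquake in tokyo"),
]
```

With K = 1, the third tweet matches the query but scores below the current best result, so it is deferred to the batch queue along with the non-matching tweet.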

TI Indexing  Must limit the size of the query set 1.6 Billion twitter queries per day

Query Set Optimization
• Observation
  ○ 20% of queries represent 80% of user requests
• Therefore
  ○ Zipf’s distribution is used to statistically limit the number of queries that tweets are compared against
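One way to act on this observation is to keep only the most frequent queries until they cover a target share of all requests. The function name `head_queries`, the 0.8 default, and the sample log are hypothetical; the slides do not say how TI chooses its cutoff.

```python
from collections import Counter


def head_queries(query_log, coverage=0.8):
    """Return the most frequent queries that together cover `coverage` of requests."""
    counts = Counter(query_log)
    total = sum(counts.values())
    selected, covered = [], 0
    for query, n in counts.most_common():
        if covered >= coverage * total:
            break
        selected.append(query)
        covered += n
    return selected


# Hypothetical log of 10 requests over 4 distinct queries.
log = ["earthquake"] * 5 + ["world cup"] * 3 + ["weather", "traffic"]
kept = head_queries(log)
```

In this toy log, two of the four distinct queries already account for 8 of the 10 requests, so incoming tweets would only need to be compared against those two.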

Real-Time Search Ranking  How does ranking differ from traditional web ranking? Typical web search engines rank based on links to a site, and links from a site (PageRank) Microblogging data contains social networking links ○ Followers ○ Friends ○ Re-tweets

Real-Time Search Ranking  Ranking is not necessary in RDBMS systems In RDBMS system data is strictly defined including algebraic operators Results are complete not subjective

TI Ranking  Ranking function comprised of: 1) User’s PageRank ○ Combination of user weight (defaulted to 1) and how many followers they have (popularity) 2) Timestamp (self-explanatory) 3) Similarity between tweet and the query

TI Ranking  Ranking function also comprised of: 4) Popularity of the topic Determined by large tweet trees  Popularity of tree is equal to the sum of the U-PageRank values of all tweets in the tree Tweet Tree Structure

TI Ranking Comparison
• TI Rank vs. Time Rank

What are others doing?

• Facebook Real-Time Feed

Implications  New type of data not currently searchable through existing search engines New search tools developed for new data New user search behavior ○ Continuous search results (non-static) Advertisers ○ Chance for more targeted advertisements

Conclusion  TI makes use of two concepts in their real-time search of Twitter: Selective Indexing ○ Form of partial indexing, can’t afford to index every incoming tweet due to large volume of input Ranking ○ Ranking is a known technique, but microblogging applications provide new ranking algorithms

Conclusion  Real-time search engines must provide: Online algorithms to handle constant input Relevant search results

References  TI: An Efficient Indexing Mechanism for Real-Time Search on Tweets  Real Time Search User Behavior  TwitterRank: Finding Topic-Sensitive Influential Twitterers  Earthquake Shakes Twitter Users: Real-time Event Detection by Social Sensors  TwitterStand: News in Tweets  Learning Effective Ranking Functions for Newsgroup Search  TwitterSearch: A Comparison of Microblog Search and Web Search  TwitterVision  Bing Social  Reak tune search on the web: Queries, topics, and economic value

Discussion Questions  1) What do you think is the most innovative technique in the TI approach that led to real-time microblog search results?

Discussion Questions  2) Given the partial indexing optimization provided in the paper, how do you think Google could optimize their indexing algorithm in order to capture the newest content on the web?

Discussion Questions  3) TI makes use of a ranking function in order to select tweets based on various user characteristics. What would you change about the ranking function, if anything?