EntityRank: Searching Entities Directly and Holistically - Tao Cheng, Xifeng Yan, Kevin Chen-Chuan Chang CS Department, UIUC Presented By: Md. Abdus Salam.

Presented by Md. Abdus Salam, PhD Student, CSE Department, UTA

Motivating scenario: what is Amazon's customer service phone number?

Search on Amazon?

Search on Google?

Many similar cases: The email of Luis Gravano? Which professors work on databases at UIUC? The papers and presentations of ICDE 2007? The due date of SIGMOD 2008? The sale price of "Canon PowerShot A400"? "Hamlet" books available at bookstores? Often, we are looking for data entities (e.g., emails, dates, prices), not pages.

What you search is not what you want.

From pages to entities: traditional search supports keywords and returns pages, while entity search supports keywords plus entities and returns entities.

Concretely, what is meant by Entity Search?

Entity Search Problem
Given: an entity collection $E = \{E_1, \ldots, E_n\}$ over a document collection $D = \{d_1, \ldots, d_N\}$.
Input: a query $q(\langle E_1, \ldots, E_m \rangle) = \alpha(E_1, \ldots, E_m, k_1, \ldots, k_l)$, where $\alpha$ is a tuple pattern, $E_i \in E$, and $k_j$ is a keyword; e.g., ow(David DeWitt #phone #email).
Output: a ranked list of tuples $t = \langle e_1, \ldots, e_m \rangle$ with $e_i \in E_i$, sorted by Score(q(t)), the query score of t.
In short, the input is keywords and entities (optionally with a pattern), e.g., Amazon Customer Service #phone, and the output is a ranked list of entity tuples.
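To make the query model concrete, here is a minimal Python sketch of matching one document against an ordered-window pattern. We read `ow` as "the pattern elements must appear in the given order within a window of tokens"; that reading is our interpretation. The `Token` type, the `ow_match` function, the window size, and the example phone number and address are all hypothetical illustrations, not the system's actual API.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Token:
    text: str
    etype: Optional[str] = None  # entity type such as "phone"; None for ordinary words


def ow_match(doc: List[Token], pattern: List[str], window: int) -> Set[Tuple[str, ...]]:
    """Return the entity tuples t for which `pattern` matches `doc` in order within `window` tokens.

    Pattern elements are keywords (e.g. "dewitt") or entity types (e.g. "#phone").
    Hypothetical semantics: last matched position - first matched position < window.
    """
    found: Set[Tuple[str, ...]] = set()

    def extend(i: int, start: int, first: Optional[int], ents: Tuple[str, ...]) -> None:
        if i == len(pattern):
            found.add(ents)  # every element matched: record the bound entity tuple
            return
        elem = pattern[i]
        for pos in range(start, len(doc)):
            if first is not None and pos - first >= window:
                break  # any later match would fall outside the window
            tok = doc[pos]
            anchor = pos if first is None else first
            if elem.startswith("#"):
                if tok.etype == elem[1:]:  # entity slot: bind this extracted entity
                    extend(i + 1, pos + 1, anchor, ents + (tok.text,))
            elif tok.etype is None and tok.text.lower() == elem.lower():
                extend(i + 1, pos + 1, anchor, ents)  # keyword slot: case-insensitive match

    extend(0, 0, None, ())
    return found


# Made-up example document for ow(David DeWitt #phone #email)
doc = [Token("David"), Token("DeWitt"), Token("phone"),
       Token("608-555-0100", "phone"), Token("email"),
       Token("dewitt@example.edu", "email")]
print(ow_match(doc, ["david", "dewitt", "#phone", "#email"], window=10))
# {('608-555-0100', 'dewitt@example.edu')}
```

Shrinking `window` to 4 returns an empty set, because the email entity then lies too far from the first keyword. A real system would answer this from an index rather than by scanning every document.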

How to rank entities? This is the key challenge.

Characteristic I: Contextual. Utilize the entities' surrounding context. (Figure: content vs. context.)

Characteristic II: Uncertain. Extractions are not perfect.

Characteristic III: Holistic. Many pieces of evidence come from multiple sources.

Characteristic IV: Discriminative. Web pages are of varying quality.

Characteristic V: Associative. Tell true associations from accidental ones. Example: finding Prof. Luis Gravano's email. Observation: an email address appears very frequently with the keyword "Luis". However, this association is only accidental, because that address appears on many pages.

EntityRank: The Impression Model. A tireless observer reads the Web through three layers: the Access Layer performs global aggregation, the Recognition Layer performs local assessment, and the Validation Layer performs hypothesis testing.

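Recognition Layer: Local Assessment. Of the five characteristics (Contextual, Uncertain, Holistic, Discriminative, Associative), this layer targets Contextual and Uncertain. Input: a document d with matched occurrences of the query keywords and entities. Output: the local probability p(q(t) | d).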
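A minimal Python sketch of the local assessment follows. It assumes p(q(t) | d) is the product of two factors: the entities' extraction confidences (for Uncertain) and a proximity-based context score (for Contextual). The exponential decay, the `decay` default, the "keep the best occurrence" rule, and the function names are our hypothetical choices; the paper's actual context-matching function may differ.

```python
import math
from typing import Sequence, Tuple


def local_score(entity_confs: Sequence[float], span: int, n_elems: int,
                decay: float = 0.1) -> float:
    """Sketch of the local assessment p(q(t) | d) for one matched occurrence in document d.

    entity_confs: extraction confidence of each entity e_i in the occurrence (Uncertain).
    span: number of tokens the occurrence covers; n_elems: number of pattern elements.
    decay: hypothetical rate at which looser occurrences lose credit (Contextual).
    """
    if span < n_elems:
        raise ValueError("an occurrence cannot span fewer tokens than it has elements")
    certainty = math.prod(entity_confs)            # imperfect extractions lower the score
    context = math.exp(-decay * (span - n_elems))  # 1.0 for a tight match, smaller as it spreads
    return certainty * context


def best_local_score(occurrences: Sequence[Tuple[Sequence[float], int]], n_elems: int) -> float:
    """If t occurs several times in d, keep its best occurrence (an assumption of this sketch)."""
    return max((local_score(confs, span, n_elems) for confs, span in occurrences), default=0.0)


# A tight match with a 0.8-confidence phone extraction vs. a loose but certain match
print(best_local_score([([0.8], 4), ([1.0], 9)], n_elems=4))
```

Multiplying the two factors means that a doubtful extraction and a loose context both pull the score down. Neither factor can compensate for the other.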

Access Layer: Global Aggregation. Of the five characteristics, this layer targets Holistic and Discriminative. Input: the local probabilities p(q(t) | d) from all documents d. Output: the global probability p(q(t)).
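The global aggregation can be sketched as a quality-weighted sum over pages: p(q(t)) = sum over d of p(d) · p(q(t) | d). This is a minimal Python sketch under stated assumptions. The page weights p(d) are hypothetical, for example a normalized PageRank. Each document counts once per tuple through its best occurrence, which is our assumption. The names `global_score` and `quality` are illustrative.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def global_score(local_scores: Iterable[Tuple[str, Hashable, float]],
                 page_quality: Dict[str, float]) -> Dict[Hashable, float]:
    """Sketch of p(q(t)) = sum over documents d of p(d) * p(q(t) | d).

    local_scores: (doc_id, t, p(q(t)|d)) triples produced by the recognition layer.
    page_quality: hypothetical p(d) per document, e.g. normalized PageRank (Discriminative).
    """
    best: Dict[Tuple[str, Hashable], float] = {}
    for doc_id, t, p_local in local_scores:
        # count each document at most once per tuple: keep its best occurrence
        best[(doc_id, t)] = max(p_local, best.get((doc_id, t), 0.0))

    totals: Dict[Hashable, float] = defaultdict(float)
    for (doc_id, t), p_local in best.items():
        totals[t] += page_quality.get(doc_id, 0.0) * p_local  # Holistic: sum over all pages
    return dict(totals)


quality = {"d1": 0.5, "d2": 0.3, "d3": 0.2}  # made-up page weights summing to 1
scores = global_score([("d1", "A", 1.0), ("d1", "A", 0.4),
                       ("d2", "A", 0.5), ("d3", "B", 1.0)], quality)
print(scores)
```

Tuple A collects evidence from two pages and outranks B, which is supported by a single lower-quality page.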

Validation Layer: Hypothesis Testing. Of the five characteristics, this layer targets Associative. Input: the entity collection E over the document collection D. Output: a randomized virtual collection E' over D', against which the observed association is tested.
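A minimal Python sketch of the validation idea follows. It compares the observed co-occurrence probability with the probability expected in a randomized collection. We assume, for illustration only, that randomization makes keywords and entities independent, so the expected probability is the product of their marginals. We also use a log-likelihood-ratio term as the test statistic. The function names and all numbers below are hypothetical.

```python
import math
from typing import Dict, Iterable


def doc_prob(docs_with_item: Iterable[str], page_quality: Dict[str, float]) -> float:
    """Quality-weighted probability that a random document contains the item."""
    total = sum(page_quality.values())
    return sum(page_quality[d] for d in set(docs_with_item)) / total


def validation_score(p_observed: float, p_keywords: float, p_entity: float) -> float:
    """Contrast the observed association with a randomized (virtual) collection.

    Null hypothesis (an assumption of this sketch): in the randomized collection the
    keywords and the entity fall into documents independently, so they co-occur with
    probability p_keywords * p_entity.
    """
    p_random = p_keywords * p_entity
    if p_observed <= 0.0 or p_random <= 0.0:
        return 0.0
    # log-likelihood-ratio term: positive only when t co-occurs more often than chance
    return p_observed * math.log(p_observed / p_random)


# Two candidate emails co-occur with the keywords equally often, but the second
# appears on many pages anyway (the accidental case from the Gravano example).
rare = validation_score(0.05, p_keywords=0.1, p_entity=0.06)
common = validation_score(0.05, p_keywords=0.1, p_entity=0.6)
print(rare > common)  # True
```

The frequent address scores below zero: it co-occurs with the keywords no more often than chance would predict, so its high raw frequency no longer helps it.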

EntityRank: The Scoring Function. The score combines local recognition, global aggregation, and validation.
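One way to write the combined score, consistent with the three layers described above, is sketched below. The symbols $e_i.\mathit{conf}$ (extraction confidence) and $\alpha_B(\gamma)$ (a context-matching function over an occurrence $\gamma$ of pattern $\alpha$) are our assumptions about the formulation, as is the log-ratio form of the validation step. None of this is a transcription of the slide.

```latex
\begin{aligned}
p(q(t) \mid d) &= \Big(\prod_{i=1}^{m} e_i.\mathit{conf}\Big) \cdot \alpha_B(\gamma)
  && \text{local recognition} \\
p_o(q(t)) &= \sum_{d \in D} p(d)\, p(q(t) \mid d)
  && \text{global aggregation} \\
p_r(q(t)) &= \sum_{d' \in D'} p(d')\, p(q(t) \mid d')
  && \text{same aggregate over the randomized } D' \\
\mathrm{Score}(q(t)) &\propto p_o(q(t)) \cdot \log \frac{p_o(q(t))}{p_r(q(t))}
  && \text{validation}
\end{aligned}
```

Under this form, a tuple gains score only when its observed support $p_o$ exceeds what the randomized collection $D'$ would produce by chance.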

Query Processing: Sort-merge Join. (Figure: document-sorted posting lists for the keywords "Amazon" and "Customer Service" and for the entity type #phone, whose postings are (position, entity, confidence) triples. The lists are joined on document ID, matched occurrences are aggregated into per-tuple probabilities p, and a hypothesis test yields the final result.)
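The query plan on this slide joins document-sorted posting lists. Below is a minimal Python sketch of the multi-way sort-merge join step. The posting-list layout, the function name `sort_merge_join`, and the example doc IDs and positions are hypothetical, loosely modeled on the figure. The payloads of the matched documents would then feed local scoring, aggregation, and the hypothesis test.

```python
from typing import Any, List, Sequence, Tuple

Posting = Tuple[int, Any]  # (doc_id, payload); every list is sorted by doc_id


def sort_merge_join(lists: Sequence[List[Posting]]) -> List[Tuple[int, List[Any]]]:
    """Return (doc_id, [payload from each list]) for documents present in every list.

    Assumes each doc_id appears at most once per list (its payload holds all positions).
    """
    if not lists:
        return []
    cursors = [0] * len(lists)
    out: List[Tuple[int, List[Any]]] = []
    while all(c < len(lst) for c, lst in zip(cursors, lists)):
        heads = [lst[c][0] for c, lst in zip(cursors, lists)]
        target = max(heads)
        if all(h == target for h in heads):
            out.append((target, [lst[c][1] for c, lst in zip(cursors, lists)]))
            cursors = [c + 1 for c in cursors]  # every list moves past the matched document
        else:
            # lists whose head lags behind the largest head cannot hold it yet: advance them
            cursors = [c + 1 if h < target else c for c, h in zip(cursors, heads)]
    return out


# Hypothetical posting lists: keyword -> positions, #phone -> (position, confidence)
amazon = [(1, [8, 25]), (3, [5]), (6, [10]), (7, [3]), (9, [7, 33])]
customer_service = [(3, [11]), (5, [66]), (7, [8, 24])]
phone = [(2, [(42, 0.8)]), (3, [(18, 1.0)]), (7, [(13, 1.0), (78, 1.0)])]
for doc_id, payloads in sort_merge_join([amazon, customer_service, phone]):
    print(doc_id, payloads)  # documents 3 and 7 contain all three elements
```

Because every list is sorted by document ID, each cursor only moves forward. The join therefore costs one pass over the lists and needs no hash table.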

Experiment Setup. Corpus: a general crawl of the Web (August 2006), about 2 TB with 93M pages. Entities: phone (8.8M distinct instances) and email (4.6M distinct instances). System: a cluster of 34 machines.

Comparing EntityRank to the following approaches, which differ in which of the five characteristics (Contextual, Uncertain, Holistic, Discriminative, Associative) they capture: Naïve, Local, Global, Combine, Without, and EntityRank.

Online Demo.

Example Query Results

Conclusions: We formulate the entity search problem; study and define the characteristics of entity search; propose the conceptual Impression Model and the concrete EntityRank framework for ranking entities; and build an online prototype over a real Web corpus.

Thank you! Questions?

Reference: T. Cheng, X. Yan, and K. C.-C. Chang. EntityRank: Searching Entities Directly and Holistically. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB 2007), Vienna, Austria, September 2007.