Chen Li ( 李晨 ) Chen Li Scalable Interactive Search NFIC August 14, 2010, San Jose, CA Joint work with colleagues at UC Irvine and Tsinghua University.

Slides:



Advertisements
Similar presentations
TWO STEP EQUATIONS 1. SOLVE FOR X 2. DO THE ADDITION STEP FIRST
Advertisements

You have been given a mission and a code. Use the code to complete the mission and you will save the world from obliteration…
Advanced Piloting Cruise Plot.
Copyright © 2003 Pearson Education, Inc. Slide 1 Computer Systems Organization & Architecture Chapters 8-12 John D. Carpinelli.
Chapter 1 The Study of Body Function Image PowerPoint
1 Copyright © 2013 Elsevier Inc. All rights reserved. Appendix 01.
1 Copyright © 2010, Elsevier Inc. All rights Reserved Fig 2.1 Chapter 2.
Business Transaction Management Software for Application Coordination 1 Business Processes and Coordination.
and 6.855J Spanning Tree Algorithms. 2 The Greedy Algorithm in Action
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Title Subtitle.
My Alphabet Book abcdefghijklm nopqrstuvwxyz.
DIVIDING INTEGERS 1. IF THE SIGNS ARE THE SAME THE ANSWER IS POSITIVE 2. IF THE SIGNS ARE DIFFERENT THE ANSWER IS NEGATIVE.
FACTORING ax2 + bx + c Think “unfoil” Work down, Show all steps.
Addition Facts
Year 6 mental test 5 second questions
Year 6 mental test 10 second questions
2010 fotografiert von Jürgen Roßberg © Fr 1 Sa 2 So 3 Mo 4 Di 5 Mi 6 Do 7 Fr 8 Sa 9 So 10 Mo 11 Di 12 Mi 13 Do 14 Fr 15 Sa 16 So 17 Mo 18 Di 19.
Richmond House, Liverpool (1) 26 th January 2004.
REVIEW: Arthropod ID. 1. Name the subphylum. 2. Name the subphylum. 3. Name the order.
BT Wholesale October Creating your own telephone network WHOLESALE CALLS LINE ASSOCIATED.
ABC Technology Project
1 Undirected Breadth First Search F A BCG DE H 2 F A BCG DE H Queue: A get Undiscovered Fringe Finished Active 0 distance from A visit(A)
VOORBLAD.
15. Oktober Oktober Oktober 2012.
1 Breadth First Search s s Undiscovered Discovered Finished Queue: s Top of queue 2 1 Shortest path from s.
“Start-to-End” Simulations Imaging of Single Molecules at the European XFEL Igor Zagorodnov S2E Meeting DESY 10. February 2014.
BIOLOGY AUGUST 2013 OPENING ASSIGNMENTS. AUGUST 7, 2013  Question goes here!
Factor P 16 8(8-5ab) 4(d² + 4) 3rs(2r – s) 15cd(1 + 2cd) 8(4a² + 3b²)
Basel-ICU-Journal Challenge18/20/ Basel-ICU-Journal Challenge8/20/2014.
1..
© 2012 National Heart Foundation of Australia. Slide 2.
LO: Count up to 100 objects by grouping them and counting in 5s 10s and 2s. Mrs Criddle: Westfield Middle School.
Understanding Generalist Practice, 5e, Kirst-Ashman/Hull
Chapter 5 Test Review Sections 5-1 through 5-4.
GG Consulting, LLC I-SUITE. Source: TEA SHARS Frequently asked questions 2.
Addition 1’s to 20.
Model and Relationships 6 M 1 M M M M M M M M M M M M M M M M
25 seconds left…...
Equal or Not. Equal or Not
Slippery Slope
H to shape fully developed personality to shape fully developed personality for successful application in life for successful.
Presenteren wij ………………….
Januar MDMDFSSMDMDFSSS
Week 1.
Analyzing Genes and Genomes
We will resume in: 25 Minutes.
©Brooks/Cole, 2001 Chapter 12 Derived Types-- Enumerated, Structure and Union.
Essential Cell Biology
Intracellular Compartments and Transport
PSSA Preparation.
VPN AND REMOTE ACCESS Mohammad S. Hasan 1 VPN and Remote Access.
Immunobiology: The Immune System in Health & Disease Sixth Edition
Essential Cell Biology
How Cells Obtain Energy from Food
Immunobiology: The Immune System in Health & Disease Sixth Edition
Energy Generation in Mitochondria and Chlorplasts
Chen Li ( 李晨 ) Chen Li Search As You Type Joint work with colleagues at UCI and Tsinghua.
Efficient Interactive Fuzzy Keyword Search Shengyue Ji 1, Guoliang Li 2, Chen Li 1, Jianhua Feng 2 1 University of California, Irvine 2 Tsinghua University.
CpSc 3220 Designing a Database
Improving Search for Emerging Applications * Some techniques current being licensed to Bimaple Chen Li UC Irvine.
Presentation transcript:

Chen Li ( 李晨 ) Chen Li Scalable Interactive Search NFIC August 14, 2010, San Jose, CA Joint work with colleagues at UC Irvine and Tsinghua University. Bimaple Technology

Haiti Earthquake 2010  7.0 Mw earthquake on Tuesday, 12 January  3,000,000 people affected  230,000 people died  300,000 people injured  1,000,000 people made homeless  250,000 residences and 30,000 buildings collapsed or damaged. 2

Person Finder Project 3

Search Interface 4

Search Result: “daniele” 5

Search Result: “danellie” 6

A more powerful search interface developed at UCI 7

Full-text, Interactive, Fuzzy Search 8

Embedded search widget (a news site in Miami) 9

Scalability demo: iPubMed on 19M records 10

Interactive Search Find answers as users type in keywords  Powerful interface  Increasing popularity of smart phones 11

Outline  A real story  Challenges of interactive search  Recent research progress  Conclusions 12

Challenge 1: Number of users Single-user environment Multi-user environment 13

Performance is important!  < 100 ms: server processing, network, javascript, etc  Requirement for high query throughput  20 queries per second (QPS)  50ms/query (at most)  100 QPS  10ms/query 14

Challenge 2: Query Suggestion vs Search Query suggestion Search 15

Challenge 3: Semantics-based Search Search “bill cropp” on 16

Challenge 4: Prefix search vs full-text search Search on apple.com Query: “itune” Query: “itunes music” 17

Outline  A real story  Challenges of interactive search  Recent research progress  Conclusions 18

Recent techniques to support two features  Fuzzy Search: finding results with approximate keywords  Full-text: find results with query keywords (not necessarily adjacently) 19

20  Ed(s1, s2) = minimum # of operations (insertion, deletion, substitution) to change s1 to s2 s1: v e n k a t s u b r a m a n i a n s2: w e n k a t s u b r a m a n i a n ed(s1, s2) = 1 Edit Distance 20

Problem Setting  Data  R: a set of records  W: a set of distinct words  Query  Q = {p 1, p 2, …, p l }: a set of prefixes  δ: Edit-distance threshold  Query result  R Q : a set of records such that each record has all query prefixes or their similar forms 21

Feature 1: Fuzzy Search 22

Formulation Record Strings wenkatsubra  Find strings with a prefix similar to a query keyword  Do it incrementally! venkatasubramanian carey jain nicolau smith Query: 23

Trie Indexing Computing set of active nodes Φ Q  Initialization  Incremental step e x a m p l $ $ e m p l a r $ t $ s a m p l e $ PrefixDistance examp2 exampl1 example0 exempl2 exempla2 sample2 Active nodes for Q = example e

Initialization and Incremental Computatin  Q = ε e x a m p l $ $ e m p l a r $ t $ s a m p l e $ PrefixDistance PrefixDistance 0 e1 ex2 s1 sa2 PrefixDistance ε 0 Initializing Φ ε with all nodes within a depth of δ e 25

Feature 2: Full-text search  Find answers with query keywords  Not necessarily adjacently 26

Multi-Prefix Intersection IDRecord 1Li data… 2data… 3data Lin… 4Lu Lin Luis… 5Liu… 6VLDB Lin data… 7VLDB… 8Li VLDB… d a t a $ l i nu $ u $ v l d b $ $ i s $ 1818 $ li vldb 6 8  Q = vldb li More efficient algorithms possible… 27

Conclusions Interactive Search: Kill the search button 28

Thank you! Chen Li 29