Presentation is loading. Please wait.

Presentation is loading. Please wait.

Supporting of search-as-you-type using sql in databases

Similar presentations


Presentation on theme: "Supporting of search-as-you-type using sql in databases"— Presentation transcript:

1 Supporting of search-as-you-type using sql in databases
Guoliang Li, Chen Li

2 introduction A search as you type system computes answer on the fly as user types in keyword query character by character Finds the solution for both multi-keyword queries and single keyword queries Finds how to support search as you type in relational database using SQL Example:is Netflix

3 contents Introduction Preliminaries Problem formulation
3 contents Introduction Preliminaries Problem formulation Exact search for single keyword Fuzzy search for single keyword Supporting multi keyword queries Supporting first N queries Supporting updates efficiently Related work conclusion

4 preliminaries Let T be the relational table A1,A2,…….,Al are the attributes R={r1,r2…..,rn} is collection of records ri[A j]-denotes content of the record ri in the attribute Aj W is the set of tokenized keywords in R

5 r1 Relational table(T) r2 r3 r4 r5 r6 r7 ID Title Authors Book title
Year r1 K-Auto morphism: A General Framework for Privacy Preserving Network Publication Lei Zou, Lei Chen, M. Tamer O¨ zsu PVLDB 2009 r2 Privacy-Preserving Singular Value Decomposition Shuguo Han, Wee Keong Ng, Philip S. Yu ICDE r3 Privacy Preservation of Aggregates in Hidden Databases: Why and How? Arjun Dasgupta, Nan Zhang, Gautam Das, Surajit Chaudhuri SIGMOD r4 Privacy-preserving Indexing of Documents on the Network Mayank Bawa, Roberto J. Bayardo, Rakesh Agrawal, Jaideep Vaidya VLDBJ r5 On Anti-Corruption Privacy Preserving Publication Yufei Tao, Xiaokui Xiao, Jiexing Li, Donghui Zhang 2008 r6 Preservation of Proximity Privacy in Publishing Numerical Sensitive Data Jiexing Li, Yufei Tao, Xiaokui Xiao r7 Hiding in the Crowd: Privacy Preservation on Evolving Streams through Correlation Tracking Feifei Li, Jimeng Sun, Spiros Papadimitriou, George A. Mihaila, Ioana Stanoi SIGIR 2007

6 Search as you type for single keyword queries
6 Search as you type for single keyword queries Exact search: as user types a single partial key word w character by character,search as you type system finds the record that contain keyword with a prefix w It is also known as prefix search eg: if user types in a query “sig”, the system returns Returns records r3,r6 and r7

7 search as you type for single keyword queries
Fuzzy search: As user types in a single partial keyword w character by character ,system finds records with keywords similar to the query keyword Eg:when we types in query “corel” ,record r7 is relevant answer since it contain keyword”correlation” Edit distance measure is used for finding similarity between strings ed(s1,s2)

8 Search as you type for multi-keyword queries
Exact search: given a multi-keyword query Q with m keywords w1,w2,…wm Wm is considered as partial keyword and others are complete keyword Fuzzy search:finds the record that contains keyword similar to the complete keyword and a keyword with a prefix similar to the partial keyword w Wm is the partial keyword Using the edit distance

9 Different approaches for search as you type
1.Use separate application layer It can achieve high performance 2.Use database extenders not safe method to query engine Depends on API of specific database 3.Use SQL More compatible since it is using standard SQL More portable to different platforms than other method

10 Exact search for single keyword
No index methods Index based methods Issues a SQL query that scans each records and verifies whether the record is an answer to the query 1.Calling user defined functions (UDF) Add functions to the database 2.Using LIKE predicate LIKE predicate allow users to perform string matching LIKE predicate is used to check whether a record contain the query keyword

11 Index based method Building auxiliary tables as index structures
Inverted index table Prefix table Inverted- index table Inverted Index table (IT)- records are in the form<kid,rid> Assigning unique id to the keywords in Table(T) Kid-id of the keyword Rid-is the id of the record

12 Prefix table Build prefix table with records in the form<p,lkid,ukid> P-prefix of the keyword lkid-smallest id of the keyword with prefix p Ukid-largest id of the keyword with prefix p Prefix table can be used to find the range of keyword with the prefix Eg:the ids of the keyword with prefix “sig” must be In the range [k6,k7] <“sig”,k6,k7>

13 Steps for exact search For given partial keyword w
Get the keyword range [lkid,ukid] using prefix table Find the records that have a keyword in the range through the inverted index table IT Then use the following SQL to answer the prefix- search query w SELECT T .* FROM PT , IT , T WHERE PT .prefix = “w” AND PT .ukid ≥ IT .kid AND PT .lkid ≤ IT .kid AND IT .rid = T.rid.

14 The inverted-index table and prefix table.
lkid ukid ic k1 k2 p k3 k6 pr k4 pri K4 pu k5 pv K6 pvl kid keyword k1 icde K2 icdt K3 preserving K4 privacy K5 publishing K6 sigmoid k7 sigir kid rid K2 r10 K5 r6 r8 k6 r1 K7 r9 k8 r3

15 fuzzy search for single keyword
No index methods Using UDF(user defined function) for supporting prefix search LIKE predicate does not support fuzzy search PED (w,s)-returns minimal edit distance between w and the prefix of the keyword in s Eg:PED(“pvb”,r10[title])=PED(“pvb”, “privacy in database publishing”)=1 Uses the edit distance threshold and forms new UDF PEDTH(w,s,t) if edit distance is within t then UDF returns true

16 Index based method Using the inverted index table and prefix table to support search as you type Find the similar prefixes from the prefix table(PT) Finds the keyword ranges of similar prefixes Finds the answer based on the keyword ranges using inverted index table Method to finding similar prefixes Using UDF Gram-based method Neighborhood generation method

17 Using UDF Using a UDF to find similar prefixes
Issues an SQL query which scans each prefix in PT and calls the UDF to check if the prefix is similar to w Gram based method q-gram based methods are used for string matching For a string “s” its q-grams are substrings with length q

18 Neighborhood generation method
i-deletion neighborhood- for a given keyword w, the substring of w by deleting ”i” characters is called i-deletion neighborhood Di(w) denotes i-deletion neighborhood For eg: for a keyword “pvldb” D0(pvldb)={pvldb} D1(pvldb)={vldb,pldb,pvdb,pvld,pvld}

19 Supporting multi keyword queries
Computing answer from scratch Q –multi keyword query with m keyword w1,w2,..wm Using an “INTERSECT” operator first find the record for each keyword use the “INTERSECT”operator to join the records 2. Using full text indexes first find the records matching the first m-1 complete keyword Find the record matching last prefix keyword Join the result Use of CONTAINS command

20 Word level incremental computation
User types query Q with keyword w1,w2..wm Temporary table CQ is created to cache the record ids of the query Q if user types new keyword and submit a new query Q1 ,temporary table is used to incrementally answer the query two types of word level incremental computation Exact search Fuzzy search

21 Supporting first n queries
As user types in a query character by character system gives the first-N results as instant feedback Exact first-N queries “LIMIT “Syntax is used which returns first N results from the database E.g.:”LIMIT n1,n2” which returns n2 rows starting from n1th row Fuzzy first-N queries Fuzzy search for single keyword query w, find the result with edit distance=0 If there is N answer, terminate the execution otherwise progressively increase the edit distance threshold and select the record with edit distance threshold 1,2,..t until get the N answer

22 Supporting updates efficiently
Data updates includes deleting and insertion of records Insertion When a record is inserted then first assign it a new record id Insert each keyword in the inverted index table If prefix is not in the prefix table add prefixes to the prefix table Deletion If a record is deleted inverted index table use a bit to denote whether the record is deleted We can use the deleted prefix ids for future insertions

23 Related work 1.Auto completion and search as you type An auto completion system can predict a word or phrase that a user may type in next based on partial string the user has already typed 2.Approximate string search and similarity join In this method, Given a set of String and query string,all string in the set are similar to the query string 3.Keyword search in database

24 conclusion Need to leverage the existing DBMS functionalities to meet the high performance requirement achieve an interactive speed To support prefix matching,there is proposed solutions that use auxiliary tables as index structures Extended the techniques in the case of fuzzy queries Proposed incremental computation techniques to answer multi-keyword queries Studied how support first N queries

25 references [1] S. Agrawal, K. Chakrabarti, S. Chaudhuri, and V. Ganti. Scalable ad-hoc entity extraction from text collections. PVLDB, 1(1):945–957, 2008. [2] S. Agrawal, S. Chaudhuri, and G. Das. Dbxplorer: A system for keyword-based search over relational databases. In ICDE, pages 5–16,2002.

26 thank you


Download ppt "Supporting of search-as-you-type using sql in databases"

Similar presentations


Ads by Google