EasyQuerier: A Keyword Interface in Web Database Integration System Xian Li 1, Weiyi Meng 2, Xiaofeng Meng 1 1 WAMDM Lab, RUC & 2 SUNY Binghamton.


1 EasyQuerier: A Keyword Interface in Web Database Integration System
Xian Li (1), Weiyi Meng (2), Xiaofeng Meng (1)
(1) WAMDM Lab, RUC; (2) SUNY Binghamton

2 Traditional Integrated Interface
[Figure: domain list; query Q; integrated interface of Job; domain selected manually]

3 What does EasyQuerier look like?
[Figure: EasyQuerier; query Q; integrated interface of Job; labels: Manually, Automatically]

4 New Features of EasyQuerier
Automatic domain mapping
  Users do not need to select a domain from a long list
More flexible keyword queries
  Different kinds of data types: text, numeric, currency, date
  More logic relations covered: "and", "or", "between … and"
  Q1: New York or Washington, education, $2000-$3000
    U_1 = {New York, Washington}, logic: or
    U_2 = {education}
    U_3 = {$2000, $3000}, logic: range
Automatic query translation
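The decomposition of Q1 into units can be sketched in a few lines of Python. This is only an illustration: the slides do not describe the parser, so the comma-as-unit-separator rule, the regular expressions, and the name `parse_query` are all assumptions.

```python
import re

# Hypothetical parser: splitting rules are assumptions, not taken from the slides.
RANGE_DASH = re.compile(r"(\$?\d+(?:\.\d+)?)\s*-\s*(\$?\d+(?:\.\d+)?)")
RANGE_BETWEEN = re.compile(r"between\s+(.+?)\s+and\s+(.+)", re.IGNORECASE)


def parse_query(query):
    """Split a keyword query into units, each with its values and logic relation."""
    units = []
    for raw in query.split(","):          # assumption: commas separate units
        text = raw.strip()
        if not text:
            continue
        match = RANGE_DASH.fullmatch(text) or RANGE_BETWEEN.fullmatch(text)
        if match:                          # "$2000-$3000" or "between X and Y"
            units.append({"values": [match.group(1), match.group(2)], "logic": "range"})
            continue
        for word in ("or", "and"):
            parts = re.split(rf"\s+{word}\s+", text)
            if len(parts) > 1:
                units.append({"values": parts, "logic": word})
                break
        else:                              # a single value, no logic relation
            units.append({"values": [text], "logic": None})
    return units


for unit in parse_query("New York or Washington, education, $2000-$3000"):
    print(unit)
```

Applied to Q1, the sketch yields three units that correspond to U_1, U_2 and U_3 on the slide.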

5 EasyQuerier: overview
Part 1: Domain mapping
  Collect the domain knowledge from candidate domains
  Similarity-based domain mapping strategy
Part 2: Query translation
  Partial keyword-attribute mapping
  Holistic keyword-attribute mapping

6 Challenge 1: Domain Mapping
Problem statement
  Map a user query to the correct domain automatically, without requiring domain information to be entered separately.
Our solution
  Domain representation model
  Term weight assignment
  Query-domain similarity

7 Domain mapping(1)
Domain representation model: D = <d_ID, CT, AT, VT>
  d_ID: unique domain identifier.
  CT = {ct_i | i = 1, 2, …}: a set of Conceptual Terms, which describe the concept of the whole domain.
  AT = ∪_{A_i ∈ D} DAL(d_ID, A_i): a set of Attribute Label Terms, consisting of the attribute labels of the products in this domain.
    InteLabel, LocalLabel, OtherLabel
  VT = ∪_{A_i ∈ D} DAV(d_ID, A_i): a set of Value Terms associated with the products' attributes in the domain.
    Text attributes: InteValue, LocalValue, OtherValue
    Non-text attributes: VT can be characterized by the predefined ranges available on the integrated interfaces.
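One way to picture the domain representation model is as a small record type. The sketch below is an assumption about how such a record could be laid out in Python; the class name `Domain`, its field names, and the sample job domain are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """Hypothetical container for one domain: identifier, CT, per-attribute labels and values."""
    d_id: str                                  # unique domain identifier
    ct: set = field(default_factory=set)       # conceptual terms describing the domain
    dal: dict = field(default_factory=dict)    # attribute -> label terms, i.e. DAL(d_ID, A_i)
    dav: dict = field(default_factory=dict)    # attribute -> value terms, i.e. DAV(d_ID, A_i)

    def attribute_label_terms(self):
        """AT: union of the label terms of all attributes."""
        return set().union(*self.dal.values())

    def value_terms(self):
        """VT: union of the value terms of all attributes."""
        return set().union(*self.dav.values())


# Illustrative data only.
job = Domain(
    d_id="job",
    ct={"job", "career"},
    dal={"location": {"location", "city"}, "category": {"category"}},
    dav={"location": {"New York", "Washington"}, "category": {"education"}},
)
print(sorted(job.attribute_label_terms()))
print(sorted(job.value_terms()))
```

Keeping labels and values per attribute, and deriving AT and VT as unions, leaves the per-attribute value sets available for the later query translation step.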

8 Domain mapping(2)
Different terms have different ability to differentiate domains.
  "price" is less powerful than "title" in differentiating the book domain from the others.
Term weight assignment
  Adopt the idea of CVV, which is used to measure the skew of the distribution of terms across all document databases.
  if_ij: the number of times t_j appears in either AT or VT in D_i.
  CVV_j: the CVV for t_j.
  Weight(D_i, t_j) = CVV_j * if_ij
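The weighting rule Weight(D_i, t_j) = CVV_j * if_ij can be illustrated with a toy computation. The slide does not give the CVV formula, so the sketch below substitutes a simplified stand-in: the variance, across domains, of each domain's share of a term's total frequency. The function names and the sample counts are hypothetical.

```python
def skew_scores(freq):
    """Simplified stand-in for CVV_j: variance of each domain's share of term t_j."""
    domains = list(freq)
    terms = {t for counts in freq.values() for t in counts}
    scores = {}
    for term in terms:
        total = sum(freq[d].get(term, 0) for d in domains)
        shares = [freq[d].get(term, 0) / total for d in domains]
        mean = sum(shares) / len(shares)
        scores[term] = sum((s - mean) ** 2 for s in shares) / len(shares)
    return scores


def term_weights(freq):
    """Weight(D_i, t_j) = CVV_j * if_ij, using the stand-in skew score for CVV_j."""
    cvv = skew_scores(freq)
    return {d: {t: cvv[t] * n for t, n in counts.items()} for d, counts in freq.items()}


# if_ij: illustrative counts of each term in AT or VT of each domain.
freq = {
    "book": {"title": 4, "price": 2},
    "job": {"price": 2},
}
weights = term_weights(freq)
print(weights["book"])
```

In this toy data "price" is spread evenly over both domains, so its skew score and weight are zero, while "title" occurs only in the book domain and receives a positive weight.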

9 Domain mapping(3)
Q = {u_1, u_2, …, u_n}, u_i = {v_i^1, v_i^2, …}
Q1 example
  u_1 = {New York, Washington}, v_1^1 = New York, v_1^2 = Washington
Among the terms t_j in VT or AT, we record only the best-matching term t_j.
[Similarity formulas not preserved in the transcript]
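Since the similarity formulas did not survive in the transcript, the following is only a guess at the general shape of query-domain matching: each query value keeps its best-matching domain term, and the weighted matches are summed per domain. The Jaccard word-overlap similarity, the summation, and every name and number in the sketch are assumptions.

```python
def term_similarity(value, term):
    """Assumed similarity: Jaccard overlap of lower-cased word sets."""
    a, b = set(value.lower().split()), set(term.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def query_domain_similarity(query_units, domain_weights):
    """Sum, over all query values, the weighted score of the best-matching term."""
    total = 0.0
    for unit in query_units:
        for value in unit:
            # Keep only the best-matching term for this value.
            total += max(
                (term_similarity(value, t) * w for t, w in domain_weights.items()),
                default=0.0,
            )
    return total


def map_domain(query_units, weights_by_domain):
    """Pick the domain with the highest query-domain similarity."""
    return max(
        weights_by_domain,
        key=lambda d: query_domain_similarity(query_units, weights_by_domain[d]),
    )


# Illustrative term weights per domain (not from the slides).
weights_by_domain = {
    "job": {"new york": 0.5, "education": 1.0, "salary": 0.8},
    "book": {"title": 1.0, "education": 0.2},
}
q1 = [["New York", "Washington"], ["education"]]
print(map_domain(q1, weights_by_domain))
```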

10 Challenge 2: Query translation
Problem statement
  Translate the query to the integrated interface
  Just like filling in the integrated interface with a set of keywords
Computation model
  Def 4.1 (Keyword-Attribute Matching (KAM)). KAM(u, A).
  Def 4.2 (Degree of Matching (DM)). Each KAM has a degree of matching.
  Def 4.3 (Query Translation Solution (QTS)). A QTS represents a strategy for filling in the query interface. A QTS comprises several KAMs.
  Def 4.4 (Conviction). This measurement determines whether a QTS is reasonable.
The larger the DM of a KAM, the more reasonable the KAM is. Such KAMs combined together generate the optimal QTS.

11 Query translation(1)
Computation of DM
  For Q = {u_1, u_2, …, u_n}, u_i = {v_i^1, v_i^2, …}:
  Sim(v_i^x, A_j) is the maximum value of all Sim(v_i^x, t_j), where t_j ranges over the VT of A_j.
  Sim(v_i^x, t_j) is computed in the same way as in domain mapping.
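The max-over-terms step can be sketched as follows. The slide only states that Sim(v, A_j) is the maximum of Sim(v, t_j) over the value terms of A_j; the Jaccard term similarity and the averaging of value similarities into a DM for a whole unit are assumptions added for illustration, as are all names.

```python
def term_similarity(value, term):
    """Assumed similarity: Jaccard overlap of lower-cased word sets."""
    a, b = set(value.lower().split()), set(term.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def value_attribute_similarity(value, attribute_terms):
    """Sim(v, A_j): the maximum Sim(v, t_j) over the value terms t_j of attribute A_j."""
    return max((term_similarity(value, t) for t in attribute_terms), default=0.0)


def degree_of_matching(unit, attribute_terms):
    """Assumed DM of KAM(u, A): the mean Sim(v, A) over the values v in unit u."""
    if not unit:
        return 0.0
    return sum(value_attribute_similarity(v, attribute_terms) for v in unit) / len(unit)


# Illustrative value terms of a hypothetical "location" attribute.
location_terms = ["New York", "Washington", "Boston"]
print(degree_of_matching(["New York", "Washington"], location_terms))
print(degree_of_matching(["education"], location_terms))
```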

12 Query translation(2)
Conviction
  The conviction value of a QTS is a weighted sum of the DMs of the related KAMs.
  Why weight? If an attribute appears in more local interfaces of a domain, it is more important in the domain.
  For each attribute A_j within the domain D, the weight w(A_j) is based on its interface frequency (if), i.e., the number of local interfaces of the domain in which the attribute appears.
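A minimal sketch of conviction as a weighted sum, with a brute-force search for the best QTS, might look like this. The normalization of the weight as interface frequency divided by the number of local interfaces, the one-attribute-per-unit constraint, and the exhaustive search are assumptions; the slides do not say how the optimal QTS is found.

```python
from itertools import permutations


def conviction(qts, dm, weights):
    """Weighted sum of the DMs of the KAMs (unit, attribute) in a QTS."""
    return sum(weights[attr] * dm.get((unit, attr), 0.0) for unit, attr in qts)


def best_qts(units, attributes, dm, weights):
    """Try every assignment of units to distinct attributes; keep the highest conviction."""
    best, best_score = None, float("-inf")
    for assigned in permutations(attributes, len(units)):
        qts = list(zip(units, assigned))
        score = conviction(qts, dm, weights)
        if score > best_score:
            best, best_score = qts, score
    return best, best_score


# Illustrative numbers: attribute frequency over 50 local interfaces of one domain.
interface_freq = {"location": 40, "category": 30, "salary": 20}
weights = {attr: n / 50 for attr, n in interface_freq.items()}  # assumed w(A_j)
dm = {("u1", "location"): 1.0, ("u2", "category"): 0.5}          # unlisted pairs count as 0
solution, score = best_qts(["u1", "u2"], list(interface_freq), dm, weights)
print(solution, round(score, 3))
```

Exhaustive enumeration is only practical for small interfaces; it is used here because it makes the definition of the optimal QTS explicit.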

13 Experiment
Settings
  9 domains, each covering 50 web databases
  10 students, 20 keyword queries for each domain
Measurement
  Correct/acceptable/wrong
  Overall/with domain/with attribute label/value only
Fig1: domain mapping accuracy
Fig2: query translation accuracy

14 Conclusion
In this paper, we proposed EasyQuerier, a novel keyword-based interface system for ordinary users to query structured data in various Web databases.
We developed solutions to two technical challenges:
  mapping a keyword query to the appropriate domain
  translating the keyword query into a query for the integrated search interface of the domain

15 Thank you~

