Unsupervised Extraction of Template Structure in Web Search Queries, WWW 2012, Session: Search. Qingxia Liu.


1 Unsupervised Extraction of Template Structure in Web Search Queries, WWW 2012, Session: Search
Qingxia Liu

2 Contents: Motivation; Definition of the Problem; Generative Model; Alternative Models; Experiments

3 Motivation
Web search problem: determine users' intent from keyword search, i.e. from a small set of keywords.
Challenges: scalability issues; instrumentation issues (user clicks); brevity and ambiguity, e.g. "jaguar": the car or the animal?; sparsity issues, e.g. tail queries such as "jaguar xj engine mount".
Fact: many different queries express the same search intent.
Usage of intent information: to provide relevant results, to detect whether a query has a commercial intent, to select a useful set of advertisements, and to learn from the user's interaction with the search engine.

4 Motivation
Goal: extract the hidden structure behind the observed search queries in a domain, with no manual intervention; e.g. "jaguar xj engine mount" fits a <Brand, Model, Year, Part> pattern.
Similar works:
1. Analyze queries to obtain segmentations, or extract named entities from queries. All these approaches require either direct supervision of the tasks, such as manually labeled seed data, or ancillary information such as web search click-through data, both of which might be expensive or difficult to obtain.
2. Assume that the attribute set as well as the associated vocabularies are given as inputs in the form of database relations or entity hierarchies.
Other work uses detected templates for improving query recommendations.
We seek to solve these issues by enriching search queries with information about the hidden structure underlying them.

5 Model

6 Single-attribute template model
Figure: a set of words $W_1, W_2, W_3, W_4, \ldots$ and templates template1 to template4, each built from a single attribute (attribute a, b, c, d).

7 multi-attribute template model
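Figure: a set of words $W_1, W_2, W_3, W_4, \ldots$ and templates built from several attributes: template1 (attr a, attr b, attr c), template2 (attr b, attr c), template3 (attr a, attr c). E.g. attributes Brand, Year, Parts, with Brand words Honda, Toyota, Ford, …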

8 Properties
I: constrain the number of distinct templates that can be formed in the model.
II: each attribute in the template generates at least one word.
III: each attribute has a specific word distribution as well as a distinct tendency for the number of words it contributes to a query.
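The first two properties can be written as simple checks. The sketch below is only an illustration and rests on assumed representations. A template is a tuple of attribute names, and a segmented query is a list of (word, attribute) pairs. `max_templates` is a hypothetical bound. None of these names come from the paper.

```python
from collections import Counter

# Assumed representation (hypothetical, for illustration only):
# a template is a tuple of attribute names, and a segmented query is a
# list of (word, attribute) pairs.
Template = tuple[str, ...]
Segmented = list[tuple[str, str]]


def satisfies_property_ii(template: Template, segmented: Segmented) -> bool:
    """Property II: every attribute in the template generates at least one word.

    Also rejects words labelled with an attribute outside the template.
    """
    used = Counter(attr for _, attr in segmented)
    return all(used[a] >= 1 for a in template) and set(used) <= set(template)


def satisfies_property_i(assigned: list[Template], max_templates: int) -> bool:
    """Property I: the number of distinct templates in use stays within a bound."""
    return len(set(assigned)) <= max_templates
```

Consider `[("jaguar", "Brand"), ("xj", "Model"), ("engine", "Part"), ("mount", "Part")]`. It passes the Property II check for `("Brand", "Model", "Part")`. It fails for `("Brand", "Model", "Year", "Part")`, because Year emits no word.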

9 PROBLEM DEFINITION Given a set of queries, extract the underlying schema (templates, attributes, and their vocabularies) and learn the parameters of the generative process in a completely unsupervised manner while respecting the properties mentioned above.

10 Parameter generating process
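Candidate pool of $T$ templates $t_1, t_2, t_3, \ldots, t_T$, each described by an attribute vector $\theta_t$:
$$\mu \sim \mathrm{Dirichlet}(\alpha), \qquad \theta_t \sim \mathrm{Multinomial}(\mu), \quad t = 1, \ldots, T$$
Each query $q$ ($q_1, q_2, q_3, q_4, \ldots$) selects its template $t_q$ from the pool:
$$\gamma \sim \mathrm{Dirichlet}(\sigma), \qquad t_q \sim \mathrm{Multinomial}(\gamma)$$
One way to think of this is that the candidate pool denotes the set of template configurations which are appropriate for the domain.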
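Here is a minimal forward-sampling sketch of this slide, assuming NumPy. The slide does not say how many trials $\theta_t \sim \mathrm{Multinomial}(\mu)$ uses, or how a draw becomes a template. The sketch assumes a hypothetical `draws` count and reads a template's attributes as those with a nonzero count. The function name is also hypothetical.

```python
import numpy as np


def sample_template_pool(num_attrs, T, alpha, sigma, draws, num_queries, seed=0):
    """Forward-sample the parameters of slide 10 (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # mu ~ Dirichlet(alpha): weights over the attributes
    mu = rng.dirichlet(np.full(num_attrs, alpha))
    # theta_t ~ Multinomial(mu); assumption: `draws` trials per template, and
    # the template keeps the attributes whose count is nonzero
    pool = []
    for _ in range(T):
        counts = rng.multinomial(draws, mu)
        pool.append(tuple(int(a) for a in np.flatnonzero(counts)))
    # gamma ~ Dirichlet(sigma): weights over the T templates in the pool
    gamma = rng.dirichlet(np.full(T, sigma))
    # t_q ~ Multinomial(gamma): template index chosen by each query
    t_q = rng.choice(T, size=num_queries, p=gamma)
    return mu, pool, gamma, t_q
```

Under this reading, a small `draws` favors short templates. Large $\alpha$ spreads attributes evenly across templates.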

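11 Generative Model
Each query $q$ takes the attributes of its template $\theta[t_q]$ (attr1, attr2, attr3, attr4, …). Each attribute $a$ contributes $n_a$ words:
$$\eta_a \sim \mathrm{Gamma}(g_1, g_2), \qquad n_a \sim \mathrm{Poisson}(\eta_a)$$
The assignment vector $z_q = (z_{q,1}, z_{q,2}, z_{q,3}, \ldots)$ records which attribute generated each word of $q$. The $i$-th word $w_{q,i}$ of the query $q = w_1 w_2 w_3 \ldots$ is drawn from that attribute's word distribution:
$$\phi_a \sim \mathrm{Dirichlet}(\beta), \qquad w_{q,i} \sim \mathrm{Multinomial}(\phi_{z_{q,i}})$$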
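The word-generation step can be sketched the same way, with several assumptions to flag:

- `g1` and `g2` are read as the Gamma shape and rate.
- Words are emitted in template order.
- Zero draws of $n_a$ are resampled so that every attribute emits at least one word, as Property II requires. The slide itself only states $n_a \sim \mathrm{Poisson}(\eta_a)$.

The function name and arguments are hypothetical.

```python
import numpy as np


def generate_query(template, vocab, beta, g1, g2, seed=0):
    """Forward-sample one query for a template given as a sequence of attribute ids."""
    rng = np.random.default_rng(seed)
    num_attrs = max(template) + 1
    # phi_a ~ Dirichlet(beta): one word distribution per attribute
    phi = rng.dirichlet(np.full(len(vocab), beta), size=num_attrs)
    # eta_a ~ Gamma(g1, g2); assumption: g1 is the shape and g2 the rate
    eta = rng.gamma(shape=g1, scale=1.0 / g2, size=num_attrs)
    words, z = [], []
    for a in template:  # assumption: words are emitted in template order
        n_a = 0
        while n_a == 0:  # assumption: resample zeros so Property II holds
            n_a = rng.poisson(eta[a])
        for _ in range(n_a):
            words.append(vocab[rng.choice(len(vocab), p=phi[a])])
            z.append(a)  # z records which attribute produced the word
    return words, z
```

For example, `generate_query((0, 1, 2), vocab, beta=0.5, g1=2.0, g2=1.0)` returns a word list and an aligned attribute list `z` that contains each of 0, 1 and 2 at least once.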

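12 Generative Model
Figure: graphical model of the generative process. It links each query $q$ ($q_1, q_2, \ldots$) to its template $t_q$ from the candidate pool $t_1, \ldots, t_T$, to its attribute assignments $a_q$, and to its words $w_q$.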

13 Model Learning: Gibbs sampling
Bayes' theorem: the numerator is computed from the preceding formulas, and the denominator is handled by Gibbs sampling.
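For reference, here is the generic form of one Gibbs step for the template of a single query $q$. It is written under standard assumptions rather than copied from the paper, since the derived conditionals on the following slides are not reproduced in this transcript, and other latent variables are omitted from the conditioning. Bayes' theorem gives the numerator from the generative model. Only one variable changes at a time, so the denominator is a sum over the $T$ candidate templates. This replaces the intractable normalizer of the full posterior. The second equation additionally assumes that $\gamma$ is integrated out. In it, $m_{t,-q}$ is the number of other queries using template $t$, and $Q$ is the number of queries.

$$
p(t_q = t \mid \mathbf{w}, \mathbf{t}_{-q}) =
\frac{p(\mathbf{w}_q \mid t_q = t, \mathbf{t}_{-q}, \mathbf{w}_{-q})\, p(t_q = t \mid \mathbf{t}_{-q})}
{\sum_{t'=1}^{T} p(\mathbf{w}_q \mid t_q = t', \mathbf{t}_{-q}, \mathbf{w}_{-q})\, p(t_q = t' \mid \mathbf{t}_{-q})}
$$

$$
p(t_q = t \mid \mathbf{t}_{-q}) = \frac{m_{t,-q} + \sigma}{Q - 1 + T\sigma}
$$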

14 Model Learning

15 Model Learning

16 Overview
Random initialization, then:
Iterate over the queries and the template set, using the derived conditionals to update the vectors.
Compute the likelihood $p(\phi, \mu, \gamma \mid \text{data})$.
Ends with: query → template, word → attribute.
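The loop on this slide can be illustrated with a deliberately simplified stand-in, which is not the paper's sampler. The sketch is a collapsed Gibbs sampler for a Dirichlet-multinomial mixture. Each template is treated as a single word distribution, so the sampler only produces the query → template output, not word → attribute. It does follow the same outline: random initialization, repeated resampling of each query's template from its conditional, and a log-likelihood trace. `gibbs_dmm` is a hypothetical name. `sigma` and `beta` only mirror the slide's symbols.

```python
import math
import random
from collections import Counter


def gibbs_dmm(queries, T, sigma=0.1, beta=0.1, iters=30, seed=0):
    """Collapsed Gibbs sampler for a Dirichlet-multinomial mixture over queries.

    Simplified stand-in: each template is a single word distribution.
    Returns the final query -> template assignment and a log-likelihood trace.
    """
    rng = random.Random(seed)
    docs = [q.split() for q in queries]
    V = len({w for d in docs for w in d})

    # Random initialization of the query -> template assignments.
    t = [rng.randrange(T) for _ in docs]
    m = Counter(t)                      # number of queries per template
    nw = [Counter() for _ in range(T)]  # word counts per template
    n = [0] * T                         # total words per template
    for d, k in zip(docs, t):
        nw[k].update(d)
        n[k] += len(d)

    def log_weight(d, k):
        # Unnormalized log p(t_q = k | rest), with query q's counts removed.
        lw = math.log(m[k] + sigma)
        for w, cnt in Counter(d).items():
            for j in range(cnt):
                lw += math.log(nw[k][w] + beta + j)
        for i in range(len(d)):
            lw -= math.log(n[k] + V * beta + i)
        return lw

    def log_joint():
        # Collapsed log p(words, templates): mixture weights and word
        # distributions are integrated out.
        lp = math.lgamma(T * sigma) - math.lgamma(len(docs) + T * sigma)
        for k in range(T):
            lp += math.lgamma(m[k] + sigma) - math.lgamma(sigma)
            lp += math.lgamma(V * beta) - math.lgamma(n[k] + V * beta)
            lp += sum(math.lgamma(c + beta) - math.lgamma(beta) for c in nw[k].values())
        return lp

    trace = []
    for _ in range(iters):
        for q, d in enumerate(docs):
            k = t[q]
            m[k] -= 1            # remove query q from its current template
            nw[k].subtract(d)
            n[k] -= len(d)
            logs = [log_weight(d, j) for j in range(T)]
            top = max(logs)      # subtract the max for numerical stability
            k = rng.choices(range(T), weights=[math.exp(x - top) for x in logs])[0]
            t[q] = k             # add query q back under the sampled template
            m[k] += 1
            nw[k].update(d)
            n[k] += len(d)
        trace.append(log_joint())
    return t, trace
```

In the full model, the same loop would also resample each word's attribute, and the templates themselves, from their own conditionals.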

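17 Alternative approaches
Latent Dirichlet Allocation (LDA), mapped onto this problem: a document corresponds to a query, the i-th word of a document to the i-th word of a query, and topics to attributes. Each topic's multinomial word distribution φ plays the role of an attribute's vocabulary.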

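18 Alternative approaches
Spherical k-means: each word $w_u$ is represented by its co-occurrence vector $(n_{u,1}, n_{u,2}, n_{u,3}, n_{u,4}, \ldots)$, where $n_{u,v}$ is the number of times $w_u$ and $w_v$ occur together in queries. Clustering the words by this co-occurrence behavior yields clusters of words, which are taken as the attributes.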
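Here is a minimal sketch of this baseline, assuming NumPy. Two details are assumptions not given on the slide. Each pair of distinct words is counted at most once per query, and the initial centers are random rows. `cooccurrence` and `spherical_kmeans` are hypothetical helper names.

```python
from itertools import combinations

import numpy as np


def cooccurrence(queries):
    """n[u, v] = number of queries in which words u and v occur together."""
    vocab = sorted({w for q in queries for w in q.split()})
    idx = {w: i for i, w in enumerate(vocab)}
    n = np.zeros((len(vocab), len(vocab)))
    for q in queries:
        for u, v in combinations(sorted(set(q.split())), 2):
            n[idx[u], idx[v]] += 1
            n[idx[v], idx[u]] += 1
    return vocab, n


def spherical_kmeans(X, k, iters=100, seed=0):
    """Cluster rows of X by cosine similarity; returns one label per row."""
    rng = np.random.default_rng(seed)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.where(norms == 0, 1.0, norms)       # project rows onto the unit sphere
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmax(X @ centers.T, axis=1)  # assign each word to the most similar center
        new = centers.copy()
        for j in range(k):
            s = X[labels == j].sum(axis=0)
            if np.linalg.norm(s) > 0:
                new[j] = s / np.linalg.norm(s)     # normalized mean direction of the cluster
        if np.allclose(new, centers):
            break
        centers = new
    return labels
```

Each resulting cluster of words is read as one attribute. Unlike the generative model, this baseline has no notion of templates or of how many words an attribute contributes.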

19 Experiments
Queries: 100 million Yahoo search queries, with 43,793, 83,387 and 15,050 queries in the individual domains. Ground truth: manually extracted.

20 Experiments
Figure panels: Automobile domain, Travel domain, Movies domain.
Correctly placed: a word is correctly placed if the learnt attribute and the ground-truth attribute it belongs to are mapped to each other.
PRECISION(N) is the fraction of words in the first N learnt attributes (in the algorithm's ordering) that are correctly placed.
CORRECTRECALL(N) is the fraction of words in the ground-truth attributes mapped to the first N learnt attributes that are correctly placed.
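The two metrics can be computed directly once the mapping between learnt and ground-truth attributes is fixed. In this sketch the mapping is passed in as a dict, because the slide does not say how it is obtained. `order` is the algorithm's ordering of learnt attributes. The function names are hypothetical.

```python
def precision_at(N, order, learnt, truth, mapping):
    """PRECISION(N): fraction of words in the first N learnt attributes that are correctly placed."""
    pairs = [(a, w) for a in order[:N] for w in learnt[a]]
    if not pairs:
        return 0.0
    # a word is correctly placed if it belongs to the ground-truth attribute mapped to its learnt attribute
    ok = sum(1 for a, w in pairs if a in mapping and w in truth[mapping[a]])
    return ok / len(pairs)


def correct_recall_at(N, order, learnt, truth, mapping):
    """CORRECTRECALL(N): fraction of words in the ground-truth attributes mapped to
    the first N learnt attributes that are correctly placed."""
    mapped = [(a, mapping[a]) for a in order[:N] if a in mapping]
    total = sum(len(truth[g]) for _, g in mapped)
    if total == 0:
        return 0.0
    ok = sum(1 for a, g in mapped for w in truth[g] if w in learnt[a])
    return ok / total
```

Take `learnt = {"A1": {"honda", "toyota", "2005"}, "A2": {"civic", "corolla"}}`, with ground-truth Brand, Model and Year and the mapping A1 → Brand, A2 → Model. Both `precision_at(2, ...)` and `correct_recall_at(2, ...)` equal 4/5. "2005" sits in A1 but belongs to Year, and "ford" is a Brand word missing from A1.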

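21 Experiments
Figure panels: number of iterations; number of attributes; number of words per attribute; different values of $\beta$.
$\beta$ acts as a prior to the attribute-word multinomial distributions $\phi_a$. $g_1$ and $g_2$ are used to generate the Poisson rate $\eta_a$ of each attribute, which governs the number of words the attribute contributes to a query.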

22 Case Study
Application: click-through rate (CTR) on sponsored search advertisements, i.e. inferring a query's advertisability, its tendency to attract ad clicks from users.
$f_{q_1} = \sum_{i>1} q_i$

23 Thanks for listening ~

