Learning Joint Query Interpretation and Response Ranking Uma Sawant Soumen Chakrabarti IIT Bombay.

1 Learning Joint Query Interpretation and Response Ranking
Uma Sawant, Soumen Chakrabarti (IIT Bombay)

2 Searching the “Web of things”
Lin et al., WWW 2012
At least 14% of Web search queries mention a target type or category

3 Telegraphic entity search queries
Telegraphic queries with target type:
- woodrow wilson president university
- dolly clone institute
- hermitage museum bank river
- lead singer led zeppelin band
- losing team baseball world series 1998
No reliable syntax clues for the search engine:
- Free word order
- No or rare capitalization
- Rare to find quoted phrases
- Few function or relational words

4 How to answer entity queries? (simplified view of related work)
[Diagram labels: 2-stage process; Telegraphic / NLQ / Template query; Query interpretation; Execution-ready query; Ranking; Knowledge base; e1, e2, e3]

5 Our Proposal
Joint query interpretation and ranking
[Diagram labels: Telegraphic query; Annotated corpus; Generative and discriminative models; Multiple interpretations, each paired with a response; e1, e2, e3]

6 The annotated Web
Annotated document: “… By comparison, the Padres have been to two World Series, losing in 1984 and 1998. …”
Entity: San_Diego_Padres
Type: Major_league_baseball_teams
Type: All
[Diagram labels: type hierarchy; edges subTypeOf, instanceOf, mentionOf]

7 Query = type hints + word matchers
Query: losing team baseball world series 1998
Incorrect type: World_Series_Hockey_teams
- Large type catalog: most query words match some type
- Padres rarely co-occurs with hockey: can know this only from corpus stats

8 Query = type hints + word matchers
Query: losing team baseball world series 1998
Correct type: Major_league_baseball_teams
Entity: San Diego Padres
Evidence snippet: “By comparison, the Padres have been to two World Series, losing in 1984 and 1998.”
- Large type catalog: most query words match some type
- Padres rarely co-occurs with hockey: can know this only from corpus stats
- Need joint type inference and snippet scoring
[Diagram labels: mentionOf, instanceOf, word matches]

9 Generative model: generate query from entity
Entity (E): San Diego Padres
Type (T): Major league baseball team
Context: Padres have been to two World Series, losing in 1984 and 1998
Query (q): losing team baseball world series 1998
- Type hint: baseball, team
- Context matchers: lost, 1998, world series
[Diagram labels: type, context, switch model (Z)]

10 Generative approach: plate diagram
For each query:
- Choose entity (E)
- Choose type to describe entity (T)
For each query word…
- “Switch” variables (Z): word hints at type or is a matcher?
- Generate query word (W)
[Plate diagram labels: type description language model (hints), entity context language model (matchers); two parameter symbols lost in extraction]
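
The generative story on this slide can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the add-one smoothed unigram models, the fixed switch probability `p_hint`, the `vocab_size` constant, and the toy type descriptions are all hypothetical, and the entity prior and Pr(t|e) are left out. Each query word is generated either from the type description language model (hint) or from the entity context language model (matcher), and the switch is marginalised out.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"\w+", text.lower())


def lm_prob(word, text, vocab_size=1000, mu=1.0):
    """Smoothed unigram probability of `word` under the words of `text`.

    vocab_size and mu are illustrative constants, not values from the talk.
    """
    counts = Counter(tokens(text))
    total = sum(counts.values())
    return (counts[word] + mu) / (total + mu * vocab_size)


def query_log_likelihood(query, type_description, entity_context, p_hint=0.5):
    """log Pr(q | e, t), summing each word's switch over {hint, matcher}."""
    log_p = 0.0
    for word in tokens(query):
        p_type = lm_prob(word, type_description)    # word acts as a type hint
        p_context = lm_prob(word, entity_context)   # word acts as a matcher
        log_p += math.log(p_hint * p_type + (1.0 - p_hint) * p_context)
    return log_p


query = "losing team baseball world series 1998"
context = "Padres have been to two World Series, losing in 1984 and 1998"
good = query_log_likelihood(query, "major league baseball team", context)
bad = query_log_likelihood(query, "river", context)
print(good > bad)
```

With the same evidence snippet, the type whose description shares words with the query gets the higher likelihood, which is the effect the hint/matcher split is meant to capture.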

11 Discriminative model: separate correct and incorrect entities
q: losing team baseball world series 1998
San_Diego_Padres: losing team baseball world series 1998 (t = baseball team) [shown three times on the slide; word highlighting lost in extraction]
1998_World_Series: losing team baseball world series 1998 (t = series) [shown three times on the slide; word highlighting lost in extraction]

12 Feature vector design inspired by generative model
Feature vector given query, entity, type, switches:
- Models entity prior
- Models type prior Pr(t|e)
- Compatibility between hint words and type (hints)
- Compatibility between matchers and snippets that mention e (matchers)
Generative: [formula lost in extraction]
Discriminative: [formula lost in extraction]
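
The expressions that followed the "Generative:" and "Discriminative:" labels are not in this transcript. One way to write the correspondence that the four feature descriptions suggest is sketched below. The notation ($w$, $\phi_1,\dots,\phi_4$, the indicator exponents) is an assumption made for illustration and may differ from the authors' own.

```latex
\begin{align*}
\text{Generative:}\quad
  & \Pr(e)\,\Pr(t \mid e)\,\prod_{j} \Pr(z_j)\,
    \Pr(q_j \mid t)^{[z_j = \text{hint}]}\,
    \Pr(q_j \mid e)^{[z_j = \text{matcher}]} \\
\text{Discriminative:}\quad
  & w^{\top}\phi(q, e, t, z), \qquad
    \phi(q, e, t, z) =
    \bigl(\phi_1(q, e),\; \phi_2(t, e),\; \phi_3(q, z, t),\; \phi_4(q, z, e)\bigr)
\end{align*}
```

Here $\phi_1$ plays the role of the entity prior, $\phi_2$ the type prior, $\phi_3$ the hint-type compatibility, and $\phi_4$ the matcher-snippet compatibility, so each factor of the generative product has a feature block in the discriminative score.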

13 Discriminative framework
- Non-convex formulation
- Annealing algorithms
- Constraints are formulated using the best scoring interpretation
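
A minimal sketch of what "constraints formulated using the best scoring interpretation" could look like in code. The linear score, the pairwise hinge with margin 1, and the toy feature values are assumptions for illustration; the training procedure itself (the annealing step) is not shown.

```python
def dot(weights, features):
    """Linear score w . phi for one interpretation."""
    return sum(w * f for w, f in zip(weights, features))


def best_interpretation(weights, interpretations):
    """Return (score, label) of the highest-scoring interpretation of an entity.

    `interpretations` is a list of (label, feature_vector) pairs, one per
    candidate (type, hint/matcher split).
    """
    return max((dot(weights, feats), label) for label, feats in interpretations)


def pairwise_hinge(weights, relevant, irrelevant, margin=1.0):
    """Loss is zero when the relevant entity's best interpretation beats the
    irrelevant entity's best interpretation by at least `margin`."""
    s_pos, _ = best_interpretation(weights, relevant)
    s_neg, _ = best_interpretation(weights, irrelevant)
    return max(0.0, margin - (s_pos - s_neg))


# Hypothetical two-dimensional features for two candidate entities.
weights = [1.0, 2.0]
padres = [("t=baseball team", [0.5, 0.9]), ("t=series", [0.5, 0.1])]
series_1998 = [("t=baseball team", [0.6, 0.0]), ("t=series", [0.6, 0.3])]

score, label = best_interpretation(weights, padres)
loss = pairwise_hinge(weights, padres, series_1998)
print(label, loss)
```

Because the relevant entity's score is itself a maximum over interpretations, the constraint is not convex in the weights, which is consistent with the slide's first bullet.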

14 Testbed
- YAGO entity and type catalog: ~0.2 million types and 1.9 million entities
- Annotated corpus: Web corpus of 500 million pages, ~16 annotations per page
- ~700 entity search queries from TREC + INEX, converted to telegraphic form, with most probable type and answer entities

15 Experiment 1: Entity ranking using joint inference
- To reach: human recommended type
- To surpass: most generic type in catalog (no type inference)
- Entity-level NDCG measure (MAP and MRR follow the same trend; details in paper)
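
For readers unfamiliar with the measure, here is a short sketch of NDCG at rank k using the common log2 discount. It assumes the ideal ranking is obtained by sorting the gains of the returned list; an evaluation that knows all relevant entities for a query would compute the ideal DCG from that full set instead.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain of the first k results (ranks start at 1)."""
    return sum(g / math.log2(rank + 1)
               for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked_gains, k):
    """DCG normalised by the DCG of the best possible ordering of the list."""
    ideal = dcg_at_k(sorted(ranked_gains, reverse=True), k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0


# 1 marks a correct answer entity, 0 an incorrect one.
print(ndcg_at_k([1, 0, 0], 3))
print(ndcg_at_k([0, 1], 2))
```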

16 Human > Discriminative > Generative > Generic
- Generative significantly better than generic (lower bound); generative fills 28% of the gap to human (upper bound)
- Discriminative significantly better than generic (lower bound); discriminative fills 43% of the gap to human (upper bound)
- Discriminative significantly better than generative: easier to balance diverse scales of probabilities
[Plot: NDCG (y-axis, 0.1 to 0.8) vs. rank (x-axis, 1 to 10) for human, discriminative, generative, generic]
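
The slide does not define how the "gap" percentages are computed. A natural reading, stated here as an assumption, is the share of the NDCG distance between the generic baseline and the human-type upper bound that a method recovers:

```latex
\[
\text{gap filled} =
\frac{\mathrm{NDCG}_{\text{method}} - \mathrm{NDCG}_{\text{generic}}}
     {\mathrm{NDCG}_{\text{human}} - \mathrm{NDCG}_{\text{generic}}}
\]
```

Under this reading a value of 0 means no improvement over the generic type and a value of 1 means matching the human-recommended type.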

17 Generic v/s discriminative
- Correct hint match & type choice: cathedral claude monet painting
- Incorrect hint match & type choice: amazing grace hymn writer

18 Discriminative better than human
- Correct entity unreachable from human recommended type; discriminative recovers using corpus feedback
Query: patsy cline producer
[Diagram labels: producer, manufacturer, Discriminative, Owen Bradley]

19 Experiment 2: Target type inference
- Aggregate ranks of top-k interpretations to rank types
- Compare type-level NDCG with B&N 2012
Top-k interpretations: hermitage museum bank river (museum); hermitage museum bank river (river); hermitage museum bank river (building); ...
Possible target types: river, museum, building
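
The slide does not say which aggregation rule is used. The sketch below assumes a reciprocal-rank sum, purely as an illustration: each of the top-k interpretations votes for its type with weight 1/rank, and types are ordered by total vote. The entity placeholders e1..e4 and the ordering of the toy list are hypothetical.

```python
from collections import defaultdict


def rank_types(interpretations, k):
    """Rank types by summed reciprocal rank over the top-k interpretations.

    `interpretations` is a best-first list of (entity, type) pairs.
    """
    votes = defaultdict(float)
    for rank, (_entity, type_name) in enumerate(interpretations[:k], start=1):
        votes[type_name] += 1.0 / rank
    # Sort by descending vote, breaking ties alphabetically for determinism.
    return sorted(votes, key=lambda t: (-votes[t], t))


top_interpretations = [
    ("e1", "river"),
    ("e2", "museum"),
    ("e3", "river"),
    ("e4", "building"),
]
print(rank_types(top_interpretations, k=4))
```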

20 Joint prediction improves type inference
- Data: [B&N 2012], DBpedia catalog
- Joint prediction improves type inference too!

21 Experiment 3: joint v/s two-stage
- Two-stage:
  1. Best type prediction from Experiment 2
  2. Launch type-restricted query on annotated corpus
- Top m types to improve recall
- Measure entity-level NDCG
[Diagram: Stage 1, type inference (river, museum, building); form query, e.g. (river) + matchers or (river OR museum) + matchers; Stage 2, ranking]
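
The query-forming step of the two-stage baseline can be sketched as follows. The string format simply mirrors the "(river OR museum) + matchers" notation on the slide; it is not the syntax of any particular search engine, and the function name and the choice of matcher words are hypothetical.

```python
def type_restricted_query(ranked_types, matchers, m=1):
    """Build a query restricted to the top-m predicted types.

    ranked_types: types from stage 1, best first.
    matchers: query words treated as word matchers.
    """
    top = ranked_types[:m]
    if not top:
        raise ValueError("need at least one predicted type")
    return f"({' OR '.join(top)}) + {' '.join(matchers)}"


predicted = ["river", "museum", "building"]
print(type_restricted_query(predicted, ["hermitage", "bank"], m=1))
print(type_restricted_query(predicted, ["hermitage", "bank"], m=2))
```

Raising m widens the type restriction, which is how the baseline trades precision for type recall.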

22 Joint entity ranking better than two-stage
- Using more types in 2-stage (larger m) makes little difference
- Joint type prediction and ranking significantly better than 2-stage
[Plot: NDCG (y-axis, 0.2 to 0.6) vs. rank (x-axis, 1 to 10) for Joint, 2stage(m=1), 2stage(m=5), 2stage(m=10)]

23 Conclusion
- A large percentage of Web search queries contain a mention of the target type
- Identification of target type hint words and the type itself is rewarding, but non-trivial
- Joint query interpretation and ranking approach significantly better than two-stage
- Joint prediction improves type inference
- Datasets available at bit.ly/WSpxvr

24 Questions?

25 References
1) Patrick Pantel, Thomas Lin, Michael Gamon: Mining Entity Types from Query Logs via User Intent Modeling. ACL (1) 2012: 563-571
2) K. Balog, R. Neumayer: Hierarchical Target Type Identification for Entity-oriented Queries. CIKM 2012
3) T. Lin, P. Pantel, M. Gamon, A. Kannan, A. Fuxman: Active Objects: Actions for Entity-Centric Search. WWW 2012

26 Extra slides

27 Components of the model
- Entity prior: (weighted) fraction of snippets attached to an entity in the corpus
- Type: generality or specificity of types
- Hint-type compatibility: probability of generating hint words from a language model built using the type description; hint sub-sequence matches some type name exactly
- Matcher-entity compatibility: weighted fraction of snippets attached to an entity, retrieved using matchers; rarity of matchers + number of supporting snippets

28 Implementation details
- Additive features: one generic query executed on index, rest in memory
- Pruned large search space using easy heuristics: continuous hint words
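
One reading of the "continuous hint words" heuristic, taken here as an assumption, is that the hint words must form a single contiguous span of the query, with all remaining words acting as matchers. The sketch below enumerates those interpretations; the function name is hypothetical, and interpretations with no hint word at all are not generated.

```python
def contiguous_interpretations(query):
    """Enumerate (hints, matchers) splits where hints are one contiguous span."""
    words = query.split()
    n = len(words)
    splits = []
    for start in range(n):
        for end in range(start + 1, n + 1):
            hints = words[start:end]
            matchers = words[:start] + words[end:]
            splits.append((hints, matchers))
    return splits


splits = contiguous_interpretations("losing team baseball world series 1998")
print(len(splits))
```

For an n-word query this yields n(n+1)/2 candidate splits instead of the 2^n possible assignments of every word to hint or matcher, which is the kind of pruning the bullet describes.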

29 Not entity disambiguation in query
- ymca in query refers to song or organization?
- Similar to entity disambiguation in documents
- Uses accompanying words
- Misinterpreting target type: usually disastrous
- Avoid early or hard commitment
Query: ymca lyrics; Entity: YMCA_(song); Type: Music
Query: ymca address; Entity: YMCA_(org); Type: Organization
[Diagram labels: instanceOf, learn topic model]

30 Future work
- Better type description model
- More generic query than “hint+matchers”
- Entities as literals
- Different models
- Explore non-linear models (boosting)
- List-wise loss
- Use click data

31 Generative framework
For each query…
- Choose entity to describe (E)
- Choose type to describe entity (T)
For each query word…
- “Switch” variables (Z): decide if word hints at type or is a matcher
- Generate query word (W)
[Plate diagram labels: type description language model, entity context language model; two parameter symbols lost in extraction]

32 Discriminative framework
Feature vector given query, entity, type, switches:
- Models entity prior
- Models type prior Pr(t|e)
- Compatibility between hint words and type (hints)
- Compatibility between matchers and snippets that mention e (matchers)
Given q, score of response e is: [formula lost in extraction]
Ranking model trained by distant supervision

33 Joint entity ranking better than two-stage
- State of the art target type predictor: does not use corpus information
- Pick top k types to improve type recall
- Launch type-restricted query on annotated corpus
- Significantly worse than joint type prediction and ranking

34 How to answer entity queries? (simplified view of related work)
[Diagram labels: 2-stage process; Telegraphic / NLQ / Template query; Query interpretation; Execution-ready query; Ranking; RDF tuples; Annotated corpus; Tables; Knowledge; e1, e2, e3]

