
1 GENERATING RELEVANT AND DIVERSE QUERY PHRASE SUGGESTIONS USING TOPICAL N-GRAMS ELENA HIRST

2 WHAT IS THE PROBLEM TO BE SOLVED?
- Query logs are not always available, and are not always the best source for query suggestions
- Most user queries do not provide enough information on their own

3 WHO CARES ABOUT THE PROBLEM?
- Builders of search systems that have no query logs
- Users of those systems

4 WHAT HAVE OTHERS DONE?
- Most query suggestion work relies on query logs
- Non-query-log approaches to suggesting alternate queries include:
  - Adding frequent terms that occur in close proximity to the query terms
  - Auto-completion of the last term
  - N-gram suggestions, which differ from this paper's approach in that they rank possible completions by raw n-gram occurrence frequency

5 WHAT IS THE PROPOSED SOLUTION?
- Rank possible phrase completions by semantic relatedness rather than raw frequency
- Use the Topical N-gram (TNG) model to capture hidden topics

6 RANKING OF PHRASES
- P is the set of candidate phrases extracted as n-grams from the corpus
- Q_u is the user query, Q_c is the portion already completed, and Q_t is the uncompleted portion, so Q_u = Q_c + Q_t
- Candidate phrases are ranked by the probability that Q_t occurs given Q_c
- Hidden topics are used when estimating this probability (see the sketch below)
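
Below is a minimal sketch of this ranking idea, assuming we already have per-topic term probabilities p(w | z) and a topic distribution p(z | Q_c) inferred from the completed portion of the query; the names topic_term_prob, completion_score, and rank_completions, as well as the toy numbers, are illustrative and not taken from the paper.

```python
# Minimal sketch: rank candidate completions Q_t by their probability given the
# completed portion Q_c, marginalizing over hidden topics:
#   score(Q_t) = sum_z p(z | Q_c) * prod_{w in Q_t} p(w | z)
# topic_term_prob and the toy numbers below are illustrative assumptions.

from math import prod

# p(w | z): probability of a term under each hidden topic (toy values)
topic_term_prob = {
    0: {"retrieval": 0.05, "engine": 0.04, "oil": 0.001},
    1: {"retrieval": 0.001, "engine": 0.03, "oil": 0.06},
}

def completion_score(q_t_terms, topic_given_context):
    """Sum over hidden topics of p(z | Q_c) times the product of p(w | z)."""
    return sum(
        p_z * prod(topic_term_prob[z].get(w, 1e-9) for w in q_t_terms)
        for z, p_z in topic_given_context.items()
    )

def rank_completions(candidates, topic_given_context):
    """Rank candidate completions (lists of terms) by score, highest first."""
    return sorted(candidates,
                  key=lambda q_t: completion_score(q_t, topic_given_context),
                  reverse=True)

# Example: completed portion Q_c = "search", with inferred p(z | Q_c) = {0: 0.8, 1: 0.2}
candidates = [["engine", "retrieval"], ["engine", "oil"]]
print(rank_completions(candidates, {0: 0.8, 1: 0.2}))
```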

7 N-GRAM MODELING
- Find bigrams in the document corpus
- Concatenate overlapping bigrams to form longer n-gram phrases (sketched below)
- This produces a cleaner phrase list
- The resulting phrases are more applicable to search-engine use
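
A rough sketch of the bigram-concatenation step follows; the frequency threshold and the chaining rule are assumptions made for illustration, not the paper's exact procedure.

```python
# Rough sketch: build longer phrases by chaining overlapping frequent bigrams.
# The min_count threshold and chaining rule are illustrative assumptions.

from collections import Counter

def extract_bigrams(sentences, min_count=2):
    """Count bigrams over tokenized sentences and keep the frequent ones."""
    counts = Counter(
        (toks[i], toks[i + 1]) for toks in sentences for i in range(len(toks) - 1)
    )
    return {bg for bg, c in counts.items() if c >= min_count}

def chain_bigrams(tokens, bigrams):
    """Walk a token sequence and concatenate runs of adjacent frequent bigrams
    into longer n-gram phrases."""
    phrases, current = [], []
    for i in range(len(tokens) - 1):
        if (tokens[i], tokens[i + 1]) in bigrams:
            current = current or [tokens[i]]
            current.append(tokens[i + 1])
        else:
            if len(current) > 1:
                phrases.append(" ".join(current))
            current = []
    if len(current) > 1:
        phrases.append(" ".join(current))
    return phrases

sentences = [
    ["query", "phrase", "suggestion", "with", "topical", "ngrams"],
    ["query", "phrase", "suggestion", "using", "topical", "ngrams"],
]
bigrams = extract_bigrams(sentences)
print(chain_bigrams(sentences[0], bigrams))  # ['query phrase suggestion', 'topical ngrams']
```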

8 EXPERIMENT DESIGN
- AP News and Labour news datasets
- Standard n-gram generation extracted only 1-, 2-, and 3-grams
- The TNG n-gram model found phrases of up to 10 words
- Relevance and diversity were used to evaluate efficacy
- 20 test queries were generated from the titles of articles

9 RESULTS
- TNG ranked by probability performs better than standard n-grams
- TNG with hidden topics provides "topically diverse" and "semantically related" results

10 RESULTS
- Relevance is highest for TNGSim
- Diversity is also highest for TNGSim

11 RESULTS
- Clarity scores were used to measure retrieval effectiveness
- Clarity measures the difference between the query language model and the corpus language model; higher scores are better (a sketch of the computation follows the table)
- The TNG models did not perform well on clarity
- The authors claim that clarity is less important than retrieving semantically related results

Clarity scores:
            AP News dataset  Labour dataset
NgramsProb  4.9              3.5
TNGProb     4.2              2.7
TNGSim      4.23             2.8
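
As a rough sketch, a clarity score can be computed as the KL divergence between a query language model and the corpus language model; the toy distributions and smoothing constant below are assumptions for illustration, not the paper's data.

```python
# Rough sketch of a clarity score: the KL divergence between the query
# language model p(w | query) and the corpus language model p(w | corpus).
# The toy distributions and the eps smoothing constant are illustrative.

from math import log2

def clarity(query_lm, corpus_lm, eps=1e-9):
    """KL(query_lm || corpus_lm) in bits; higher means the query language
    model is more distinct from the background corpus."""
    return sum(
        p * log2(p / corpus_lm.get(w, eps))
        for w, p in query_lm.items()
        if p > 0
    )

# Toy example: a focused query model vs. a broad corpus model
corpus_lm = {"the": 0.5, "engine": 0.2, "oil": 0.2, "retrieval": 0.1}
query_lm = {"engine": 0.6, "retrieval": 0.4}
print(round(clarity(query_lm, corpus_lm), 3))
```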

12 CONCLUSION
- The TNG model can be used effectively in systems without query logs
- It is a good fit for domain-specific search engines

