Mining Dependency Relations for Query Expansion in Passage Retrieval
Renxu Sun, Chai-Huat Ong, Tat-Seng Chua
National University of Singapore
SIGIR 2006

2 Introduction
Query expansion (QE) is a method for improving the effectiveness of IR
– by providing additional contextual information to the original queries
Traditional passage retrieval algorithms perform a density-based weighting of query terms
– they prefer passages containing query terms that are close together

3 Introduction
Local Context Analysis (LCA) [Croft, 1996]
– a common QE technique based on term co-occurrence statistics
– utilizes only statistical information instead of semantic information
– unable to differentiate between noisy and good-quality expansion terms

4 Introduction
[Cui et al., 2005]
– uses a fuzzy dependency relation matching method for passage retrieval
– achieves a significant improvement in MRR over density-based passage retrieval systems
– points towards the importance of performing syntactic analysis
– longer queries benefit more from this method
Query expansion is needed for short queries

5 Introduction
The main contribution of this paper is employing a relation-based model to perform:
– contextual term selection to enhance density-based passage retrieval
– relation extraction to enhance the fuzzy dependency relation matching approach
To make the expansion process more robust, it extracts relations and terms from an external corpus (the Web).

6 Query Expansion Based on Dependency Relation
Fig. Framework of Relation Based Query Expansion

7 Dependency Relation Paths from Web Snippets
The Web is considered as a parallel corpus:
1. Send the queries to Google and collect the top k snippets (k=100, similar to LCA [Croft, 1996])
2. Treat each sentence as a passage; each snippet contains 2 sentences on average
3. Use Minipar, a dependency grammar parser, to parse the passages

8 Examples of Parse Tree
Fig. The parse trees of the sample question and sentence, with an example relation path. The directions of relations are ignored in the experiments.
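The idea of a relation path with directions ignored can be sketched as a path search over the parse tree treated as an undirected graph. This is only an illustration, not the authors' implementation: the function name `relation_path`, the triple format, and the sample parse are hypothetical, and the relation labels are merely written in the style of Minipar output.

```python
from collections import deque

def relation_path(edges, start, end):
    """Return the relation labels on the path from start to end, or None.

    edges: (head, relation, dependent) triples from a dependency parse.
    Directions are ignored, so every edge can be walked both ways.
    """
    graph = {}
    for head, rel, dep in edges:
        graph.setdefault(head, []).append((dep, rel))
        graph.setdefault(dep, []).append((head, rel))

    # Breadth-first search; in a tree the path between two nodes is unique.
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == end:
            return path
        for neighbor, rel in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [rel]))
    return None

# Hypothetical parse of "Armstrong walked on the moon" (illustrative labels).
edges = [
    ("walked", "subj", "Armstrong"),
    ("walked", "mod", "on"),
    ("on", "pcomp-n", "moon"),
]
print(relation_path(edges, "Armstrong", "moon"))
```

Because directions are dropped, the path from "moon" back to "Armstrong" is the same list of relations in reverse order.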

9 Term Expansion for Density Based Retrieval System (1/2)
Ranking candidate expanded terms
– a variant of the formula in LCA
– global importance: IDF of the expanded term
– local importance: the relation path linking to the query term
Adding the top k terms to the original query with weight (1-0.9*i/k)

10 Term Expansion for Density Based Retrieval System (2/2)
where
– $T_k$ = the term to be ranked
– $idf_{T_k} = \max(1.0, \log_{10}(N / N_{T_k}))$
– $idf_{t_i} = \max(1.0, \log_{10}(N / N_{t_i}))$
– $p_j$ = the $j$th passage in the passage set $P$
– $score(Rel_i)$ = the score of an individual relation, obtained through training
– $\delta$ is set to 0.1 to avoid zero values
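The two pieces of the term expansion step that the slides state explicitly, the IDF definition and the rank-based weight (1-0.9*i/k), can be sketched as follows. This is a minimal sketch under stated assumptions: N is taken to be the number of passages and N_T the number of passages containing the term (the slides do not define them), the rank i is assumed to start at 0, and original query terms are assumed to keep weight 1.0. The ranking formula itself is not reproduced here, so `ranked_candidates` is simply taken as given. All function names are hypothetical.

```python
import math

def idf(num_passages, num_containing):
    """idf = max(1.0, log10(N / N_T)), as defined on the slide.

    Assumption: N is the passage count and N_T the count of passages
    containing the term; the slide does not spell this out.
    """
    return max(1.0, math.log10(num_passages / num_containing))

def expansion_weights(k):
    """Weight 1 - 0.9*i/k for the term at rank i (assumed 0-based)."""
    return [1.0 - 0.9 * i / k for i in range(k)]

def expand_query(query_terms, ranked_candidates, k):
    """Add the top k candidates to the query with rank-based weights.

    Assumption: original query terms keep weight 1.0 and are not
    overwritten if they also appear among the candidates.
    """
    expanded = {term: 1.0 for term in query_terms}
    for term, weight in zip(ranked_candidates[:k], expansion_weights(k)):
        expanded.setdefault(term, weight)
    return expanded

# Hypothetical query and candidate list, already ranked best first.
print(expand_query(["moon", "landing"], ["apollo", "armstrong", "nasa"], k=2))
```

With this weighting the best candidate enters at full weight and the weights fall linearly towards 0.1 as the rank approaches k.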

11 Relation Based Retrieval Method (RBM)
RBM is used to perform passage re-ranking based on the initial retrieval result obtained by the density based method (DBM).
The similarity between a passage S and the query Q is computed by finding all possible relation path pairs ($P_S$, $P_Q$) from S and Q that have the same starting and ending nodes.
The translation probability Prob($P_S$|$P_Q$) is the sum over all possible alignments.
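One way to read "the sum over all possible alignments" is in the style of IBM Model 1 from statistical machine translation, where each relation in the passage path is aligned to one relation in the query path. The sketch below assumes that reading, including a uniform alignment prior of 1/m^n; it is not necessarily the exact formula used by the authors. The translation table `trans` and its values are made up for illustration. Under this assumption the sum over alignments factors into a product of per-relation sums, which the two functions compute in both ways.

```python
from itertools import product
from math import isclose, prod

def prob_brute_force(path_s, path_q, trans):
    """Sum, over every alignment, the product of relation translation probabilities."""
    n, m = len(path_s), len(path_q)
    total = 0.0
    # An alignment assigns each passage relation one query relation index.
    for alignment in product(range(m), repeat=n):
        total += prod(
            trans.get((rel_s, path_q[j]), 0.0)
            for rel_s, j in zip(path_s, alignment)
        )
    return total / m ** n  # assumed uniform alignment prior

def prob_factored(path_s, path_q, trans):
    """Same quantity, using the fact that the sum over alignments factorizes."""
    result = 1.0
    for rel_s in path_s:
        result *= sum(trans.get((rel_s, rel_q), 0.0) for rel_q in path_q)
    return result / len(path_q) ** len(path_s)

# Hypothetical relation translation probabilities P(rel_s | rel_q).
trans = {
    ("subj", "subj"): 0.7,
    ("subj", "obj"): 0.1,
    ("mod", "mod"): 0.6,
    ("mod", "subj"): 0.2,
    ("pcomp-n", "mod"): 0.3,
}
path_s = ["subj", "mod", "pcomp-n"]
path_q = ["subj", "mod"]
print(isclose(prob_brute_force(path_s, path_q, trans),
              prob_factored(path_s, path_q, trans)))
```

The factored form needs n*m lookups instead of m^n alignments, which is why alignment models of this kind stay cheap even for long paths.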

12 Relation Path Expansion
A technique to be used on top of the fuzzy relation based retrieval [Cui et al., 2005]
The path expansion technique extracts additional relation paths linking the expanded terms with the original query terms.
For each expanded term $T_k$, select the associated path with the maximum path_score($T_k$, t, j) to be expanded, weighted by (1-0.9*i/k)

13 Model Training
Retrieve the top 100 snippets from Google for each $Q_i$.
A path in the snippets is "relevant" if it infers a useful term for the question.
Employ a unigram language model to train the weight of each relation.

14 Evaluations
The evaluations aim to verify three hypotheses:
1. Incorporating the dependency relation based query expansion technique is effective for selecting high-quality terms in a density based method.
2. Using the dependency relation based query expansion technique to extract relation paths further improves the precision of passage ranking when integrated with the fuzzy relation matching method.
3. Short queries, which have fewer key terms and are likely to have word mismatch problems, benefit more from query expansion.

15 Experiment Setup
Training data
– 10,255 factoid QA pairs from TREC-8 and TREC-9 QA tasks
– the top 100 snippets from Google for each question
– 8,892 relevant paths extracted
Testing data
– the AQUAINT news corpus
– 324 factoid questions in the TREC-12 QA task, excluding 30 questions with NIL answers and 59 questions that do not have any ground truth passages
5 comparison systems
– DBS, DBS+LCA, DBS+DRQET, RBS, RBS+DRQER

16 Experiment Result-1
Table 1. Overall performance comparison. All improvements are significant.

17 Experiment Result-2
Fig. MRR before and after query expansion vs. number of non-trivial question terms.
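For reference, MRR (mean reciprocal rank) averages, over all questions, the reciprocal of the rank at which the first relevant passage appears. The sketch below is a generic implementation of that metric, not the evaluation script behind these results. It assumes a question with no relevant passage in its ranked list contributes 0, and it applies no rank cutoff, which is an assumption since the slides do not state one. The passage ids are hypothetical.

```python
def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """Average of 1/rank of the first relevant passage per question.

    ranked_lists: one ranked list of passage ids per question.
    relevant_sets: one set of ground truth passage ids per question.
    A question with no relevant passage in its list contributes 0.
    """
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, passage_id in enumerate(ranked, start=1):
            if passage_id in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)

# Hypothetical rankings for three questions.
ranked = [["p1", "p2"], ["p3", "p4"], ["p5"]]
truth = [{"p1"}, {"p4"}, {"p9"}]
print(mean_reciprocal_rank(ranked, truth))
```

Since only the first relevant passage counts, moving a correct passage from rank 2 to rank 1 doubles that question's contribution, which is why re-ranking methods are often compared on this metric.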

18 Experiment Result-3
Testing dataset 2: 356 short queries in TREC-11 and TREC-12 QA tasks
The improvement is more significant than that in Table 1.
DBS+DRQET performs better than RBS.

19 Conclusion and Future Work
Two dependency relation based query expansion techniques, DRQET and DRQER, are presented.
The experimental results show that RBS+DRQER performs best among the 5 systems.
We also studied the relationship between query lengths and improvements by query expansion.
Directions for future work: (1) explore the use of different models and their combinations for relation based query expansion; (2) conduct detailed analysis on the performance of RBS+DRQER on different types of queries.
