
1 Dynamic Element Retrieval in a Structured Environment Crouch, Carolyn J. University of Minnesota Duluth, MN October 1, 2006

2 Key Problems
  - Retrieval of elements at the desired level of granularity
  - Assigning a rank order to each element that reflects its perceived relevance to the query

3 Retrieval Environment
  - Vector Space Model
  - INEX Environment
  - Flexible Retrieval

4 Vector Space Model
  - Document Indexing
  - Term Weighting
  - Similarity Coefficients
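
As a rough sketch of how these three pieces fit together, the Python fragment below indexes a toy document as a term-frequency vector and scores it against a query with a cosine similarity coefficient. The tokenization and raw-frequency weights are simplifications for illustration only; the actual system uses the Lnu-ltu weighting shown on later slides.

```python
# Toy vector space model: documents and queries become sparse term vectors and
# are compared with a similarity coefficient (cosine similarity here).
# Raw term frequencies stand in for real term weights.
import math
from collections import Counter

def cosine(doc_vec, query_vec):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * query_vec.get(t, 0.0) for t, w in doc_vec.items())
    norm_d = math.sqrt(sum(w * w for w in doc_vec.values()))
    norm_q = math.sqrt(sum(w * w for w in query_vec.values()))
    return dot / (norm_d * norm_q) if norm_d and norm_q else 0.0

doc = Counter("dynamic element retrieval in a structured environment".split())
query = Counter("element retrieval".split())
print(cosine(doc, query))
```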

5 INEX - Initiative for the Evaluation of XML Retrieval
  - INEX provides an environment for experiments in structured retrieval
  - Traditionally contains two types of topics: CO and CAS
  - Both INEX 2004 and 2005 utilize an evaluation measure known as inex-eval
  - Recall (the proportion of relevant information retrieved) and precision (the proportion of retrieved items that are relevant)
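
A minimal illustration of the recall and precision definitions above, computed over a ranked list of element ids. The ids and relevance judgments are made up, and this is not the inex-eval implementation, which also quantizes graded exhaustivity/specificity assessments.

```python
# Precision and recall over a ranked result list, following the definitions on
# the slide. Not the official inex-eval metric.
def precision_recall(ranked_ids, relevant_ids, cutoff=None):
    retrieved = ranked_ids[:cutoff] if cutoff else ranked_ids
    hits = sum(1 for r in retrieved if r in relevant_ids)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall

# Hypothetical ranked output and relevance judgments.
print(precision_recall(["e3", "e1", "e9", "e4"], {"e1", "e4", "e7"}, cutoff=3))
```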

6 Flexible Retrieval System
  - System processes XML documents
  - Smart format (Salton's Magic Automatic Retriever of Text)
  - Lnu-ltu term weighting

7 A Method for Flexible Retrieval
  - Input to Flexible Retrieval
  - Construction of the Document Tree
  - Ranking of Elements
  - Output of Flexible Retrieval

8 Input to Flexible Retrieval
  - Preorder traversal
  - Ranked terminal leaf nodes (paragraphs)
  - Generate document tree (schema and paragraphs)

9 Document Tree

10 Construction of the Document Tree
  - Schema determines document tree
  - Calculate Lnu-ltu term weights
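
A sketch of the bottom-up construction: each inner element's term frequencies are the merge of its children's, so only the leaf (paragraph) vectors need to be indexed and the rest of the tree can be generated dynamically from the schema. The Node class and field names below are assumptions for illustration, not the system's actual data structures.

```python
# Build element vectors bottom-up: an inner node's term frequencies are the
# sum of its children's, starting from indexed leaf (paragraph) vectors.
from collections import Counter

class Node:
    def __init__(self, tag, children=None, terms=None):
        self.tag = tag                      # e.g. "article", "sec", "p"
        self.children = children or []
        self.tf = Counter(terms or [])      # leaf paragraphs carry raw terms

def build_element_vectors(node):
    """Populate node.tf for every inner node as the merge of its subtree."""
    for child in node.children:
        build_element_vectors(child)
        node.tf.update(child.tf)
    return node.tf

doc = Node("article", [
    Node("sec", [Node("p", terms="dynamic element retrieval".split()),
                 Node("p", terms="structured retrieval".split())]),
])
build_element_vectors(doc)
print(doc.tf)   # term frequencies at the article level
```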

11 Ranking of Elements
  - Address ranking issues with Lnu-ltu term weighting
  - Length and normalization issues
  - Pivot and slope

12 Simple structured document

13 Lnu (element vector weighting formula)
   Lnu = [(1 + log(term frequency)) / (1 + log(average term frequency))]
         / [(1 − slope) + slope × (number of unique terms / pivot)]

14 ltu (query term weighting formula)
   ltu = [(1 + log(term frequency)) × log(N / nk)]
         / [(1 − slope) + slope × (number of unique terms / pivot)]
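
The two weighting functions above, written out directly as code. The slope and pivot values are placeholders for illustration, not the settings used in the experiments.

```python
# Lnu and ltu weights as given in the two formulas above.
import math

def lnu(tf, avg_tf, n_unique, slope=0.2, pivot=80.0):
    """Lnu weight of a term in an element vector."""
    num = (1 + math.log(tf)) / (1 + math.log(avg_tf))
    den = (1 - slope) + slope * (n_unique / pivot)
    return num / den

def ltu(tf, N, nk, n_unique, slope=0.2, pivot=80.0):
    """ltu weight of a query term; nk = number of elements containing the term."""
    num = (1 + math.log(tf)) * math.log(N / nk)
    den = (1 - slope) + slope * (n_unique / pivot)
    return num / den

# Placeholder statistics, for illustration only.
print(lnu(tf=3, avg_tf=1.5, n_unique=120))
print(ltu(tf=1, N=1_000_000, nk=5_000, n_unique=8))
```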

15 Overview of Flexible Retrieval
  1. Parse to extract leaf nodes from the original XML documents
  2. Index leaf nodes and queries using Smart
  3. Perform Smart retrieval to get highly correlated leaf nodes

16 Overview of Flexible Retrieval (cont.)
  4. For each document containing a retrieved leaf node:
     a. Get its document schema
     b. Generate vector representations for inner nodes (elements)
  5. For each term in the query:
     a. Get its inverted file entry and corresponding xpaths
     b. Find nk at all levels
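
One possible reading of step 5 as code: given a term's inverted-file entry (the xpaths of the leaf elements containing it), count nk, the number of distinct elements containing the term, at each level of the tree. The xpath format and the level-counting interpretation here are assumptions for illustration.

```python
# Count nk per tree level for one query term from its inverted-file xpaths.
from collections import defaultdict

def nk_per_level(xpaths):
    """Count distinct ancestor elements per depth for one query term."""
    counts = defaultdict(set)
    for xpath in xpaths:
        parts = xpath.strip("/").split("/")
        for depth in range(1, len(parts) + 1):
            counts[depth].add("/" + "/".join(parts[:depth]))
    return {depth: len(elems) for depth, elems in counts.items()}

# Hypothetical inverted-file entry for a term.
entry = ["/article[1]/sec[1]/p[2]", "/article[1]/sec[2]/p[1]", "/article[3]/sec[1]/p[1]"]
print(nk_per_level(entry))   # {1: 2, 2: 3, 3: 3}
```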

17 Output of Flexible Retrieval
  - Equivalent to an all-element index

18 Experiments in Flexible Retrieval
  - Factors of interest
  - Experiments and results

19 Factors of Interest
  - Slope and pivot during Lnu-ltu term weighting
  - The value of n (number of paragraphs)

20 Experiments and Results
  - Attendant file sizes (dictionary, inverted index, and element vectors reduced by 60%, 50%, and 50%, respectively)
  - 30%-40% less storage than the all-element index
  - Is dynamic element retrieval cost effective?

21 Conclusion
  - Similar work (Grabs and Shek)
  - Exhaustivity dependent
  - Progress in specificity

22 Researchers
  - Grabs and Shek (similar work to flexible retrieval)
  - Govert et al. (term weights are multiplied by a collection-dependent augmentation factor as they are propagated up the document tree)
  - Mass et al. (maintain separate indices for elements at different levels of granularity; solves issues of distorted statistics)

23 Overview of Flexible Retrieval (cont.)
  6. Correlate element vectors at each level with query
  7. Return ranked list of elements
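
A sketch of steps 6 and 7: score each generated element vector against the weighted query (here by inner product) and return the elements in decreasing score order. The element ids and weights are illustrative only.

```python
# Score element vectors against the query vector and return a ranked list.
def rank_elements(element_vectors, query_vector):
    """element_vectors: {element_id: {term: Lnu weight}}; query_vector: {term: ltu weight}."""
    scores = {
        eid: sum(w * query_vector.get(t, 0.0) for t, w in vec.items())
        for eid, vec in element_vectors.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical element vectors and query weights.
elements = {
    "/article[1]/sec[1]": {"element": 0.9, "retrieval": 0.7},
    "/article[1]/sec[1]/p[2]": {"retrieval": 1.1},
}
print(rank_elements(elements, {"element": 1.4, "retrieval": 0.8}))
```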

24 Table I
                  INEX 2004       INEX 2005
   article           12,107          16,440
   sections          69,577          94,421
   subsections       77,397         104,746
   paragraphs     1,029,747       1,378,202
   elements       1,188,828       1,593,809
   CO Topics      40 Topics       40 Topics
                  (34 assessed)   (29 assessed)

25 Table II. Comparison of All-Element and Flexible Retrieval under Inex-Eval (Generalized)

   Precision at Rank
                  2004                        2005
   Rank    All Element   Flexible      All Element   Flexible
      1       0.3897       0.3971         0.4224       0.4224
      5       0.3088       0.2882         0.3241       0.3413
     10       0.2735       0.2669         0.2991       0.2991
     20       0.2529       0.2390         0.2841       0.2939
     25       0.2456       0.2379         0.2669       0.2800
     50       0.2000       0.1972         0.2364       0.2366
    100       0.1523       0.1501         0.1921       0.1920
    500       0.0697       0.0697         0.0943       0.0949
   1500       0.0353       0.0362         0.0472       0.0483

26 Table II (cont.). Precision at Various Points of Recall
                  2004                        2005
   Recall  All Element   Flexible      All Element   Flexible
    0.01      0.3395       0.3348         0.3562       0.3693
    0.25      0.0971       0.0951         0.1131       0.1165
    0.50      0.0257       0.0283         0.0385       0.0404
    0.75      0.0017       0.0017         0.0097       0.0095
    1.00      0.0013       0.0013         0.0015       0.0015
   avg prec   0.0625       0.0620         0.0739       0.0750

27 Table III. Comparison of All-Element and Flexible Retrieval under Inex-Eval (Strict)

   Precision at Rank
                  2004                        2005
   Rank    All Element   Flexible      All Element   Flexible
      1       0.2000       0.2000         0.1481       0.1481
      5       0.1440       0.1200         0.0667       0.0741
     10       0.1240       0.1200         0.0852       0.0778
     20       0.1120       0.1020         0.0815       0.0815
     25       0.1024       0.0992         0.0800       0.0830
     50       0.0898       0.0832         0.0689       0.0681
    100       0.0628       0.0608         0.0511       0.0500
    500       0.0268       0.0259         0.0219       0.0217
   1500       0.0141       0.0143         0.0096       0.0097

28 Table III (cont.). Precision at Various Points of Recall
                  2004                        2005
   Recall  All Element   Flexible      All Element   Flexible
    0.01      0.2134       0.2115         0.1521       0.1535
    0.25      0.1006       0.1070         0.0540       0.0515
    0.50      0.0411       0.0394         0.0156       0.0191
    0.75      0.0166       0.0159         0.0103       0.0104
    1.00      0.0042       0.0044         0.0046       0.0048
   avg prec   0.0586       0.0577         0.0318       0.0335

