Topic by Topic Performance of Information Retrieval Systems Walter Liggett National Institute of Standards and Technology TREC-7 (1999)


1 Topic by Topic Performance of Information Retrieval Systems Walter Liggett National Institute of Standards and Technology TREC-7 (1999)

2 Abstract
Formulation of topic properties is the goal of this paper.
Applying statistical methods to both TREC-6 and TREC-7 IR results, we identify topic pairs that exemplify topic properties useful in relating topic statements to system performance.
We formulate topic properties by relating the corresponding topic statements to what is known about IR systems.
Some properties apparent in the topic pairs identified are linked to topic expansion, and these pairs exemplify both the need for expansion and the danger in automatic expansion.

3 Topic Properties
System developers would like to be able to judge topic difficulty from reasoning involving topic properties.
  - Example: could a set of topic properties describe whether several sentences are needed to narrow the topic sufficiently?
There is some doubt that formulation of topic properties is even possible.
  - "Little is known about what makes a topic difficult," conclude Voorhees and Harman in their TREC-6 overview.
Our analysis of the ad hoc task shows that one can connect system performance to generic topic properties.

4 Topic Properties
The effect of topic properties on system performance depends on the document collection.
  - Our statistical methods might connect two topics not because the topic statements share some property but because the document collection has some unexpected characteristic.
  - Topic properties formulated through the TREC ad hoc task might not be as useful with other document collections.
For each topic in the ad hoc task, the TREC evaluation provides system performances, a collection of numbers that might be termed a performance profile.
  - The performance profile is a partial answer to the question "How do topics differ?"

5 Topic Properties
This paper presents a pair of TREC-6 topics and three pairs of TREC-7 topics for the reader to study.
  - It is hoped that the reader will be able to offer an opinion.
Presentation of the four pairs involves two alternative measures of system performance and a method for decomposing the system-by-topic table of performance measurements.
  - Average precision
    - Partially depends on the number of relevant documents
  - Depth at 25 percent recall
  - Two-way analysis of variance
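The remark that average precision partially depends on the number of relevant documents can be seen from its usual non-interpolated definition. The sketch below is a minimal illustration under that standard definition; the function name and the example rankings are hypothetical and do not come from the slides.

```python
def average_precision(ranked_relevance, num_relevant):
    """Non-interpolated average precision for one topic.

    ranked_relevance: booleans in rank order, True where the retrieved
        document is relevant (hypothetical input format).
    num_relevant: total number of relevant documents for the topic,
        including those never retrieved.
    """
    if num_relevant <= 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at the rank where this relevant document appears.
            precision_sum += hits / rank
    # Dividing by the total number of relevant documents is what makes
    # the measure depend on how many relevant documents the topic has.
    return precision_sum / num_relevant


# The same ranking scores differently when the topic has more relevant
# documents that were never retrieved (made-up example).
same_ranking = [True, False, True]
ap_few = average_precision(same_ranking, num_relevant=2)
ap_many = average_precision(same_ranking, num_relevant=10)
```

Because the denominator is the topic's total count of relevant documents, two topics with identical rankings can receive different scores, which is one reason to look at a second measure as well.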

6 Topic Pairs for Study
Statistical analysis suggests the following two pairs of TREC-7 topics for careful study.
  - 372 and 379, with respect to the 40 best systems
  - 372 and 391, with respect to the 14 best automatic systems that use all parts of the topic statement

7 Topic Pairs for Study

8 The reason that statistical analysis of system performance connects these topics:
  - The appearance of common system successes and common system failures in expansion of these topics.
  - Manual systems seem to do relatively well with topics 372 and 379, whereas automatic systems do relatively poorly.
In trying to conceive of the topic property common to the members of these pairs, one must also recognize other topic properties in which these topics differ.

9 Data Analysis
Comparison of two topics in search of a common topic property involves, for each topic:
  - The statement
  - The performance for a group of systems
    - Average precision, depth at 25 percent recall
Performance is divided into components.
  - The overall component
    - Compare two topics in terms of the overall abilities of the systems.
  - The distinctive component
    - Compare two topics in terms of deviations from these overall abilities.
    - The topic-by-topic similarity of the distinctive component

10 Computation of the Components
Let N_s be the number of systems, N_t be the number of topics, and y_ij be the performance measure for system i and topic j.
The difficulty of topic j is: [equation not preserved in transcript]
The (centered) average performance of system i is: [equation not preserved in transcript]
Variation from topic to topic in the effect of overall system abilities: [equation not preserved in transcript]
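The three formulas on this slide are missing from the transcript. The block below is a reconstruction under an assumption, not the author's stated equations: it uses the conventional row-linear form of a two-way layout, in which each topic has a mean level and a slope on the centered system averages. The symbols \bar{y}_{\cdot j}, a_i and b_j are introduced here for illustration only.

```latex
\begin{align*}
\bar{y}_{\cdot j} &= \frac{1}{N_s}\sum_{i=1}^{N_s} y_{ij}
  && \text{(level of topic } j \text{, read as its difficulty)} \\
a_i &= \frac{1}{N_t}\sum_{j=1}^{N_t} y_{ij}
      \;-\; \frac{1}{N_s N_t}\sum_{i'=1}^{N_s}\sum_{j'=1}^{N_t} y_{i'j'}
  && \text{(centered average performance of system } i \text{)} \\
b_j &= \frac{\sum_{i=1}^{N_s} a_i\, y_{ij}}{\sum_{i=1}^{N_s} a_i^{2}}
  && \text{(topic-to-topic variation in the effect of overall ability)}
\end{align*}
```

Under this assumed form, b_j is the least-squares slope of topic j's scores on the centered system averages, so a topic with b_j near zero is one on which generally strong systems gain little over generally weak ones.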

11 Computation of the Components
The overall component is given by: [equation not preserved in transcript]
The remainder is: [equation not preserved in transcript]
The remainder reflects interactions: cases where, after adjustment for overall performance, one system is better than another for one topic but not for another topic.
  - The remainder is not easy to interpret because it is noisy.
The distinctive component = the remainder excluding noise
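Since the formulas for the overall component and the remainder are not preserved here, the following is only a minimal sketch of one way such a decomposition could be computed. It assumes a row-linear two-way model (overall component = topic mean + topic slope × centered system average; remainder = observation minus overall component). All function and variable names are hypothetical, and the score table is made up.

```python
def decompose(y):
    """Split a system-by-topic table into overall and remainder parts.

    y[i][j] is the performance of system i on topic j.
    Assumed model (not confirmed by the slides):
        overall[i][j] = difficulty[j] + slope[j] * ability[i]
        remainder[i][j] = y[i][j] - overall[i][j]
    """
    n_s, n_t = len(y), len(y[0])
    # Topic level: average over systems.
    difficulty = [sum(y[i][j] for i in range(n_s)) / n_s for j in range(n_t)]
    grand_mean = sum(difficulty) / n_t
    # Centered average performance of each system.
    ability = [sum(y[i]) / n_t - grand_mean for i in range(n_s)]
    denom = sum(a * a for a in ability)
    if denom == 0.0:
        raise ValueError("all systems have the same average performance")
    # Per-topic least-squares slope on the centered system averages.
    slope = [sum(ability[i] * y[i][j] for i in range(n_s)) / denom
             for j in range(n_t)]
    overall = [[difficulty[j] + slope[j] * ability[i] for j in range(n_t)]
               for i in range(n_s)]
    remainder = [[y[i][j] - overall[i][j] for j in range(n_t)]
                 for i in range(n_s)]
    return difficulty, ability, slope, overall, remainder


# Made-up scores: 3 systems (rows) by 3 topics (columns).
scores = [
    [0.5, 0.2, 0.4],
    [0.3, 0.1, 0.6],
    [0.1, 0.3, 0.2],
]
difficulty, ability, slope, overall, remainder = decompose(scores)
```

In this sketch the remainder for each topic sums to zero over systems and is uncorrelated with the centered system averages, which is what lets it be read as system-by-topic interaction plus noise.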

12 Overall Component for Pair-1
Topic 372 = "Native American casino", 379 = "mainstreaming"
"t" = title, "d" = description, "s" = title + description, "l" = "s" + narrative, "m" = manual

13 Distinctive Component for Pair-1
Too many query types …

14 Interpretation of the Data
System-to-system variation in the overall component reflects the average performance.
There is a discrepancy between the two measures.
  - Average precision depends on the number of relevant documents.
There is little to suggest that one topic is easier than the other.

15 Interpretation of the Data
Figure 4 shows that topics 372 and 379 share one or more topic properties related to challenges in topic expansion.
  - Manual vs. automatic

16 Components for 372-391
Automatic systems that use the entire topic statement.
What features of these systems cause them to favor topics 372 and 391 when overall performance of these systems is variable, both better and worse than the average?

17 The Distinctive Component
An N_s × N_t matrix with elements [expression not preserved in transcript]
Analysis of the residual matrix for choosing topic pairs
  - Singular value decomposition
  - Approximation
  - Optimization
  - …
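The slide names singular value decomposition as a step but does not say how pairs are chosen from it. As a hedged illustration only, the sketch below finds the leading singular triple of a residual matrix by power iteration; reading topics with similar loadings as candidates for a pair is an assumption made here, not the author's stated procedure. The function name and the residual values are hypothetical.

```python
import math


def leading_singular_triplet(r, iterations=100):
    """Leading singular value and vectors of r via power iteration.

    r[i][j] is the residual for system i and topic j. Returns
    (sigma, u, v), with u over systems and v over topics.
    """
    n_s, n_t = len(r), len(r[0])
    # Non-uniform start: rows of a doubly centered residual matrix sum
    # to zero, so a constant start vector would be annihilated.
    v = [float(j + 1) for j in range(n_t)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        u = [sum(r[i][j] * v[j] for j in range(n_t)) for i in range(n_s)]
        w = [sum(r[i][j] * u[i] for i in range(n_s)) for j in range(n_t)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("start vector is orthogonal to the row space")
        v = [x / norm for x in w]
    u = [sum(r[i][j] * v[j] for j in range(n_t)) for i in range(n_s)]
    sigma = math.sqrt(sum(x * x for x in u))
    u = [x / sigma for x in u]
    return sigma, u, v


# Made-up residuals: 4 systems by 3 topics. Topics 0 and 1 share the
# same system pattern, topic 2 has the opposite one.
residuals = [
    [0.2, 0.2, -0.4],
    [-0.2, -0.2, 0.4],
    [0.1, 0.1, -0.2],
    [-0.1, -0.1, 0.2],
]
sigma, system_pattern, topic_loadings = leading_singular_triplet(residuals)
```

In this toy table, topics 0 and 1 receive equal loadings, so a rank-one approximation groups them together; with real data the leading component would also carry noise, which is why the slide lists approximation and optimization as further steps.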

18 Topics 352 and 385

19 Components for 352-385
The title-only systems perform relatively better than description-only systems in terms of the distinctive component.
The key noun phrase differs between title and description.

20 Topics 312 and 316

21 Components for 312-316
Very specific key words: "hydroponics" and "polygamy".
The manual systems that did well seem better able to take advantage of these key words than automatic systems.

22 Conclusions
This paper offers four pairs of topics chosen by statistical methods.
Faced with the challenge of hypothesizing what each pair has in common, it seems that one has some basis for a response.
One topic property that seems to have surfaced is the need for parsimonious expansion.

23 Depth@.25R
Measure of depth at 25 percent recall = -log(r_.25)
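The slide gives only the formula -log(r_.25). The sketch below assumes that r_.25 is the number of documents that must be retrieved, going down the ranking, before 25 percent of the topic's relevant documents have been found, and that the logarithm is natural; neither assumption is confirmed by the transcript. Function names and inputs are hypothetical.

```python
import math


def depth_at_recall(ranked_relevance, num_relevant, recall=0.25):
    """Rank at which the given recall level is first reached.

    ranked_relevance: booleans in rank order, True where relevant.
    num_relevant: total relevant documents for the topic.
    Returns None if the ranking never reaches that recall level.
    """
    # At least one relevant document is required, even for tiny topics.
    needed = max(1, math.ceil(recall * num_relevant))
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            if hits >= needed:
                return rank
    return None


def depth_measure(ranked_relevance, num_relevant, recall=0.25):
    """-log of the depth, so that shallower (better) depths score higher."""
    depth = depth_at_recall(ranked_relevance, num_relevant, recall)
    if depth is None:
        return None
    return -math.log(depth)


# Made-up ranking: relevant documents at ranks 2 and 4, 8 relevant overall.
example_ranking = [False, True, False, True]
example_depth = depth_at_recall(example_ranking, num_relevant=8)
example_score = depth_measure(example_ranking, num_relevant=8)
```

Under these assumptions the measure looks only at the top of the ranking needed to reach the recall target, so it is less sensitive than average precision to relevant documents that appear far down the list or not at all.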
