An Information Retrieval Approach based on Discourse Type. D. Y. Wang, R. W. P. Luk, K.F. Wong and K.L. Kwok. NLDB 2006.

1 An Information Retrieval Approach based on Discourse Type
D. Y. Wang, R. W. P. Luk, K.F. Wong (1) and K.L. Kwok (2)
NLDB 2006
Department of Computing, The Hong Kong Polytechnic University
(1) Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong
(2) Department of Computer Science, City University of New York
DY Wang @ 2006

2 Content
- Introduction: Motivation, Discourse Type, Information Unit
- Problem Formulation: Score of topic terms, Score of discourse type, Document Re-ranking
- Experimental Results
- Conclusion

3 Motivation
- The effectiveness of information retrieval (IR) systems varies substantially from one topic to another.
- One reason: users' information needs are very diverse.
- Our approach: find the discourse type of the topic and adopt an appropriate strategy.

4 Discourse Type
Q No. | Information Need (TREC query) | Independent Entity | Discourse Type
654 | What are the advantages and disadvantages of same-sex schools? | same-sex school | advantages and disadvantages
436 | What are the causes of railway accidents throughout the world? | railway accident | cause
- Definition of discourse type: the functions (including properties and relations that cannot exist independently) of the independent entities.

5 Performance Difference
Discourse Type | Number | MAP | Variance
Treatment | 3 | 0.144 | 0.007
Concrete things | 32 | 0.160 | 0.017
Advantage / Disadvantage | 8 | 0.204 | 0.043
Reason | 13 | 0.219 | 0.021
Objection | 7 | 0.231 | 0.050
Number | 10 | 0.253 | 0.042
General Information | 32 | 0.263 | 0.048
Steps (solution) | 13 | 0.279 | 0.076
Abstraction | 16 | 0.302 | 0.053
Impact | 11 | 0.333 | 0.020
Procedure | 7 | 0.419 | 0.029
Average = 0.2768

6 Why Choose "Advantage / Disadvantage" as our example?
- Its performance is worse than the average (MAP 0.204 vs. 0.277).
- It is relatively abstract compared with concrete things (e.g., people, country), and is therefore unlikely to have been investigated before.
- It is related to some cue phrases (e.g., "more than") that are composed of stop words, which conventional IR ignores.

7 Why Choose "Advantage / Disadvantage" as our example? (cont.)
- It is a popular discourse type of information need: we found at least 40 questions asking about the advantages and disadvantages of something at a website (
- It has a reasonable number (eight) of TREC topics for investigation; see the next slide.

8 Eight Queries with Discourse Type Advantage / Disadvantage
Query No. | Query Title
308 | Implant Dentistry
605 | Great Britain health care
608 | taxing social security
624 | SDI Star Wars
637 | human growth hormone (HGH)
654 | same-sex schools
690 | college education advantage
699 | term limits

9 Information Unit (IU)
[Figure: a document drawn as a run of words in which occurrences of topic terms (term1, term2, term1) are marked. Labels: "A document", "t", "w words".]

10 Why IU?
- Assumption: terms inside an IU (around topic terms) are more important to the relevance of a document than terms outside the IU.
- IUs simplify the processing of documents:
  - Compute a score for each IU.
  - Aggregate the scores of all IUs as the score of the document.
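
The IU idea can be illustrated with a minimal sketch. It assumes that an IU is a fixed-size window of words centred on each occurrence of a topic term; the slides do not spell out the exact window definition, so the function name `extract_ius` and the parameter `half_width` are hypothetical.

```python
def extract_ius(tokens, topic_terms, half_width=5):
    """Return one window of tokens around each topic-term occurrence.

    Assumption: an IU spans `half_width` tokens on each side of the
    topic term, clipped at the document boundaries.
    """
    ius = []
    for i, tok in enumerate(tokens):
        if tok in topic_terms:
            start = max(0, i - half_width)
            end = min(len(tokens), i + half_width + 1)
            ius.append(tokens[start:end])
    return ius


# Toy document, not taken from the TREC collection.
doc = ("same sex schools have some advantages but critics "
       "say the benefits are small").split()
ius = extract_ius(doc, {"schools", "benefits"}, half_width=2)
for iu in ius:
    print(iu)
```

Each window can then be scored on its own, which is the simplification the slide refers to.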

11 Score of Topic Terms
- sumtf = 4
- Dtf = 3 (D: distinct)
Graph-based model:
- atS3 = 1/1 + 1/5 + 1/3
- atS4 = 1/5 + 1/3
[Figure: graph with edges labelled 1, 5 and 3.]
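
The quantities on this slide can be reproduced with a short sketch. It assumes that sumtf is the total number of topic-term occurrences in an IU, that Dtf is the number of distinct topic terms, and that the graph-based scores sum the reciprocals of the edge labels. How the graph is built, and why atS4 leaves out the edge labelled 1, cannot be recovered from the slide, so `inverse_distance_sum` simply takes the labels as input. Both function names are hypothetical.

```python
from collections import Counter


def topic_term_counts(iu_tokens, topic_terms):
    """Return (sumtf, dtf) for one IU: total and distinct topic-term counts."""
    counts = Counter(t for t in iu_tokens if t in topic_terms)
    return sum(counts.values()), len(counts)


def inverse_distance_sum(distances):
    """Sum of 1/d over the given edge labels (assumed to be distances)."""
    return sum(1.0 / d for d in distances)


# Toy IU with four topic-term occurrences, three of them distinct.
iu = ["term1", "a", "term2", "b", "term3", "term1"]
sumtf, dtf = topic_term_counts(iu, {"term1", "term2", "term3"})

at_s3 = inverse_distance_sum([1, 5, 3])  # 1/1 + 1/5 + 1/3
at_s4 = inverse_distance_sum([5, 3])     # 1/5 + 1/3
print(sumtf, dtf, round(at_s3, 3), round(at_s4, 3))
```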

12 Example: Score of Discourse Type
FT923-7641: "more companies to adopt a high-performance model of work organization giving more responsibility to entry-level employees it has also backed reforms aimed at improving preparation for work mr clinton differs only in supporting more radical efforts to make employers train"
- more (comparative words) = 3
- support = ['back', 'confirm', 'contest', 'contrari', 'defend', 'encourag', 'endors', 'object', 'oppon', 'oppos', 'opposit', 'prove', 'quibbl', 'refer', 'sponsor', 'support'] ( from )
- support = 2
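
The two counts in this example can be reproduced with a small sketch. It assumes simple prefix matching against the listed stems as a stand-in for whatever stemmer the authors used, and it treats "more" as the only comparative word. The slide does not say how the counts are turned into a discourse-type score, so the sketch stops at counting. The name `cue_counts` is hypothetical.

```python
import re

# Stem list copied from the slide.
SUPPORT_STEMS = [
    "back", "confirm", "contest", "contrari", "defend", "encourag",
    "endors", "object", "oppon", "oppos", "opposit", "prove",
    "quibbl", "refer", "sponsor", "support",
]
# Assumption: only "more" is counted as a comparative word here.
COMPARATIVE_WORDS = {"more"}


def cue_counts(text):
    """Return (comparative_count, support_count) for a passage."""
    tokens = re.findall(r"[a-z]+", text.lower())
    comparative = sum(1 for t in tokens if t in COMPARATIVE_WORDS)
    # Prefix matching approximates stemming; each token counts at most once.
    support = sum(
        1 for t in tokens if any(t.startswith(s) for s in SUPPORT_STEMS)
    )
    return comparative, support


passage = (
    "more companies to adopt a high-performance model of work organization "
    "giving more responsibility to entry-level employees it has also backed "
    "reforms aimed at improving preparation for work mr clinton differs only "
    "in supporting more radical efforts to make employers train"
)
comparative, support = cue_counts(passage)
print(comparative, support)
```

On this passage the support matches are "backed" and "supporting".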

13 Document Re-ranking
- IU score before re-ranking: S0, the similarity score of the document that contains the IU.
- IU re-ranking score S', in one of three forms:
  - S' = S0 * score of topic terms
  - S' = S0 * score of discourse type
  - S' = S0 * score of topic terms * score of discourse type
- Aggregate the re-ranking scores of all IUs in a document as the final score of the document.
- Re-rank the documents by the final score.
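
The re-ranking steps can be sketched as follows. The slide does not name the aggregation function, so summing over IUs is an assumption here, and the function name `rerank`, its flags and the input layout are all hypothetical.

```python
def rerank(docs, use_topic=True, use_discourse=True):
    """Re-rank documents by aggregated IU scores.

    docs maps doc_id -> (s0, [(topic_score, discourse_score), ...]),
    where s0 is the original similarity score of the document and each
    pair holds the two scores of one IU.
    """
    final = {}
    for doc_id, (s0, ius) in docs.items():
        total = 0.0
        for topic_score, discourse_score in ius:
            s_prime = s0
            if use_topic:
                s_prime *= topic_score
            if use_discourse:
                s_prime *= discourse_score
            total += s_prime  # assumption: aggregate by summation
        final[doc_id] = total
    # Highest final score first.
    return sorted(final, key=final.get, reverse=True)


# Toy scores, not taken from the experiments.
docs = {
    "A": (2.0, [(1.0, 1.0)]),
    "B": (1.0, [(1.5, 2.0)]),
}
print(rerank(docs))                       # topic and discourse scores
print(rerank(docs, use_discourse=False))  # topic score only
```

The two flags select among the three forms of S' listed on the slide.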

14 Re-ranking Results in MAP
Original Topic Discourse both
QID BM25 atS3 cd4-8 c2S2d c2S2d
308 .459 .256 .513 .693 .547
605 .
608 .
624 .
637 .435 .508 .505 .433 .514
654 .
690 .001 .003 .005
699 .353 .445 .441 .381 .433
Mean .
p<= (Wilcoxon) .

15 Conclusion
- Re-ranking based on topic terms and re-ranking based on discourse type can each improve retrieval performance.
- Combining the two improves the results most significantly (at the 95% confidence level, already taking the sample size into account).
- This approach is promising and is worth further investigation.
Acknowledgement: We thank the Center for Intelligent Information Retrieval, University of Massachusetts, for facilitating Robert Luk to develop the basic IR system when he was on leave there. This work is supported by the CERG Project # PolyU 5226/05E.
