Towards Methods for the Collective Gathering and Quality Control of Relevance Assessments
SIGIR'09, July 2009
Summary
- Motivation
- Overview
- Related Work
- Methodology
- Pilot Study
- Analysis and Findings
- Conclusions
Motivation
With advances in technology, digital items such as digital books, audio, and video are attracting growing interest and use. These items pose new challenges for the construction of test collections, specifically for collecting the relevance assessments used to tune system performance. This is due to:
- The length and cohesion of the digital item
- The dispersion of topics within it
Proposal: develop a method for the collective gathering of relevance assessments that uses a social game model to encourage participants' engagement.
Overview
Test collections consist of:
- A corpus of documents
- A set of search topics
- Relevance assessments collected from human judges
Document (TREC): WSJ88046-0090. AT&T Unveils Services to Upgrade Phone Networks Under Global Plan. Janet Guyon. New York. American Telephone & Telegraph Co. introduced the first of a new....
Topic (TREC):
Number: 168
Topic: Financing AMTRAK
Description: A document will address the role of the Federal Government in financing the operation of the National Railroad Transportation Corporation (AMTRAK).
Narrative: A relevant document must provide information on the government's responsibility to make AMTRAK an economically viable entity. It could also discuss..
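The three components above can be sketched as a small data structure. This is an illustrative layout only, not the TREC file format: the class and field names (`Document`, `Topic`, `TestCollection`, `qrels`) are assumptions, and the judgment recorded in the example is made up for demonstration.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class Document:
    doc_id: str
    text: str


@dataclass
class Topic:
    number: int
    title: str
    description: str
    narrative: str


@dataclass
class TestCollection:
    # Corpus, keyed by document id.
    documents: Dict[str, Document] = field(default_factory=dict)
    # Search topics, keyed by topic number.
    topics: Dict[int, Topic] = field(default_factory=dict)
    # Relevance assessments: (topic number, doc id) -> judged relevant?
    qrels: Dict[Tuple[int, str], bool] = field(default_factory=dict)

    def judge(self, topic_number: int, doc_id: str, relevant: bool) -> None:
        """Record one assessor decision for a topic/document pair."""
        self.qrels[(topic_number, doc_id)] = relevant

    def relevant_docs(self, topic_number: int) -> Set[str]:
        """Return the ids of documents judged relevant to a topic."""
        return {
            doc_id
            for (topic, doc_id), rel in self.qrels.items()
            if topic == topic_number and rel
        }


collection = TestCollection()
collection.documents["WSJ88046-0090"] = Document(
    "WSJ88046-0090", "AT&T Unveils Services to Upgrade Phone Networks"
)
collection.topics[168] = Topic(
    168,
    "Financing AMTRAK",
    "Role of the Federal Government in financing AMTRAK.",
    "Must discuss the government's responsibility for AMTRAK.",
)
# Hypothetical judgment, for illustration only.
collection.judge(168, "WSJ88046-0090", False)
```

Storing judgments per (topic, document) pair keeps the assessments separate from the corpus and the topics, so the same documents can be judged against several topics.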
Overview
Test collection construction (in TREC):
1. A set of documents and a set of topics are given to the TREC participants.
2. Each participant runs the topics against the documents using their retrieval system.
3. A ranked list of the top k documents per topic is returned to TREC.
4. TREC forms pools (selects the top k documents) from the participants' submissions, which are judged by the relevance assessors.
5. Each submission is then evaluated using the resulting relevance judgments, and the evaluation results are returned to the participants.
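The pooling and evaluation steps can be sketched as follows for a single topic. This is a minimal illustration rather than TREC's actual tooling: the function names, the run names, and the document ids are assumptions, and precision at k stands in for whatever evaluation measures are reported in practice.

```python
from typing import Dict, List, Set


def build_pool(runs: Dict[str, List[str]], k: int) -> Set[str]:
    """Union of the top-k documents from every submitted ranking."""
    pool: Set[str] = set()
    for ranking in runs.values():
        pool.update(ranking[:k])
    return pool


def precision_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k documents that were judged relevant.

    Documents outside the pool were never judged and count as non-relevant.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranking[:k] if doc_id in relevant)
    return hits / k


# Hypothetical rankings from two participating systems for one topic.
runs = {
    "sysA": ["d1", "d2", "d3"],
    "sysB": ["d2", "d4", "d1"],
}
pool = build_pool(runs, k=2)  # only these documents go to the assessors

# Hypothetical assessor decisions over the pooled documents.
relevant = {"d2", "d4"}
scores = {name: precision_at_k(r, relevant, k=2) for name, r in runs.items()}
```

Here the pool contains `d1`, `d2`, and `d4`; `d3` is never shown to an assessor because no system ranked it in its top 2.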
Related work
Gathering relevance judgments:
- Single judge: usually the topic author assesses the relevance of documents to the given topic.
- Multiple judges: assessments are collected from multiple judges and are typically converted to a single score per document.
- In Web search, judgments are collected from a representative sample of the user population. User logs are also often mined for indicators of user satisfaction with the retrieved documents.
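As a rough sketch of the multiple-judge case, the labels from several judges can be averaged into one score per document. The slides do not say how the conversion is done, so the mean used here is an assumption, as are the function name and the sample judgments.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def aggregate_judgments(
    judgments: Iterable[Tuple[str, str, int]]
) -> Dict[str, float]:
    """Collapse (judge_id, doc_id, score) triples to a mean score per document."""
    scores_by_doc: Dict[str, List[int]] = defaultdict(list)
    for _judge_id, doc_id, score in judgments:
        scores_by_doc[doc_id].append(score)
    return {doc_id: mean(scores) for doc_id, scores in scores_by_doc.items()}


# Hypothetical binary labels (1 = relevant, 0 = not relevant).
judgments = [
    ("judge1", "d1", 1),
    ("judge2", "d1", 0),
    ("judge3", "d1", 1),
    ("judge1", "d2", 0),
    ("judge2", "d2", 0),
]
single_scores = aggregate_judgments(judgments)
```

Averaging hides which judges disagreed and why, which is the information a collective approach tries to keep.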
Related work
In their approach, they extended the use of multiple assessors per topic by:
- Facilitating the review and re-assessment of relevance judgments
- Enabling communication between judges
- Providing an enriched collection of relevance labels that incorporates different user profiles and user needs. This also enables the preservation and promotion of diversity of opinions.
Pilot Study
- Two rounds: the first lasted 2 weeks, the second lasted 4 weeks
- Data: INEX 2008 Track (50,000 digitized books, 17 million scanned pages, 70 TREC-style topics)
- Participants: 17
- Collected data:
  - Highlighted document regions
  - Binary relevance level per page
  - Notes and comments
  - Relevance degree assigned to a book
Analysis and Findings
Properties of the methodology:
- Feasibility: engagement level comparable to INEX 2003.
- Completeness and exhaustiveness: 17.6% maximum completeness level.
- Semantic unit and cohesion: relevant information forms a minor theme of the book. Relevant content is dispersed.
- Browsing and relevance decision: assessors require contextual information to make a decision.
- Influence of incentive structures
- Exploring vs. reviewing assessment strategies
Quality of the collected data:
- Assessor agreement: the level of agreement is higher than in TREC and INEX.
- Annotations
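Assessor agreement can be quantified in several ways, and the slides do not say which measure was used. One common choice is the overlap between the sets of items two assessors marked relevant (intersection size divided by union size). The sketch below assumes that measure; the function names and the page ids are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set


def overlap(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    if not union:
        # Both assessors marked nothing relevant; treated here as full agreement.
        return 1.0
    return len(a & b) / len(union)


def mean_pairwise_overlap(relevant_by_assessor: Dict[str, Set[str]]) -> float:
    """Average overlap over all pairs of assessors for one topic."""
    pairs = list(combinations(relevant_by_assessor.values(), 2))
    if not pairs:
        raise ValueError("need at least two assessors")
    return sum(overlap(a, b) for a, b in pairs) / len(pairs)


# Hypothetical pages marked relevant by three assessors for one topic.
relevant_by_assessor = {
    "assessor1": {"p1", "p2", "p3"},
    "assessor2": {"p2", "p3"},
    "assessor3": {"p3", "p4"},
}
agreement = mean_pairwise_overlap(relevant_by_assessor)
```

Overlap ignores pages that both assessors left unmarked, so it does not reward agreement on the large non-relevant portion of a book.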
Conclusions
- The CRA method successfully expanded traditional methods and introduced new concepts for gathering relevance assessments.
- It encourages personalized and diverse perspectives on the topics.
- It promotes the collection of rich contextual data that can assist with interpreting relevance assessments and their use for system optimization.