TDT PI Meeting - November 16-17, 2000


Annotation Overview

• Background
• Annotation strategy
  - search-guided complete annotation
  - work with one topic at a time
  - multiple stages for each topic
• Annotation resources
  - definition/explication
  - rules of interpretation
• Topic research
  - larger role than in 1999
  - fed directly into topic annotation

Annotation Strategy

STAGE 1: Initial Query
• submit all known on-topic (OT) stories as query to search engine
  - OT stories revealed during topic selection, definition & research
• read through resulting relevance-ranked list, annotating all stories as YES/BRIEF/NO
• stop after 5-10 additional on-topic stories identified; or
• after reaching "off-topic threshold": at least 2 off-topic stories for every 1 on-topic read AND the last 10 consecutive stories off-topic
• This was possible for 51 English, 34 Mandarin topics
• If no pre-existing OT stories, go directly to text-based query (Stage 3)
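The Stage 1 stopping rules can be read as a small predicate over the labels assigned so far. The sketch below is one interpretation, not the tooling used for the annotation: it assumes BRIEF counts as on-topic when applying the ratio, that counts are taken over the current ranked list only, and it picks 10 from the slide's 5-10 range. All function and parameter names are hypothetical.

```python
from typing import Sequence

# Assumption: BRIEF is treated as on-topic for the ratio; the slide does not say.
ON_TOPIC_LABELS = ("YES", "BRIEF")


def off_topic_threshold_reached(
    labels: Sequence[str], ratio: int = 2, tail: int = 10
) -> bool:
    """Return True when both parts of the "off-topic threshold" hold.

    labels: judgments in the order the stories were read ("YES", "BRIEF", "NO").
    """
    on_topic = sum(1 for label in labels if label in ON_TOPIC_LABELS)
    off_topic = len(labels) - on_topic
    # Part 1: at least `ratio` off-topic stories for every on-topic story read.
    ratio_met = off_topic >= ratio * on_topic
    # Part 2: the last `tail` consecutive stories were all off-topic.
    tail_met = len(labels) >= tail and all(
        label not in ON_TOPIC_LABELS for label in labels[-tail:]
    )
    return ratio_met and tail_met


def stage1_should_stop(labels: Sequence[str], new_on_topic_target: int = 10) -> bool:
    """Stop Stage 1 once enough new on-topic stories are found or the threshold is hit.

    new_on_topic_target is an assumed value within the slide's 5-10 range.
    """
    new_on_topic = sum(1 for label in labels if label in ON_TOPIC_LABELS)
    return new_on_topic >= new_on_topic_target or off_topic_threshold_reached(labels)
```

Under these assumptions, one on-topic story followed by ten consecutive NO judgments satisfies both conditions, while nine NO judgments alone do not, because the ten-story tail is not yet complete.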

Annotation Strategy

STAGE 2: Improved Queries Based on Additional On-Topic Stories
• issue a new query using a concatenation of all known on-topic stories
• read and annotate stories in resulting relevance-ranked list until reaching "off-topic threshold"
• minimum of one docno search for all topics with 1+ hit
  - English: maximum of 15
  - Mandarin: maximum of 20

STAGE 3: Initial Text-based Queries
• issue a new query using the topic research document plus any additional relevant text (e.g., parts of the topic explication)
• read and annotate stories in resulting relevance-ranked list until reaching "off-topic threshold"
• minimum of one text search per topic
  - English: maximum of 9
  - Mandarin: maximum of 14
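One round of the docno-based search in Stage 2 might look like the sketch below. The `search` and `judge` callables, the toy word-overlap ranker, the tiny corpus, and the choice to skip stories that were already annotated are all assumptions for illustration; the slides do not describe the search engine or the annotation tool. Any label other than NO is treated as on-topic here, which is also an assumption.

```python
from typing import Callable, Dict, List, Set


def threshold_reached(labels: List[str]) -> bool:
    """Off-topic threshold: >= 2 off-topic per on-topic AND last 10 read all off-topic."""
    on_topic = sum(1 for label in labels if label != "NO")
    off_topic = len(labels) - on_topic
    last_ten_off = len(labels) >= 10 and all(label == "NO" for label in labels[-10:])
    return off_topic >= 2 * on_topic and last_ten_off


def stage2_round(
    corpus: Dict[str, str],
    on_topic: Set[str],
    search: Callable[[str], List[str]],
    judge: Callable[[str], str],
    judged: Dict[str, str],
) -> Set[str]:
    """Query with the concatenated on-topic stories and annotate down the ranked list."""
    query = " ".join(corpus[docno] for docno in sorted(on_topic))
    read: List[str] = []
    found: Set[str] = set()
    for docno in search(query):
        if docno in judged:  # assumption: already-annotated stories are skipped
            continue
        label = judge(docno)
        judged[docno] = label
        read.append(label)
        if label != "NO":
            found.add(docno)
        if threshold_reached(read):
            break
    return found


# Toy data (hypothetical): three on-topic stories and twelve unrelated ones.
corpus = {
    "d1": "paris subway line opens",
    "d2": "new paris subway line",
    "d3": "paris metro subway line",
}
corpus.update({f"n{i:02d}": f"cnn weather report {i}" for i in range(12)})
truth = {"d1", "d2", "d3"}


def toy_search(query: str) -> List[str]:
    """Rank every story by the number of distinct words shared with the query."""
    words = set(query.split())
    return sorted(corpus, key=lambda d: (-len(words & set(corpus[d].split())), d))


def toy_judge(docno: str) -> str:
    """Stand-in for the human annotator."""
    return "YES" if docno in truth else "NO"


judged = {"d1": "YES"}
found = stage2_round(corpus, {"d1"}, toy_search, toy_judge, judged)
```

In this toy run the round finds `d2` and `d3`, then stops after ten consecutive NO judgments, leaving the last two unrelated stories unread. A further round would repeat the call with the enlarged on-topic set.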

Annotation Strategy

STAGE 4: Creative Searching
• Instructions to Annotators:
  You are encouraged to use your specialized knowledge (drawn from topic research and the known on-topic stories) to conduct additional manual searches through the corpus. These additional searches will be based on keywords, names, particular on-topic stories, etc. Think creatively! If you come up with a novel way to search for additional on-topic stories, let us know. If you find additional information (names, places, dates, events) about your topic, you should revise the topic research page for that topic.
• Examples of Creative Searching
  - Topic 10: European Cold Wave
    Annotator comments: "In annotating this topic I had to go beyond the regular parameters. It was apparent that there were YES stories remaining beyond the 'no threshold'. Many of the intervening NO stories were CNN weather reports that had nothing to do with the topic. So I did extra text searches and concentrated on stories within a particular timeframe to find additional hits."
  - Topic 42: New Paris Subway Line
    No pre-existing OT stories
    Annotator searched WWW for topic
    Used content of story not within TDT3 collection as query

[Chart] Mandarin Hits vs. Stories Read

[Chart] English Hits vs. Stories Read

Annotators were permitted to ignore part of the "off-topic threshold" for topics with 50+ hits...
...but this one didn't.

Annotation Statistics