A Potpourri of Topics (Paul Kantor)

Presentation transcript:

A Potpourri of Topics (Paul Kantor)
- Project overview and cartoon
- How we did at TREC this year
- Generalized performance plots
- Remarks on the formal model of decision
© Paul Kantor 2002

Rutgers DIMACS: Automatic Event Finding in Streams of Messages

[Diagram: two cartoon pipelines. Retrospective/Supervised/Tracking: 1. Accumulated documents, 2. Unexpected event, 3. Initial profile, 4. Guided retrieval, 5. Clustering, 6. Revision and iteration, 7. Track new documents. Prospective/Unsupervised/Detection: 1. Accumulated documents, 2. Clustering, 3. Initial profile, 4. Anticipated event, 5. Guided retrieval.]

Automatic Event Finding in Streams of Messages has two phases. The first (Year 1) is retrospective; think of it as 20-20 hindsight. Documents are accumulated (1) until an unexpected event, represented here by a train rushing at us from a tunnel (2), occurs. Analysts build an initial profile (3), which is used to retrieve likely documents (4), which are clustered (5) to provide a rich document set. Analysts work with this set to revise and iterate (6), supporting efforts to track down all participants and supporters of the event.

In the second and third years of the project, attention will focus on prospective detection of significant events. As documents are accumulated (1), continuous clustering and matching (2) identifies documents that do not fit into established patterns. These are grouped and automatically profiled (3), and the results are submitted to analysts. The result, in some cases, will be an anticipated event (4) which, if the analysis is timely, will be prevented. Guided retrieval (5) can then be used, as before, to track down all participants and supporters of the intended event.

Technically, the work will be a thorough and systematic exploration of various representations of documents, matching methods, learning methods, and fusion methods, to establish the limits of these technologies. Theoretical work will establish rates of convergence and probabilities of success. Experimental work will test methods using the established TREC collections, and other materials as appropriate. The work will be done by 7 faculty and 9 students, post-docs, and programmers.

Rutgers-DIMACS KD-D MMS Project Matrix

Communication
- The process converges…
- Central limit theorem…
- What???
- Pretty good fit
- Confidence levels
- And so on

Measures of Performance: Effectiveness
1. Batch post-hoc learning. Here there is a large set of already discovered documents, and the system must learn to recognize future instances from the same family.
2. Adaptive learning of defined profiles. Here there is a small group of "seed documents," and thereafter the system must learn while it works. Realistic measures must penalize the system for sending documents that are of no interest to any analyst to the human experts.
3. Discovery of new regions of interest. Here the focus is on unexpected patterns of related documents, which are far enough from established patterns to warrant sending them for human evaluation.

Measures of Performance: Efficiency
Efficiency is measured in both the time and space resources required to accomplish a given level of effectiveness. Results are best visualized in a set of two- or three-dimensional plots, as suggested on the following page.

Typical Measures Used: Adaptive Filtering
Basis:
- Precision p = g/n, where g = number of relevant documents sent and n = number the analyst must examine.
- Recall R = g/G, where G = total number of relevant documents that should be sent to the analyst.
- F-measures: 1/F = a/p + (1-a)/R = (1/g)(a*n + (1-a)*G), so F = g/[a*n + (1-a)*G].
There is no persuasive argument for a particular weight a; in TREC 2002, a = 0.8, a 4:1 weighting of precision over recall.
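Below is a minimal Python sketch (not from the slides) of the weighted F-measure just defined; the symbols g, n, G, and a follow the slide, and the example numbers are invented for illustration.

```python
def weighted_f(g, n, G, a=0.8):
    """Weighted F-measure as defined above.

    g: number of relevant documents sent to the analyst
    n: total number of documents the analyst must examine
    G: total number of relevant documents that should have been sent
    a: weight on precision (a = 0.8 gives the 4:1 weighting used in TREC 2002)
    """
    # 1/F = a/p + (1-a)/R with p = g/n and R = g/G
    # simplifies to F = g / (a*n + (1-a)*G)
    denom = a * n + (1 - a) * G
    return g / denom if denom > 0 else 0.0


# Invented example: 10 documents sent, 6 of them relevant, 20 relevant overall.
print(weighted_f(g=6, n=10, G=20))  # 0.5
```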

Typical Measures Used: Utility-Based Measures
Pure measure: U = v*g - c*(n-g) = -c*n + g*(v+c). Note that sending irrelevant documents drives the score negative. Here v = 2; c = 1.
"Training wheels": to protect groups from having to report astronomically negative results, U is rescaled to T11SU = [max{U/(2G), -0.5} + 0.5] / 1.5.
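A minimal Python sketch (an illustration, not the official TREC evaluation code) of the linear utility and its scaled "training wheels" form as written above; the example numbers are invented.

```python
def linear_utility(g, n, v=2, c=1):
    """Pure linear utility: payoff v per relevant document sent,
    cost c per irrelevant document sent (there are n - g of those)."""
    return v * g - c * (n - g)


def t11su(u, G, v=2, min_nu=-0.5):
    """Scaled utility: normalize by the maximum possible utility v*G,
    floor at min_nu, then rescale so the result lies in [0, 1]."""
    nu = u / (v * G) if G > 0 else 0.0
    return (max(nu, min_nu) - min_nu) / (1 - min_nu)


# Invented example: 80 documents sent, 30 of them relevant, 40 relevant in total.
u = linear_utility(g=30, n=80)   # 2*30 - 1*50 = 10
print(u, t11su(u, G=40))         # 10  and  (0.125 + 0.5) / 1.5 ≈ 0.417
```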

How We Have Done: TREC 2002
Disclaimers and caveats: we report here only on those results that were achieved and validated at the TREC 2002 conference. These runs were done primarily to convince ourselves that we can manage the entire data pipeline, and were not selected to represent the best conceptual approaches that we can think of.

Disclaimers and caveats (cont.): the TREC adaptive filtering rules are quite confusing to newcomers. It appears, from conference and post-conference discussions, that the two top-ranked systems may not have followed the same set of rules as the other competitors. If this is the case, our relative standing is actually better than reported here.

Using Measure T11SU
Adaptive filtering:
- Assessor topics: 9th among all teams; 7th among those known to follow the rules.
- Intersection topics: 7th among all teams; 5th among those known to follow the rules.
Batch filtering:
- 6th among all groups on assessor topics; 3rd among all groups on intersection topics.
- Scored above the median on almost all topics; best on 24 of 50.

Efficiency-Effectiveness Plots
[Plot sketch: the vertical axis is a measure of effectiveness (up to 100%); the horizontal axis is a measure of the time required, scaled as (best baseline method)/(method plotted), up to 100%. Regions of the plot are labeled "strong and slow," "strong and fast," "weak but fast," and "not good enough for government work."]

BinWorlds: Very Simple Models
Documents live in some number (L) of bins. Some bins contain only (b) irrelevant (bad) documents; a few also contain relevant (good) documents. Documents are delivered randomly from the world, labeled only by their bin numbers. The game has a horizon H, with a payoff v for each good document sent to be judged and a cost c for each bad document sent to be judged.

We consider a hierarchy of models. For example, if only one bin contains good documents, the optimum strategy is either QUIT or to continue until seeing one good document, and thereafter to submit only documents from that bin to be judged. The expected value of the game is

EV = -CostToLearnRightBin + GainThereafter.

Since the expected time to learn the right bin is 1 + Lb/g,

EV = -c(1 + Lb/g) + (H - (1 + Lb/g)) (vg - cb)/(b + g).

Increasing the horizon H increases EV, while increasing the number of candidate bins, L, makes the game harder. However, if we have failed once on a bin, perhaps it is not wise to test it again. In other models, g and L become parameters to be determined, several adjacent bins contain good documents, or the number of good documents varies smoothly across bins.
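The expected-value formula for the one-good-bin model can be written directly in code. A minimal Python sketch follows; the function simply evaluates the EV expression above, and the example parameters are invented for illustration.

```python
def binworld_ev(L, b, g, v, c, H):
    """Expected value of the one-good-bin BinWorld game described above.

    L: number of bins
    b: bad documents per bin
    g: good documents in the single good bin
    v: payoff per good document judged, c: cost per bad document judged
    H: horizon (total number of document arrivals)
    """
    # Expected number of submissions needed to identify the right bin.
    learn_time = 1 + L * b / g
    # Expected per-document gain once the right bin is known:
    # a document drawn from that bin is good with probability g / (g + b).
    gain_rate = (v * g - c * b) / (b + g)
    return -c * learn_time + (H - learn_time) * gain_rate


# Invented example: 10 bins, 20 bad documents each, 30 good documents, payoff 2, cost 1.
print(binworld_ev(L=10, b=20, g=30, v=2, c=1, H=500))  # about 386; QUIT is worth 0
```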

The Essential Math
At any step on the way to the horizon H, the decision maker can know only these things: the judgments on submitted documents, and the stage at which each was submitted. Let j(b,i) be the judgment received when a document from bin b was submitted at time step i. As a result of these judgments, the decision maker has a current Bayesian estimate of the chance that each bin is the right bin.
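A minimal Python sketch of the Bayesian bookkeeping this describes, under the one-good-bin model above: only the right bin ever yields a relevant document, and it does so with probability g/(g+b). The function name and the uniform prior are assumptions made for illustration.

```python
import numpy as np


def posterior_over_bins(judgments, L, b, g):
    """Bayesian estimate that each bin is the 'right' bin, given the
    judgments j(bin, step) received on submitted documents.

    judgments: list of (bin_index, relevant) pairs, in submission order
    L: number of bins; b, g: bad/good document counts as defined above
    """
    post = np.full(L, 1.0 / L)          # assumed uniform prior over bins
    p_good = g / (g + b)                # chance a doc from the right bin is relevant
    for bin_idx, relevant in judgments:
        if relevant:
            # A relevant document can only have come from the right bin.
            like = np.zeros(L)
            like[bin_idx] = p_good
        else:
            # The right bin yields an irrelevant document with probability
            # 1 - p_good; every other bin does so with probability 1.
            like = np.ones(L)
            like[bin_idx] = 1 - p_good
        post = post * like
        post = post / post.sum()
    return post


# Two irrelevant judgments from bin 0 shift belief toward the other bins.
print(posterior_over_bins([(0, False), (0, False)], L=5, b=20, g=30))
```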

The Challenge
Can we find a simple and effective heuristic based on the available history of judgments j(·,1), …, j(·,i) and the time remaining, H - i? Such a heuristic must exist, because the decision rule must be of the form: if the current estimate that a bin is the right one is below some critical value, do not submit from it. Note: this is "obvious but not yet proved."
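One plausible shape for such a heuristic, sketched in Python as an assumption for illustration (it is not a rule from the slides): submit from a bin only if its posterior probability of being the right bin clears a critical value that rises as the remaining horizon H - i shrinks.

```python
import numpy as np


def submit_decision(posterior, bin_idx, steps_remaining, H, base_threshold=0.05):
    """Illustrative threshold heuristic (an assumption, not the slides' result).

    Submit a document from bin_idx only if the current Bayesian estimate that
    it is the right bin exceeds a critical value; with less time left to
    recoup the cost of exploring, the critical value is raised.
    """
    time_fraction_left = steps_remaining / H
    threshold = base_threshold / max(time_fraction_left, 1e-9)
    return posterior[bin_idx] >= threshold


# With 400 of 500 steps left, a belief of 0.30 in bin 2 clears the bar;
# with only 20 steps left, the same belief does not.
belief = np.array([0.05, 0.15, 0.30, 0.25, 0.25])
print(submit_decision(belief, bin_idx=2, steps_remaining=400, H=500))  # True
print(submit_decision(belief, bin_idx=2, steps_remaining=20, H=500))   # False
```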