
1 A Potpourri of topics Paul Kantor
- Project overview and cartoon
- How we did at TREC this year
- Generalized Performance Plots
- Remarks on the formal model of decision
© Paul Kantor 2002

2 Rutgers DIMACS: Automatic Event Finding in Streams of Messages
[Cartoon: two pipelines processing a stream of new documents, feeding the analysts. Retrospective/Supervised/Tracking: 1. Accumulated documents, 2. Unexpected event, 3. Initial profile, 4. Guided retrieval, 5. Clustering, 6. Revision and iteration, 7. Track. Prospective/Unsupervised/Detection: 1. Accumulated documents, 2. Clustering, 3. Initial profile, 4. Anticipated event, 5. Guided retrieval.]
Automatic Event Finding in Streams of Messages has two phases. The first (Year 1) is retrospective -- think of it as hindsight. Documents are accumulated (1) until an unexpected event, represented here by a train rushing at us from a tunnel (2), occurs. Analysts build an initial profile (3), which is used to retrieve likely documents (4), which are clustered (5) to provide a rich document set. Analysts work with this set to revise and iterate (6), supporting efforts to track down all participants and supporters of the event.
In the second and third years of the project, attention will focus on prospective detection of significant events. As documents are accumulated (1), continuous clustering and matching (2) identifies documents that do not fit into established patterns. These are grouped and automatically profiled (3), and the results are submitted to analysts. The result, in some cases, will be an anticipated event (4) which, if the analysis is timely, will be prevented. Guided retrieval can then be used, as before, to track down all participants and supporters of the intended event.
Technically, the work will be a thorough and systematic exploration of various representations of documents, matching methods, learning methods, and fusion methods, to establish the limits of these technologies. Theoretical work will establish rates of convergence and probabilities of success. Experimental work will test methods using the established TREC collections, and other materials, as appropriate. The work will be done by 7 faculty and 9 students, post-docs, and programmers. © Paul Kantor 2002 © Paul Kantor 2001

3 Rutgers-DIMACS KD-D MMS Project Matrix
© Paul Kantor 2002

4 Communication
- The process converges…. Central limit theorem … What???
- Pretty good fit
- Confidence levels
- And so on
© Paul Kantor 2002

5 Measures of performance Effectiveness
1. Batch post-hoc learning. Here there is a large set of already discovered documents, and the system must learn to recognize future instances from the same family.
2. Adaptive learning of defined profiles. Here there is a small group of "seed documents," and thereafter the system must learn while it works. Realistic measures must penalize the system for sending documents that are of no interest to any analyst on to the human experts.
3. Discovery of new regions of interest. Here the focus is on unexpected patterns of related documents, which are far enough from established patterns to warrant sending them for human evaluation. © Paul Kantor 2002

6 Measures of performance Effectiveness
Efficiency is measured in both the time and space resources required to accomplish a given level of effectiveness. Results are best visualized in a set of two- or three-dimensional plots, as suggested on the following page. © Paul Kantor 2002

7 Typical measures used Adaptive Filtering
Basis:
- precision p = g/n, where g = number relevant and n = number that the analyst must examine
- recall R = g/G, where G = total number that should be sent to the analyst
F-measures: 1/F = a/p + (1-a)/R = (1/g)(an + (1-a)G), so F = g/[an + (1-a)G]. There is no persuasive argument for this particular weighting; in TREC 2002, a = 0.8, a 4:1 weighting. © Paul Kantor 2002
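As a minimal illustration of the formula above (not part of the original slides), a Python sketch of the weighted F-measure, assuming g, n, and G are the counts defined here:

```python
def weighted_f(g, n, G, a=0.8):
    """Weighted F-measure: 1/F = a/p + (1-a)/R, equivalently F = g / (a*n + (1-a)*G).

    g: number of relevant documents among those sent to the analyst
    n: number of documents the analyst must examine
    G: total number of documents that should have been sent
    a: weight on precision (TREC 2002 used a = 0.8, a 4:1 weighting)
    """
    if g == 0:
        return 0.0
    p = g / n            # precision
    r = g / G            # recall
    return 1.0 / (a / p + (1 - a) / r)


# Example: 30 relevant among 50 documents sent, 40 relevant in all
print(weighted_f(g=30, n=50, G=40))  # 0.625 = 30 / (0.8*50 + 0.2*40)
```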

8 Typical measures used Utility-Based measures
Pure measure: U = vg - c(n-g) = -cn + g(v+c). Note that sending irrelevant documents drives the score negative. v = 2; c = 1.
"Training wheels": to protect groups from having to report astronomically negative results, U → T11SU = [max{U/2G, -0.5} + 0.5]/1.5 © Paul Kantor 2002
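A hedged sketch of how the utility measures above might be coded; the function names are invented for illustration, but the arithmetic follows the definitions on this slide (v = 2, c = 1, MaxU = 2G):

```python
def linear_utility(g, n, v=2.0, c=1.0):
    """Pure measure U = v*g - c*(n - g): reward the g relevant documents sent,
    penalize the n - g irrelevant ones."""
    return v * g - c * (n - g)


def t11su(g, n, G, v=2.0, c=1.0):
    """Scaled utility ("training wheels"): clip U / MaxU at -0.5, then rescale to [0, 1]."""
    max_u = v * G                      # utility of a perfect run
    u = linear_utility(g, n, v, c)
    return (max(u / max_u, -0.5) + 0.5) / 1.5


# Example: 30 relevant among 50 sent, 40 relevant in total
print(t11su(g=30, n=50, G=40))         # U = 40, MaxU = 80 -> (0.5 + 0.5)/1.5 = 0.667
```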

9 How we have done: TREC2002 Disclaimers and caveats.
We report here only on those results that were achieved and validated at the TREC2002 conference. These were done primarily to convince ourselves that we can manage the entire data pipeline, and were not selected to represent the best conceptual approaches that we can think of. © Paul Kantor 2002

10 Disclaimers and caveats (cont).
The TREC Adaptive rules are quite confusing to newcomers. It appears, from conference and post-conference discussions, that the two top-ranked systems may not have followed the same set of rules as the other competitors. If this is the case, our results are actually better, relative to the field, than those reported here. © Paul Kantor 2002

11 Using measure T11SU
Adaptive:
- Assessor topics: 9th among all teams; 7th among those known to follow the rules.
- Intersection topics: 7th among all teams; 5th among those known to follow the rules.
Batch:
- 6th among all groups on Assessor topics; 3rd among all groups on Intersection topics.
- Scored above median on almost all topics; tops on 24 of 50. © Paul Kantor 2002

12 Efficiency-Effectiveness Plots
[Quadrant plot. x-axis: Measure of time required (best baseline method / method plotted), up to 100%; y-axis: Measure of effectiveness, up to 100%. Quadrants: Strong and slow (upper left), Strong and fast (upper right), Not good enough for government work (lower left), Weak but fast (lower right).] © Paul Kantor 2002
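A hypothetical matplotlib sketch of the quadrant layout described above; the quadrant label positions are placeholders chosen only to reproduce the slide's picture:

```python
import matplotlib.pyplot as plt

# x: relative time required (best baseline method / method plotted), 0..100%
# y: measure of effectiveness, 0..100%
fig, ax = plt.subplots()
ax.set_xlabel("Measure of time required (best baseline / method plotted)")
ax.set_ylabel("Measure of effectiveness")
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.axhline(0.5, linestyle="--")
ax.axvline(0.5, linestyle="--")
labels = {(0.25, 0.75): "Strong and slow",
          (0.75, 0.75): "Strong and fast",
          (0.25, 0.25): "Not good enough\nfor government work",
          (0.75, 0.25): "Weak but fast"}
for (x, y), text in labels.items():
    ax.text(x, y, text, ha="center", va="center")
plt.show()
```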

13 BinWorlds --Very simple models
Documents live in some number (L) of bins. Some bins contain only (b) irrelevant (bad) documents; a few also contain relevant (good) documents. Documents are delivered randomly from the world, labeled only by their bin numbers. The game has a horizon H, with a payoff v for good documents sent to be judged and a cost c for bad documents sent to be judged. We consider a hierarchy of models.
For example, if only one bin contains good documents, the optimum strategy is either QUIT or continue until seeing one good document, and thereafter submit only documents from this bin to be judged. The expected value of the game is given by EV = -CostToLearnRightBin + GainThereafter. Since the expected time to learn the right bin is 1 + Lb/g,
EV = -c(1 + Lb/g) + (H - (1 + Lb/g))(vg - cb)/(b + g).
Increasing the horizon H increases EV, while increasing the number of candidate bins, L, makes the game harder. However, if we have failed once on a bin, perhaps it is not wise to test it again. In other models, g and L become parameters to determine, several adjacent bins contain good documents, and the number of good documents varies smoothly. © Paul Kantor 2002
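A small numerical sketch of the expected-value formula above (the parameter values in the example are arbitrary, chosen only to show the calculation):

```python
def one_good_bin_ev(L, b, g, H, v, c):
    """EV = -c(1 + Lb/g) + (H - (1 + Lb/g)) * (vg - cb)/(b + g),
    the expected value of the one-good-bin game described on this slide."""
    learn_time = 1 + L * b / g               # expected steps until the first good document
    payoff_rate = (v * g - c * b) / (b + g)  # expected gain per submission from the right bin
    return -c * learn_time + (H - learn_time) * payoff_rate


# Example: 5 bins with b = 2 bad documents each, g = 3 good ones in the right bin,
# horizon H = 100, payoff v = 2, cost c = 1
print(one_good_bin_ev(L=5, b=2, g=3, H=100, v=2.0, c=1.0))  # about 72.2
```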

14 The essential math At any step on the way to the horizon H, the decision maker can know only these things: the judgments on submitted documents, and the stage at which they were submitted. Let j(b,i) be the judgment received when a document from bin b was submitted at time step i. As a result of these judgments, the decision maker has a present Bayesian estimate of the chance that each bin is the right bin. © Paul Kantor 2002
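A minimal sketch of such a Bayesian update for the one-good-bin model; the likelihoods assumed here (good documents occur only in the right bin, in proportion g/(g+b)) are an illustrative instantiation, not a claim about the project's actual model:

```python
def update_bin_posterior(posterior, bin_id, relevant, g, b):
    """One Bayesian update of P(bin k is the right bin) after judgment j(bin_id, i).

    posterior: dict mapping bin id -> current probability of being the right bin
    relevant:  the analyst's judgment on the submitted document
    g, b:      good / bad document counts assumed for the right bin
    """
    updated = {}
    for k, p in posterior.items():
        if relevant:                 # a relevant document can only come from the right bin
            like = g / (g + b) if k == bin_id else 0.0
        else:                        # an irrelevant one is certain elsewhere, likely even there
            like = b / (g + b) if k == bin_id else 1.0
        updated[k] = p * like
    total = sum(updated.values())
    return {k: p / total for k, p in updated.items()}


# Example: 4 bins, uniform prior; a document from bin 2 is judged irrelevant
prior = {k: 0.25 for k in range(4)}
print(update_bin_posterior(prior, bin_id=2, relevant=False, g=1, b=4))
```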

15 The challenge Can we find a simple and effective heuristic based on the available history of judgments, j(b,1) … j(b,i), and the time remaining, H - i? Such a heuristic must exist, because the decision rule must be of the form: if the current estimate that a bin is the right one is below some critical value, don't submit it. Note: this is "obvious but not yet proved." © Paul Kantor 2002
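A sketch of the kind of threshold rule conjectured above; the schedule that loosens or tightens the critical value as the horizon shrinks is purely illustrative:

```python
def should_submit(p_right_bin, time_remaining, threshold_schedule):
    """Submit a document from a bin only if the current Bayesian estimate that the
    bin is the right one meets a critical value, which may depend on H - i."""
    return p_right_bin >= threshold_schedule(time_remaining)


# Example schedule (illustrative only): permissive while much time remains, stricter late
schedule = lambda t: 0.5 / (1.0 + 0.1 * t)
print(should_submit(p_right_bin=0.3, time_remaining=40, threshold_schedule=schedule))  # True
```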

