IE by Candidate Classification: Califf & Mooney


1 IE by Candidate Classification: Califf & Mooney
William Cohen 1/21/03

2 Where this paper fits
What happens when the candidate generator becomes very general?
Pipeline: candidate generator → candidate phrase → learned filter → extracted phrase

3 Example task for Rapier

4 Rapier: the 3-slide version
[Califf & Mooney, AAAI '99] A bottom-up rule learner:

  initialize RULES to one (maximally specific) rule per example;
  repeat {
    randomly pick N pairs of rules (Ri, Rj);
    let {G1, …, GN} be their consistent pairwise generalizations;
    let G* = the Gi that optimizes "compression";
    let RULES = RULES + {G*} − {R': covers(G*, R')}
  }

where compression(G, RULES) = size of RULES − {R': covers(G, R')},
and covers(G, R) means every example matching R also matches G.
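A minimal Python sketch of this compression-driven loop (illustrative only, not Rapier's actual code). The hooks are hypothetical and supplied by the caller: seed_rules holds one maximally specific rule per example, generalize(r1, r2) returns the consistent pairwise generalizations of two rules, and covers(g, r) tests whether g subsumes r.

import random

def compression(g, rules, covers):
    # Size of the rule set after adding g and removing every rule it covers;
    # smaller means better compression.
    return 1 + sum(1 for r in rules if not covers(g, r))

def learn_rules(seed_rules, generalize, covers, n_pairs=5, max_iters=50):
    # Bottom-up rule learning in the style sketched above.
    rules = list(seed_rules)
    for _ in range(max_iters):
        if len(rules) < 2:
            break
        candidates = []
        for _ in range(n_pairs):
            r1, r2 = random.sample(rules, 2)
            candidates.extend(generalize(r1, r2))
        if not candidates:
            continue
        best = min(candidates, key=lambda g: compression(g, rules, covers))
        if compression(best, rules, covers) >= len(rules):
            continue   # this generalization does not compress the rule set
        rules = [r for r in rules if not covers(best, r)] + [best]
    return rules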

5 <title>Course Information for CS213</title>
<h1>CS 213 C++ Programming</h1> …

courseNum(window1) :- token(window1,'CS'), doubleton('CS'), prevToken('CS','CS213'), inTitle('CS213'), nextTok('CS','213'), numeric('213'), tripleton('213'), nextTok('213','C++'), tripleton('C++'), …

<title>Syllabus and meeting times for Eng 214</title>
<h1>Eng 214 Software Engineering for Non-programmers</h1> …

courseNum(window2) :- token(window2,'Eng'), tripleton('Eng'), prevToken('Eng','214'), inTitle('214'), nextTok('Eng','214'), numeric('214'), tripleton('214'), nextTok('214','Software'), …

Differences are dropped; common conditions are carried over to the generalization:

courseNum(X) :- token(X,A), prevToken(A,B), inTitle(B), nextTok(A,C), numeric(C), tripleton(C), nextTok(C,D), …

6 Rapier: an alternative approach
Combines top-down and bottom-up learning:
- bottom-up to find common restrictions on content;
- top-down greedy addition of restrictions on context.
Uses part-of-speech and semantic features (from WordNet).
Special "pattern language" based on sequences of tokens, each of which satisfies one of a set of given constraints, e.g.
  < <tok{'ate','hit'}, POS{'vb'}>, <tok{'the'}>, <POS{'nn'}> >

7 Rapier: the N-slide version

8 Rapier: IE with "rules"
A rule consists of:
- a pre-filler pattern
- a filler pattern
- a post-filler pattern
Each pattern is composed of elements, and each pattern element matches a sequence of words that obeys constraints on:
- the value, POS, and WordNet class of the words in the sequence
- the total length of the sequence
Example: "IBM paid an undisclosed amount" matches
  PreFiller:  <pos=nn or nnp>:1 <anyword>:2
  Filler:     <word='undisclosed'>:1
  PostFiller: <semanticClass='price'>
A rule might match many times in a document.
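A toy Python rendering of this rule representation, a sketch under simplifying assumptions rather than Rapier's implementation: each token is a (word, POS, semantic-class) tuple, each pattern element carries optional constraint sets plus a maximum length, and matching is greedy rather than backtracking.

from dataclasses import dataclass, field

@dataclass
class Element:
    # One pattern element: matches up to max_len tokens, each of which must
    # satisfy every non-empty constraint set (empty set = unconstrained).
    words: set = field(default_factory=set)
    pos: set = field(default_factory=set)
    sem: set = field(default_factory=set)
    max_len: int = 1

    def admits(self, tok):
        word, pos, sem = tok
        return ((not self.words or word.lower() in self.words) and
                (not self.pos or pos in self.pos) and
                (not self.sem or sem in self.sem))

def match_pattern(pattern, tokens, start):
    # Greedily consume tokens for each element in turn; return the end index.
    i = start
    for el in pattern:
        n = 0
        while n < el.max_len and i < len(tokens) and el.admits(tokens[i]):
            i += 1
            n += 1
    return i

def apply_rule(pre, filler, post, tokens):
    # Return the filler spans extracted by the rule (a rule may match
    # several times in one document).
    spans = set()
    for start in range(len(tokens)):
        a = match_pattern(pre, tokens, start)
        b = match_pattern(filler, tokens, a)
        c = match_pattern(post, tokens, b)
        if b > a and c > b:          # both filler and post-filler matched
            spans.add((a, b))
    return [tokens[a:b] for a, b in sorted(spans)]

# The "IBM paid an undisclosed amount" example from the slide:
tokens = [("IBM", "nnp", None), ("paid", "vbd", None), ("an", "dt", None),
          ("undisclosed", "jj", None), ("amount", "nn", "price")]
pre    = [Element(pos={"nn", "nnp"}), Element(max_len=2)]   # <pos=nn|nnp>:1 <anyword>:2
filler = [Element(words={"undisclosed"})]                   # <word='undisclosed'>:1
post   = [Element(sem={"price"})]                           # <semanticClass='price'>
print(apply_rule(pre, filler, post, tokens))                # [[('undisclosed', 'jj', None)]]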

9 Algorithm 1
- Start with a huge ruleset: one rule per example.
- Every new rule "compresses" the ruleset.
- Expect many high-precision, low-recall rules (each covers many positive and few negative examples, i.e. the ruleset is redundant).

10 PreFiller: <word='Subject'> <word=':'> … <word='com'> <word='>'>
Filler: <word='SOFTWARE'> <word='PROGRAMMER'>
PostFiller: <word='Position'> <word='available'> <word='for'> … <word='memphisonline'> <word='.'> <word='com'>
Plus POS info for each word in each pattern (but no semantic class).

11 Algorithm 2: What is FindNewRule?

12 An "obvious choice"
Some sort of search based on this primitive step:
- pick a PAIR of rules R1, R2;
- form the most specific rule that is more general than both R1 and R2 (their least general generalization).
But in Rapier's language there are multiple such generalizations of R1 and R2…
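A tiny sketch of why multiple generalizations exist, reusing the hypothetical Element class from the earlier sketch: even for a single pair of elements, one can take the union of the allowed values or drop the constraint entirely, and both are valid generalizations.

def generalize_elements(e1, e2):
    # Two candidate generalizations of a pair of pattern elements.
    by_union = Element(words=e1.words | e2.words,
                       pos=e1.pos | e2.pos,
                       sem=e1.sem | e2.sem,
                       max_len=max(e1.max_len, e2.max_len))
    unconstrained = Element(max_len=max(e1.max_len, e2.max_len))
    return [by_union, unconstrained]

# e.g. generalizing <word='ate'> with <word='hit'> yields both
# <word in {'ate','hit'}> and an unconstrained single-token element.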

13 Algorithm 3
Specialize by adding conditions from pairwise generalizations, starting with the filler and working outward.

14 Algorithm 4: pairwise generalization of patterns
Heuristic search for the best (in some graph sense) possible generalization of WordNet semantic classes.
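One simple proxy for "the best generalization of two semantic classes" is the lowest common hypernym in WordNet's hypernym graph. A small illustration using NLTK's WordNet interface (not the code Rapier used; it assumes the nltk package and its wordnet corpus are installed):

from nltk.corpus import wordnet as wn

def semantic_generalization(word1, word2):
    # Lowest common hypernym(s) of the first noun senses of the two words.
    s1 = wn.synsets(word1, pos=wn.NOUN)[0]
    s2 = wn.synsets(word2, pos=wn.NOUN)[0]
    return s1.lowest_common_hypernyms(s2)

# semantic_generalization("dog", "cat") typically returns
# [Synset('carnivore.n.01')], a class covering both words.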

15 Algorithm 4: pairwise generalization of patterns
Explore several ways to generalize differing POS tags or word values.

16 Algorithm 4: pairwise generalization of patterns
Patterns of the same length can be generalized element by element. For patterns of different lengths:
- special cases when one list has length zero or one;
- "punt" and generate a single very general pattern if both lists are long;
- if both patterns are moderately short, consider many possible pairings of pattern elements.
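A rough sketch of the first two cases (same length, and the "punt" when lengths differ), reusing the hypothetical Element and generalize_elements helpers from the earlier sketches; the many-pairings search for two short patterns is omitted here.

def generalize_patterns(p1, p2):
    # p1, p2 are lists of Elements.
    if len(p1) == len(p2):
        # same length: generalize element by element (union-based choice)
        return [generalize_elements(a, b)[0] for a, b in zip(p1, p2)]
    # different lengths: "punt" and return a single very general element
    # long enough to span either pattern (Rapier is smarter when one list
    # has length zero or one, or when both lists are short)
    span = max(sum(e.max_len for e in p1), sum(e.max_len for e in p2))
    return [Element(max_len=span)]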

17 Algorithm 4: pairwise generalization of patterns
If both patterns are moderately short, consider many possible pairings of pattern elements. For example, two different pairings of the element sequences ABCBA and ABD yield two different generalizations:
  A + <BC>? + B + [AD]
  A + B + <[ABCD]{1,4}>

18 Algorithm 5: Evaluation of rules

19 Results

20 Extraction by Sliding Window
Example document (CMU UseNet Seminar Announcement):

GRAND CHALLENGES FOR MACHINE LEARNING
Jaime Carbonell
School of Computer Science, Carnegie Mellon University
3:30 pm, 7500 Wean Hall

Machine learning has evolved from obscurity in the 1970s into a vibrant and popular discipline in artificial intelligence during the 1980s and 1990s. As a result of its success and growth, machine learning is evolving into a collection of related disciplines: inductive concept acquisition, analytic learning in problem solving (e.g. analogy, explanation-based learning), learning theory (e.g. PAC learning), genetic algorithms, connectionist learning, hybrid systems, and so on.

21 A "Naïve Bayes" Sliding Window Model
[Freitag 1997]
Figure (window diagram): the text "… 00 pm Place: Wean Hall Rm 5409 Speaker: Sebastian Thrun …" split into a prefix (w[t-m] … w[t-1]), contents (w[t] … w[t+n]), and suffix (w[t+n+1] … w[t+n+m]).
- Estimate Pr(LOCATION | window) using Bayes rule.
- Try all "reasonable" windows (vary length, position).
- Assume independence for length, prefix words, suffix words, and content words.
- Estimate from data quantities like Pr("Place" in prefix | LOCATION).
- If Pr("Wean Hall Rm 5409" = LOCATION) is above some threshold, extract it.
Other examples of sliding windows: [Baluja et al 2000] (a decision tree over individual words and their context).
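A stripped-down sketch of scoring one candidate window under this kind of model (illustrative only; the probability tables p_prefix, p_content, p_suffix, p_len and the prior are hypothetical placeholders that would be estimated from training data).

import math

def window_log_score(tokens, start, length, prior, p_len,
                     p_prefix, p_content, p_suffix, k=2, smooth=1e-4):
    # log Pr(LOCATION | window) up to a constant, assuming independence of
    # the window length, the k prefix words, the content words, and the
    # k suffix words; unseen words back off to a small smoothing value.
    score = math.log(prior) + math.log(p_len.get(length, smooth))
    for w in tokens[max(0, start - k):start]:                 # prefix words
        score += math.log(p_prefix.get(w, smooth))
    for w in tokens[start:start + length]:                    # content words
        score += math.log(p_content.get(w, smooth))
    for w in tokens[start + length:start + length + k]:       # suffix words
        score += math.log(p_suffix.get(w, smooth))
    return score

# Slide (start, length) over all "reasonable" windows, score each one, and
# extract windows whose score clears a threshold as LOCATION fields.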

22 "Naïve Bayes" Sliding Window Results
Domain: CMU UseNet Seminar Announcements (same example announcement as on slide 20).

  Field        F1
  Person Name  30%
  Location     61%
  Start Time   98%

23 Results: second domain

24 Discussion questions
- Is this candidate classification?
- Is complexity good or bad in a learned hypothesis? In a learning system?
- What are the tradeoffs in expressive rule languages vs. simple ones?
- Is RAPIER successful in using long-range information? What other ways are there to get this information?

25

26 Wrapper learning
[Cohen et al., WWW 2002]

27 Goal: learn from a human teacher how to extract certain database records from a particular web site.

28 Learner

29

30

31 Why learning from few examples is important
At training time, only four examples are available, but one would like to generalize to future pages as well. Must generalize across time as well as across a single site.

32 Now, some details…

33 Improving A Page Classifier with Anchor Extraction and Link Analysis
William W. Cohen NIPS 2002

34 Previous work in page classification using links
Exploit hyperlinks (Slattery & Mitchell 2000; Cohn & Hofmann 2001; Joachims 2001): documents pointed to by the same "hub" should have the same class.
What's new in this paper:
- use the structure of hub pages (as well as the structure of the site graph) to find better "hubs";
- adapt an existing "wrapper learning" system to find that structure, on the task of classifying "executive bio pages".

35 Intuition: links from this "hub page" are informative… especially these links.

36 Task: train a page classifier, then use it to classify pages on a new, previously unseen web site as executiveBio or other.
Question: can index pages for executive biographies be used to improve classification?
Idea: use the wrapper-learner to learn to extract links to execBio pages, smoothing the "noisy" data produced by the initial page classifier.

37 Background: "co-training" (Blum & Mitchell, '98)
Suppose examples are of the form (x1, x2, y), where x1 and x2 are independent given y, each xi is sufficient for classification on its own, and unlabeled examples are cheap (e.g. x1 = bag of words, x2 = bag of links).
Co-training algorithm:
1. Use the x1's (on labeled data D) to train f1(x1) = y.
2. Use f1 to label additional unlabeled examples U.
3. Use the x2's (on the labeled part of U + D) to train f2(x2) = y.
4. Repeat…
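An illustrative sketch of one round of these four steps (not code from the paper); f1 and f2 can be any classifiers exposing fit(X, y) and predict(X), e.g. scikit-learn estimators, and the two views x1, x2 are assumed to satisfy the conditions above.

def one_round_co_training(f1, f2, labeled, unlabeled):
    # labeled:   list of ((x1, x2), y) pairs (the data D)
    # unlabeled: list of (x1, x2) pairs     (the data U)
    X1 = [x1 for (x1, _), _ in labeled]
    Y = [y for _, y in labeled]
    f1.fit(X1, Y)                                     # 1. train f1 on view x1
    pseudo = f1.predict([x1 for x1, _ in unlabeled])  # 2. label U with f1
    X2 = [x2 for (_, x2), _ in labeled] + [x2 for _, x2 in unlabeled]
    Y2 = Y + list(pseudo)
    f2.fit(X2, Y2)                                    # 3. train f2 on view x2
    return f1, f2                                     # 4. repeat as desired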

38 Simple 1-step co-training for web pages
Here f1 is a bag-of-words page classifier and S is a web site containing unlabeled pages.
- Feature construction: represent a page x in S as the bag of pages that link to x (a "bag of hubs").
- Learning: learn f2 from the bag-of-hubs examples, labeled with f1.
- Labeling: use f2(x) to label pages from S.
Idea: use one round of co-training to bootstrap the bag-of-words classifier into one that uses site-specific features x2/f2.
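A sketch of this feature-construction and bootstrapping step under stated assumptions: the site S is given as a link graph (dict mapping each page to the set of pages it links to), pages maps each page to its text, f1 is the bag-of-words classifier, and train_f2 trains a classifier over bag-of-hubs feature sets; all of these names are hypothetical.

from collections import defaultdict

def bag_of_hubs(link_graph):
    # Represent each page as the set ("bag") of pages that link to it.
    boh = defaultdict(set)
    for hub, targets in link_graph.items():
        for page in targets:
            boh[page].add(hub)
    return boh

def one_step_cotraining(link_graph, pages, f1, train_f2):
    # Feature construction, learning, and labeling as on the slide.
    noisy = {p: f1(text) for p, text in pages.items()}       # labels from f1
    boh = bag_of_hubs(link_graph)
    f2 = train_f2([(boh[p], y) for p, y in noisy.items()])   # learn f2
    return {p: f2(boh[p]) for p in pages}                    # relabel with f2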

39 Improved 1-step co-training for web pages
Feature construction:
- Label an anchor a in S as positive iff it points to a positive page x (according to f1). Let D = {(x', a): a is a positive anchor on page x'}.
- Generate many small training sets Di from D by sliding small windows over D.
- Let P be the set of all "structures" found by any builder from any subset Di.
- Say that a structure p links to x if p extracts an anchor that points to x. Represent a page x as the bag of structures in P that link to x.
Learning and labeling: as before.

40 Figure: builder and extractor for List1.

41 Figure: builder and extractor for List2.

42 Figure: builder and extractor for List3.

43 Bag-of-hubs (BOH) representation fed to the learner:
  { List1, List3, … }          PR
  { List1, List2, List3, … }   PR
  { List2, List3, … }          Other
  { List2, List3, … }          PR
  → Learner

44 Experimental results
(Chart annotations: "Co-training hurts", "No improvement".)

45 Summary
- "Builders" (from a wrapper learning system) let one discover and use the structure of web sites and index pages to smooth page classification results.
- Discovering good "hub structures" makes it possible to use 1-step co-training on small ( example) unlabeled datasets.
- Average error rate was reduced from 8.4% to 3.6%; the difference is statistically significant with a 2-tailed paired sign test or t-test.
- EM with probabilistic learners also works; see (Blei et al., UAI 2002).

