EMNLP’01 19/11/2001 ML: Classical methods from AI –Decision-Tree induction –Exemplar-based Learning –Rule Induction –TBEDL


1 EMNLP’01 19/11/2001 ML: Classical methods from AI
–Decision-Tree induction
–Exemplar-based Learning
–Rule Induction
–TBEDL

2 Rule Induction
We will follow (again) the ACL’99 Tutorial on Symbolic Machine Learning for NLP (Mooney & Cardie 99):
Sequential Covering
Greedy Covering
Strategies for Learning a Single Rule:
–Top-Down vs. Bottom-Up
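
The covering strategy named on this slide can be sketched as follows. This is a minimal propositional illustration, not the tutorial's own code: the attribute-value example format, the precision-based scoring, and the names `covers`, `learn_one_rule`, and `sequential_covering` are all assumptions made for the sketch.

```python
from typing import Dict, List, Optional, Tuple

Example = Tuple[Dict[str, str], bool]  # (attribute values, is_positive)
Rule = Dict[str, str]                  # conjunction of attribute == value tests


def covers(rule: Rule, attrs: Dict[str, str]) -> bool:
    """A rule covers an example when every test in the rule holds."""
    return all(attrs.get(a) == v for a, v in rule.items())


def learn_one_rule(examples: List[Example]) -> Rule:
    """Top-down greedy search: start from the empty (most general) rule and
    keep adding the single test with the best precision on the covered
    examples until no negative example is covered."""
    rule: Rule = {}
    covered = list(examples)
    while any(not pos for _, pos in covered):
        best: Optional[Tuple[str, str]] = None
        best_score = (-1.0, 0)
        # Candidate tests are taken from covered positive examples only,
        # so the specialised rule always keeps at least one positive.
        for attrs, pos in covered:
            if not pos:
                continue
            for a, v in attrs.items():
                if a in rule:
                    continue
                sub = [p for x, p in covered if x.get(a) == v]
                n_pos = sum(1 for p in sub if p)
                score = (n_pos / len(sub), n_pos)  # precision, then coverage
                if score > best_score:
                    best, best_score = (a, v), score
        if best is None:  # no test left to add (inconsistent data)
            break
        rule[best[0]] = best[1]
        covered = [(x, p) for x, p in covered if covers(rule, x)]
    return rule


def sequential_covering(examples: List[Example]) -> List[Rule]:
    """Learn one rule, remove the positives it covers, and repeat."""
    rules: List[Rule] = []
    remaining = list(examples)
    while any(pos for _, pos in remaining):
        rule = learn_one_rule(remaining)
        rules.append(rule)
        remaining = [(x, p) for x, p in remaining
                     if not (p and covers(rule, x))]
    return rules
```

A bottom-up learner would instead start from a maximally specific rule (one positive example) and drop tests; only the top-down direction is shown here.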

3 Rule Induction
Propositional FOIL
Relational Learning and Inductive Logic Programming (ILP)
FOIL
Applications:
–Text Categorization
–Information Extraction
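
FOIL chooses the next literal to add to a rule by an information-based gain measure. The sketch below uses the standard FOIL gain formula; the function name `foil_gain` and its argument names are chosen here for illustration and do not come from the slides.

```python
import math
from typing import Optional


def foil_gain(p0: int, n0: int, p1: int, n1: int,
              t: Optional[int] = None) -> float:
    """FOIL gain of adding one literal to a rule.

    p0, n0: positive / negative bindings covered before adding the literal
    p1, n1: positive / negative bindings covered after adding it
    t:      positive bindings of the old rule still covered afterwards
            (equal to p1 in the propositional case, the default here)
    """
    if p1 == 0:
        return 0.0  # the literal excludes every positive: nothing gained
    if t is None:
        t = p1
    before = math.log2(p0 / (p0 + n0))
    after = math.log2(p1 / (p1 + n1))
    return t * (after - before)
```

For example, a literal that narrows 4 positives and 4 negatives down to 2 positives and 0 negatives has gain 2 * (log2(1) - log2(0.5)) = 2 bits.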

4 Rule Induction and NLP
Text Categorization (Cohen 95,96; Craven et al. 98; Slattery & Craven 98)
Semantic Parsing (Zelle & Mooney 93,94,96)
Information Extraction (Soderland 95,96,99; Freitag 98a,98b,98c) (Califf & Mooney 97,99; Turmo & Rodríguez 01)
Generation (Radev 98)

5 Information Extraction (Turmo & Rodríguez, 01)

6 Information Extraction (Turmo & Rodríguez, 01)
“Vira a marrón oscuro al corte” (“Turns dark brown when cut”)

7 Information Extraction (Turmo & Rodríguez, 01)

8 Information Extraction (Turmo & Rodríguez, 01)
Basic concepts
–Colour:
Derived concepts
–Color_state:

9 Information Extraction (Turmo & Rodríguez, 01)
Using FOIL (First Order Inductive Learner; Quinlan, 1990) as the basic learner:
isa_color(A, A) :- pos_s_adj(A), has_hypernym_03464977n(A), ancestor(A, C), pos_s_adj(C).
isa_color(A, A) :- has_hypernym_03460270n(A), brother(C, A), pos_nc(C), has_hypernym_00009919n(C).
…
FOIL learned 38 rules for color; only 1 was ill-formed.
Overall results?

10 Information Extraction (Turmo & Rodríguez, 01): Drawbacks of the learning process
Insufficient amount of positive examples
–Active Learning
–Artificial examples
Relevance of negative examples
–Use of empirical observations: Freitag’s baseline
–Use of a distance measure between examples
–Use of clustering techniques

11 Internet IE: The Web→KB Project
–CMU Text Learning Group (Tom Mitchell, Andrew McCallum, Mark Craven, etc.)
–Situation: more than 350 million Web pages are available from a personal workstation. However, none of them is understandable to your computer.
–Goal: to automatically create a computer-understandable knowledge base whose content mirrors that of the WWW
–Utility: allowing much more effective information retrieval and supporting knowledge-based inference and problem solving on the World Wide Web
–How: using machine learning to create information extraction methods for each of the desired types of knowledge

12 Internet IE: WebKB architecture
[Figure: ontology fragment. Entities include Person (department_of, projects_of, name_of, ...), Faculty (projects_led_by, students_of), and Student (advisors_of, courses_TAed_by).]

13 Internet IE: WebKB architecture
Web Pages
Fundamentals of CS Home Page
Instructors: Jim, Tom
Jim’s Home Page
I teach several courses: Fundamentals of CS, Intro to AI
My research includes: Intelligent web agents, Human computer interaction

14 Internet IE: WebKB architecture
KB Instances
Fundamentals-of-CS
instructors_of: jim, tom
home_page:
Jim
courses_taught_by: fundamentals-of-CS, intro-to-AI
home_page:
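
One way to picture the KB instances on this slide is as records with named slots. The sketch below is only an illustration: the `Instance` class, the `slot_values` helper, and the class labels "Course" and "Faculty" are assumptions (the slide does not state which ontology class each instance belongs to), and the home_page slots are omitted because their values did not survive in the transcript.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Instance:
    """A KB instance: a name, an ontology class, and slot -> values."""
    name: str
    cls: str  # assumed class label, not given on the slide
    slots: Dict[str, List[str]] = field(default_factory=dict)


kb: Dict[str, Instance] = {
    "fundamentals-of-CS": Instance(
        name="fundamentals-of-CS",
        cls="Course",
        slots={"instructors_of": ["jim", "tom"]},
    ),
    "jim": Instance(
        name="jim",
        cls="Faculty",
        slots={"courses_taught_by": ["fundamentals-of-CS", "intro-to-AI"]},
    ),
}


def slot_values(kb: Dict[str, Instance], name: str, slot: str) -> List[str]:
    """Return the values of a slot, or an empty list if it is unset."""
    inst = kb.get(name)
    return list(inst.slots.get(slot, [])) if inst else []
```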

15 Internet IE: WebKB architecture
[Figure: WebKB architecture. Labels: INPUT (ontology, Web pages); TRAINING (learning algorithms producing classification rules, relation extraction rules, and extraction rules); TEST (WWW, WebKB, RESULT).]

16 Internet IE: Learning Tasks
1. Recognizing class instances by classifying bodies of text
2. Recognizing relation instances by classifying chains of hyperlinks
3. Recognizing class and relation instances by extracting small fields of text from Web pages

17 Internet IE: Learning Tasks
1. Recognizing class instances by classifying bodies of text
–Bayesian text categorization
–Several text representations
–Exploiting hyperlink relations: relational text categorization, clustering of documents
–Exploiting combination of several classifiers
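
The first item, Bayesian text categorization, can be illustrated with a small multinomial naive Bayes classifier over bag-of-words pages. This is a generic textbook sketch with add-one smoothing, not the WebKB implementation; the function names and the toy page tokens are made up for the example.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Set, Tuple

Model = Tuple[Counter, Dict[str, Counter], Set[str]]


def train_nb(docs: List[Tuple[List[str], str]]) -> Model:
    """Count documents per class and word occurrences per class."""
    class_counts: Counter = Counter()
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab: Set[str] = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify_nb(model: Model, tokens: List[str]) -> Optional[str]:
    """Return the class with the highest posterior log-probability."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_lp = None, -math.inf
    for label, n in class_counts.items():
        total = sum(word_counts[label].values())
        lp = math.log(n / n_docs)  # log prior
        for w in tokens:
            if w not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothed word likelihood.
            lp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best_label, best_lp = label, lp
    return best_label


# Hypothetical toy training pages, labelled with ontology classes.
pages = [
    (["course", "homework", "syllabus"], "course"),
    (["research", "publications", "advisor"], "faculty"),
    (["course", "lecture", "exam"], "course"),
    (["professor", "research", "advisor"], "faculty"),
]
model = train_nb(pages)
```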

18 Internet IE: Learning Tasks
2. Recognizing relation instances by classifying chains of hyperlinks
–Discovering hyperlink paths of unknown and variable size
–First-order representation
–Induction of relational rules (FOIL)
course(A) ∧ person(B) ∧ link_to(B,A) → instructor_of(A,B)
research_project(A) ∧ person(C) ∧ link_to(L1,A,B) ∧ link_to(L2,B,C) ∧ neighbour_word_people(L1) → member_proj(A,C)
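
To make the two relational rules concrete, the sketch below applies them to a tiny hand-made hyperlink graph. It assumes that link_to(X,Y) means page X links to page Y, that link_to(L,X,Y) additionally names the link L, and that neighbour_word_people(L) holds when the word "people" occurs near link L. The page names, link ids, and data layout are hypothetical.

```python
from typing import Dict, Set, Tuple

Links = Dict[str, Tuple[str, str]]  # link id -> (source page, target page)


def instructor_of(courses: Set[str], persons: Set[str],
                  links: Links) -> Set[Tuple[str, str]]:
    """course(A) and person(B) and link_to(B, A) -> instructor_of(A, B)."""
    return {(dst, src) for src, dst in links.values()
            if dst in courses and src in persons}


def member_proj(projects: Set[str], persons: Set[str], links: Links,
                neighbour_words: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """research_project(A), person(C), link_to(L1, A, B), link_to(L2, B, C),
    neighbour_word_people(L1) -> member_proj(A, C)."""
    result: Set[Tuple[str, str]] = set()
    for l1, (a, b) in links.items():
        if a not in projects:
            continue
        if "people" not in neighbour_words.get(l1, set()):
            continue
        # Follow a second link from the intermediate page B to a person C.
        for b2, c in links.values():
            if b2 == b and c in persons:
                result.add((a, c))
    return result


# Hypothetical toy graph.
courses = {"fundamentals-of-cs"}
persons = {"jim", "tom"}
projects = {"webkb"}
links: Links = {
    "l1": ("webkb", "webkb-people"),
    "l2": ("webkb-people", "tom"),
    "l3": ("jim", "fundamentals-of-cs"),
    "l4": ("webkb", "sponsors"),
    "l5": ("sponsors", "jim"),
}
neighbour_words = {"l1": {"people"}, "l4": {"funding"}}
```

The second rule shows why a first-order representation helps: the intermediate page B is a free variable, so the rule matches a two-step hyperlink path without naming the page in between.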

19 Internet IE: Learning Tasks
3. Recognizing class and relation instances by extracting small fields of text from Web pages
–Sequence Rules with Validation (Freitag, 98; 99):
–FOIL-based general-purpose relational learner for IE
–Rules for extracting names of home page owners:
–77.4% accuracy!
length(F,<,3) ∧ in_title(A) ∧ prev_word(A,"GMT") ∧ unknown(A) ∧ not(length(A,=,4)) ∧ follow_word(A,B) ∧ length(B,>,4) → ownername(F)

20 Internet IE: Evaluation
Training corpora (hand-labelled according to the prescribed ontology):
–8,000 Web pages
–1,400 Web-page pairs
–From the computer science department Web sites at four universities: Cornell, University of Texas at Austin, University of Washington, and University of Wisconsin
Experimental test on the Web site of the computer science department at Carnegie Mellon University

21 Internet IE: Evaluation

22 Internet IE: Evaluation
[Figure: evaluation results for class instances and relation instances.]

23 Rule Induction: Summary
Connection to Dan Roth’s work at the Cognitive Computation Group (Univ. of Illinois at Urbana-Champaign)

