Automatic Acquisition of Lexical Classes and Extraction Patterns for Information Extraction
Kiyoshi Sudo, Ph.D. Research Proposal, New York University

1 Automatic Acquisition of Lexical Classes and Extraction Patterns for Information Extraction
Kiyoshi Sudo
Ph.D. Research Proposal, New York University
Committee: Ralph Grishman, Satoshi Sekine, I. Dan Melamed

2 August 9, 2002, Kiyoshi Sudo Thesis Proposal Presentation
Outline
Introduction
Research Proposal
– Problem Setting
– Approach
– Application to Information Extraction
Discussion

3 MUC Scenario Template Task
MURREE, Pakistan (AP) -- Masked gunmen firing Kalashnikov rifles burst through the front gates of a Christian school Monday, killing six people and wounding three in the latest attack against Western interests since Pakistan joined the war against terrorism.
Date | Perpetrator | Weapon | Victim | Location

4 MUC Scenario Template Task
Date | Perpetrator | Weapon | Victim | Location
Monday | Masked gunmen | Kalashnikov rifles | six people; three | a Christian school
MURREE, Pakistan (AP) -- Masked gunmen firing Kalashnikov rifles burst through the front gates of a Christian school Monday, killing six people and wounding three in the latest attack against Western interests since Pakistan joined the war against terrorism.
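To make the slot structure concrete, the filled template above can be held in a small record type. This is a minimal Python sketch, not the system's actual representation: the class `ScenarioTemplate`, its field names, and the `filled_slots` helper are hypothetical, and the victim slot is modeled as a list because the example has two fillers (six people killed, three wounded).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ScenarioTemplate:
    """One filled MUC-style scenario template (hypothetical slot layout)."""

    date: Optional[str] = None
    perpetrator: Optional[str] = None
    weapon: Optional[str] = None
    victims: List[str] = field(default_factory=list)  # a slot may hold several fillers
    location: Optional[str] = None

    def filled_slots(self) -> List[str]:
        """Names of slots that received at least one filler, in declaration order."""
        return [name for name, value in vars(self).items() if value]


# Fillers copied from the template shown on slide 4.
template = ScenarioTemplate(
    date="Monday",
    perpetrator="Masked gunmen",
    weapon="Kalashnikov rifles",
    victims=["six people", "three"],
    location="a Christian school",
)
print(template.filled_slots())
```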

5 High Cost of Acquiring a Knowledge Base
Find extraction patterns
– Find relevant documents
– Find relevant events
– Analyze sentences
Find a domain-specific lexicon
– Find existing KBs (e.g., thesauri, gazetteers)

6 Prior Work
Automatic Knowledge Acquisition: Lexical Acquisition, Pattern Acquisition
– Mutual Bootstrapping (Riloff and Jones 1999)
– Simultaneous Multi-Semantic Class (Thelen and Riloff 2002)
– (Yangarber et al. 2002)
– Pattern Discovery with Document Re-ranking (Yangarber et al. 2000)
– Pattern Acquisition for QA (Ravichandran and Hovy 2002)

7 Challenge
[Diagram: User, Seed Lexicon, Seed Pattern, Expanded Lexicon, Expanded Pattern Set, Knowledge Base; example task: MUC-3 terrorism events]

8 Meeting the Challenge
[Diagram: Seed Lexicon, Seed Pattern, Expanded Lexicon, Expanded Pattern Set, User, Knowledge Base; Semantic Clustering: Scenario Description → Semantic Cluster]

9 Semantic Clustering: Scenario Description → Semantic Cluster
Input: Scenario Description
– A description specific enough to define the scenario
– e.g., (terrorism, bombing, kidnapping)
– e.g., “Tell me about the terrorism action, such as bombing and kidnapping.”
Goal: find scenario-specific Semantic Clusters, each of which consists of
– a Semantic Lexicon
– Extraction Patterns

10 Benefit for the User
Semantic Clustering: Scenario Description → Semantic Cluster
– Simplifies domain analysis
– Low-cost knowledge-base acquisition for IE systems

11 Extraction Patterns
Definition: [formula not preserved in the transcript], where c unifies with the context defined by semantic class L
Context models:
– Case frame: (bomb (v), x (subj), himself (obj))
– Sequential: (x, bombs, himself)
– Dependency (cf. Sudo et al. 2001): bomb, with x as its V:subj dependent and himself as its V:obj dependent
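The three context models can be compared by matching them against text. The sketch below, under simplifying assumptions, matches a sequential pattern against a token list and a case frame against a parsed clause; it treats the dependency pattern as a one-level subtree (a head with labeled children), which coincides with the case frame for this example. The names `match_sequential`, `match_frame`, and `SLOT`, the `(head, {relation: argument})` clause encoding, and the single-token slot fillers are all assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple

SLOT = "x"  # placeholder marking the position to be extracted

# A clause or case frame: (predicate lemma, {relation: argument}).
Frame = Tuple[str, Dict[str, str]]


def match_sequential(pattern: List[str], tokens: List[str]) -> List[str]:
    """Return the slot filler for every contiguous token window that matches the pattern."""
    slot_index = pattern.index(SLOT)
    fillers = []
    for start in range(len(tokens) - len(pattern) + 1):
        window = tokens[start:start + len(pattern)]
        if all(p == SLOT or p == t for p, t in zip(pattern, window)):
            fillers.append(window[slot_index])
    return fillers


def match_frame(pattern: Frame, clause: Frame) -> Optional[str]:
    """Match a case frame (equivalently, a one-level dependency subtree) against a parsed clause."""
    pred, args = pattern
    head, deps = clause
    if pred != head:
        return None
    filler = None
    for relation, value in args.items():
        if relation not in deps:
            return None  # required argument missing
        if value == SLOT:
            filler = deps[relation]
        elif deps[relation] != value:
            return None  # fixed argument does not agree
    return filler


tokens = "the attacker bombs himself".split()
print(match_sequential(["x", "bombs", "himself"], tokens))
clause = ("bomb", {"subj": "attacker", "obj": "himself"})
print(match_frame(("bomb", {"subj": "x", "obj": "himself"}), clause))
```

The sequential pattern must match the surface form bombs, while the case frame matches the lemma bomb; this is one reason lemmatization appears among the language-dependent tools.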

12 Outline
Introduction
Research Proposal
– Problem Setting
– Approach
– Information Extraction
Evaluation

13 Overview
[Diagram: Semantic Clustering maps a Scenario Description to a Semantic Cluster via Information Retrieval, Bootstrapping, and Query Expansion over a Source]

15 Information Retrieval
Get the relevant document set (R)
Get lists of lexical items and extraction patterns, ordered by relevance to the scenario
– TF/IDF scoring
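The slide names TF/IDF scoring but not the exact formula. A minimal sketch, assuming one common variant: an item's score is its total frequency in the relevant set R multiplied by log(N / df), where N is the number of documents in the whole collection and df is the number of collection documents containing the item. Items can be words or pattern strings such as "(x, bombs, himself)"; the function name `tfidf_rank` and the toy documents are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def tfidf_rank(relevant: List[List[str]], collection: List[List[str]]) -> List[Tuple[str, float]]:
    """Rank items by frequency in the relevant set R times IDF over the whole collection."""
    n_docs = len(collection)
    df = Counter()  # number of collection documents containing each item
    for doc in collection:
        df.update(set(doc))
    tf = Counter()  # occurrences of each item across R
    for doc in relevant:
        tf.update(doc)
    scores = {item: count * math.log(n_docs / df[item])
              for item, count in tf.items() if df[item] > 0}
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


collection = [
    ["bomb", "bomb", "attack"],
    ["bomb", "police"],
    ["market", "police"],
    ["market", "stocks"],
    ["market", "stocks"],
]
relevant = collection[:2]  # the retrieved set R (toy data)
print([item for item, _ in tfidf_rank(relevant, collection)])
```

Weighting by frequency inside R rewards items that recur across the scenario's documents, while the IDF factor demotes items, like police here, that are common elsewhere in the collection.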

16 Example of TF/IDF Scoring (Management Succession: Business)
300 documents retrieved from WSJ (7/94-8/94)
Extracted by MINIPAR (Lin 1998)

17 Overview
[Overview diagram: Information Retrieval, Bootstrapping, Query Expansion, Source; arrow labels: extraction patterns, lexicon]

18 Bootstrapping
Assumption:
– Patterns provide lexical classes.
– The lexicon provides contextual information.
(Riloff and Jones 1999; Agichtein and Gravano 2000)
Find one cluster that consists of a lexicon and extraction patterns

19 Bootstrapping (Cont.)
Algorithm (cf. Riloff and Jones 1999)
– Given an ordered list of terms and an ordered list of extraction patterns
– Lexicon = (), Pattern = ()
– w ← the most relevant term in the list; add it to Lexicon
1. p ← the most relevant pattern among those that extract w
2. Add p to Pattern
3. w ← the most relevant term among those extracted by p
4. Add w to Lexicon
5. Go to 1
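The loop on this slide can be sketched in Python as follows. Two details are assumptions, since the slide leaves them open: already-selected terms and patterns are skipped when picking the next one, and the loop stops when no new candidate exists or after `max_rounds` iterations. The function `bootstrap`, the `extracts` mapping, and the management-succession toy data are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def bootstrap(terms: List[str], patterns: List[str],
              extracts: Dict[str, Set[str]], max_rounds: int = 10) -> Tuple[List[str], List[str]]:
    """Alternate between the best pattern for the last term and the best new term for that pattern.

    `terms` and `patterns` are assumed sorted by relevance (most relevant first);
    `extracts[p]` is the set of terms that pattern p extracts in the corpus.
    """
    lexicon: List[str] = []
    pattern_set: List[str] = []
    if not terms:
        return lexicon, pattern_set
    w = terms[0]  # start from the most relevant term
    lexicon.append(w)
    for _ in range(max_rounds):
        # Step 1: most relevant not-yet-selected pattern that extracts w.
        p = next((q for q in patterns
                  if q not in pattern_set and w in extracts.get(q, set())), None)
        if p is None:
            break
        pattern_set.append(p)  # Step 2
        # Step 3: most relevant not-yet-selected term that p extracts.
        w = next((t for t in terms if t not in lexicon and t in extracts[p]), None)
        if w is None:
            break
        lexicon.append(w)  # Step 4; the loop itself is Step 5 ("go to 1")
    return lexicon, pattern_set


terms = ["president", "chairman", "director", "analyst"]
patterns = ["(x, resign)", "(x, was named)", "(x, say)"]
extracts = {
    "(x, resign)": {"president", "chairman"},
    "(x, was named)": {"chairman", "director"},
    "(x, say)": {"analyst", "president"},
}
print(bootstrap(terms, patterns, extracts))
```

In the toy data, (x, say) also extracts president but ranks below (x, resign), so it is never chosen, and analyst never enters the lexicon.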

20 Example of Bootstrapping (Management Succession: Business)
From WSJ (7/94-8/94)
Extracted by MINIPAR (Lin 1998)

22 Problem: Polysemous Lexicon and Patterns
The lexicon can be ambiguous
– e.g., Clinton (Person, Organization, Location, …)
Extraction patterns can be ambiguous
– e.g., be killed in x (x: Location, Date, …)
Needs more study
– More restrictions?
– A probabilistic model?

23 Overview
[Overview diagram: Information Retrieval, Bootstrapping, Query Expansion, Source; arrow labels: pattern, lexicon]

24 Query Expansion
Generalize the terms in a query using a newly discovered cluster
– cf. Rocchio 1971 (vector model)
– Zhai and Lafferty 2001 (language modeling)
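The slide cites Rocchio's vector-model feedback as one way to generalize a query with a discovered cluster. A minimal sketch of the positive-only form, assuming the query and the cluster are both term-weight vectors: the expanded query is alpha times the query plus beta times the centroid of the feedback vectors. The full Rocchio formula also subtracts a gamma-weighted centroid of non-relevant documents, which is omitted here; the function name `rocchio_expand`, the default weights, and the cluster terms "car bomb" and "grenade" are assumptions.

```python
from collections import defaultdict
from typing import Dict, List

Vector = Dict[str, float]


def rocchio_expand(query: Vector, feedback: List[Vector],
                   alpha: float = 1.0, beta: float = 0.75) -> Vector:
    """Positive-only Rocchio: alpha * query + beta * centroid(feedback vectors)."""
    expanded: Dict[str, float] = defaultdict(float)
    for term, weight in query.items():
        expanded[term] += alpha * weight
    for vec in feedback:
        # Dividing by len(feedback) makes the sum a centroid.
        for term, weight in vec.items():
            expanded[term] += beta * weight / len(feedback)
    return dict(expanded)


query = {"terrorism": 1.0, "bombing": 1.0, "kidnapping": 1.0}
cluster = [{"bombing": 1.0, "car bomb": 1.0, "grenade": 1.0}]  # hypothetical discovered lexicon
print(rocchio_expand(query, cluster, beta=0.5))
```

Terms already in the query and in the cluster (bombing) gain weight, while new cluster terms enter the query with a smaller weight controlled by beta.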

27 Application to Information Extraction
[Diagram: Semantic Clustering (Scenario Description → Semantic Cluster) supplies the Semantic Lexicon and Extraction Patterns used for Pattern Matching in an IE pipeline: Preprocessing, Entity Recognition, Event Recognition, Role Assignment, Merging]

28 Human Intervention
Extraction patterns
– Event pattern: the context contains a verb or a nominalization of a verb; used for event extraction and role assignment, e.g., (terrorist, fire, x)
– Local pattern: the context contains only enough information to recognize the semantic class; used for entity recognition only, e.g., (x, Inc.)
Association of event patterns to roles
– e.g., (company, hire, x) → PersonIn and (company, fire, x) → PersonOut
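The split between event and local patterns, and the human-supplied role association, can be sketched as a lookup. This assumes a hand-written set of verbs and nominalizations standing in for a POS tagger; `EVENT_WORDS`, `ROLE_OF`, `is_event_pattern`, `assign_role`, and the fillers "John Smith" and "rifles" are hypothetical, while the patterns and role names come from the slide.

```python
from typing import Dict, Optional, Set, Tuple

Pattern = Tuple[str, ...]

# Hypothetical list of verbs and verb nominalizations; a real system would
# rely on a POS tagger or parser rather than a hand-written set.
EVENT_WORDS: Set[str] = {"fire", "hire", "appoint", "appointment", "resign", "resignation"}

# Human-supplied association of event patterns to template roles (examples from slide 28).
ROLE_OF: Dict[Pattern, str] = {
    ("company", "hire", "x"): "PersonIn",
    ("company", "fire", "x"): "PersonOut",
}


def is_event_pattern(pattern: Pattern) -> bool:
    """An event pattern's context contains a verb or a verb nominalization."""
    return any(token in EVENT_WORDS for token in pattern)


def assign_role(pattern: Pattern, filler: str) -> Optional[Tuple[str, str]]:
    """Return (role, filler) for an event pattern with a known role, else None."""
    if not is_event_pattern(pattern):
        return None  # local pattern such as (x, Inc.): entity recognition only
    role = ROLE_OF.get(pattern)
    return (role, filler) if role is not None else None


print(assign_role(("company", "hire", "x"), "John Smith"))
print(assign_role(("terrorist", "fire", "x"), "rifles"))
print(is_event_pattern(("x", "Inc.")))
```

Because roles attach to whole patterns rather than to verbs, (terrorist, fire, x) is recognized as an event pattern but receives no management-succession role, even though it shares the verb fire with (company, fire, x).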

30 Discussion
Domain portability
– The user only needs to specify the scenario
Language portability
– Language-dependent tools: segmentation (lemmatization), dependency parsing

31 Evaluation
MUC-style (Scenario Template task)
– Slot-based precision, recall, F-measure
– Domain portability: several predefined tasks that differ in difficulty
– Language portability: Japanese, English
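Slot-based scoring compares each slot the system fills with the answer key: precision is the share of filled slots that are correct, recall is the share of key slots recovered, and F-measure (here F1, weighting the two equally) is their harmonic mean. The sketch below assumes exact string match and one filler per slot; official MUC scoring also gives partial credit and handles multiple fillers, which this omits. The function `slot_prf` and the system output are hypothetical; the key reuses the slide 4 template without the victim slot.

```python
from typing import Dict, Tuple


def slot_prf(system: Dict[str, str], key: Dict[str, str]) -> Tuple[float, float, float]:
    """Slot-level precision, recall and F1 with exact-match scoring (no partial credit)."""
    correct = sum(1 for slot, value in system.items() if key.get(slot) == value)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(key) if key else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall > 0 else 0.0
    return precision, recall, f1


key = {"date": "Monday", "perpetrator": "Masked gunmen",
       "weapon": "Kalashnikov rifles", "location": "a Christian school"}
system = {"date": "Monday", "perpetrator": "Masked gunmen", "weapon": "rifles"}
print(slot_prf(system, key))  # precision 2/3, recall 2/4, F1 4/7
```

Under exact matching the partial weapon answer "rifles" counts as wrong, which is the case MUC's partial-credit rule was designed to soften.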

32 Contributions
A tool for domain analysis
Low-cost knowledge-base acquisition
Toward open-domain information extraction

33 Conclusion
Proposed a new approach to knowledge-base acquisition (Semantic Clustering)
Discussed the application of the acquired KB to information extraction (human intervention; local vs. event patterns)
Discussed evaluation with several predefined MUC-style tasks that differ in difficulty and span languages (domain portability and language portability)

34 To Do
Implementation
Preparation for evaluation
Evaluation

35 Time for Questions (Conclusion)


Download ppt "Automatic Acquisition of Lexical Classes and Extraction Patterns for Information Extraction Kiyoshi Sudo Ph.D. Research Proposal New York University Committee:"

Similar presentations


Ads by Google