1 Joint Inference for Knowledge Extraction from Biomedical Literature Hoifung Poon Dept. Computer Science & Eng. University of Washington (Joint work with.

Presentation transcript:

1 Joint Inference for Knowledge Extraction from Biomedical Literature Hoifung Poon Dept. Computer Science & Eng. University of Washington (Joint work with Lucy Vanderwende at Microsoft Research)

2 Outline: Motivation; Bio-event extraction; Our system; Experimental results; Conclusion

3 Knowledge Extraction From Web. [Figure: WWW]

4 Knowledge Extraction From Web. If we succeed: breach the knowledge acquisition bottleneck; semantic search, question answering, … But where should we start? More urgent and/or amenable; general approaches.

5 Knowledge Extraction From Biomedical Literature. PubMed: 18 million abstracts; += 2000 / mo. Success would mean: revolutionize biomedical research; dramatic speed-up in drug design. Grammatical English. General challenges: beyond traditional information extraction; complex, nested structures; naturally call for joint inference.

6 BioNLP: An Emerging Field. Protein name recognition; protein-protein interaction; bio-event extraction: shared task of 2009 [Kim et al. 2009]; Pathway Network ……

7 BioNLP: An Emerging Field. Protein name recognition; protein-protein interaction (top F1 ~ 60%); bio-event extraction: shared task of 2009 [Kim et al. 2009] (this talk); Pathway Network ……

8 This Talk: Bio-Event Extraction. We present the first joint approach that achieves state-of-the-art results. Based on Markov logic [Domingos & Lowd 2009]. Novel formulation that expands the scope of joint inference. Adding a few joint inference formulas to simple logistic regression doubles the F1.

9 Outline: Motivation; Bio-event extraction; Our system; Experimental results; Conclusion

10 Bio-Event: State change of bio-molecules. Event types: gene expression; transcription; protein catabolism; localization; phosphorylation; binding; regulation; positive regulation; negative regulation.

11 Example. Sentence: Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1... Annotations: T1 Protein p70(S6)-kinase; T2 Protein IL-10; T3 Protein gp41; T4 Regulation 0 11 Involvement; T5 Positive_regulation 30 40 activation; E1 Regulation:T4 Theme:E2 Cause:T3; E2 Positive_regulation:T5 Theme:T1; …
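The annotation lines on this slide can be read into a nested structure with a few lines of code. The following is a minimal sketch, assuming the whitespace-separated layout shown on the slide; the function names `parse` and `render` are hypothetical and are not part of the shared-task tooling.

```python
# Annotation lines copied from the slide (T = text-bound entity or trigger,
# E = event). The layout is assumed to be whitespace-separated fields.
ANNOTATIONS = """\
T1 Protein p70(S6)-kinase
T2 Protein IL-10
T3 Protein gp41
T4 Regulation 0 11 Involvement
T5 Positive_regulation 30 40 activation
E1 Regulation:T4 Theme:E2 Cause:T3
E2 Positive_regulation:T5 Theme:T1
"""


def parse(text):
    """Split annotation lines into entity and event dictionaries."""
    entities, events = {}, {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        ident = fields[0]
        if ident.startswith("T"):
            # The last field is the covered text; offsets (if any) are skipped.
            entities[ident] = {"type": fields[1], "text": fields[-1]}
        elif ident.startswith("E"):
            event_type, trigger = fields[1].split(":")
            args = [tuple(field.split(":")) for field in fields[2:]]
            events[ident] = {"type": event_type, "trigger": trigger, "args": args}
    return entities, events


def render(ident, entities, events):
    """Render an entity or a (possibly nested) event as a readable string."""
    if ident in entities:
        return entities[ident]["text"]
    event = events[ident]
    args = ", ".join(
        f"{role}={render(arg, entities, events)}" for role, arg in event["args"]
    )
    return f"{event['type']}({entities[event['trigger']]['text']}; {args})"


entities, events = parse(ANNOTATIONS)
print(render("E1", entities, events))
```

Rendering E1 shows the nesting directly: the theme of the regulation event is itself an event (E2), which is the structure that flat relation extraction does not capture.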

12 Why Is It Hard? Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1...

13 Why Is It Hard? Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1... [Figure: nested event structure over involvement, up-regulation, IL-10, human monocyte, gp41, p70(S6)-kinase, and activation, with Theme, Cause, and Site argument edges.] Traditional information extraction ignores this.

14 Why Is It Hard? Variations in denoting the same events. E.g., negative regulation: 532 inhibited, 252 inhibition, 218 inhibit, 207 blocked, 175 inhibits, 157 decreased, 156 reduced, 112 suppressed, 108 decrease, 86 inhibitor, 81 Inhibition, 68 inhibitors, 67 abolished, 66 suppress, 65 block, 63 prevented, 48 suppression, 47 blocks, 44 inhibiting, 42 loss, 39 impaired, 38 reduction, 32 down-regulated, 29 abrogated, 27 prevents, 27 attenuated, 26 repression, 26 decreases, 26 down-regulation, 25 diminished, 25 downregulated, 25 suppresses, 22 interfere, 21 absence, 21 repress ……

15 Why Is It Hard? The same word denotes different events. E.g., appearance: with "in the nucleus", Localization; with "mRNA", Transcription; with "IL-2 activity", Positive-regulation; ……

16 Participants

17 Top System: UTurku. Adopts the pipeline architecture. First, determines event candidates and types. Then, classifies for each pair of candidates whether the latter is a theme or cause. No way to feed information back to events given evidence of arguments. Decisions are made independently.

18 Joint Inference for Bio-Event Extraction. Complex, nested structures naturally argue for joint inference. However, it is under-explored for this task. The previous best joint approach [Riedel et al. 2009] still lags UTurku by a large margin.

19 Outline: Motivation; Bio-event extraction; Our system; Experimental results; Conclusion

20 Design Desiderata. Jointly predict events and arguments. Incorporate prior knowledge, e.g.: each event has a theme; only regulation events can have a cause. Expand the scope of joint inference to include individual dependency edges.

21 Markov Logic [Domingos & Lowd 2009]. Syntax: weighted first-order formulas. Semantics: feature templates for Markov nets. A Markov Logic Network (MLN) is a set of pairs (F_i, w_i), where F_i is a formula in first-order logic and w_i is a real number. P(x) = (1/Z) exp(sum_i w_i n_i(x)), where n_i(x) is the number of true groundings of F_i in x.
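To make the semantics concrete, the sketch below enumerates every possible world of a toy MLN and computes the standard MLN distribution, in which a world's unnormalized score is exp of the weighted counts of true groundings. The predicates `Event` and `HasTheme`, the two constants, and both weights are hypothetical choices for illustration; a real system such as Alchemy uses approximate inference rather than enumeration.

```python
import itertools
import math

# Hypothetical constants and predicates, chosen only for illustration.
TOKENS = ["activation", "IL-10"]
ATOMS = [(pred, tok) for pred in ("Event", "HasTheme") for tok in TOKENS]


def n_event_implies_theme(world):
    """Number of true groundings of: Event(t) => HasTheme(t)."""
    return sum(
        1 for t in TOKENS if (not world[("Event", t)]) or world[("HasTheme", t)]
    )


def n_event(world):
    """Number of true groundings of: Event(t)."""
    return sum(1 for t in TOKENS if world[("Event", t)])


# (grounding counter, weight) pairs; the weights are made up.
FORMULAS = [(n_event_implies_theme, 2.0), (n_event, -0.5)]


def score(world):
    """Unnormalized score exp(sum_i w_i * n_i(world))."""
    return math.exp(sum(w * n(world) for n, w in FORMULAS))


# Enumerate all 2^4 truth assignments to the ground atoms.
WORLDS = [
    dict(zip(ATOMS, values))
    for values in itertools.product([False, True], repeat=len(ATOMS))
]
Z = sum(score(world) for world in WORLDS)


def probability(world):
    """P(world) = score(world) / Z."""
    return score(world) / Z


def make_world(**true_atoms):
    """Helper: all atoms false except those listed as (pred, token) pairs."""
    world = {atom: False for atom in ATOMS}
    for atom in true_atoms.get("true", []):
        world[atom] = True
    return world


with_theme = make_world(true=[("Event", "activation"), ("HasTheme", "activation")])
without_theme = make_world(true=[("Event", "activation")])
print(probability(with_theme) / probability(without_theme))
```

The two worlds differ in exactly one grounding of the implication, so their probability ratio is exp(2.0), the weight of that formula. A soft formula makes violating worlds less likely rather than impossible.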

22 Markov Logic. Unifying framework for joint inference. A plethora of efficient algorithms available. Open-source implementation: Alchemy (alchemy.cs.washington.edu).

23 Input: Stanford Dependencies. Sentence: Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocyte by gp41 … [Figure: dependency graph over the nodes involvement, up-regulation, IL-10, human monocyte, gp41, p70(S6)-kinase, and activation, with edge labels prep_of, prep_in, prep_by, and nn.]

24 Joint Predictions. [Figure: the dependency graph over involvement, up-regulation, IL-10, human monocyte, gp41, p70(S6)-kinase, and activation; each node is annotated with the questions "Trigger word?" and "Event type?".]

25 Joint Predictions. [Figure: the same dependency graph; each edge (prep_of, prep_in, prep_by, nn) is annotated with the questions "In theme path?" and "In cause path?".]

26 Why Individual Dependencies? Examples: "… regulate IL-10 …": dobj(regulate, IL-10); "… regulate IL-10 protein …": dobj(regulate, protein), nn(protein, IL-10); "… regulate IL-8 and IL-10 …": dobj(regulate, IL-8), conj(IL-8, IL-10).

27 Why Individual Dependencies? Examples: "… regulate IL-10 …": dobj(regulate, IL-10); "… regulate IL-10 protein …": dobj(regulate, protein), nn(protein, IL-10); "… regulate IL-8 and IL-10 …": dobj(regulate, IL-8), conj(IL-8, IL-10). Highlighted: beginning of theme paths.

28 Why Individual Dependencies? Examples: "… regulate IL-10 …": dobj(regulate, IL-10); "… regulate IL-10 protein …": dobj(regulate, protein), nn(protein, IL-10); "… regulate IL-8 and IL-10 …": dobj(regulate, IL-8), conj(IL-8, IL-10). Highlighted: continuation of a path …
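One way to see why edge-level decisions are useful: once each dependency edge is labeled as being in a theme path or not, the theme arguments fall out by following labeled edges from the trigger. The sketch below illustrates that reading of the three examples; the function name `themes_from_edges` and the (head, label, dependent) triple encoding are assumptions, not the authors' representation.

```python
from collections import defaultdict


def themes_from_edges(trigger, theme_edges, proteins):
    """Follow edges marked as 'in theme path' from the trigger and collect
    every protein reached along the way.

    theme_edges: iterable of (head, label, dependent) triples.
    """
    children = defaultdict(list)
    for head, _label, dependent in theme_edges:
        children[head].append(dependent)

    found, stack, seen = [], [trigger], {trigger}
    while stack:
        node = stack.pop()
        for dependent in children[node]:
            if dependent in seen:
                continue
            seen.add(dependent)
            if dependent in proteins:
                found.append(dependent)
            stack.append(dependent)
    return sorted(found)


PROTEINS = {"IL-8", "IL-10"}

# "... regulate IL-10 ...": the dobj edge begins the theme path.
print(themes_from_edges("regulate", [("regulate", "dobj", "IL-10")], PROTEINS))

# "... regulate IL-10 protein ...": dobj begins the path, nn continues it.
print(themes_from_edges(
    "regulate",
    [("regulate", "dobj", "protein"), ("protein", "nn", "IL-10")],
    PROTEINS,
))

# "... regulate IL-8 and IL-10 ...": dobj begins the path, conj continues it.
print(themes_from_edges(
    "regulate",
    [("regulate", "dobj", "IL-8"), ("IL-8", "conj", "IL-10")],
    PROTEINS,
))
```

The same dobj decision is reused in all three sentences, while the nn and conj decisions only extend a path that has already started. Classifying whole paths instead would treat the three cases as unrelated.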

29 MLN For Bio-Event Extraction: logistic regression; hard constraints; linguistically motivated joint formulas.

30 Logistic Regression. Lexical evidence, e.g.: activation probably refers to positive-regulation. Syntactic evidence, e.g.: nsubj probably leads to a cause. Lexical-syntactic evidence, e.g.: nsubj from binds probably leads to a theme.
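The interplay between syntactic and lexical-syntactic evidence can be sketched as a logistic regression over edge features. Everything concrete here is an assumption for illustration: the feature names, the weights, and the function `prob_cause` are made up and are not the features or learned weights of the system described in the talk.

```python
import math

# Hypothetical weights for the decision "this edge leads to a cause".
# A generic nsubj edge favors a cause; nsubj from "binds" overrides that.
CAUSE_WEIGHTS = {
    "dep=nsubj": 1.0,
    "word=binds&dep=nsubj": -2.5,
}


def edge_features(head_word, dep_label):
    """Lexical, syntactic, and lexical-syntactic features for one edge."""
    return [
        f"word={head_word}",
        f"dep={dep_label}",
        f"word={head_word}&dep={dep_label}",
    ]


def prob_cause(head_word, dep_label, weights=CAUSE_WEIGHTS):
    """Sigmoid of the summed weights of the active features."""
    z = sum(weights.get(f, 0.0) for f in edge_features(head_word, dep_label))
    return 1.0 / (1.0 + math.exp(-z))


print(prob_cause("increases", "nsubj"))  # above 0.5 with these weights
print(prob_cause("binds", "nsubj"))      # below 0.5 with these weights
```

With these made-up weights the generic syntactic feature pushes nsubj toward a cause, and the more specific lexical-syntactic feature reverses that for binds, matching the pattern the slide describes.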

31 Hard Constraints. Events, e.g.: an event must have a theme. Argument paths, e.g.: if edge s → t is in a theme path, then either s is an event or there is some edge p → s in the theme path. Decisions about events and argument edges are interdependent.
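The two example constraints can be checked mechanically against a candidate joint assignment. This is a minimal sketch under stated assumptions: events are identified by their trigger word, theme-path edges are (source, target) pairs, and "has a theme" is read as "has an outgoing theme-path edge". The function name `violated_constraints` is hypothetical.

```python
def violated_constraints(events, theme_edges):
    """Return a list of hard-constraint violations for a joint assignment.

    events: set of nodes predicted to be event triggers.
    theme_edges: set of (source, target) edges predicted to be in a theme path.
    """
    violations = []
    sources = {s for s, _t in theme_edges}
    targets = {t for _s, t in theme_edges}

    # Constraint 1: an event must have a theme (read here as: at least one
    # theme-path edge starts at the event).
    for event in sorted(events):
        if event not in sources:
            violations.append(f"event {event} has no theme")

    # Constraint 2: if s -> t is in a theme path, then s is an event or some
    # edge p -> s is also in the theme path.
    for s, t in sorted(theme_edges):
        if s not in events and s not in targets:
            violations.append(f"edge {s}->{t} is not connected to an event")

    return violations


edges = {("regulate", "IL-8"), ("IL-8", "IL-10")}
print(violated_constraints({"regulate"}, edges))  # consistent assignment
print(violated_constraints(set(), edges))         # path without an event
print(violated_constraints({"regulate"}, set()))  # event without a theme
```

Because each constraint ties an event decision to an edge decision, neither can be made in isolation, which is the interdependence the slide points out.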

32 Linguistically-Motivated Joint Formulas. Syntactic alternations, e.g.: "A increases the level of B" vs. "The level of B increases". Add context-specific formula, e.g.: if increases signifies an event, and it has both nsubj and dobj dependencies, then nsubj probably leads to a cause.

33 Correct Syntactic Error with Semantic Information. Coordination: expression of IL-8 and IL-10. [Figure: two dependency parses over expression, IL-8, and IL-10, each with prep_of and conj edges.]

34 Correct Syntactic Error with Semantic Information. PP-attachment: involvement of IL-8 in IL-10 regulation. [Figure: two dependency parses over involvement, IL-8, regulation, and IL-10, each with prep_of, prep_in, and nn edges.]

35 Outline: Motivation; Bio-event extraction; Our system; Experimental results; Conclusion

36 Dataset: BioNLP-09 Shared Task (PubMed abstracts). Training: 800; development: 150; test: 260. Main evaluation criteria for the task: event-level recall, precision, F1; account for nested event structures.
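Event-level precision, recall, and F1 over nested structures can be sketched by representing each event as a nested tuple, so that an event only counts as correct when its nested arguments are correct too. This uses strict exact matching as a simplifying assumption; the official shared-task matching criteria may differ, and the function name `prf1` and the tuple encoding are hypothetical.

```python
def prf1(gold, predicted):
    """Event-level precision, recall, and F1 under exact matching."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Events as (type, trigger, arguments); arguments may nest other events.
E2 = ("Positive_regulation", "activation", (("Theme", "p70(S6)-kinase"),))
E1 = ("Regulation", "Involvement", (("Theme", E2), ("Cause", "gp41")))

# A prediction that gets E2 right but misses the cause of E1.
E1_NO_CAUSE = ("Regulation", "Involvement", (("Theme", E2),))

print(prf1(gold=[E1, E2], predicted=[E1_NO_CAUSE, E2]))
```

In this example one of two predicted events matches one of two gold events, so precision, recall, and F1 are all 0.5. An error in an inner event would also invalidate every outer event that contains it, which is why nested events are harder to score well on.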

37 Experiment Objectives: relative contributions of feature components; identify the bottlenecks for performance; comparison with state-of-the-art systems.

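38 Results: Development Set F1. [Bar chart: LR.]

39 Results: Development Set F1. [Bar chart: LR, LR+HARD.] Add hard joint inference formulas (chart annotation: 26).

40 Results: Development Set F1. [Bar chart: LR, LR+HARD, FULL.] Add soft joint inference formulas (chart annotation: 2).

41 Results: Development Set F1. [Bar chart: LR, LR+HARD, NO-SYN-FIX, FULL.] Without fixing syntactic errors (chart annotation: 4).

42 Results: Development Set F1. [Bar chart: LR, LR+HARD, NO-SYN-FIX, UTurku, FULL.] Highlighted: UTurku.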

43 Per-Type Performance (event F1): Catabolism 92; Phosphorylation 87; Expression 77; Localization 75; Transcription 71; Binding 48; Negative-Reg. 46; Positive-Reg. 46; Regulation 37.

44 Per-Type Performance (event F1 / trigger-word F1): Catabolism 92 / 91; Phosphorylation 87 / 90; Expression 77 / 80; Localization 75 / 73; Transcription 71 / 70; Binding 48 / 71; Negative-Reg. 46 / 64; Positive-Reg. 46 / 68; Regulation 37 / 51.

45 Results: Test Set F1. [Bar chart: UTurku, JULIELab, Riedel et al., Our MLN, ConcordU.] Reduces F1 error by over 10% compared to the previous best joint approach.

46 Future Work: incorporate more features; more joint inference opportunities; leverage discourse (e.g., coreference); joint syntactic / semantic processing.

47 Conclusion. First joint approach for bio-event extraction with state-of-the-art results. Based on Markov logic. Novel formulation with expanded joint inference. Correcting syntactic errors with semantic information helps.