Classifying Semantic Relations in Bioscience Texts Barbara Rosario Marti Hearst SIMS, UC Berkeley Supported by NSF DBI-0317510.

Presentation transcript:

Classifying Semantic Relations in Bioscience Texts Barbara Rosario Marti Hearst SIMS, UC Berkeley Supported by NSF DBI and a gift from Genentech

Adapted from Dan Klein's slides (CS 294-5) Natural Language Processing Goal: Deep understanding of broad language It’d be great if machines could: Translate for us Write up our research Find out information for us Summarize But they can’t Language is ambiguous, flexible, complex, subtle

NLP in practice Syntactic analysis: Part-of-Speech Tagging Parsing Shallow parsing Applications: Text Classification (sort of) Question Answering Spelling Correction (sort of) Machine Translation Information retrieval Information Extraction

Identification and classification of small units within documents

Extracting Job Openings from the Web foodscience.com-Job2 JobTitle: Ice Cream Guru Employer: foodscience.com JobCategory: Travel/Hospitality JobFunction: Food Services JobLocation: Upper Midwest Contact Phone: DateExtracted: January 8, 2001 Source: OtherCompanyJobs: foodscience.com-Job1

Adapted from slide by William Cohen
What is Information Extraction
Filling slots in a database from sub-segments of text. As a task:
October 14, 2002, 4:00 a.m. PT For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation. Today, Microsoft claims to "love" the open-source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers. "We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access." Richard Stallman, founder of the Free Software Foundation, countered saying…
IE
NAME TITLE ORGANIZATION
Bill Gates CEO Microsoft
Bill Veghte VP Microsoft
Richard Stallman founder Free Soft..

Adapted from slide by William Cohen
What is Information Extraction
Information Extraction = segmentation + classification + association
As a family of techniques:
October 14, 2002, 4:00 a.m. PT For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation. Today, Microsoft claims to "love" the open-source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers. "We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access." Richard Stallman, founder of the Free Software Foundation, countered saying…
Microsoft Corporation, CEO, Bill Gates, Microsoft, Gates, Microsoft, Bill Veghte, Microsoft, VP, Richard Stallman, founder, Free Software Foundation
aka “named entity extraction”

Adapted from slide by William Cohen
What is Information Extraction
Information Extraction = segmentation + classification + association
A family of techniques:
October 14, 2002, 4:00 a.m. PT For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation. Today, Microsoft claims to "love" the open-source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers. "We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access." Richard Stallman, founder of the Free Software Foundation, countered saying…
Microsoft Corporation, CEO, Bill Gates, Microsoft, Gates, Microsoft, Bill Veghte, Microsoft, VP, Richard Stallman, founder, Free Software Foundation
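The three steps named on this slide can be sketched in a few lines of Python. This is only a toy illustration: the gazetteer `ENTITY_TYPES`, the character `window`, and both function names are assumptions made for the example, and a real system would learn these decisions rather than look them up.

```python
import re

# Hypothetical hand-written gazetteer, used here in place of a learned model.
ENTITY_TYPES = {
    "Microsoft Corporation": "ORGANIZATION",
    "Free Software Foundation": "ORGANIZATION",
    "Bill Gates": "NAME",
    "Bill Veghte": "NAME",
    "Richard Stallman": "NAME",
    "CEO": "TITLE",
    "VP": "TITLE",
    "founder": "TITLE",
}


def segment_and_classify(text):
    """Segmentation + classification: locate known phrases and label them."""
    spans = []
    for phrase, label in ENTITY_TYPES.items():
        for match in re.finditer(r"\b" + re.escape(phrase) + r"\b", text):
            spans.append((match.start(), phrase, label))
    return sorted(spans)


def associate(spans, window=60):
    """Association: attach to each NAME the nearest TITLE and ORGANIZATION
    that start within `window` characters of it."""
    records = []
    for pos, phrase, label in spans:
        if label != "NAME":
            continue
        record = {"NAME": phrase}
        for wanted in ("TITLE", "ORGANIZATION"):
            nearby = [(abs(p - pos), ph) for p, ph, lab in spans
                      if lab == wanted and abs(p - pos) <= window]
            if nearby:
                record[wanted] = min(nearby)[1]
        records.append(record)
    return records


text = ("For years, Microsoft Corporation CEO Bill Gates railed against "
        "open-source software. Richard Stallman, founder of the Free "
        "Software Foundation, countered.")
records = associate(segment_and_classify(text))
```

Each record corresponds to one row of the NAME / TITLE / ORGANIZATION table that the task view of IE fills in.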

Adapted from Dan Klein's CS slides Semantic Roles Define roles to be extracted Application dependent JobTitle, Employer, JobCategory, JobLocation… But we would like them to be more “general” Linguistic theories, granularity of roles Proto-agent, proto-patient Fillmore’s case theory has 9 roles (agent, patient, location, experiencer, etc.) Extreme view: each verb has its own set of roles Buyer, bought_thing, seller, sold_thing Middle view: roles are particular to a semantic Frame (like transaction)

Roles in the Biomedical domain Treatment and Disease A two-dose combined hepatitis A and B vaccine would facilitate immunization programs Proteins A caveolin dependent coupling of PrPc to the tyrosine kinase Fyn was observed

Relations Person-affiliation: Affiliation(Gates, Microsoft) = CEO Location: Location(Microsoft) = Redmond Protein1 inhibits (or activates, releases) protein2
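One way to hold relations like these in code is as small typed records. The sketch below is an assumption about representation only; the class name `Relation` and its fields are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Relation:
    """A named relation over one or more entities, with an optional value."""
    name: str
    args: Tuple[str, ...]
    value: Optional[str] = None


relations = [
    Relation("Affiliation", ("Gates", "Microsoft"), "CEO"),
    Relation("Location", ("Microsoft",), "Redmond"),
    Relation("inhibits", ("protein1", "protein2")),
]


def relations_involving(entity, relations):
    """Return every relation in which `entity` appears as an argument."""
    return [r for r in relations if entity in r.args]


microsoft_relations = relations_involving("Microsoft", relations)
```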

Problem: Which relations hold between 2 entities? Treatment Disease Cure? Prevent? Side Effect?

Hepatitis Examples Cure These results suggest that con A-induced hepatitis was ameliorated by pretreatment with TJ-135. Prevent A two-dose combined hepatitis A and B vaccine would facilitate immunization programs Vague Effect of interferon on hepatitis B

Two tasks Relationship Extraction: Identify the several semantic relations that can occur between the entities disease and treatment in bioscience text Entity extraction: Related problem: identify such entities Much of the important, late-breaking bioscience information is found only in textual form. We need both tasks to extract useful information from text and to make inferences

The Approach Data: MEDLINE abstracts and titles Collection of 4,600 biomedical journals Graphical models and Neural Network Lexical, syntactic and semantic features

Data and Relations MEDLINE, abstracts and titles 3662 sentences labeled Relevant: 1724 Irrelevant: 1771 e.g., “Patients were followed up for 6 months” 2 types of Entities, many instances treatment and disease 7 Relationships between these entities The labeled data is available at

Semantic Relationships 810: Cure Intravenous immune globulin for recurrent spontaneous abortion 616: Only Disease Social ties and susceptibility to the common cold 166: Only Treatment Fluticasone propionate is safe in recommended doses 63: Prevent Statins for prevention of stroke

Semantic Relationships 36: Vague Phenylbutazone and leukemia 29: Side Effect Malignant mesodermal mixed tumor of the uterus following irradiation 4: Does NOT cure Evidence for double resistance to permethrin and malathion in head lice

Features Word Part of speech Phrase constituent Orthographic features ‘is number’, ‘all letters are capitalized’, ‘first letter is capitalized’ … MeSH (semantic features) Replace words, or sequences of words, with generalizations via MeSH categories Peritoneum -> Abdomen
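The three orthographic features listed on this slide are simple token-level tests. A minimal sketch, assuming whitespace-tokenized input; the function name and feature keys are hypothetical, not taken from the authors' code.

```python
def orthographic_features(token):
    """Compute the orthographic features named on the slide for one token."""
    return {
        # 'is number': digits with at most one decimal point
        "is_number": token.replace(".", "", 1).isdigit(),
        # 'all letters are capitalized': every cased character is upper case
        "all_caps": token.isupper(),
        # 'first letter is capitalized'
        "first_cap": token[:1].isupper(),
    }


examples = {tok: orthographic_features(tok)
            for tok in ["MEDLINE", "Hepatitis", "135", "TJ-135", "vaccine"]}
```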

Features (cont.): MeSH
MeSH Tree Structures
1. Anatomy [A]
2. Organisms [B]
3. Diseases [C]
4. Chemicals and Drugs [D]
5. Analytical, Diagnostic and Therapeutic Techniques and Equipment [E]
6. Psychiatry and Psychology [F]
7. Biological Sciences [G]
8. Physical Sciences [H]
9. Anthropology, Education, Sociology and Social Phenomena [I]
10. Technology and Food and Beverages [J]
11. Humanities [K]
12. Information Science [L]
13. Persons [M]
14. Health Care [N]
15. Geographic Locations [Z]

Features (cont.): MeSH
1. Anatomy [A]
  Body Regions [A01] +
  Musculoskeletal System [A02]
  Digestive System [A03] +
  Respiratory System [A04] +
  Urogenital System [A05] +
  Endocrine System [A06] +
  Cardiovascular System [A07] +
  Nervous System [A08] +
  Sense Organs [A09] +
  Tissues [A10] +
  Cells [A11] +
  Fluids and Secretions [A12] +
  Animal Structures [A13] +
  Stomatognathic System [A14]
  (…..)
Body Regions [A01]
  Abdomen [A01.047]
    Groin [A ]
    Inguinal Canal [A ]
    Peritoneum [A ] +
    Umbilicus [A ]
  Axilla [A01.133]
  Back [A01.176] +
  Breast [A01.236] +
  Buttocks [A01.258]
  Extremities [A01.378] +
  Head [A01.456] +
  Neck [A01.598]
  (….)
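Because MeSH tree numbers encode the path from the root, generalizing a term to an ancestor category amounts to truncating its tree number. The sketch below assumes dot-separated tree numbers as shown above; the function name is hypothetical, and the three-level code `A01.047.999` is a made-up placeholder, not a real MeSH entry.

```python
# Names for the tree numbers that appear on the slide.
MESH_NAMES = {
    "A01": "Body Regions",
    "A01.047": "Abdomen",
    "A01.133": "Axilla",
}


def generalize(tree_number, depth):
    """Keep only the first `depth` dot-separated levels of a MeSH tree number."""
    return ".".join(tree_number.split(".")[:depth])


# A made-up descendant of Abdomen, used only to show the truncation.
hypothetical_code = "A01.047.999"
parent = generalize(hypothetical_code, 2)       # one level up
grandparent = generalize(hypothetical_code, 1)  # two levels up
parent_name = MESH_NAMES[parent]
grandparent_name = MESH_NAMES[grandparent]
```

The depth at which to cut trades specificity against data sparsity, so it is natural to treat it as a tunable parameter.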

Models Graphical Models (static and dynamic) Neural networks

Graphical Models Graph theory plus probability theory Nodes are variables Edges are conditional probabilities Absence of an edge between nodes implies conditional independence between the variables of the nodes Example: nodes A, B, C with edges A → B and A → C, and factors P(A), P(B|A), P(C|A); there is no edge between B and C, so P(C|A) = P(C|A,B)
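The conditional-independence claim can be checked numerically for a three-node graph with edges A → B and A → C. All probability values below are invented for illustration; only the factorization P(A) P(B|A) P(C|A) follows the slide.

```python
from itertools import product

# Hypothetical conditional probability tables for binary variables.
p_a = {True: 0.3, False: 0.7}
p_b_given_a = {True: {True: 0.9, False: 0.1},   # p_b_given_a[a][b]
               False: {True: 0.2, False: 0.8}}
p_c_given_a = {True: {True: 0.6, False: 0.4},   # p_c_given_a[a][c]
               False: {True: 0.1, False: 0.9}}


def joint(a, b, c):
    """Joint probability as the product of the graph's local factors."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_a[a][c]


def p_c_given_ab(c, a, b):
    """P(C=c | A=a, B=b), computed from the joint by normalization."""
    return joint(a, b, c) / sum(joint(a, b, c2) for c2 in (True, False))


total = sum(joint(a, b, c) for a, b, c in product((True, False), repeat=3))

# With no edge between B and C, conditioning on B changes nothing.
max_gap = max(abs(p_c_given_ab(True, a, b) - p_c_given_a[a][True])
              for a, b in product((True, False), repeat=2))
```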

Graphical Models for Role and Relation Extraction Static Dynamic

Graphical Models Relation node: Semantic relation (cure, prevent, none..) expressed in the sentence

Graphical Models Role nodes: 3 choices: treatment, disease, or none

Graphical Models Feature nodes (observed): word, POS, MeSH…

Graphical Models Joint probability distribution over relation, role and feature nodes Parameters estimated with maximum likelihood and absolute discounting smoothing Task: Find P(Role | observable features) P(Relation | observable features)
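To make the task P(Relation | observable features) concrete, here is a deliberately simplified static model in which the relation node is the only parent of every feature node. It omits the role nodes and the dynamic structure, so it is not the authors' model. The training examples are invented, and `floor` is a crude stand-in for the absolute discounting mentioned on the slide.

```python
from collections import Counter, defaultdict


def train(examples):
    """Collect maximum-likelihood counts from (relation, features) pairs."""
    relation_counts = Counter()
    feature_counts = defaultdict(Counter)
    for relation, features in examples:
        relation_counts[relation] += 1
        feature_counts[relation].update(features)
    return relation_counts, feature_counts


def posterior(features, relation_counts, feature_counts, floor=1e-6):
    """P(relation | features) by Bayes' rule, assuming the features are
    independent given the relation."""
    total = sum(relation_counts.values())
    scores = {}
    for relation, count in relation_counts.items():
        score = count / total  # prior P(relation)
        n_features = sum(feature_counts[relation].values())
        for feature in features:
            likelihood = feature_counts[relation][feature] / n_features
            score *= max(likelihood, floor)  # avoid zero for unseen features
        scores[relation] = score
    normalizer = sum(scores.values())
    return {relation: s / normalizer for relation, s in scores.items()}


# Invented toy data; the real model was trained on labeled MEDLINE sentences.
examples = [
    ("cure", ["ameliorated", "pretreatment"]),
    ("cure", ["treated", "ameliorated"]),
    ("prevent", ["vaccine", "immunization"]),
    ("prevent", ["prevention", "vaccine"]),
]
relation_counts, feature_counts = train(examples)
result = posterior(["vaccine"], relation_counts, feature_counts)
best = max(result, key=result.get)
```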

Neural Networks Feed-forward network (MATLAB) Same features

Relation extraction Results in terms of classification accuracy (with and without irrelevant sentences) 2 cases: Roles hidden Roles given

Relation classification: Results
Columns: Sentences, Input, Dynamic GM, NN
Only relevant: Only features; Roles given
Relevant + Irrelevant: Only features; Roles given

Role extraction Results in terms of F-measure NN: Couldn’t run it (feature vectors too large) Graphical models can do role extraction and relationship classification simultaneously
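For reference, the F-measure combines precision and recall over the extracted role labels. The sketch below assumes the balanced form (beta = 1) and scores roles as sets of (token index, role) pairs. Both choices are assumptions, since the slides do not spell out the scoring details.

```python
def f_measure(gold, predicted, beta=1.0):
    """F-measure between two sets of (token_index, role) pairs."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return ((1 + beta ** 2) * precision * recall
            / (beta ** 2 * precision + recall))


# Invented example: two of three predicted roles match the gold labels.
gold = {(0, "treatment"), (3, "disease"), (4, "disease")}
predicted = {(0, "treatment"), (3, "disease"), (7, "disease")}
score = f_measure(gold, predicted)
```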

Role Extraction: Results
F-measures
Sentences: Dynamic GM
Only relevant: 0.73
Relevant + irrelevant: 0.71

Features impact: Role Extraction (rel. + irrel.) Most important features: 1) Word, 2) MeSH Models Dynamic All features 0.71 No word % No MeSH %

Most important features: Roles Accuracy: GM NN All feat. + roles All feat. – roles % -17.8% All feat. + roles – Word % -0.5% All feat. + roles – MeSH % 0.4% Features impact: Relation classification (rel. + irrel.)

Features impact: Relation classification Most realistic case: Roles not known Most important features: 1) MeSH for NN and Word for GM Accuracy: GM NN All feat. – roles All feat. - roles – Word % -4.3% All feat. - roles – MeSH % -6.9% (rel. + irrel.)

Conclusions Classification of subtle semantic relations in bioscience text Discriminative model (neural network) achieves high classification accuracy Graphical models for the simultaneous extraction of entities and relationships Importance of lexical hierarchy Future work: A new collection of disease/treatment data Different entities/relations Unsupervised learning to discover relation types

Thank you! Barbara Rosario Marti Hearst SIMS, UC Berkeley

Additional slides

Several DIFFERENT Relations between the Same Types of Entities Thus differs from the problem statement of other work on relations Many find one relation which holds between two entities (many based on ACE) Agichtein and Gravano (2000), lexical patterns for location of Zelenko et al. (2002) SVM for person affiliation and organization-location Hasegawa et al. (ACL 2004) Person- Organization -> President “relation” Craven (1999, 2001) HMM for subcellular- location and disorder-association Doesn’t identify the actual relation

Related work: Bioscience Many hand-built rules Feldman et al. (2002), Friedman et al. (2001) Pustejovsky et al. (2002) Saric et al.; this conference

Slide by Chris Manning, based on slides by several others MUC: the genesis of IE DARPA funded significant efforts in IE in the early to mid 1990’s. Message Understanding Conference (MUC) was an annual event/competition where results were presented. Focused on extracting information from news articles: Terrorist events Industrial joint ventures Company management changes Information extraction of particular interest to the intelligence community (CIA, NSA). (Note: early ’90’s)

Adapted from slide by Lucian Vlad Lita Message Understanding Conference (MUC) Named entity Person, Organization, Location Co-reference Clinton ↔ President Bill Clinton Template element Perpetrator, Target Template relation Incident Multilingual

Adapted from slide by Lucian Vlad Lita MUC Typical Text Bridgestone Sports Co. said Friday it has set up a joint venture in Taiwan with a local concern and a Japanese trading house to produce golf clubs to be shipped to Japan. The joint venture, Bridgestone Sports Taiwan Co., capitalized at 20 million new Taiwan dollars, will start production of 20,000 iron and “metal wood” clubs a month

Adapted from slide by Lucian Vlad Lita MUC Templates Relationship tie-up Entities: Bridgestone Sports Co, a local concern, a Japanese trading house Joint venture company Bridgestone Sports Taiwan Co Activity ACTIVITY 1 Amount NT$20,000,000

Adapted from slide by Lucian Vlad Lita MUC Templates ACTIVITY 1 Activity Production Company Bridgestone Sports Taiwan Co Product Iron and “metal wood” clubs Start Date January 1990

Graphical Models Different dependencies between the features and the relation nodes D3 D1 S1 D2 S2

Relation classification: Confusion Matrix Computed for the model D2, “rel + irrel.”, “only features”
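A confusion matrix of this kind simply counts (gold relation, predicted relation) pairs, and classification accuracy is the share of the counts on the diagonal. A minimal sketch with invented labels; the function names are hypothetical and the numbers are not results from the presentation.

```python
from collections import Counter


def confusion_matrix(gold, predicted):
    """Count how often each gold relation is predicted as each relation."""
    return Counter(zip(gold, predicted))


def accuracy(matrix):
    """Fraction of examples on the diagonal of the confusion matrix."""
    correct = sum(n for (g, p), n in matrix.items() if g == p)
    return correct / sum(matrix.values())


# Invented labels for four sentences.
gold = ["cure", "cure", "prevent", "vague"]
predicted = ["cure", "prevent", "prevent", "cure"]
matrix = confusion_matrix(gold, predicted)
acc = accuracy(matrix)
```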

Smoothing: absolute discounting Lower the probability of seen events by subtracting a constant from their count (ML estimate: the count of the event divided by the total count) The remaining probability is evenly divided among the unseen events
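The smoothing rule described here can be written out directly. The sketch assumes integer counts, a discount `delta` between 0 and 1, and a known finite vocabulary of possible events; the function name, parameter names and example counts are hypothetical.

```python
def absolute_discounting(counts, vocabulary, delta=0.5):
    """Subtract `delta` from every seen count and share the freed
    probability mass evenly among the unseen events."""
    total = sum(counts.values())
    seen = [e for e in vocabulary if counts.get(e, 0) > 0]
    unseen = [e for e in vocabulary if counts.get(e, 0) == 0]
    if not unseen:
        # Nothing to redistribute to: fall back to the ML estimate.
        return {e: counts[e] / total for e in seen}
    probs = {e: (counts[e] - delta) / total for e in seen}
    reserved = delta * len(seen) / total  # mass removed from seen events
    for e in unseen:
        probs[e] = reserved / len(unseen)
    return probs


# Invented counts: two seen events, two unseen events.
probs = absolute_discounting({"a": 3, "b": 1}, ["a", "b", "c", "d"], delta=0.5)
```

With these counts the maximum-likelihood estimates 0.75 and 0.25 drop to 0.625 and 0.125, and the two unseen events receive 0.125 each.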

F-measures for role extraction as a function of smoothing factors

Relation accuracies as a function of smoothing factors