Towards Building A Database of Phosphorylate Interactions Extracting Information from the Literature M. Narayanaswamy & K. E. Ravikumar AU-KBC Center,

Presentation transcript:

Towards Building A Database of Phosphorylate Interactions
Extracting Information from the Literature
M. Narayanaswamy & K. E. Ravikumar, AU-KBC Center, Chennai, India
& K. Vijay-Shanker, University of Delaware

Information Extraction from the Literature
Wealth of information available (only) in unstructured form (scientific literature)
Need to store data in structured form (databases) for bioinformatics applications
Information extraction is an active field.
Focus in the biological domain -- extracting information on protein pairs that interact

Phosphorylation Extraction
<Agent = Frp-1 Theme = p53 Site = Ser 15>
<Agent = JNK Theme = c-Jun Site = unk>
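The template above has three slots. A minimal sketch of how such a record might be held in code, assuming a hypothetical class name `PhosphorylationRecord` and using `None` for the slide's "unk" (neither comes from the presentation):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PhosphorylationRecord:
    """One extracted relation; slot names follow the slide's template."""
    agent: Optional[str]  # what does the phosphorylating
    theme: Optional[str]  # what gets phosphorylated
    site: Optional[str]   # residue/position; None stands in for "unk"

    def render(self) -> str:
        # Print unfilled slots as "unk", mirroring the slide's notation.
        def show(value: Optional[str]) -> str:
            return value if value is not None else "unk"
        return (f"<Agent = {show(self.agent)} "
                f"Theme = {show(self.theme)} "
                f"Site = {show(self.site)}>")


records = [
    PhosphorylationRecord("Frp-1", "p53", "Ser 15"),
    PhosphorylationRecord("JNK", "c-Jun", None),
]
for record in records:
    print(record.render())
```

Keeping the unknown site as an explicit empty slot, rather than dropping the record, leaves room for the slot to be filled later from another sentence.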

Why Phosphorylation
Central role in signal transduction.
One of the more widely studied post-translational modifications.
IE may generalize to other post-translational modifications and binding.
Different challenges
–Agent and target – not just proteins
–Site of phosphorylation

Steps in Text Processing and Information Extraction
Basic Text Processing – e.g., sentence boundary detection.
Part of Speech Tagging
Name/term Detection
Phrase (esp., Noun and Verb Phrase) chunking
Type Classification of Terms and noun phrases
Template Pattern Matching

BioNEx (PSB 2003)
Detects Names/Terms of the following types:
–Protein/gene
–Protein/gene parts
–Chemicals
–Source
–Others
Two tasks – Detection and Classification

Classification -- F-Term
Names are often descriptive NPs
–Simian immunodeficiency virus
–T cell
–Mitogen-activated protein kinase
–Ras guanine nucleotide exchange factor
Description of function/class of entities
Useful to assign types

Additional Sources for Classification
Using context – h-terms such as “expression” – “…IL-2 expression…”
Appositives
–“Mek1, a tyrosine kinase,…”
Acronyms
–Mitogen-activated protein kinase (MAPK) … MAPK …
Coordination
High precision and recall (PSB 2003)
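The acronym cue above can be illustrated with a small sketch: once a parenthesized short form is linked to the long form before it, the short form can inherit whatever type the long form received. The initial-letter heuristic and the function name `find_acronyms` are assumptions made for illustration, not the method used in BioNEx.

```python
import re


def find_acronyms(text):
    """Map each parenthesized acronym to the long form just before it.

    Heuristic (an assumption for this sketch): the acronym's letters must
    equal the initials of the preceding words, with hyphenated words
    counted as separate words.
    """
    pairs = {}
    for match in re.finditer(r"\(([A-Z][A-Za-z0-9]{1,9})\)", text):
        short = match.group(1)
        # Word-like tokens that occur before the opening parenthesis.
        tokens = list(re.finditer(r"[A-Za-z0-9]+", text[:match.start()]))
        candidate = tokens[-len(short):]
        initials = "".join(tok.group(0)[0] for tok in candidate).upper()
        if candidate and initials == short.upper():
            pairs[short] = text[candidate[0].start():candidate[-1].end()]
    return pairs


sentence = "Mitogen-activated protein kinase (MAPK) is activated; MAPK then acts."
print(find_acronyms(sentence))
```

Later bare mentions of the short form can then be looked up in the returned mapping and typed like the long form.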

Steps in IE
Basic Text Processing – e.g., sentence boundary detection.
Part of Speech Tagging
Name/term Detection
Phrase (esp., Noun and Verb Phrase) chunking
Type Classification of Terms and noun phrases
Template Pattern Matching

Phrase Chunking
Detect BaseNPs
–Active p90Rsk2 was found to be able to phosphorylate histone H3 at Ser10

Phrase Chunking
Detect BaseNPs and Verb Groups
–Active p90Rsk2 was found to be able to phosphorylate histone H3 at Ser10

Phrase Chunking
Detect BaseNPs
–Active p90Rsk2 was found to be able to phosphorylate histone H3 at Ser10
Verb groups (passive vs. active forms)

Phrase Chunking
Detect BaseNPs
–Active p90Rsk2 was found to be able to phosphorylate histone H3 at Ser10
Verb groups
Appositives
–… Sic1, an inhibitor …, is phosphorylated
Relative Clauses
–… Ser38 which is phosphorylated …

Steps in IE
Basic Text Processing – e.g., sentence boundary detection.
Part of Speech Tagging
Name/term Detection
Phrase (esp., Noun and Verb Phrase) chunking
Type Classification of Terms and noun phrases
Template Pattern Matching

Why type classification?
A phosphorylated B in C
–ATR/FRP-1 also phosphorylated p53 in Ser 15 …
–Active Chk2 phosphorylated the SQ/TQ sites in Chk2 SCD …
–cdk9/cyclinT2 could phosphorylate the retinoblastoma gene (pRb) in human cell lines
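The three sentences share the surface form "A phosphorylated B in C", yet only the first has a residue in the C position. A rough sketch of the kind of check that separates them, assuming a hypothetical helper `looks_like_site` and a deliberately narrow residue pattern (the presentation's actual typing rules are not given):

```python
import re

# Serine, threonine and tyrosine are the residues most often reported as
# phosphorylation sites; restricting the check to them is a simplification.
RESIDUE = r"(?:Ser|Thr|Tyr|serine|threonine|tyrosine)"
SITE_RE = re.compile(rf"{RESIDUE}[ -]?\d+", re.IGNORECASE)


def looks_like_site(phrase):
    """Return True if the phrase reads like a residue plus a position."""
    return SITE_RE.fullmatch(phrase.strip()) is not None


for phrase in ["Ser 15", "Ser10", "human cell lines"]:
    print(phrase, "->", looks_like_site(phrase))
```

With such a check, "in Ser 15" can fill the Site slot while "in human cell lines" is left out of the template.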

Type Classification
Extensive use of type information in rules
Typing done by means of
–Phrase internal -- e.g., Ras guanine nucleotide exchange factor Sos
–Contextual – e.g., homolog of TAF(II)
–syntactic information – appositive, coordination etc.
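Phrase-internal typing can be sketched as a lookup of functional words inside the noun phrase. The tiny lexicon `F_TERMS` and the right-to-left scan below are illustrative assumptions; the type labels reuse the BioNEx categories, but the real word lists and rules are not part of this transcript.

```python
# Hypothetical f-term lexicon: functional word -> entity type.
F_TERMS = {
    "kinase": "protein/gene",
    "factor": "protein/gene",
    "receptor": "protein/gene",
    "domain": "protein/gene part",
    "virus": "source",
    "cell": "source",
}


def classify_by_f_term(noun_phrase):
    """Type a noun phrase by the rightmost word found in the lexicon."""
    # Scan from the right because the functional word tends to sit near
    # the head of an English noun phrase.
    for word in reversed(noun_phrase.lower().split()):
        if word in F_TERMS:
            return F_TERMS[word]
    return None  # no phrase-internal evidence; fall back to context


for np in ["Ras guanine nucleotide exchange factor Sos",
           "Simian immunodeficiency virus",
           "T cell"]:
    print(np, "->", classify_by_f_term(np))
```

When the lookup returns nothing, the contextual and syntactic cues listed above would have to supply the type.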

Steps in IE
Basic Text Processing – e.g., sentence boundary detection.
Part of Speech Tagging
Name/term Detection
Phrase (esp., Noun and Verb Phrase) chunking
Type Classification of Terms and noun phrases
Template Pattern Matching

Patterns and Templates
(in/at )?
–Active p90Rsk2 was found to be able to phosphorylate histone H3 at Ser10
Active, Passive, Adjectival forms for phosphorylate/phosphorylated
Different orders and optionality of arguments

Patterns for Phosphorylation
Non-Verbal (not common) but frequent
Phosphorylation of (by )? (in/at )? Phosphorylation of … by/via phosphorylation (at )?
Altogether, large number of patterns from examining 300 abstracts and 10 journal articles.
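A nominal pattern of this shape can be approximated with a regular expression. The slot labels did not survive in the pattern text above, so reading the slots as theme, agent and site (in that order) is an assumption based on the template fields; `NAME` is a toy single-token stand-in for the typed noun-phrase chunks a real system would match.

```python
import re

# Toy name pattern: one token such as "c-Jun", "p53" or "Ser15".
NAME = r"[A-Za-z0-9][A-Za-z0-9/-]*"

NOMINAL = re.compile(
    rf"phosphorylation of (?P<theme>{NAME})"
    rf"(?: by (?P<agent>{NAME}))?"
    rf"(?: (?:in|at) (?P<site>{NAME}))?",
    re.IGNORECASE,
)


def match_nominal(sentence):
    """Return the filled slots as a dict, or None if the pattern fails."""
    found = NOMINAL.search(sentence)
    return found.groupdict() if found else None


print(match_nominal("Phosphorylation of c-Jun by JNK was observed"))
print(match_nominal("phosphorylation of p53 at Ser15"))
```

The optional groups give the "optionality of arguments" behaviour: a missing agent or site simply comes back as `None`.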

Sentence-Based Evaluation
Table: Precision, Recall and F-measure for Agent, Theme, Site and Relation (two sets of rows; values not preserved in the transcript)
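For reference, the three measures named on this slide are computed from counts of correct, spurious and missed extractions. The sketch below uses the standard definitions with the balanced F-measure; the counts in the example are invented for illustration and are not results from the presentation.

```python
def precision_recall_f(true_pos, false_pos, false_neg):
    """Standard precision, recall and balanced F-measure from raw counts."""
    extracted = true_pos + false_pos   # everything the system output
    expected = true_pos + false_neg    # everything in the gold standard
    precision = true_pos / extracted if extracted else 0.0
    recall = true_pos / expected if expected else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical counts: 8 correct, 2 spurious, 2 missed.
print(precision_recall_f(8, 2, 2))
```

Such a function would be applied once per slot (Agent, Theme, Site) and once for the full Relation.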

Utility in Building Databases
IE on 1000 abstracts – 5m/3s
Precision on 200 abstracts
–Relation > 92%
Scales up well
Useful for constructing DBs.

Discussion
High precision and recall
Beyond protein-protein (e.g., site)
Non-verbal
Generalizes to other post-translational modifications? (acetylate, methylation,…)

Future Work
Piecemeal information specification
X phosphorylates Y + phosphorylation of Y at Z = X phosphorylates Y at Z
Fusion/Information Merging
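The merging equation above can be sketched as combining two partial records that agree wherever both are filled. The function `merge_records` and its rule (at least one shared slot, no conflicting slot) are assumptions for illustration; the presentation lists this as future work and does not describe a method, and a real system would likely need discourse evidence before fusing two mentions.

```python
SLOTS = ("agent", "theme", "site")


def merge_records(first, second):
    """Fuse two partial records, or return None if they cannot be fused.

    Records are dicts with optional "agent", "theme" and "site" keys.
    """
    # Require at least one slot that both records fill with the same value.
    shared = any(
        first.get(slot) is not None and first.get(slot) == second.get(slot)
        for slot in SLOTS
    )
    if not shared:
        return None
    merged = {}
    for slot in SLOTS:
        a, b = first.get(slot), second.get(slot)
        if a is not None and b is not None and a != b:
            return None  # conflicting values: keep the records separate
        merged[slot] = a if a is not None else b
    return merged


verbal = {"agent": "X", "theme": "Y"}   # "X phosphorylates Y"
nominal = {"theme": "Y", "site": "Z"}   # "phosphorylation of Y at Z"
print(merge_records(verbal, nominal))
```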