Extracting Disease-Gene Associations from MEDLINE abstracts Tsujii laboratory University of Tokyo.

Outline
NLP tools
–Part-of-speech tagger, HPSG parser
Machine learning based approach for extracting Disease-Gene Association
Evaluation
–Precision / recall / f-score
–Effectiveness of predicate argument structures
DGA explorer
Annotation tool

Part-of-speech tagger
Trained on the corpus containing newspaper articles and biology texts.

Training corpus    WSJ      GENIA
WSJ                97.0%    84.3%
GENIA              75.2%    98.1%
WSJ+GENIA          96.9%    98.1%

HPSG parser
Output
–Phrase structures (e.g. np, vp, pp)
–Predicate-argument structures

Example: We demonstrate that E2F-1 activates the promoter.
demonstrate ARG1: we ARG2: [1]
[1] activates ARG1: E2F-1 ARG2: promoter
…

Parsing MEDLINE
Corpus
–1,500,000 MEDLINE abstracts
Parsing speed
–5 secs / sentence
Server
–PC cluster (100 processors)
Time
–10 days
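The figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes perfectly even parallelism across the 100 processors and no overhead, which the slide does not state; the per-abstract sentence count it prints is therefore only an implied estimate, not a number from the presentation.

```python
# Figures taken from the slide.
ABSTRACTS = 1_500_000
SECONDS_PER_SENTENCE = 5
PROCESSORS = 100
DAYS = 10

SECONDS_PER_DAY = 24 * 60 * 60


def implied_sentence_count(days, processors, seconds_per_sentence):
    """Sentences that fit in the time budget, assuming ideal parallelism."""
    total_cpu_seconds = days * SECONDS_PER_DAY * processors
    return total_cpu_seconds // seconds_per_sentence


sentences = implied_sentence_count(DAYS, PROCESSORS, SECONDS_PER_SENTENCE)
per_abstract = sentences / ABSTRACTS
print(f"{sentences:,} sentences, about {per_abstract:.1f} per abstract")
# prints: 17,280,000 sentences, about 11.5 per abstract
```

Roughly a dozen sentences per abstract is a plausible length, so the four numbers on the slide are at least mutually consistent.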

Extracting Disease-Gene Association
Preliminary experiments
–Patterns on predicate-argument structures
accelerates ARG1: GENE ARG2: DISEASE
demonstrates ARG1: DISEASE ARG2: GENE
…
Low recall and precision
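The pattern-based extraction above can be sketched as a lookup over parsed structures. This is a minimal illustration, not the laboratory's implementation: the `PredArg` type, the `match` function, and the idea that each argument has already been replaced by an entity type such as GENE or DISEASE are all assumptions made for the example.

```python
from typing import NamedTuple


class PredArg(NamedTuple):
    """One predicate-argument structure (hypothetical representation)."""
    predicate: str
    arg1: str  # entity type of ARG1, e.g. "GENE", "DISEASE", "OTHER"
    arg2: str  # entity type of ARG2


# The two patterns shown on the slide; the real pattern list is longer.
PATTERNS = {
    ("accelerates", "GENE", "DISEASE"),
    ("demonstrates", "DISEASE", "GENE"),
}


def match(structures):
    """Return the structures that exactly match one of the patterns."""
    return [s for s in structures if tuple(s) in PATTERNS]


# Toy input, made up for illustration.
parsed = [
    PredArg("accelerates", "GENE", "DISEASE"),
    PredArg("activates", "GENE", "OTHER"),
]
print(match(parsed))
```

Exact matching of this kind misses any phrasing not covered by a pattern and fires on any sentence that happens to fit one, which is consistent with the low recall and precision reported on the slide.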

Machine learning based approach
Sentence selection
Extracted association
Using the patterns on predicate-argument structures as the features for machine learning
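One way to read "patterns as features" is that each pattern becomes a binary feature that is either present or absent for a sentence. The sketch below assumes that reading; the function name `extract_features`, the `word=` and `pas=` feature-string formats, and the toy sentence are hypothetical and do not come from the slides.

```python
def extract_features(tokens, pred_args):
    """Return the set of binary features that fire for one sentence.

    tokens:    list of word strings
    pred_args: list of (predicate, arg1_type, arg2_type) tuples
    """
    # Bag-of-words features, lowercased.
    features = {f"word={t.lower()}" for t in tokens}
    # One feature per predicate-argument pattern found in the sentence.
    features |= {f"pas={p}:{a1}:{a2}" for p, a1, a2 in pred_args}
    return features


# Toy sentence, made up for illustration.
feats = extract_features(
    ["EGFR", "accelerates", "fibrosis"],
    [("accelerates", "GENE", "DISEASE")],
)
print(sorted(feats))
```

Representing features as a set of strings keeps the extractor independent of whichever learner consumes them.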

Training data
The latter is also implied by fibroblast-associated alterations in tumor cell morphology and ECM distribution in the system.
Lung fibrosis is a fatal condition of excess extracellular matrix (ECM) deposition associated with increased transforming growth factor beta (TGF-beta) activity.
All foals with OLWS were homozygous for the Ile118Lys EDNRB mutation, and adults that were homozygous were not found.
Dominant radial drusen and Arg345Trp EFEMP1 mutation.
The 5 year overall survival (OS) and event-free survival (EFS) were 94 and 90 +/- 8%, respectively, with a median follow-up of 48 months.
These data may indicate that formation of parathyroid adenoma in young patients is related to a mechanism involving EGFR.

Maximum entropy learning
Log-linear model
Binary-valued feature function
Weight of the feature
Features
–Bag-of-words
–Local context
–Gene/disease name
–Predicate-argument structures
–…
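The slide's equation did not survive in this transcript, so the sketch below uses the standard form of a log-linear (maximum entropy) classifier with binary feature functions and one weight per feature. The label names, the `predict_proba` function, and the example weight are assumptions for illustration; they are not values from the presentation.

```python
import math

# Hypothetical label set for the sentence classifier.
LABELS = ("association", "no_association")


def predict_proba(active_features, weights):
    """p(y | x) = exp(sum_i w_i * f_i(x, y)) / Z(x), with binary f_i.

    active_features: set of feature strings that fire for the sentence
    weights:         dict mapping (feature, label) to a real-valued weight
    """
    scores = {
        y: sum(weights.get((feat, y), 0.0) for feat in active_features)
        for y in LABELS
    }
    # Subtract the max score before exponentiating for numerical stability.
    top = max(scores.values())
    exp_scores = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exp_scores.values())
    return {y: e / z for y, e in exp_scores.items()}


# Made-up weight: this pattern is treated as evidence for an association.
weights = {("pas=accelerates:GENE:DISEASE", "association"): 2.0}
probs = predict_proba({"pas=accelerates:GENE:DISEASE"}, weights)
print(probs)
```

In practice the weights would be estimated from the labelled training sentences rather than set by hand; only the scoring step is shown here.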