We think you have liked this presentation. If you wish to download it, please recommend it to your friends in any social system. Share buttons are a little bit lower. Thank you!
Presentation is loading. Please wait.
Published byAnissa Lipson
Modified about 1 year ago
© 2002 The MITRE Corporation. ALL RIGHTS RESERVED. Co-Chair: Alexander Yeh, MITRE Corp. Data: FlyBase (http://www.flybase.org) July 2002 KDD Cup 2002 Task1: Information Extraction from Biomedical Articles
© 2002 The MITRE Corporation. ALL RIGHTS RESERVED. Task Background Biomedical information exists in -Research literature -More than 280 semi-structured databases. Some examples: FlyBase: fruit fly genes and proteins Mouse Genome Database (MGB) Protein Information Resource (PIR) Some databases act as distillations of subsets of the literature Currently, curators (people) read the literature to manually update (curate) the databases
© 2002 The MITRE Corporation. ALL RIGHTS RESERVED. FlyBase: Example of Data Curation
© 2002 The MITRE Corporation. ALL RIGHTS RESERVED. Curators Cannot Keep Up with the Literature! FlyBase References By Year
© 2002 The MITRE Corporation. ALL RIGHTS RESERVED. Task Rationale and Description We want to see what can be done to help automate this curation process Start fairly simple. Try to help automate part of what one group of curators needs to do: -Determine which papers need to be curated for fruit fly gene expression information “Curate” means read in detail to make entries in the database -Want to curate those papers containing experimental results on gene products (RNA transcripts and proteins)
© 2002 The MITRE Corporation. ALL RIGHTS RESERVED. Task Description: For a Set of Papers on Genetics or Molecular Biology Given for each paper -The full text of that paper -A list of the genes mentioned in that paper Determine for each paper -Does that paper contain any curatable gene product information (experimental results)? -For each gene mentioned in the paper, does that paper have experimental results for Transcript(s) of that gene? Protein(s) of that gene? Also produce a ranked list of the papers -Rank the curatable papers before the non-curatable papers
© 2002 The MITRE Corporation. ALL RIGHTS RESERVED. Task is Harder Than It First Appears Interested in results applicable to “regular” (found in the wild) flies, not mutants Genes have multiple names (synonyms) -Given a list of the known synonyms -But list may be incomplete -Some names can refer to more than one gene E.g., “Clk” is a symbol for one gene (Clock) and is also a synonym for another gene (period, symbol is “per”) Contestants given evidence of experimental results found in the training data, -But only in the form that is recorded in the FlyBase database
© 2002 The MITRE Corporation. ALL RIGHTS RESERVED. Evidence of Results Found in Training Data as Recorded in FlyBase Database (DB) records what evidence is found in a training paper, but not where in that paper The evidence is often recorded in a “normalized” form and domain knowledge is needed to find the corresponding text, e.g., DB: Assay mode: “immunolocalization” Text (PubMed ID#9006979): “ Figure 12. …Whole-mount tissue staining using an affinity- purified anti-PHM antibody in the CNS … This view displays only a portion of the CNS ” Term “immunolocalization” is not in the text Instead, text describes an instance of performing immunolocalization
© 2002 The MITRE Corporation. ALL RIGHTS RESERVED. Task Details Task has 3 sub-tasks, that contribute equally to the overall score -1. Ranked-list of papers (curatable before non- curatable) Look at area under ROC curve ROC = Receiver Operating Characteristic Prob(detection) versus Prob(false alarm) -2. Yes/No decisions on the papers being curatable (having any results of interest) Use balanced F-score -3. Yes/No decisions for having results for each type of product (transcript, protein) for each gene mentioned Use balanced F-score
© 2002 The MITRE Corporation. ALL RIGHTS RESERVED. Result Statistics 18 teams submitted 32 entries Entries from 7 “countries”: -Japan, Taiwan, Singapore, India, UK, Portugal, USA Team type (lead member, some teams had multiple types): -Company: 7 -University: 8 -Other: 3
© 2002 The MITRE Corporation. ALL RIGHTS RESERVED. WINNER Combined team from ClearForest and Celera -Contacts: Yizhar Regev, Michal Finkelstein -Also had the best score in each of the 3 sub-tasks Best Median Ranked-list: 84% 69% Yes/No curate paper: 78% 58% Yes/No gene products: 67% 35% -The top entries for the ranked-list sub-task all had close scores for this sub-task
© 2002 The MITRE Corporation. ALL RIGHTS RESERVED. Honorable Mentions (East to West): Combined team from the Design Technology Institute Ltd., the Mechanical Engineering Dept. at the National University of Singapore, and the Genome Institute of Singapore -Contact: Shi Min Combined team from the data mining group at Imperial College (UK) and Inforsense Limited -Contacts: Huma Lodhi, Yong Zhang Combined team from Verity, Inc. and Exelixis, Inc. -Contact: Bin Chen
© 2002 The MITRE Corporation. ALL RIGHTS RESERVED. Acknowledgements People at FlyBase, especially -William Gelbart -Beverly Matthews -Leyla Bayraktaroglu -David Emmert -Don Gilbert Project team at MITRE -Alexander Morgan -Lynette Hirschman
© 2003 The MITRE Corporation. ALL RIGHTS RESERVED. MITRE Critical Assessment of Information Extraction Systems in Biology (BioCreAtIvE) Marc Colosimo Lynette.
© 2006 The MITRE Corporation. ALL RIGHTS RESERVED. Lynette Hirschman The MITRE Corporation Bedford, MA, USA RegCreative Jamboree Nov 29-Dec 1, 2006 Text.
LESSONS FROM THE BIOCREATIVE PROTEIN- PROTEIN INTERACTION (PPI) TASK RegCreative Jamboree, Friday, December, 1st, (2006) MARTIN KRALLINGER, 2006 LESSONS.
Accomplishments and Challenges in Literature Data Mining for Biology L. Hirschman et al. Presented by Jing Jiang CS491CXZ Spring, 2004.
VectorBase A Resource Centre for Invertebrate Hosts of Human Pathogens Bob MacCallum Imperial College London.
Automatically Generating Gene Summaries from Biomedical Literature (To appear in Proceedings of PSB 2006) X. LING, J. JIANG, X. He, Q.~Z. MEI, C.~X. ZHAI,
Opportunities for Text Mining in Bioinformatics (CS591-CXZ Text Data Mining Seminar) Dec. 8, 2004 ChengXiang Zhai Department of Computer Science University.
BioCreAtIvE Critical Assessment for Information Extraction in Biology Granada, Spain, March28-March 31, 2004 Task 2: Functional annotation of gene products.
CACAO Training Fall Community Assessment of Community Annotation with Ontologies (CACAO)
Using The Gene Ontology: Gene Product Annotation.
KDD Cup Task 2 Mark Craven Department of Biostatistics & Medical Informatics Department of Computer Sciences University of Wisconsin
Sub title here KDD Cup Task 1 Information Extraction from Biomedical Articles System Description June / July 2002.
BioRAT: Extracting Biological Information from Full-length Papers David P.A. Corney, Bernard F. Buxton, William B. Langdon and David T. Jones Bioinformatics.
Gene Ontology TM (GO) Consortium Jennifer I Clark EMBL Outstation - European Bioinformatics Institute (EBI), Hinxton, Cambridge CB10 1SD, UK Objectives:
Gene Set Analysis using R and Bioconductor Daniel Gusenleitner
Knowledge Integration for Gene Target Selection Graciela Gonzalez, PhD Juan C. Uribe Contact:
Genetic Literature Curation at FlyBase-Cambridge Steven Marygold ABC meeting, December 2007 A Database of.
Prof. Carolina Ruiz Computer Science Department Bioinformatics and Computational Biology Program WPI WELCOME TO BCB4003/CS4803 BCB503/CS583 BIOLOGICAL.
Associating Biomedical Terms: Case Study for Acetylation Aaron Buechlein Indiana University School of Informatics Advisor: Dr. Predrag Radivojac.
Gene Ontology John Pinney
Abstract Background: In this work, a candidate gene prioritization method is described, and based on protein-protein interaction network (PPIN) analysis.
IProLINK – A Literature Mining Resource at PIR (integrated Protein Literature INformation and Knowledge ) Hu ZZ 1, Liu H 2, Vijay-Shanker K 3, Mani I 4,
CS276B Text Information Retrieval, Mining, and Exploitation Lecture 16 Bioinformatics II March 13, 2003 (includes slides borrowed from J. Chang, R. Altman,
A collaborative tool for sequence annotation. Contact:
Copyright OpenHelix. No use or reproduction without express written consent1.
The EST database is a collection of short single-read transcript sequences from GenBank. These sequences provide a resource to evaluate gene expression,
Databases in Bioinformatics and Systems Biology Carsten O. Daub Omics Science Center RIKEN, Japan May 2008.
Evidence-Based Information Retrieval in Bioinformatics Timothy B. Patrick, PhD Healthcare Administration and Informatics, University of Wisconsin-Milwaukee.
This tutorial will describe how to navigate the section of Gramene that provides descriptions of alleles associated with morphological, developmental,
Toward a Unified Gene Page GMOD Meeting, April 2004 Don Gilbert,
Primary vs. Secondary Databases Primary databases are repositories of “raw” data. These are also referred to as archival databases. -This is one of the.
Computational analysis of protein-protein interactions for bench biologists 2-8 September, Berlin Protein Interaction Databases Francesca Diella.
Rice Proteins Data acquisition Curation Resources Development and integration of controlled vocabulary Gene Ontology Trait Ontology Plant Ontology
Mining the Medical Literature Chirag Bhatt October 14 th, 2004.
Creating NCBI The late Senator Claude Pepper recognized the importance of computerized information processing methods for the conduct of biomedical research.
NCBI’s Bioinformatics Resources Michele R. Tennant, Ph.D., M.L.I.S. Health Science Center Libraries U.F. Genetics Institute January 2015.
GO : the Gene Ontology “because you know sometimes words have two meanings” Amelia Ireland GO Curator EBI, Cambridge, UK.
CACAO - Penn State Gene Function and Gene Ontology January 2011
Cis-Regulatory/ Text Mining Interface Discussion.
Scope of the Gene Ontology Vocabularies. Compile structured vocabularies describing aspects of molecular biology Describe gene products using vocabulary.
Web Databases for Drosophila Introduction to FlyBase and Ensembl Database Wilson Leung6/06.
B IOMEDICAL T EXT M INING AND ITS A PPLICATION IN C ANCER R ESEARCH Henry Ikediego
CACAO Training Jim Hu and Suzi Aleksander Fall 2015.
CS5263 Bioinformatics Lecture 20 Practical issues in motif finding Final project.
Social networks, in the form of bibliographies and citations, have long been an integral part of the scientific process. We examine how to leverage the.
Writing an original research paper Part one: Important considerations Amin Bredan, PhD, Editor Department of Biomedical Molecular Biology Ghent University.
Analysis Environments For Scientific Communities From Bases to Spaces Bruce R. Schatz Institute for Genomic Biology University of Illinois at Urbana-Champaign.
Bioinformatics and Computational Biology. Bioinformatics collection and storage of biological information derives knowledge from computer analysis.
BIOBASE Training TRANSFAC ® Containing data on eukaryotic transcription factors, their experimentally-proven binding sites, and regulated genes ExPlain™
© 2017 SlidePlayer.com Inc. All rights reserved.