Gene Expression Data Annotation – an application of the cell type ontology Helen Parkinson, PhD 19 May 2010.

Presentation transcript:

1 Gene Expression Data Annotation – an application of the cell type ontology Helen Parkinson, PhD 19 May 2010

2 EBI Core Databases..... and ~ 40 others

3 Use Cases
Query support and expansion
Data visualization and exploration
Summary level data presentation
Data integration via ontology terms
Meta analysis – human and mouse
Semantic distance queries across experiments
Cross products between – cell lines, tissues, cell types, diseases...
Users – curators, biologists, engineers
Intelligent template generation for different experiment types in submission or data presentation
Detection of annotation inconsistency
Annotator support, term suggestion
Text mining at acquisition/submission for GEO data and post-hoc
Literature text mining
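As an illustration of the "query support and expansion" and "semantic distance" use cases, the following is a minimal sketch over a toy is-a hierarchy. The term labels and the function names `expand_query` and `semantic_distance` are assumptions made for this example; they are not taken from EFO, the Cell Type Ontology, or any ArrayExpress code.

```python
from collections import deque

# Toy is-a hierarchy (child -> parents). Labels are illustrative only,
# not actual EFO or Cell Type Ontology content.
PARENTS = {
    "T cell": ["lymphocyte"],
    "B cell": ["lymphocyte"],
    "lymphocyte": ["leukocyte"],
    "monocyte": ["leukocyte"],
    "leukocyte": ["hematopoietic cell"],
}


def expand_query(term, parents=PARENTS):
    """Return the query term plus all of its descendants (breadth-first)."""
    # Invert child -> parents into parent -> children.
    children = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(child)
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def semantic_distance(a, b, parents=PARENTS):
    """Count is-a edges on the shortest undirected path, or None if unconnected."""
    if a == b:
        return 0
    # Treat every is-a edge as undirected for path counting.
    neighbours = {}
    for child, ps in parents.items():
        for p in ps:
            neighbours.setdefault(child, set()).add(p)
            neighbours.setdefault(p, set()).add(child)
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt == b:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None
```

A query for "lymphocyte" would, under these assumptions, also match samples annotated as "T cell" or "B cell". Edge counting is only one of several possible distance measures and is used here for simplicity.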

4 Integration challenges
1,000,000 sample annotations in ArrayExpress (Aug 2009)
Seq DBs, tissues, metagenomics, reactions, etc
Cross database integration issues EGA/AE/ERA etc
Name value pairs ‘Disease’ = ‘cancer’, semi-controlled text, papers
Algorithms, software, methods, Parameter annotation e.g. Virtual Physiological Human
Complex phenotypes, clinical information
Embedded literature, Pubmed abstracts, full text papers, supplemental information
Most of the data relate to cell lines, tissues, disease samples, clinical information and phenotypes
Millions of records, legacy data, since ~1985
www.ebi.ac.uk/efo
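The name-value pair problem (‘Disease’ = ‘cancer’ entered as semi-controlled text) can be sketched as a normalization step followed by a label or synonym lookup. This is a minimal sketch: the lookup table, the `EX:` identifiers, and the function names are hypothetical placeholders, not real EFO accessions or curation tooling.

```python
# Hypothetical lookup table: normalized label or synonym -> term ID.
# The "EX:" IDs are placeholders, not real EFO accessions.
LABEL_TO_TERM = {
    "cancer": "EX:0001",
    "neoplasm": "EX:0001",
    "breast carcinoma": "EX:0002",
}


def normalize(text):
    """Trim whitespace and surrounding quotes, lowercase, collapse inner spaces."""
    stripped = text.strip().strip("'\"‘’")
    return " ".join(stripped.lower().split())


def annotate(pairs):
    """Map (name, value) pairs to (normalized name, original value, term ID or None)."""
    results = []
    for name, value in pairs:
        term = LABEL_TO_TERM.get(normalize(value))
        results.append((normalize(name), value, term))
    return results
```

Values that return `None` are the ones a curator or a text-mining step would still need to resolve.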

5 [Diagram labels] Phenotypes; EBI Sample Database; Molecular databases; Genomes, genes (ENSEMBL); Proteins (UniPROT); Chemicals (ChEBI); Archives of supporting data; Molecular Atlases; European Nucleotide Archive; Proteomics measurements (Pride); Metabolomics experiments (a new database); Molecules; Pathways (Reactome); Transcript measurements (ArrayExpress); DBs; European Sample database

6 Archive data – Lucene full text searching plus ontology searching Nikolay Kolesnikov, Anna Zhukova

7 Atlas Querying All genes under/over expressed in cell types per species, where cell type is annotated as a variable
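The Atlas query on this slide can be illustrated with a small grouping function. This is a sketch under stated assumptions: the record layout, gene names, and the `genes_by_cell_type` function are invented for illustration and do not reflect the actual Gene Expression Atlas schema or API.

```python
from collections import defaultdict

# Toy differential-expression records. Gene names and calls are invented;
# "factor" names the experimental variable the call was made against.
RECORDS = [
    {"gene": "GENE_A", "species": "Homo sapiens", "factor": "cell type", "value": "T cell", "call": "up"},
    {"gene": "GENE_B", "species": "Homo sapiens", "factor": "cell type", "value": "T cell", "call": "down"},
    {"gene": "GENE_A", "species": "Mus musculus", "factor": "cell type", "value": "T cell", "call": "up"},
    {"gene": "GENE_C", "species": "Homo sapiens", "factor": "disease", "value": "cancer", "call": "up"},
]


def genes_by_cell_type(records, direction=None):
    """Group genes by (species, cell type), keeping only records where
    cell type is the experimental variable. `direction` may be "up",
    "down", or None for both."""
    grouped = defaultdict(set)
    for r in records:
        if r["factor"] != "cell type":
            continue
        if direction is not None and r["call"] != direction:
            continue
        grouped[(r["species"], r["value"])].add(r["gene"])
    return dict(grouped)
```

Calling `genes_by_cell_type(RECORDS, direction="up")` keeps only over-expressed genes and skips the record whose variable is disease rather than cell type.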

8 EFO Vital Statistics
May 2010, release 2.3 (23 monthly releases), 2888 classes (832 no xrefs)
Built in Protégé, OWL, uses DL converted to OBO
Available via OLS, BioPortal, www.ebi.ac.uk/EFO
Focus on diseases, cell types, cell lines, ‘mammalian anatomy’, plant terms, compound, experimental processes and hardware
OWL tools available – ontology differ
Mapped to 24 semantic resources
Malaria Ontology (MALIDO) ver0.2b
Mammalian phenotype (MP) ver1.309
Medical Subject Headings (MSH) ver2009_2009_02_13
International Classification of Diseases (ICD-9) ver9
Phenotypic quality (PATO) ver1.188
CRISP Thesaurus Version 2.5.2.0
Mosquito gross anatomy (TGMA) ver1.10
Human disease (DOID) ver1.88
Chemical entities of biological interest (CHEBI) ver1.59
Drosophila gross anatomy (FBbt) ver1.30
Foundational Model of Anatomy (FMA) ver3.0
The Arabidopsis Information Resource (TAIR) (various dates)
The Jackson Lab mouse database
SNOMED Clinical Terms (SNOMEDCT) ver2009_01_31
Ontology for Biomedical Investigations (OBI) ver2009-11-06 Philly
Units of measurement (UO) ver1.21
Microarray experimental conditions (MO) ver1.3.1.1
Plant structure (PO)
Minimal anatomical terminology (MAT) ver1.1
NIFSTD (nif) ver1.4
NCI Thesaurus (NCIt) ver09.07
Cell type (CL) ver1.40
Zebrafish anatomy and development (ZFA) ver1.23
BRENDA tissue / enzyme source (BTO) ver1.3, Relations ontology 1.2, BFO.
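The "2888 classes (832 no xrefs)" figure is a mapping-coverage statistic. A minimal sketch of how such a count could be computed is given here, assuming each class is represented as an ID mapped to a list of cross-references; the class IDs, xref strings, and the `xref_coverage` function are placeholders for illustration, not EFO content or tooling.

```python
# Toy class table: class ID -> list of cross-references to other resources.
# IDs and xrefs are placeholders, not real EFO content.
CLASSES = {
    "EX:0001": ["SRC_A:123", "SRC_B:456"],
    "EX:0002": ["SRC_A:789"],
    "EX:0003": [],
    "EX:0004": ["SRC_C:001"],
}


def xref_coverage(classes):
    """Return (total classes, classes without xrefs, fraction with at least one xref)."""
    total = len(classes)
    if total == 0:
        return 0, 0, 0.0
    unmapped = sum(1 for xrefs in classes.values() if not xrefs)
    return total, unmapped, (total - unmapped) / total
```

Applied to the slide's own numbers, 2888 classes with 832 lacking xrefs gives (2888 - 832) / 2888, or roughly 71% of classes mapped to at least one external resource.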

9 Building the Experimental Factor Ontology
Position of EFO in the ‘bigger picture’
Key is orthogonal coverage, reuse of existing resources and shared frameworks
Disease Ontology
Anatomy Reference Ontology
EFO
Cell Type Ontology
Chemical Entities of Biological Interest (ChEBI)
Various Species Anatomy Ontologies
Relation Ontology
Text mining

10 Deploying EFO
Text mining at data acquisition
Ontology driven queries
Data mining
Data driven ontology development
Term requests for source ontologies
AE/GEO acquire 310,000 assays
Experiment Archive
Re-annotate, summarize, add semantics
ATLAS Gene Expression Atlas

11 Sample Annotations ~ 1,000,000 in Archive, ~ 10,000 Atlas

12 Desiderata for the Cell Type Ontology
Release with hematopoietic cell types ASAP
Mass deprecation release ASAP
All leaf nodes defined in text/logically
Cross products – anatomy, GO process
Cell line x cell types
More orthogonality - CTO as a definitive source
MIREOT for appropriate terms
EFO will import CTO name spaces (when?)
Synonyms - non-exact=bad

13 VBO – Vertebrate bridging ontology
Collaboration between ArrayExpress, MRC Harwell, Cambridge Anatomy and Genetics
Scope: mouse, human, rat, teleosts
FMA view creation – ‘mammalian view’ of anatomy
Mapping to existing ontologies – single species, Uberon
Modelling using ‘homologous to’ relationship
Skeletal focus, adult stages
Evidence for homology – literature, experts, phylogeny
Workshop June 2010

14 Production People Tomasz Adamusiak Tony Burdett Emma Hastings Anna Farne Ele Holloway Margus Lukk James Malone Natalja Kurbatova Helen Parkinson Morris Swertz Raven Travillian Eleanor Williams Alvis Ugis Misha Kapushesky Alvis Brazma Ugis Sarkans Gen2Phen Visitor

