VectorBase Gene expression data in VectorBase Fotis Kafatos, George Christophides, Bob MacCallum & Seth Redmond Imperial College London (thanks also to EBI, Sanger and ND)


VectorBase
Gene expression data in VectorBase
Fotis Kafatos, George Christophides, Bob MacCallum & Seth Redmond
Imperial College London
(thanks also to EBI, Sanger and ND)

Outline
1. Project goals
2. What’s currently available
3. Current challenges and future plans

Project goals
For vector biologists:
  –Easy access to gene expression data
  –Consistent data processing
For array specialists:
  –ArrayExpress submission
  –Advanced analysis tools
  –Array annotation

[Diagram: BULK LOADER; EXPRESSION DATA; STORAGE & ANALYSIS]
BASE: BioArray Software Environment
Open source, active development and user community
LIMS, data storage, export and analysis
Web-based, user/group access control
BASE 2.x adoption will bring Affy support

Data submission
Community submission guidelines available
First batch of experiments loaded by us
Bulk data loader
Sample/experiment annotation requires intervention from curators

[Diagram: BULK LOADER; EXPRESSION DATA; STORAGE & ANALYSIS; ArrayExpress; ‘PUBLIC’ STORAGE]
Data held in BASE is largely MIAME compliant
Script for semi-automated export in TAB2MAGE format
One experiment submitted so far

[Diagram: BULK LOADER; EXPRESSION DATA; STORAGE & ANALYSIS; ArrayExpress; ‘PUBLIC’ STORAGE]

[Diagram: BULK LOADER; EXPRESSION DATA; STORAGE & ANALYSIS; ArrayExpress; ‘PUBLIC’ STORAGE; DATA SUMMARIES]
BASE web interface offers powerful and extendable analysis environment
Can be used for multi-site collaborations on pre-publication data
Steep learning curve/not 100% intuitive
Not easily linked to
We provide simpler views so the casual user can quickly draw biological inferences

Standardised data
All displayed data is processed in the same way:
1. Poor quality spots removed
     Currently using submitted spot flags
2. Normalisation
     “lowess” for two-colour experiments
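
The two processing steps on this slide can be sketched in a few lines. This is a minimal illustration only, not the VectorBase pipeline: it assumes each spot carries `red`, `green` and `flag` fields (hypothetical names), treats the flag value `"ok"` as "good quality", and uses a simplified lowess (tricube-weighted local linear fit, no robustness iterations) on the usual M-versus-A representation of two-colour data.

```python
import math

def lowess_fit(x, y, frac=0.5):
    """Simplified lowess: local linear fit with tricube weights, no robustness passes."""
    n = len(x)
    k = max(2, math.ceil(frac * n))          # neighbours used for each local fit
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12            # bandwidth: distance to k-th nearest point
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)                          # >= 1, the point itself has weight 1
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

def normalise_spots(spots, frac=0.5):
    """Step 1: drop flagged spots. Step 2: subtract the lowess trend of M on A."""
    # Assumption: flag == "ok" marks a usable spot; real flag vocabularies differ.
    good = [s for s in spots if s["flag"] == "ok" and s["red"] > 0 and s["green"] > 0]
    a = [0.5 * (math.log2(s["red"]) + math.log2(s["green"])) for s in good]  # mean log intensity
    m = [math.log2(s["red"]) - math.log2(s["green"]) for s in good]          # log ratio
    trend = lowess_fit(a, m, frac)
    return [dict(s, A=ai, M=mi - ti) for s, ai, mi, ti in zip(good, a, m, trend)]
```

With a constant dye bias (every red value twice its green value) the fitted trend absorbs the bias, so the normalised log ratios come out at zero, and any spot whose flag is not `"ok"` is simply absent from the result.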

[Diagram: BULK LOADER; EXPRESSION DATA; STORAGE & ANALYSIS; ArrayExpress; ‘PUBLIC’ STORAGE; DATA SUMMARIES; PROBE MAPPING]
3 probe types
6 array designs
Mapping handled via Ensembl pipeline:
  –Oligo → exonerate
  –PCR → e-PCR
  –cDNA → exonerate2genes

[Diagram: VectorBase; GENOMIC DATA; AUTOMATIC ANNOTATION; GENOME BROWSER; BULK LOADER; EXPRESSION DATA; STORAGE & ANALYSIS; ArrayExpress; ‘PUBLIC’ STORAGE; DATA SUMMARIES; PROBE MAPPING; GFF3]
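
The diagram names GFF3 as the hand-off between probe mapping and the genome browser. As a rough sketch of what one mapped probe might look like in that format, the function below writes a single nine-column GFF3 line. The function names, the default `source` and the `oligo` feature type are illustrative assumptions; the slides do not say which values the actual export uses.

```python
from urllib.parse import quote

def probe_hit_to_gff3(seqid, start, end, strand, probe_id,
                      source="exonerate", feature_type="oligo", score=None):
    """Format one probe alignment as a GFF3 feature line (9 tab-separated columns)."""
    if start < 1 or end < start:
        raise ValueError("GFF3 coordinates are 1-based and inclusive, with start <= end")
    if strand not in ("+", "-", ".", "?"):
        raise ValueError("strand must be one of + - . ?")
    # Column 9: tag=value pairs; reserved characters such as ';' and '=' are percent-encoded.
    attributes = {"Name": probe_id}
    column9 = ";".join(f"{key}={quote(str(value), safe=' ')}"
                       for key, value in attributes.items())
    fields = [
        seqid,                                   # 1: sequence (chromosome/contig) name
        source,                                  # 2: program that produced the hit (assumed)
        feature_type,                            # 3: feature type (assumed)
        str(start),                              # 4: start
        str(end),                                # 5: end
        "." if score is None else str(score),    # 6: score, '.' when absent
        strand,                                  # 7: strand
        ".",                                     # 8: phase, only meaningful for CDS features
        column9,                                 # 9: attributes
    ]
    return "\t".join(fields)

def write_gff3(hits):
    """Return a complete GFF3 document for a list of keyword-argument dicts."""
    lines = ["##gff-version 3"] + [probe_hit_to_gff3(**hit) for hit in hits]
    return "\n".join(lines) + "\n"
```

A call such as `write_gff3([{"seqid": "2L", "start": 1000, "end": 1024, "strand": "+", "probe_id": "probe_1"}])` yields the version pragma followed by one feature line.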

VectorBase contigview

VectorBase featureview

[Diagram: BULK LOADER; EXPRESSION DATA; STORAGE & ANALYSIS; VECTOR BIOLOGISTS; ARRAY BIOLOGISTS; GENOME BIOLOGISTS; ArrayExpress; ‘PUBLIC’ STORAGE; VectorBase; GENOMIC DATA; AUTOMATIC ANNOTATION; GENOME BROWSER; DATA SUMMARIES; PROBE MAPPING; DATA MINING]

BioMart
Beta version currently available –
Improvements still needed:
  –experiment annotations
  –Alignments (i.e. handle split alignments)
Federation with current marts
Integration with new data?

Current challenges and future plans
How do you want to query?
CVs & ontologies
APIs
Community submission
Manual annotation

Querying strategy
What do you want to query on?
  –Fetch all genes upregulated under condition X
  –Fetch all experiments with gene X and condition Y
  –Fetch all probes with expression similar to probe X
All essentially boil down to:
  –Define probe (genes etc)
  –Define significant expression
      ANOVA?
      Up/down-regulation WRT what?
  –Define experimental conditions
      Sample annotation
      Experimental design
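
To make the three "define" steps concrete, here is a small sketch of the first query type, "fetch all genes upregulated under condition X", over in-memory records. Everything here is a placeholder: the record layout, the gene and condition names, and in particular the significance rule (log2 ratio of at least 1 and p-value of at most 0.05), which the slide deliberately leaves as an open question.

```python
from collections import namedtuple

# Hypothetical record: one probe measured under one annotated condition.
Measurement = namedtuple(
    "Measurement", "probe_id gene_id condition log2_ratio p_value"
)

def genes_upregulated(measurements, condition, min_log2_ratio=1.0, max_p_value=0.05):
    """Return sorted gene IDs with at least one significantly upregulated probe."""
    hits = set()
    for m in measurements:
        if m.gene_id is None:
            continue                      # define probe: skip probes not mapped to a gene
        if m.condition != condition:
            continue                      # define experimental conditions
        # define significant expression: thresholds are assumptions, not from the slides
        if m.log2_ratio >= min_log2_ratio and m.p_value <= max_p_value:
            hits.add(m.gene_id)
    return sorted(hits)

data = [
    Measurement("p1", "GENE_A", "blood_fed", 2.3, 0.001),
    Measurement("p2", "GENE_A", "blood_fed", 0.4, 0.300),
    Measurement("p3", "GENE_B", "blood_fed", 1.8, 0.200),   # large change, not significant
    Measurement("p4", "GENE_C", "sugar_fed", 3.0, 0.001),   # other condition
    Measurement("p5", None,     "blood_fed", 2.9, 0.001),   # unmapped probe
]
print(genes_upregulated(data, "blood_fed"))
```

The other two query types would reuse the same three filters with a different grouping key (experiment or probe instead of gene).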

[Diagram: BULK LOADER; EXPRESSION DATA; STORAGE & ANALYSIS; VECTOR BIOLOGISTS; ARRAY BIOLOGISTS; GENOME BIOLOGISTS; CV / ONTOLOGY; ArrayExpress; ‘PUBLIC’ STORAGE; GENOMIC DATA; AUTOMATIC ANNOTATION; GENOME BROWSER; DATA SUMMARIES; PROBE MAPPING; DATA MINING]

[Diagram: STORAGE & ANALYSIS; ‘PUBLIC’ STORAGE; GENOME BROWSER; DATA SUMMARIES; DATA MINING; BULK LOADER; EXPRESSION DATA; GENOMIC DATA; AUTOMATIC ANNOTATION; CV / ONTOLOGY; ArrayExpress; Array API ?; AE API ?; e! API; MartJ / MQL; PROBE MAPPING]

Array API
Perl / Java objects for retrieval / handling of array data
  –Dual purpose:
      Consistency & efficiency of VB expression website
      Computational access to VB data for all
  –Objects must be:
      General, DB-independent
      Compatible with pre-existing Bio API (BioPerl / BioJava)
  –Nb. May be pre-existing solution:
      ArrayExpress API?
      BioPerl-Expression?
      MAGE-OM-stk
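
The "general, DB-independent" requirement can be illustrated with an abstract interface plus one interchangeable backend. The slide proposes Perl / Java objects; the sketch below uses Python purely for illustration, and every class and method name in it is hypothetical rather than part of any existing VectorBase, ArrayExpress or BioPerl API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass(frozen=True)
class ExpressionValue:
    """One normalised measurement for one probe (hypothetical value object)."""
    probe_id: str
    experiment: str
    condition: str
    log2_ratio: float

class ArraySource(ABC):
    """Storage-independent access to array data; backends implement the two lookups."""

    @abstractmethod
    def probes_for_gene(self, gene_id):
        """Return the probe IDs mapped to a gene."""

    @abstractmethod
    def values_for_probe(self, probe_id):
        """Return the ExpressionValue objects recorded for a probe."""

    def values_for_gene(self, gene_id):
        # Shared logic lives here, so the website and external scripts behave identically.
        values = []
        for probe_id in self.probes_for_gene(gene_id):
            values.extend(self.values_for_probe(probe_id))
        return values

class InMemorySource(ArraySource):
    """Toy backend; a database-backed class could be swapped in without changing callers."""

    def __init__(self, gene_to_probes, values):
        self._gene_to_probes = gene_to_probes
        self._by_probe = {}
        for value in values:
            self._by_probe.setdefault(value.probe_id, []).append(value)

    def probes_for_gene(self, gene_id):
        return list(self._gene_to_probes.get(gene_id, []))

    def values_for_probe(self, probe_id):
        return list(self._by_probe.get(probe_id, []))
```

Because callers only depend on `ArraySource`, the same query code could serve both purposes named on the slide: the expression website and computational access by outside users.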

Community data submission
Carrot?
  –Help with ArrayExpress submission
  –Analysis tools
  –Dissemination
Stick?
  –Outreach (courses, conferences)
  –Networking

GE data → manual annotators
Gene-build designed arrays
  –Negative evidence less compelling
EST clone-based arrays
  –

Longer term plans
Host-parasite GE data integration & analysis
GE-clusters
“upstream” regions
regulatory elements, upstream TFs
RNAi phenotypes
Images

CVs & ontologies
Integrate MGED and specialist ontologies for
  –Body parts
  –Developmental stages
  –Disease processes
  –…
Allows comparison across experiments with similar experimental conditions
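
One way to read "comparison across experiments with similar experimental conditions" is that samples annotated with different but related terms can be retrieved together through the ontology hierarchy. The sketch below uses a tiny invented body-part hierarchy; the terms and sample names are placeholders and are not taken from the MGED ontology or any specialist ontology.

```python
# Toy hierarchy, child -> parent. Placeholder terms, not real ontology identifiers.
PARENT = {
    "midgut": "gut",
    "hindgut": "gut",
    "gut": "body part",
    "salivary gland": "body part",
}

def ancestors(term, parent=PARENT):
    """Return every broader term above `term`, nearest first."""
    chain = []
    while term in parent:
        term = parent[term]
        if term in chain:
            raise ValueError("cycle in ontology")
        chain.append(term)
    return chain

def samples_matching(samples, query_term, parent=PARENT):
    """Return sample names annotated with `query_term` or any narrower term."""
    return sorted(
        name for name, term in samples.items()
        if term == query_term or query_term in ancestors(term, parent)
    )

# Samples from three hypothetical experiments, each annotated by its own submitter.
samples = {"exp1_s1": "midgut", "exp2_s4": "hindgut", "exp3_s2": "salivary gland"}
print(samples_matching(samples, "gut"))
```

A query for "gut" picks up both the midgut and hindgut samples even though neither submitter used that exact term, which is the cross-experiment benefit the slide describes.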

BioMart
Most biomarts:
Gene-based
Mostly ‘binary’ data
  –e.g. a gene either has a signal domain or doesn’t
Easily linked with other (gene-based) biomarts
VB Biomart:
Probe based
  –Many probes not aligned
Exp data less clear
  –e.g. define ‘differential expression’
Exports gene/trans IDs for linking to other Marts

Clustering
A priority?
Easy to do on reporter level within experiments
Harder to do at gene level across all experiments
  –Binary gene profile: “yes/no differentially expressed in experiment” ?
Amazon-style links to “genes which may have similar expression profiles”?
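
The binary-profile idea lends itself to a very simple similarity measure. As one possible sketch (the slide does not name a method), each gene is represented by the set of experiments in which it was called differentially expressed, and genes are ranked by Jaccard similarity, |A ∩ B| / |A ∪ B|. Gene and experiment names are placeholders.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_genes(gene, profiles, top_n=3):
    """Rank other genes by overlap of their 'differentially expressed in' sets."""
    target = profiles[gene]
    scored = [
        (jaccard(target, profile), other)
        for other, profile in profiles.items()
        if other != gene
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))   # best score first, ties by name
    return [(other, round(score, 3)) for score, other in scored[:top_n] if score > 0]

# gene -> experiments in which it was called differentially expressed (toy data)
profiles = {
    "GENE_A": {"E1", "E2", "E3"},
    "GENE_B": {"E1", "E2"},
    "GENE_C": {"E3", "E4"},
    "GENE_D": {"E5"},
}
print(similar_genes("GENE_A", profiles))
# [('GENE_B', 0.667), ('GENE_C', 0.25)]
```

This sidesteps the cross-experiment scaling problem because only the yes/no calls are compared, at the cost of discarding direction and magnitude of change.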

BASE 2.x
Adoption delayed, now in progress
Brings Affymetrix support
Cleaner/modern interface
Better API (Java)