Time line and procedures for datasets BCBC Pre-retreat Workshop Tyson’s Corner, VA May 11, 2011.

Presentation transcript:

Time line and procedures for datasets BCBC Pre-retreat Workshop Tyson’s Corner, VA May 11, 2011

Topics to cover
Timeline for a dataset from contact to web site
Policies to follow and documents to use
Ten questions about your dataset
Creating a MAGE-TAB document with us
Seeing your dataset on the Beta Cell web site
A tool you can use for MAGE-TAB: Annotare

Datasets to Contact us about
Your deliverables
–Microarray experiments
–High-throughput sequencing experiments (RNA-seq, ChIP-seq, FAIRE-seq, etc.)
–RT-PCR screens
–Other deliverables – we can discuss how to integrate
Other key datasets
–From your lab but from different funding
–From the literature

Steps to get a study into Beta Cell
Contact us. Let us know what is coming and when so we can schedule working with you.
Fill out the Ten Questions. When we get this from you, we can generate an initial spreadsheet (MAGE-TAB) for you to complete.
Fill out highlighted areas of the MAGE-TAB. We will go back and forth with you on details to get it right.
Send us your data. We will set up an FTP account for you. Send us the raw data (e.g., Affymetrix CEL files, FASTQ sequence reads) and the processed data that the conclusions are based upon.
Set a release schedule. We will load the dataset and incorporate it into queries and web pages as appropriate. We need to set when to release to the BCBC and to the general public.
–We can also submit your data to ArrayExpress or, if desired, GEO.
View/Query your dataset. Beta Cell has releases every 3 to 4 months.

Timeline
Completion of MAGE-TAB:
–Requires back and forth between the CC and the contact person in the investigator’s lab
–Time to completion depends on the responsiveness of that contact person
–Until the MAGE-TAB is completed, data loading cannot occur
Data loading:
–Once the MAGE-TAB is completed and all necessary files have been delivered, time to load the data depends on the size of your study
–For a typical study, data loading takes a few weeks
–Missing files will delay the process
Keep in mind that when you contact us to submit a study, you will be put in a queue, and the process of getting your study into Beta Cell Genomics will start once you reach the top of the queue.
Studies that are meant to be viewable on the BCBC website (either by the general public or by BCBC investigators only) have priority over private studies, i.e., a study that is to be kept private will be placed lower in the queue.

Policies to follow and documents to use
Ten Questions about your dataset
–Available as a BCBC miscellaneous resource – aneous/
Bioinformatics/Epigenomics Working group
–RNA-seq and ChIP-seq recommendations
Includes checklists for data and information to provide
–Mike Snyder will provide overview and discuss

Meeting Deliverables
For a study to be considered fully “delivered”, the following is required on the investigator’s part:
–Provide answers to the initial 10 questions and all necessary data files
–Respond to all inquiries needed to generate an accurate MAGE-TAB
–Allow your study to be visible (at least by other BCBC investigators) on the Beta Cell website


MGED Standards
What information is needed for a microarray experiment?
–MIAME: Minimum Information About a Microarray Experiment. Brazma et al., Nature Genetics 2001
How do you “code up” microarray data?
–MAGE-OM: MicroArray Gene Expression Object Model. Spellman et al., Genome Biology 2002
–MAGE-TAB. Rayner et al., BMC Bioinformatics 2006
What words do you use to describe a microarray experiment?
–MO: MGED Ontology. Whetzel et al., Bioinformatics 2006

MIAME in a nutshell (à la Alvis Brazma)
[Figure labels: Sample, RNA extract, labelled nucleic acid, hybridisation, array (Microarray), Array design, Protocol, Experiment, normalization, integration, genes, Gene expression data matrix]
Stoeckert et al. Drug Discovery Today TARGETS 2004

Sequencing is replacing array
[Figure labels: Sample, RNA extract, nucleic acid, hybridisation, Microarray, Array design, Protocol, Experiment, normalization, integration, genes, Gene expression data matrix; shown with example FASTQ sequence reads, e.g. GTTTGCCNGTGTGTACGCTACCCCCTTCTTGTGTGTGTGTGTCT from read HWI-EAS266_0011:8:1:6:969#0/1]
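The reads on this slide are in FASTQ format, in which each record normally spans four lines: an `@` header, the sequence, a `+` separator line that may repeat the read name, and a quality string as long as the sequence. The following is a minimal sketch of reading such records, assuming uncompressed input with exactly four lines per record. The name `read_fastq` and the two sample reads are made up for illustration and are not data from any study.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


# Hypothetical two-read input, used only to exercise the parser.
example = io.StringIO("@read1\nACGTN\n+\nIIII#\n@read2\nGGCA\n+read2\nHHHH\n")
records = list(read_fastq(example))
n_calls = sum(r.sequence.count("N") for r in records)
print(len(records), "reads,", n_calls, "uncalled bases")
```

A check like this can be run before sending raw files, since a truncated FASTQ file typically fails the length comparison on its last record.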

Sequencing is replacing array
[Figure labels: Sample, Chromatin, DNA extract, nucleic acid, hybridisation, Microarray, Array design, Protocol, Experiment, normalization, integration, genes; data types ChIP-Seq, MeDIP-Seq, etc.; shown with the same example FASTQ sequence reads]

From MGED to FGED
What information is needed for an HTS experiment?
–MINSEQE: Minimum Information about a high-throughput SeQuencing Experiment
How do you “code up” functional genomics data?
–MAGE-TAB can still be utilized
What words do you use to describe a functional genomics experiment?
–OBI: Ontology for Biomedical Investigations, incorporates MO

MAGE-TAB Format
What is MAGE-TAB?
A simple spreadsheet view consisting of 2 files:
–IDF: describing the experiment design, contact details, variables, and protocols
–SDRF: a spreadsheet with columns that describe samples, annotations, protocol references, assays, and data
–Linked data files (e.g. CEL files) are referenced by the SDRF
Where can I get MAGE-TAB from?
~10,000 MAGE-TAB files are available from ArrayExpress (includes GEO-derived and ArrayExpress data)
caArray also provides MAGE-TAB files for download
Who is using MAGE-TAB?
Bioconductor
GenePattern
MeV
and Beta Cell Genomics!

IDF file for E-TABM-34 IDF = Investigation Description Format
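An IDF is a tab-delimited text file in which each row starts with a field name followed by one or more values. The following is a minimal sketch of reading one in Python. The function name `read_idf` and the sample rows are assumptions made for illustration; they are not the contents of E-TABM-34.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_idf(handle: TextIO) -> Dict[str, List[str]]:
    """Map each IDF field name to its values, keeping column positions."""
    fields: Dict[str, List[str]] = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[0].strip() or row[0].startswith("#"):
            continue  # skip blank rows and comment rows
        values = [v.strip() for v in row[1:]]
        # Drop only trailing empty cells so that, for example, the second
        # "Person Last Name" still lines up with the second "Person Email".
        while values and not values[-1]:
            values.pop()
        fields[row[0].strip()] = values
    return fields


# Hypothetical IDF fragment, for illustration only.
example = io.StringIO(
    "Investigation Title\tExample islet expression study\n"
    "Experimental Design\tgenetic_modification_design\n"
    "Person Last Name\tDoe\tRoe\n"
    "Protocol Name\tP-EXTRACT\tP-LABEL\tP-HYB\t\n"
    "SDRF File\texample.sdrf.txt\n"
)
idf = read_idf(example)
print(idf["Investigation Title"][0])
print(len(idf["Protocol Name"]), "protocols; SDRF:", idf["SDRF File"][0])
```

Keeping values as position-aligned lists mirrors how the IDF groups related fields by column, which a flat key/value mapping would lose.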

SDRF file for E-TABM-34 SDRF = Sample and Data Relationship Format
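An SDRF is a tab-delimited table with a header row and one row per path from sample to data file. Because missing files delay loading, it can help to compare the files an SDRF refers to against the files actually delivered. The following is a rough sketch of that check, assuming the data files are listed under the standard MAGE-TAB data file column headers. The function names, sample rows, and file names are hypothetical.

```python
import csv
import io
from typing import List, Set, TextIO

# Column headers that MAGE-TAB uses to reference data files.
DATA_FILE_COLUMNS = {
    "Array Data File",
    "Derived Array Data File",
    "Array Data Matrix File",
    "Derived Array Data Matrix File",
}


def referenced_files(handle: TextIO) -> List[str]:
    """Return the data file names an SDRF refers to, in first-seen order."""
    reader = csv.reader(handle, delimiter="\t")
    header = [h.strip() for h in next(reader)]
    # Headers such as "Protocol REF" can repeat, so columns are addressed
    # by position rather than through a dict keyed on header text.
    file_columns = [i for i, h in enumerate(header) if h in DATA_FILE_COLUMNS]
    seen: List[str] = []
    for row in reader:
        for i in file_columns:
            name = row[i].strip() if i < len(row) else ""
            if name and name not in seen:
                seen.append(name)
    return seen


def missing_files(handle: TextIO, delivered: Set[str]) -> List[str]:
    """List referenced files that are absent from the delivered set."""
    return [name for name in referenced_files(handle) if name not in delivered]


# Hypothetical SDRF fragment and delivery listing, for illustration only.
example = io.StringIO(
    "Source Name\tCharacteristics[OrganismPart]\tProtocol REF\t"
    "Hybridization Name\tArray Data File\tDerived Array Data File\n"
    "mouse1\tpancreas\tP-HYB\thyb1\tmouse1.CEL\tnormalized.txt\n"
    "mouse2\tpancreas\tP-HYB\thyb2\tmouse2.CEL\tnormalized.txt\n"
)
delivered = {"mouse1.CEL", "normalized.txt"}
missing = missing_files(example, delivered)
print(missing)  # ['mouse2.CEL']
```

In practice the `delivered` set could be built from a directory listing of the upload area.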

A microarray expression study IDF

Experimental Design

Following 1 sample: bench component
[Figure legend: black border = biomaterials; red border = treatments; annotation shown: OrganismPart]

In-silico component: image acquisition, feature extraction, summarization (feature extraction II) and quantile normalization
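Quantile normalization, the last step named here, forces every array to share the same intensity distribution: each value is replaced by the mean, across arrays, of the values holding the same rank. The following is a minimal pure-Python sketch of that idea, assuming equal-length columns with no missing values. Ties are broken by row order, so production implementations may treat ties and missing data differently. The function name and the intensities are made up.

```python
from typing import List


def quantile_normalize(columns: List[List[float]]) -> List[List[float]]:
    """Give every column the same distribution: the mean of the sorted columns."""
    n_rows = len(columns[0])
    if any(len(col) != n_rows for col in columns):
        raise ValueError("all columns must have the same length")
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across columns.
    reference = [
        sum(col[k] for col in sorted_cols) / len(columns) for k in range(n_rows)
    ]
    normalized = []
    for col in columns:
        # order[k] is the row index holding the k-th smallest value of this column.
        order = sorted(range(n_rows), key=lambda i: col[i])
        out = [0.0] * n_rows
        for rank, row in enumerate(order):
            out[row] = reference[rank]
        normalized.append(out)
    return normalized


# Hypothetical intensities: three arrays (columns), four probes (rows).
arrays = [[5.0, 2.0, 3.0, 4.0], [4.0, 1.0, 4.5, 2.0], [3.0, 4.0, 6.0, 8.0]]
result = quantile_normalize(arrays)
print(result[0])
```

After normalization each array keeps its own probe ranking but contains exactly the same set of values as every other array.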

SDRF Let’s focus on the highlighted row

From design to MAGE-TAB

Viewing the Annotation

Querying the Annotation

Loading and Analyzing the Data
Image and .CEL files are archived and their location stored in the database
Raw and processed data loaded into the database
Downstream analyses (e.g. differential expression) are performed, generating gene lists
Analysis results loaded into the database

Querying the Data

A ChIP-Seq study IDF

Experimental Design

Bench Component

In-silico Component
Steps: cluster generation, image acquisition, sequencing, alignment, peak calling
Sequence file, alignment file, sample:
Ptf1a_s5_seq.txt, s5_eland.txt, Ptf1a_s5
Ptf1a_s4_seq.txt, s4_eland.txt, Ptf1a_s4
Input_s8_seq.txt, s8_eland.txt, Input_s8
Rbpjl_s6_seq.txt, s6_eland.txt, Rbpjl_s6
Input_s2_seq.txt, s2_eland.txt, Input_s2
Rbpjl_s4_seq.txt, s4_eland.txt, Rbpjl_s4
Peak-calling outputs: Ptf1a_peaks, Rbpjl_peaks

SDRF

Viewing the Annotation

Querying the Annotation

Viewing the Data

Querying the Data


Annotare - An open source standalone MAGE-TAB editor Shankar R, Parkinson H, Burdett T, Hastings E, Liu J, Miller M, Srinivasa R, White J, Brazma A, Sherlock G, Stoeckert CJ Jr, Ball CA. Annotare - a tool for annotating high-throughput biomedical investigations and resulting data. Bioinformatics Aug 23.

Annotare - an open source MAGE-TAB Editor Annotare is an annotation tool for high throughput gene expression experiments in MAGE-TAB format. Researchers can describe their investigations with the investigators’ contact details, experimental design, protocols that were employed, references to publications, details of biological samples, arrays, and experimental data produced in the investigation.

Annotare Features
Intuitive graphical user interface forms for editing
Ontology support: an inbuilt ontology and web services connectivity to BioPortal
Searchable standard templates
Design wizard
Validation module
Mac and Windows support

Annotare Demo
File Gallery: three different ways to get started
Looking at an existing MAGE-TAB
–Form versus sheet view
Using a template
Using the wizard