Time line and procedures for datasets BCBC Pre-retreat Workshop Tyson’s Corner, VA May 11, 2011.


Topics to cover
Timeline for a dataset from contact to web site
Policies to follow and documents to use
Ten questions about your dataset
Creating a MAGE-TAB document with us
Seeing your dataset on the Beta Cell web site
A tool you can use for MAGE-TAB: Annotare

Datasets to contact us about
Your deliverables
–Microarray experiments
–High-throughput sequencing experiments (RNA-seq, ChIP-seq, FAIRE-seq, etc.)
–RT-PCR screens
–Other deliverables: we can discuss how to integrate them
Other key datasets
–From your lab but from different funding
–From the literature

Steps to get a study into Beta Cell
Contact us. Let us know what is coming and when, so we can schedule working with you.
Fill out the Ten Questions. When we get this from you, we can generate an initial spreadsheet (MAGE-TAB) for you to complete.
Fill out the highlighted areas of the MAGE-TAB. We will go back and forth with you on details to get it right.
Send us your data. We will set up an FTP account for you. Send us the raw data (e.g., Affymetrix CEL files, FASTQ sequence reads) and the processed data that the conclusions are based upon.
Set a release schedule. We will load the dataset and incorporate it into queries and web pages as appropriate. We need to set when to release to the BCBC and to the general public.
–We can also submit your data to ArrayExpress or, if desired, GEO.
View/Query your dataset. Beta Cell has releases every 3 to 4 months.

Timeline
Completion of MAGE-TAB:
–Requires back and forth between the CC and the contact person in the investigator’s lab
–Time to completion depends on the responsiveness of that contact person
–Until the MAGE-TAB is completed, data loading cannot occur
Data loading:
–Once the MAGE-TAB is completed and all necessary files have been delivered, time to load the data depends on the size of your study
–For a typical study, data loading takes a few weeks
–Missing files will delay the process
Keep in mind that when you contact us to submit a study, you will be put in a queue, and the process of getting your study into Beta Cell Genomics will start once you reach the top of the queue.
Studies that are meant to be viewable on the BCBC website (either by the general public or by BCBC investigators only) have priority over private studies; i.e., a study that is to be kept private will be placed lower in the queue.
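The queue rule above can be sketched in Python. This is only an illustration: the `Submission` class, the study names, and the strict "visible before private, then first come, first served" ordering are assumptions, since the slide states only that private studies are placed lower in the queue.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    name: str      # hypothetical study label
    arrival: int   # order in which the lab contacted the CC
    private: bool  # True if the study is to be kept private


def queue_order(submissions):
    """Return submissions with visible studies first, then private ones.

    Within each group, earlier arrivals come first (an assumption).
    """
    # False sorts before True, so non-private studies lead the queue.
    return sorted(submissions, key=lambda s: (s.private, s.arrival))


subs = [
    Submission("study_A", arrival=1, private=True),
    Submission("study_B", arrival=2, private=False),
    Submission("study_C", arrival=3, private=False),
]
order = [s.name for s in queue_order(subs)]
print(order)  # ['study_B', 'study_C', 'study_A']
```

In this sketch a private study that arrived first still ends up behind later studies that will be visible on the website.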

Policies to follow and documents to use
Ten Questions about your dataset
–Available as a BCBC miscellaneous resource
–http://www.betacell.org/resources/data/miscellaneous/
Bioinformatics/Epigenomics Working group
–RNA-seq and ChIP-seq recommendations (includes checklists for data and information to provide)
–Mike Snyder will provide an overview and discuss

Meeting Deliverables
For a study to be considered fully “delivered”, the following is required on the investigator’s part:
–Provide answers to the initial 10 questions and all necessary data files
–Respond to all inquiries needed to generate an accurate MAGE-TAB
–Allow your study to be visible (at least to other BCBC investigators) on the Beta Cell website


MGED Standards
What information is needed for a microarray experiment?
–MIAME: Minimum Information About a Microarray Experiment. Brazma et al., Nature Genetics 2001
How do you “code up” microarray data?
–MAGE-OM: MicroArray Gene Expression Object Model. Spellman et al., Genome Biology 2002
–MAGE-TAB: Rayner et al., BMC Bioinformatics 2006
What words do you use to describe a microarray experiment?
–MO: MGED Ontology. Whetzel et al., Bioinformatics 2006

MIAME in a nutshell (à la Alvis Brazma)
[Figure: diagram with the labels Sample, RNA extract, labeled nucleic acid, hybridization, Microarray, Array design, Protocol, Experiment, normalization, integration, genes, Gene expression data matrix]
Stoeckert et al. Drug Discovery Today TARGETS 2004

Sequencing is replacing array technology
[Figure: diagram with the labels Sample, RNA extract, nucleic acid, hybridization, Microarray, Array design, Protocol, Experiment, normalization, integration, genes, Gene expression data matrix, shown next to example FASTQ reads]
Example FASTQ reads:
@HWI-EAS266_0011:8:1:6:969#0/1
GTTTGCCNGTGTGTACGCTACCCCCTTCTTGTGTGTGTGTGTCT
+HWI-EAS266_0011:8:1:6:969#0/1
_abb`a[DZ`aabaa_a`b]___^^aa_`aa_a^a[\\aZTZVY
@HWI-EAS266_0011:8:1:7:1688#0/1
AAGATGANGGCAGGGTGCAAGATGGCAGGATGCAAGATGGCAGG
+HWI-EAS266_0011:8:1:7:1688#0/1
a`^ab`^D\a]a`b``b_bbbaabb^abaa``^a_^_aa\]_VR
@HWI-EAS266_0011:8:1:7:593#0/1
CAGTTCANTTCTCAGCACCACACTGGGATGCTCACACATGCCTG
+HWI-EAS266_0011:8:1:7:593#0/1
abbbb_VD[bbbba_`bbbbbbbbbbbaa_`bbaabaabb_aa_
@HWI-EAS266_0011:8:1:7:139#0/1
CATGGGGNATAATTGCAATCCCCGATCCCCATCACGAATGGGGT
+HWI-EAS266_0011:8:1:7:139#0/1
aab`[^YDY]Z\baa`aabaaaa`aa`a]aa```\aY]^\]ZVX
@HWI-EAS266_0011:8:1:7:1390#0/1
GAATAATNGAATAGGACCGCGGTTCTATTTTGTTGGTTTTCGGA
+HWI-EAS266_0011:8:1:7:1390#0/1
_U^b_`]D\__a_a`S```Y[a__]a\aa_`]`aTVZ__\HYVX
@HWI-EAS266_0011:8:1:7:1663#0/1
TGATGTTNGTGGCAATAATGGGGGTAGCGGCAATGGTGGCGGGG
+HWI-EAS266_0011:8:1:7:1663#0/1
a`[_X]\DQTZ[^YYa[[aXV[PZUUYSYBBBBBBBBBBBBBBB
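The reads on this slide follow the four-line FASTQ layout: an `@` header, the sequence, a `+` line, and one quality character per base. A minimal Python sketch of reading that layout is given below. The record in the example is hypothetical and shortened, and the quality offset of 64 is an assumption based on the character range seen on the slide (older Illumina pipelines); current FASTQ files normally use an offset of 33.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, quality_string) from 4-line FASTQ records."""
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield header[1:], seq, qual


def decode_quality(qual, offset=64):
    """Convert quality characters to integer scores (offset is an assumption)."""
    return [ord(ch) - offset for ch in qual]


# Hypothetical, shortened record used only for illustration.
example = [
    "@read_1",
    "ACGTN",
    "+read_1",
    "abcDB",
]
records = list(parse_fastq(example))
scores = decode_quality(records[0][2])
print(records[0][0], scores)  # read_1 [33, 34, 35, 4, 2]
```

The length check matters in practice: a quality string that is shorter or longer than its sequence usually signals a truncated or damaged file.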

Sequencing is replacing array technology
[Figure: diagram with the labels Sample; Chromatin, DNA extract; nucleic acid; hybridization; Microarray; Array design; Protocol; Experiment; normalization; integration; genes; ChIP-Seq, MeDIP-Seq, etc.; shown next to the same example FASTQ reads as on the previous slide]

From MGED to FGED
What information is needed for an HTS experiment?
–MINSEQE: Minimum Information about a high-throughput SeQuencing Experiment
How do you “code up” functional genomics data?
–MAGE-TAB can still be utilized
What words do you use to describe a functional genomics experiment?
–OBI: Ontology for Biomedical Investigations, incorporates MO

MAGE-TAB Format
What is MAGE-TAB?
A simple spreadsheet view consisting of 2 files:
–IDF: describing the experiment design, contact details, variables, and protocols
–SDRF: a spreadsheet with columns that describe samples, annotations, protocol references, assays, and data
–Linked data files (e.g. CEL files) are referenced by the SDRF
Where can I get MAGE-TAB from?
–~10,000 MAGE-TAB files are available from ArrayExpress (includes GEO-derived and ArrayExpress data)
–caArray also provides MAGE-TAB files for download
Who is using MAGE-TAB?
–Bioconductor
–GenePattern
–MeV
–and Beta Cell Genomics!
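Because the SDRF is a tab-delimited table that references the linked data files, it can be used to check whether every referenced file has been delivered. The Python sketch below assumes a tiny, hypothetical SDRF with five columns; the sample names, protocol name, and file names are invented for illustration. Real SDRFs have many more columns and may repeat column headings such as `Protocol REF`, which `csv.DictReader` would collapse, so treat this as a simplified illustration rather than a full MAGE-TAB parser.

```python
import csv
import io

# Hypothetical SDRF content; values are invented for illustration.
SDRF_TEXT = (
    "Source Name\tCharacteristics[OrganismPart]\tProtocol REF\t"
    "Hybridization Name\tArray Data File\n"
    "sample_1\tpancreas\tP-EXTRACT\thyb_1\tsample_1.CEL\n"
    "sample_2\tpancreas\tP-EXTRACT\thyb_2\tsample_2.CEL\n"
)


def read_sdrf(text):
    """Parse tab-delimited SDRF text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def missing_files(rows, delivered, column="Array Data File"):
    """Return referenced data files that are not in the delivered set."""
    referenced = {row[column] for row in rows if row.get(column)}
    return sorted(referenced - set(delivered))


rows = read_sdrf(SDRF_TEXT)
missing = missing_files(rows, delivered=["sample_1.CEL"])
print(missing)  # ['sample_2.CEL']
```

A check like this, run before data loading, would flag the missing files that otherwise delay the process.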

IDF file for E-TABM-34
IDF = Investigation Description Format

SDRF file for E-TABM-34
SDRF = Sample and Data Relationship Format

A microarray expression study IDF

Experimental Design

Following 1 sample: bench component
[Figure: bench workflow for one sample, annotated with OrganismPart. Legend: black border = biomaterials, red border = treatments]

In-silico component: image acquisition, feature extraction, summarization (feature extraction II) and quantile normalization
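Quantile normalization, the last in-silico step named above, forces every array to share the same distribution of values: each value is replaced by the mean of the values holding the same rank across all arrays. The pure-Python sketch below illustrates the basic idea only. It is not the pipeline used for Beta Cell Genomics, the function name and toy numbers are assumptions, and tied values are ranked in input order rather than averaged.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length columns (one per array)."""
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all columns must have the same length")

    # Mean of the k-th smallest value across arrays, for every rank k.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [
        sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)
    ]

    normalized = []
    for col in columns:
        # Indices of this column ordered from smallest to largest value.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Two toy arrays with three probes each (invented numbers).
arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
result = quantile_normalize(arrays)
print(result)  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization both arrays contain the same set of values (1.5, 3.5, 5.5), assigned according to each probe's original rank within its array.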

SDRF Let’s focus on the highlighted row

From design to MAGE-TAB

Viewing the Annotation

Querying the Annotation

Loading and Analyzing the Data
Image and .CEL files are archived and their location is stored in the database
Raw and processed data are loaded into the database
Downstream analyses (e.g. differential expression) are performed, generating gene lists
Analysis results are loaded into the database

Querying the Data

A ChIP-Seq study IDF

Experimental Design

Bench Component

In-silico Component
Steps: cluster generation, image acquisition, sequencing, alignment, peak calling
Sequence file, alignment file, aligned sample:
Ptf1a_s5_seq.txt, s5_eland.txt, Ptf1a_s5
Ptf1a_s4_seq.txt, s4_eland.txt, Ptf1a_s4
Input_s8_seq.txt, s8_eland.txt, Input_s8
Rbpjl_s6_seq.txt, s6_eland.txt, Rbpjl_s6
Input_s2_seq.txt, s2_eland.txt, Input_s2
Rbpjl_s4_seq.txt, s4_eland.txt, Rbpjl_s4
Peak sets: Ptf1a_peaks, Rbpjl_peaks
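The file relationships on this slide can be held in a small table, which is close to what the SDRF rows for the study record. The Python sketch below is illustrative only: the pairing of sequence file, alignment file, and aligned sample is read off the order of the names on the slide, the column headings are simplified stand-ins and not official MAGE-TAB headings, and grouping by the name prefix is an assumption about how the lanes relate to the peak sets.

```python
import csv
import io

# (sequence file, alignment file, aligned sample), in slide order.
LANES = [
    ("Ptf1a_s5_seq.txt", "s5_eland.txt", "Ptf1a_s5"),
    ("Ptf1a_s4_seq.txt", "s4_eland.txt", "Ptf1a_s4"),
    ("Input_s8_seq.txt", "s8_eland.txt", "Input_s8"),
    ("Rbpjl_s6_seq.txt", "s6_eland.txt", "Rbpjl_s6"),
    ("Input_s2_seq.txt", "s2_eland.txt", "Input_s2"),
    ("Rbpjl_s4_seq.txt", "s4_eland.txt", "Rbpjl_s4"),
]


def lanes_to_tsv(lanes):
    """Write the lanes as tab-delimited text with simplified headings."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["Sequence File", "Alignment File", "Aligned Sample"])
    writer.writerows(lanes)
    return buf.getvalue()


def group_by_prefix(lanes):
    """Group aligned samples by the part of the name before the underscore."""
    groups = {}
    for _seq_file, _aln_file, sample in lanes:
        groups.setdefault(sample.split("_")[0], []).append(sample)
    return groups


tsv = lanes_to_tsv(LANES)
groups = group_by_prefix(LANES)
print(groups["Ptf1a"])  # ['Ptf1a_s5', 'Ptf1a_s4']
```

Grouping uses the aligned sample name and not the alignment file name, because `s4_eland.txt` appears for two different samples on the slide.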

Viewing the Annotation

Querying the Annotation

Viewing the Data

Querying the Data


Annotare - An open source standalone MAGE-TAB editor
Shankar R, Parkinson H, Burdett T, Hastings E, Liu J, Miller M, Srinivasa R, White J, Brazma A, Sherlock G, Stoeckert CJ Jr, Ball CA. Annotare - a tool for annotating high-throughput biomedical investigations and resulting data. Bioinformatics. 2010 Aug 23.

Annotare - an open source MAGE-TAB Editor
Annotare is an annotation tool for high-throughput gene expression experiments in MAGE-TAB format. Researchers can describe their investigations with the investigators’ contact details, experimental design, protocols that were employed, references to publications, details of biological samples, arrays, and experimental data produced in the investigation.

Annotare Features
Intuitive graphical user interface forms for editing
Ontology support: an inbuilt ontology and web services connectivity to BioPortal
Searchable standard templates
Design wizard
Validation module
Mac and Windows support
http://code.google.com/p/annotare/

Annotare Demo
File Gallery: three different ways to get started
Looking at an existing MAGE-TAB
–Form versus sheet view
Using a template
Using the wizard
