Session 1: WELCOME AND INTRODUCTIONS 2017
Instructors and Teaching Assistants
Main Instructors:
  James C. Fleet, PhD (Nutrition Science)
  Wanqing Liu, PhD (Medicinal Chemistry and Molecular Pharmacology)
  Pete Pascuzzi, PhD (Libraries)
  Min Zhang, PhD (Statistics)
Teaching Assistants (Statistics):
  Chen Chen
  Min Ren
  Kirsen Sullivan
  Will Eagan
  Harley Schawadron
Fleet 2017
Introductions
  Who are you?
  Where are you from?
  What is your research interest?
  Why are you interested in “big data”?
Workshop Overview
Fleet and Pascuzzi
  Unit 1: Microarray
  Unit 2: Next Generation Sequencing
Liu and Zhang
  Unit 3: Biomarker Discovery
  Unit 4: Genetic Variation
Technical Goals:
  Analysis pipelines
  Statistical issues
  Visualization
  Functional annotation
  Databases
  Project management
  Computation and programming
Course Materials http://www.stat.purdue.edu/bigtap/index.html
Guest Lecturers
  Doug Crabill (Purdue University)
  Bruce Craig (Purdue University)
  Xiang Zhang (University of Louisville)
  Sean Davis (National Cancer Institute)
  Dan Raftery (University of Washington)
  Yonglan Zheng (University of Chicago)
  Nancy Cox (Vanderbilt University)
  Nadia Atallah (Purdue University)
Session 2: Working with the Purdue Computer Infrastructure Doug Crabill Department of Statistics Purdue University
Sites to Understand Computing
UNIX operating system: Learn UNIX
  http://www.tutorialspoint.com/unix/index.htm
Linux operating system
  http://www.tutorialspoint.com//operating_system/os_linux.htm
R coding
  http://bioinformatics.knowledgeblog.org/2011/06/21/using-r-a-guide-for-complete-beginners/
  https://www.r-project.org/about.html
Session 3: Data Repositories and Pre-processed Data Sites James C. Fleet, PhD Distinguished Professor Department of Nutrition Science
Data Archives | Web link | Description
NIH Data Sharing Repositories | https://www.nlm.nih.gov/NIHbmic/nih_data_sharing_repositories.html | Trans-NIH BioMedical Informatics Coordinating Committee (BMIC) sites
Gene Expression Omnibus (GEO) | http://www.ncbi.nlm.nih.gov/geo/ | NCBI; transcriptome and ChIP-seq datasets
Array Express | http://www.ebi.ac.uk/arrayexpress/ | EMBL-EBI repository to archive functional genomics data
European Nucleotide Archive (ENA) | http://www.ebi.ac.uk/ena | Comprehensive record of the world's nucleotide sequencing information
The Cancer Genome Atlas (TCGA) | http://cancergenome.nih.gov/ | Multi-"omic" phenotype characterization of tumors
Proteomics IDEntifications (PRIDE) | http://www.ebi.ac.uk/pride/archive/ | European proteomics datasets
Metabolomics Workbench | http://metabolomicsworkbench.org/standards/nominatecompounds.php | Metabolomic datasets
Data Archives | Web link | Description
Oncomine | https://www.oncomine.org/resource/login.html | 715 microarray datasets from 19 cancers
Gene Expression across Normal and Tumor tissue (GENT) | http://medical-genome.kribb.re.kr/GENT/ | Gene expression patterns in human cancer from Affy Chips (+ 1000 cell lines)
cBioPortal | http://www.cbioportal.org/ | TCGA cancer genomics
Genotype-Tissue Expression project (GTEx) | http://www.gtexportal.org/home/ | Human, multi-tissue gene expression and gene variation for eQTL
Immunological Genome Project (Immgen) | http://www.immgen.org/ | Transcriptome data from cultured mouse immune cells
Human Brain Transcriptome | http://hbatlas.org/ | Transcriptome and associated metadata for developing and adult human brain
NHLBI Kidney Transcriptome database | https://hpcwebapps.cit.nih.gov/ESBL/Database/Transcriptomic/index.html | Segment-specific expression in rat kidney
Kidney Systems Biology Project | https://hpcwebapps.cit.nih.gov/ESBL/Database/ | Multi-omic database from rat and mouse studies
Saccharomyces Genome Database | http://www.yeastgenome.org/transcriptome-data-in-yeastmine | Integrated biological information on budding yeast
miRBase | http://mirbase.org/ | Published miRNA sequences and annotation; expression dataset links available
Training GEO Datasets
Unit 1 and 2
  GSE15947: Time course of 1,25(OH)2 D treated RWPE1 cells (Unit 1)
  GSE80182: A TGFb-PRMT5-MEP50 Axis regulates cancer cell invasion through histone H3 and H4 arginine methylation coupled to transcriptional activation and repression. (Unit 2)

  GSE #: Accession number for an original, submitter-supplied record that summarizes a study
  GDS #: GSE data that is reassembled by GEO staff into a curated data set
  GSM #: Accession number for a specific sample within a dataset
  GPL #: The platform used to generate a dataset
  SRX #: Accession number for a sample generated by NGS that is deposited in the Sequence Read Archive (SRA)
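The accession prefixes listed on this slide can be told apart mechanically. The following is a minimal Python sketch, not part of the workshop materials: the function names `classify_accession` and `describe_accession` are hypothetical, the descriptions are paraphrased from the slide, and the pattern assumes an accession is simply one of the five prefixes followed by digits.

```python
import re
from typing import Optional

# Prefix -> meaning, paraphrased from the slide (not an official GEO schema).
ACCESSION_TYPES = {
    "GSE": "series: original submitter-supplied record summarizing a study",
    "GDS": "dataset: GSE data reassembled by GEO staff into a curated set",
    "GSM": "sample: one specific sample within a dataset",
    "GPL": "platform: the platform used to generate a dataset",
    "SRX": "SRA record: an NGS sample deposited in the Sequence Read Archive",
}

# Assumption: an accession is a known prefix followed only by digits.
_ACCESSION_RE = re.compile(r"^(GSE|GDS|GSM|GPL|SRX)(\d+)$")


def classify_accession(accession: str) -> Optional[str]:
    """Return the three-letter prefix, or None if the string is not recognized."""
    match = _ACCESSION_RE.match(accession.strip().upper())
    return match.group(1) if match else None


def describe_accession(accession: str) -> str:
    """Return a one-line description of what kind of record the accession names."""
    prefix = classify_accession(accession)
    if prefix is None:
        return f"{accession}: not a recognized accession format"
    return f"{accession.strip().upper()}: {ACCESSION_TYPES[prefix]}"


if __name__ == "__main__":
    # The two GSE numbers are the training datasets; "XYZ1" is a made-up bad input.
    for acc in ["GSE15947", "GSE80182", "XYZ1"]:
        print(describe_accession(acc))
```

A check like this only validates the format of an accession; it does not confirm that the record exists in GEO or SRA.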
Assignment 1 (Individual)
  Search GEO for datasets that relate to your research
  Select one dataset
  Identify important information about your dataset
    Description/design
    GSE and GDS #
    Sample information
    Platform
  Analyze your dataset using
    GEO2R
    Tools in the dataset browser
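One way to keep track of the information the assignment asks for is a small record with one field per item. This is only an illustrative Python sketch: the class name `DatasetSummary`, its field names, and the `missing_items` helper are hypothetical and are not part of the assignment.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetSummary:
    """Worksheet for one GEO dataset; fields mirror the assignment checklist."""
    description: str = ""   # Description/design
    gse: str = ""           # GSE accession number
    gds: str = ""           # GDS accession number, if a curated dataset exists
    samples: List[str] = field(default_factory=list)  # Sample information
    platform: str = ""      # Platform (GPL accession or name)

    def missing_items(self) -> List[str]:
        """List the checklist items that have not been filled in yet."""
        checks = [
            ("Description/design", self.description),
            ("GSE #", self.gse),
            ("GDS #", self.gds),
            ("Sample information", self.samples),
            ("Platform", self.platform),
        ]
        return [label for label, value in checks if not value]


if __name__ == "__main__":
    # Partially filled worksheet using the Unit 1 training dataset.
    summary = DatasetSummary(
        description="Time course of 1,25(OH)2 D treated RWPE1 cells",
        gse="GSE15947",
    )
    print("Still to record:", ", ".join(summary.missing_items()))
```

The remaining fields would be filled in by hand from the dataset's GEO record before running the analysis step.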