DNA Subway Green Line Onramp to HPC in Biology Education Dave Micklos and Uwe Hilgert iPlant Collaborative DNA Learning Center, Cold Spring Harbor Laboratory;


1 DNA Subway Green Line Onramp to HPC in Biology Education Dave Micklos and Uwe Hilgert iPlant Collaborative DNA Learning Center, Cold Spring Harbor Laboratory; Bio5 Institute, University of Arizona

2 …ride an educational Discovery Environment

3 Green Line: RNA Sequence (RNA-Seq) Analysis
First fully graphical interface for RNA-Seq analysis, with no command line or data conversions
Accesses XSEDE system through the iPlant Agave API
Co-localizes up to 100 GB of data in iPlant Data Store
Look for differential gene expression in different tissues, life stages, or treatments
Generate lists of expressed genes and fold-changes
Annotate sequenced genomes; add results to Red Line projects

4 150 feet RNA code represents “active” DNA in genome

5 NCBI Reference Sequence: NM_ Homo sapiens taste receptor, type 2, member 38 (TAS2R38), mRNA CommentFeaturesSequence LOCUS NM_ bp mRNA linear PRI 04-DEC-2009 DEFINITION Homo sapiens taste receptor, type 2, member 38 (TAS2R38), mRNA. ACCESSION NM_ VERSION NM_ GI: KEYWORDS. SOURCE Homo sapiens (human) ORGANISM Homo sapiens Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo. REFERENCE 1 (bases 1 to 1143) AUTHORS Dotson,C.D., Shaw,H.L., Mitchell,B.D., Munger,S.D. and Steinle,N.I. TITLE Variation in the gene TAS2R38 is associated with the eating behavior disinhibition in Old Order Amish women JOURNAL Appetite (2009) In press PUBMED REMARK GeneRIF: Observational study of gene-disease association. (HuGE Navigator) Publication Status: Available-Online prior to print REFERENCE 2 (bases 1 to 1143) AUTHORS Keller,K.L., Reid,A., Macdougall,M.C., Cassano,H., Lee Song,J., Deng,L., Lanzano,P., Chung,W.K. and Kissileff,H.R. TITLE Sex Differences in the Effects of Inherited Bitter Thiourea Sensitivity on Body Weight in 4-6-Year-Old Children JOURNAL Obesity (Silver Spring) (2009) In press PUBMED REMARK GeneRIF: Observational study of gene-disease association. (HuGE Navigator) Publication Status: Available-Online prior to print REFERENCE 3 (bases 1 to 1143) AUTHORS Tepper,B.J., Koelliker,Y., Zhao,L., Ullrich,N.V., Lanzara,C., d'Adamo,P., Ferrara,A., Ulivi,S., Esposito,L. and Gasparini,P. TITLE Variation in the bitter-taste receptor gene TAS2R38, and adiposity in a genetically isolated population in Southern Italy JOURNAL Obesity (Silver Spring) 16 (10), (2008) PUBMED REMARK GeneRIF: Report genetic polymorphisms in the bitter-taste receptor gene TAS2R38, and adiposity in a genetically isolated population in Southern Italy. GeneRIF: Observational study of gene-disease association. (HuGE Navigator) REFERENCE 4 (bases 1 to 1143) AUTHORS Mangold,J.E., Payne,T.J., Ma,J.Z., Chen,G. 
and Li,M.D. TITLE Bitter taste receptor gene polymorphisms are an important factor in the development of nicotine dependence in African Americans JOURNAL J. Med. Genet. 45 (9), (2008) PUBMED REMARK GeneRIF: TAS2R38 single nucleotide polymorphisms are an important factor in determining nicotine dependence in African Americans. GeneRIF: Observational study of gene-disease association. (HuGE Navigator) REFERENCE 5 (bases 1 to 1143) AUTHORS Sharma,K. TITLE Comparing sensory experience in bitter taste perception of phenylthiocarbamide within and between human twins and singletons: intrapair differences in thresholds and genetic variance estimates JOURNAL Anthropol Anz 66 (2), (2008) PUBMED REMARK GeneRIF: Quantitative variations in PTC tasting ability in twins and to estimate heritability of PTC taste perception on the taste of twin data on males and females sexes separately. REFERENCE 6 (bases 1 to 1143) AUTHORS Zhang,Y., Hoon,M.A., Chandrashekar,J., Mueller,K.L., Cook,B., Wu,D., Zuker,C.S. and Ryba,N.J. TITLE Coding of sweet, bitter, and umami tastes: different receptor cells sharing similar signaling pathways JOURNAL Cell 112 (3), (2003) PUBMED REFERENCE 7 (bases 1 to 1143) AUTHORS Bufe,B., Hofmann,T., Krautwurst,D., Raguse,J.D. and Meyerhof,W. TITLE The human TAS2R16 receptor mediates bitter taste in response to beta-glucopyranosides JOURNAL Nat. Genet. 32 (3), (2002) PUBMED REFERENCE 8 (bases 1 to 1143) AUTHORS Montmayeur,J.P. and Matsunami,H. TITLE Receptors for bitter and sweet taste JOURNAL Curr. Opin. Neurobiol. 12 (4), (2002) PUBMED REMARK Review article REFERENCE 9 (bases 1 to 1143) AUTHORS Margolskee,R.F. TITLE Molecular mechanisms of bitter and sweet taste transduction JOURNAL J. Biol. Chem. 277 (1), 1-4 (2002) PUBMED REMARK Review article REFERENCE 10 (bases 1 to 1143) AUTHORS Anne-Spence,M., Falk,C.T., Neiswanger,K., Field,L.L., Marazita,M.L., Allen,F.H. Jr., Siervogel,R.M., Roche,A.F., Crandall,B.F. and Sparkes,R.S. 
TITLE Estimating the recombination frequency for the PTC-Kell linkage JOURNAL Hum. Genet. 67 (2), (1984) PUBMED COMMENT REVIEWED REFSEQ: This record has been curated by NCBI staff. The reference sequence was derived from BC This sequence is a reference standard in the RefSeqGene project. On Oct 17, 2009 this sequence version replaced gi: Summary: This gene encodes a seven-transmembrane G protein-coupled receptor that controls the ability to taste glucosinolates, a family of bitter-tasting compounds found in plants of the Brassica sp. Synthetic compounds phenylthiocarbamide (PTC) and 6-n-propylthiouracil (PROP) have been identified as ligands for this receptor and have been used to test the genetic diversity of this gene. Although several allelic forms of this gene have been identified worldwide, there are two predominant common forms (taster and non-taster) found outside of Africa. These alleles differ at three nucleotide positions resulting in amino acid changes in the protein (A49P, A262V, and V296I) with the amino acid combination PAV identifying the taster variant (and AVI identifying the non-taster variant). [provided by RefSeq]. Sequence Note: This RefSeq represents the non-taster AVI allele which is defined by polymorphic variation at three positions (A49P, A262V, and V296I). Publication Note: This RefSeq record includes a subset of the publications that are available for this gene. Please see the Entrez Gene record to access additional publications. 
PRIMARY REFSEQ_SPAN PRIMARY_IDENTIFIER PRIMARY_SPAN COMP BC FEATURES Location/Qualifiers source /organism="Homo sapiens" /mol_type="mRNA" /db_xref="taxon:9606" /chromosome="7" /map="7q34" gene /gene="TAS2R38" /gene_synonym="PTC; T2R61" /note="taste receptor, type 2, member 38" /db_xref="GeneID:5726" /db_xref="HGNC:9584" /db_xref="HPRD:09672" /db_xref="MIM:607751" exon /gene="TAS2R38" /gene_synonym="PTC; T2R61" /inference="alignment:Splign" /number=1 STS /gene="TAS2R38" /gene_synonym="PTC; T2R61" /db_xref="UniSTS:490162" CDS /gene="TAS2R38" /gene_synonym="PTC; T2R61" /codon_start=1 /product="taste receptor, type 2, member 38" /protein_id="NP_ " /db_xref="GI: " /db_xref="CCDS:CCDS " /db_xref="GeneID:5726" /db_xref="HGNC:9584" /db_xref="HPRD:09672" /db_xref="MIM:607751" /translation="MLTLTRIRTVSYEVRSTFLFISVLEFAVGFLTNAFVFLVNFWDV VKRQALSNSDCVLLCLSISRLFLHGLLFLSAIQLTHFQKLSEPLNHSYQAIIMLWMIA NQANLWLAACLSLLYCSKLIRFSHTFLICLASWVSRKISQMLLGIILCSCICTVLCVW CFFSRPHFTVTTVLFMNNNTRLNWQIKDLNLFYSFLFCYLWSVPPFLLFLVSSGMLTV SLGRHMRTMKVYTRNSRDPSLEAHIKALKSLVSFFCFFVISSCVAFISVPLLILWRDK IGVMVCVGIMAACPSGHAAILISGNAKLRRAVMTILLWAQSSLKVRADHKADSRTLC" misc_feature /gene="TAS2R38" /gene_synonym="PTC; T2R61" /inference="protein motif:TMHMM:2.0" /note="Region: transmembrane helix" misc_feature /gene="TAS2R38" /gene_synonym="PTC; T2R61" /inference="protein motif:TMHMM:2.0" /note="Region: transmembrane helix" misc_feature /gene="TAS2R38" /gene_synonym="PTC; T2R61" /inference="protein motif:TMHMM:2.0" /note="Region: transmembrane helix" misc_feature /gene="TAS2R38" /gene_synonym="PTC; T2R61" /inference="protein motif:TMHMM:2.0" /note="Region: transmembrane helix" misc_feature /gene="TAS2R38" /gene_synonym="PTC; T2R61" /inference="protein motif:TMHMM:2.0" /note="Region: transmembrane helix" misc_feature /gene="TAS2R38" /gene_synonym="PTC; T2R61" /inference="protein motif:TMHMM:2.0" /note="Region: transmembrane helix" misc_feature /gene="TAS2R38" /gene_synonym="PTC; T2R61" /inference="protein motif:TMHMM:2.0" 
/note="Region: transmembrane helix" STS /gene="TAS2R38" /gene_synonym="PTC; T2R61" /standard_name="G63092" /db_xref="UniSTS:140067" ORIGIN 1 cctttctgca ctgggtggca accaggtctt tagattagcc aactagagaa gagaagtaga 61 atagccaatt agagaagtga catcatgttg actctaactc gcatccgcac tgtgtcctat 121 gaagtcagga gtacatttct gttcatttca gtcctggagt ttgcagtggg gtttctgacc 181 aatgccttcg ttttcttggt gaatttttgg gatgtagtga agaggcaggc actgagcaac 241 agtgattgtg tgctgctgtg tctcagcatc agccggcttt tcctgcatgg actgctgttc 301 ctgagtgcta tccagcttac ccacttccag aagttgagtg aaccactgaa ccacagctac 361 caagccatca tcatgctatg gatgattgca aaccaagcca acctctggct tgctgcctgc 421 ctcagcctgc tttactgctc caagctcatc cgtttctctc acaccttcct gatctgcttg 481 gcaagctggg tctccaggaa gatctcccag atgctcctgg gtattattct ttgctcctgc 541 atctgcactg tcctctgtgt ttggtgcttt tttagcagac ctcacttcac agtcacaact 601 gtgctattca tgaataacaa tacaaggctc aactggcaga ttaaagatct caatttattt 661 tattcctttc tcttctgcta tctgtggtct gtgcctcctt tcctattgtt tctggtttct 721 tctgggatgc tgactgtctc cctgggaagg cacatgagga caatgaaggt ctataccaga 781 aactctcgtg accccagcct ggaggcccac attaaagccc tcaagtctct tgtctccttt 841 ttctgcttct ttgtgatatc atcctgtgtt gccttcatct ctgtgcccct actgattctg 901 tggcgcgaca aaataggggt gatggtttgt gttgggataa tggcagcttg tccctctggg 961 catgcagcca tcctgatctc aggcaatgcc aagttgagga gagctgtgat gaccattctg 1021 ctctgggctc agagcagcct gaaggtaaga gccgaccaca aggcagattc ccggacactg 1081 tgctgagaat ggacatgaaa tgagctcttc attaatacgc ctgtgagtct tcataaatat 1141 gcc // Homo sapiens bitter taste receptor (TAS2R38) DNA code > RNA code CCTTTCTGCACTGGGTGGCAACCAGGTCTTTAGATTAGCCAACTAGAGAAGAGAAGTA GAATAGCCAATTAGAGAAGTGACATCATGTTGACTCTAACTCGCATCCGCACTGTGTC CTATGAAGTCAGGAGTACATTTCTGTTCATTTCAGTCCTGGAGTTTGCAGTGGGGTTT CTGACCAATGCCTTCGTTTTCTTGGTGAATTTTTGGGATGTAGTGAAGAGGCAGGCAC TGAGCAACAGTGATTGTGTGCTGCTGTGTCTCAGCATCAGCCGGCTTTTCCTGCATG GACTGCTGTTCCTGAGTGCTATCCAGCTTACCCACTTCCAGAAGTTGAGTGAACCACT GAACCACAGCTACCAAGCCATCATCATGCTATGGATGATTGCAAACCAAGCCAACCTC 
TGGCTTGCTGCCTGCCTCAGCCTGCTTTACTGCTCCAAGCTCATCCGTTTCTCTCACA CCTTCCTGATCTGCTTGGCAAGCTGGGTCTCCAGGAAGATCTCCCAGATGCTCCTGG GTATTATTCTTTGCTCCTGCATCTGCACTGTCCTCTGTGTTTGGTGCTTTTTTAGCAGA CCTCACTTCACAGTCACAACTGTGCTATTCATGAATAACAATACAAGGCTCAACTGGCA GATTAAAGATCTCAATTTATTTTATTCCTTTCTCTTCTGCTATCTGTGGTCTGTGCCTCCT TTCCTATTGTTTCTGGTTTCTTCTGGGATGCTGACTGTCTCCCTGGGAAGGCACATGA GGACAATGAAGGTCTATACCAGAAACTCTCGTGACCCCAGCCTGGAGGCCCACATTA AAGCCCTCAAGTCTCTTGTCTCCTTTTTCTGCTTCTTTGTGATATCATCCTGTGCTGCC TTCATCTCTGTGCCCCTACTGATTCTGTGGCGCGACAAAATAGGGGTGATGGTTTGTG TTGGGATAATGGCAGCTTGTCCCTCTGGGCATGCAGCCATCCTGATCTCAGGCAATGC CAAGTTGAGGAGAGCTGTGATGACCATTCTGCTCTGGGCTCAGAGCAGCCTGAAGGT AAGAGCCGACCACAAGGCAGATTCCCGGACACTGTGCTGAGAATGGACATGAAATGA GCTCTTCATTAATACGCCTGTGAGTCTTCATAAATATGCC
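
The "DNA code > RNA code" step on this slide can be illustrated with a minimal Python sketch. It assumes the input is the coding (sense) strand, so the RNA reads the same with U in place of T; the function name `transcribe` is hypothetical and not part of DNA Subway. GenBank mRNA records such as the one above are written in the DNA alphabet, which is why the conversion is needed at all.

```python
def transcribe(dna: str) -> str:
    """Return the RNA spelling of a coding-strand DNA sequence (T -> U)."""
    dna = dna.upper()
    # Accept only the four bases plus N (unknown base); anything else is an error.
    unexpected = set(dna) - set("ACGTN")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return dna.replace("T", "U")


# First 24 bases of the TAS2R38 sequence shown on the slide.
first_bases = "cctttctgcactgggtggcaacca"
print(transcribe(first_bases))
```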

6 Differential Gene Expression: RNA Sequence (RNA-Seq) gives a “snapshot” of genes active in different cells at different times

7 Differential Gene Expression: RNA Sequence (RNA-Seq) gives a “snapshot” of genes active in different cells

8 RNA Sequence (RNA-Seq) Analysis
Isolate total RNA; convert to DNA library
Design RNA-Seq experiment, i.e., differential expression
Sequence experiment and control libraries
Analyze sequence data on DNA Subway Green Line
Follow-up experimental validation

9 Image source:

10 1) Manage Data: Quality Assessment with FastQC; ~100 Million 75/150 nucleotide reads in < 1hr
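
As a rough illustration of what a quality assessment looks at, the sketch below decodes the quality line of each FASTQ record into Phred scores and averages them per read. It is not FastQC's implementation; it assumes four-line FASTQ records with Phred+33 encoding, and the function names and example reads are hypothetical.

```python
from statistics import mean


def phred_scores(quality_line: str, offset: int = 33) -> list[int]:
    """Decode one FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def mean_quality_per_read(fastq_text: str) -> dict[str, float]:
    """Map each read ID to its mean Phred score; assumes 4-line records."""
    lines = fastq_text.strip().splitlines()
    result = {}
    for i in range(0, len(lines), 4):
        read_id = lines[i][1:]  # drop the leading '@'
        result[read_id] = mean(phred_scores(lines[i + 3]))
    return result


# Two made-up reads: 'I' encodes Phred 40, '#' encodes Phred 2.
example = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\nII##\n"
print(mean_quality_per_read(example))
```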

11 2) FastX ToolKit: Quality Control with FastX Toolkit; ~100M 75/150 nucleotide reads in <1 hr (some took up to 19 hours…)
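
One common quality-control operation is trimming low-quality bases from the 3' end of each read. The sketch below shows the idea only; it is not the FastX Toolkit's code, and the function name, the thresholds, and the Phred+33 encoding are assumptions chosen for illustration.

```python
def trim_3prime(seq, qual, min_quality=20, min_length=3, offset=33):
    """Trim trailing bases whose Phred score is below min_quality.

    Returns (sequence, quality) after trimming, or None if the read
    ends up shorter than min_length and should be discarded.
    """
    end = len(seq)
    # Walk back from the 3' end while the last kept base is low quality.
    while end > 0 and ord(qual[end - 1]) - offset < min_quality:
        end -= 1
    if end < min_length:
        return None
    return seq[:end], qual[:end]


# Made-up read: four good bases (Phred 40) followed by two poor ones (Phred 2).
print(trim_3prime("ACGTAC", "IIII##"))
```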

12

13 3) TopHat: Aligns ~100 Million 75/150 nucleotide (paired end) reads to a reference genome of 100M–5B in 6–19hr

14

15 TopHat Alignment JBrowse

16 TopHat Alignment JBrowse

17 4) CuffLinks: Assembles transcripts and calculates abundance on BAM files, 1–12GB in 6–19hr

18

19 5) CuffDiff: Merges assemblies from Cufflinks and performs differential expression analysis on 4–9 samples in 6–19 hr
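
Differential expression results are usually read as log2 fold-changes between two conditions. The sketch below shows that arithmetic on hypothetical abundance values; the pseudocount is an assumption added here to avoid division by zero and is not a description of how CuffDiff handles zero abundance or tests significance.

```python
import math


def log2_fold_change(control, treatment, pseudocount=1.0):
    """log2 ratio of treatment to control abundance, with a small pseudocount."""
    return math.log2((treatment + pseudocount) / (control + pseudocount))


# Hypothetical abundance values for two genes (not data from the presentation).
abundance = {
    "geneA": (3.0, 15.0),  # (control, treatment)
    "geneB": (7.0, 1.0),
}
for gene, (control, treatment) in abundance.items():
    print(gene, log2_fold_change(control, treatment))
```

A value of 2 means roughly four-fold higher abundance in the treatment; a value of -2 means roughly four-fold lower.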

20

21

22 Green Line: Queue time vs. run time
Asking for a long run time leads to longer queue times
Asking for a short run time may lead to the job being terminated
Users don't like to wait too long
Users want the results right away
Finding the right balance is not easy
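
One simple way to approach this trade-off is to base the requested run time on previously observed run times plus a safety margin, capped at the queue's limit. The sketch below is only an illustration of that heuristic, not how the Green Line chooses its requests; the function name, the 25% margin, and the 48-hour cap are assumptions.

```python
import math


def suggest_walltime_hours(past_runtimes_hours, margin=1.25, queue_limit_hours=48.0):
    """Suggest a run-time request: longest observed run plus a margin, capped."""
    if not past_runtimes_hours:
        raise ValueError("need at least one observed run time")
    padded = max(past_runtimes_hours) * margin
    # Round up to whole hours, but never ask for more than the queue allows.
    return min(float(math.ceil(padded)), queue_limit_hours)


# Run times in the 6-19 hr range reported on the earlier slides.
print(suggest_walltime_hours([6, 12, 19]))
```

A smaller margin shortens the expected queue wait but raises the chance that a slow job is terminated before it finishes.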

23 Green Line: Dealing with the unexpected
Systems taken offline
Maintenance
Network outages, data transfer issues
Science API glitches
Authentication
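
Transient failures such as network outages are often handled by retrying with an increasing delay. The sketch below shows that general pattern under stated assumptions: `retry` and `flaky_submit` are hypothetical names, the failure is simulated, and nothing here reflects the actual Agave API or the Green Line's error handling.

```python
import time


def retry(operation, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call operation(); on ConnectionError wait 1x, 2x, 4x... base_delay and retry."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)


# Simulated submission that fails twice before succeeding.
calls = {"n": 0}


def flaky_submit():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated network outage")
    return "job-accepted"


delays = []  # record the delays instead of actually sleeping
result = retry(flaky_submit, sleep=delays.append)
print(result, delays)
```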

24 Green Line “ Monitoring XSEDE”

25

26 DNA Subway “Power Desktop”
Intuitive interface to support seamless genome “round trip” for eukaryote of choice
Access high performance computing to analyze whole genome data (RNA-seq, initially)
Scaffold data to sequenced genomes available in iPlant Data Store
Directly upload RNA-seq reads as biological evidence for genome annotation using Red Line

27 NSF CCLI Project Retreat
June 8–20, 2014, CSHL
11 faculty from primarily undergraduate institutions (PUIs)
Program included lectures/practical sessions
Wet lab: RNA library prep
Green Line analysis & bioinformatics
Pedagogy/teaching resources
Virtual training materials

28 NSF CCLI Project Retreat Faculty Participants
Agnes Ayme-Southgate, College of Charleston, SC: Flight muscle development during life-stage transitions in Apis mellifera (honeybee)
Judy Brusslan, California State University, Long Beach, CA: Leaf development and senescence in Arabidopsis thaliana
Raymond Enke, James Madison University, VA: Retina development in Gallus gallus
Shaye Lewis, Prairie View A&M University, TX: Testes development from juvenile to puberty in caprine (goat)
Irina Makarevitch, Hamline University, MN: Response to cold stress in maize
Judith Ogilvie, Saint Louis University, MO: Retinal changes of mice with retinitis pigmentosa
Jeremy Seto, New York City College of Technology, CUNY, NY: Differentiation of rat pheochromocytoma line cells (PC12) to a neuronal-like phenotype
Carrie Thurber, Abraham Baldwin Agricultural College, IL: Seed abscission in Sorghum bicolor
George Ude, Bowie State University, MD: Floral inflorescence genes in banana/plantains
Deirdre Vaden, Prairie View A&M University, TX: Peripheral blood mononuclear cells from hypertensive rats treated with captopril
Scott Woody, University of Wisconsin, WI: Gibberellic acid exposure in Brassica rapa (Fast Plants) gibberellic acid (gad) mutants

29 NSF CCLI Project Retreat
Flight muscle development during life-stage transitions in Apis mellifera (honeybee)
Agnes Ayme-Southgate, College of Charleston, SC
All honeybees begin as worker bees, flying short distances. Some honeybees transition into foragers, flying long distances. This transition necessitates major changes in flight muscles.
Goal is to identify the gene expression changes in flight muscles during this transition
Courses
Biol 322: Developmental Biology, 30–38 students
Genetics, 100 students
Undergraduate research in lab, 2–3 students

30 NSF CCLI Project Retreat
Differential gene expression in Capra hircus (goat) testes during juvenile development
Shaye Lewis, Prairie View A&M University, TX
Fertility phenotypes show low heritability, and semen analysis parameters cannot determine fertility status. Molecular biomarkers can increase efficiency of artificial insemination and embryo transfer in goats.
Goal is to identify genes important for normal testes development and function
Courses
4533: Animal Breeding & Genetics, 20 students
Undergraduate research in lab, 4 students

31 NSF CCLI Project Retreat
Understanding transcriptional response to cold stress in maize
Irina Makarevitch, Hamline University, MN
Maize is grown worldwide and is a staple for >1 billion people. Maize is thermophilic and sensitive to low temperatures, and understanding how plants respond to cold can improve yields.
Goal is to identify genes that are differentially expressed when maize is grown under cold stress
Courses
Biol 201: Principles of Genetics, 80 students
Biol 301: Genomics & Bioinformatics, 20 students
Undergraduate research in lab, 4 students

32 NSF CCLI Project Retreat
RNA-Seq Datasets Generated and Analyzed Using the Green Line of DNA Subway
8 eukaryotic organisms
21 controls paired with 26 experimental conditions
402 Gbases sequenced
837 jobs submitted to TACC
87% jobs completed
695 hours total CPU time
16 threads/processors running concurrently
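
The totals on this slide can be turned into a few derived figures. The short calculation below uses only the numbers reported above; the results are approximate because the completion rate is a rounded percentage, and the variable names are illustrative.

```python
jobs_submitted = 837
completion_rate = 0.87  # "87% jobs completed", a rounded figure
total_cpu_hours = 695

# Approximate counts implied by the rounded completion rate.
jobs_completed = round(jobs_submitted * completion_rate)
jobs_not_completed = jobs_submitted - jobs_completed

# Average CPU time per submitted job, in minutes.
cpu_minutes_per_job = total_cpu_hours * 60 / jobs_submitted

print(jobs_completed, jobs_not_completed, round(cpu_minutes_per_job, 1))
```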

33 Intended Implementation
Course levels: 100 level, 200 level, 300 level, 400 level, 500 level, Undergrad Research
Intro Biology Genetics, 270 Molecular & Cell Biology, 50 Genetics, 220 Molecular Biology, 100 Genomics & Bioinformatics, 70 Developmental Biology, 35 Cell Structure & Function, 30 Synthetic Biology, 30 Anatomy/Physiology, 50 Advanced Genetic Techniques, 15 Cell & Molecular Biology, 75 Genomics, 40 Animal Breeding & Genetics, 20 Independent Research, 5 Molecular Applications in Crop Improvement

34 DNA Subway is…
Producers: Uwe Hilgert, David Micklos, Jason Williams
Designers: Eun-Sook Jeong, Susan Lauter
Programmers: Cornel Ghiban, Mohammed Khalfan, Sheldon McKay
Contributors: Matt Vaughn, Rion Dooley, Anthony Biondo, Jim Burnette, Scott Cain, Ed Lee, Zhenyuan Lu
Advisors: Matt Conte, Carson Holt, Bruce Nash, Oscar Pineda-Catalan

35 HPC in Undergraduate Biology Education Banbury Center, CSHL, September 3-5, 2014 Contact Dave Micklos A Great Gatsby era estate on Long Island’s “Gold Coast” Funded by NSF and the Alfred P. Sloan Foundation

