Special Topics in Genomics Lecture 1: Introduction Instructor: Hongkai Ji Department of Biostatistics



Outline of today’s lecture: introduction to genome and genomics; topics and tools; relevance of statistics

DNA
DNA (deoxyribonucleic acid) molecules store the genetic information of a living organism.
DNA consists of two polymers made from four types of nucleotides: adenine (A), guanine (G), cytosine (C) and thymine (T).
Purines: A, G; pyrimidines: C, T.
The two polymers are complementary to each other and form a double-helix structure:
5’-ACCGTTCGACGGTAA-3’
   |||||||||||||||
3’-TGGCAAGCTGCCATT-5’
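The base-pairing rule behind the two strands on this slide can be sketched in a few lines of Python. The function names are illustrative and not part of the lecture; the input strand is the one shown on the slide.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Base-by-base complement, written in the same left-to-right order."""
    return "".join(PAIR[base] for base in strand)

def reverse_complement(strand):
    """Complementary strand read in its own 5'->3' direction."""
    return complement(strand)[::-1]

top = "ACCGTTCGACGGTAA"            # 5'->3' strand from the slide
bottom_3to5 = complement(top)      # bottom strand as drawn on the slide (3'->5')
bottom_5to3 = reverse_complement(top)
print(bottom_3to5)
print(bottom_5to3)
```

Reading the complement right to left gives the conventional 5’-to-3’ form of the second strand, which is how sequence databases normally report it.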

Chromosome

Genome TCAGTTGGAGCTGCTCCCCCACGGCCTCTCCTCACATTCCACGTCCTGTAGCTCTATGACCTCCACCTTTGAGTCCCTCCT CTCACACCTGACATGAAAAGGCACATGAGGATCCTCAAATACCCCGTGATCAGTCTCAGGGTAGCTCTCATAGCCTGGACA GGGCCCCCCTCGGGGGTTGCGCCCAGGTCCAGGCGGGGGATGCACAGCAACAGTCACCGAAGCAGAAGCCGTCACAGTGGT GATGGGCTGGCAGTAGCTGGGCACAGAGCTGCCCATGGCGGTGGACGTTGGGTTCCGAGGGTTGTGAGAACGGGCCCCACG GGGCCCTGAGCGGTCCCTATTGCTAGGGCCAGAATGCCCTTCAGTAGAAATTTCAAAAGCGTCTCTGCGCGGTCTGTAGGG GGGTGGCCGCAAGCCTTCTCTAGGGGGATCCCTTCGAGGCTGCTGGCCTTGCCGTCCAGGGGACAAGGAGCCAGAGTCCAG GTGGGGCTGTTGCCGAGGGGTCAAGGGAGGCTGATGTCTGGAGTCCGGATGGACCACCTGCAGAGGAGAGACATAGGTCAA CACAGGGAGGTAGGATGGTGGTGATGTTCCACCCACAAAAGAAAACCTATTCCTTTAGAAACCTCCAGGATGTGAATCCTG CCTGCACCTGCACAGCTGGCTGGAGGCATATAGCCACTGCCCATAGATCTCAACTTACCCTCACAACCAACTGCCCCCAGG CCTAAGTTCTCTGCCTCAAAACTGCCAAGGCCTGGATAGCCAAGAGCCTGGGTGTCTTGGAAATATGCAACCATAAATAGT AGCTTTTAGAAGTATAAGGCTCCTGTTTCTGGGTCATATTAGTGTTGTTTTCACCTGTCCCCAGCCCTAAGCCAGGTGTGG CCAGAAGCAAATGTACTGTAAGAGCAGAGCAAAAACTTCCACACAGATAGTTCTGTTAGGCAATACATCTCTGCCTGACTA TTAGGAATCTGGTTTCTGGGTCCTCTGTACAAAGCTCGGAGCAACACAGTGGCCACATCAATCAAAAGGACCGTGACCAAC TTCAAAGTCGGTGAGCTTGTACCTATTTTTAGGCTCCTGCTGAACAGAACCAGATTCACACTACAGCTCAGCAGGGCATCG TCACGGGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTGGGGGGGGGGGGTGGACAGAGGACGGGGACACAATT CACTGGCCAGCCCTTCTCTCCTTCAAGGAAGGCTGCTCTAGCCTGGGACTGGAATACACATTTCCTGTAAACATGGTGGGG GCCTCAGGCAAGCCAGAGTTTTGGAGCCTTCCTTAACTCTTCAAGGTGAGCATCTTGACTTGGAGGGTGGGGGTGCGGGTA AGGAAGGAACCTGTGGACTCCTCCCTACAAGACAGAAAAGGAATAAGCCACGAAGACAATAACGATTTTTGTATCAAGCGT CCTCTCCCATTTCAGCTTACCTGACAATGAAATCAAATTCGGACCCTGCAAGCATCAGTACACCCAGCAGAGTGGACACAG CACCGTCCAGAACGGGAGCAAACATGTGCTCCAGAGCGAGCATAGCCCTGTGGTTCTTGTCCCCAATGGCTGTCAGAAAGG CCTGAACAAAGGAGAAAATTGACACGGTCACATTCTGGGTGTGGTAAAGTGCTCAGCTGTGTCTATACTTGGGTTTTGTAT …
Total amount of DNA in the human genome: 3 × 10^9 base pairs (bp)

Gene

Central Dogma Gene expression

Topic 1: gene expression and microarray
[Figure: genes A, B, C are expressed and genes X, Y, Z are not; expression varies temporally and spatially]

Microarray
[Figure labels: probe, cDNA sample]

Microarray data

Topic 2: transcriptional regulation
Transcription factors (TF): TF1, TF2
Transcription factor binding sites (TFBS): CCACCCAC, TAATAAAAT
TTATGTAACCTGCACTTACTACCACCCACAACATAATAAAATCTAAACCACTGAATGAAATACAAAATCTATGTATGA...
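As a small illustration of what locating a binding site means, the sketch below scans the slide's example sequence for exact copies of the two TFBS strings. Which site belongs to TF1 and which to TF2 is an assumption made for labelling only, and `find_all` is a hypothetical helper; real sites are degenerate, so exact matching is only a first approximation.

```python
seq = ("TTATGTAACCTGCACTTACTACCACCCACAACATAATAAAATCTAAACCACTGAATGAAATAC"
       "AAAATCTATGTATGA")

# Assumed labelling: the slide lists the two sites but not which TF binds which.
sites = {"TF1": "CCACCCAC", "TF2": "TAATAAAAT"}

def find_all(sequence, motif):
    """Return every 0-based start position of an exact motif match."""
    hits = []
    start = sequence.find(motif)
    while start != -1:
        hits.append(start)
        start = sequence.find(motif, start + 1)  # allow overlapping matches
    return hits

hits = {tf: find_all(seq, motif) for tf, motif in sites.items()}
for tf, positions in hits.items():
    print(tf, sites[tf], positions)
```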

Transcription factor binding motif
Sequences bound by the TF:
GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGA
TAACATGTGACTCCTATAACCTCTTTGGGTGGTACATGAA
CTGGGAGGTCCTCGGTTCAGAGTCACAGAGCAGATAATCA
TTAGAGGCACAATTGCTTGGGTGGTGCACAAAAAAACAAG
AACAGCCTTGGATTAGCTGCTGGGGGGGTGAGTGGTCCAC
ATCAGAATGGGTGGTCCATATATCCCAAAGAAGAGGGTAG
Transcription factor binding sites (TFBS), aligned:
TGGGTGGTC
TGGGTGGTA
TGGGAGGTC
TGGGTGGTG
TGAGTGGTC
TGGGTGGTC
[Figure: motif, shown as per-position frequencies of A, C, G, T]
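The step from aligned binding sites to a motif can be sketched as a count matrix: for each position, count how often each base occurs across the sites. This is a minimal illustration using the six sites from the slide; the function names are hypothetical.

```python
from collections import Counter

# The six aligned binding sites from the slide.
aligned_sites = [
    "TGGGTGGTC",
    "TGGGTGGTA",
    "TGGGAGGTC",
    "TGGGTGGTG",
    "TGAGTGGTC",
    "TGGGTGGTC",
]

def count_matrix(sites):
    """One Counter per motif position, counting bases across all sites."""
    width = len(sites[0])
    return [Counter(site[k] for site in sites) for k in range(width)]

def consensus(counts):
    """Most frequent base at each position."""
    return "".join(column.most_common(1)[0][0] for column in counts)

counts = count_matrix(aligned_sites)
for base in "ACGT":
    # One row per base, one column per motif position.
    print(base, [column[base] for column in counts])
print("consensus:", consensus(counts))
```

Dividing each column by the number of sites turns the counts into the position-specific frequencies that a motif logo displays.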

Motifs are regulatory codes in the genome TCAGTTGGAGCTGCTCCCCCACGGCCTCTCCTCACATTCCACGTCCTGTAGCTCTATGACCTCCACCTTTGAGTCCCTCCT CTCACACCACCCATGTTTTGTTTATGAGGATCCTCAAATACCCCGTGATCAGTCTCAGGGTAGCTCTCATAGCCTGGACAG GGCCCCCCTCGGGGGTTGCGCCCAGGTCCAGGCGGGGGATGCACAGCAACAGTCACCGAAGCAGAAGCCGTCACAGTGGTG ATGGGCTGGCAGTAGCTGGGCACAGAGCTGCCCATGGCGGTGGACGTTGGGTTCCGAGGGTTGTGAGAACGGGCCCCACGG GGCCCTGAGCGGTCCCTATTGCTAGGGCCAGAATGCCCTTCAGTAGAAATTTCAAAAGCGTCTCTGCGCGGTCTGTAGGGG GGTGGCCGCAAGCCTTCTCTAGGGGGATCCCTTCGTTGCTGCTGGCCTTGCCGTCCAGGGGACAAGGAGCCAGAGTCCAGG TGGGGCTGTTGCCGAGGGGTCAAGGGAGGCTGATGTCTGGAGTCCGGATGGACCACCTGCAGAGGAGAGACATAGGTCAAC ACAGGGAGGTAGGATGGTGGTGATGTTCCACCCACAAAAGAAAACCTATTCCTTTAGAAACCTCCAGGATGTGAATCCTGC CTGCACCTGCACAGCTGGCTGGAGGCATATAGCCACTGCCCATAGATCTCAACTTACCCTCACAACCAACTGCCCCCAGGC CTAAGTTCTCTGCCTCAAAACTGCCAAGGCCTGGATAGCCAAGAGCCTGGGTGTCTTGGAAATATGCAACCATAAATAGTA GCTTTTAGAAGTATAAGGCTCCTGTTTCTGGGTCATATTAGTTTTGTTTTCACCTGTCCCCACCCATAAGCCAGGTGTGGC CAGAAGCAAATGTACTGTAAGAGCAGAGCAAAAACTTCCACACAGATAGTTCTGTTAGGCAATACATCTCTGCCTGACTAT TAGGAATCTGGTTTCTGGGTCCTCTGTACAAAGCTCGGAGCAACACAGTGGCCACATCAATCAAAAGGACCGTGACCAACT TCAAAGTCGGTGAGCTTGTACCTATTTTTAGGCTCCTGCTGAACAGAACCAGATTCACACTACAGCTCAGCAGGGCATCGT CACGGGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTGGGGGGGGGGGGTGGACAGAGGACGGGGACACAATTC ACTGGCCAGCCCTTCTCTCCTTCAAGGAAGGCTGCTCTAGCCTGGGACTGGAATACACATTTCCTGTAAACATGGTGGGGG CCTCAGGCAAGCCAGAGTTTTGGAGCCTTCCTTAACTCTTCAAGGTGAGCATCTTGACTTGGAGGGTGGGGGTGCGGGTAA GGAAGGAACCTGTGGACTCCACCCAACAAGACAGAAAAGGAATAAGCCACGAAGACAATAACGATTTTTGTATCAAGCGTC CTCTCCCATTTCAGCTTACCTGACAATGAAATCAAATTCGGACCCTGCAAGCATCAGTACACCCAGCAGAGTGGACACAGC ACCGTCCAGAACGGGAGCAAACATGTGCTCCAGAGCGAGCATAGCCCTGTGGTTCTTGTCCCCAATGGCTGTCAGAAAGGC CTGAACAAAGGAGAAAATTGACACGGTCACATTCTGGGTGTGGTAAAGTGCTCAGCTGTGTCTATACTTGGGTTTTGTAT Transcription Factor Binding Sites (TFBS) Gene

Gene regulatory network
Transcription factors act on other genes: activation, repression, other interactions
Misregulation → diseases
Gene1: TACTACCACCCACAACATAATAAAATCTAA
Gene2: TTAATAAAATACCACCCACAACCTAAGGAT
[Figure: network linking TF1, TF2, TF3 with Gene1, Gene2, Gene3]

Motif discovery and decoding regulatory programs in the genome
Human language, with a dictionary (know, guess, be, …):
guesswhatthestoryisaslongasyouknowthelanguageitshouldbeprettyeasy
Guess what the story is. As long as you know the language, it should be pretty easy.
Genomic language, with a dictionary of motifs:
GGCCCTGAGCGGTCCCTATTGCTGGGTGGTCAATGCCCTTCATCTGAAATTTC
AAAAGCGTCTCTGCGCGGTCTGTAGGGGGGTGGCCGCAAGCCTTCTCTAGGGG
GGCCCTGAGCGGTCCCTATTGCTAGGGCCAGAATGCCCTTCAGTAGAAATTTC
GGCCCTGAGCGGTCCCTATTGCTGGGTGGTCAATGCCCTTCATCTGGAATTTC
AAAAGCGTCTCTGCGCGGTCTGTAGGGGGGTGGCCGCAAGCCTTCTCTAGGGG
GGCCCTGAGCGGTCCCTATTGCTAGGGCCAGAATGCCCTTCAGTAGAAATTTC
[Figure labels: step 1, step 2]

Finding motifs from co-regulated genes (Roth et al., 1998; Hughes et al., 2000; etc.)
[Figure: expression of Gene 1, Gene 2, Gene 3, …, Gene N under Condition 1 and Condition 2]
Gene1: GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGA
Gene2: CTGGGAGGTCCTCGGTTCAGAGTCACAGAGCAGATAATCA
Gene3: TAACATGTGACTCCTATAACCTCTTTGGGTGGTACATGAA

Motif discovery is difficult in mammalian genomes due to a low signal-to-noise ratio
Yeast: 100~1000 bp of sequence to search per gene (Gene1, Gene2, Gene3)
Human: 10k~1000k bp of sequence to search per gene (Gene1, Gene2, Gene3)

Topic 3: ChIP-chip and tiling array
ChIP-chip (Chromatin ImmunoPrecipitation coupled with Microarray)
[Figure labels: IP, No IP; fragments 500~2000 bp long]

ChIP-chip on tiling arrays
[Figure: IP and CT samples; fragments 500~2000 bp long; probes 25~60 bp long at 35~300 bp spacing]

A combined approach to study gene regulation
ChIP-chip: 500~2000 bp
Sequence analysis: 6~30 bp
GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGA
CTGGGAGGTCCTCGGTTCAGAGTCACAGAGCAGATAATCA
TAACATGTGACTCCTATAACCTCTTTGGGTGGTACATGAA
TTAGAGGCACAATTGCTTGGGTGGTGCACAAAAAAACAAG
AACAGCCTTGGATTAGCTGCTGGGGGGGTGAGTGGTCCAC

Topic 4: alternative splicing and exon array
[Figure labels: gene, promoter, transcription start site (TSS), exon, intron, splicing]

Alternative splicing
[Figure: isoform 1, isoform 2 and isoform 3 built from exon 1, exon 2, exon 3, exon 4 and exon 5]

Exon array

Topic 5: single nucleotide polymorphism and SNP array
SNPs occur every 100 to 1000 bp, make up 90% of genetic variations, and have a minor allele frequency >= 1% (otherwise we call them mutations).

SNP array
ACCGTGGA[C/T]CTGAACCG
|||||||| | ||||||||
TGGCACCT[G/A]GACTTGGC
ACCGTGGA[G]CTGAACCG
ACCGTGGA[C]CTGAACCG
ACCGTGGA[T]CTGAACCG
ACCGTGGA[A]CTGAACCG
What will happen when the genotype is CC? CT? TT?
Applications:
1. Genotyping & genome-wide association study
2. Copy number variations and loss of heterozygosity
3. Allele specific expression
…
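One way to think about the question on this slide is an idealized model in which a probe gives signal only when its centre base matches an allele carried by the sample. The sketch below encodes that assumption; it ignores strand orientation, cross-hybridization and intensity noise, all of which matter on a real array, and `matching_probes` is a hypothetical name.

```python
LEFT, RIGHT = "ACCGTGGA", "CTGAACCG"   # flanking sequence from the slide

# Four probes that differ only at the centre (SNP) position.
probes = {base: LEFT + base + RIGHT for base in "GCTA"}

def matching_probes(genotype):
    """Centre bases of the probes that perfectly match an allele in the sample.

    genotype is a two-letter string such as "CC", "CT" or "TT".
    """
    alleles = {LEFT + allele + RIGHT for allele in genotype}
    return sorted(base for base, probe in probes.items() if probe in alleles)

for genotype in ("CC", "CT", "TT"):
    print(genotype, matching_probes(genotype))
```

Under this idealization a homozygous sample lights up one probe and a heterozygous sample lights up two, which is the contrast genotype-calling methods try to recover from noisy intensities.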

Topic 6: next-generation sequencing Traditional sequencing

Next-generation sequencing
Prepare genomic DNA → Attach DNA to surface → Bridge amplification → Fragments become double stranded → Denature the double-stranded molecules → Complete amplification → Determine first base → Image first base → Determine second base → Image second base → Sequence reads over multiple cycles → Align data.
>50 million clusters/flow cell, each 1000 copies of the same template; 1 billion bases per run; 1% of the cost of the capillary-based method. (From:

Array vs. next-generation sequencing

Microarray, exon array → RNA-seq
ChIP-chip → ChIP-seq
SNP array → SNP/mutation detection by sequencing
… → …

Other topics: epigenomics, transposons, miRNA

Relevance of statistics
Genomics → Statistics: need new statistical theories and tools
Statistics → Genomics: guide development of efficient data analysis strategies

Example 1: differential gene expression

Example 1: multiple testing
[Table: t-statistic and p-value for each gene i = 1, 2, 3, …, I (e.g. t = -0.5, p = 0.56), with rejections after Bonferroni adjustment]
Multiplicity needs to be adjusted in order to determine statistical significance.
Bonferroni adjustment is too stringent.
Alternative: false discovery rate
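To make the stringency point concrete, here is a minimal sketch of the Bonferroni rule: with m tests, reject only when p <= alpha / m. The p-values are hypothetical, chosen for illustration, and are not data from the lecture.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject H0 for test i when p_i <= alpha / m (controls FWER at alpha)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

# Hypothetical p-values for five genes.
p_values = [0.001, 0.012, 0.02, 0.2, 0.9]

unadjusted = [p <= 0.05 for p in p_values]
adjusted = bonferroni_reject(p_values)
print("unadjusted rejections:", sum(unadjusted))
print("Bonferroni rejections:", sum(adjusted))
```

With thousands of genes the per-test threshold alpha / m becomes tiny, which is why the adjustment can leave very few rejections.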

False discovery rate (FDR, Benjamini & Hochberg, 1995)
             Accept    Reject    Total
True H0      U         V         m0
True H1      T         S         m - m0
Total        m - R     R         m
FDR = E(V/R) = Pr(R>0) E(V/R | R>0)
FWER = Pr(V ≥ 1)
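The step-up procedure from the cited Benjamini & Hochberg (1995) paper can be sketched as follows: sort the p-values, find the largest rank k with p_(k) <= k q / m, and reject the k smallest. The p-values are hypothetical and the function name is illustrative.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans (True = reject) in the input order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k = rank                     # keep the largest rank that passes
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject

# Hypothetical p-values for five genes, in arbitrary order.
p_values = [0.9, 0.001, 0.2, 0.012, 0.02]
print(benjamini_hochberg(p_values))
```

On these values the procedure rejects three hypotheses, whereas a Bonferroni threshold of 0.05 / 5 = 0.01 would reject only the one with p = 0.001.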

Pooling information
[Table: tests 1, …, I, each with a sample variance (df); the variances follow a common distribution, which yields pooled variance estimates and modified t-statistics]
Multiplicity causes problems in controlling type I errors, but it can be used to improve statistical power!
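One common form of this idea, shown here only as a sketch and not necessarily the estimator used in the lecture, shrinks each gene's sample variance toward a value shared across genes and plugs the result into the t-statistic. The weight `prior_df` and the use of the mean variance as the shared value are assumptions made for illustration.

```python
def moderated_variances(sample_vars, df, prior_df=4.0):
    """Shrink each gene's variance toward the mean variance across genes.

    prior_df (assumed, hand-picked here) controls how strongly we shrink.
    """
    prior_var = sum(sample_vars) / len(sample_vars)
    return [(prior_df * prior_var + df * s2) / (prior_df + df)
            for s2 in sample_vars]

def t_statistic(mean_diff, variance, n1, n2):
    """Two-sample t-statistic with a given (possibly moderated) variance."""
    return mean_diff / (variance * (1.0 / n1 + 1.0 / n2)) ** 0.5

# Hypothetical per-gene variances from 2 + 2 samples (df = 2).
sample_vars = [0.01, 1.0, 4.0]
shrunk = moderated_variances(sample_vars, df=2)
for s2, s2_tilde in zip(sample_vars, shrunk):
    print(round(t_statistic(1.0, s2, 2, 2), 2),
          round(t_statistic(1.0, s2_tilde, 2, 2), 2))
```

A gene with an accidentally tiny variance no longer produces an extreme t-statistic, which is the gain in stability that borrowing information across many tests is meant to provide.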

Example 2: motif discovery
S (sequence): GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGACTGGGAGGTCCTCGGTTCAGAGTCACAGAGCA
A: [figure: motif site locations in S]
Θ (with θ0): motif and background models, each over A, C, G, T
Inference by iterative estimation/sampling (Gibbs sampler) from f(A, Θ | S)
Marginalization: f(A | S) = ∫ f(A, Θ | S) dΘ
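A minimal sketch of a Gibbs site sampler in this spirit is given below. It assumes exactly one site per sequence, a fixed motif width, a fixed background estimated from the data, and pseudocounts in place of an explicit prior on Θ; these simplifications and all names are assumptions for illustration, not the lecture's algorithm. The input sequences are the six shown on the motif slide.

```python
import random

def gibbs_site_sampler(seqs, width, n_iter=50, pseudo=1.0, seed=0):
    """Sample one motif start per sequence; return the final starts."""
    rng = random.Random(seed)
    bases = "ACGT"
    total = sum(len(s) for s in seqs)
    # Background base frequencies, held fixed (simplifying assumption).
    bg = {b: (sum(s.count(b) for s in seqs) + pseudo) / (total + 4 * pseudo)
          for b in bases}
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(n_iter):
        for i, s in enumerate(seqs):
            # Motif counts from the current sites of all other sequences.
            counts = [{b: pseudo for b in bases} for _ in range(width)]
            for j, other in enumerate(seqs):
                if j != i:
                    for k in range(width):
                        counts[k][other[starts[j] + k]] += 1
            denom = len(seqs) - 1 + 4 * pseudo
            # Motif-vs-background likelihood ratio for each candidate start.
            weights = []
            for a in range(len(s) - width + 1):
                ratio = 1.0
                for k in range(width):
                    b = s[a + k]
                    ratio *= (counts[k][b] / denom) / bg[b]
                weights.append(ratio)
            # Draw the new site with probability proportional to its ratio.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts

seqs = [
    "GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGA",
    "TAACATGTGACTCCTATAACCTCTTTGGGTGGTACATGAA",
    "CTGGGAGGTCCTCGGTTCAGAGTCACAGAGCAGATAATCA",
    "TTAGAGGCACAATTGCTTGGGTGGTGCACAAAAAAACAAG",
    "AACAGCCTTGGATTAGCTGCTGGGGGGGTGAGTGGTCCAC",
    "ATCAGAATGGGTGGTCCATATATCCCAAAGAAGAGGGTAG",
]
starts = gibbs_site_sampler(seqs, width=9)
for s, a in zip(seqs, starts):
    print(a, s[a:a + 9])
```

Because the sampler is stochastic, a single short run may settle on a shifted or different pattern; in practice several runs with different seeds are compared.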