Published by Cynthia Lane. Modified over 8 years ago.
1
Next Generation Sequencing and its data analysis challenges
Background
Alignment and Assembly
Applications: Genome, Epigenome, Transcriptome
2
References
Cell 2013, 155:27.
Cell 2013, 155:39.
Annu. Rev. Plant Biol. 2009, 60:305.
Annu. Rev. Genomics Hum. Genet. 2009, 10:135.
Curr. Opin. Biotechnology, 24:22.
Nat. Biotech. 2009, 25:195.
Nat. Methods 2009, 6:S6.
Nat. Rev. Genet. 2009, 10:669.
Nat. Rev. Genet. 2010, 11(1):31-46.
Genomics 2010, 95(6):315-27.
This lecture is about the opportunities and challenges, not detailed statistical techniques. The materials are taken from the review articles listed above.
3
Background
“Method of the Year” 2007 by Nature Methods.
The names: “Next-generation sequencing”, “Deep sequencing”, “High-throughput sequencing”, “Second-generation sequencing”.
The key characteristics:
Massively parallel sequencing.
The amount of data from a single run is comparable to the amount of data from the Human Genome Project.
The reads are short: a few hundred bases per read.
4
Background
Potential impact:
The “$1000 genome” will become reality very soon.
Genome sequencing will become a regular medical procedure.
Personalized medicine.
Predictive medicine.
Ethical issues.
For statisticians:
Data mining using hundreds of thousands of genomes.
Finding rare SNPs/mutations associated with diseases.
New methods to analyze epigenomics/transcriptomics data.
Finding interventions to improve quality of life.
5
Background
Different companies use different techniques. We use Illumina’s as an example. (http://seqanswers.com/forums/showthread.php?t=21)
6
Background
9
An incomplete list of some common platforms. BMC Genomics 2012, 13:341
10
Background
11
Advantages:
Fast and cost-effective.
No need to clone DNA fragments.
Drawbacks:
Short read length (platform dependent).
Some platforms have trouble with identical repeats.
Non-uniform confidence in base calling within a read: data are less reliable near the 3’ end of each read.
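One common way to cope with unreliable 3’ ends is to trim trailing low-quality bases before analysis. The slides do not prescribe a method; the sketch below is only an illustration, and it assumes Phred+33 quality encoding and a hypothetical cutoff of Q20 (the function name `trim_3prime` is also made up for this example).

```python
from typing import Tuple

def trim_3prime(seq: str, qual: str, min_q: int = 20, offset: int = 33) -> Tuple[str, str]:
    """Drop bases from the 3' end while their Phred quality is below min_q.

    Assumes `qual` is a Phred+33 encoded string of the same length as `seq`.
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]

# Toy read: 'I' encodes Q40, '#' encodes Q2, '!' encodes Q0.
read, qual = "ACGTACGTAC", "IIIIIIII#!"
print(trim_3prime(read, qual))  # ('ACGTACGT', 'IIIIIIII')
```

Only the 3’ end is trimmed here; a low-quality base in the middle of the read is left alone.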
12
Background What deep sequencing can do:
13
Background Nat Methods. 2009 Nov;6(11 Suppl):S2-5.
14
Alignment and Assembly
Sequencing the genome of a person? Alignment.
Can rely on the existing human genome as a blueprint.
Align the short reads onto the existing human genome.
Need a few-fold coverage to cover most regions.
Sequencing a whole new genome? Assembly.
Overlaps between reads are required to construct the genome.
Because the reads are short, ~30-fold coverage is needed.
At 3 Gb of data per run, a new genome of roughly human size (~3 Gb) needs about 30 runs.
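The coverage arithmetic on this slide can be checked with a few lines of code. The run count follows directly from the slide's numbers. The covered-fraction estimate is an added assumption: it uses the idealized Poisson (Lander-Waterman style) model in which reads land uniformly at random, so the expected uncovered fraction at coverage c is exp(-c). Real data deviate from this model.

```python
import math

def runs_needed(genome_size_bp: int, fold_coverage: float, bases_per_run: int) -> int:
    """Number of sequencing runs needed to reach the requested fold coverage."""
    return math.ceil(genome_size_bp * fold_coverage / bases_per_run)

def expected_covered_fraction(fold_coverage: float) -> float:
    """Expected fraction of bases hit by at least one read (idealized Poisson model)."""
    return 1.0 - math.exp(-fold_coverage)

human_size = 3_000_000_000     # ~3 Gb genome
per_run = 3_000_000_000        # 3 Gb of data per run, as on the slide

print(runs_needed(human_size, 30, per_run))      # 30
print(round(expected_covered_fraction(3), 3))    # 0.95
```

Under this model a few-fold coverage already touches most positions, which matches the alignment case; assembly needs far more because reads must also overlap each other.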
15
Hash table-based alignment. Similar to BLAST in principle.
(1) Find potential locations using the hash table.
(2) Perform local alignment at those locations.
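The two steps can be sketched as a toy seed-and-extend aligner. This is not any particular tool's algorithm. Step (1) indexes every k-mer of the reference in a hash table and looks up the read's k-mers to propose start positions. For step (2), the sketch substitutes a simple ungapped mismatch count for a true local alignment, to keep the example short. The names `build_index` and `align_read`, and the parameters `k` and `max_mismatches`, are hypothetical.

```python
from collections import defaultdict

def build_index(reference: str, k: int):
    """Hash table mapping each k-mer to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def align_read(read: str, reference: str, index, k: int, max_mismatches: int = 2):
    """Return (start, mismatches) for the best ungapped placement, or None."""
    # Step 1: every k-mer of the read that hits the index proposes a start.
    candidates = set()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], ()):
            start = pos - offset
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
    # Step 2: score each candidate (ungapped stand-in for local alignment).
    best = None
    for start in sorted(candidates):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (start, mismatches)
    return best

reference = "ACGTTGCATGCCGATAGCTAGGCTA"
index = build_index(reference, k=4)
# The read equals reference[6:16] with one substituted base.
print(align_read("CATGCCGTTA", reference, index, k=4))  # (6, 1)
```

The hash lookup keeps the expensive comparison confined to a handful of candidate positions instead of the whole reference.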
16
Alignment and Assembly From read to graph:
17
Alignment and Assembly
18
de Bruijn graph assembly Red: read error.
19
Alignment and Assembly de Bruijn graph assembly
20
Alignment and Assembly de Bruijn graph assembly
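A minimal sketch of de Bruijn graph construction, for illustration only. Nodes are (k-1)-mers and every k-mer seen in the reads adds an edge from its prefix to its suffix. A read error creates low-count k-mers that branch off the true path, so the sketch optionally drops k-mers seen fewer than `min_count` times. The threshold, the toy reads, and the function names are assumptions; real assemblers handle errors, repeats, and reverse complements far more carefully.

```python
from collections import Counter, defaultdict

def kmer_counts(reads, k):
    """Count every k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def de_bruijn(reads, k, min_count=1):
    """Build the graph: each retained k-mer adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for kmer, count in kmer_counts(reads, k).items():
        if count >= min_count:
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def walk_unambiguous(graph, start):
    """Extend a contig from `start` while each node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:          # stop on a cycle
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig

# Error-free reads tiling ATGGCGTGCA (each seen twice) plus one erroneous read.
good = ["ATGGCG", "TGGCGT", "GGCGTG", "GCGTGC", "CGTGCA"]
reads = good * 2 + ["TGGCTT"]

print(walk_unambiguous(de_bruijn(reads, k=4), "ATG"))               # ATGGC
print(walk_unambiguous(de_bruijn(reads, k=4, min_count=2), "ATG"))  # ATGGCGTGCA
```

Without filtering, the erroneous read opens a branch at node GGC and the walk stops early; removing the low-count k-mers restores the full path.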
21
Whole genome/exome/transcriptome sequencing
22
Genomics
Whole genome sequencing detects all variants (SNP alleles, rare variants, mutations).
Variants that could be associated with disease:
Rare variants (burden testing by collapsing by gene).
De novo mutations (need family tree).
Rare Mendelian disorders.
Structural variants in cancer.
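Collapsing by gene can be sketched as follows: each individual is reduced to a single carrier/non-carrier indicator for the gene (carrier if any rare variant in the gene is present), and carrier counts are compared between cases and controls. The slide names only the collapsing idea. The choice of a one-sided Fisher's exact test, the toy genotypes, and the function names are assumptions made for this example.

```python
from math import comb

def count_carriers(genotypes):
    """genotypes: one list per individual of rare-allele counts at the gene's sites."""
    return sum(1 for sites in genotypes if any(sites))

def fisher_one_sided(a, b, c, d):
    """P(case carriers >= a) for the 2x2 table [[a, b], [c, d]].

    a, b = carriers / non-carriers among cases
    c, d = carriers / non-carriers among controls
    """
    n_cases, n_carriers, n = a + b, a + c, a + b + c + d
    upper = min(n_cases, n_carriers)
    tail = sum(comb(n_cases, x) * comb(n - n_cases, n_carriers - x)
               for x in range(a, upper + 1))
    return tail / comb(n, n_carriers)

# Toy data: 5 cases and 5 controls, 3 rare-variant sites in one gene.
cases    = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 0, 0]]
controls = [[0, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 0]]

a = count_carriers(cases)
c = count_carriers(controls)
p = fisher_one_sided(a, len(cases) - a, c, len(controls) - c)
print(a, c, round(p, 4))  # 4 1 0.1032
```

No single site here is shared by more than two cases, but pooled at the gene level the carriers concentrate in the case group, which is the signal a burden test looks for.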
23
Medical Genomics Nature Reviews Genetics 11, 415 Example: Extreme-case sequencing to find rare variants associated with a disease.
24
Medical Genomics Example: Cancer genome
25
Epigenomics http://www.roadmapepigenomics.org/
26
ChIP-Seq
Purpose: identify which parts of the DNA sequence bind to a certain protein.
Transcription factor (Regulome).
Modified histone (Epigenome).
27
Overall ChIP-Seq workflow ChIP-Seq
28
ChIP-Seq
Before deep sequencing, the same information was obtained by using arrays in place of sequencing.
30
ChIP-Seq
Different kinds of profiles in different applications (figure labels: Elongation, Silencing).
31
ChIP-Seq
Example of an active-gene chromatin pattern found by ChIP-Seq (figure labels: Initiation site, Elongation).
32
RNA-Seq
34
Deep sequencing provides more information about each mRNA RNA-Seq
35
RNA-Seq
Finding novel exons. Detecting splicing? (Short reads could be an issue.)
36
Gene expression profiling – to replace arrays? Exon-specific abundance. RNA-Seq
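To compare abundance across genes or exons, raw read counts are usually normalized for feature length and sequencing depth. The slide does not name a measure; the sketch below uses RPKM (reads per kilobase of feature per million mapped reads) as one widely used choice, and the feature names, counts, and lengths are invented toy values.

```python
def rpkm(read_count: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of feature per million mapped reads."""
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)

total_mapped = 2_000_000  # assumed library size
# Features can be whole genes or single exons (exon-specific abundance).
features = {
    "geneA_exon1": (500, 2000),   # (read count, length in bp)
    "geneA_exon2": (1500, 1000),
}

for name, (count, length) in features.items():
    print(name, rpkm(count, length, total_mapped))
# geneA_exon1 125.0
# geneA_exon2 750.0
```

Dividing by length matters because a longer feature collects more reads at the same expression level.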
37
RNA-Seq
Sequencing small RNA.
38
RNA-Seq
Quantification of miRNAs and de novo detection of miRNAs.
MicroRNA: 21-23 nucleotides in length.
Regulates gene expression by complementary binding.
Derived from non-coding RNAs that form a stem-loop structure.
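A first pass at miRNA quantification can be sketched by keeping only reads in the 21-23 nucleotide window and counting identical sequences. This is a simplification for illustration: it assumes adapters have already been removed, and the function name and toy reads are hypothetical. De novo detection would additionally need evidence such as a stem-loop precursor, which is not attempted here.

```python
from collections import Counter

def count_small_rna(reads, min_len: int = 21, max_len: int = 23) -> Counter:
    """Count identical reads whose length falls in the miRNA-sized window."""
    return Counter(r for r in reads if min_len <= len(r) <= max_len)

candidate = "TGAGGTAGTAGGTTGTATAGTT"   # 22 nt toy read
reads = [candidate, candidate, candidate,
         "ACGT" * 10,                  # 40 nt: too long, ignored
         "ACGTACGT"]                   # 8 nt: too short, ignored

counts = count_small_rna(reads)
print(counts.most_common(1))  # [('TGAGGTAGTAGGTTGTATAGTT', 3)]
```

The resulting counts per distinct sequence are the raw material for both quantification and for spotting abundant sequences that match no known miRNA.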
39
Directly probe mRNA targets of miRNA. RNA-Seq