Next Generation Sequencing and its data analysis challenges Background Alignment and Assembly Applications Genome Epigenome Transcriptome.

Presentation transcript:

Next Generation Sequencing and its data analysis challenges Background Alignment and Assembly Applications Genome Epigenome Transcriptome

References
Cell 2013, 155:27
Cell 2013, 155:39
Annu. Rev. Plant Biol. 2009, 60:305.
Annu. Rev. Genomics Hum. Genet. 2009, 10:135.
Curr. Opin. Biotechnology, 24:22.
Nat. Biotech. 2009, 25:195.
Nat. Methods. 2009, 6:S6.
Nat. Rev. Genet. 2009, 10:669.
Nat Rev Genet Jan;11(1):
Genomics Jun;95(6):
This lecture is about the opportunities and challenges, not detailed statistical techniques. The materials are taken from review articles.

Background
Named “Method of the Year” for 2007 by Nature Methods.
Names in use: “next-generation sequencing”, “deep sequencing”, “high-throughput sequencing”, “second-generation sequencing”.
Key characteristics:
Massively parallel sequencing.
The amount of data from a single run is comparable to the amount of data from the Human Genome Project.
The reads are short: roughly a few hundred bases per read.

Background
Potential impact:
The “$1000 genome” will become reality very soon.
Genome sequencing will become a regular medical procedure.
Personalized medicine
Predictive medicine
Ethical issues
For statisticians:
Data mining using hundreds of thousands of genomes
Finding rare SNPs/mutations associated with diseases
New methods to analyze epigenomics/transcriptomics data
Finding interventions to improve quality of life

Background Different companies use different techniques. We use Illumina’s as an example.

Background

An incomplete list of some common platforms. BMC Genomics 2012, 13:341

Background

Advantages:
Fast and cost-effective.
No need to clone DNA fragments.
Drawbacks:
Short read length (platform dependent).
Some platforms have trouble with identical repeats.
Non-uniform confidence in base calling within reads: data are less reliable near the 3’ end of each read.
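One common response to the lower reliability near the 3’ end is to trim low-quality bases from that end before further analysis. The sketch below is illustrative only and is not taken from the slides: it assumes base qualities are given as a Phred+33 encoded string (as in many FASTQ files), and the function name and the threshold of 20 are hypothetical choices.

```python
def trim_3prime(seq: str, qual: str, min_q: int = 20, offset: int = 33):
    """Remove bases from the 3' end while their Phred quality is below min_q.

    Assumes `qual` is a Phred+offset encoded string of the same length as `seq`.
    """
    end = len(seq)
    # Walk backwards from the 3' end until a base meets the quality threshold.
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


# 'I' encodes quality 40; '#' and '$' encode qualities 2 and 3.
trimmed_seq, trimmed_qual = trim_3prime("ACGTACGTAC", "IIIIIIII#$")
print(trimmed_seq, trimmed_qual)  # ACGTACGT IIIIIIII
```

This simple rule only looks at the read end; practical trimmers often use sliding windows or running sums instead.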

Background What deep sequencing can do:

Background Nat Methods Nov;6(11 Suppl):S2-5.

Alignment and Assembly
Sequence the genome of a person? Alignment.
Can rely on the existing human genome as a blueprint.
Align the short reads onto the existing human genome.
Need only a few-fold coverage to cover most regions.
Sequence a whole new genome? Assembly.
Overlaps are required to construct the genome.
The reads are short, so ~30-fold coverage is needed.
With 3 Gb of data per run, 30 runs are needed for a new genome of similar size to the human genome.
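The run count on this slide follows from simple arithmetic: total bases required = genome size × fold coverage, divided by the output per run. A minimal sketch, assuming a human-sized genome of about 3 Gb (the function name is hypothetical):

```python
import math


def runs_needed(genome_size_bp: int, fold_coverage: float, bases_per_run: int) -> int:
    """Number of sequencing runs needed to reach a target average coverage."""
    total_bases = genome_size_bp * fold_coverage
    return math.ceil(total_bases / bases_per_run)


# ~3 Gb genome, 30-fold coverage, 3 Gb of sequence per run.
print(runs_needed(3_000_000_000, 30, 3_000_000_000))  # 30
```

This is an average-coverage estimate; it ignores uneven coverage and reads lost to filtering.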

Hash table-based alignment. Similar to BLAST in principle. (1) Find potential locations: (2) Local alignment.
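The two steps can be sketched as follows. This is a simplified illustration, not the algorithm of any particular aligner: step (1) looks up k-mers of the read in a hash table built from the reference, and step (2) is replaced here by an ungapped mismatch count rather than a true local alignment. All names and the parameters `k` and `max_mismatches` are assumptions made for the example.

```python
from collections import defaultdict


def build_index(reference: str, k: int) -> dict:
    """Hash table mapping every k-mer in the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def align_read(read: str, reference: str, index: dict, k: int, max_mismatches: int = 2):
    """Return (position, mismatches) of the best ungapped placement, or None."""
    # Step 1: find candidate locations from exact k-mer (seed) hits.
    candidates = set()
    for offset in range(len(read) - k + 1):
        for hit in index.get(read[offset:offset + k], []):
            start = hit - offset
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
    # Step 2: score each candidate (ungapped comparison stands in for local alignment).
    best = None
    for start in sorted(candidates):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (start, mismatches)
    return best


reference = "ACGTACGTTAGCCGATAGGCTA"
index = build_index(reference, k=4)
# The read matches the reference at position 8 with one substitution.
print(align_read("TAGCCGTTAG", reference, index, k=4))  # (8, 1)
```

Seeding restricts the expensive comparison to a handful of positions instead of the whole reference, which is the same idea BLAST relies on.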

Alignment and Assembly From read to graph:

Alignment and Assembly

de Bruijn graph assembly Red: read error.
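A minimal sketch of how reads become a de Bruijn graph: each k-mer in a read contributes an edge from its (k-1)-base prefix to its (k-1)-base suffix. The low-count filter shown is one common heuristic for removing edges created by read errors; it is an assumption for illustration and not necessarily the method in the figure. Function names are hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Count edges (prefix -> suffix of each k-mer) across all reads."""
    edge_counts = defaultdict(int)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edge_counts[(kmer[:-1], kmer[1:])] += 1
    return edge_counts


def drop_low_coverage_edges(edge_counts, min_count):
    """Keep only edges seen at least min_count times (likely error-free k-mers)."""
    return {edge: n for edge, n in edge_counts.items() if n >= min_count}


# The last read carries an error (CGTTC instead of CGTAC).
reads = ["ACGTA", "CGTAC", "GTACC", "CGTTC"]
graph = de_bruijn_graph(reads, k=3)
print(graph[("CG", "GT")])  # 3
print(graph[("GT", "TT")])  # 1
print(sorted(drop_low_coverage_edges(graph, min_count=2)))
# [('CG', 'GT'), ('GT', 'TA'), ('TA', 'AC')]
```

In this toy example the filter also drops genuine edges at the read ends, which is why this kind of pruning depends on deep coverage.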

Alignment and Assembly de Bruijn graph assembly

Alignment and Assembly de Bruijn graph assembly

Whole genome/exome/transcriptome sequencing

Genomics
Whole genome sequencing detects all variants (SNP alleles, rare variants, mutations).
Variants that could be associated with disease:
Rare variants (burden testing by collapsing by gene)
De novo mutations (need family tree)
Rare Mendelian disorders
Structural variants in cancer
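The idea behind "collapsing by gene" is that individual rare variants are too infrequent to test one at a time, so each sample is reduced to a per-gene indicator of whether it carries any rare variant in that gene. The sketch below shows only the collapsing step on made-up data; the sample names, variant IDs, gene names, and the 1% frequency cutoff are all hypothetical, and a statistical test comparing cases and controls would follow in practice.

```python
def genes_with_rare_variants(genotypes, variant_gene, allele_freq, max_freq=0.01):
    """For each sample, collect the genes in which it carries at least one rare variant.

    genotypes: {sample: {variant: alternate-allele count}}
    """
    burden = {}
    for sample, calls in genotypes.items():
        genes = set()
        for variant, alt_count in calls.items():
            if alt_count > 0 and allele_freq[variant] <= max_freq:
                genes.add(variant_gene[variant])
        burden[sample] = genes
    return burden


def carrier_count(burden, samples, gene):
    """Number of the given samples carrying a rare variant in `gene`."""
    return sum(gene in burden[s] for s in samples)


# Hypothetical toy data: v1 and v2 are rare variants in GENE_A, v3 is common in GENE_B.
genotypes = {
    "case1": {"v1": 1, "v2": 0, "v3": 1},
    "case2": {"v1": 0, "v2": 1, "v3": 0},
    "ctrl1": {"v1": 0, "v2": 0, "v3": 1},
    "ctrl2": {"v1": 0, "v2": 0, "v3": 0},
}
variant_gene = {"v1": "GENE_A", "v2": "GENE_A", "v3": "GENE_B"}
allele_freq = {"v1": 0.002, "v2": 0.005, "v3": 0.20}

burden = genes_with_rare_variants(genotypes, variant_gene, allele_freq)
print(carrier_count(burden, ["case1", "case2"], "GENE_A"))  # 2
print(carrier_count(burden, ["ctrl1", "ctrl2"], "GENE_A"))  # 0
```

Neither v1 nor v2 is shared between the two cases, yet both cases count as carriers for GENE_A once the variants are collapsed.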

Medical Genomics Nature Reviews Genetics 11, 415 Example: Extreme-case sequencing to find rare variants associated with a disease.

Medical Genomics Example: Cancer genome

Epigenomics

ChIP-Seq
Purpose: identify which parts of the DNA sequence bind to a given protein.
Transcription factor (regulome)
Modified histone (epigenome)

ChIP-Seq Overall workflow

ChIP-Seq Before deep sequencing, the same information was obtained using arrays in place of sequencing.

ChIP-Seq Different kinds of profiles in different applications. Elongation Silencing

ChIP-Seq Example of active gene chromatin pattern found by ChIP-Seq. Initiation site Elongation

RNA-Seq

RNA-Seq Deep sequencing provides more information about each mRNA.

RNA-Seq Finding novel exons. Splicing? (Short reads could be an issue.)

RNA-Seq Gene expression profiling: to replace arrays? Exon-specific abundance.
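Exon-specific abundance can be illustrated by counting the reads that fall in each exon and normalizing by exon length and sequencing depth. The sketch below uses RPKM (reads per kilobase per million mapped reads) as one common normalization; the slides do not name a specific measure, so this choice, the function names, and the toy coordinates are assumptions. Reads are assigned by start position only, which is a simplification.

```python
def count_reads_per_exon(read_starts, exons):
    """Count reads whose start position falls inside each exon (half-open intervals)."""
    counts = {name: 0 for name in exons}
    for pos in read_starts:
        for name, (start, end) in exons.items():
            if start <= pos < end:
                counts[name] += 1
    return counts


def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Hypothetical exon coordinates and read start positions.
exons = {"exon1": (100, 300), "exon2": (500, 600)}
read_starts = [120, 150, 299, 300, 510, 900]
print(count_reads_per_exon(read_starts, exons))  # {'exon1': 3, 'exon2': 1}

# 500 reads on a 2,000 bp exon in a library of 10 million mapped reads.
print(rpkm(500, 2000, 10_000_000))  # 25.0
```

Dividing by length keeps long exons from appearing more abundant simply because they collect more reads.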

RNA-Seq Sequencing small RNA.

RNA-Seq
Quantification of miRNAs and de novo detection of miRNAs.
MicroRNAs are short RNAs that regulate gene expression by complementary binding.
They are derived from non-coding RNAs that form a stem-loop structure.

RNA-Seq Directly probe mRNA targets of miRNA.