
Analysis of genomes and transcriptomes using ChIP-seq and RNA-seq




1 Analysis of genomes and transcriptomes using ChIP-seq and RNA-seq
Leonardo Mariño-Ramírez, PhD, NCBI / NLM / NIH. ICGEB Practical Course "Bioinformatics: Computer Methods in Molecular Biology", June 2017

2 The central dogma

3 Overview of an RNA-seq experiment
Data generation and analysis; reference-based and de novo transcriptome assembly; popular aligner and assembler software

4

5

6

7

8

9

10 Overview of a ChIP-seq experiment
Data generation and analysis; ChIP profiles; peak calling in strand-specific profiles

11 Using chromatin immunoprecipitation (ChIP) followed by massively parallel sequencing, the specific DNA sites that interact with transcription factors or other chromatin-associated proteins (non-histone ChIP) and sites that correspond to modified nucleosomes (histone ChIP) can be profiled. The ChIP process enriches the crosslinked proteins or modified nucleosomes of interest using an antibody specific to the protein or the histone modification. Purified DNA can be sequenced on any of the next-generation platforms (Ref. 12). The basic concepts are similar across platforms: common adaptors are ligated to the ChIP DNA and clonally clustered amplicons are generated. The sequencing step involves the enzyme-driven extension of all templates in parallel. After each extension, the fluorescent labels that have been incorporated are detected through high-resolution imaging. On the Illumina Solexa Genome Analyzer (bottom left), clusters of clonal sequences are generated by bridge PCR, and sequencing is performed by sequencing-by-synthesis. On the Roche 454 and Applied Biosystems (ABI) SOLiD platforms (bottom middle), clonal sequencing features are generated by emulsion PCR and amplicons are captured on the surface of micrometre-scale beads. The beads with amplicons are then recovered and immobilized on a planar substrate to be sequenced by pyrosequencing (454 platform) or by DNA ligase-driven synthesis (SOLiD platform). On single-molecule sequencing platforms such as the HeliScope by Helicos (bottom right), fluorescent nucleotides incorporated into templates can be imaged at the level of single molecules, which makes clonal amplification unnecessary.

12 a | Examples of the profiles generated by chromatin immunoprecipitation followed by sequencing (ChIP–seq) or by microarray (ChIP–chip). Shown is a section of the binding profiles of the chromodomain protein Chromator, as measured by ChIP–chip (unlogged intensity ratio; blue) and ChIP–seq (tag density; red) in the Drosophila melanogaster S2 cell line. The tag density profile obtained by ChIP–seq reveals specific positions of Chromator binding with higher spatial resolution and sensitivity. The ChIP–seq input DNA (control experiment) tag density is shown in grey for comparison. b | Examples of different types of ChIP–seq tag density profiles in human T cells. Profiles for different types of proteins and histone marks can have different types of features, such as: sharp binding sites, as shown for the insulator binding protein CTCF (CCCTC-binding factor; red); a mixture of shapes, as shown for RNA polymerase II (orange), which has a sharp peak followed by a broad region of enrichment; medium size broad peaks, as shown for histone H3 trimethylated at lysine 36 (H3K36me3; green), which is associated with transcription elongation over the gene; or large domains, as shown for histone H3 trimethylated at lysine 27 (H3K27me3; blue), which is a repressive mark that is indicative of Polycomb-mediated silencing. BPIL2, bactericidal/permeability-increasing protein-like 2; FBXO7, F box only 7; NPC1, Niemann-Pick disease, type C1; Pros35, proteasome 35 kDa subunit; SYN3, synapsin III. Data for part b are from Ref. 25.

13 DNA fragments from a chromatin immunoprecipitation experiment are sequenced from the 5' end. Therefore, aligning these tags to the genome produces two peaks, one on each strand, that flank the binding location of the protein or nucleosome of interest. This strand-specific pattern can be used for optimal detection of enriched regions. To approximate the distribution of all fragments, each tag can be extended by the estimated fragment size in the appropriate orientation (toward higher coordinates for plus-strand tags, toward lower coordinates for minus-strand tags), and the number of fragments covering each position can then be counted.
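The extension-and-counting step described above can be sketched as follows. This is a minimal illustration, not the method of any particular peak caller: it assumes the fragment size is already known, that each tag is given as the 0-based coordinate of its 5' end plus a strand, and that we work on a single chromosome at single-base resolution. `Tag` and `fragment_coverage` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tag:
    pos: int     # 0-based coordinate of the read's 5' end on the reference
    strand: str  # "+" or "-"


def fragment_coverage(tags: List[Tag], fragment_size: int, chrom_length: int) -> List[int]:
    """Extend each tag by fragment_size in its orientation and count fragments per position."""
    # Difference array: +1 where a fragment starts, -1 one past where it ends.
    diff = [0] * (chrom_length + 1)
    for t in tags:
        if t.strand == "+":
            start, end = t.pos, t.pos + fragment_size          # extend toward higher coordinates
        else:
            start, end = t.pos - fragment_size + 1, t.pos + 1  # extend toward lower coordinates
        start, end = max(start, 0), min(end, chrom_length)     # clip to the chromosome
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    # Running sum of the difference array gives the fragment count at each position.
    coverage, running = [], 0
    for i in range(chrom_length):
        running += diff[i]
        coverage.append(running)
    return coverage


tags = [Tag(10, "+"), Tag(29, "-")]  # hypothetical tags flanking a site near position 19-20
cov = fragment_coverage(tags, fragment_size=20, chrom_length=50)
print(cov[9], cov[10], cov[29], cov[30])  # 0 2 2 0
```

Before extension, the two tags form separate strand-specific peaks at positions 10 and 29; after extension both fragments cover positions 10 to 29, so the pair collapses into a single enriched block over the putative binding site. The difference-array approach keeps the cost linear in the number of tags plus the chromosome length.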

14 NCBI Submission Portal

15 The Sequence Read Archive (SRA)
The SRA is an entirely new resource at NCBI. It is being designed specifically to meet the challenges presented by massively parallel sequencing technologies and provides a central repository for next-generation sequencing data.

16 The Sequence Read Archive (SRA) Concepts
Study – A study is a set of experiments and has an overall goal. Experiment – An experiment is a consistent set of laboratory operations on input material with an expected result. Sample – An experiment targets one or more samples. Results are expressed in terms of individual samples or bundles of samples as defined by the experiment. Run – Results are called runs. Runs comprise the data gathered for a sample or sample bundle and refer to a defining experiment. Submission – A submission is a package of metadata and/or data objects and a directive for what to do with those objects.
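The relationships between these concepts can be sketched as a small in-memory data model. This is an illustrative assumption only: the class and field names below are hypothetical and do not reproduce the SRA XML schema; the `action` value "ADD" is likewise a placeholder.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical model of the SRA concepts; NOT the SRA XML schema.


@dataclass
class Sample:
    alias: str


@dataclass
class Experiment:
    alias: str
    samples: List[Sample]  # an experiment targets one or more samples


@dataclass
class Run:
    alias: str
    experiment: Experiment  # a run refers to its defining experiment
    files: List[str]        # data gathered for the sample or sample bundle


@dataclass
class Study:
    alias: str
    goal: str  # a study has an overall goal
    experiments: List[Experiment] = field(default_factory=list)


@dataclass
class Submission:
    alias: str
    objects: list  # metadata and/or data objects
    action: str    # directive for what to do with them (placeholder value)


def runs_for_sample(runs: List[Run], sample_alias: str) -> List[str]:
    """Return aliases of runs whose defining experiment targets the given sample."""
    return [r.alias for r in runs
            if any(s.alias == sample_alias for s in r.experiment.samples)]


liver = Sample("liver_rep1")
exp = Experiment("rnaseq_liver", [liver])
study = Study("liver_study", "profile the liver transcriptome", [exp])
run = Run("run_001", exp, ["run_001_1.fastq", "run_001_2.fastq"])
sub = Submission("sub_001", [study, exp, liver, run], "ADD")
print(runs_for_sample([run], "liver_rep1"))  # ['run_001']
```

Because a run points to its experiment rather than directly to a sample, results for a sample are found by following run, then experiment, then samples, mirroring the statement that results are expressed through the defining experiment.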

17 The Sequence Read Archive (SRA) Submission process
Create an NCBI PDA account. Register a BioProject and receive the BioProject accession (PRJNA#). Register a BioSample and receive the BioSample accession (SAMN#). Complete the submission metadata on the SRA website; you will receive the FTP information after creating a Run. For FTP, use put to transmit the file(s) to the private FTP box. For Aspera, use the ascp program to transfer the data files to the private account.
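Two of these steps lend themselves to scripting: sanity-checking the accessions before filling in metadata, and uploading files over FTP. The sketch below is a hedged illustration only. It assumes accessions are an NCBI prefix (PRJNA or SAMN, as on the slide) followed by digits, and it accepts no other prefixes. The host, user, password and directory passed to the hypothetical `upload_files` are placeholders for the details NCBI provides after a Run is created; `FTP.storbinary` with a `STOR` command is the programmatic counterpart of `put`. The example accession numbers are made up.

```python
import re
from ftplib import FTP
from pathlib import Path
from typing import Iterable, List

# Accession shapes assumed from the slide: an NCBI prefix followed by digits.
BIOPROJECT_RE = re.compile(r"PRJNA\d+")
BIOSAMPLE_RE = re.compile(r"SAMN\d+")


def check_accessions(bioproject: str, biosamples: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means every accession looks well formed."""
    problems = []
    if not BIOPROJECT_RE.fullmatch(bioproject):
        problems.append(f"unexpected BioProject accession: {bioproject}")
    for acc in biosamples:
        if not BIOSAMPLE_RE.fullmatch(acc):
            problems.append(f"unexpected BioSample accession: {acc}")
    return problems


def upload_files(host: str, user: str, password: str,
                 remote_dir: str, paths: Iterable[str]) -> None:
    """Upload files to a private FTP directory (scripted equivalent of 'put').

    All connection arguments are placeholders, not real NCBI values.
    """
    with FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        ftp.cwd(remote_dir)
        for path in map(Path, paths):
            with path.open("rb") as fh:
                ftp.storbinary(f"STOR {path.name}", fh)


print(check_accessions("PRJNA123456", ["SAMN00000001", "SAMX42"]))
# ['unexpected BioSample accession: SAMX42']
```

For large datasets the slide recommends Aspera's ascp instead; that transfer is not covered by this sketch.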

18 The Sequence Read Archive (SRA) Submission process

19 The Sequence Read Archive (SRA) Web access

20 The Sequence Read Archive (SRA) Web access

21 The Sequence Read Archive (SRA) Web access

22 The Sequence Read Archive (SRA) Toolkit

23 dbEST
dbEST release: Summary by Organism, 01 January 2013
Organism                                 ESTs
Homo sapiens (human)                8,704,790
Mus musculus + domesticus (mouse)   4,853,570
Zea mays (maize)                    2,019,137
Sus scrofa (pig)                    1,669,337
Bos taurus (cattle)                 1,559,495
Arabidopsis thaliana (thale cress)  1,529,700

24 UniGene

25 UniGene - Statistics

26 Gene Expression Omnibus (GEO)

27 Gene Expression Omnibus (GEO)

28 Transcriptome Shotgun Assembly (TSA)
TSA is an archive of computationally assembled sequences from primary data such as ESTs, traces and next-generation sequencing reads. Overlapping sequence reads from a complete transcriptome are assembled into transcripts by computational methods instead of by traditional cloning and sequencing of cDNAs.
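As a toy illustration of assembling overlapping reads computationally, the sketch below uses a greedy overlap-merge strategy: repeatedly join the two sequences with the longest suffix/prefix overlap. This is a swapped-in textbook technique, not how production transcriptome assemblers work (many are de Bruijn graph based). It assumes error-free reads from one strand, with no reverse complements and no alternative isoforms; `overlap` and `greedy_assemble` are hypothetical names.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b; 0 if shorter than min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: List[str], min_overlap: int = 3) -> List[str]:
    """Toy greedy assembler: merge the best-overlapping pair until no overlaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    contigs = [r for r in contigs         # drop reads contained in another read
               if not any(r != o and r in o for o in contigs)]
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:  # no qualifying overlaps left: remaining sequences are the contigs
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)] + [merged]


reads = ["ATGGCGT", "GCGTACC", "TACCTTGA"]  # hypothetical error-free reads
print(greedy_assemble(reads))  # ['ATGGCGTACCTTGA']
```

The greedy choice is simple but can mis-assemble repeats and scales poorly (all pairs are compared each round), which is one reason real assemblers use graph-based approaches.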

29 A typical transcriptome (454)

30 RNA-seq data analysis with Bowtie, TopHat and Cufflinks
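A reference-based run of these three tools can be scripted as a short chain of commands. The sketch below is an assumption-laden outline, not the course's exact protocol: it assumes TopHat 2 with a Bowtie 2 index, paired-end FASTQ files and a GTF annotation, and the file names are hypothetical. The options used (`-p` threads, `-G` annotation, `-o` output directory) are the commonly documented ones, but check each tool's `--help` for the installed version. By default it only prints the commands.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def build_commands(genome_fa: str, gtf: str, reads: Sequence[str],
                   index: str = "genome_index", threads: int = 4,
                   outdir: str = "results") -> List[List[str]]:
    """Return index, alignment and assembly commands as argument lists (not executed)."""
    tophat_dir = Path(outdir) / "tophat_out"
    cufflinks_dir = Path(outdir) / "cufflinks_out"
    return [
        # 1. Build a Bowtie 2 index of the reference genome
        ["bowtie2-build", genome_fa, index],
        # 2. Spliced alignment of the RNA-seq reads with TopHat, guided by the annotation
        ["tophat", "-p", str(threads), "-G", gtf, "-o", str(tophat_dir), index, *reads],
        # 3. Transcript assembly and abundance estimation with Cufflinks
        ["cufflinks", "-p", str(threads), "-o", str(cufflinks_dir),
         str(tophat_dir / "accepted_hits.bam")],
    ]


def run_pipeline(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


cmds = build_commands("genome.fa", "genes.gtf", ["sample_1.fastq", "sample_2.fastq"])
run_pipeline(cmds, dry_run=True)
```

Keeping the commands as argument lists (rather than shell strings) avoids quoting problems with file names, and the dry-run default makes it easy to review the pipeline before launching long alignment jobs.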




