Next Generation Sequencing analysis


1 Next Generation Sequencing analysis
June 6th, 2017

2 Course instructors
Antonio Marco, Stuart Newman, Vladimir Teif

3 Course plan
Introductory lecture
Lunch
ChIP-seq practical
RNA-seq practical
Integrative analysis

4 1st Generation Sequencing

5 Microarrays
Affymetrix microarrays

6 2nd (Next) Generation Sequencing
Illumina MiSeq

7 Microarrays and NGS are used for different purposes

8 NGS METHODS AND THEIR APPLICATIONS
Chromatin domains Hi-C Figure adapted from

9 NGS data types
RNA-seq, GRO-seq, CAGE, SAGE, CLIP-seq, Drop-seq: gene expression; non-coding RNA
ChIP-seq, MNase-seq, DNase-seq, ATAC-seq, etc.: protein binding; histone modifications; chromatin accessibility; nucleosome positioning
Bisulfite sequencing: DNA methylation
Hi-C, 3C, 4C, ChIA-PET, etc.: chromatin loops in 3D
Amplicon sequencing: targeted regions; phylogenomics; metagenomics
Whole Genome Sequencing (WGS): de novo assembly (new species or new analyses)
Curated bibliography of NGS methods (~100 methods) can be found at

12 Where to get NGS data?
Do your own experiment
Gene Expression Omnibus (GEO)
Sequence Read Archive (SRA)
European Nucleotide Archive (ENA)
The Cancer Genome Atlas (TCGA)
Exome Aggregation Consortium (ExAC)
You also have to upload your data!

13 How to analyze NGS data?
Ask a bioinformatician: you need to explain what you want, and for that you need to understand what can be done and how
Do it yourself:
Command line -> become a bioinformatician
Online wrappers -> simpler, but with file size limits
Example of a convenient online tool: Galaxy

14 ChIP-seq experiment workflow
1. Crosslink protein-DNA complexes in situ
2. Isolate nuclei and fragment DNA (sonication or digestion)
3. Immunoprecipitate with an antibody against the target nuclear protein and reverse crosslinks
4. Release DNA, prepare the sequencing library and submit for sequencing
Adapted from

15 ChIP-seq analysis workflow

16 NGS output after sequencing: .fastq files (FASTQ format)
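A FASTQ record spans four lines: an identifier starting with "@", the read sequence, a "+" separator, and a quality string with one character per base. A minimal Python sketch of reading such records is shown here; the two example reads are made up, and the Phred+33 quality offset is an assumption that matches current Illumina output but not all older files.

```python
import io
from typing import Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: list  # Phred scores, one per base


def read_fastq(handle, phred_offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, four lines at a time."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # "+" separator line, ignored
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        # Each quality character encodes one Phred score (assumed offset: 33).
        scores = [ord(ch) - phred_offset for ch in quality]
        yield FastqRecord(header[1:], sequence, scores)


# Hypothetical two-read example, not taken from the course data.
example = io.StringIO(
    "@read1\nGATTACA\n+\nIIIIIII\n"
    "@read2\nACGT\n+\n!!II\n"
)
records = list(read_fastq(example))
for rec in records:
    print(rec.name, rec.sequence, min(rec.qualities))
```

Reading from a stream rather than loading the whole file keeps memory use flat, which matters because real FASTQ files are very large.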

17 NGS data after mapping: .bed files (BED format)
Bowtie, BWA, ELAND, Novoalign, BLAST, ClustalW
TopHat (for RNA-seq)
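A BED file holds one region per line as tab-separated columns: chromosome, start and end, optionally followed by name, score and strand. The start is 0-based and the end is exclusive, so the region length is simply end minus start. A minimal Python sketch of parsing the three mandatory columns follows; the example line is hypothetical.

```python
from typing import NamedTuple


class BedInterval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive

    @property
    def length(self) -> int:
        return self.end - self.start


def parse_bed_line(line: str) -> BedInterval:
    """Parse the first three (mandatory) columns of a BED line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 tab-separated columns: {line!r}")
    return BedInterval(fields[0], int(fields[1]), int(fields[2]))


# Hypothetical mapped read: chromosome, start, end, name, score, strand.
interval = parse_bed_line("chr1\t100\t150\tread1\t0\t+\n")
print(interval.chrom, interval.start, interval.end, interval.length)
```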

18 Data view in genome browsers
UCSC Genome Browser (online)
IGV (install on a local computer)
Jung et al., NAR 2014

19 Peak shapes can be different
Park P. J., Nature Genetics, 2009

20 ChIP-seq: reads to peaks/regions
MACS2 (universal)
HOMER (universal)
SICER (histones)
PeakSeq
edgeR
CisGenome
Park P. J., Nature Genetics, 2009
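To illustrate the idea of going from reads to regions, here is a deliberately naive Python sketch: count read start positions in fixed-size bins, keep bins with at least a chosen number of reads, and merge neighbouring bins. This is not how the tools listed above work (they model the background statistically); the bin size, threshold and read positions are arbitrary assumptions.

```python
from collections import Counter


def call_enriched_regions(read_starts, bin_size=100, min_reads=3):
    """Return merged (start, end) regions whose bins hold >= min_reads reads."""
    counts = Counter(pos // bin_size for pos in read_starts)
    enriched_bins = sorted(b for b, n in counts.items() if n >= min_reads)
    regions = []
    for b in enriched_bins:
        start, end = b * bin_size, (b + 1) * bin_size
        if regions and regions[-1][1] == start:
            # Bin touches the previous region: extend it.
            regions[-1] = (regions[-1][0], end)
        else:
            regions.append((start, end))
    return regions


# Hypothetical read start positions on one chromosome.
read_starts = [105, 120, 150, 210, 230, 260, 290, 900, 1510, 1520, 1590]
regions = call_enriched_regions(read_starts)
print(regions)
```

With these toy values the two adjacent bins at 100-300 merge into one region, while the single read at position 900 is discarded as background.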

21 RNA-seq: reads to genes/regions
DESeq, edgeR, Cuffdiff

22 DNA methylation data
DMRcaller, Bismark

23 Intersecting genomic regions
BedTools (command line)
Galaxy (online)
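What "intersecting" means can be shown in a few lines of Python: two regions on the same chromosome overlap when each starts before the other ends, and the intersection runs from the larger start to the smaller end. The sketch below uses BED-style half-open coordinates and a simple all-against-all comparison; dedicated tools use much faster algorithms. The peak and promoter coordinates are made up.

```python
def intersect(regions_a, regions_b):
    """Return the overlapping parts of two lists of (chrom, start, end)."""
    hits = []
    for chrom_a, start_a, end_a in regions_a:
        for chrom_b, start_b, end_b in regions_b:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # non-empty overlap (end is exclusive)
                hits.append((chrom_a, start, end))
    return hits


# Hypothetical ChIP-seq peaks and promoter regions.
peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
promoters = [("chr1", 150, 250), ("chr2", 300, 400), ("chr1", 590, 700)]
overlaps = intersect(peaks, promoters)
print(overlaps)
```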

24 Genomic features are also regions Is ChIP-seq signal enriched there?
Mattout et al., Genome Biology, 2015

25 Let’s look at many similar regions
deepTools 2.0

26 ChIP-seq heat maps for all genes, scaled with respect to their start (TSS) and end (TES)
deepTools 2.0

27 Cluster heatmaps deepTools 2.0

28 Comparing cluster heatmaps between two cell conditions
NucTools

29 Histone modifications around TSS

30 NGS data integration

31 Different datasets in several tracks of a genome browser
5mC
Gifford et al., Cell 2013

32 Heat maps again: Signal from data 1 around regions in data 2
Here: Nucleosome occupancy around bound CTCF in mouse stem cells
Vainshtein et al., BMC Genomics 2017
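The computation behind such a heat map can be sketched in Python as follows: cut a window of the signal from data 1 around each site from data 2, stack the windows as rows (this is the heat map matrix), and average each column to get the aggregate profile. The per-base signal, site positions and flank size below are toy assumptions, not real data.

```python
def windows_around(signal, centres, flank):
    """Return one row of signal values per site, centred on that site."""
    rows = []
    for c in centres:
        if c - flank < 0 or c + flank >= len(signal):
            continue  # skip sites too close to the chromosome edge
        rows.append(signal[c - flank : c + flank + 1])
    return rows


def average_profile(rows):
    """Column-wise mean of the heat map matrix."""
    return [sum(col) / len(rows) for col in zip(*rows)]


# Toy per-base signal (data 1) and site positions (data 2).
signal = [0, 1, 2, 5, 2, 1, 0, 0, 1, 3, 6, 3, 1, 0]
centres = [3, 10, 13]
heatmap_rows = windows_around(signal, centres, flank=2)
profile = average_profile(heatmap_rows)
print(heatmap_rows)
print(profile)
```

The third site is dropped because its window would run past the end of the signal; real tools handle such edges in their own ways.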

33 Correlation analysis: any 2 datasets can be correlated
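One common way to correlate two datasets is to summarise both over the same set of bins or regions and compute the Pearson correlation coefficient of the paired values. A minimal Python sketch, assuming the two signals have already been averaged over identical bins (the numbers are invented):

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant signal")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical signals averaged over the same five genomic bins.
dataset_1 = [1, 2, 3, 4, 5]
dataset_2 = [2, 4, 6, 8, 10]
dataset_3 = [5, 4, 3, 2, 1]
r_same = pearson(dataset_1, dataset_2)
r_opposite = pearson(dataset_1, dataset_3)
print(r_same, r_opposite)
```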

34 Correlation of regulatory protein binding with gene expression
Pavlaki et al., 2016

35 Gene ontology (GO) analysis
DAVID, GOrilla, GREAT, Enrichr
Calo et al. (2015) Nature 518, 249–253
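GO enrichment asks whether a GO term appears in a gene list more often than expected by chance. A typical statistic is the hypergeometric test, sketched below in Python; the listed tools each use their own variants and corrections, so this shows only the principle. All gene counts are hypothetical.

```python
from math import comb


def hypergeom_pvalue(total_genes, annotated, selected, overlap):
    """P(at least `overlap` annotated genes among `selected` random genes)."""
    max_overlap = min(annotated, selected)
    favourable = sum(
        comb(annotated, k) * comb(total_genes - annotated, selected - k)
        for k in range(overlap, max_overlap + 1)
    )
    return favourable / comb(total_genes, selected)


# Hypothetical numbers: 20 genes in total, 5 carry the GO term,
# 5 genes are in our list, and 3 of those carry the term.
p_value = hypergeom_pvalue(total_genes=20, annotated=5, selected=5, overlap=3)
print(round(p_value, 4))
```

In practice many GO terms are tested at once, so the resulting p-values need a multiple-testing correction.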

36 Motif enrichment analysis
HOMER, MEME Pavlaki et al., 2016

37 Motif enrichment analysis
MEME-ChIP

38 Summary of typical analyses:
Differential peak calling
Differential gene expression
Intersection of different signals
Correlation of different signals
Motif sequence analysis
Gene Ontology analysis

39 Questions?

40 Computer cluster and Linux
NGS data are stored in very large text files.
NGS analysis is usually performed on a computer cluster using Linux.
Why Linux? Because it is free, open-source, and very stable. Plus historic reasons.
Linux likes working with large text files :)

41 WinSCP: Windows file manager

42 WinSCP: Windows file manager
genome.essex.ac.uk

43 WinSCP: Windows file manager

44 Putty: Linux command line

45 Putty: Linux command line
genome.essex.ac.uk

46 Putty: Linux command line

47 Putty: Linux command line

48 Learning Linux in 5 minutes
There are two options for your work in Linux:
Type your commands one by one in Putty
Write all commands in a file (a “bash file”), then execute this file, and all commands written there will be executed
We have prepared your bash files; you will just need to execute them

49 5 Linux commands you need
cd DirectoryName – change directory
less FileName – read file FileName
qsub FileName – execute bash file (submits it to the cluster queue)
qstat – check progress of jobs of all users
wc FileName – count lines, words and bytes in FileName (wc -l FileName counts lines only)
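Counting lines is useful because line counts translate directly into read or region counts: a FASTQ file has four lines per read and a BED file one line per region. For comparison with wc, here is a minimal Python sketch that counts lines without loading the file into memory; the temporary file and its contents are made up for the example.

```python
import os
import tempfile


def count_lines(path):
    """Count lines by streaming through the file, whatever its size."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


# Write a small hypothetical FASTQ file with two reads (8 lines).
with tempfile.NamedTemporaryFile("w", suffix=".fastq", delete=False) as tmp:
    tmp.write("@read1\nGATTACA\n+\nIIIIIII\n@read2\nACGT\n+\nIIII\n")
    fastq_path = tmp.name

n_lines = count_lines(fastq_path)
n_reads = n_lines // 4  # four lines per FASTQ record
os.remove(fastq_path)
print(n_lines, n_reads)
```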

50 Useful shortcuts
To copy/paste from Windows to Putty: copy with [CTRL]+[C], then right-click in Putty to paste
Anywhere in the command line in Putty: the [up] and [down] keys scroll through the command history
Auto-completion of file/directory names: <something-incomplete> [TAB]
When specifying a directory name:
".." (dot dot) refers to the parent directory
"~" (tilde) or "~/" refers to the home directory

51 Additional Linux hints
All commands, usernames, passwords, file and directory names in Linux are case sensitive.
File paths (locations of files) use “/”, not “\”, e.g. “/storage/projects/”.
Avoid using spaces in filenames.

52 Questions?

