Next Generation Sequencing analysis

Next Generation Sequencing analysis, June 6th, 2017

Course instructors: Antonio Marco, Stuart Newman, Vladimir Teif

Course plan:
11.00-12.00: Introductory lecture
12.00-12.30: Lunch
12.30-14.00: ChIP-seq practical
14.15-16.00: RNA-seq practical
16.15-18.00: Integrative analysis

1st Generation Sequencing

Microarrays: Affymetrix microarrays

2nd (Next) Generation Sequencing: Illumina MiSeq

Microarrays and NGS are used for different purposes (source: http://www.genengnews.com/Contributor/ShawnCBakerPhD/5687/)

NGS methods and their applications. Example: chromatin domains (Hi-C). Figure adapted from http://www.scienceinschool.org

NGS data types:
RNA-seq, GRO-seq, CAGE, SAGE, CLIP-seq, Drop-seq: gene expression; non-coding RNA
ChIP-seq, MNase-seq, DNase-seq, ATAC-seq, etc.: protein binding; histone modifications; chromatin accessibility; nucleosome positioning
Bisulfite sequencing: DNA methylation
Hi-C, 3C, 4C, ChIA-PET, etc.: chromatin loops in 3D
Amplicon sequencing: targeted regions; phylogenomics; metagenomics
Whole Genome Sequencing (WGS): de novo assembly (new species or new analyses)
A curated bibliography of NGS methods (~100 methods) can be found at https://liorpachter.wordpress.com/seq/

Where to get NGS data?
Do your own experiment
Gene Expression Omnibus (GEO): https://www.ncbi.nlm.nih.gov/geo
Sequence Read Archive (SRA): https://www.ncbi.nlm.nih.gov/sra
European Nucleotide Archive (ENA): https://www.ebi.ac.uk/ena
The Cancer Genome Atlas (TCGA): https://tcga-data.nci.nih.gov/tcga
Exome Aggregation Consortium (ExAC): http://exac.broadinstitute.org/
Note: you also have to upload your own data!

How to analyze NGS data?
Ask a bioinformatician: you need to explain what you want, and for that you need to understand what can be done and how.
Do it yourself:
Command line -> become a bioinformatician
Online wrappers -> simpler, but with file size limits
Example of a convenient online tool: Galaxy, http://galaxy.essex.ac.uk/

ChIP-seq experiment workflow:
1. Crosslink protein-DNA complexes in situ
2. Isolate nuclei and fragment DNA (sonication or digestion)
3. Immunoprecipitate with an antibody against the target nuclear protein and reverse crosslinks
4. Release DNA, prepare the sequencing library and submit it for sequencing
Adapted from www.VisiScience.com

ChIP-seq analysis workflow (figure: www.utsouthwestern.edu/labs.bioinformatics-core/analysis/chip-seq.png)

NGS output after sequencing: .fastq files (FASTQ format)
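Each FASTQ record spans four lines: an "@" header, the sequence, a "+" separator, and one quality character per base. Below is a minimal Python sketch that reads records and decodes qualities. It assumes 4-line records (no line-wrapped sequences) and Phred+33 encoding. In that encoding Q = -10·log10(error probability), so Q40 means a 1-in-10,000 chance that the base call is wrong. The function names are illustrative only.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str      # read identifier (header line without the leading "@")
    sequence: str  # base calls
    quality: str   # one ASCII-encoded quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming exactly 4 lines per record."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode quality characters, assuming Phred+33 (Sanger / Illumina 1.8+) encoding."""
    return [ord(ch) - offset for ch in quality]


# Usage with a tiny in-memory example (a real file would be opened with open()).
example = "@read1\nGATTACA\n+\nIIIII#!\n"
records = list(read_fastq(io.StringIO(example)))
print(records[0].sequence)               # GATTACA
print(phred_scores(records[0].quality))  # [40, 40, 40, 40, 40, 2, 0]
```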

NGS data after mapping: .bed files (BED format); mapping tools usually write SAM/BAM files, which can be converted to BED
Aligners: Bowtie, BWA, ELAND, Novoalign, BLAST, ClustalW; TopHat (for RNA-seq)
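BED coordinates are 0-based and half-open: start is inclusive and end is exclusive. Genome browsers such as UCSC display 1-based, fully closed positions, which makes this a common source of off-by-one errors. Below is a minimal parsing sketch, assuming a plain tab-separated BED file with at least the three required columns. The `BedInterval` and `parse_bed` names are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class BedInterval(NamedTuple):
    chrom: str
    start: int      # 0-based, inclusive
    end: int        # 0-based, exclusive (half-open interval)
    name: str = ""  # optional 4th column

    @property
    def length(self) -> int:
        return self.end - self.start

    def ucsc_position(self) -> str:
        """1-based, fully closed coordinates, as displayed in the UCSC Genome Browser."""
        return f"{self.chrom}:{self.start + 1}-{self.end}"


def parse_bed(lines: Iterable[str]) -> Iterator[BedInterval]:
    """Parse tab-separated BED lines, skipping blank, comment, track and browser lines."""
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            raise ValueError(f"BED line needs chrom, start, end: {line!r}")
        name = fields[3] if len(fields) > 3 else ""
        yield BedInterval(fields[0], int(fields[1]), int(fields[2]), name)


bed_text = ["track name=reads", "chr1\t100\t150\tread_1", "chr2\t0\t36\tread_2"]
intervals = list(parse_bed(bed_text))
print(intervals[0].length)           # 50
print(intervals[0].ucsc_position())  # chr1:101-150
```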

Data view in genome browsers: UCSC Genome Browser (online), IGV (installed on a local computer) (Jung et al., NAR 2014)

Peak shapes can differ (Park P. J., Nature Genetics, 2009)

ChIP-seq: reads to peaks/regions. Peak callers: MACS2 (universal), HOMER (universal), SICER (histones), PeakSeq, edgeR, CisGenome (Park P. J., Nature Genetics, 2009)
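The core idea of peak calling can be illustrated with a toy caller. It counts reads in fixed windows and flags windows whose count is improbably high under a uniform Poisson background. This sketch is much simpler than MACS2, which also estimates fragment size and uses a local background. It is not how any of the listed tools actually work. The window size, p-value cutoff and read positions are hypothetical.

```python
import math
from collections import Counter


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam); terms are computed in log space and summed upward."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    total, i = 0.0, k
    while True:
        term = math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))  # P(X = i)
        total += term
        if i > lam and term <= total * 1e-17:  # past the mode and negligible: stop
            break
        i += 1
    return min(1.0, total)


def call_peaks(read_starts, genome_length, window=200, p_cutoff=1e-5):
    """Flag fixed windows whose read count is unlikely under a genome-wide
    background (Poisson with lambda = mean number of reads per window)."""
    n_windows = math.ceil(genome_length / window)
    lam = len(read_starts) / n_windows
    counts = Counter(pos // window for pos in read_starts)
    peaks = []
    for w, count in sorted(counts.items()):
        p = poisson_sf(count, lam)
        if p < p_cutoff:
            peaks.append((w * window, min((w + 1) * window, genome_length), count, p))
    return peaks


# Hypothetical data: one background read per window, plus 30 reads piled up at 1000-1029.
reads = [w * 200 + 10 for w in range(50)] + [1000 + i for i in range(30)]
for start, end, count, p in call_peaks(reads, genome_length=10_000):
    print(start, end, count, f"{p:.1e}")  # one window, 1000-1200 with 31 reads, tiny p
```

Real data also need a control (input) sample and multiple-testing correction, which this sketch leaves out.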

RNA-seq: reads to genes/regions. Tools: DESeq, edgeR, Cuffdiff
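Before comparing read counts between samples, the counts must be normalized for sequencing depth. The sketch below shows the median-of-ratios size factors associated with DESeq. For each gene, it takes the geometric mean of the counts across samples. Each sample's size factor is then the median of its counts divided by those means. This covers only the normalization step, not DESeq's statistical model. The gene counts are hypothetical.

```python
import math
from statistics import median


def size_factors(counts: dict[str, list[int]]) -> list[float]:
    """Median-of-ratios size factors.

    counts maps gene -> raw read counts, one per sample (same sample order for all genes).
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios: list[list[float]] = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c == 0 for c in row):
            continue  # the geometric mean would be 0, so such genes are skipped
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(ratios)) for ratios in log_ratios]


def normalize(counts: dict[str, list[int]], factors: list[float]) -> dict[str, list[float]]:
    """Divide each sample's counts by that sample's size factor."""
    return {gene: [c / f for c, f in zip(row, factors)] for gene, row in counts.items()}


# Hypothetical counts: sample 2 was simply sequenced twice as deeply as sample 1.
counts = {"geneA": [10, 20], "geneB": [100, 200], "geneC": [5, 10], "geneD": [0, 3]}
factors = size_factors(counts)
print([round(f, 3) for f in factors])                          # [0.707, 1.414]
print([round(x, 1) for x in normalize(counts, factors)["geneB"]])  # [141.4, 141.4]
```

After normalization, the depth difference disappears and geneB has the same value in both samples.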

DNA methylation data: DMRcaller, Bismark
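After bisulfite-read alignment, methylation at a CpG is usually summarized as methylated calls / (methylated + unmethylated calls). The sketch below assumes a Bismark-style coverage file with the columns chrom, start, end, methylation %, count methylated, count unmethylated. It pools the calls over a region, keeping only CpGs with enough reads. DMR callers such as DMRcaller additionally test the differences between conditions statistically, and this sketch does not. The coverage threshold and data are hypothetical.

```python
def parse_cov_line(line: str):
    """Parse one line of a coverage file with the assumed 6 tab-separated columns."""
    chrom, start, _end, _percent, meth, unmeth = line.rstrip("\r\n").split("\t")[:6]
    return chrom, int(start), int(meth), int(unmeth)


def region_methylation(sites, chrom, start, end, min_coverage=5):
    """Pooled methylation level of CpGs with start <= position <= end.

    Level = methylated calls / (methylated + unmethylated calls), using only CpGs
    covered by at least min_coverage reads; returns None if no CpG qualifies.
    """
    meth_sum = total_sum = 0
    for c, pos, meth, unmeth in sites:
        if c == chrom and start <= pos <= end and meth + unmeth >= min_coverage:
            meth_sum += meth
            total_sum += meth + unmeth
    return meth_sum / total_sum if total_sum else None


cov_lines = [
    "chr1\t100\t100\t80.0\t8\t2",
    "chr1\t150\t150\t25.0\t1\t3",   # only 4 reads: filtered out by min_coverage
    "chr1\t180\t180\t50.0\t5\t5",
    "chr2\t120\t120\t100.0\t9\t0",  # different chromosome
]
sites = [parse_cov_line(line) for line in cov_lines]
print(region_methylation(sites, "chr1", 90, 200))  # 0.65
```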

Intersecting genomic regions: BEDTools (command line), Galaxy (online)
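With half-open coordinates, two regions overlap when each one starts before the other ends. The sketch below reports each region of A once if it overlaps any region of B, similar in spirit to `bedtools intersect -u`. It uses a naive scan over all pairs, which is fine for illustration but not how BEDTools scales to genome-wide files. The region lists are hypothetical.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """Half-open intervals [start, end) overlap if each starts before the other ends."""
    return a_start < b_end and b_start < a_end


def intersect_any(a_regions, b_regions):
    """Regions of A (chrom, start, end) that overlap at least one region of B."""
    b_by_chrom = defaultdict(list)
    for chrom, start, end in b_regions:
        b_by_chrom[chrom].append((start, end))
    return [
        (chrom, start, end)
        for chrom, start, end in a_regions
        if any(overlaps(start, end, bs, be) for bs, be in b_by_chrom.get(chrom, []))
    ]


peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
promoters = [("chr1", 150, 300), ("chr2", 200, 400)]
print(intersect_any(peaks, promoters))  # [('chr1', 100, 200)]
```

The chr2 peak ends exactly where the chr2 promoter starts, so with half-open coordinates it does not count as an overlap.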

Genomic features are also regions: is the ChIP-seq signal enriched there? (Mattout et al., Genome Biology, 2015)
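A simple way to ask "is the signal enriched in these features?" is to compute a fold enrichment. This is the fraction of reads that fall inside the features, divided by the fraction of the genome the features cover. The sketch below assumes a single chromosome, read positions as integers and hypothetical feature coordinates. A value above 1 suggests enrichment. Assessing significance would still need a statistical test or a comparison with randomly placed regions.

```python
def merged_length(regions):
    """Total bp covered by (possibly overlapping) half-open regions on one chromosome."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(regions):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start  # close the previous merged block
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # extend the current merged block
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def fold_enrichment(read_positions, features, genome_length):
    """Observed fraction of reads in features / fraction of genome covered by features."""
    in_features = sum(
        any(start <= pos < end for start, end in features) for pos in read_positions
    )
    observed = in_features / len(read_positions)
    expected = merged_length(features) / genome_length
    return observed / expected


features = [(0, 1000), (500, 1500)]                              # overlapping: 1500 bp covered
reads = list(range(0, 1350, 30)) + list(range(2000, 7500, 100))  # 45 inside, 55 outside
fold = fold_enrichment(reads, features, genome_length=10_000)
print(round(fold, 2))  # 3.0
```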

Let’s look at many similar regions: deepTools 2.0, https://github.com/fidelram/deepTools/wiki/Visualizations

ChIP-seq heat maps for all genes, scaled with respect to their start (TSS) and end (TES). deepTools 2.0, https://github.com/fidelram/deepTools/wiki/Visualizations
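Genes have different lengths, so "scaled" heat maps rescale every gene body to the same number of bins. Each gene becomes one heat-map row with the TSS on the left and the TES on the right. Averaging the rows gives a metagene profile. The sketch below shows that idea, similar in spirit to deepTools' `computeMatrix scale-regions` mode but not its implementation. It leaves out the upstream and downstream flanks. It assumes a per-base signal list and hypothetical gene coordinates.

```python
def scale_region(signal, start, end, strand="+", n_bins=10):
    """Average per-base signal over [start, end) into n_bins equal-width bins,
    oriented 5' -> 3' (TSS first) by reversing minus-strand genes."""
    body = signal[start:end]
    if strand == "-":
        body = body[::-1]
    if not body:
        raise ValueError("empty region")
    bins = []
    for i in range(n_bins):
        lo = i * len(body) // n_bins
        hi = max((i + 1) * len(body) // n_bins, lo + 1)  # at least one base per bin
        chunk = body[lo:hi]
        bins.append(sum(chunk) / len(chunk))
    return bins


signal = [0.0] * 600
for a, b, value in [(100, 150, 1.0), (150, 200, 3.0),   # gene A, + strand
                    (300, 350, 3.0), (350, 400, 1.0),   # gene B, - strand
                    (500, 520, 1.0), (520, 540, 3.0)]:  # gene C, + strand, shorter
    for i in range(a, b):
        signal[i] = value
genes = [(100, 200, "+"), (300, 400, "-"), (500, 540, "+")]
matrix = [scale_region(signal, s, e, strand, n_bins=2) for s, e, strand in genes]
metagene = [sum(col) / len(matrix) for col in zip(*matrix)]
print(matrix)    # [[1.0, 3.0], [1.0, 3.0], [1.0, 3.0]]
print(metagene)  # [1.0, 3.0]
```

After orientation and scaling, all three genes show the same low-at-TSS, high-at-TES pattern despite their different lengths and strands.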

Cluster heatmaps: deepTools 2.0, https://github.com/fidelram/deepTools/wiki/Visualizations

Comparing cluster heatmaps between two cell conditions: NucTools, https://homeveg.github.io/nuctools/

Histone modifications around the TSS (source: http://www.ie-freiburg.mpg.de/bioinformaticsfac)

NGS data integration (image: http://determinedtosee.com/wp-content/uploads/2014/08/jigsaw-puzzle.jpg)

Different datasets in several tracks of a genome browser, e.g. 5mC (Gifford et al., Cell 2013)

Heat maps again: signal from dataset 1 around regions from dataset 2. Here: nucleosome occupancy around CTCF-bound sites in mouse stem cells (Vainshtein et al., BMC Genomics 2017)
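Plotting "signal from dataset 1 around regions from dataset 2" means cutting a fixed window of dataset-1 signal around each site from dataset 2. Minus-strand sites are flipped so that "downstream" always points right. Each window becomes a heat-map row, and the column averages give the aggregate profile. The sketch below shows this reference-point idea, similar in spirit to deepTools' `computeMatrix reference-point` mode. The per-base signal and the site list are hypothetical.

```python
def profile_around_sites(signal, sites, flank):
    """Signal in a +/- flank window around each site, oriented by strand.

    sites: list of (position, strand). Sites too close to the chromosome ends are skipped.
    Returns the per-site matrix (heat-map rows) and the column-wise average profile.
    """
    rows = []
    for pos, strand in sites:
        lo, hi = pos - flank, pos + flank + 1
        if lo < 0 or hi > len(signal):
            continue
        window = signal[lo:hi]
        rows.append(window[::-1] if strand == "-" else window)
    if not rows:
        raise ValueError("no site has a complete window")
    profile = [sum(col) / len(rows) for col in zip(*rows)]
    return rows, profile


signal = [1.0] * 100
signal[30], signal[31] = 0.0, 0.5  # dip at a "+" site, partial dip just downstream
signal[70], signal[69] = 0.0, 0.5  # same pattern at a "-" site (downstream = lower coordinate)
sites = [(30, "+"), (70, "-"), (1, "+")]  # the last site is too close to the start: skipped
rows, profile = profile_around_sites(signal, sites, flank=2)
print(len(rows))  # 2
print(profile)    # [1.0, 1.0, 0.0, 0.5, 1.0]
```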

Correlation analysis: any two datasets can be correlated (http://homer.salk.edu/homer/ngs/quantification.html)
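Two datasets can be compared by summarizing each one over the same genomic bins or regions and correlating the resulting vectors. The sketch below computes Pearson correlation, which measures linear agreement and is sensitive to a few very high bins. It also computes Spearman correlation, which is Pearson applied to ranks and captures any monotonic relationship. The binned values are hypothetical, and this is not HOMER's implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least 2 values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    return pearson(ranks(x), ranks(y))


# Hypothetical signal in 5 genomic bins for two datasets.
chip_a = [1, 2, 3, 4, 5]
chip_b = [1, 4, 9, 16, 25]
print(round(pearson(chip_a, chip_b), 3))  # 0.981
print(spearman(chip_a, chip_b))           # 1.0
```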

Correlation of regulatory protein binding with gene expression (Pavlaki et al., 2016)

Gene Ontology (GO) analysis: DAVID, GOrilla, GREAT, Enrichr (figure: Calo et al. (2015) Nature 518, 249–253)
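Many GO enrichment tests are built on the hypergeometric distribution. Suppose N genes are in the background, K of them carry a GO term, and n genes are in your list. The test asks how likely it is to see at least k annotated genes in the list by chance. The sketch below computes that single-term test exactly with `math.comb`. The gene identifiers are hypothetical, and the individual tools listed differ in their exact statistics. When thousands of terms are tested, the p-values also need multiple-testing correction.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K carry the GO term."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def go_term_enrichment(study_genes, population_genes, term_genes):
    """Return (overlap k, fold enrichment, hypergeometric p-value) for one GO term."""
    population = set(population_genes)
    study = set(study_genes) & population
    term = set(term_genes) & population
    N, K, n = len(population), len(term), len(study)
    k = len(study & term)
    fold = (k / n) / (K / N) if n and K else float("nan")
    return k, fold, hypergeom_sf(k, N, K, n)


population = [f"g{i}" for i in range(1, 21)]  # 20 background genes
term = [f"g{i}" for i in range(1, 6)]         # 5 genes annotated with the term
study = ["g1", "g2", "g3", "g10"]             # 4 genes of interest
k, fold, p = go_term_enrichment(study, population, term)
print(k, fold, round(p, 4))  # 3 3.0 0.032
```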

Motif enrichment analysis: HOMER, MEME (Pavlaki et al., 2016)

Motif enrichment analysis: MEME-ChIP
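Motif enrichment asks whether a motif occurs more often in target sequences (e.g. peak regions) than in background sequences. Real tools such as HOMER and MEME model motifs as position weight matrices and apply statistical tests. The simplified sketch below only scans both strands for an IUPAC consensus string and compares the fraction of sequences containing it. The GATA-like consensus WGATAR and all sequences are illustrative assumptions.

```python
import re

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def consensus_to_regex(consensus):
    """Turn an IUPAC consensus such as 'WGATAR' into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus.upper()))


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_motif(seq, pattern):
    """True if the motif occurs on either strand of seq."""
    seq = seq.upper()
    return bool(pattern.search(seq) or pattern.search(reverse_complement(seq)))


def motif_enrichment(consensus, targets, background):
    """Return (target hits, background hits, ratio of the fractions containing the motif)."""
    pattern = consensus_to_regex(consensus)
    hits_t = sum(has_motif(s, pattern) for s in targets)
    hits_b = sum(has_motif(s, pattern) for s in background)
    frac_t, frac_b = hits_t / len(targets), hits_b / len(background)
    return hits_t, hits_b, (frac_t / frac_b if frac_b else float("inf"))


targets = ["TTAGATAAGC", "CCTTATCTGG", "ACGTACGTAC", "GGTGATAGCC"]
background = ["ACGTACGTAC", "CCCCGGGGAA", "AAGATAGGTT", "TTTTTTTTTT"]
print(motif_enrichment("WGATAR", targets, background))  # (3, 1, 3.0)
```

The second target matches only on the reverse strand, which is why both strands are scanned.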

Summary of typical analyses:
Differential peak calling
Differential gene expression
Intersection of different signals
Correlation of different signals
Motif sequence analysis
Gene Ontology analysis

Questions?

Computer cluster and Linux
NGS data are stored in very large text files. NGS analysis is usually performed on a computer cluster running Linux. Why Linux? Because it is free, open-source, and very stable, plus there are historical reasons. Linux handles large text files well :)

WinSCP: Windows file manager, used to transfer files to and from genome.essex.ac.uk

PuTTY: Linux command line on genome.essex.ac.uk, opened from Windows

Learning Linux in 5 minutes
There are two options for your work in Linux:
1. Type your commands one by one in PuTTY.
2. Write all commands in a file (a “bash file”), then execute this file; all the commands written there will be executed.
We have prepared your bash files; you will just need to execute them.

5 Linux commands you need:
cd DirectoryName – change directory
less FileName – view (read) the file FileName
qsub FileName – submit the bash file FileName to the cluster job queue
qstat – check the progress of all users' jobs
wc -l FileName – count lines in FileName

Useful shortcuts:
To copy/paste from Windows to PuTTY: copy with [CTRL]+[C], then right-click in PuTTY to paste.
Anywhere on the command line in PuTTY: the [Up] and [Down] keys scroll through the command history.
Auto-completion of file/directory names: type <something-incomplete>, then press [TAB].
When specifying a directory name:
".." (dot dot) refers to the parent directory
"~" (tilde) or "~/" refers to the home directory

Additional Linux hints:
All commands, usernames, passwords, and file and directory names in Linux are case-sensitive.
File paths (locations of files) use “/”, not “\”, e.g. /storage/projects/
Avoid using spaces in filenames.

Questions?