Bioinformatics for biologists. Dr. Habil Zare, PhD, PI of Oncinfo Lab, Assistant Professor, Department of Computer Science, Texas State University.

Presentation transcript:

Bioinformatics for biologists. Dr. Habil Zare, PhD, PI of Oncinfo Lab, Assistant Professor, Department of Computer Science, Texas State University. Presented at University of Texas, Health Science Center – San Antonio, 20 November 2015.

Part 1 - BioLinux - Mapping RNAseq data to transcriptome (Salmon)

Bioinformatics: Computational and statistical analysis of biological data. [Diagram labels: Data, Biologists, Results, Genotypes / Phenotypes] Bioinformatics for biologists workshop, Dr. Habil Zare, Oncinfo Lab UTHSC San Antonio, 20 Nov

In this workshop: A compact demo of bioinformatics analysis, starting from raw data and producing useful plots and a meaningful interpretation of the data. [Diagram labels: RNAseq, Biologists, Pathway and Network Analysis]

Goals of the workshop:
- A practical introduction to some basic bioinformatics tools for biologists.
- Hands-on experience with simple, toy-example data.

Bio-Linux: Bio-Linux is a free workstation platform that facilitates running hundreds of bioinformatics tools without the corresponding installation hassles. An easy way to install it on Mac OS X and Windows computers is described below:

Browsing files and folders: A file name ending in tar.gz denotes a compressed archive in Linux. Let’s practice decompressing such a file with an example. Follow the next steps in BioLinux.

Double-click on Bio-Linux Documentation to open it.

Double-click on Introductory Tutorial.

Click on File > New Tab.

Select the second tab and click on Home.

Drag and drop this file from the intro_course tab to the Home tab.

Right-click on the file and then click Extract Here…

This folder will appear. Open it and have a look inside.
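The decompression step above can also be scripted rather than done through the file browser. The following is a minimal sketch in Python, not part of the workshop material. The archive and folder names (example_course, notes.txt) are made up for illustration, and the sketch first builds a small tar.gz file so that it runs on its own.

```python
import tarfile
import tempfile
from pathlib import Path


def extract_tar_gz(archive_path, destination):
    """Extract a .tar.gz archive into destination and return the member names."""
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(destination)
    return names


# Build a toy archive so the example is self-contained.
# "example_course" and "notes.txt" are hypothetical names, not workshop files.
workdir = Path(tempfile.mkdtemp())
source_dir = workdir / "example_course"
source_dir.mkdir()
(source_dir / "notes.txt").write_text("hello\n")

archive_path = workdir / "example_course.tar.gz"
with tarfile.open(archive_path, "w:gz") as archive:
    archive.add(source_dir, arcname="example_course")

# Equivalent of "Extract Here": unpack the archive into a target folder.
home_dir = workdir / "home"
home_dir.mkdir()
extracted = extract_tar_gz(archive_path, home_dir)
print(extracted)
```

Only extract archives from sources you trust, since an archive can contain unexpected paths.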


Downloading and installing: Most useful bioinformatics tools are publicly available. You can download, install, and use them easily. Let’s practice with an example. Follow the next steps in BioLinux.

This is the “Dash”. Use it to launch and organize applications.

E.g., use “Firefox” to browse the web.

Type oncinfo.org in the address bar and press Enter.

From the right menu, click on the workshop link.

Click on “zipped” to download the folder.

Choose to save the file.

1- Click on the Files icon. 2- Click on Downloads. The file that you just downloaded was saved in the Downloads folder.

This is the file you just downloaded.

Extract (decompress) the file that you just downloaded: right-click on the file and then click Extract Here…

Salmon: Salmon, a successor of Sailfish, is a useful tool for mapping RNAseq data. It is faster and easier to run than alternatives such as TopHat.

Installing Salmon software: We will use a terminal to run a script provided in the zipped file. A terminal is an interface that uses only text to communicate between the user and the computer.

How to open a terminal? Click on the black rectangle to open a terminal.

Try a few simple Linux commands, e.g., echo, date, cal, …

Type “cd” followed by a space in the terminal to “change directory”.

Drag the folder to the terminal.

Now press Enter.

What is in the folder? Double-click on the folder to open it.

What is in the folder? Equivalently, “ls” shows you the list of files in this folder.

What is in the folder? This script will install Salmon for you.

How to run the script? Type the name of the script and then press Enter.

How to run the script? Type your password, which is “manager” by default.

How to make sure Salmon is installed? The script should download and install Salmon. Type “salmon -v” to test whether it is installed. The following test indicates that the installation was OK.
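The installation check can also be done from a script. The sketch below is an illustration only: the helper name tool_version is made up, and it assumes the tool accepts a --version flag (an assumption for Salmon; adjust the flag if your release expects something else). It reports a missing tool instead of failing.

```python
import shutil
import subprocess


def tool_version(program, version_flag="--version"):
    """Return the version text printed by a program, or None if it is not on PATH."""
    executable = shutil.which(program)
    if executable is None:
        return None
    result = subprocess.run(
        [executable, version_flag],
        capture_output=True,
        text=True,
        check=False,
    )
    # Some tools print the version to stderr rather than stdout.
    return (result.stdout or result.stderr).strip()


version = tool_version("salmon")
if version is None:
    print("salmon was not found on PATH; rerun the installation script")
else:
    print("salmon is installed:", version)
```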

Input for Salmon: 1- A FASTA file, which has the sequence information of the transcriptome of the species of interest. 2- One or more FASTQ files, which are produced by the sequencing instrument and contain the read information from the samples.

Toy examples of FASTA and FASTQ files: Open the sample_data folder.

Next generation sequencing: A sequencer produces millions of short reads ( bps). [Diagram labels: Biological sample, Sequencer, Short reads]

Toy example of a FASTQ file: Double-click on the reads_1.fastq file.

Toy example of a FASTQ file: This is a read of length 50 with nucleotide and (Phred) quality information.
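To make the FASTQ layout concrete, here is a small Python sketch that reads one record and decodes its quality string. It is not taken from the workshop files: the read below is a made-up 10-base toy record, and the Phred+33 encoding (quality = character code minus 33) is an assumption that holds for most current sequencers.

```python
def parse_fastq(text):
    """Yield (read_id, sequence, quality_string) for each 4-line FASTQ record."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    for start in range(0, len(lines), 4):
        record = lines[start:start + 4]
        if len(record) < 4:
            raise ValueError("incomplete FASTQ record")
        header, sequence, separator, quality = record
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:], sequence, quality


def phred_scores(quality, offset=33):
    """Convert a quality string to a list of Phred scores (Phred+33 assumed)."""
    return [ord(symbol) - offset for symbol in quality]


def error_probability(score):
    """Probability that a base call with this Phred score is wrong."""
    return 10 ** (-score / 10)


# A made-up record: 10 bases, nine high-quality calls and one poor one.
toy_fastq = """@toy_read_1
ACGTACGTAC
+
IIIIIIIII#
"""

read_id, sequence, quality = next(parse_fastq(toy_fastq))
scores = phred_scores(quality)
print(read_id, len(sequence), scores)
```

A Phred score of 40 corresponds to an error probability of 1 in 10,000, whereas a score of 2 means the base call is more likely wrong than right.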

Toy example of a FASTA file: Double-click on the transcripts.fasta file.

Toy example of a FASTA file: This is a transcript.

Toy example of a FASTA file: It is an mRNA with RefSeq ID NM_
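A FASTA file is simple enough to read with a few lines of code: each record starts with a ">" header line, followed by one or more sequence lines. The Python sketch below is an illustration with made-up record names and sequences; it does not use the workshop's transcripts.fasta.

```python
def parse_fasta(text):
    """Return a dict mapping each record ID to its full sequence."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("header line without an ID")
            # The ID is the first word of the header; the rest is a description.
            current_id = fields[0]
            records[current_id] = []
        elif current_id is None:
            raise ValueError("sequence data before the first header")
        else:
            records[current_id].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Made-up transcripts; real headers would carry IDs such as RefSeq accessions.
toy_fasta = """>toy_transcript_A example mRNA
ACGTACGT
ACGT
>toy_transcript_B
GGGCCC
"""

transcripts = parse_fasta(toy_fasta)
lengths = {name: len(seq) for name, seq in transcripts.items()}
print(lengths)
```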

More information on the transcript: Search in the NCBI database. Type the RefSeq ID, e.g., NM_

Visualize the transcript on the genome: Search in the UCSC Genome Browser (bin/hgGateway). Type the RefSeq ID, e.g., NM_

Visualize the transcript on the genome: This is the transcript.

Visualize the transcript on the genome: More information on this region is available.

Quantify the level of expression: The level of expression of each transcript can be quantified by counting the number of reads that are aligned to it.

Only exons are present in mRNA. [Diagram labels: exon 1, exon 2, exon 3, exon 4]

Alignment: Determines which transcript (i.e., where on the genome) each read originated from. [Diagram labels: Gene 1, Gene 2, short reads in a FASTQ file]

Alignment: Count the number of reads aligned (mapped) to each region. [Diagram labels: Gene 1, Gene 2]

Alignment: Compare the level of expression between genes. [Diagram labels: Gene 1, Gene 2, High expression, Low expression]
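The counting idea can be shown with a toy example. In the Python sketch below, the read names, gene names, and alignments are all made up; a real aligner would produce the read-to-gene assignments.

```python
from collections import Counter

# Made-up alignment results: each read is paired with the gene it aligned to.
toy_alignments = [
    ("read_1", "gene_1"),
    ("read_2", "gene_1"),
    ("read_3", "gene_1"),
    ("read_4", "gene_1"),
    ("read_5", "gene_1"),
    ("read_6", "gene_2"),
]

# Count how many reads were assigned to each gene.
read_counts = Counter(gene for _read, gene in toy_alignments)

for gene, count in read_counts.most_common():
    print(gene, count)
```

Raw counts like these favor long transcripts, which collect more reads at the same expression level, so counts are usually normalized before genes are compared.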

Quantifying expression from RNAseq data: Salmon processes raw data and quantifies expression levels in 2 steps. Step 1- Building an index for the transcriptome. Step 2- Aligning the reads to the transcriptome.

Are you in the right directory? Before you start, make sure you are in the correct directory. The pwd command in Linux shows the current directory: typing “pwd” and then pressing Enter will “print the working directory”, i.e., your current path. Always make sure that the files are stored where you expect them to be.
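The same check can be automated before a long analysis starts. The Python sketch below prints the working directory and lists any expected input files that are missing. The helper name missing_inputs is made up; the file names are the ones used on the slides, and the sketch assumes they should sit directly in the directory the commands are run from. The demo folder at the end is a temporary directory created only for illustration.

```python
import tempfile
from pathlib import Path


def missing_inputs(filenames, directory=None):
    """Return the names that are not files in directory (default: working directory)."""
    base = Path(directory) if directory is not None else Path.cwd()
    return [name for name in filenames if not (base / name).is_file()]


# Same information as the pwd command.
print("Working directory:", Path.cwd())

required = ["transcripts.fasta", "reads_1.fastq", "reads_2.fastq"]
missing = missing_inputs(required)
if missing:
    print("Not found here:", ", ".join(missing))
else:
    print("All input files are present.")

# Demonstration on a temporary folder that holds only one of the three files.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "transcripts.fasta").write_text(">toy\nACGT\n")
demo_missing = missing_inputs(required, demo_dir)
print(demo_missing)
```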

Step 1- Building an index for the transcriptome. Run the following command in the terminal in BioLinux:
salmon index -t transcripts.fasta -i transcripts_index --type fmd

Type the command here.

For now, ignore this warning.

The index is built.

Salmon created a new folder.

Step 2- Aligning the reads to the transcriptome. Run the following command in the terminal in BioLinux:
salmon quant -i transcripts_index -l IU -1 reads_1.fastq -2 reads_2.fastq -o transcripts_quanton

The parts of this command are:
salmon quant: the command
-i transcripts_index: the index built in step 1
-l IU: the library type of the reads
-1 reads_1.fastq: the first input file
-2 reads_2.fastq: the second input file
-o transcripts_quanton: the output folder

Salmon created a new folder and stored the results there.
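The two steps can be collected in a short script so that they are easy to rerun. The Python sketch below is an illustration under stated assumptions: the helper name run_commands is made up, the arguments are copied from the slides (including --type fmd, which other Salmon releases may not accept), and by default it only prints the commands instead of running them. Each option starts with a plain hyphen; a typographic dash pasted from a slide is not recognized by the shell.

```python
import shlex
import subprocess

# Step 1: build the index. Arguments copied from the slide.
index_cmd = [
    "salmon", "index",
    "-t", "transcripts.fasta",
    "-i", "transcripts_index",
    "--type", "fmd",
]

# Step 2: quantify the reads against the index. Arguments copied from the slide.
quant_cmd = [
    "salmon", "quant",
    "-i", "transcripts_index",
    "-l", "IU",
    "-1", "reads_1.fastq",
    "-2", "reads_2.fastq",
    "-o", "transcripts_quanton",
]


def run_commands(commands, dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            # check=True stops the script if a step fails.
            subprocess.run(command, check=True)


run_commands([index_cmd, quant_cmd])
```

Calling run_commands([index_cmd, quant_cmd], dry_run=False) from the folder that holds the input files would execute both steps in order.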

quant.sf is the main output file; it reports the number of reads and the expression level of each transcript. Double-click on it.

The names of the transcripts (RefSeq IDs) and their lengths are in the first 2 columns.

The number of mapped reads is reported in the last column.
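Once the layout of quant.sf is known, the table can be loaded for further analysis. The Python sketch below assumes a tab-separated file with a header row containing the columns Name, Length, TPM, and NumReads, possibly preceded by comment lines that start with "#". The exact layout depends on the Salmon version, so check your own file first. The table contents here are made up.

```python
def read_quant(text):
    """Parse quant.sf-style text into {transcript: {"Length", "TPM", "NumReads"}}."""
    header = None
    table = {}
    for raw_line in text.splitlines():
        fields = raw_line.lstrip("#").strip().split("\t")
        if header is None:
            # Skip everything up to and including the header row.
            if fields[0] == "Name":
                header = fields
            continue
        if raw_line.startswith("#") or not raw_line.strip():
            continue
        row = dict(zip(header, fields))
        table[row["Name"]] = {
            "Length": int(row["Length"]),
            "TPM": float(row["TPM"]),
            "NumReads": float(row["NumReads"]),
        }
    if header is None:
        raise ValueError("no header row starting with 'Name' was found")
    return table


# Made-up table in the assumed layout.
toy_quant = (
    "# toy comment line\n"
    "Name\tLength\tTPM\tNumReads\n"
    "toy_tx_A\t1000\t900000.0\t90\n"
    "toy_tx_B\t2000\t100000.0\t20\n"
)

table = read_quant(toy_quant)
most_expressed = max(table, key=lambda name: table[name]["TPM"])
print(most_expressed, table[most_expressed])
```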

Transcripts per million (TPM) is the estimated expression. TPM values correspond to counts normalized by the length of the transcripts and also by the depth of sequencing. There are other normalization methods such as RPKM and FPKM.
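The normalization behind TPM can be written in a few lines: divide each read count by the transcript length, then rescale the resulting rates so that they sum to one million. The Python sketch below uses made-up counts and lengths. It is a simplified illustration; tools such as Salmon estimate counts statistically and may use an effective length rather than the raw transcript length, so their reported values can differ.

```python
def tpm(read_counts, lengths):
    """Compute transcripts per million from read counts and transcript lengths."""
    # Reads per base: corrects for longer transcripts collecting more reads.
    rates = {name: read_counts[name] / lengths[name] for name in read_counts}
    total = sum(rates.values())
    if total == 0:
        return {name: 0.0 for name in rates}
    # Rescaling to one million corrects for sequencing depth.
    return {name: rate / total * 1_000_000 for name, rate in rates.items()}


# Made-up example: the same number of reads on a short and a long transcript.
toy_counts = {"tx_short": 100, "tx_long": 100}
toy_lengths = {"tx_short": 500, "tx_long": 2000}

toy_tpm = tpm(toy_counts, toy_lengths)
print(toy_tpm)
```

Both transcripts received 100 reads, but the short one ends up with four times the TPM of the long one because its reads are concentrated on a quarter of the length.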

This transcript is highly expressed. These transcripts have low expression.

References: Some of the slides are based on Introduction to Biolinux. Salmon is a useful tool for mapping and analyzing RNAseq data. I prepared these guidelines to facilitate the “Bioinformatics for biologists workshop”, 20 Nov 2015, UTHSC – San Antonio. Installing BioLinux using VM, Dr. Habil Zare, 27 Oct