
1 Bioinformatics for biologists. Dr. Habil Zare, PhD, PI of Oncinfo Lab, Assistant Professor, Department of Computer Science, Texas State University. Presented at University of Texas Health Science Center – San Antonio, 20 November 2015

2 Part 1 - BioLinux - Mapping RNAseq data to transcriptome (Salmon)

3 Bioinformatics: Computational and statistical analysis of biological data. [Diagram labels: Data, Biologists, Results, Genotypes / Phenotypes] Bioinformatics for biologists workshop, Dr. Habil Zare, Oncinfo Lab, UTHSC San Antonio, 20 Nov 2015

4 In this workshop: A compact demo of bioinformatics analysis, starting from raw data, to produce useful plots and a meaningful interpretation of the data. [Diagram labels: RNAseq, Biologists, Pathway and Network Analysis]

5 Goals of the workshop: - A practical introduction to some basic bioinformatics tools for biologists. - Hands-on experience with simple, toy-example data.

6 Bio-Linux: Bio-Linux is a free workstation platform that facilitates running hundreds of bioinformatics tools without the corresponding installation hassles. An easy way to install it on Mac OS X and Windows computers is described here: http://oncinfo.org/file/view/BioLinux_VM.pptx/564155065/BioLinux_VM.pptx

7 Browsing files and folders: A file ending in .tar.gz is a compressed archive in Linux. Let’s practice decompressing such a file with an example. Follow the next steps in BioLinux.

8 Double-click on Bio-Linux Documentation to open it.

9 Double-click on Introductory Tutorial.

10 Click on File > New Tab.

11 Select the second tab and click on Home.

12 Drag and drop this file from the intro_course tab to the Home tab.

13 Right-click on the file and then choose Extract Here…

14 This folder will appear. Open it and have a look inside.
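
The same extraction can also be done from a terminal with the tar command. The following is a minimal sketch, not part of the original slides: it builds its own toy archive in a temporary folder so that it can be run anywhere, and the names demo, notes.txt, and demo.tar.gz are hypothetical, not the files in the Bio-Linux tutorial.

```shell
# Build a toy archive so the example is self-contained (hypothetical names).
workdir=$(mktemp -d)
mkdir "$workdir/demo"
echo "hello" > "$workdir/demo/notes.txt"
tar -czf "$workdir/demo.tar.gz" -C "$workdir" demo
rm -r "$workdir/demo"

# List the contents without extracting (t = list, z = gzip, f = archive file).
tar -tzf "$workdir/demo.tar.gz"

# Extract (x) into the folder given by -C, then look at the extracted file.
mkdir "$workdir/extracted"
tar -xzf "$workdir/demo.tar.gz" -C "$workdir/extracted"
cat "$workdir/extracted/demo/notes.txt"
```

Listing the archive first is a cheap way to see what will be created before anything is written to disk.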


16 Downloading and installing: Most useful bioinformatics tools are publicly available. You can download, install, and use them easily. Let’s practice with an example. Follow the next steps in BioLinux.

17 This is the “Dash”. Use it to launch and organize applications.

18 E.g., use “Firefox” to browse the web.

19 Type oncinfo.org in the address bar and press Enter.

20 From the right menu, click on the workshop link.

21 Click on “zipped” to download the folder.

22 Choose to save the file.

23 1- Click on the Files icon. 2- Click on Downloads. The file that you just downloaded was saved in the Downloads folder.

24 This is the file you just downloaded.

25 Extract (decompress) the file that you just downloaded: right-click on the file and then choose Extract Here…

26 The file that you just downloaded is saved in the Downloads folder.

27 Salmon: Salmon, a successor of Sailfish, is a useful tool for mapping RNAseq data. It is faster and easier to run than alternatives such as TopHat.

28 Installing Salmon software: We will run a script provided in the zipped file using a terminal. A terminal is an interface that uses only text to communicate between the user and the computer.

29 How to open a terminal? Click on the black rectangle to open a terminal.

30 Try a few simple Linux commands, e.g., echo, date, cal, …
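
As an illustration of the commands named on this slide, here is a short sketch that can be typed line by line or run as a script. The greeting text is arbitrary, and the check around cal is an added precaution because that program is not installed on every Linux system.

```shell
# Print a line of text.
echo "Hello, Bio-Linux"

# Print the current date and time.
date

# Print a calendar of the current month, if the cal program is available.
if command -v cal >/dev/null 2>&1; then
    cal
fi
```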

31 Type “cd ” in the terminal to “change directory”.

32 Drag the folder to the terminal.

33 Now press Enter.

34 What is in the folder? Double-click on the folder to open it.

35 What is in the folder? Equivalently, “ls” shows you the list of files in this folder.
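
Typed out in full, the cd and ls steps look roughly like the sketch below. It creates its own temporary folder with two empty files so that it is runnable as-is; the names file_a.txt and file_b.txt are hypothetical and stand in for whatever the downloaded folder contains.

```shell
# Create a toy folder with two empty files (hypothetical names).
folder=$(mktemp -d)
touch "$folder/file_a.txt" "$folder/file_b.txt"

# Change directory; the quotes protect paths that contain spaces.
cd "$folder"

# List the files in the current folder, one per line.
ls -1
```

Dragging a folder onto the terminal simply pastes its path after cd, which saves typing and avoids spelling mistakes.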

36 What is in the folder? This script will install Salmon for you.

37 How to run the script? Type the name of the script and then press Enter.

38 How to run the script? Type your password, which is “manager” by default.

39 How to make sure Salmon is installed? The script should download and install Salmon. Type “salmon -v” to test whether it is installed. The following test indicates that the installation was OK.
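
A general way to test whether any program is installed is to ask the shell where it would find it. The sketch below is an addition to the slides; check_tool is a hypothetical helper name, and the approach only checks that a program is on the PATH, not that it works correctly.

```shell
# Report whether a program can be found on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found"
        return 1
    fi
}

# "|| true" keeps a script going even if salmon is missing.
check_tool salmon || true
```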

40 Input for Salmon: 1- A FASTA file, which has the sequence information of the transcriptome of the species of interest. 2- One or more FASTQ files, which are provided by the sequencer instrument and contain the read information from the samples.

41 Toy examples of FASTA and FASTQ files: Open the sample_data folder.

42 Next generation sequencing: A sequencer produces millions of short reads (50-200 bp). [Diagram: Biological sample, Sequencer, Short reads]

43 Toy example of a FASTQ file: Double-click on the reads_1.fastq file.

44 Toy example of a FASTQ file: This is a read of length 50 with nucleotide and (Phred) quality information.
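
To make the FASTQ layout concrete, the sketch below writes a toy file with two made-up reads (not the workshop's reads_1.fastq) and then counts the reads and computes the mean Phred quality of each one. It assumes the common Phred+33 encoding, in which the quality score of a base is the ASCII code of its quality character minus 33; older instruments used other offsets.

```shell
# Write a toy FASTQ file with two made-up reads. Each read takes 4 lines:
# @name, nucleotide sequence, a "+" separator, and one quality character per base.
fastq=$(mktemp)
cat > "$fastq" <<'EOF'
@read1
ACGTACGT
+
IIIIIIII
@read2
TTGGCCAA
+
IIII5555
EOF

# Each read occupies 4 lines, so the number of reads is the line count / 4.
n_reads=$(awk 'END { print NR / 4 }' "$fastq")
echo "reads: $n_reads"

# Mean Phred quality per read, assuming Phred+33 encoding.
mean_quals=$(awk '
    BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }
    NR % 4 == 0 {
        sum = 0
        for (j = 1; j <= length($0); j++) sum += ord[substr($0, j, 1)] - 33
        printf "%.1f\n", sum / length($0)
    }' "$fastq")
echo "$mean_quals"
```

In this toy file the character I stands for quality 40 and the character 5 for quality 20, so the second read has a lower mean quality than the first.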

45 Toy example of a FASTA file: Double-click on the transcripts.fasta file.

46 Toy example of a FASTA file: This is a transcript.

47 Toy example of a FASTA file: It is an mRNA with RefSeq ID NM_001168316.
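
A FASTA file can also be inspected from the terminal. The following sketch uses a toy file written on the spot; the sequences are made up and are not the real sequence of NM_001168316, and toy_transcript_2 is a hypothetical ID. It counts the transcripts and prints the length of each one.

```shell
# Write a toy FASTA file: a ">" header line followed by sequence lines.
fasta=$(mktemp)
cat > "$fasta" <<'EOF'
>NM_001168316
ACGTACGTACGT
GGCCTTAA
>toy_transcript_2
TTTTAAAACCCC
EOF

# Count transcripts: there is one header line per record.
n_transcripts=$(grep -c '^>' "$fasta")
echo "transcripts: $n_transcripts"

# Print each ID with its sequence length, summing over wrapped lines.
lengths=$(awk '
    /^>/ { if (id != "") print id, len; id = substr($1, 2); len = 0; next }
    { len += length($0) }
    END { if (id != "") print id, len }' "$fasta")
echo "$lengths"
```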

48 More information on the transcript: Search in the NCBI database, http://www.ncbi.nlm.nih.gov/nuccore/ Type the RefSeq ID, e.g., NM_001168316.

49 Visualize the transcript on the genome: Search in the UCSC Genome Browser, https://genome.ucsc.edu/cgi-bin/hgGateway Type the RefSeq ID, e.g., NM_001168316.

50 Visualize the transcript on the genome: This is the transcript.

51 Visualize the transcript on the genome: More information on this region is available.

52 Quantify the level of expression: The level of expression of each transcript can be quantified by counting the number of reads that are aligned to it.

53 Next generation sequencing: A sequencer produces millions of short reads (50-200 bp). [Diagram: Biological sample, Sequencer, Short reads]

54 Only exons are present in mRNA. [Diagram labels: exon 1, exon 2, exon 3, exon 4]

55 Alignment: Determines which transcript (where on the genome) each read originated from. [Diagram labels: Short reads in a FASTQ file, Gene 1, Gene 2]

56 Alignment: Determines which transcript (where on the genome) each read originated from. [Diagram labels: Short reads in a FASTQ file, Gene 1, Gene 2]

57 Alignment: Count the number of reads aligned (mapped) to each region. [Diagram labels: Gene 1, Gene 2]

58 Alignment: Compare the level of expression between genes. [Diagram labels: Gene 1, Gene 2, High expression, Low expression]
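
The counting idea can be sketched with a toy table that lists, for each read, the gene it was aligned to. The table format and the names read1 to read4, gene1, and gene2 are hypothetical; this is not the output format of Salmon, which additionally has to deal with reads that match more than one transcript.

```shell
# Toy alignment result: one line per read, "read name, gene it mapped to".
assignments=$(mktemp)
cat > "$assignments" <<'EOF'
read1 gene1
read2 gene1
read3 gene2
read4 gene1
EOF

# Count reads per gene, then sort so that the gene with most reads comes first.
counts=$(awk '{ n[$2]++ } END { for (g in n) print g, n[g] }' "$assignments" | sort -k2,2nr)
echo "$counts"
```

Here gene1 collects three reads and gene2 one read, which is the sense in which gene1 shows the higher expression in this toy example.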

59 Quantifying expression from RNAseq data: Salmon processes raw data and quantifies expression levels in 2 steps. Step 1- Building an index for the transcriptome. Step 2- Aligning the reads to the transcriptome. http://salmon.readthedocs.org/en/latest/salmon.html#using-salmon

60 Are you in the right directory? Before you start, make sure you are in the correct directory. The pwd command in Linux shows the current directory. Typing “pwd” and then pressing Enter will “print the working directory”, i.e., your current path.

61 Are you in the right directory? Always make sure that the files are stored where you expect them to be.
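
One way to combine both checks is sketched below: print the working directory, then confirm that the input files are present in it. The function check_inputs is a hypothetical helper, not part of the workshop material; the file names passed to it are the ones used in the Salmon commands of this workshop.

```shell
# Print the working directory.
pwd

# Check that each expected input file exists in the current directory.
check_inputs() {
    missing=0
    for f in "$@"; do
        if [ -f "$f" ]; then
            echo "found:   $f"
        else
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}

check_inputs transcripts.fasta reads_1.fastq reads_2.fastq \
    || echo "Use cd to move to the folder that holds the files."
```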

62 Step 1- Building an index for the transcriptome. Run the following command in the terminal in BioLinux: salmon index -t transcripts.fasta -i transcripts_index --type fmd

63 Type the command here.

64 For now, ignore this warning.

65 The index is built.

66 Salmon created a new folder.

67 Step 2- Aligning the reads to the transcriptome. Run the following command in the terminal in BioLinux: salmon quant -i transcripts_index -l IU -1 reads_1.fastq -2 reads_2.fastq -o transcripts_quanton

68 Step 2- Aligning the reads to the transcriptome. The parts of the command: salmon quant (the command), -i transcripts_index (the index built in step 1), -l IU (the library type), -1 reads_1.fastq (the first input file), -2 reads_2.fastq (the second input file), -o transcripts_quanton (the output folder).

69 Salmon created a new folder and stored the results there.

70 quant.sf is the main output file that reports the number of reads and expression. Double-click on it.

71 The names of the transcripts (RefSeq IDs) and their lengths are in the first 2 columns.

72 The number of mapped reads is reported in the last column.

73 Transcripts per million (TPM) is the estimated expression.

74 Transcripts per million (TPM) is the estimated expression. TPM values correspond to counts normalized by the length of transcripts and also the depth of sequencing. There are other normalization methods such as RPKM and FPKM.
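
The two normalizations can be illustrated with a small calculation. In the sketch below, each read count is first divided by the transcript length, and the resulting rates are then scaled so that they sum to one million. The table and the names tA, tB, and tC are made up, and this is the plain textbook formula; Salmon's own estimate is more refined (for example, it uses an effective length), so its numbers will generally differ.

```shell
# Toy table with columns: transcript name, length, number of mapped reads.
table=$(mktemp)
cat > "$table" <<'EOF'
tA 1000 100
tB 2000 100
tC 1000 300
EOF

# rate = reads / length; TPM = rate / (sum of all rates) * 1,000,000.
tpm=$(awk '
    { name[NR] = $1; rate[NR] = $3 / $2; total += rate[NR] }
    END { for (i = 1; i <= NR; i++) printf "%s %.0f\n", name[i], rate[i] / total * 1e6 }' "$table")
echo "$tpm"
```

tA and tB have the same number of reads, but tB is twice as long, so its TPM is half that of tA. This is the effect of normalizing by length.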

75 This transcript is highly expressed.

76 This transcript is highly expressed. These transcripts have low expression.
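
Instead of scanning the table by eye, the transcripts can be ranked by TPM in the terminal. The sketch below works on a toy tab-separated file with made-up values. It assumes that the first line holds the column names and that one of them is exactly TPM; the layout of a real quant.sf can differ between Salmon versions, so check the header of your own file first.

```shell
# Toy stand-in for quant.sf (tab-separated, made-up values).
quant=$(mktemp)
printf 'Name\tLength\tTPM\tNumReads\n' > "$quant"
printf 'tA\t1000\t222222\t100\n' >> "$quant"
printf 'tB\t2000\t111111\t100\n' >> "$quant"
printf 'tC\t1000\t666667\t300\n' >> "$quant"

# Find the TPM column from the header, then print name and TPM, highest first.
ranked=$(awk -F '\t' '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "TPM") col = i; next }
    col { print $1, $col }' "$quant" | sort -k2,2nr)
echo "$ranked"
```

Looking the column up by name rather than by position keeps the command usable if extra columns are present.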

77 References: Some of the slides are based on Introduction to Biolinux, http://nebc.nerc.ac.uk/downloads/courses/Bio-Linux/bl8_latest.pdf Salmon is a useful tool for mapping and analyzing RNAseq data, https://combine-lab.github.io/salmon/ I prepared these guidelines to facilitate the “Bioinformatics for biologists workshop”, 20 Nov 2015, UTHSC – San Antonio, http://oncinfo.org/Bioinformatics+for+biologist+workshop

