Workshop on Microbiome and Health

Hands-on: Metagenomics data types, statistics and quality control
Esteban Pérez Wohlfeil & Oswaldo Trelles {estebanpw,
Computer Architecture Department, University of Malaga, Spain
Faculte des Sciences; Universite Sidi Mohamed Ben Abdellah, 2017

Global Agenda: contents and time distribution
AGENDA (1h 00m)
- Getting to know our Virtual Machine
- Interacting and exploring the metagenomic samples
- Quality control step using QTrim
- Running a sequence comparison with a reference database using BLAST

Getting to know the Virtual Machine
The provided VM runs Ubuntu and has several software packages already installed to facilitate the hands-on:
- BLAST suite: BLASTn, BLASTx, BLASTp, …
- MEGAN
- QTrim
- Trimmomatic
- METAGECKO
- EMBOSS toolkit
- RStudio with R 3.3.3
- Several scripts: FastQ to Fasta converter, spreadsheets, plotting tools, etc.

Log into the “metagenomics-pipeline” user with the password: student

Examples that we will use during the hands-on are located in /home/student/Documents/Example

These examples include:
- Folder 454: Lean_TS1.fastq, Obese_TS19.fastq
- Folder calc: spreadsheet to calculate differential abundance
- Folder database: a reference database containing several genomes commonly found in the human gastrointestinal system
- Folder results: empty folder to store processed files

Exploratory analysis

A metagenome can be seen as a long signal (often incomplete and noisy) that requires processing in order to detect anything significant. Although not mandatory, it is highly recommended to always take a look at our samples before starting a processing pipeline.
[Figure: diagram labelled FastQ, Fasta and ?]

In your Virtual Machine, start by opening a terminal: click on the black command prompt icon in the left tab.

We can execute commands in the terminal just as if we were double-clicking on programs. Let's first open one of our metagenomes using the less command. Do as follows: less <path to lean_TS1.fastq>
This will open the lean_TS1.fastq metagenome. Does it look OK? To navigate through the metagenome use the arrow keys. To exit, just press q.
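When looking at the file it helps to know that each read normally takes four lines: an identifier starting with @, the sequence, a + separator and one quality character per base. The following is a minimal Python sketch of reading such records, assuming that strict four-line layout. The function name read_fastq and the two sample reads are made up for illustration and are not part of the workshop scripts.

```python
import io


def read_fastq(handle):
    """Yield (identifier, sequence, quality) for each 4-line FASTQ record."""
    while True:
        header = handle.readline().strip()
        if not header:  # end of file (or a blank line) stops the iteration
            break
        sequence = handle.readline().strip()
        handle.readline()  # the '+' separator line is ignored
        quality = handle.readline().strip()
        if not header.startswith("@"):
            raise ValueError("record does not start with '@': " + header)
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ in " + header)
        yield header[1:], sequence, quality


# Made-up two-record sample standing in for the first lines of a real file.
SAMPLE = "@read1\nACGTAC\n+\nIIIIII\n@read2\nTTGA\n+\nII##\n"

for name, sequence, quality in read_fastq(io.StringIO(SAMPLE)):
    print(name, len(sequence), sequence)
```

On the VM, the io.StringIO object could be replaced by open() on the path to lean_TS1.fastq.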

(Skip this if you already know it) The terminal is a powerful tool to manage files. There are a few commands that always come in handy:

Command                                 Description
cp <file to copy> <destination>         Copies a file to another place
mv <file to move> <location to move>    Moves a file to another location
rm <file to delete>                     Deletes a file
less <file to read>                     Reads a text file in the terminal
ls                                      Displays the contents of the folder
pwd                                     Shows the current working directory
cd <folder to enter>                    Enters a folder. Use cd ../ to go back one level

Now we will check the distribution of read lengths to see that there are no outliers. First convert the sample from FastQ to Fasta, and then run the script exploratory.sh with the new Fasta file as its argument. This will generate a .png image with a histogram of the read-length distribution.
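The conversion step keeps the identifier and the sequence of every read and drops the quality line. The sketch below shows the idea in Python; it is not the converter script shipped in the VM, and the name fastq_to_fasta and the sample reads are assumptions made for illustration. It assumes four-line FastQ records.

```python
import io


def fastq_to_fasta(fastq_handle, fasta_handle):
    """Write each 4-line FASTQ record as a 2-line FASTA record; return the read count."""
    count = 0
    while True:
        header = fastq_handle.readline().strip()
        if not header:  # end of input
            break
        sequence = fastq_handle.readline().strip()
        fastq_handle.readline()  # '+' separator, ignored
        fastq_handle.readline()  # quality string, dropped in FASTA
        if not header.startswith("@"):
            raise ValueError("record does not start with '@': " + header)
        fasta_handle.write(">" + header[1:] + "\n" + sequence + "\n")
        count += 1
    return count


# Made-up sample reads; on the VM these would come from a .fastq file.
SAMPLE = "@read1\nACGTAC\n+\nIIIIII\n@read2\nTTGA\n+\nII##\n"

fasta = io.StringIO()
converted = fastq_to_fasta(io.StringIO(SAMPLE), fasta)
print(converted, "reads converted")
print(fasta.getvalue(), end="")
```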

Are there any outliers? Does it make sense taking into account the kind of sequencer it comes from? Will it be different after the quality control step?
[Figure: read-length histogram before QC]
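To answer the outlier question without the plot, read lengths can be binned and printed as a text histogram. This is a rough Python sketch and not the contents of exploratory.sh; the function names, the bin width of 50 and the sample sequences are assumptions chosen for illustration.

```python
import io
from collections import Counter


def read_lengths(fasta_handle):
    """Return the length of every sequence in a FASTA stream (multi-line records allowed)."""
    lengths = []
    current = None
    for line in fasta_handle:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def length_histogram(lengths, bin_width=50):
    """Count reads per length bin; keys are the lower bound of each bin."""
    counts = Counter((length // bin_width) * bin_width for length in lengths)
    return dict(sorted(counts.items()))


# Made-up FASTA sample with reads of 4, 60, 75 and 120 bases.
SAMPLE = ">a\nACGT\n>b\n" + "A" * 60 + "\n>c\n" + "C" * 75 + "\n>d\n" + "G" * 120 + "\n"

for start, count in length_histogram(read_lengths(io.StringIO(SAMPLE))).items():
    print(f"{start:>5}-{start + 49:<5} {'#' * count}")
```

Bins that hold very few reads far away from the bulk of the distribution are the candidates to look at as outliers.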

We can also check the average length, the number of reads and the maximum length by opening the file that was generated automatically.
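The three figures mentioned above are simple to derive from a list of read lengths. A small Python sketch follows; the function name summarize_lengths and the example lengths are hypothetical, and the layout of the file generated on the VM may differ.

```python
def summarize_lengths(lengths):
    """Return the number of reads, the average length and the maximum length."""
    if not lengths:
        return {"reads": 0, "average_length": 0.0, "max_length": 0}
    return {
        "reads": len(lengths),
        "average_length": sum(lengths) / len(lengths),
        "max_length": max(lengths),
    }


# Made-up read lengths for illustration.
print(summarize_lengths([100, 200, 300]))
```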

Quality control

Now we will perform the quality control step to trim and filter impurities in the samples. These range from adapters to errors introduced during the sequencing process. Let's filter and trim both samples: lean_TS1.fastq and obese_TS19.fastq. To do so, execute the following commands in the terminal:
python ~/Qtrim/QTrim_v1_1/QTrim_v1_1.py -m 26 -fastq $DATA/454/lean_TS1.fastq -o $DATA/454/lean_TS1.trimmed.fastq
And also:
python ~/Qtrim/QTrim_v1_1/QTrim_v1_1.py -m 26 -fastq $DATA/454/obese_TS19.fastq -o $DATA/454/obese_TS19.trimmed.fastq
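To get an intuition for what quality trimming does, the sketch below removes bases from the 3' end of a read until the mean quality of what remains reaches a threshold, and discards reads that end up too short. It is a simplified stand-in and not QTrim's actual algorithm. It assumes Phred+33 quality encoding and that -m acts as a mean-quality threshold (check QTrim's help to confirm); the function names and the min_length default are made up for illustration.

```python
def phred_scores(quality, offset=33):
    """Decode a quality string into Phred scores (Phred+33 encoding is assumed)."""
    return [ord(char) - offset for char in quality]


def trim_read(sequence, quality, min_mean_quality=26, min_length=50, offset=33):
    """Trim 3' bases until the mean quality reaches the threshold.

    Returns (sequence, quality) for the kept prefix, or None when the
    trimmed read is shorter than min_length.
    """
    scores = phred_scores(quality, offset)
    end = len(scores)
    total = sum(scores)
    while end > 0 and total / end < min_mean_quality:
        end -= 1
        total -= scores[end]  # drop the last remaining base
    if end < min_length:
        return None
    return sequence[:end], quality[:end]


# Made-up read: four high-quality bases ('I' = 40) followed by four poor ones ('#' = 2).
print(trim_read("ACGTACGT", "IIII####", min_length=4))
```

Raising min_mean_quality removes more bases per read, and lowering it keeps more, which is the trade-off to keep in mind when choosing the threshold.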

Remember to convert both trimmed files to Fasta format so we can run them in our pipeline. This will generate the Fasta files ready to be processed.

Now run the exploratory analysis on the new lean_TS1.trimmed.fasta file and compare the previous and new plots. Are there any outliers? Does it make sense taking into account the kind of sequencer it comes from? Is it any different after the quality control step?
[Figures: read-length histograms before QC and after QC]

Notes
- Quality control should be parametrized according to the library preparation used in sequencing and the sequencing instrument
- Strong biological knowledge is needed
- Still, a filtering process will usually improve quality
- The -m parameter can be adjusted for more/less filtering
