NGS Bioinformatics Workshop 2.2 Tutorial – Whole Genome Assembly Part I May 9th, 2012 IRMACS 10900 Facilitator: Richard Bruskiewich Adjunct Professor,


NGS Bioinformatics Workshop 2.2 Tutorial – Whole Genome Assembly Part I May 9th, 2012 IRMACS Facilitator: Richard Bruskiewich Adjunct Professor, MBB

Workflow for Today
- Generate a synthetic NGS read data set
- Genome assembly
  - ABySS
  - Velvet
  - ALLPATHS-LG

Generate synthetic NGS read data for assembly
- Try out a new program called “ART” from Baylor College: Huang W, Li L, Myers JR, Marth GT. ART: a next-generation sequencing read simulator. Bioinformatics. 2012; 28(4):593-4
- Available as open source and as binary programs for 32- or 64-bit Windows, Mac and Linux
- Notes:
  - The binary archive names are a bit strange: each is really a .tar.gz in disguise (you need to run gunzip followed by tar -xvf)
  - The FASTQ sequence line is *lower case*, which some software (e.g. ABySS) does not expect
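The "gunzip, then tar" step can also be done in one pass from Python, because the standard `tarfile` module detects gzip compression from the file contents rather than from the file name. This is only a minimal sketch; the archive and directory names in the usage comment are hypothetical, not the real ART download names.

```python
import tarfile


def unpack_archive(archive_path, dest_dir="."):
    """Extract a (possibly gzip-compressed) tar archive, whatever its file name."""
    # Mode "r:*" asks tarfile to work out the compression from the file contents.
    with tarfile.open(archive_path, mode="r:*") as archive:
        names = archive.getnames()
        archive.extractall(dest_dir)
    return names


# Usage (hypothetical file and directory names):
# unpack_archive("art_download.tgz", "art")
```

The function returns the member names so you can see where the `art_illumina` binary ended up.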

Simulated Illumina Paired-End Reads
- Using the rice chloroplast genome (~134 kb):
    art_illumina -i Chloroplast.fasta -p -l 50 -f 20 -m 200 -s 10 -o Chloroplast -sam
- Generates files:
  - Chloroplast1.aln
  - Chloroplast1.fq
  - Chloroplast2.aln
  - Chloroplast2.fq
  - Chloroplast.sam
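As a sanity check on the simulation, the read length (-l 50) and fold coverage (-f 20) imply a rough read count for a ~134 kb genome. The sketch below is back-of-the-envelope arithmetic only. It assumes the fold coverage counts bases from both mates of each pair, which I have not verified against ART's own accounting, and it uses the approximate genome size from the slide.

```python
def expected_read_count(genome_length, read_length, fold_coverage):
    """Approximate number of reads needed for the requested fold coverage."""
    return round(fold_coverage * genome_length / read_length)


genome_length = 134_000  # approximate rice chloroplast genome size (~134 kb)
total_reads = expected_read_count(genome_length, read_length=50, fold_coverage=20)
read_pairs = total_reads // 2  # assumption: coverage counts bases from both mates

print(f"~{total_reads} reads in total, ~{read_pairs} per FASTQ file")
```

If the two .fq files differ wildly from this estimate (compare with `wc -l`, dividing by 4), the parameters were probably not what you intended.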

    ==============================================================================
    ART (Q Version 1.3.6)
    Copyright(c) , Weichun Huang, Jason Myers. All Rights Reserved.
    ==============================================================================
    Paired-end Simulation
    Total CPU time used: 2.48

    Parameters used during run
        Read Length: 50
        Fold Coverage: 20X
        Mean Fragment Length: 200
        Standard Deviation: 10
        Profile Type: Combined
        ID Tag:

    Quality Profile(s)
        First Read: EMP50R1 (built-in profile)
        Second Read: EMP50R2 (built-in profile)

    Output files
        FASTQ Sequence Files:
            the 1st reads: Chloroplast1.fq
            the 2nd reads: Chloroplast2.fq
        ALN Alignment Files:
            the 1st reads: Chloroplast1.aln
            the 2nd reads: Chloroplast2.aln
        SAM Alignment File:
            Chloroplast.sam

Unfortunately…
- The ART program generates peculiar IDs (it doesn't mark the paired-end reads…) and lower-case sequence letters, which causes some headaches…
- So, I wrote a small Python script to fix this…

    #!/usr/bin/env python3
    # Fixes the output of the ART program
    # art_illumina -i reference.fa -p -l 50 -f 20 -m 200 -s 10 -o outFile_prefix -sam
    # Reads FASTQ on stdin and writes the corrected FASTQ to stdout.
    from sys import stdin

    seq = False   # True when the next line is a sequence line
    qual = False  # True when the next line is a quality line

    if __name__ == '__main__':
        for line in stdin:
            line = line.strip()
            if qual:
                # pass the quality line through untouched, to avoid treating
                # rare quality score lines that start with '@' as IDs
                qual = False
            elif line.startswith('+'):
                qual = True
            elif not seq and line.startswith('@'):
                # massage the ID
                part1 = line.split('|')
                part2 = part1[1].split('-')
                line = part1[0] + '_' + part2[0] + '-' + part2[1] + '/' + part2[2]
                seq = True
            elif seq:
                # convert sequence all to upper case to avoid downstream confusion...
                line = line.upper()
                seq = False
            print(line)
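Before handing the fixed files to an assembler, it can be worth checking that the two problems really are gone. The sketch below is an illustration rather than part of the original script. It assumes strict four-line FASTQ records and read IDs ending in /1 or /2, and the example records are made up.

```python
import re

# An ID line is expected to look like "@something/1" or "@something/2".
PAIRED_ID = re.compile(r"^@\S+/[12]$")


def check_fastq(lines):
    """Return a list of problems found in FASTQ text given as an iterable of lines."""
    problems = []
    records = [line.rstrip("\n") for line in lines]
    # Walk the file four lines at a time: ID, sequence, '+', quality.
    for start in range(0, len(records) - 3, 4):
        read_id, sequence, separator, _quality = records[start:start + 4]
        record_number = start // 4 + 1
        if not PAIRED_ID.match(read_id):
            problems.append(f"record {record_number}: ID lacks /1 or /2 suffix")
        if sequence != sequence.upper():
            problems.append(f"record {record_number}: lower-case bases")
        if not separator.startswith("+"):
            problems.append(f"record {record_number}: missing '+' separator")
    return problems


# Hypothetical records: the first is well formed, the second is not.
example = [
    "@ref_chl-12/1", "ACGT", "+", "IIII",
    "@ref|chl-12-2", "acgt", "+", "IIII",
]
for problem in check_fastq(example):
    print(problem)
```

On a real file you would pass an open file handle, for example `check_fastq(open("Chloroplast1.fastq"))`; an empty list means neither problem was detected.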

Getting ABySS
- Installation:
  - For Ubuntu: sudo apt-get install abyss
  - Or visit BCGSC and download the tar.gz source, then configure and make (more up-to-date?)
- Perhaps put the abyss bin directory on your path…
- To test run ABySS:
    abyss-pe k=25 name=test se=  velvet/master/data/test_reads.fa

Try our test PE read data set
    abyss-pe name=Chloroplast31 k=31 ABYSS_OPTIONS=--no-trim-masked in='Chloroplast1.fastq Chloroplast2.fastq'
- The --no-trim-masked option is needed because the default behaviour of ABySS is to trim lower-case letters in the sequence (which designate identified vector sequences in 454 outputs…)
- Try with other k-mer sizes…
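Trying other k-mer sizes is easy to script. The sketch below only rebuilds the command shown above for a few values of k and prints it; set dry_run=False to actually run the assemblies. The k values and helper names are illustrative choices, and the read file names are the ones used on the slide.

```python
import shlex
import subprocess


def abyss_command(k, reads=("Chloroplast1.fastq", "Chloroplast2.fastq")):
    """Build the abyss-pe argument list for one k-mer size."""
    return [
        "abyss-pe",
        f"name=Chloroplast{k}",
        f"k={k}",
        "ABYSS_OPTIONS=--no-trim-masked",
        "in=" + " ".join(reads),  # one argument, so no shell quoting is needed
    ]


def run_sweep(k_values, dry_run=True):
    """Print (and optionally run) one assembly per k-mer size."""
    for k in k_values:
        command = abyss_command(k)
        print(shlex.join(command))  # shell-quoted form, safe to copy and paste
        if not dry_run:
            subprocess.run(command, check=True)


run_sweep([25, 31, 37])  # k values chosen for illustration only
```

Giving each run its own name= keeps the output files of the different k values apart.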

For more info about ABySS  Active list service to troubleshoot issues:

Velvet
- Download and tar -zxvf
- make
- sudo make install
- Put the velvet directory on your $PATH
- Run velveth:
    velveth outputdir k_mer -fastq readfile
- Run velvetg:
    velvetg outputdir -ins_length 200 -exp_cov 20
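A note on -exp_cov: the 20 above matches the 20X fold coverage requested from ART, but as I read the Velvet manual, expected coverage is expressed in k-mer terms, Ck = C * (L - k + 1) / L, where C is the nucleotide coverage and L the read length. The sketch below just evaluates that formula for our 50-base reads; the k values are illustrative, so treat the numbers as a starting point for experimentation rather than a recommendation.

```python
def kmer_coverage(nucleotide_coverage, read_length, k):
    """Convert nucleotide coverage C to k-mer coverage Ck = C * (L - k + 1) / L."""
    if k > read_length:
        raise ValueError("k must not exceed the read length")
    return nucleotide_coverage * (read_length - k + 1) / read_length


for k in (21, 25, 31):  # illustrative odd k-mer sizes
    print(k, kmer_coverage(20, 50, k))
```

The longer the k-mer relative to the read, the lower the k-mer coverage, so the value passed to -exp_cov may need lowering as k grows.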

ALLPATHS-LG
- Download and tar -zxvf
- ./configure
- make
- sudo make install
- Execute the program:
  - PrepareAllPathsInputs.pl  # needs some config files…
  - RunAllPathsLG