NGS Bioinformatics Workshop 2.2 Tutorial – Whole Genome Assembly Part I May 9th, 2012 IRMACS 10900 Facilitator: Richard Bruskiewich Adjunct Professor, MBB
Workflow for Today: (1) generate a synthetic NGS read data set; (2) genome assembly with ABySS, Velvet, and ALLPATHS-LG.
Generate synthetic NGS read data for assembly. Try out a new program called “ART”, from NIEHS: Huang W, Li L, Myers JR, Marth GT. 2012. ART: a next-generation sequencing read simulator. Bioinformatics 28(4):593-594. It is available as open source and as binary programs for 32- or 64-bit Windows, Mac and Linux: http://www.niehs.nih.gov/research/resources/software/art
Notes: the binary archive names are a bit strange; they are really .tar.gz files in disguise (run gunzip, then tar -xvf). The FASTQ sequence line is *lower case*, which some software (e.g. ABySS) does not expect.
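The gunzip-then-tar step from the note above can also be done from Python, which ignores the odd file name. This is a minimal sketch using only the standard library; the function name and the archive and directory names in the usage comment are hypothetical, not taken from the ART distribution.

```python
import tarfile
from pathlib import Path


def unpack_disguised_tgz(archive, dest):
    """Extract a gzip-compressed tar archive, whatever its file name is."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    # mode "r:gz" tells tarfile to gunzip first, regardless of the extension
    with tarfile.open(archive, mode="r:gz") as tar:
        names = tar.getnames()
        tar.extractall(dest)
    return names


# Hypothetical usage:
# unpack_disguised_tgz("art_binary_download", "art_bin")
```

Only unpack archives from a source you trust, since extraction writes whatever paths the archive contains.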
==============================================================================
ART (Q Version 1.3.6)
Copyright(c) 2008-2012, Weichun Huang, Jason Myers. All Rights Reserved.
==============================================================================
Paired-end Simulation
Total CPU time used: 2.48
Parameters used during run
    Read Length: 50
    Fold Coverage: 20X
    Mean Fragment Length: 200
    Standard Deviation: 10
    Profile Type: Combined
    ID Tag:
Quality Profile(s)
    First Read: EMP50R1 (built-in profile)
    Second Read: EMP50R2 (built-in profile)
Output files
    FASTQ Sequence Files:
        the 1st reads: Chloroplast1.fq
        the 2nd reads: Chloroplast2.fq
    ALN Alignment Files:
        the 1st reads: Chloroplast1.aln
        the 2nd reads: Chloroplast2.aln
    SAM Alignment File: Chloroplast.sam
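As a rough sanity check on a run like this, the expected number of reads follows from fold coverage = (number of reads × read length) / genome length. The sketch below uses the read length and coverage from the ART log. The genome length is a hypothetical round number, not a value from the slides, and the sketch assumes the fold coverage counts both reads of each pair.

```python
def expected_reads(genome_length, fold_coverage, read_length):
    """Approximate number of reads needed to reach the given fold coverage."""
    return fold_coverage * genome_length // read_length


genome_length = 150_000  # hypothetical genome size in bp (assumption)
reads = expected_reads(genome_length, fold_coverage=20, read_length=50)
pairs = reads // 2  # assumes coverage is shared between both mates
print(reads, pairs)  # prints: 60000 30000
```

Comparing this estimate with the line count of the FASTQ files (four lines per read) is a quick way to confirm the simulation did what was asked.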
Unfortunately… The ART program generates peculiar IDs (it doesn’t mark the paired-end reads) and lower-case sequence letters, which causes some headaches. So I wrote a small Python script to fix this:
    #!/usr/bin/python
    # Fixes the output of the ART program
    # art_illumina -i reference.fa -p -l 50 -f 20 -m 200 -s 10 -o outFile_prefix -sam
    from sys import stdin

    seq = False
    qual = False

    if __name__ == '__main__':
        for line in stdin:
            line = line.strip()
            if qual:
                # skip the quality line, to avoid treating rare quality
                # score lines that start with '@' as IDs
                qual = False
            elif line.startswith('+'):
                qual = True
            elif not seq and line.startswith('@'):
                # massage the ID
                # (the list indices on part1 and part2 are missing from the
                # next three lines as transcribed; split() returns a list,
                # so they must be restored before this will run)
                part1 = line.split('|')
                part2 = part1.split('-')
                line = part1+'_'+part2+'-'+part2+'/'+part2
                seq = True
            elif seq:
                # convert sequence all to upper case to avoid downstream confusion...
                line = line.upper()
                seq = False
            print line
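The script as transcribed calls split() on a list, so its ID-massaging step cannot run without the original list indices, and those depend on the exact ART ID layout. As an alternative sketch (not the author's method), the two fixes can be done without parsing the ID at all: append /1 or /2 according to which file is being processed, and upper-case the sequence line. It assumes strict four-line FASTQ records and Python 3; the function name and the mate argument are hypothetical.

```python
def fix_fastq(lines, mate):
    """Yield FASTQ lines with upper-case sequences and IDs ending in /<mate>.

    Assumes strict four-line records: ID, sequence, '+', quality.
    """
    suffix = "/%d" % mate
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        field = i % 4
        if field == 0:
            if not line.startswith("@"):
                raise ValueError("line %d: expected an '@' ID line" % (i + 1))
            # keep only the ID token and mark which mate it is
            read_id = line.split()[0]
            if not read_id.endswith(suffix):
                read_id += suffix
            line = read_id
        elif field == 1:
            # upper-case the bases so assemblers do not treat them as masked
            line = line.upper()
        # fields 2 ('+') and 3 (quality) pass through untouched
        yield line


example = ["@read1 some description\n", "acgtn\n", "+\n", "@IIII\n"]
for fixed in fix_fastq(example, mate=1):
    print(fixed)
```

Working by position within the record, rather than by what a line starts with, sidesteps the problem of quality lines that begin with '@'.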
Getting ABySS. Installation: on Ubuntu, sudo apt-get install abyss. Or visit BCGSC, download the tar.gz source, then run ./configure and make (possibly more up to date). Consider putting the abyss bin directory on your PATH. To test-run ABySS: abyss-pe k=25 name=test se=https://raw.github.com/dzerbino/velvet/master/data/test_reads.fa
Try our test PE read data set: abyss-pe name=Chloroplast31 k=31 ABYSS_OPTIONS=--no-trim-masked in='Chloroplast1.fastq Chloroplast2.fastq'
The --no-trim-masked option is needed because the default behaviour of ABySS is to trim lower-case letters in sequence (which designate identified vector sequences in 454 outputs). Try other k-mer sizes as well.
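To try several k-mer sizes systematically, the command line above can be generated once per k. This sketch only builds the command strings from the options already shown on the slide; the function name and the particular list of k values are hypothetical choices, not recommendations from the workshop.

```python
def abyss_commands(name_prefix, reads, kmers):
    """Build one abyss-pe command line per k-mer size."""
    read_list = " ".join(reads)
    commands = []
    for k in kmers:
        commands.append(
            "abyss-pe name=%s%d k=%d ABYSS_OPTIONS=--no-trim-masked in='%s'"
            % (name_prefix, k, k, read_list)
        )
    return commands


reads = ["Chloroplast1.fastq", "Chloroplast2.fastq"]
for cmd in abyss_commands("Chloroplast", reads, [21, 25, 31, 35, 41]):
    print(cmd)
```

Putting k in the name keeps the output files of each run separate, so the assemblies can be compared afterwards.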
For more info about ABySS: http://www.bcgsc.ca/platform/bioinfo/software/abyss An active mailing list is available to troubleshoot issues: firstname.lastname@example.org
Velvet: http://www.ebi.ac.uk/~zerbino/velvet/
Installation: download and unpack with tar -zxvf, run make, then sudo make install, and put the velvet directory on your $PATH.
Run velveth: velveth outputdir k_mer -fastq readfile
Run velvetg: velvetg outputdir -ins_length 200 -exp_cov 20
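The two Velvet steps share the output directory, so it can help to build both command lines together. This is a minimal sketch that fills in the templates above; the function name, directory name, and read file name are hypothetical. The odd-k check reflects Velvet's requirement that the hash length be odd.

```python
def velvet_commands(outputdir, k, readfile, ins_length=200, exp_cov=20):
    """Return the (velveth, velvetg) command lines for one k-mer size."""
    if k % 2 == 0:
        raise ValueError("Velvet expects an odd k-mer (hash) length")
    velveth = "velveth %s %d -fastq %s" % (outputdir, k, readfile)
    velvetg = "velvetg %s -ins_length %d -exp_cov %d" % (
        outputdir, ins_length, exp_cov)
    return velveth, velvetg


# Hypothetical directory and file names
for cmd in velvet_commands("velvet_k31", 31, "reads.fastq"):
    print(cmd)
```

The defaults of 200 and 20 mirror the mean fragment length and fold coverage used in the ART simulation.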
ALLPATHS-LG: http://www.broadinstitute.org/software/allpaths-lg/blog/
Installation: download and unpack with tar -zxvf, then run ./configure, make, and sudo make install.
Execute the program: PrepareAllPathsInputs.pl (needs some config files…), then RunAllPathsLG