BIOS816/VBMS818 Lecture 6 – Sequence Assembly Guoqing Lu Office: E115 Beadle Center Tel: (402) 472-4982 Website:


1 BIOS816/VBMS818 Lecture 6 – Sequence Assembly Guoqing Lu Office: E115 Beadle Center Tel: (402) 472-4982 Email: glu3@unl.edu Website: http://biocore.unl.edu

2 A Whole Genome Shotgun Sequencing Project NATURE August 2000 pp. 801.

3 Introduction to Sequence Assembly Sequence assembly –also known as fragment assembly –assembling DNA fragments (both text sequences and chromatograms) from automated sequencers, into longer contiguous sequences or “contigs”

4 Introduction to Sequence Assembly Raw sequence data from the sequencer in the form of graphical trace files Viewed and converted into textual sequence files Align fragments and create assemblies Note that –Not all bases can be read correctly –Not all bases are equally reliable –Current sequencing methods allow reading of ~1000 bases per gel –Vector contamination

5 Available Sequence Assembly Systems GCG Fragment Assembly Package VNTI ContigExpress Staden GAP4 Phred/phrap/consed TIGR Web Contig Assembly Program …

6 GCG Fragment Assembly Package Only works with text-based sequence files Does not work directly with automated sequencer trace files Can generate sequence files from trace files using FromTrace

7 A GCG Fragment Assembly Project Initializes a new project Incorporates individual sequence files into the project Automatic identification of overlaps and arrangement of ordered contigs Multiple sequence editor Presents the reader with a graphical representation

8 Sequence Data Import Must be in a supported format –GCG –FastA –Staden Enter in SeqEd Enter in GelEnter

9 New Project Create a new fragment assembly project –Creates a new set of directories and files –DO NOT alter these files and directories Example entries: vector sequences GB:M13mp18, GB:SynpBR322; restriction sites GAATTC (EcoRI), GGATCC (BamHI)

10 GelEnter Sequence editor Works like SeqEd For entering new fragments or importing fragments Existing fragments are modified with GelAssemble

11 GelMerge Finds overlaps between fragments and contigs Compares every fragment with every other fragment Settings determine the stringency necessary for an overlap ?

12 Calculating an Overlap Word Size (* 7 *) Stringency (* 0.80 *) –What fraction of words must match? Minimum overlap length (* 14 *) [Figure: Sequence 1 and Sequence 2 drawn as overlapping fragments, with positions 1, 125 and 200 marked]
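The three settings on this slide can be illustrated with a minimal Python sketch. It is one plausible reading of the parameters, not GelMerge's actual algorithm: it assumes the overlap is a suffix of one fragment against a prefix of the other, and that "stringency" is the fraction of same-position words of length `word_size` that match exactly. The function name `overlap_length` is hypothetical.

```python
def overlap_length(a, b, word_size=7, stringency=0.80, min_overlap=14):
    """Return the longest suffix(a)/prefix(b) overlap that passes, or 0.

    Assumption: an overlap passes when the fraction of identical
    same-position words is at least `stringency`.
    """
    if min_overlap < word_size:
        raise ValueError("min_overlap must be at least word_size")
    # Try the longest candidate overlap first, then shrink.
    for length in range(min(len(a), len(b)), min_overlap - 1, -1):
        tail, head = a[-length:], b[:length]
        n_words = length - word_size + 1
        matches = sum(
            tail[i:i + word_size] == head[i:i + word_size]
            for i in range(n_words)
        )
        if matches / n_words >= stringency:
            return length
    return 0


shared = "ACGTACGGTCAGTCAAGCTT"      # 20 bases common to both fragments
frag1 = "TTTTTTTTTT" + shared
frag2 = shared + "GGGGGGGGGG"
print(overlap_length(frag1, frag2))  # 20
```

Under this reading a single mismatch spoils up to `word_size` words, so short overlaps need to be nearly perfect while long overlaps tolerate a few errors.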

13 GelView Displays the structure of the fragments and contigs graphically Shows the current state of the fragment assembly project

14 ContigExpress A program for assembling DNA fragments (both text sequences and chromatograms) from automated sequencers, into longer contiguous sequences or “contigs”

15 Launch ContigExpress (CE) From the Start menu choose Programs | InforMax | Vector NTI Suite 8 | ContigExpress NOTE: CE can be launched from most other Vector NTI Suite applications Download Demo Projects, then open it

16 End Trimming By Sequence Characteristics With sequences highlighted, choose Edit | Trim Selected Fragment Ends Click Settings and review the options: –For 5’ End; For 3’ End; Leave all settings as the default, click OK and then click Calculate! Any regions meeting the trim criteria defined above will be in red and lowercase Click OK then right-click on the gray column heading bar and choose Columns Double-click each of Length, 3’ Trimmed bases and 5’ Trimmed bases Click OK

17 Trimming Using Phred Quality Values Select all fragments in the Project pane then right-click and choose Load phred quality values Click Quality Values If you have data with associated Phred quality values, navigate to the .qual file and click Open Click OK Once Phred data are imported, the scores may be used to trim sequence data Select sequences in the right-hand pane and choose Edit | Trim Selected Fragments Ends Using Phred QVs Review the Settings options: –Trim bases with QV less than: Select the threshold below which bases will be trimmed –Trim 5’/3’ bases: Specify which end(s) you wish to trim Click OK
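The two settings above (a QV threshold and a choice of ends) can be sketched in Python as follows. This is the simplest interpretation, trimming bases from each chosen end until the first base at or above the threshold; ContigExpress may apply a different rule. The function name `trim_by_quality` and the threshold of 20 are assumptions made for illustration.

```python
def trim_by_quality(seq, quals, threshold=20, trim_5=True, trim_3=True):
    """Trim low-quality bases from the ends of a read.

    Returns (trimmed_sequence, bases_trimmed_5prime, bases_trimmed_3prime).
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality list must be the same length")
    start, end = 0, len(seq)
    if trim_5:
        # Advance past leading bases whose QV is below the threshold.
        while start < end and quals[start] < threshold:
            start += 1
    if trim_3:
        # Retreat past trailing bases whose QV is below the threshold.
        while end > start and quals[end - 1] < threshold:
            end -= 1
    return seq[start:end], start, len(seq) - end


read = "ACGTACGTAC"
qvs = [5, 10, 30, 40, 40, 35, 30, 25, 12, 8]
print(trim_by_quality(read, qvs))  # ('GTACGT', 2, 2)
```

Low-quality bases in the interior of the read are left alone here; only the ends are trimmed, matching the "Trim 5’/3’ bases" option.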

18 Selecting Plasmid Regions for Vector Trimming From the Vector NTI Explorer, open the DNA molecule pUC19 Set the selection to 351 bp to 500 bp (to include the polylinker) Choose Tools | Send to | Polylinker to Contig Express Check Selection Only and Direct then click OK Name the file ‘pUC19 (351-500)direct.seq’ then click Save Repeat for the complement and name the file ‘pUC19 (351-500)comp.seq’

19 Trimming for Vector Contamination Highlight the sequences in the right-hand pane Choose Edit | Trim Selected Fragments For Vector Contamination… Click Settings In the Polylinker list, check the sequences defined earlier (pUC19 (351-500)direct and pUC19 (351-500)comp) Highlight the name pUC19 (351-500)direct, click Add REN Sites, choose Enzlist25.dat then click Open Click HindIII (it will change color from gray to blue) Repeat for pUC19 (351-500)comp Click OK then click Calculate! Any contaminated regions will be in red and lowercase Click OK

20 Calling Secondary Peaks With all 12 sequences highlighted, choose Edit | Call Secondary Peaks For Selected Fragments Review the settings (Allow Ns to be Replaced, Allow Edited Bases to be Replaced, Set Threshold) Click Unselect All Fragments Check Allow Ns to be replaced Check the box next to ONE4KANR in the left-hand pane (ensure this is the only fragment checked) and move the sliding bar to choose the threshold and observe the result in the sequence window. Choose 85%; the viewer will display secondary bases whose peaks are at least 85% as tall as the higher peak Click OK This tool can be used to resolve double peaks in a chromatogram
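The threshold rule can be sketched in Python as below. The sketch assumes that a qualifying secondary peak is reported as a two-base IUPAC ambiguity code; how ContigExpress actually records the call is not stated on the slide, and the function name `call_position` is hypothetical.

```python
# Standard IUPAC codes for two-base ambiguities.
IUPAC_PAIRS = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def call_position(primary_base, primary_height,
                  secondary_base, secondary_height, threshold=0.85):
    """Return an ambiguity code when the secondary peak is tall enough."""
    if primary_height <= 0 or primary_base == secondary_base:
        return primary_base
    if secondary_height / primary_height >= threshold:
        # Fall back to the primary call for bases outside A/C/G/T.
        return IUPAC_PAIRS.get(
            frozenset((primary_base, secondary_base)), primary_base
        )
    return primary_base


print(call_position("A", 1000, "G", 900))  # R (secondary is 90% of primary)
print(call_position("A", 1000, "G", 500))  # A (secondary is only 50%)
```

Lowering the threshold calls more positions as mixed, which is why the slide suggests watching the sequence window while moving the slider.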

21 Saving a Project Choose Project | Save As... and save the Project to your desktop as ‘Tutorial.cep’ Note: Tools such as BLAST Search and BioPlot are available from the menu bar of all the ContigExpress viewers

22 Assembly Setup From the Contig Express Project Window, choose Assemble | Assembly Setup –Contig Assembly Tab: Definition of various parameters such as length and % identity allowed for overlap –Alignment Tab: Define parameters for the alignments generated between fragments in contig creation (e.g. the score assigned to matching nucleotides or a mismatch). These are greyed out when using Linear Assembly –Algorithm Tab: Two algorithms are available –Light Settings Tab: Light contigs disregard chromatogram data and editing done on light contigs isn’t reflected in the original fragment sequences. Light contig assembly is preferred for assembling very large projects Leave all selections as the defaults and click OK

23 Pairwise Assembly Linear Assembly

24 Assembling Contigs From the List Pane on the right-hand side, highlight all 12 fragments Choose Assemble | Assemble Selected Fragments and click OK when the assembly is complete The Tree Pane on the left-hand side shows the Assembly (Assembly 1) Click the Content View icon to show the tree/branching of contigs Click the History View icon In the List pane, the arrows indicate whether a fragment was included (blue) or attempted but not included (gray) in the assembly Highlight the name of the Contig containing the most fragments (Contig1) in the List pane Click the Show Unassembled Fragments icon to deselect it and thus view only those fragments that are part of the contig. Click the icon again to return to the original view

25 Exporting the Contig Consensus Sequence from Vector NTI In the Contig Express Project Viewer, highlight the name Contig 2 in the List pane Right-click and choose Export Contig | To GenBank file Save the file to your desktop With Contig 2 still highlighted, choose Edit | Copy Return to the Vector NTI Explorer Choose Edit | Paste In the New DNA/RNA Molecule dialog box click OK (leave the name as Contig 2) In the Vector NTI Explorer, the Contig 2 molecule should now be present Double-click the name Contig 2 to open the Molecule Viewer The consensus sequence is now available for restriction mapping, editing, annotation and other analyses in Vector NTI

26 phred/phrap/consed Developed at the University of Washington –Phil Green (phrap) –Brent Ewing (phred) –David Gordon (consed) http://www.mbt.washington.edu/

27 Sequence Assembly PHRED –Base calling with quality scores PHRAP –Sequence Assembly CONSED –Assembly visualization/Editing

28 Quality Scores Phred assigns a quality value to each called base Phrap uses the quality value during automated assembly Consed displays the qualities in different shades
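For reference, a Phred quality value Q is related to the estimated probability P that the base call is wrong by Q = -10 log10(P). The short Python sketch below converts in both directions; the function names are illustrative only.

```python
import math


def phred_to_error_prob(q):
    """Estimated probability that a base with quality q is miscalled."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred quality value for an estimated error probability p."""
    return -10 * math.log10(p)


for q in (10, 20, 30, 40):
    print(q, phred_to_error_prob(q))
```

A quality of 20 therefore corresponds to about 1 error in 100 calls, and a quality of 30 to about 1 in 1000.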

29 Exercise Log onto your biocomp2 account Create a directory: $ mkdir sequenceAssembly Fetch all mu* sequence files to the directory –$ cd sequenceAssembly –$ fetch mu*.seq Add fetched sequences to seqlab working.list Highlight all mu*.seq sequence files and run GelStart, GelEnter, GelMerge, GelAssemble, GelView Export the fetched sequences to a file called mu.genbank in GenBank format Use ftp to transfer mu.genbank to your desktop computer Drag and drop the mu.genbank file onto the ContigExpress Project Window Select all fragments and choose Assemble | Assemble Selected Fragments

30 Answer the following questions Summarize the outcome of the assembly and compare the results generated from the two sequence assembly systems How many contigs resulted? What were the lengths of the contigs? What were the sequences of the contigs?

