MCB 371/372 PHYLIP & Exercises 4/13/05 Peter Gogarten Office: BSP 404 phone: 860 486-4061,

Slides:



Advertisements
Similar presentations
Phylip PHYLIP (the PHYLogeny Inference Package) is a package of programs for inferring phylogenies (evolutionary trees). PHYLIP is the most widely-distributed.
Advertisements

Statistical Methods Lynne Stokes Department of Statistical Science Lecture 7: Introduction to SAS Programming Language.
ATPase dataset -> nj in figtree. ATPase dataset -> muscle -> phyml (with ASRV)– re-rooted.
MCB 372 Phylogenetic reconstruction PHYLIP Peter Gogarten Office: BSP 404 phone: ,
Computer lab exercises #8. Comments on projects worth sharing: 1.Use BLINK whenever possible. It can save a lot of waiting and greatly accelerates explorations.
Functions.
MCB 372 Trees Phylogenetic reconstruction PHYLIP Peter Gogarten Office: BSP 404 phone: ,
Phylip PHYLIP (the PHYLogeny Inference Package) is a package of programs for inferring phylogenies (evolutionary trees). PHYLIP is the most widely-distributed.
MCB 371/372 phyml & treepuzzle 4/18/05 Peter Gogarten Office: BSP 404 phone: ,
Input and output. What’s in PHYLIP Programs in PHYLIP allow to do parsimony, distance matrix, and likelihood methods, including bootstrapping and consensus.
Phylip PHYLIP (the PHYLogeny Inference Package) is a package of programs for inferring phylogenies (evolutionary trees). PHYLIP is the most widely-distributed.
Sequence alignment: Removing ambiguous positions: Generation of pseudosamples: Calculating and evaluating phylogenies: Comparing phylogenies: Comparing.
MCB 372 Phylogenetic reconstruction Peter Gogarten Office: BSP 404 phone: ,
Phylogenetic reconstruction Peter Gogarten Office: BSP 404 phone: ,
Sequence alignment: Removing ambiguous positions: Generation of pseudosamples: Calculating and evaluating phylogenies: Comparing phylogenies: Comparing.
MCB 5472 Phylogenetic reconstruction PHYLIP Peter Gogarten Office: BSP 404 phone: ,
MCB 5472 Sequence alignment Peter Gogarten Office: BSP 404 phone: ,
Phylogenetic reconstruction - How
Steps of the phylogenetic analysis
MCB 371/372 PHYLIP how to make sense out of a tree 4/11/05 Peter Gogarten Office: BSP 404 phone: ,
MCB 371/372 vi, perl, Sequence alignment, PHYLIP 4/6/05 Peter Gogarten Office: BSP 404 phone: ,
Trees – what might they mean? Calculating a tree is comparatively easy, figuring out what it might mean is much more difficult. If this is the probable.
MCB 371/372 Sequence alignment 3/30/05 Peter Gogarten Office: BSP 404 phone: ,
MCB 371/372 Sequence alignment Sequence space 4/4/05 Peter Gogarten Office: BSP 404 phone: ,
Branches, splits, bipartitions In a rooted tree: clades Mono-, Para-, polyphyletic groups, cladists and a natural taxonomy Terminology The term cladogram.
MCB 372 Sequence alignment Peter Gogarten Office: BSP 404
COBOL for the 21 st Century Stern, Stern, Ley Chapter 1 INTRODUCTION TO STRUCTURED PROGRAM DESIGN IN COBOL.
MCB5472 Computer methods in molecular evolution Lecture 3/31/2014.
Introduction to Array The fundamental unit of data in any MATLAB program is the array. 1. An array is a collection of data values organized into rows and.
MCB5472 Computer methods in molecular evolution Lecture 3/22/2014.
Using Macs and Unix Nancy Griffeth January 6, 2014 Funding for this workshop was provided by the program “Computational Modeling and Analysis of Complex.
Multiple Sequence Alignments and Phylogeny.  Within a protein sequence, some regions will be more conserved than others. As more conserved,
Chapter Seven Advanced Shell Programming. 2 Lesson A Developing a Fully Featured Program.
Phylogenetic analyses Kirsi Kostamo. The aim: To construct a visual representation (a tree) to describe the assumed evolution occurring between and among.
Structured COBOL Programming, Stern & Stern, 9th edition
Demo: Phylip Ziheng Yang Department of Biology, UCL.
IPC144 Introduction to Programming Using C Week 1 – Lesson 2
MCB 5472 Assignment #6: HMMER and using perl to perform repetitive tasks February 26, 2014.
1 Lab of COMP 406 Teaching Assistant: Pei-Yuan Zhou Contact: Lab 1: 12 Sep., 2014 Introduction of Matlab (I)
ATPase dataset -> nj in figtree. ATPase dataset -> muscle -> phyml (with ASRV)– re-rooted.
Phylogenetic Analysis. General comments on phylogenetics Phylogenetics is the branch of biology that deals with evolutionary relatedness Uses some measure.
Phylogenetic trees School B&I TCD Bioinformatics May 2010.
Chapter 1: Getting Started with MATLAB MATLAB for Scientist and Engineers Using Symbolic Toolbox.
Applied Bioinformatics Week 8 Jens Allmer. Practice I.
Getting Started with MATLAB 1. Fundamentals of MATLAB 2. Different Windows of MATLAB 1.
Trees – what might they mean? Calculating a tree is comparatively easy, figuring out what it might mean is much more difficult. If this is the probable.
Agenda Getting Started: Using Unix Unix Structure / Features Elements of the Unix Philosophy Unix Command Structure Command Line Editing Online Unix Command.
Copyright OpenHelix. No use or reproduction without express written consent1.
Chapter 3 MATLAB Fundamentals Introduction to MATLAB Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Algorithms  Problem: Write pseudocode for a program that keeps asking the user to input integers until the user enters zero, and then determines and outputs.
5 1 Data Files CGI/Perl Programming By Diane Zak.
11/25/2015Slide 1 Scripts are short programs that repeat sequences of SPSS commands. SPSS includes a computer language called Sax Basic for the creation.
Why do trees?. Phylogeny 101 OTUsoperational taxonomic units: species, populations, individuals Nodes internal (often ancestors) Nodes external (terminal,
ATPase dataset from last Friday Alignment clustal vs muscle Conserved part are aligned reproducibly.
MCB5472 Computer methods in molecular evolution Slides for comp lab 4/2/2014.
Applied Bioinformatics Week 8 Jens Allmer. Theory I.
BIF713 Introduction to Linux. Agenda Getting Started: Using Linux Unix and Linux - Structure / Features Elements of the Linux Philosophy Linux Command.
Phylip PHYLIP (the PHYLogeny Inference Package) is a package of programs for inferring phylogenies (evolutionary trees). PHYLIP is the most widely-distributed.
FILES AND EXCEPTIONS Topics Introduction to File Input and Output Using Loops to Process Files Processing Records Exceptions.
Phylogenetic reconstruction - How Distance analyses calculate pairwise distances (different distance measures, correction for multiple hits, correction.
Development Environment
Algorithms Problem: Write pseudocode for a program that keeps asking the user to input integers until the user enters zero, and then determines and outputs.
Multiple Alignment, Distance Estimation, and Phylogenetic Analysis
Phylogenetic reconstruction PHYLIP
Topics Introduction to File Input and Output
MCB 5472 Sequence alignment Peter Gogarten Office: BSP 404
MCB 5472 Intro to Trees Peter Gogarten Office: BSP 404
Amos Introduction In this tutorial, you will be briefly introduced to the student version of the SEM software known as Amos. You should download the current.
Input and Output Python3 Beginner #3.
Topics Introduction to File Input and Output
Presentation transcript:

MCB 371/372 PHYLIP & Exercises 4/13/05 Peter Gogarten Office: BSP 404 phone: ,

input and output

Programs in PHYLIP are Modular For example: SEQBOOT take one set of aligned sequences and writes out a file containing bootstrap samples. PROTDIST takes a aligned sequences (one or many sets) and calculates distance matices (one or many) FITCH (or NEIGHBOR) calculate best fitting or neighbor joining trees from one or many distance matrices CONSENSE takes many trees and returns a consensus tree …. modules are available to draw trees as well, but often people use treeview or njplottreeview njplot

The Phylip Manual is an excellent source of information. Brief one line descriptions of the programs are herehere The easiest way to run PHYLIP programs is via a command line menu (similar to clustalw). The program is invoked through clicking on an icon, or by typing the program name at the command line. > seqboot > protpars > fitch If there is no file called infile the program responds with: gogarten]$ seqboot seqboot: can't find input file "infile" Please enter a new file name>

program folder

Example 1 Protpars example: seqboot, protpars, consense on infile1 NOTE the bootstrap majority consensus tree does not necessarily have the same topology as the “best tree” from the original data! threshold parsimony, gap symbols - versus ? (in vi you could use :%s/-/?/g to replace all – ?) outfile outtree compare to distance matrix analysis

protpars (versus distance/FM) Extended majority rule consensus tree CONSENSUS TREE: the numbers on the branches indicate the number of times the partition of the species into the two sets which are separated by that branch occurred among the trees, out of trees Prochloroc | | Synechococ | | Guillardia | | | | | Clostridiu | | | | | | | | | Thermoanae | | | | | Homo sapie | | | | | Oryza sati | | | | | Arabidopsi | | | | Synechocys | | | | | Nostoc pun | | | | | Nostoc sp | | | Trichodesm | Thermosyne remember: this is an unrooted tree! branches are scaled with respect to bootstrap support values, the number for the deepest branch is handeled incorrectly by njplot and treeview

(protpars versus) distance/FM Tree is scaled with respect to the estimated number of substitutions. what might be the explanation for the red algae not grouping with the plants? If time: demo of njplot

protdist PROTdist Settings for this run: P Use JTT, PMB, PAM, Kimura, categories model? Jones-Taylor-Thornton matrix G Gamma distribution of rates among positions? No C One category of substitution rates? Yes W Use weights for positions? No M Analyze multiple data sets? No I Input sequences interleaved? Yes 0 Terminal type (IBM PC, ANSI)? ANSI 1 Print out the data at start of run No 2 Print indications of progress of run Yes

without and with correction for ASRV

subtree with branch lengths without and with correction for ASRV

compare to trees with FITCH and clustalw – same dataset

bootstrap support ala clustalprotpars (gaps as ?)

Progress in Unix wget is a command that is implemented in most modern UNIX flavors (linux, darwin..) You execute it from the command line. For example: >wget will copy test1.fa from the web address into your current working directory. Using this in conjunction with (PC), (MAC), or (linux/darwin) this saves a lot of typing. (control c in UNIX terminates whatever program you are running. Therefore the copy shortcut usually is ) Hasan and Lina volunteered to do a short UNIX intro this afternoon in the lab.

perl assignment #1: On Monday we used the perl script extract_lines.pl. extract_lines.pl Modify the script so that it prints out an additional column that for each blasthit it alse gives the bitscore of the alignment divided by the alignment lengths. Hints: chomp ($line); removes the \n character at the end of $line $parts[i] contains the (i+1)s entry of the line. If $a[11] and $a[3] are variables in an array that contain numbers you can assign a new variable to the ratio using the command $ratio_of_values=$a[11]/$a[3]; To print things in a line in the output separated by tabs and a new line symbol at the end you could say for example: print OUT "$line\t$parts[12]\t$ratio\n“; Go over new*.pl in from_peter/temp/ on carrot One working program is here.here

perl assignment #2: Assume that you have the following non-aligned multiple sequence files in a directory: A.fa : vacuolar/archaeal ATPase catalytic subunits ; B.fa : vacuolar/archaeal ATPase non-catalytic subunits; alpha.fa : F-ATPases non-catalytic subunits, beta.fa : F-ATPases catalytic subunits, F.fa : ATPase involved in the assembly of the bacterial flagella. A.fa B.fa alpha.fa beta.fa F.fa Write a perl script that executes muscle or clustalw and 1) aligns the sequences within each file 2) successively calculates profile alignments between all aligned sequences. Hints: system (command) executes “command”as if you had typed command in the command line A working solution is herehere

questions and #why is the \ in \. necessary? if ($counter==0); #why would one “=“ fail? system("clustalw -profile1=result.out -profile2=$filename.aln -outfile=result.out -profile -align"); #Notes: the profile alignment is written back into the file from which profile1 originated. The file does not need to end on.aln -- but it is nice to keep the convention (see below). Perl replaces the variables ( $filename with their content e.g. beta ). Note the use of indentations to clarify the control structures. $counter=$counter+1 ; and $counter += 1 are equivalent = is an assignment operator. $counter=$counter+1 is not a mathematical equation

Perl assignment #3 Write a script that takes all phylip formated aligned multiple sequence files present in a directory, and perfomes a bootstrap analyses using maximum parsimony. I.e., the script should go through the same steps as we did in the exercises #4 tasks 1a and 1c Files you might want to use are A.fa, B.fa, alpha.fa, beta.fa, and atp_all.phy. BUT you first have to convert them to phylip format AND you should replace some or all gaps with ? (In the end you would be able to answer the question “does the resolution increase if a more related subgroup is analyzed independent from an outgroup?)

hints Rather than typing commands at the menu, you can write the responses that you would need to give via the keyboard into a file (e.g. your_input.txt) You could start and execute the program protpars by typing protpars < your_input.txt your input.txt might contain the following lines: infile1.txt r t 10 y r in the script you could use the line system (“protpars < your_input.txt”); The main problem are the owerwrite commands if the oufile and outtree files are already existing. You can either create these beforehand, or erase them by moving (mv) their contents somewhere else.