1 Bioinformatics and Phylogenetic Analysis
  Edgar Scott
  Multicampus Bioinformatics Education Specialist
2 What is Bioinformatics?
  Interdisciplinary field that applies principles and techniques from computer science, probability and statistics, and linguistics to the study of genomic and proteomic sequences.
  Biological databases for storing and organizing DNA and protein sequences
  Computational tools for analyzing sequences
3 Phylogenetic Analysis and Bioinformatics
  Phylogenetics – study of evolutionary relationships
  Phylogenetic trees are used to represent evolutionary relationships
  Protein or DNA sequences, rather than morphological characters, are used to detect relationships
  Bioinformatics provides both sequence repositories and sequence analysis software.
4 Overview
  Acquiring Data Set
    Text searching at the National Center for Biotechnology Information (NCBI)
    Sequence similarity and homology
    Sequence similarity searching with Basic Local Alignment Search Tool (BLAST)
  Analyzing Data Set
    Phylogenetic Analysis with Molecular Evolutionary Genetics Analysis (MEGA) 3.1 software
      Build multiple sequence alignments of sequences using ClustalW
      Build phylogenetic trees
5 Text Searching at NCBI
  NCBI provides molecular information and bioinformatic tools to the scientific community
  GenBank – an archival DNA and protein sequence database
  RefSeq – a curated DNA and protein sequence database
  Entrez Gene – a gene centered database
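
A text search like the ones described on this slide can also be expressed as a URL. The sketch below assumes NCBI's Entrez Programming Utilities (E-utilities) ESearch endpoint, which the slides themselves do not mention; the helper name build_esearch_url and the example query are hypothetical. It only builds the URL and sends no request.

```python
from urllib.parse import urlencode

# Assumption: the public E-utilities ESearch endpoint (not named in the slides).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(database, term, retmax=20):
    """Return an ESearch URL for a text query; no network request is made."""
    params = {"db": database, "term": term, "retmax": retmax}
    return EUTILS_ESEARCH + "?" + urlencode(params)


# Hypothetical query: protein records for a gene name within a taxonomic group.
url = build_esearch_url("protein", "cytochrome b[Gene Name] AND Mammalia[Organism]")
print(url)
```

Opening the printed URL in a browser should return an XML list of matching record identifiers, which can then be used to retrieve the sequences for the data set.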
6 Sequence Similarity and Homology
  Homologs – sequences that share a common ancestral sequence
    Paralogs – arise via gene duplication
    Orthologs – arise via speciation events
    Xenologs – arise via horizontal gene transfer
  Evolutionarily related sequences tend to be similar.
  Sequence differences correspond to the amount of change that has occurred since the sequences last shared a common ancestral sequence.
7 Sequence Alignments
  Sequence Alignment – a process that identifies a series of characters or character patterns that are in the same order in both sequences.
    Pairwise global alignment
    Pairwise local alignment
  Optimal alignment – an alignment between sequences in which the number of matching characters is maximized and the number of mismatching characters is minimized.
  Quantifying alignments
    Alignment score of the optimal alignment
    Percent identity scores
    Percent similarity scores
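
To make the idea of an optimal pairwise global alignment concrete, here is a minimal sketch using the Needleman-Wunsch dynamic programming method. The slides do not name a specific algorithm or scoring scheme, so the algorithm choice, the scores (match +1, mismatch -1, gap -2), and the function names are all assumptions made for illustration. Percent identity is computed here over the full alignment length; other conventions exist.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch sketch: return (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the table: each cell is the best of match/mismatch, or a gap in either sequence.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Identical, non-gap columns as a percentage of the alignment length."""
    matches = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


best_score, top, bottom = global_alignment("GATTACA", "GATACA")
print(best_score, top, bottom, percent_identity(top, bottom))
```

A local alignment (as reported by BLAST) differs mainly in that scores are not allowed to drop below zero and the traceback starts from the highest-scoring cell rather than the corner.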
8 Sequence Similarity Searching
  Basic Local Alignment Search Tool (BLAST)
    blastp, blastn, blastx, tblastn, and tblastx
    Local alignments are reported
  Expectation Value – the number of times an investigator can expect to find, by chance, an alignment whose score is as good as or better than the alignment score under consideration.
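
As a rough illustration of how the expectation value behaves, the sketch below uses the standard relationship between a normalized bit score and the size of the search space, E = m * n * 2^(-S'), where m is the query length and n is the total database length. The function name is hypothetical, and real BLAST output uses effective (edge-corrected) lengths, so values will not match BLAST reports exactly.

```python
def expect_value(bit_score, query_length, database_length):
    """Approximate E-value from a bit score: E = m * n * 2 ** (-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical numbers: a 300-residue query against a database of 10**9 residues.
for bits in (30, 40, 50):
    print(bits, expect_value(bits, 300, 10**9))
```

Each additional bit of score halves the expected number of chance hits, while a larger database raises it, which is why the same alignment can be significant in a small database and not in a large one.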
9 Steps to Build a Tree
  Build a multiple sequence alignment of the data set.
  Analyze the multiple sequence alignment using either distance based methods or character based methods.
10 Molecular Evolutionary Genetics Analysis (MEGA) 3.1
  Phylogenetic analysis program
  Constructs multiple sequence alignments using ClustalW
  Provides tree building methods
    Distance based methods
      UPGMA
      Neighbor-joining method
      Minimum Evolution
    Character based method
      Maximum Parsimony
  Provides a great help document!
11 Multiple Sequence Alignment
  Multiple Sequence Alignment – an alignment between three or more sequences.
  Computationally classified as NP-hard
  Programs
    ClustalW – fast, applies a progressive method
    T-Coffee – slower, applies an advanced progressive method
    Dialign – slow, applies an iterative method
    Combine – combines multiple sequence alignments
12 Tree Building Methods
  UPGMA, Neighbor-Joining, Minimum Evolution
    Distance based methods
    Analyze the multiple sequence alignment to calculate a distance matrix.
    Clustering algorithm analyzes the distance matrix to determine which sequences should be clustered.
  Maximum parsimony
    Character based method
    Analyze the multiple sequence alignment to create a tree whose tree length has been minimized.
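
The two distance based steps (distance matrix, then clustering) can be sketched as follows. This is an illustrative toy, not MEGA's implementation: the distance used is the simple proportion of differing sites (p-distance), the clustering is a bare-bones UPGMA that returns only the branching order without branch lengths, and the four aligned sequences are made up.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences (gap columns skipped)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(names, dist):
    """Cluster by repeatedly joining the closest pair; returns a nested tuple.

    dist maps frozenset({name1, name2}) to a distance.
    """
    clusters = {name: 1 for name in names}  # cluster label -> number of sequences
    d = dict(dist)
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        # Distance to the new cluster is the size-weighted average of the old distances.
        for other in clusters:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


# Made-up aligned sequences, for illustration only.
seqs = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "ACGTTCGTGT",
    "D": "ACGTTCGAGT",
}
dist = {frozenset(p): p_distance(seqs[p[0]], seqs[p[1]]) for p in combinations(seqs, 2)}
tree = upgma(list(seqs), dist)
print(tree)
```

Neighbor-joining and Minimum Evolution start from the same kind of distance matrix but choose which sequences to join by different criteria, while maximum parsimony works directly on the alignment columns instead of on distances.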
13 Tree Reliability
  Bootstrapping – method for assessing the reliability of trees.
  Steps
    The original data set is resampled with replacement many times (e.g. 1000).
    For each resampling, a tree is built.
    The trees created from the resampling iterations are compared to the original tree.
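
The bootstrap steps can be sketched as below, assuming the usual approach of resampling alignment columns with replacement. To keep the example short, a full tree builder is replaced by a hypothetical stand-in check (a_b_are_closest) that asks whether sequences A and B still group together in each replicate; the alignment is made up, and all function names are illustrative.

```python
import random


def resample_columns(alignment, rng):
    """Return a pseudo-replicate: alignment columns sampled with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def bootstrap_support(alignment, has_grouping, replicates=1000, seed=0):
    """Fraction of replicates in which has_grouping(replicate) is true."""
    rng = random.Random(seed)
    hits = sum(bool(has_grouping(resample_columns(alignment, rng)))
               for _ in range(replicates))
    return hits / replicates


def a_b_are_closest(aln):
    """Stand-in for 'build a tree and check for the A+B group' (an assumption)."""
    def diff(x, y):
        return sum(p != q for p, q in zip(aln[x], aln[y]))
    return diff("A", "B") < min(diff("A", "C"), diff("B", "C"))


# Made-up alignment, for illustration only.
alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "ACGTTCGTGT",
}
print(bootstrap_support(alignment, a_b_are_closest, replicates=1000))
```

In a real analysis the stand-in check would be replaced by building a tree from each replicate, and the support value would be reported for every branch of the original tree.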
14 Review
  Acquiring Data Set
    Text searching at the National Center for Biotechnology Information (NCBI)
    Sequence similarity and homology
    Sequence similarity searching with Basic Local Alignment Search Tool (BLAST)
  Analyzing Data Set
    Phylogenetic Analysis with Molecular Evolutionary Genetics Analysis (MEGA) 3.1 software
      Build multiple sequence alignments of sequences using ClustalW
      Build phylogenetic trees