Summer Bioinformatics Workshop 2008 Sequence Alignments Chi-Cheng Lin, Ph.D. Associate Professor Department of Computer Science Winona State University.

Presentation transcript:

Summer Bioinformatics Workshop 2008 Sequence Alignments Chi-Cheng Lin, Ph.D. Associate Professor Department of Computer Science Winona State University – Rochester Center

Sequence Alignments
- Cornerstone of bioinformatics
- What is a sequence?
  - Nucleotide sequence
  - Amino acid sequence
- Pairwise and multiple sequence alignments
- What alignments can help with
  - Determine function of a newly discovered gene sequence
  - Determine evolutionary relationships among genes, proteins, and species
  - Predict structure and function of a protein

Why Align Sequences?
- The draft human genome is available
- Automated gene finding is possible
- Gene: AGTACGTATCGTATAGCGTAA
  - What does it do?
- One approach: Is there a similar gene in another species?
  - Align sequences with known genes
  - Find the gene with the “best” match

Visualization of Sequence Alignment
- Dot Plot
  - One of the simplest and oldest methods for sequence alignment
  - Visualization of regions of similarity
    - Assign one sequence to the horizontal axis
    - Assign the other to the vertical axis
    - Place a dot wherever the two sequences match
    - Diagonal lines mean adjacent regions of identity

A Simple Example
- Construct a simple dot plot for
    TAGTCGATG
    TGGTCATC
- The alignment is
    TAGTCGATG
    TGGTC-ATC
- The dot plot is
      T A G T C G A T G
    T *     *       *
    G     *     *     *
    G     *     *     *
    T *     *       *
    C         *
    A   *         *
    T *     *       *
    C         *
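
The dot plot for the two example sequences can also be built programmatically. The following is a minimal sketch in Python, not part of the original slides; the function name `dot_plot` is hypothetical, and it assumes the simplest rule of one dot per identical pair of residues, with no window or threshold filtering.

```python
def dot_plot(horizontal, vertical):
    """Return the dot plot as a list of text rows.

    The first row is the horizontal sequence; each later row starts with
    one residue of the vertical sequence, followed by '*' in every column
    where that residue equals the residue on the horizontal axis.
    """
    rows = ["  " + " ".join(horizontal)]
    for v in vertical:
        cells = ["*" if v == h else " " for h in horizontal]
        rows.append((v + " " + " ".join(cells)).rstrip())
    return rows


# Sequences taken from the slide.
plot = dot_plot("TAGTCGATG", "TGGTCATC")
print("\n".join(plot))
```

Reading the printed grid along its diagonals shows the two matching runs that the alignment TAGTCGATG / TGGTC-ATC places on either side of the gap.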

Genes Accumulate Mutations over Time
- Mistakes in gene replication or repair
  - Deletions, duplications
  - Insertions, inversions
  - Translocations
  - Point mutations
- Environmental factors
  - Radiation
  - Oxidation

Deletions
- Codon deletion:
    ACG ATA GCG TAT GTA TAG CCG…
  - Effect depends on the protein, position, etc.
  - Almost always deleterious
  - Sometimes lethal
- Frame shift mutation:
    ACG ATA GCG TAT GTA TAG CCG…
    ACG ATA GCG ATG TAT AGC CG?…
  - Almost always lethal
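
The difference between a codon deletion and a frame shift can be seen by re-reading the sequence in triplets after each kind of deletion. This is a small illustrative sketch, not from the original slides; the helper names `codons` and `delete_bases` are hypothetical, and the choice of the fourth codon (TAT) as the deleted one is an assumption made for the example.

```python
def codons(seq):
    """Split a nucleotide string into consecutive triplets (the last may be partial)."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def delete_bases(seq, start, count):
    """Return seq with `count` bases removed, beginning at 0-based index `start`."""
    return seq[:start] + seq[start + count:]


gene = "ACGATAGCGTATGTATAGCCG"  # ACG ATA GCG TAT GTA TAG CCG from the slide

# Assumption: remove the whole fourth codon (TAT). The reading frame is kept.
codon_deletion = delete_bases(gene, 9, 3)

# Remove only the first base of that codon. Every later codon changes.
frame_shift = delete_bases(gene, 9, 1)

print(" ".join(codons(gene)))
print(" ".join(codons(codon_deletion)))
print(" ".join(codons(frame_shift)))
```

After the single-base deletion, every downstream triplet differs from the original, which is why the slide treats frame shifts as far more damaging than the loss of one whole codon.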

Indels
- When comparing two genes, it is generally impossible to tell whether an indel is an insertion in one gene or a deletion in the other, unless ancestry is known:
    ACGTCTGATACGCCGTATCGTCTATCT
    ACGTCTGAT---CCGTATCGTCTATCT

The Genetic Code: Substitutions
- Substitutions are mutations accepted by natural selection.
  - Synonymous: CGC → CGA
  - Non-synonymous: GAU → GAA

Point Mutation Example: Sickle-cell Disease
- Wild-type hemoglobin
    DNA   3’----CTT----5’
    mRNA  5’----GAA----3’
    Normal hemoglobin: [Glu]
- Mutant hemoglobin
    DNA   3’----CAT----5’
    mRNA  5’----GUA----3’
    Mutant hemoglobin: [Val]
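
The sickle-cell example can be traced step by step: transcribe the DNA template strand into mRNA, then look up the codon. The sketch below is an illustration added here, not part of the slides. The names `transcribe` and `is_synonymous` are hypothetical, and `CODON_TABLE` is deliberately partial: it holds only the five codons that appear in these slides, not the full genetic code.

```python
# Base pairing from a DNA template strand to mRNA.
DNA_TO_MRNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Partial codon table: only the codons mentioned in the slides.
CODON_TABLE = {
    "GAA": "Glu",
    "GUA": "Val",
    "GAU": "Asp",
    "CGC": "Arg",
    "CGA": "Arg",
}


def transcribe(template_3_to_5):
    """Return the mRNA (read 5'->3') for a template strand written 3'->5'."""
    return "".join(DNA_TO_MRNA[base] for base in template_3_to_5)


def is_synonymous(codon_a, codon_b):
    """True if both mRNA codons code for the same amino acid."""
    return CODON_TABLE[codon_a] == CODON_TABLE[codon_b]


wild_type = transcribe("CTT")  # GAA
mutant = transcribe("CAT")     # GUA
print(wild_type, CODON_TABLE[wild_type])
print(mutant, CODON_TABLE[mutant])
print(is_synonymous("CGC", "CGA"))  # both Arg
print(is_synonymous("GAU", "GAA"))  # Asp vs Glu
```

A single changed base in the template (T to A) is enough to swap glutamic acid for valine, which is the point mutation the slide describes.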

Image credit: U.S. Department of Energy Human Genome Program,

Comparing Two Sequences
- Point mutations, easy:
    ACGTCTGATACGCCGTATAGTCTATCT
    ACGTCTGATTCGCCCTATCGTCTATCT
- Indels are difficult, must align sequences:
    ACGTCTGATACGCCGTATAGTCTATCT
    CTGATTCGCATCGTCTATCT

    ACGTCTGATACGCCGTATAGTCTATCT
    ----CTGATTCGC---ATCGTCTATCT
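
Point mutations are "easy" because two sequences of equal length can be compared position by position. A minimal sketch of that comparison follows; it is an added illustration, and the function name `mismatch_positions` is hypothetical. It deliberately refuses sequences of different lengths, since those need an alignment first.

```python
def mismatch_positions(seq_a, seq_b):
    """Return the 0-based positions where two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences differ in length; align them first")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# The point-mutation pair from the slide.
reference = "ACGTCTGATACGCCGTATAGTCTATCT"
variant = "ACGTCTGATTCGCCCTATCGTCTATCT"

positions = mismatch_positions(reference, variant)
for i in positions:
    print(i, reference[i], "->", variant[i])
```

Passing the shorter sequence CTGATTCGCATCGTCTATCT to the same function raises an error, which mirrors the slide's point that indels force an alignment step before any comparison.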

Scoring a Sequence Alignment
- Example
  - Match score: +1
  - Mismatch score: +0
  - Gap penalty: –1
    ACGTCTGATACGCCGTATAGTCTATCT
        ||||| |||   || ||||||||
    ----CTGATTCGC---ATCGTCTATCT
  - Matches: 18 × (+1)
  - Mismatches: 2 × 0
  - Gaps: 7 × (–1)
  - Score = 18 + 0 + (–7) = +11
- Various scoring schemes exist.
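
The score of the example alignment can be checked with a few lines of code. This is a sketch added for illustration, using the slide's scheme (match +1, mismatch 0, gap -1) as default parameters; the function name `score_alignment` is hypothetical, and each gap character is penalized independently, which is the simplest (linear) gap model.

```python
def score_alignment(top, bottom, match=1, mismatch=0, gap=-1):
    """Score two already-aligned, equal-length strings column by column.

    A column containing '-' counts as a gap; otherwise it is a match or
    a mismatch. Returns the total score and the per-category counts.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    counts = {"matches": 0, "mismatches": 0, "gaps": 0}
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            counts["gaps"] += 1
        elif a == b:
            counts["matches"] += 1
        else:
            counts["mismatches"] += 1
    total = (counts["matches"] * match
             + counts["mismatches"] * mismatch
             + counts["gaps"] * gap)
    return total, counts


score, counts = score_alignment(
    "ACGTCTGATACGCCGTATAGTCTATCT",
    "----CTGATTCGC---ATCGTCTATCT",
)
print(score, counts)
```

Changing the keyword arguments, for example `mismatch=-1`, shows how the same alignment scores under a different scheme.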

How can we find an optimal alignment?
- Finding the optimal alignment is computationally hard:
    ACGTCTGATACGCCGTATAGTCTATCT
    CTGAT---TCG-CATCGTC--T-ATCT
- There are ~888,000 possible ways to align the two sequences given above.
- Algorithms based on a technique called “dynamic programming” are used; they are outside the scope of this workshop.
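
The slide does not say how the figure of ~888,000 was obtained. One reading that reproduces it, offered here as an assumption, is to count the ways of placing the 7 gap characters of the shorter sequence (20 bases) among the 27 columns of the longer one, with no gaps allowed in the longer sequence.

```python
import math

long_length = 27   # ACGTCTGATACGCCGTATAGTCTATCT
short_length = 20  # CTGATTCGCATCGTCTATCT
gaps = long_length - short_length  # 7 gap characters to place

# Number of ways to choose which of the 27 columns hold a gap.
placements = math.comb(long_length, gaps)
print(placements)
```

Even for sequences this short, checking every placement one by one is wasteful, and the count grows rapidly with sequence length.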

Global and Local Alignments
- Global alignment: score the entire alignment
- Local alignment: find the best matching subsequence
- Why local sequence alignment?
  - Global alignment is useful only if the sequences to be aligned are very similar
  - Subsequence comparison between a DNA sequence and a genome
  - Identify
    - Conserved regions
    - Protein functional domains

Example
- Compare the two sequences:
    TTGACACCCTCCCAATT
    ACCCCAGGCTTTACACAG
- Global alignment (does it look good?)
    TTGACACCCTCC-CAATT
        ||  ||   ||
    ACCCCAGGCTTTACACAG
- Local alignment (does it look good?)
             TTGACACCCTCCCAATT
             || ||||
    ACCCCAGGCTTTACACAG
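
Although the workshop leaves dynamic programming out of scope, a short sketch may help show how global and local scoring differ. The code below is an added illustration, not the presenter's method: `global_score` follows the Needleman-Wunsch recurrence and `local_score` follows the Smith-Waterman recurrence, both returning only the best score rather than the alignment itself. The scoring values (match +1, mismatch -1, gap -1) are an assumption chosen so that local alignment is meaningful; they differ from the mismatch score of 0 used in the earlier scoring slide.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best score for aligning all of a with all of b (Needleman-Wunsch)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best score over all pairs of substrings of a and b (Smith-Waterman)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


seq1 = "TTGACACCCTCCCAATT"
seq2 = "ACCCCAGGCTTTACACAG"
print("global:", global_score(seq1, seq2))
print("local: ", local_score(seq1, seq2))
```

The only structural difference is the floor of 0 in the local version, which lets an alignment restart anywhere instead of paying for poorly matching flanks. Both functions keep just two rows of the table, so memory grows with the length of one sequence rather than the product of both.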

Where do we get sequences to work with?
- Biological databases
  - NCBI Entrez
- Wet labs
- Simulations
- Other people’s results
- On-line education resources
  - BEDROCK
- BLAST results