# Summer Bioinformatics Workshop 2008 Sequence Alignments Chi-Cheng Lin, Ph.D. Associate Professor Department of Computer Science Winona State University.


Chi-Cheng Lin, Ph.D., Associate Professor, Department of Computer Science, Winona State University, Rochester Center, clin@winona.edu

## Sequence Alignments

- Cornerstone of bioinformatics
- What is a sequence?
  - Nucleotide sequence
  - Amino acid sequence
- Pairwise and multiple sequence alignments
- What alignments can help with
  - Determine the function of a newly discovered gene sequence
  - Determine evolutionary relationships among genes, proteins, and species
  - Predict the structure and function of a protein

## Why Align Sequences?

- The draft human genome is available
- Automated gene finding is possible
- Gene: AGTACGTATCGTATAGCGTAA
  - What does it do?
- One approach: Is there a similar gene in another species?
  - Align sequences with known genes
  - Find the gene with the “best” match

## Visualization of Sequence Alignment

- Dot plot
  - One of the simplest and oldest methods for sequence alignment
  - Visualization of regions of similarity
  - Assign one sequence to the horizontal axis
  - Assign the other to the vertical axis
  - Place a dot in every cell where the two residues match
  - Diagonal lines mean adjacent regions of identity

## A Simple Example

- Construct a simple dot plot for TAGTCGATG (horizontal axis) and TGGTCATC (vertical axis)

Dot plot:

      T A G T C G A T G
    T *     *       *
    G     *     *     *
    G     *     *     *
    T *     *       *
    C         *
    A   *         *
    T *     *       *
    C         *

The alignment is:

    TAGTCGATG
    TGGTC-ATC
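The dot plot construction described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function name `dot_plot` and the text layout are assumptions made for the example.

```python
def dot_plot(horizontal: str, vertical: str) -> list[str]:
    """Return one text row per residue of `vertical`; '*' marks a match."""
    rows = []
    for v in vertical:
        # One cell per residue on the horizontal axis.
        cells = "".join("*" if v == h else " " for h in horizontal)
        rows.append(f"{v} {cells}")
    return rows


if __name__ == "__main__":
    # The two sequences from the slide.
    print("  " + "TAGTCGATG")
    for row in dot_plot("TAGTCGATG", "TGGTCATC"):
        print(row)
```

Runs of stars along a diagonal correspond to the stretches that line up in the alignment; the break in the diagonal is where the gap falls.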

## Genes Accumulate Mutations over Time

- Mistakes in gene replication or repair
  - Deletions, duplications
  - Insertions, inversions
  - Translocations
  - Point mutations
- Environmental factors
  - Radiation
  - Oxidation

## Deletions

- Codon deletion: ACG ATA GCG TAT GTA TAG CCG…
  - Effect depends on the protein, position, etc.
  - Almost always deleterious
  - Sometimes lethal
- Frame shift mutation: deleting a single base shifts every codon after it, so ACG ATA GCG TAT GTA TAG CCG… becomes ACG ATA GCG ATG TAT AGC CG?…
  - Almost always lethal
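The difference between the two kinds of deletion can be checked with a short sketch. The helper `codons` is hypothetical, and which codon or base is deleted is an assumption: the fourth codon is used for the codon deletion, and the first T of the fourth codon for the frame shift, which reproduces the shifted sequence on the slide.

```python
def codons(seq: str) -> list[str]:
    """Split a sequence into consecutive triplets; the last one may be partial."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


original = "ACGATAGCGTATGTATAGCCG"

# Assumption: delete the whole fourth codon (TAT). The reading frame is kept.
codon_deleted = original[:9] + original[12:]

# Assumption: delete only the first base of the fourth codon. The frame shifts.
frame_shifted = original[:9] + original[10:]

print(" ".join(codons(original)))
print(" ".join(codons(codon_deleted)))
print(" ".join(codons(frame_shifted)))
```

After the codon deletion every remaining codon is unchanged; after the single-base deletion every codon downstream of the deletion is different.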

## Indels

When comparing two genes, it is generally impossible to tell whether an indel is an insertion in one gene or a deletion in the other, unless ancestry is known:

    ACGTCTGATACGCCGTATCGTCTATCT
    ACGTCTGAT---CCGTATCGTCTATCT

## The Genetic Code: Substitutions

Substitutions are mutations accepted by natural selection.

- Synonymous: CGC → CGA (both codons encode Arg)
- Non-synonymous: GAU → GAA (Asp becomes Glu)
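A substitution can be classified by translating the codon before and after the change. The sketch below is illustrative only: the table `CODON_TO_AA` holds just the four codons from the slide's examples (a real program would need all 64), and the function name is an assumption.

```python
# Partial codon table (RNA alphabet): only the codons used in the slide's examples.
CODON_TO_AA = {"CGC": "Arg", "CGA": "Arg", "GAU": "Asp", "GAA": "Glu"}


def classify_substitution(before: str, after: str) -> str:
    """Compare the amino acids encoded by two codons."""
    same = CODON_TO_AA[before] == CODON_TO_AA[after]
    return "synonymous" if same else "non-synonymous"


print(classify_substitution("CGC", "CGA"))
print(classify_substitution("GAU", "GAA"))
```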

## Point Mutation Example: Sickle-cell Disease

- Wild-type hemoglobin
  - DNA: 3’----CTT----5’
  - mRNA: 5’----GAA----3’
  - Normal hemoglobin: ------[Glu]------
- Mutant hemoglobin
  - DNA: 3’----CAT----5’
  - mRNA: 5’----GUA----3’
  - Mutant hemoglobin: ------[Val]------
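The step from the DNA template to the mRNA and then to the amino acid can be traced in code. This is a minimal sketch; the function name and the two-entry codon table are assumptions introduced for the example, and the DNA strand is taken to be the template strand written 3’ to 5’ as on the slide.

```python
# Base pairing from template DNA to mRNA.
COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Only the two codons that appear on the slide.
CODON_TO_AA = {"GAA": "Glu", "GUA": "Val"}


def transcribe_template(template_3_to_5: str) -> str:
    """Given template DNA written 3'->5', return the mRNA written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in template_3_to_5)


for label, dna in [("wild-type", "CTT"), ("mutant", "CAT")]:
    mrna = transcribe_template(dna)
    print(label, dna, mrna, CODON_TO_AA[mrna])
```

A single changed base in the DNA changes one base in the codon, which is enough to swap Glu for Val.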

Image credit: U.S. Department of Energy Human Genome Program, http://www.ornl.gov/hgmis

## Comparing Two Sequences

Point mutations are easy to compare, position by position:

    ACGTCTGATACGCCGTATAGTCTATCT
    ACGTCTGATTCGCCCTATCGTCTATCT

Indels are difficult; the sequences must be aligned first:

    ACGTCTGATACGCCGTATAGTCTATCT
    CTGATTCGCATCGTCTATCT

After alignment:

    ACGTCTGATACGCCGTATAGTCTATCT
    ----CTGATTCGC---ATCGTCTATCT
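For two sequences of equal length, point mutations can be listed with a simple position-by-position comparison. The sketch below uses the first pair of sequences from this slide; the function name `point_mutations` is an assumption, and positions are reported 1-based.

```python
def point_mutations(a: str, b: str) -> list[tuple[int, str, str]]:
    """List (position, base in a, base in b) wherever the sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length; align them first")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


seq1 = "ACGTCTGATACGCCGTATAGTCTATCT"
seq2 = "ACGTCTGATTCGCCCTATCGTCTATCT"
for pos, x, y in point_mutations(seq1, seq2):
    print(f"position {pos}: {x} -> {y}")
```

The length check is the reason indels are harder: once one sequence is shorter, a direct comparison no longer lines up matching positions.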

## Scoring a Sequence Alignment

Example scoring scheme:

- Match score: +1
- Mismatch score: 0
- Gap penalty: -1

Applied to this alignment:

    ACGTCTGATACGCCGTATAGTCTATCT
        ||||| |||   || ||||||||
    ----CTGATTCGC---ATCGTCTATCT

- Matches: 18 × (+1)
- Mismatches: 2 × 0
- Gaps: 7 × (-1)

Score = 18 + 0 + (-7) = +11

Various scoring schemes exist.
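The scoring rule above can be written as a small function. This is a sketch under the slide's scheme (match +1, mismatch 0, gap -1); the function name and parameter names are assumptions, and a column counts as a gap if either sequence has `-` in it.

```python
def score_alignment(top: str, bottom: str,
                    match: int = 1, mismatch: int = 0, gap: int = -1) -> int:
    """Score two already-aligned sequences column by column."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            score += gap        # gap column
        elif x == y:
            score += match      # identical residues
        else:
            score += mismatch   # different residues
    return score


top = "ACGTCTGATACGCCGTATAGTCTATCT"
bottom = "----CTGATTCGC---ATCGTCTATCT"
print(score_alignment(top, bottom))
```

Changing the three parameters is all it takes to try a different scoring scheme on the same alignment.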

## How can we find an optimal alignment?

Finding the optimal alignment by trying every possibility is computationally hard. One of the many candidate alignments:

    ACGTCTGATACGCCGTATAGTCTATCT
    CTGAT---TCG-CATCGTC--T-ATCT

- There are ~888,000 possible ways to align the two sequences given above.
- Algorithms based on a technique called “dynamic programming” are used instead; they are outside the scope of this workshop.
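Two small sketches may help here. The count of roughly 888,000 is consistent with choosing which 7 of the 27 columns hold a gap in the shorter sequence; that reading is an assumption, since the slide does not say how the figure was obtained. The second function is a minimal dynamic-programming scorer in the style of Needleman-Wunsch, using the match +1, mismatch 0, gap -1 scheme from the scoring slide; it is an illustration, not the workshop's own material, and it returns only the best score, not the alignment itself.

```python
from math import comb

# Assumed counting: place 7 gap columns among 27 alignment columns.
print(comb(27, 7))


def global_alignment_score(a: str, b: str,
                           match: int = 1, mismatch: int = 0, gap: int = -1) -> int:
    """Best global alignment score, computed row by row."""
    # prev[j] = best score for aligning the first i-1 letters of a
    # with the first j letters of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGTCTGATACGCCGTATAGTCTATCT", "CTGATTCGCATCGTCTATCT"))
```

The table has only 28 × 21 cells for these sequences, which is why dynamic programming is practical where enumerating every alignment is not.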

## Global and Local Alignments

- Global alignment: score the entire alignment
- Local alignment: find the best matching subsequence
- Why local sequence alignment?
  - Global alignment is useful only if the sequences to be aligned are very similar
  - Subsequence comparison between a DNA sequence and a genome
  - Identify
    - Conserved regions
    - Protein function domains

## Example

Compare the two sequences:

    TTGACACCCTCCCAATT
    ACCCCAGGCTTTACACAG

Global alignment (does it look good?):

    TTGACACCCTCC-CAATT
        ||  ||   ||
    ACCCCAGGCTTTACACAG

Local alignment (does it look good?):

    ---------TTGACACCCTCCCAATT
             || ||||
    ACCCCAGGCTTTACACAG--------
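A local alignment score can be computed with a small change to the dynamic-programming idea: no cell is allowed to drop below zero, and the answer is the highest cell anywhere in the table. The sketch below follows the Smith-Waterman approach; the slides do not name an algorithm, so that choice is an assumption, as is the scoring (match +1, mismatch -1, gap -1). A negative mismatch score is used here because local alignment needs poor regions to cost something.

```python
def local_alignment_score(a: str, b: str,
                          match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score between any substring of a and any substring of b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets an alignment restart instead of carrying a negative score.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


print(local_alignment_score("TTGACACCCTCCCAATT", "ACCCCAGGCTTTACACAG"))
```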

## Where do we get sequences to work with?

- Biological databases
  - NCBI Entrez (http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi?term=)
- Wet labs
- Simulations
- Other people’s results
- On-line education resources
  - BEDROCK (http://www.bioquest.org/bedrock/)
- BLAST results
