1
Incorporating Bioinformatics in an Algorithms Course
Lawrence D’Antonio
Ramapo College of New Jersey

2
What is Bioinformatics?
- Algorithms to analyze DNA, RNA, or protein sequences
- Database searches to find homologous sequences
- Construction of evolutionary trees
- Structure prediction
- Human Genome Project

3
Why use Bioinformatics in an Algorithms Course?
- Real-life applications of algorithms
- Variety of string processing algorithms
- Use of similarity instead of exact matching
- Dynamic programming examples
- Theory vs. practice issues

4
Models for Incorporating Bioinformatics
- Infusion – include material from bioinformatics in computer science courses
- Paired Courses – have joint lectures and projects from, e.g., Algorithms and Genetics courses
- Tracked Courses – have a separate Algorithms for Bioinformatics course

5
Biology Basics
- Primary DNA structure – an oriented character string
- Double strand constructed through base pairing
- Central Dogma – information passes in one direction, from DNA to RNA to protein
- Amino acids are encoded by triples of bases, called codons
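As a rough illustration of the string view of DNA on this slide, the following Python sketch derives the opposite strand by base pairing, transcribes DNA to RNA, and splits the result into codons. The function names and the sample sequences are assumptions made for illustration; they do not come from the presentation.

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, read in its own orientation."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def transcribe(dna: str) -> str:
    """Coding-strand DNA to mRNA: same letters, with T replaced by U."""
    return dna.replace("T", "U")


def codons(rna: str) -> list[str]:
    """Split an RNA string into consecutive triples, dropping any leftover bases."""
    usable = len(rna) - len(rna) % 3
    return [rna[i:i + 3] for i in range(0, usable, 3)]


if __name__ == "__main__":
    print(reverse_complement("AATAAGC"))
    print(codons(transcribe("ATGGCCT")))
```

Translating codons to amino acids would additionally need the 64-entry genetic code table, which is left out here to keep the sketch short.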

6
Bonding along a strand

7
Bonding between strands

8
Complexity of DNA Problems
- 3 billion base pairs in the human genome
- Many NP-complete problems
- About 10^600 possible alignments for two 1000-character sequences
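One common counting argument treats an alignment of two length-n sequences as an interleaving of the two strings, which gives the binomial coefficient C(2n, n). Assuming that is the count behind the slide's figure (the slide does not say), the order of magnitude can be checked directly in Python; `alignment_count` is a name chosen here for illustration.

```python
import math


def alignment_count(n: int) -> int:
    """Number of ways to interleave two length-n sequences: C(2n, n)."""
    return math.comb(2 * n, n)


count = alignment_count(1000)
magnitude = len(str(count)) - 1  # exponent of the leading power of ten
print(f"about 10^{magnitude} alignments")
```

Exhaustive search over a space of this size is out of the question, which is the motivation for the dynamic programming approach later in the deck.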

9
Sequence Alignment
- Determine the alignment of two sequences that maximizes similarity (global alignment)
- Determine substrings of two sequences with maximum similarity (local alignment)
- Determine the alignment for several sequences that maximizes the sum-of-pairs similarity (multiple alignment)
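The local alignment problem can be sketched with the Smith-Waterman recurrence, a standard technique that the slides do not name. The version below returns only the best local score and assumes the deck's simple scoring scheme (+1 match, -1 mismatch, -2 per space); the function name is hypothetical.

```python
def local_alignment_score(a: str, b: str, match: int = 1,
                          mismatch: int = -1, space: int = -2) -> int:
    """Best similarity over all pairs of substrings of a and b (Smith-Waterman)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # A cell may restart at 0, which lets an alignment begin anywhere.
            curr[j] = max(0,
                          prev[j - 1] + s,      # align a[i-1] with b[j-1]
                          prev[j] + space,      # a[i-1] against a space
                          curr[j - 1] + space)  # b[j-1] against a space
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(local_alignment_score("GGGCAT", "GGACA"))
```

The only differences from the global version are the floor at zero and taking the maximum over all cells rather than reading the bottom-right corner.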

10
Edit Operations
Substitution    Insertion    Deletion
AATAAGC         AAT-AAGC     AATAAGC
ATTAAGC         AATTAAGC     AA-AAGC

11
Dynamic Programming Alignment Algorithm (Needleman-Wunsch)
If a_1, a_2, …, a_i and b_1, b_2, …, b_j have been aligned, there are three possible next moves:
- Match a_{i+1} with b_{j+1}
- Match a_{i+1} with a space
- Match b_{j+1} with a space
Choose the move that maximizes the similarity of the two sequences
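The three moves translate directly into a table-filling recurrence. The sketch below is a minimal Python version, assuming the deck's simple scores (+1 match, -1 mismatch, -2 per space) as defaults; the function name and parameter names are chosen here for illustration.

```python
def global_alignment_matrix(a: str, b: str, match: int = 1,
                            mismatch: int = -1, space: int = -2) -> list[list[int]]:
    """Fill the Needleman-Wunsch table; D[i][j] scores a[:i] against b[:j]."""
    rows, cols = len(a) + 1, len(b) + 1
    D = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one space per character.
    for i in range(1, rows):
        D[i][0] = i * space
    for j in range(1, cols):
        D[0][j] = j * space
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            D[i][j] = max(D[i - 1][j - 1] + s,   # match a_i with b_j
                          D[i - 1][j] + space,   # match a_i with a space
                          D[i][j - 1] + space)   # match b_j with a space
    return D


if __name__ == "__main__":
    for row in global_alignment_matrix("GGGCAT", "GGACA"):
        print(row)
```

The bottom-right cell holds the similarity of the two full sequences; the table takes time and space proportional to the product of the sequence lengths.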

12
Alignment Scoring System
- Character match: +1
- Mismatch (substitution): -1
- Space (indel): -2, or a + b·k for a gap of k spaces (affine gap penalty)
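To make the scoring concrete, here is a small Python sketch that scores an already aligned pair of strings under the per-space scheme, plus a helper for the affine formula. The function names are hypothetical, and the values passed for a and b in the example are arbitrary placeholders, since the slide does not give them.

```python
def score_alignment(top: str, bottom: str, match: int = 1,
                    mismatch: int = -1, space: int = -2) -> int:
    """Score two aligned strings of equal length; '-' marks a space."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have the same length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += space
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


def affine_gap_penalty(k: int, a: int, b: int) -> int:
    """Penalty a + b*k for one gap of k consecutive spaces."""
    return a + b * k


if __name__ == "__main__":
    print(score_alignment("GGGCAT", "GGACA-"))
    print(affine_gap_penalty(3, a=2, b=1))  # a and b are placeholder values
```

With an affine penalty, one long gap costs less than several short gaps of the same total length, which is often the more plausible biological model.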

13
Global Alignment Matrix
        -    G    G    A    C    A
   -    0   -2   -4   -6   -8  -10
   G   -2    1   -1   -3   -5   -7
   G   -4   -1    2    0   -2   -4
   G   -6   -3    0    1   -1   -3
   C   -8   -5   -2   -1    2    0
   A  -10   -7   -4   -1    0    3
   T  -12   -9   -6   -3   -2    1

14
Optimal Alignment
GGGCAT
GGACA-
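An optimal alignment is recovered by walking back through the dynamic programming table from the bottom-right corner, at each cell following a move that produced its value. The Python sketch below assumes the deck's scores (+1, -1, -2) and prefers the diagonal move on ties; that tie-breaking rule and the function name are choices made here, not taken from the slides.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, space: int = -2) -> tuple[str, str, int]:
    """Return one optimal global alignment of a and b, and its score."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * space
    for j in range(1, m + 1):
        D[0][j] = j * space
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            D[i][j] = max(D[i - 1][j - 1] + s, D[i - 1][j] + space, D[i][j - 1] + space)

    # Traceback: rebuild the alignment from the end toward the start.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if D[i][j] == D[i - 1][j - 1] + s:
                top.append(a[i - 1]); bottom.append(b[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and D[i][j] == D[i - 1][j] + space:
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), D[n][m]


if __name__ == "__main__":
    print(needleman_wunsch("GGGCAT", "GGACA"))
```

For the slide's pair of sequences the optimal alignment is unique, so the tie-breaking rule does not affect the result.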

15
Other Bioinformatics Algorithms
- Palindromes
- Tandem Repeats
- Longest Common Subsequence
- Double Digest (NP-complete)
- Shortest Common Superstring (NP-complete)
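Of the problems listed, Longest Common Subsequence fits most naturally alongside the alignment material, since it uses the same kind of table. A minimal Python sketch, with a function name chosen here for illustration:

```python
def lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # L[i][j] is the LCS length of a[:i] and b[:j].
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])

    # Walk back from the corner, collecting matched characters.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


if __name__ == "__main__":
    print(lcs("GGGCAT", "GGACA"))
```

This is equivalent to global alignment with a score of +1 for a match and 0 for everything else, which makes it a convenient first dynamic programming exercise before introducing penalties.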

16
References
- Clote and Backofen, Computational Molecular Biology, Wiley
- Gusfield, Algorithms on Strings, Trees, and Sequences, Cambridge University Press
- Mount, Bioinformatics, Cold Spring Harbor Press
- Setubal and Meidanis, Introduction to Computational Molecular Biology, PWS
- Waterman, Introduction to Computational Biology, CRC Press
