Published by Millicent Wright. Modified over 9 years ago.
1
COMPUTATIONAL BIOLOGY
2
OUTLINE
Proteins
DNA
RNA
Genetics and evolution
The Sequence Matching Problem
RNA Sequence Matching
Complexity of the Algorithms
3
DEFINITION
Computational biology encompasses all computational methods and theories applicable to molecular biology, together with the computer-based techniques used to solve biological problems.
4
PROTEINS
Building blocks of living organisms
Large molecules composed of sequences of amino acids
There are 20 amino acids, divided into classes:
hydrophobic (h-phob)
hydrophilic (h-phil)
polar with a positive (pos) or negative (neg) charge
5
Amino acid      Sym  Class    Amino acid      Sym  Class
Alanine         A    h-phob   Leucine         L    h-phob
Arginine        R    pos      Lysine          K    pos
Asparagine      N    h-phil   Methionine      M    h-phob
Aspartic acid   D    neg      Phenylalanine   F    h-phob
Cysteine        C    h-phil   Proline         P    h-phob
Glutamine       Q    h-phil   Serine          S    h-phil
Glutamic acid   E    neg      Threonine       T    h-phil
Glycine         G    h-phob   Tryptophan      W    h-phob
Histidine       H    pos      Tyrosine        Y    h-phil
Isoleucine      I    h-phob   Valine          V    h-phob
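The table above can be held as a simple lookup. The following is a minimal sketch, assuming the one-letter symbols and class labels from the table; the names `CLASS_OF` and `classify` are illustrative and do not come from the slides.

```python
# Class labels follow the table: h-phob, h-phil, pos, neg.
CLASS_OF = {}
for symbols, label in [
    ("ALMFPGWIV", "h-phob"),  # hydrophobic
    ("NCQSTY", "h-phil"),     # hydrophilic
    ("RKH", "pos"),           # positively charged
    ("DE", "neg"),            # negatively charged
]:
    for symbol in symbols:
        CLASS_OF[symbol] = label


def classify(protein):
    """Return the class label of each residue in a one-letter protein string."""
    return [CLASS_OF[residue] for residue in protein]
```

For example, `classify("ARND")` maps alanine, arginine, asparagine and aspartic acid to their four different classes.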
6
DNA
Blueprint of living organisms
DNA is composed of two strands held together by weak hydrogen bonds
Each strand is a sequence of nucleotides
DNA has four bases, classified into two chemical types:
Base      Symbol  Type
Adenine   A       Purine
Thymine   T       Pyrimidine
Cytosine  C       Pyrimidine
Guanine   G       Purine
7
DNA DOUBLE HELIX
8
RNA
RNA is chemically very similar to DNA
There are two important differences:
The four bases present in RNA are adenine (A), guanine (G), cytosine (C) and uracil (U); uracil takes the place of thymine
RNA nucleotides contain a different sugar molecule (ribose)
9
GENETICS AND EVOLUTION
Mutation
Natural selection
Genetic drift
10
SEQUENCE MATCHING PROBLEM
Match DNA, RNA, or protein sequences between a diseased organism and a healthy organism
Proteins are long and DNA strands are even longer
We match them by breaking them into shorter subsequences
Breaking and matching both rely on the notion of alignment
11
SEQUENCE MATCHING EXAMPLE
Consider two amino acid sequences:
ACCTGAGAG
ACGTGGCAG
Sequence alignment:
A C C T G A G - A G
A C G T G - G C A G
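The slides do not say how the alignment is computed. One common way is global alignment by dynamic programming (Needleman-Wunsch style), sketched below. The scoring values (`match=1`, `mismatch=-1`, `gap=-1`) and the function name `align` are assumptions chosen for illustration, and the alignment returned may differ from the one on the slide when several alignments tie.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, row_a, row_b) for a best global alignment of a and b."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for placing a[i-1] opposite b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair the two characters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Walk back from the bottom-right corner to recover one best alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))
```

Calling `align("ACCTGAGAG", "ACGTGGCAG")` returns two equal-length rows in which `-` marks a gap, in the same layout as the slide.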
12
FINITE STATE MACHINES IN BLAST
BLAST is used to find which sequences in a database are related to a new query sequence
The BLAST system is a three-step process:
1. Examine the query string and select a set of substrings of length w (between 4 and 20) that are good for producing matches
2. Build a DFSM from this set of substrings and use it to find the sequences in the database with the highest local matches
3. Examine the matches found in step 2 and try to build longer matching sequences
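The three steps can be sketched in simplified form as follows. This is not BLAST itself: a dictionary lookup stands in for the DFSM, every length-w substring is used as a seed, and only exact matches are extended, whereas the real system scores approximate matches. The names `seeds`, `extend` and `search` are hypothetical.

```python
def seeds(query, w):
    """Step 1: map each length-w substring of the query to its start positions."""
    table = {}
    for i in range(len(query) - w + 1):
        table.setdefault(query[i:i + w], []).append(i)
    return table


def extend(query, target, qi, ti, w):
    """Step 3: grow an exact seed match left and right while characters agree."""
    left = 0
    while (qi - left > 0 and ti - left > 0
           and query[qi - left - 1] == target[ti - left - 1]):
        left += 1
    right = w
    while (qi + right < len(query) and ti + right < len(target)
           and query[qi + right] == target[ti + right]):
        right += 1
    # Start in query, start in target, length of the extended match.
    return qi - left, ti - left, left + right


def search(query, database, w=4):
    """Step 2: scan each database sequence for seeds (dict lookup replaces the DFSM)."""
    table = seeds(query, w)
    hits = set()
    for name, target in database.items():
        for ti in range(len(target) - w + 1):
            for qi in table.get(target[ti:ti + w], []):
                hits.add((name,) + extend(query, target, qi, ti, w))
    return sorted(hits, key=lambda hit: -hit[3])  # longest matches first
```

Each hit is a tuple `(name, query_start, target_start, length)`. Several seeds inside the same matching region extend to the same hit, which is why the hits are collected in a set.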
13
REGULAR EXPRESSIONS SPECIFY PROTEIN MOTIFS
By aligning a collection of related proteins we can define a motif
Example: E S G H D T Y Y N K N R M D T T T T T S W Q S R G S D T T T P D M T A G P T T W R N T
Once a motif is defined, we can search for occurrences of it in other protein sequences by using regular expressions
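A motif search of this kind might look like the sketch below. The pattern `H[DE]T{2,3}[WY]` is a made-up motif used only for illustration; it is not the motif from the slide's example, and `find_motif` is a hypothetical helper name.

```python
import re

# Hypothetical motif: H, then D or E, then two or three T, then W or Y.
MOTIF = re.compile(r"H[DE]T{2,3}[WY]")


def find_motif(protein):
    """Return (start index, matched text) for each motif occurrence."""
    return [(m.start(), m.group()) for m in MOTIF.finditer(protein)]
```

Character classes such as `[DE]` express positions where the aligned proteins disagree, and counted repetitions such as `T{2,3}` express runs whose length varies across the family.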
14
HMM FOR SEQUENCE MATCHING
HMMs are used when sequences become fairly diverse
We can capture the variations among the members of a family and the probabilities associated with them
Using HMMs we can find the best alignment between two sequences and determine which family a new sequence belongs to
15
An HMM profile is given by M = (K, O, π, A, B)
K is a set of n states, one for each position in the sequence
O is the output alphabet
π contains the initial state probabilities
A contains the transition probabilities
B contains the output probabilities
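Given M = (K, O, π, A, B), the probability that the model emits a particular sequence can be computed with the forward algorithm, sketched below. The two-state model and all of its probabilities are invented for illustration, and the sketch assumes every state emits a symbol, so it does not cover the silent delete states of a full profile HMM.

```python
def forward(obs, K, pi, A, B):
    """Probability that the HMM emits the non-empty sequence obs."""
    # alpha[k] = probability of the symbols seen so far, ending in state k.
    alpha = {k: pi[k] * B[k][obs[0]] for k in K}
    for symbol in obs[1:]:
        alpha = {
            k: sum(alpha[j] * A[j][k] for j in K) * B[k][symbol]
            for k in K
        }
    return sum(alpha.values())


# Hypothetical two-state model over the output alphabet {"A", "G"}.
K = ("s1", "s2")
O = ("A", "G")
pi = {"s1": 0.6, "s2": 0.4}
A = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.2, "s2": 0.8}}
B = {"s1": {"A": 0.9, "G": 0.1}, "s2": {"A": 0.2, "G": 0.8}}
```

To decide which family a new sequence belongs to, one option is to evaluate `forward` under each family's model and pick the model that gives the highest probability.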
16
EXAMPLE OF HMM DESCRIBING PROTEIN SEQUENCE FAMILY
17
RNA SEQUENCE MATCHING AND SECONDARY STRUCTURE PREDICTION USING THE TOOLS OF CONTEXT-FREE LANGUAGES
In RNA, a change to a single nucleotide in a stem region could completely alter the molecule's shape and its function
So a change in the stem must be matched by a corresponding change in the paired nucleotide
Context-free languages are used to describe these nested dependencies and the secondary structure
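The nested pairing can be illustrated with a toy grammar, S → A S U | U S A | C S G | G S C | loop, where loop is any run of at least `min_loop` unpaired bases. Both the grammar and the `min_loop=3` default are assumptions made for this sketch, not taken from the slides. The function peels matching pairs from the two ends of the sequence, which mirrors how the grammar's rules nest.

```python
# Base pairs allowed in a stem by the toy grammar.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def stem_length(rna, min_loop=3):
    """Count the nested base pairs closing a hairpin, working inward from both ends."""
    i, j = 0, len(rna) - 1
    pairs = 0
    # A pair (i, j) is accepted only if at least min_loop bases remain inside it.
    while j - i - 1 >= min_loop and (rna[i], rna[j]) in PAIRS:
        i += 1
        j -= 1
        pairs += 1
    return pairs
```

With the made-up hairpin `GGGAAAUCCC` the stem has three pairs. Changing the second base to A (`GAGAAAUCCC`) breaks the stem after one pair, and the compensating change of its partner to U (`GAGAAAUCUC`) restores all three.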
18
EXAMPLE
19
COMPLEXITY OF ALGORITHMS USED IN COMPUTATIONAL BIOLOGY
Many of the approaches described here are computationally expensive
Large protein and DNA molecules are broken up into substrings that must then be reassembled; finding the shortest superstring is NP-hard
Conversion to a decision problem:
SHORTEST-SUPERSTRING = {<S, K> : S is a set of strings and there exists some superstring T such that every element of S is a substring of T and T has length less than or equal to K}
SHORTEST-SUPERSTRING is NP-complete
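Membership in NP rests on the fact that a proposed superstring T can be checked quickly. A minimal sketch of such a check follows; the function name `is_superstring_certificate` is hypothetical.

```python
def is_superstring_certificate(S, K, T):
    """Check that T has length at most K and contains every string in S."""
    return len(T) <= K and all(s in T for s in S)
```

The check runs in time polynomial in the input size. The hard part of the problem is finding a suitable T, not verifying one.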
20
REFERENCE
Elaine Rich, Automata, Computability and Complexity: Theory and Applications [book].
http://en.wikipedia.org/wiki/Computational_biology
21
Thank you