Computational Biology, Part 2 Sequence Motifs Robert F. Murphy Copyright © 1996, 1999-2009. All rights reserved.

1 Computational Biology, Part 2 Sequence Motifs Robert F. Murphy Copyright © 1996, 1999-2009. All rights reserved.

2 Slides from Chapter 4 Ch04_Motifs_mod.ppt

3 Describing features using frequency matrices
- Goal: describe a sequence feature (or motif) more quantitatively than is possible using consensus sequences
- Need to describe how often particular bases are found at particular positions in a sequence feature

4 Describing features using frequency matrices
- Definition: For a feature of length m using an alphabet of n characters, a frequency matrix is an n by m matrix in which each element contains the frequency at which a given member of the alphabet is observed at a given position in an aligned set of sequences containing the feature
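The definition above can be sketched in a few lines of Python. This is a minimal illustration rather than the toolbox routine used in the course; the function name `frequency_matrix` and the toy DNA alignment are hypothetical, and the alignment is assumed to be gap-free.

```python
def frequency_matrix(aligned, alphabet="ACGT"):
    """Return {character: [frequency at each position]} for aligned sequences.

    The result has n rows (one per alphabet character) and m columns
    (one per position in the feature), matching the n-by-m definition.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    m = len(aligned[0])
    if any(len(seq) != m for seq in aligned):
        raise ValueError("sequences must all have the same aligned length")

    # Count how often each character occurs at each position.
    counts = {ch: [0] * m for ch in alphabet}
    for seq in aligned:
        for i, ch in enumerate(seq):
            counts[ch][i] += 1  # raises KeyError for characters outside the alphabet

    # Convert counts to frequencies.
    n_seqs = len(aligned)
    return {ch: [c / n_seqs for c in row] for ch, row in counts.items()}


# Hypothetical toy alignment of four 6-base sites (illustration only).
sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]
freqs = frequency_matrix(sites)
for ch in "ACGT":
    print(ch, " ".join(f"{p:4.2f}" for p in freqs[ch]))
```

Each column of the result sums to 1, so a column can be read directly as the observed distribution of characters at that position.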

5 Frequency matrices (continued)
- Three uses of frequency matrices
  - Describe a sequence feature
  - Calculate the probability of occurrence of the feature in a random sequence
  - Calculate the degree of match between a new sequence and a feature

6 Matlab Demonstration
    % read some aligned sequences provided with the bioinformatics toolbox
    seqs = fastaread('pf00002.fa');
    seqdisp(seqs);
    % restrict the profile to alignment columns 4 through 13
    startposition = 4;
    endposition = 13;
    [P,S] = seqprofile(seqs,'limits',[startposition endposition]);
    % print a header of column numbers, then one row of frequencies per symbol
    disp([' ' sprintf('%2d ',[1:size(P,2)])]);
    for i=1:length(S)
        disp([S(i) ' ' sprintf('%4.3f ',P(i,:))])
    end
    % draw the sequence logo for the same columns
    seqlogo(seqs,'startat',startposition,'endat',endposition,'alphabet','aa');

7 Frequency matrix

8 Logo Example

9 Logos for displaying sequence motifs
- http://www.ccrnp.ncifcrf.gov/~toms/sequencelogo.html
- Free logo maker at http://weblogo.berkeley.edu/

10 Frequency Matrices, PSSMs, and Profiles
- A frequency matrix can be converted to a Position-Specific Scoring Matrix (PSSM) by converting frequencies to scores
- PSSMs are also called Position Weight Matrices (PWMs) or Profiles

11 Methods for converting frequency matrices to PSSMs
- Using the log ratio of observed to expected frequencies:
    s(j,i) = log( m(j,i) / f(j) )
  where s(j,i) is the score for character j at position i, m(j,i) is the frequency of character j observed at position i, and f(j) is the overall frequency of character j (usually in some large set of sequences)
- Using an amino acid substitution matrix (Dayhoff similarity matrix) [see later]
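The log-ratio conversion can be sketched as follows. This is an illustrative Python version, not the course's implementation: the function name `log_odds_pssm`, the base-2 logarithm, the uniform background, and the two-column example matrix are all assumptions. The example deliberately contains no zero frequencies, since log(0) is undefined (the issue taken up on the pseudo-counts slide).

```python
import math


def log_odds_pssm(freqs, background):
    """Convert a frequency matrix to log-ratio scores.

    freqs:      {character: [observed frequency m(j,i) at each position i]}
    background: {character: expected overall frequency f(j)}
    Returns     {character: [log2(m(j,i) / f(j)) at each position i]}
    """
    return {
        ch: [math.log2(p / background[ch]) for p in row]  # fails if p == 0
        for ch, row in freqs.items()
    }


# Hypothetical two-position frequency matrix (illustration only).
freqs = {
    "A": [0.5, 0.25],
    "C": [0.25, 0.25],
    "G": [0.125, 0.25],
    "T": [0.125, 0.25],
}
# Assumed background: all four bases equally likely.
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

pssm = log_odds_pssm(freqs, background)
print(pssm["A"])  # [1.0, 0.0]
print(pssm["G"])  # [-1.0, 0.0]
```

A positive score means the character is seen more often at that position than expected by chance, a negative score less often, and zero means the position carries no preference for that character.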

12 Pseudo-counts
- How do we get a score for a position with zero counts for a particular character? Can’t take log(0).
- Solution: add a small number to all positions with zero frequency
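One way to apply this fix to a single column of a frequency matrix is sketched below in Python. The function name `add_pseudo_frequencies`, the value 0.01, and the renormalization step (so the column still sums to 1) are assumptions made for illustration; the slide specifies only that a small number is added where the frequency is zero.

```python
def add_pseudo_frequencies(column, epsilon=0.01):
    """Replace zero frequencies in one matrix column with a small value.

    column:  {character: frequency at this position}
    epsilon: assumed small value substituted for zero frequencies
    The column is renormalized afterwards so that it sums to 1.
    """
    adjusted = {ch: (p if p > 0 else epsilon) for ch, p in column.items()}
    total = sum(adjusted.values())
    return {ch: p / total for ch, p in adjusted.items()}


# Hypothetical column in which G and T were never observed.
column = {"A": 0.75, "C": 0.25, "G": 0.0, "T": 0.0}
smoothed = add_pseudo_frequencies(column)
for ch, p in smoothed.items():
    print(ch, f"{p:.4f}")
```

After this step every entry is strictly positive, so a log-ratio score can be computed for every character. A common alternative, not shown here, adds a pseudo-count to every count before frequencies are computed.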

13 Finding occurrences of a sequence feature using a Profile
- As with finding occurrences of a consensus sequence, we consider all positions in the target sequence as candidate matches
- For each position, we calculate a score by “looking up” the value corresponding to the base at that position
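The lookup procedure can be sketched in Python as below. The function name `score_positions` and the three-column PSSM are hypothetical, and the sketch assumes that the score of a candidate match is the sum of the looked-up values across the PSSM's columns, which is the usual convention for log-ratio scores.

```python
def score_positions(target, pssm):
    """Score every candidate start position in target against a PSSM.

    pssm: {character: [score at each motif position]}
    Returns a list with one score per start position (0-based).
    """
    width = len(next(iter(pssm.values())))
    scores = []
    for start in range(len(target) - width + 1):
        window = target[start:start + width]
        # Look up the score of each base at its position in the motif and sum.
        scores.append(sum(pssm[base][i] for i, base in enumerate(window)))
    return scores


# Hypothetical PSSM that favors the motif "ATG" (illustration only).
pssm = {
    "A": [1.0, -1.0, -1.0],
    "C": [-1.0, -1.0, -1.0],
    "G": [-1.0, -1.0, 1.0],
    "T": [-1.0, 1.0, -1.0],
}
scores = score_positions("CCATGA", pssm)
print(scores)  # [-3.0, -3.0, 3.0, -3.0]
```

The highest score, at 0-based position 2, corresponds to the window "ATG"; every other window mismatches the favored base in all three columns.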

14 Block Diagram for Building a PSSM – Aligned Sequences
[Block diagram] Inputs: set of aligned sequence features; expected frequencies of each sequence element. Process: PSSM builder. Output: PSSM.

15 Block Diagram for Building a PSSM – Unaligned Sequences
[Block diagram] Inputs: set of unaligned sequences; expected frequencies of each sequence element; parameters for aligning (i.e., expected length). Process: PSSM builder. Output: PSSM.

16 Block Diagram for Searching with a PSSM
[Block diagram] Inputs: PSSM; set of sequences to search; threshold. Process: PSSM search. Outputs: sequences that match above threshold; positions and scores of matches.
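The search block in this diagram can be sketched in Python with the same inputs and outputs. The function name `pssm_search`, the dictionary of named sequences, the two-column PSSM, and the choice to count a score equal to the threshold as a match are all assumptions made for illustration.

```python
def pssm_search(sequences, pssm, threshold):
    """Search a set of sequences with a PSSM.

    sequences: {name: sequence string}
    pssm:      {character: [score at each motif position]}
    threshold: minimum summed score for a window to count as a match
    Returns    {name: [(0-based start position, score), ...]} containing
               only the sequences with at least one match.
    """
    width = len(next(iter(pssm.values())))
    hits = {}
    for name, seq in sequences.items():
        matches = []
        for start in range(len(seq) - width + 1):
            window = seq[start:start + width]
            score = sum(pssm[base][i] for i, base in enumerate(window))
            if score >= threshold:
                matches.append((start, score))
        if matches:
            hits[name] = matches
    return hits


# Hypothetical PSSM favoring "AG", and a toy set of sequences to search.
pssm = {
    "A": [1.0, -1.0],
    "C": [-1.0, -1.0],
    "G": [-1.0, 1.0],
    "T": [-1.0, -1.0],
}
sequences = {"seq1": "CAGT", "seq2": "CCCC", "seq3": "AGAG"}
hits = pssm_search(sequences, pssm, threshold=2.0)
print(hits)  # {'seq1': [(1, 2.0)], 'seq3': [(0, 2.0), (2, 2.0)]}
```

The keys of the result are the sequences that match above threshold, and each value lists the positions and scores of the matches, the two outputs shown in the diagram.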

17 Block Diagram for Searching for sequences related to a family with a PSSM
[Block diagram] Stage 1, PSSM builder. Inputs: set of aligned sequence features; expected frequencies of each sequence element. Output: PSSM. Stage 2, PSSM search. Inputs: the PSSM from stage 1; set of sequences to search; threshold. Outputs: sequences that match above threshold; positions and scores of matches.

18 Consensus sequences vs. PSSMs
- Should I use a consensus sequence or a frequency matrix to describe my site?
  - If all allowed characters at a given position are equally "good", use IUB codes to create a consensus sequence
    - Example: restriction enzyme recognition sites
  - If some allowed characters are "better" than others, use a PSSM
    - Example: promoter sequences
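For the consensus-sequence case, matching with IUB codes reduces to a yes/no test at each position. The Python sketch below uses the standard IUB nucleotide ambiguity codes; the function name `find_consensus` and the target sequence are hypothetical, and the pattern GTYRAC (the HincII recognition site) is used only as an illustrative restriction-site consensus.

```python
# Standard IUB nucleotide ambiguity codes and the bases each one allows.
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def find_consensus(target, consensus):
    """Return the 0-based start positions where target matches the consensus."""
    width = len(consensus)
    return [
        start
        for start in range(len(target) - width + 1)
        # Every base must be one of the bases allowed by the code at its position.
        if all(
            base in IUB[code]
            for base, code in zip(target[start:start + width], consensus)
        )
    ]


# Hypothetical target containing GTCGAC and GTTAAC, both of which fit GTYRAC.
hits = find_consensus("AAGTCGACTTGTTAACGG", "GTYRAC")
print(hits)  # [2, 10]
```

Every allowed base at a position counts equally here: a window either matches or it does not. A PSSM instead assigns each allowed base its own score, which is what makes it suitable for sites such as promoters where some bases are preferred over others.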

19 Consensus sequences vs. frequency matrices
- Advantages of consensus sequences: smaller description, quicker comparison
- Disadvantage: lose quantitative information on preferences at certain locations

20 Reading for next class
- Jones/Pevzner Ch 6 through section 6.9 (p. 185)
- Read paper by Needleman and Wunsch on web site
- (recommended) Durbin et al., pp. 17-32

