Projects….


1 Projects…

2 FASTA Lookup Tables
sequence 1: ACNGTSCHQE
sequence 2: GCHCLSAGQD
[Figure: the two sequences compared at three offsets, marking the shared residues C, S, Q; the shared word CH; and the shared residues G, C.]
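
The lookup-table idea on this slide can be sketched in a few lines. This is a minimal illustration, not the FASTA program's actual code: the function names `build_lookup` and `diagonal_hits` and the word length parameter `k` are assumptions made for the example. It records where each word occurs in sequence 1, then counts, for every offset between the two sequences, how many words line up.

```python
from collections import Counter, defaultdict


def build_lookup(seq, k=1):
    """Map every length-k word in seq to the list of positions where it starts."""
    table = defaultdict(list)
    for i in range(len(seq) - k + 1):
        table[seq[i:i + k]].append(i)
    return table


def diagonal_hits(seq1, seq2, k=1):
    """Count shared words per offset (position in seq1 minus position in seq2)."""
    table = build_lookup(seq1, k)
    hits = Counter()
    for j in range(len(seq2) - k + 1):
        # Each occurrence of the same word in seq1 is one hit on offset i - j.
        for i in table.get(seq2[j:j + k], ()):
            hits[i - j] += 1
    return hits


if __name__ == "__main__":
    hits = diagonal_hits("ACNGTSCHQE", "GCHCLSAGQD")
    for offset, count in sorted(hits.items()):
        print(offset, count)
```

For the two sequences on the slide, offset 0 collects three hits (C, S, Q), offset 5 collects two (C and H, the word CH), and offset 3 collects two (G, C). With `k=2` the only shared word is CH, at offset 5.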

3 SSEARCH
Smith-Waterman local alignment, pairwise, on the entire database
Extremely slow
Best for identifying weak, distant relationships
Review of scoring
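
To make the cost of a Smith-Waterman search concrete, here is a minimal score-only sketch. The scoring values (match +2, mismatch -1, linear gap -2) are assumptions chosen for the example; SSEARCH itself uses substitution matrices and affine gap penalties. Every query/database pair fills a table of size len(a) × len(b), which is why running this against an entire database is slow.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (illustrative scoring values)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACNGTSCHQE", "GCHCLSAGQD"))
```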

4 Scoring
Scores collected from SW matches against a database of sequences are the best scores for each pair, not random samples.
Thus the distribution is not normal, but positively skewed.
For database searches, we can use the actual scores of all pairwise comparisons in the database as the set of scores.
Knowing the distribution allows us to compute P(Score ≥ x).
The Gumbel extreme value distribution has 2 parameters: μ (center) and λ (scaling).
[Figure: curves labeled Normal and Extreme Value]
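
As a sketch of how P(Score ≥ x) follows from the two parameters, the function below uses the common Gumbel form P(S ≥ x) = 1 − exp(−e^(−λ(x − μ))). It assumes λ multiplies the distance from μ, which is one widespread convention; the slide does not say which convention it uses. The name `gumbel_tail` is hypothetical.

```python
import math


def gumbel_tail(x, mu, lam):
    """P(S >= x) for a Gumbel distribution with center mu and scaling lam.

    Assumes the form 1 - exp(-exp(-lam * (x - mu))).
    """
    # -expm1(-t) equals 1 - exp(-t) and stays accurate when t is tiny.
    return -math.expm1(-math.exp(-lam * (x - mu)))


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    for score in (40, 60, 80):
        print(score, gumbel_tail(score, mu=45.0, lam=0.13))
```

Far above μ the probability decays roughly like e^(−λ(x − μ)), which gives the positively skewed tail described on the slide.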

5 Scoring, cont.
Parameter estimation [μ (center) and λ (scaling)]
Estimate from moments [μ and λ computed from the sample mean x̄ and standard deviation σ]
Maximum likelihood estimation [SSEARCH, FASTA]
Scores between random sequences increase with sequence length.
For each sequence near length L, plot SW score vs. log(average length)
Fit scores by linear regression
High scores and low outliers are trimmed from the regression fit.
"Normalize": subtract the predicted value from the real value
Compute z-score: how many standard deviations the normalized score lies from the mean
Z-scores have known extreme value distribution parameters.
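
The two estimation ideas on this slide can be sketched as follows. Both functions are illustrative assumptions rather than the FASTA/SSEARCH implementation: `gumbel_from_moments` uses the textbook moment relations for a Gumbel distribution written as 1 − exp(−e^(−λ(x − μ))), and `length_normalized_z` fits score against log(length) by ordinary least squares without the outlier trimming the slide mentions.

```python
import math
from statistics import mean, pstdev

EULER_GAMMA = 0.5772156649015329


def gumbel_from_moments(sample_mean, sample_sd):
    """Method-of-moments estimates (mu, lam), assuming lam scales (x - mu)."""
    lam = math.pi / (sample_sd * math.sqrt(6.0))
    mu = sample_mean - EULER_GAMMA / lam
    return mu, lam


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def length_normalized_z(lengths, scores):
    """Z-score of each score after removing the log(length) trend."""
    xs = [math.log(n) for n in lengths]
    slope, intercept = fit_line(xs, scores)
    # "Normalize": subtract the score predicted from the sequence length.
    residuals = [s - (slope * x + intercept) for x, s in zip(xs, scores)]
    sd = pstdev(residuals)
    return [r / sd for r in residuals]


if __name__ == "__main__":
    # Made-up lengths and scores; the last score is an unusually high one.
    print(length_normalized_z([100, 200, 400, 800, 400], [30, 35, 40, 45, 80]))
```

In this made-up data the last sequence scores far above what its length predicts, so it receives the largest z-score. A production fit would first trim such points before estimating the line, as the slide notes.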

6 Profile/Scoring Matrices
So far, the query is a single sequence.
Compare: query as a regular expression or other generalized pattern
Example: Position-Specific Scoring Matrix (PSSM)
Why? Motifs, multiple sequence alignments

7 PSSM
[Figure: three panels, each showing a matrix with column labels A M P G V and row labels A, C, G, M, P, V, -; the only surviving entry is 4 in row A. Panel scores: =5, =5, 0+0+0=0]
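
Since the matrix values on this slide did not survive, here is a minimal sketch of how a PSSM can be built and used. Everything specific in it is an assumption for illustration: the four aligned motif sequences, the pseudocount of 1, the uniform background over 20 amino acids, and the function names. Each column stores a log-odds score per residue, and a window of the target sequence is scored by summing one entry per column.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0):
    """Log-odds PSSM from equal-length, ungapped aligned sequences."""
    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(len(aligned[0])):
        counts = Counter(seq[pos] for seq in aligned)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_window(pssm, window):
    """Sum the column scores for a window as wide as the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, window))


def best_hit(pssm, seq):
    """Return (score, start) of the best-scoring window in seq."""
    width = len(pssm)
    return max(
        (score_window(pssm, seq[i:i + width]), i)
        for i in range(len(seq) - width + 1)
    )


if __name__ == "__main__":
    # Hypothetical motif alignment, loosely echoing the A M P G V labels above.
    pssm = build_pssm(["AMPGV", "AMPGV", "CMPGV", "AMPAV"])
    print(best_hit(pssm, "TTAMPGVTT"))
```

Unlike a single-sequence query, the matrix tolerates the variation seen in the alignment: CMPGV still scores well because C was observed in the first column, while a residue never seen at a position scores below zero.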

