Published by Candace Clarke. Modified over 8 years ago.
1
Basic terms
Similarity: a measurable quantity. For proteins, it is applied using the concept of conservative substitutions.
Identity: expressed as a percentage.
Homology: a specific term indicating a relationship by evolution.
3
Basic terms
Orthologs: homologous sequences found in two or more species that have the same function (e.g. alpha-hemoglobin).
Paralogs: homologous sequences found in the same species that arose by gene duplication (e.g. alpha- and beta-hemoglobin).
7
Pairwise comparison: Dotplot
All-against-all comparison: every position in one sequence is compared with every position in the other.
Nucleic acids and proteins have polarity, so typically only one direction makes biological sense: 5' to 3', or amino terminus to carboxyl terminus.
8
Simple plot
Window: the size of the sequence block used for comparison. In the previous example, window = 1.
Stringency: the number of matches within the window required to score positive. In the previous example, stringency = 1 (an exact match was required).
9
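DotPlot example (window = 4; stringency = 2)
The 4-base block GATC is compared with GATCGTACCATGGAATCGTCCAGATCA in successive registers.
A register with 4/4 matches scores positive (+), one with 0/4 scores negative (-), and one with 2/4 scores positive (+).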
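The window and stringency rule can be sketched in a few lines of Python. This is a minimal illustration, not the program used for the slides; the function names `count_matches` and `dotplot` are hypothetical, and the result is returned as a list of coordinates rather than drawn as a plot.

```python
def count_matches(block_a, block_b):
    """Count positions at which two equal-length blocks carry the same letter."""
    return sum(1 for a, b in zip(block_a, block_b) if a == b)


def dotplot(seq_a, seq_b, window=1, stringency=1):
    """Return (i, j) start positions where a dot would be drawn.

    A dot is scored when the window starting at i in seq_a and the window
    starting at j in seq_b share at least `stringency` matching positions.
    """
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            block_a = seq_a[i:i + window]
            block_b = seq_b[j:j + window]
            if count_matches(block_a, block_b) >= stringency:
                hits.append((i, j))
    return hits


# The block and sequence from the slide example (window = 4, stringency = 2).
example_hits = dotplot("GATC", "GATCGTACCATGGAATCGTCCAGATCA", window=4, stringency=2)
```

With these settings, register 0 (GATC, 4/4) and register 4 (GTAC, 2/4) score positive, while register 1 (ATCG, 0/4) does not. Raising the stringency to 4 would keep only the exact matches.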
10
Compare the two sequences in every register.
Vary the window size and stringency depending on the sequences being compared.
For nucleotide sequences, typically start with window = 21 and stringency = 14.
For proteins, start with a smaller window: window = 3 and stringency = 1 or 2.
It is important to test different stringencies.
11
Intergenic comparison
The nucleotide sequence contains three domains:
50 - 350: strong conservation. An indel places the comparison out of register.
450 - 1300: slightly weaker conservation.
1300 - 2400: strong conservation.
13
Scoring Alignments
Quality score: score x for a match and -y for a mismatch.
Penalties apply for creating a gap and for extending a gap.
21
Scoring Alignments
Quality score:
Quality = [10 x (matches)] + [-1 x (mismatches)] - [(gap creation penalty) x (number of gaps) + (gap extension penalty) x (total length of gaps)]
The scoring scheme incorporates an evolutionary model:
Matches are conserved positions.
Mismatches are divergences.
Gaps are more likely to disrupt function, hence they carry a greater penalty than a mismatch.
Introducing a gap (indel) is penalized more than extending an existing gap.
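As a worked illustration of the quality formula, the sketch below scores an alignment that has already been written out with `-` for gap positions. The match and mismatch values (10 and -1) come from the slide; the default gap creation and gap extension penalties are placeholder assumptions, since the slides do not give numbers for them, and the function name `quality_score` is hypothetical.

```python
def quality_score(aligned_a, aligned_b, match=10, mismatch=-1,
                  gap_creation=50, gap_extension=3):
    """Score a finished pairwise alignment ('-' marks a gap position).

    gap_creation and gap_extension defaults are illustrative assumptions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    matches = mismatches = gaps = gap_length = 0
    previous_gap_side = None  # which sequence held the gap in the previous column

    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            raise ValueError("a column cannot be a gap in both sequences")
        if a == "-" or b == "-":
            side = "a" if a == "-" else "b"
            if side != previous_gap_side:
                gaps += 1          # a new gap is created
            gap_length += 1        # every gap position adds to total gap length
            previous_gap_side = side
        else:
            previous_gap_side = None
            if a == b:
                matches += 1
            else:
                mismatches += 1

    gap_penalty = gap_creation * gaps + gap_extension * gap_length
    return match * matches + mismatch * mismatches - gap_penalty
```

For example, `quality_score("GATTACA", "GA--ACA", gap_creation=5, gap_extension=1)` counts 5 matches, one gap, and a total gap length of 2, giving 50 - (5 + 2) = 43.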
22
Z Score (standardized score)
Z = (Score_alignment - AverageScore_random) / StandardDeviation_random
23
Quality Score: Randomization
The program takes a sequence and randomizes it X times (user-selected).
It determines the average quality score and the standard deviation obtained with the randomized sequences.
Compare the randomized scores with the quality score of the real alignment to help determine whether the alignment is potentially significant.
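The randomization test and the Z score can be sketched together. This is an assumption-laden illustration: `ungapped_score` is a toy stand-in for a real alignment score (a real program would realign each shuffled sequence), the function names are hypothetical, and the sample standard deviation is used.

```python
import random
import statistics


def ungapped_score(seq_a, seq_b, match=10, mismatch=-1):
    """Toy score: compare the sequences position by position, without gaps."""
    return sum(match if a == b else mismatch for a, b in zip(seq_a, seq_b))


def randomized_scores(seq_a, seq_b, n_shuffles=100, score_fn=ungapped_score, seed=0):
    """Shuffle seq_b n_shuffles times and score each shuffle against seq_a."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    letters = list(seq_b)
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        scores.append(score_fn(seq_a, "".join(letters)))
    return scores


def z_score(alignment_score, random_scores):
    """Z = (alignment score - mean of random scores) / SD of random scores."""
    mean = statistics.mean(random_scores)
    sd = statistics.stdev(random_scores)
    if sd == 0:
        raise ValueError("random scores have zero spread; Z is undefined")
    return (alignment_score - mean) / sd
```

A call such as `z_score(ungapped_score(a, b), randomized_scores(a, b))` then expresses how many standard deviations the real score lies above the randomized average.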
24
Randomization
It has become clear that sequences appear to evolve in a “word”-like fashion.
By analogy, the 26 letters of the alphabet are combined to make words, and it is the words that actually communicate information.
Randomization should therefore occur at the level of strings of nucleotides (2-4) rather than single nucleotides.
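Word-level randomization can be sketched by cutting the sequence into fixed-size blocks and shuffling the blocks instead of the individual letters. The function name `shuffle_words` and the choice of non-overlapping, fixed-size words are assumptions made for illustration.

```python
import random


def shuffle_words(seq, word_size=3, seed=0):
    """Shuffle a sequence in blocks of word_size letters.

    If the length is not a multiple of word_size, the final shorter block
    is shuffled along with the others.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    words = [seq[i:i + word_size] for i in range(0, len(seq), word_size)]
    rng.shuffle(words)
    return "".join(words)
```

The shuffled sequence keeps the same composition of 3-letter words as the original, which single-letter shuffling would destroy.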
28
Global Alignment
Compares all possible alignments of two sequences and presents the one with the greatest number of matches and the fewest gaps.
The alignment will “run” from one end of the longer sequence to the other.
Best for closely related sequences.
Can miss short regions of strongly conserved sequence.
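The slides do not name an algorithm, but a standard way to find the best end-to-end alignment without enumerating every possibility is dynamic programming (the Needleman-Wunsch approach). The sketch below is a simplified version under stated assumptions: a single linear gap penalty instead of separate creation and extension penalties, and illustrative score values.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment by dynamic programming with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). Score values are illustrative.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap      # a prefix of a aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap      # a prefix of b aligned against gaps only

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Because the traceback always starts at the final corner of the table, the alignment spans both sequences from end to end, which is why a short conserved region inside otherwise unrelated sequences can be missed.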
31
Local Alignment
Identifies the segments of an alignment with the highest possible score.
Aligns the sequences, then extends the aligned regions in both directions until the score falls to zero.
Best for comparing sequences whose relationship is unknown.
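The "score falls to zero" rule corresponds to the Smith-Waterman style of dynamic programming, in which no cell of the scoring table is allowed to go below zero. The slides do not name the algorithm, so the following is only a sketch under assumptions: a linear gap penalty and illustrative score values.

```python
def local_align(a, b, match=2, mismatch=-1, gap=-2):
    """Best-scoring local alignment (Smith-Waterman style, linear gap penalty).

    Returns (score, aligned_segment_a, aligned_segment_b).
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(0,                          # never drop below zero
                              score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score reaches zero.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

For two otherwise unrelated sequences that share the block GATTACA, such as `local_align("TTTTGATTACATTTT", "CCCCGATTACACCCC")`, only the shared block is reported, with the flanking regions left out of the alignment.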
32
[Figure: example alignments. Panels: Global Alignment; Local Alignment.]
33
BLAST 2 (Basic Local Alignment Search Tool)
E (expect) value: the number of hits expected by random chance in a database of the same size.
A larger numerical value means lower significance.
Example: HIV sequence.
38
Both global and local alignment programs will (almost) always give a match.
It is important to determine whether the match is biologically relevant.
Not necessarily relevant:
  Low-complexity regions.
  Sequence repeats (e.g. glutamine runs).
  Transmembrane regions (high in hydrophobic residues).
If working with coding regions, you are typically better off comparing protein sequences, which have greater information content.
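One crude way to spot low-complexity stretches such as glutamine runs before trusting a match is to measure the Shannon entropy of each window of the sequence. This is an illustrative assumption rather than the method from the slides; dedicated filters such as SEG and DUST use their own scoring, and the window size and threshold below are arbitrary.

```python
import math
from collections import Counter


def shannon_entropy(block):
    """Shannon entropy (in bits) of the letter composition of a block."""
    counts = Counter(block)
    total = len(block)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def low_complexity_windows(seq, window=12, threshold=1.0):
    """Return start positions of windows whose entropy is below the threshold."""
    return [i for i in range(len(seq) - window + 1)
            if shannon_entropy(seq[i:i + window]) < threshold]
```

A window made of a single repeated residue has an entropy of 0 bits, while a window of four equally frequent letters has 2 bits, so low values flag regions where an alignment hit is less likely to be informative.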