Basic terms:
• Similarity: a measurable quantity.
  - Applied to proteins using the concept of conservative substitutions.
• Identity: expressed as a percentage.
• Homology: a specific term indicating a relationship by evolution.

Basic terms:
• Orthologs: homologous sequences found in two or more species that have the same function (e.g. alpha-hemoglobin).
• Paralogs: homologous sequences found in the same species that arose by gene duplication (e.g. alpha- and beta-hemoglobin).

Pairwise comparison
• Dotplot
  - All-against-all comparison: every position is compared with every other position.
  - Nucleic acids and proteins have polarity.
  - Typically only one direction makes biological sense: 5’ to 3’, or amino terminus to carboxyl terminus.

Simple plot
• Window: size of the sequence block used for comparison.
  - In the previous example: window = 1
• Stringency: number of matches required to score positive.
  - In the previous example: stringency = 1 (required an exact match)

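DotPlot
[Figure: the 4-base block GATC compared against GATCGTACCATGGAATCGTCCAGATCA with WINDOW = 4; STRINGENCY = 2. Registers with 4/4 or 2/4 matches score +; a register with 0/4 matches scores -.]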

• Compare two sequences in every register.
• Vary the size of the window and the stringency depending on the sequences being compared.
• For nucleotide sequences, typically start with window = 21; stringency = 14.
• For proteins, start with a smaller window: window = 3; stringency = 1 or 2.
• It is important to test different stringencies.
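The window and stringency rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `dotplot` and `render` are made up for this example, and a window is scored positive when at least `stringency` of its positions match.

```python
def dotplot(seq_a, seq_b, window=1, stringency=1):
    """Return the set of (i, j) window start positions that score positive.

    A window of `window` residues starting at seq_a[i] is compared with the
    window starting at seq_b[j]; it scores positive when at least
    `stringency` positions match.
    """
    if stringency > window:
        raise ValueError("stringency cannot exceed window")
    hits = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= stringency:
                hits.add((i, j))
    return hits


def render(seq_a, seq_b, hits, window=1):
    """Draw the plot as text: '+' for a positive window, '.' otherwise."""
    rows = []
    for i in range(len(seq_a) - window + 1):
        rows.append(
            "".join(
                "+" if (i, j) in hits else "."
                for j in range(len(seq_b) - window + 1)
            )
        )
    return "\n".join(rows)


if __name__ == "__main__":
    target = "GATCGTACCATGGAATCGTCCAGATCA"
    hits = dotplot(target, target, window=4, stringency=3)
    print(render(target, target, hits, window=4))
```

Comparing a sequence with itself always produces the main diagonal; lowering the stringency adds off-diagonal dots, which is why it helps to test several values.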

Intergenic comparison
• Nucleotide sequence contains three domains.
• 50 - 350: strong conservation
  - An indel places the comparison out of register.
• 450 - 1300: slightly weaker conservation
• 1300 - 2400: strong conservation

Scoring Alignments
• Quality Score:
  - Score x for a match, -y for a mismatch.
  - Penalty for:
    - Creating a gap
    - Extending a gap

Scoring Alignments
• Quality Score:
  Quality = [10(matches)] + [-1(mismatches)] - [(Gap Creation Penalty)(# of gaps) + (Gap Extension Penalty)(total length of gaps)]
• The scoring scheme incorporates an evolutionary model:
  - Matches are conserved.
  - Mismatches are divergences.
  - Gaps are more likely to disrupt function, hence a greater penalty than for a mismatch.
  - Introduction of a gap (indel) is penalized more than extension of a gap.
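As a rough sketch, the quality formula can be applied to an alignment that has already been written out with "-" for gap positions. The match and mismatch values (10 and -1) come from the slide; the gap creation and extension penalties (5 and 1) are hypothetical defaults chosen only so that creating a gap costs more than extending one, and `quality_score` is an illustrative name.

```python
def quality_score(aligned_a, aligned_b, match=10, mismatch=-1,
                  gap_creation=5, gap_extension=1):
    """Score two gapped sequences of equal length with the quality formula.

    gap_creation and gap_extension are assumed values, not from the source.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = mismatches = n_gaps = gap_length = 0
    gapped_row = None  # which sequence is currently inside a gap, if any
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" and b == "-":
            raise ValueError("a column cannot be a gap in both sequences")
        if a == "-" or b == "-":
            row = "a" if a == "-" else "b"
            if row != gapped_row:
                n_gaps += 1  # a new gap is opened
            gapped_row = row
            gap_length += 1  # every gap column counts toward total length
        else:
            gapped_row = None
            if a == b:
                matches += 1
            else:
                mismatches += 1
    return (match * matches + mismatch * mismatches
            - (gap_creation * n_gaps + gap_extension * gap_length))


if __name__ == "__main__":
    print(quality_score("GATTACA", "GA--ACA"))
```

With these assumed penalties, one gap of length two costs 5 + 2 = 7, while two separate gaps of length one would cost 12, which reflects the idea that opening a gap is penalized more than extending it.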

Z Score (standardized score)
• $Z = \dfrac{\text{Score}_{\text{alignment}} - \text{Average Score}_{\text{random}}}{\text{Standard Deviation}_{\text{random}}}$

Quality Score: Randomization
• The program takes a sequence and randomizes it X times (user-selected).
• It determines the average quality score and the standard deviation for the randomized sequences.
• Compare the randomized scores with the quality score of the real alignment to help determine whether the alignment is potentially significant.
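The randomization test and the Z score can be sketched together. This is a minimal illustration, not the program the slide refers to: `ungapped_score` is a simplified stand-in scoring function (position-by-position, no gaps), and `z_score`, `n_shuffles`, and `seed` are hypothetical names.

```python
import random
import statistics


def ungapped_score(seq_a, seq_b, match=10, mismatch=-1):
    """Stand-in scoring function: compare position by position, no gaps."""
    return sum(match if a == b else mismatch for a, b in zip(seq_a, seq_b))


def z_score(seq_a, seq_b, score_fn=ungapped_score, n_shuffles=100, seed=0):
    """Z = (observed score - mean of random scores) / SD of random scores."""
    rng = random.Random(seed)
    observed = score_fn(seq_a, seq_b)
    letters = list(seq_b)
    random_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # randomize the second sequence in place
        random_scores.append(score_fn(seq_a, "".join(letters)))
    mean = statistics.mean(random_scores)
    sd = statistics.stdev(random_scores)
    if sd == 0:
        raise ValueError("randomized scores do not vary; Z is undefined")
    return (observed - mean) / sd


if __name__ == "__main__":
    seq = "GATCGTACCATGGAATCGTCCAGATCA"
    print(z_score(seq, seq))
```

Any scoring function with the same two-argument shape could be passed as `score_fn`, for example one that performs a gapped alignment first.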

Randomization
• It has become clear that sequences appear to evolve in a “word”-like fashion.
  - The 26 letters of the alphabet are combined to make words.
  - Words actually communicate information.
• Randomization should therefore occur at the level of strings of nucleotides (2-4).
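One possible reading of word-level randomization is to cut the sequence into consecutive, non-overlapping blocks and shuffle the blocks rather than the individual letters. The slide does not say how the words are chosen, so the blocking scheme here is an assumption, and `shuffle_words` is an illustrative name.

```python
import random


def shuffle_words(seq, word_size=3, seed=None):
    """Shuffle a sequence in blocks of `word_size` letters.

    Assumes words are consecutive, non-overlapping blocks; a final shorter
    block is kept as its own word.
    """
    if word_size < 1:
        raise ValueError("word_size must be at least 1")
    rng = random.Random(seed)
    words = [seq[i:i + word_size] for i in range(0, len(seq), word_size)]
    rng.shuffle(words)
    return "".join(words)


if __name__ == "__main__":
    print(shuffle_words("GATCGTACCATGGAATCG", word_size=3, seed=1))
```

Setting `word_size=1` reduces this to an ordinary letter-by-letter shuffle, which makes it easy to compare the two kinds of randomization.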

Global Alignment
• Global: compares all possible alignments of two sequences and presents the one with the greatest number of matches and the fewest gaps.
• The alignment will “run” from one end of the longest sequence to the other end.
• Best for closely related sequences.
• Can miss short regions of strongly conserved sequence.
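A common way to find the best end-to-end alignment without listing every possibility is Needleman-Wunsch dynamic programming. The slides do not name an algorithm, so this is a swapped-in technique shown only as a sketch. For brevity it uses a single linear gap penalty (an assumed value of -5 per gap position) instead of separate creation and extension penalties.

```python
def global_align(seq_a, seq_b, match=10, mismatch=-1, gap=-5):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). The gap value is an assumption.
    """
    n, m = len(seq_a), len(seq_b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # leading gaps in seq_b
    for j in range(1, m + 1):
        score[0][j] = j * gap  # leading gaps in seq_a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(global_align("GATTACA", "GATACA"))
```

Because the traceback always starts at the corner of the table, the result spans both sequences from end to end, which is why a short, strongly conserved region inside otherwise unrelated sequences can be missed.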

Local Alignment
• Identifies segments of alignment with the highest possible score.
• Aligns sequences, then extends the aligned regions in both directions until the score falls to zero.
• Best for comparing sequences whose relationship is unknown.
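The idea of keeping only the highest-scoring segment and stopping where the score falls to zero matches Smith-Waterman dynamic programming, shown here as a sketch. The slides do not name the algorithm, so treat this as one plausible realization; the linear gap penalty of -5 is an assumed value and `local_align` is an illustrative name.

```python
def local_align(seq_a, seq_b, match=10, mismatch=-1, gap=-5):
    """Smith-Waterman local alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b) for one best-scoring segment.
    """
    n, m = len(seq_a), len(seq_b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            # Scores are never allowed to drop below zero.
            score[i][j] = max(0,
                              score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_i, best_j = score[i][j], i, j
    # Trace back from the best cell until the score reaches zero.
    out_a, out_b = [], []
    i, j = best_i, best_j
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(local_align("TTTTGATTACATTTT", "CCCCGATTACACCCC"))
```

In the usage example the flanking regions share nothing, so only the common core is reported, whereas an end-to-end alignment would be forced to include the mismatched flanks.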

[Figure: a global alignment compared with a local alignment.]

Blast 2
• Basic Local Alignment Search Tool
• E (expect) value: number of hits expected by random chance in a database of the same size.
  - Larger numerical value = lower significance
• Example shown: HIV sequence
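To see why the expect value depends on database size, the standard Karlin-Altschul form E = K·m·n·e^(-λS) can be sketched as below. The formula is not given on the slide, and the `k` and `lam` defaults are illustrative placeholders only: real values depend on the scoring matrix and gap penalties, and actual BLAST also adjusts the sequence lengths. `expect_value` is a hypothetical name.

```python
import math


def expect_value(raw_score, query_length, database_length,
                 k=0.041, lam=0.267):
    """Simplified E value: E = K * m * n * exp(-lambda * S).

    k and lam are placeholder statistical parameters, not from the source.
    """
    return k * query_length * database_length * math.exp(-lam * raw_score)


if __name__ == "__main__":
    for score in (30, 50, 70):
        print(score, expect_value(score, 100, 1_000_000))
```

Under this form, doubling the database size doubles the number of chance hits expected at a given score, and a higher score gives a smaller E value, consistent with larger E values meaning lower significance.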

• Both global and local alignment programs will (almost) always give a match.
• It is important to determine whether the match is biologically relevant.
• Not necessarily relevant: low-complexity regions.
  - Sequence repeats (glutamine runs)
  - Transmembrane regions (high in hydrophobes)
• If working with coding regions, you are typically better off comparing protein sequences: greater information content.
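One simple way to flag candidate low-complexity regions before trusting a match is to measure the Shannon entropy of the residues in a sliding window. This is only a sketch of the idea, not the method used by any particular masking program; the window size and threshold are arbitrary assumptions, and the function names are illustrative.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Entropy in bits of the residue composition of a segment."""
    counts = Counter(segment)
    total = len(segment)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def low_complexity_windows(seq, window=12, threshold=1.5):
    """Return start positions of windows whose entropy is below threshold.

    window and threshold are assumed values chosen for illustration.
    """
    return [
        i for i in range(len(seq) - window + 1)
        if shannon_entropy(seq[i:i + window]) < threshold
    ]


if __name__ == "__main__":
    protein = "MKTAYIAKQRQQQQQQQQQQQQ"
    print(low_complexity_windows(protein, window=8, threshold=1.0))
```

A run of a single residue, such as a glutamine repeat, has an entropy of zero, so windows covering it are flagged, while windows over mixed sequence are not.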
