Psi-BLAST, Prosite, UCSC Genome Browser Lecture 3.

1 Psi-BLAST, Prosite, UCSC Genome Browser Lecture 3

2 Searching for remote homologs: Sometimes BLAST isn't enough. Given a large protein family, BLAST finds only the close members; we also want the more distant members.

3 PSI-BLAST: Position-Specific Iterative BLAST. Regular BLAST -> construct a profile from the BLAST results -> BLAST profile search -> final results.

4 Consensus, Pattern, PSSM
    Alignment:
        Pos     1  2  3  4  5  6
        Seq1    A  T  C  T  T  G
        Seq2    A  A  C  T  T  G
        Seq3    A  A  C  T  T  C
    Consensus: A A C T T G (the most frequent character in the column is chosen)
    Pattern: A-[TA]-C-T-T-[GC] (represents the alignment as a regular expression)
    Profile = PSSM (Position Specific Score Matrix):
        Nuc \ Pos   1     2     3     4     5     6
        A           1     0.67  0     0     0     0
        C           0     0     1     0     0     0.33
        G           0     0     0     0     0     0.67
        T           0     0.33  0     1     1     0
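
The three representations on this slide can be derived mechanically from the alignment. The following is a minimal Python sketch, not code from the lecture: the function names are made up for illustration, the PSSM holds plain column frequencies (no pseudocounts or log-odds), and letters inside brackets are sorted alphabetically, so `[TA]` appears as `[AT]`.

```python
from collections import Counter

ALPHABET = "ACGT"

def column_counts(seqs):
    """Return one Counter per alignment column (sequences must be equal length)."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return [Counter(col) for col in zip(*seqs)]

def consensus(seqs):
    """Most frequent character per column; ties are broken in ALPHABET order."""
    return "".join(
        max(ALPHABET, key=lambda ch: counts[ch]) for counts in column_counts(seqs)
    )

def pattern(seqs):
    """Pattern: one letter if the column is invariant, else [..] with all observed letters."""
    parts = []
    for counts in column_counts(seqs):
        letters = sorted(counts)
        parts.append(letters[0] if len(letters) == 1 else "[" + "".join(letters) + "]")
    return "-".join(parts)

def pssm(seqs):
    """Relative frequency of each nucleotide in each column."""
    n = len(seqs)
    return [{ch: counts[ch] / n for ch in ALPHABET} for counts in column_counts(seqs)]

seqs = ["ATCTTG", "AACTTG", "AACTTC"]
print(consensus(seqs))  # AACTTG
print(pattern(seqs))    # A-[AT]-C-T-T-[CG]
for position, column in enumerate(pssm(seqs), start=1):
    print(position, {ch: round(p, 2) for ch, p in column.items()})
```

The consensus discards the most information, the pattern keeps which letters occur, and the PSSM additionally keeps how often each one occurs.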

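5 PSSM:
        Nuc \ Pos   1     2     3     4     5     6
        A           1     0.67  0     0     0.25  0.33
        C           0     0.33  1     1     0.25  0
        G           0     0     0     0     0.25  0.33
        T           0     0     0     0     0.25  0.33
    S(AACCAA) = 1 * 0.67 * 1 * 1 * 0.25 * 0.33 ≈ 0.055
    S(GACCAA) = 0 (G scores 0 at position 1, so the whole product is 0)
    Sequences with higher scores -> higher chance of being related to the PSSM.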
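
The scoring rule on this slide is a product of per-position matrix entries. A minimal Python sketch follows, with the matrix values copied from the slide and the names `PSSM` and `score` chosen only for illustration. Real profile tools typically work with log-odds scores and pseudocounts rather than raw frequency products; that refinement is not shown here.

```python
import math

# Rows are nucleotides, list entries are positions 1..6 (values from the slide).
PSSM = {
    "A": [1.0, 0.67, 0.0, 0.0, 0.25, 0.33],
    "C": [0.0, 0.33, 1.0, 1.0, 0.25, 0.0],
    "G": [0.0, 0.0, 0.0, 0.0, 0.25, 0.33],
    "T": [0.0, 0.0, 0.0, 0.0, 0.25, 0.33],
}

def score(seq, pssm=PSSM):
    """Multiply the matrix entry of each residue at its position."""
    width = len(next(iter(pssm.values())))
    if len(seq) != width:
        raise ValueError(f"sequence must have length {width}")
    return math.prod(pssm[ch][i] for i, ch in enumerate(seq))

print(score("AACCAA"))  # about 0.055
print(score("GACCAA"))  # 0.0, because G is never seen at position 1
```

A single zero entry vetoes the whole sequence, which is one reason practical profiles avoid hard zeros.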

6 PSI-BLAST: Position-Specific Iterative BLAST. Regular BLAST -> construct a profile from the BLAST results -> BLAST profile search -> final results.

7 BLAST – PSI-Blast

8 PSI-Blast - results

9 PSI-BLAST. Advantage: PSI-BLAST looks for sequences that are close to the query and learns from them to extend the "circle of friends". Disadvantage: if a WRONG hit is included, the profile drifts toward unrelated sequences (contamination), and this gets worse with each iteration.
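
The "circle of friends" idea can be illustrated with a toy iterative search. This is only a sketch under strong assumptions and not the real PSI-BLAST algorithm: sequences are ungapped DNA strings of equal length, the first round uses percent identity instead of BLAST, the profile score is a mean column frequency, and the cutoffs and all function names are invented for the example.

```python
ALPHABET = "ACGT"

def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def build_profile(seqs, pseudocount=0.1):
    """Column frequencies with a small pseudocount so no entry is exactly 0."""
    profile = []
    for col in zip(*seqs):
        total = len(col) + pseudocount * len(ALPHABET)
        profile.append({ch: (col.count(ch) + pseudocount) / total for ch in ALPHABET})
    return profile

def profile_score(seq, profile):
    """Mean per-position frequency of the sequence's residues (toy score)."""
    return sum(col[ch] for ch, col in zip(seq, profile)) / len(seq)

def iterative_search(query, database, first_cutoff=0.8, profile_cutoff=0.6, max_iter=5):
    # Round 1: plain pairwise comparison against the query.
    hits = {s for s in database if identity(query, s) >= first_cutoff}
    for _ in range(max_iter):
        # Rebuild the profile from the query plus everything accepted so far.
        profile = build_profile([query, *sorted(hits)])
        new_hits = hits | {s for s in database if profile_score(s, profile) >= profile_cutoff}
        if new_hits == hits:  # converged: no new sequences were recruited
            break
        hits = new_hits
    return hits

database = ["AACTTC", "ATCTTC", "GGGAAA"]
print(sorted(iterative_search("AACTTG", database)))  # ['AACTTC', 'ATCTTC']
```

Here `ATCTTC` is too far from the query to pass the first round, but it is recruited once `AACTTC` has been folded into the profile. Accepted hits are never removed in this sketch, so a wrong hit would keep pulling the profile toward unrelated sequences, which is the contamination problem described on the slide.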

10 PSI-BLAST: Which of the following is/are correct? (1) PSI-BLAST is expected to give more hits than BLAST. (2) PSI-BLAST is an iterative search method. (3) PSI-BLAST is faster than BLAST. (4) Each iteration of PSI-BLAST can only improve the results of the previous iteration.

11 Turning information into knowledge: The outcome of a sequencing project is masses of raw data. The challenge is to turn these raw data into biological knowledge. A valuable tool for this challenge is an automated diagnostic pipeline through which newly determined sequences can be streamlined.

12 From sequence to function: Nature tends to innovate rather than invent. Proteins are composed of functional elements: domains and motifs. Domains are structural units that carry out a certain function; they are shared between different proteins. Motifs are shorter and are usually critical for the biological activity.

13 http://www.expasy.ch/prosite

14 Prosite: From analyzing conserved regions in protein sequences it is possible to derive signatures of motifs and domains. Prosite consists of annotated sites/motifs/signatures/fingerprints. Given an uncharacterized translated protein sequence, Prosite tries to predict which motifs and domains make up the protein and thus identify the family to which it belongs.

15 Prosite: Prosite represents entries with patterns or profiles.
    Alignment:
        A A C T T C
        A T C T T G
        A A C T T G
    Pattern: A-[TA]-C-T-T-[GC]
    Profile:
        Nuc \ Pos   1     2     3     4     5     6
        A           1     0.67  0     0     0     0
        T           0     0.33  0     1     1     0
        C           0     0     1     0     0     0.33
        G           0     0     0     0     0     0.67
    Profiles are used in Prosite when the motif is relatively divergent and is difficult to represent as a pattern. Profiles also characterize domains over their entire length, not just the motif pattern.
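
Because a pattern is essentially a regular expression, it can be translated into one and used to scan a sequence. The sketch below is a simplified Python illustration and not Prosite's own scanner: it handles only the basic syntax elements (single letters, `[..]` allowed sets, `{..}` excluded sets, `x` wildcards, `(n)` or `(n,m)` repeats, and `<` / `>` terminal anchors), and the function names are hypothetical.

```python
import re

def prosite_to_regex(pattern):
    """Translate a basic Prosite-style pattern into a Python regular expression."""
    pieces = []
    for element in pattern.rstrip(".").split("-"):
        start_anchor = "^" if element.startswith("<") else ""
        end_anchor = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        repeat = ""
        if element.endswith(")"):
            # Split "x(2,4)" into the element "x" and the repeat "{2,4}".
            element, _, counts = element[:-1].partition("(")
            repeat = "{" + counts + "}"
        if element == "x":
            core = "."                          # any residue
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"   # any residue except those listed
        else:
            core = element                      # single letter or [..] set
        pieces.append(start_anchor + core + repeat + end_anchor)
    return "".join(pieces)

def scan(pattern, sequence):
    """Return (1-based start, matched text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]

print(prosite_to_regex("A-[TA]-C-T-T-[GC]"))          # A[TA]CTT[GC]
print(scan("A-[TA]-C-T-T-[GC]", "GGAACTTGCCATCTTC"))  # [(3, 'AACTTG'), (11, 'ATCTTC')]
```

A pattern either matches or it does not, which is why divergent motifs are better served by a profile that can score partial agreement.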

16 Prosite sequence query

17

18 Patterns with a high probability of occurrence: Entries describing commonly found post-translational modifications or compositionally biased regions. Found in the majority of known protein sequences. High probability of occurrence. Prosite filters them by default.

19 Scanning Prosite. Query: sequence -> Result: all patterns found in the sequence. Query: pattern -> Result: all sequences which adhere to this pattern.

20 Prosite pattern query

21

22

23 UCSC Genome Browser

24 UCSC Genome Browser - Gateway: reset all settings from previous uses

25 UCSC Genome Browser - Gateway

26 Results

27 Annotation tracks: Base position, UCSC Genes, RefSeq Genes, mRNAs (GenBank), Mammal conservation, Species alignment, SNPs, Repeats. Gene display: Direction, Coding, Intron, UTR.

28 UCSC Gene

29 UCSC Genome Browser - movement Zoom x3 + Center

30 Controlling annotation tracks

31 Malaria distribution; sickle-cell anemia distribution

32 BLAT: BLAT = Blast-Like Alignment Tool. BLAT is designed to find similarity of >95% on DNA, >80% for protein. Rapid search by indexing the entire genome. Good for: (1) finding genomic coordinates of cDNA; (2) determining exons/introns; (3) finding human (or chimp, dog, cow…) homologs of another vertebrate sequence.
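
The speed gain from indexing the genome can be sketched with a small k-mer lookup table. This Python example is a toy under stated assumptions, not BLAT's implementation: it indexes non-overlapping k-mers of the genome and looks up every k-mer of the query, the value `k=4` is chosen only to suit the short example string, and the function names are hypothetical. The alignment and extension stages are omitted.

```python
from collections import Counter, defaultdict

def build_index(genome, k=4):
    """Map each non-overlapping k-mer of the genome to its start positions."""
    index = defaultdict(list)
    for start in range(0, len(genome) - k + 1, k):
        index[genome[start:start + k]].append(start)
    return index

def candidate_offsets(query, index, k=4):
    """Look up every k-mer of the query and vote for the implied genome offset."""
    votes = Counter()
    for q in range(len(query) - k + 1):
        for g in index.get(query[q:q + k], ()):
            votes[g - q] += 1  # genome position where the query would start
    return votes

genome = "TTGACCGTAAGCTAGGCATCAATG"
query = genome[5:17]  # a fragment taken from position 5 (0-based)
index = build_index(genome)
print(candidate_offsets(query, index).most_common(1))  # [(5, 2)]
```

The index is built once per genome and reused for every query, so each search only touches the handful of genome positions that share a k-mer with the query instead of scanning the whole sequence.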

33 BLAT on UCSC Genome Browser

34 BLAT search

35 BLAT Results

36 Legend: Match; Non-Match (mismatch/indel); Indel boundaries; query; hit

