Home Work I. Running Blast with BioPerl Input: 1) Sequence or Acc.Num. 2) Threshold (E value cutoff) Output: 1) Blast results – sequence names, alignment.


1 Home Work I. Running BLAST with BioPerl. Input: 1) a sequence or an accession number; 2) a threshold (E-value cutoff). Output: 1) BLAST results: sequence names, alignment score, E-value. 2) Next to each result, provide a link that redirects to the Pairwise Alignment page (from the previous exercise). That page should be pre-filled with the two sequences (first, the original sequence; second, the sequence selected from the BLAST run). * You should also submit a data-flow diagram with BioPerl class names.

2 Home Work (continued) Doc: BioPerl tutorial, section III.4.1, "Running BLAST remotely (using RemoteBlast.pm)". Use the sleep function (for example, to wait between attempts to retrieve the results). Data-flow diagram example for retrieving a sequence: GenBank -> [get_Seq_by_acc('AF303112')] -> Seq -> [$seq->seq()] -> string. Corresponding code: use Bio::DB::GenBank; $gb = Bio::DB::GenBank->new(); $seq = $gb->get_Seq_by_acc('AF303112'); print $seq->seq();

3 Home Work (continued) II. Translate PROSITE pattern into Perl regular expression.
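
A minimal sketch of the translation, written in Python for illustration; the regular-expression syntax it produces (character classes, `.`, `{n,m}`, `^`, `$`) is also valid in Perl. The function name `prosite_to_regex` is hypothetical, and the sketch assumes the common PROSITE elements only: `x`, single residues, `[...]`, `{...}`, repeat counts `(n)` or `(n,m)`, and the `<` / `>` anchors at the ends of the pattern. It does not handle anchors placed inside brackets.

```python
import re

# One PROSITE element: a residue spec with an optional repeat count.
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern string into a regular expression string."""
    pattern = pattern.strip().rstrip(".")      # drop the terminating period
    parts = []
    for element in pattern.split("-"):         # '-' separates positions
        anchor_start = element.startswith("<")  # '<' anchors at the N-terminus
        anchor_end = element.endswith(">")      # '>' anchors at the C-terminus
        element = element.strip("<>")
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        residue, lo, hi = m.groups()
        if residue == "x":                     # any residue
            core = "."
        elif residue.startswith("{"):          # forbidden residues
            core = "[^" + residue[1:-1] + "]"
        else:                                  # single residue or [allowed set]
            core = residue
        if lo is not None:                     # (n) or (n,m) repeat count
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(("^" if anchor_start else "") + core + ("$" if anchor_end else ""))
    return "".join(parts)

if __name__ == "__main__":
    regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
    print(regex)
    print(bool(re.search(regex, "AACAACAAALAAAAAAAAHAAAHAA")))
```

The resulting string can be used directly inside a Perl match operator, for example `$seq =~ /C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H/`.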

4 Profile Analysis M. Gribskov, D. Eisenberg. Profile analysis: detection of distantly related proteins by sequence comparison. The information in a group of aligned sequences is expressed as a position-specific scoring table (the profile).

5 Profiles [Figure: sequences labeled Seq1, Seq3, Seq4, Seq2]

6

7 Profile calculation The position-specific gap coefficients penalize gaps in conserved regions more heavily than gaps in more variable regions.

8 Profile calculation The position-specific gap coefficients penalize gaps in conserved regions more heavily than gaps in more variable regions. Profile entry for character x at position j: $p(x,j)/p(x)$ [or $\log\big(p(x,j)/p(x)\big)$], where p(x,j) is the frequency with which character x appears at position j (a row of the table on the previous slide), and p(x) is the frequency with which character x appears anywhere in the sequences of the multiple alignment.
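
The ratio above can be computed directly from a multiple alignment. The following is a small sketch under stated assumptions: the function name `profile_ratios` is hypothetical, gaps are written as `-` and are ignored when counting, and only characters actually observed at a position are stored (so the logarithm is never taken of zero). Real profile programs also weight sequences and use substitution scores, which this sketch leaves out.

```python
from collections import Counter
import math

def profile_ratios(alignment, log=False):
    """Return one dict per alignment position j, mapping x -> p(x,j)/p(x).

    alignment: list of equal-length strings; '-' marks a gap.
    With log=True the natural logarithm of the ratio is stored instead.
    """
    ncols = len(alignment[0])
    # p(x): frequency of x anywhere in the aligned sequences (gaps excluded)
    background = Counter(c for row in alignment for c in row if c != "-")
    total = sum(background.values())
    profile = []
    for j in range(ncols):
        column = [row[j] for row in alignment if row[j] != "-"]
        counts = Counter(column)
        scores = {}
        for x, n in counts.items():
            p_xj = n / len(column)        # p(x,j): frequency of x at position j
            p_x = background[x] / total   # p(x): overall frequency of x
            ratio = p_xj / p_x
            scores[x] = math.log(ratio) if log else ratio
        profile.append(scores)
    return profile

if __name__ == "__main__":
    for j, scores in enumerate(profile_ratios(["AA", "AC", "AC"])):
        print(j, scores)
```

A ratio above 1 means the character is over-represented at that position relative to the alignment as a whole; a ratio below 1 means it is under-represented.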

9

10 Profile alignment Two cases: sequence-profile alignment and profile-profile alignment. Both are solved by dynamic programming (the same idea as in pairwise sequence alignment).

11 Reminder: Pairwise Sequence Alignment. Sequence-profile alignment: S(x,j) is the score for aligning character x with column j: $S(x,j) = \sum_{y} \sigma(x,y)\, p(y,j)/p(y)$, where σ(x,y) is any standard pairwise-alignment score (PAM-k, BLOSUM-k, ...), p(y,j) is the frequency with which character y appears in column j of the multiple alignment, and p(y) is the frequency with which character y appears anywhere in the sequences of the multiple alignment. The position-specific gap coefficients penalize gaps in conserved regions more heavily than gaps in more variable regions.
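
To make the score and the dynamic programming concrete, here is a minimal sketch. The names `column_score`, `align_score` and `simple_sigma` are hypothetical. The toy σ (+1 for a match, -1 for a mismatch) stands in for a PAM or BLOSUM matrix, and the way the per-column gap penalties are charged is one simple choice for illustration, not the scheme of any particular program.

```python
def simple_sigma(x, y):
    """Toy substitution score (assumption): +1 match, -1 mismatch."""
    return 1.0 if x == y else -1.0

def column_score(x, col_freq, background, sigma):
    """S(x,j) = sum over y of sigma(x,y) * p(y,j) / p(y).

    col_freq:   dict y -> p(y,j) for one alignment column j
    background: dict y -> p(y)
    """
    return sum(sigma(x, y) * p_yj / background[y] for y, p_yj in col_freq.items())

def align_score(seq, col_freqs, background, sigma, gap_penalties):
    """Best global sequence-profile alignment score (Needleman-Wunsch style).

    gap_penalties[j] is the position-specific penalty tied to column j;
    conserved columns would be given larger values.
    """
    n, m = len(seq), len(col_freqs)
    # F[i][j]: best score for seq[:i] aligned with the first j profile columns
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] - gap_penalties[j - 1]
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] - gap_penalties[0]
        for j in range(1, m + 1):
            F[i][j] = max(
                # align residue i with column j
                F[i - 1][j - 1] + column_score(seq[i - 1], col_freqs[j - 1], background, sigma),
                # residue i is left unaligned after column j
                F[i - 1][j] - gap_penalties[j - 1],
                # column j is aligned with a gap in the sequence
                F[i][j - 1] - gap_penalties[j - 1],
            )
    return F[n][m]

if __name__ == "__main__":
    background = {"A": 0.5, "C": 0.5}
    columns = [{"A": 1.0}, {"A": 0.5, "C": 0.5}]
    print(align_score("AC", columns, background, simple_sigma, [3.0, 1.0]))
```

The recurrence has the same three cases as pairwise alignment; only the substitution term changes, from σ(x,y) to the column score S(x,j).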

12 Profiles in GCG PileUp creates a multiple sequence alignment from a group of related sequences. ProfileMake makes a profile from a multiple sequence alignment. ProfileSearch uses the profile to search a database for sequences with similarity to the group of aligned sequences. ProfileSegments displays optimal alignments between each sequence in the ProfileSearch output list and the group of aligned sequences (represented by the profile consensus). ProfileGap makes optimal alignments between one or more sequences and a group of aligned sequences represented as a profile. ProfileScan uses a database of profiles to find structural and sequence motifs in protein sequences.

13

14 Progressive Alignment (Feng-Doolittle, 1987). Implemented in PileUp (GCG package). 1. Calculate the pairwise alignment scores, and convert them to distances. 2. Use an incremental clustering algorithm to construct a tree from the distances. 3. Traverse the nodes in their order of addition to the tree, progressively aligning the sequences. This way, the most similar pair is aligned first, followed by the addition of the next most similar sequence or set of sequences.
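
Steps 1 and 2 can be sketched as follows. This is an illustration only: the function names are hypothetical, the score-to-distance conversion is a simple placeholder (one minus the score normalized by the largest score) and not the formula of the original method, and average-linkage clustering is used as one example of an incremental clustering algorithm.

```python
def scores_to_distances(scores):
    """Placeholder conversion (assumption): distance = 1 - score / max score.

    scores: dict mapping frozenset({a, b}) -> pairwise alignment score.
    """
    top = max(scores.values())
    return {pair: 1.0 - s / top for pair, s in scores.items()}

def guide_tree_order(names, dist):
    """Average-linkage clustering; returns the merges in the order they occur.

    Each merge is a pair of clusters, and each cluster is a tuple of names.
    The merge order is the order in which step 3 would align the groups.
    """
    clusters = [(name,) for name in names]

    def avg(c1, c2):
        # mean distance over all cross-cluster pairs
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        # find the closest pair of current clusters
        _, i, j = min(
            (avg(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

if __name__ == "__main__":
    raw = {("A", "B"): 90, ("C", "D"): 80, ("A", "C"): 20,
           ("A", "D"): 10, ("B", "C"): 30, ("B", "D"): 20}
    scores = {frozenset(pair): s for pair, s in raw.items()}
    for left, right in guide_tree_order("ABCD", scores_to_distances(scores)):
        print(left, "+", right)
```

In this toy input the highest-scoring pair is merged first, so it would also be the first pair aligned in step 3.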

15 Iterative profile pairwise alignment 1. Align some pair. 2. While (not done): (a) Pick an unaligned string which is "near" some aligned one(s). (b) Align it with the profile of the previously aligned group. Resulting new spaces are inserted into all strings in the group.
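
The last sentence of step (b) is the part that is easy to get wrong in code, so here is a small sketch of it alone. The function name `insert_gap_columns` is hypothetical, and the sketch assumes the alignment step has already reported where new gap columns are needed, as column indices into the existing group alignment.

```python
def insert_gap_columns(group, positions):
    """Insert a gap column before each given column index in every string.

    group:     list of equal-length aligned strings
    positions: column indices in the ORIGINAL group alignment; an index equal
               to the alignment length appends a column at the end
    """
    updated = []
    for s in group:
        chars = list(s)
        # work right to left so earlier indices are not shifted by insertions
        for pos in sorted(positions, reverse=True):
            chars.insert(pos, "-")
        updated.append("".join(chars))
    return updated

if __name__ == "__main__":
    print(insert_gap_columns(["ACGT", "A-GT"], [2]))
```

Because every string in the group receives the same new columns, the columns already aligned within the group stay aligned with one another.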

16 Progressive Profile Alignment: ClustalW (algorithm of Thompson, Higgins, Gibson, 1994). The idea is close to Feng-Doolittle 1987, implemented in PileUp (GCG package). 1. Calculate the pairwise alignment scores, and convert them to distances. 2. Use a neighbor-joining algorithm to build a tree from the distances. 3. Align sequence-sequence, sequence-profile, and profile-profile in order of decreasing similarity.
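
Step 2 can be illustrated with a compact sketch of the standard neighbor-joining selection rule. The function name `neighbor_joining_order` is hypothetical, the sketch records only the order in which nodes are joined (branch lengths are omitted), and it is not ClustalW's own code. The example distances are made up for illustration.

```python
def neighbor_joining_order(names, d):
    """Return (joins, remaining) for neighbor joining on distance table d.

    d[a][b] is the symmetric distance between a and b. Joining stops when
    three nodes remain, since they meet at a single internal node.
    """
    d = {a: dict(row) for a, row in d.items()}   # work on a copy
    nodes = list(names)
    joins = []
    while len(nodes) > 3:
        n = len(nodes)
        # r[a]: sum of distances from a to every other current node
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # choose the pair that minimizes Q(a,b) = (n-2)*d(a,b) - r[a] - r[b]
        _, a, b = min(
            ((n - 2) * d[a][b] - r[a] - r[b], a, b)
            for i, a in enumerate(nodes)
            for b in nodes[i + 1:]
        )
        u = f"({a},{b})"                          # label of the new internal node
        d[u] = {}
        for k in nodes:
            if k not in (a, b):
                # distance from the new node to each remaining node
                d[u][k] = d[k][u] = (d[a][k] + d[b][k] - d[a][b]) / 2
        nodes = [k for k in nodes if k not in (a, b)] + [u]
        joins.append((a, b))
    return joins, nodes

def make_table(pairs):
    """Build a symmetric dict-of-dicts from {(a, b): distance}."""
    table = {}
    for (a, b), value in pairs.items():
        table.setdefault(a, {})[b] = value
        table.setdefault(b, {})[a] = value
    return table

if __name__ == "__main__":
    table = make_table({
        ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
        ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
        ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
    })
    joins, remaining = neighbor_joining_order("abcde", table)
    print(joins, remaining)
```

Unlike the clustering step in Feng-Doolittle, the Q criterion corrects each distance by how far the two nodes are from everything else, so the pair joined first is not always the pair with the smallest raw distance.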

17

18 Alignment tree built by ClustalW

