Centre for Integrative Bioinformatics VU
1-month Practical Course: Genome Analysis
Sequence comparison by ‘Sequence Harmony’

1 1-month Practical Course: Genome Analysis
Sequence comparison by ‘Sequence Harmony’ identifies subtype-specific functional sites

2 Significance of Alignment Positions
An observed occurrence of amino acids at some position in an alignment that deviates from what is expected may indicate some (functional) significance.
What does ‘deviates from expected’ mean? Unlikely occurrences.
What is unlikely? A result that can be obtained in only (relatively) few ways.

3 Pfam Ig Family Alignment

4 Aquaporin: Motifs
NPA: stabilizes loops B and E
G(a)xxxG(a)xxG(a): crossing of right-hand helical bundles
Andreas Engel and Henning Stahlberg, in: Current Topics in Membranes (2001), Hohmann, Agre & Nielsen (Eds.), Academic Press

5 Counting…
Number of possibilities for finding some combination of amino acids: which types, and how many of each?
Examples:
WWW: 3 W → only 1 way
RHH: 1 R, 2 H → three ways
SHQ: 1 S, 1 H, 1 Q → six ways

6 Counting… (2)
‘Real’ examples:
WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW: 33 W → only 1 way
RRRRRRRRRRRRRRRRHHHHHHHHHHHHHHHHH: 16 R, 17 H → ? ways (~ 2^33 → 10^9)
SSSSSHSSCCCCCCCCEEQQEEEEEEEEEQEEE: 7 S, 1 H, 8 C, 14 E, 3 Q → ??? ways (~ 5^32)
‘many’ ways → but, we can calculate that!
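The counts on these two slides are multinomial coefficients: a column of N residues with n_1, n_2, … copies of each type can be arranged in N! / (n_1! n_2! …) distinct ways. The following is a minimal Python sketch of that calculation; the function name `count_arrangements` is an illustrative assumption and does not come from the slides.

```python
from collections import Counter
from math import factorial


def count_arrangements(column: str) -> int:
    """Number of distinct orderings of the residues in one alignment column.

    Computes the multinomial coefficient N! / (n_1! * n_2! * ...), where
    n_k is the number of occurrences of each residue type.
    """
    ways = factorial(len(column))
    for n in Counter(column).values():
        ways //= factorial(n)  # each division is exact
    return ways


if __name__ == "__main__":
    examples = [
        "WWW",
        "RHH",
        "SHQ",
        "W" * 33,
        "R" * 16 + "H" * 17,
        "SSSSSHSSCCCCCCCCEEQQEEEEEEEEEQEEE",
    ]
    for column in examples:
        print(column, count_arrangements(column))
```

For the 16 R / 17 H column the exact count is the binomial coefficient "33 choose 16", about 1.2 × 10^9; 2^33 is an upper bound on it, not the count itself.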

7 Shannon’s ‘Information Entropy’
‘A Mathematical Theory of Communication’, The Bell System Technical Journal, Vol. 27.
“Can we define a quantity which will measure, in some sense, how much information is ‘produced’ by such a process, or better, at what rate information is produced?”
He was thinking about the transmission of information, i.e., from a source through some channel to a destination.

8 Solution: Entropy
The entropy of a set of probabilities p_i measures information, choice and uncertainty.
Zero only if only one p_i is not zero: there is only one choice.
Maximal if all p_i are equal: the most ‘uncertain’ situation, in which all options are possible.
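For reference, the quantity described on this slide is Shannon's entropy. The transcript does not preserve the slide's formula, so the standard form is given here; the base-2 logarithm is an assumption that only fixes the unit (bits), and terms with p_i = 0 are taken to contribute zero.

```latex
H = -\sum_{i=1}^{n} p_i \log_2 p_i , \qquad \sum_{i=1}^{n} p_i = 1
```

With this convention H = 0 when a single p_i equals 1, and H reaches its maximum, \log_2 n, when all p_i = 1/n.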

9 Information Content
Shannon was thinking about the transmission of information, i.e., from a source through some channel to a destination…
…but it applies equally well to any type of ‘message’.
We can use it to measure the level of conservation in the columns of an alignment.

10 Simple Example: Sequence Entropy
Example columns: LLLLLLA, LLLLLAA, LLLLAAA, LLLAAAA, LLAAAAA, LAAAAAA
p_1 = f(‘L’), p_2 = f(‘A’)
[Figure labels: p_1 = 0; p_2 = 0; p_1 = p_2 = ½]
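A minimal Python sketch of the entropy of the example columns on this slide, assuming base-2 logarithms and using the observed residue frequencies as the probabilities. The function name `column_entropy` is hypothetical, and the 8-residue column `LLLLAAAA` is added here only so that the case p_1 = p_2 = ½ can be shown exactly.

```python
from collections import Counter
from math import log2


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits) of the residue frequencies in one column."""
    n = len(column)
    # Residue types that do not occur contribute nothing (0 * log 0 := 0).
    return sum(-(c / n) * log2(c / n) for c in Counter(column).values())


if __name__ == "__main__":
    columns = [
        "LLLLLLL",   # fully conserved: entropy is zero
        "LLLLLLA", "LLLLLAA", "LLLLAAA",
        "LLLAAAA", "LLAAAAA", "LAAAAAA",
        "LLLLAAAA",  # added example with p_1 = p_2 = 1/2: maximal for two types
    ]
    for column in columns:
        print(f"{column:9s} {column_entropy(column):.3f}")
```

The entropy is symmetric in L and A, so LLLLLLA and LAAAAAA give the same value.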

11 Sequence Analysis: Comparing Groups
Many biological problems relate to questions like: “Why do these proteins do this, and those proteins not?” or “Why do these patients get sick, and those not?”
→ The answer can be related to similarities and differences between sequences.
Similarities (conservation) relate to functionally critical positions.
Differences can explain functional differences.

12 Identification of Functional Sites
Functional differences between protein (sub-)families.
Current practice: use a multiple sequence alignment and look for conserved sites within (sub-)families (ignoring sites that are conserved overall).

13 Conservation and (functional) Differences
Sequence entropy measures conservation, but sites that are different are not always conserved.
[Table residue: test sets Ras/Ral, Rab5/, MIP, SMAD and TOTAL; columns: Test-set, Known, Conservation in (Both, One, Not); values not recoverable]

14 Identification of Functional Sites (2)
Functional differences between protein (sub-)families.
Example, binders vs. non-binders:
sites crucial for binding: conserved
sites determining ‘non-binding’: not conserved
→ Take non-conserved sites into account as well: compare amino-acid compositions.

15 TGF-β signalling pathway
[Diagram labels, TGF-β branch: TGF-β, TβR-II, TβR-I, AR-Smads (phosphorylated), Smad association, nucleus, activation/repression of TGF-β target genes]
[Diagram labels, BMP branch: BMP, BMPR-I, BMPR-II, BR-Smads (phosphorylated), Smad association, nucleus, activation/repression of BMP target genes]
[Other labels: specificity; division, differentiation, motility, adhesion, programmed cell death]

16 Smad-MH2 Alignment & Functionally Specific Sites
27 known sites of functional specificity, based mostly on site-specific mutants and characterized on BMPR-I vs. TβR-I binding affinity.

Method    Predict  %TP  %FP  %FN
AMAS      6        21%  3%   76%
TreeDet   21       52%  21%  48%
SDPpred   12       31%  10%  59%

17 Comparing Groups of Sequences: Entropy
Relative ‘entropy’ rE_A/B, group A vs. B: uses the probabilities p of amino acid type x at position i.
→ Degenerate for p_B = 0, i.e. when A and B are fully different!
→ Introduce relative ‘entropy’ rE_A/AB, A vs. all (‘AB’).
→ Not degenerate, but still without a fixed bound: the upper bound depends on the relative size of the groups.
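The slide's formulas are not preserved in the transcript, so the following Python sketch assumes the standard relative-entropy form, sum over x of p_x · log2(p_x / q_x), applied to one alignment column per group. It also assumes that p_AB is the frequency in the pooled column of both groups. All function names are hypothetical.

```python
from collections import Counter
from math import inf, log2


def frequencies(column: str) -> dict:
    """Observed frequency of each residue type in a column."""
    return {aa: n / len(column) for aa, n in Counter(column).items()}


def relative_entropy(p: dict, q: dict) -> float:
    """sum_x p_x * log2(p_x / q_x); infinite when q_x = 0 for some p_x > 0."""
    total = 0.0
    for aa, px in p.items():
        qx = q.get(aa, 0.0)
        if qx == 0.0:
            return inf  # the degenerate case named on the slide
        total += px * log2(px / qx)
    return total


def re_a_vs_b(col_a: str, col_b: str) -> float:
    """rE_A/B: group A against group B."""
    return relative_entropy(frequencies(col_a), frequencies(col_b))


def re_a_vs_all(col_a: str, col_b: str) -> float:
    """rE_A/AB: group A against the pooled column of A and B (assumed)."""
    return relative_entropy(frequencies(col_a), frequencies(col_a + col_b))


if __name__ == "__main__":
    print(re_a_vs_b("LLLL", "AAAA"))      # fully different groups
    print(re_a_vs_all("LLLL", "AAAA"))    # equal group sizes
    print(re_a_vs_all("LL", "AAAAAA"))    # A is the smaller group
```

Under these assumptions two fully different groups give an infinite rE_A/B, while rE_A/AB stays finite but grows as group A becomes a smaller fraction of the pooled set, which is the size dependence the slide points out.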

18 Comparing Groups: Sequence Harmony
Weigh groups A and B equally: take p_A + p_B instead of p_AB.
→ Defined on the fixed interval [0, 1]:
one is complete overlap in composition: Harmony
zero is no overlap in composition: No Harmony
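The exact formula is not preserved in the transcript. The sketch below is one form that matches the properties stated on the slide: it uses p_A + p_B in the denominator, is symmetric in A and B, and with base-2 logarithms gives 1 for identical compositions and 0 for non-overlapping ones. Treat it as an assumption, not necessarily the authors' published definition; the function names are hypothetical.

```python
from collections import Counter
from math import log2


def frequencies(column: str) -> dict:
    """Observed frequency of each residue type in a column."""
    return {aa: n / len(column) for aa, n in Counter(column).items()}


def sequence_harmony(col_a: str, col_b: str) -> float:
    """Assumed symmetric form:

    SH = -1/2 * sum_x [ pA_x * log2(pA_x / (pA_x + pB_x))
                      + pB_x * log2(pB_x / (pA_x + pB_x)) ]
    """
    pa, pb = frequencies(col_a), frequencies(col_b)
    total = 0.0
    for aa in set(pa) | set(pb):
        a, b = pa.get(aa, 0.0), pb.get(aa, 0.0)
        for p in (a, b):
            if p > 0.0:  # absent residue types contribute nothing
                total -= p * log2(p / (a + b))
    return total / 2


if __name__ == "__main__":
    print(sequence_harmony("LLLL", "LLLL"))  # identical compositions
    print(sequence_harmony("LLLL", "AAAA"))  # no overlap
    print(sequence_harmony("LLAA", "LLLL"))  # partial overlap
```

Because each group is first reduced to frequencies, the two groups carry equal weight regardless of how many sequences each contains.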

19 Smad-MH2 Alignment & Sequence Harmony
Walter Pirovano*, K. Anton Feenstra* and Jaap Heringa. “Sequence Comparison by Sequence Harmony Identifies Subtype Specific Functional Sites”, Nucleic Acids Res., in press (2006).
K. Anton Feenstra, Walter Pirovano and Jaap Heringa. “Sub-type Specific Sites for SMAD Receptor Binding Identified by Sequence Comparison using ‘Sequence Harmony’”, in: From Computational Biophysics to Systems Biology, Eds. U.H.E. Hansmann, J. Meinke, S. Mohanty and O. Zimmermann, Jülich, NIC Series, Vol. 34.
Elena Marchiori*, Walter Pirovano, Jaap Heringa and K. Anton Feenstra*. “A Feature Selection Algorithm for Detecting Subtype Specific Sites for Smad Receptor Binding”, Bio-ICMLA06, accepted (2006).

20 AR BR Smads: Comparing two Groups

21 Finding Low-harmony sites in Smad-MH2

22 Finding Low-harmony sites in Smad-MH2

Method                     Predict  %TP  %FP  %FN
AMAS                       6        21%  3%   76%
TreeDet                    21       52%  21%  48%
SDPpred                    12       31%  10%  59%
Sequence Harmony (SH<0.2)  40       93%  33%  7%
Sequence Harmony (SH=0)    32       79%  28%  21%
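As an illustration of how sites could be selected with a cutoff such as SH < 0.2, the Python sketch below scans two aligned groups column by column. Several things here are assumptions: the symmetric base-2 form of Sequence Harmony used in `harmony`, the skipping of gapped columns, and the toy sequences, which are invented and not taken from the Smad-MH2 alignment.

```python
from collections import Counter
from math import log2


def harmony(col_a: str, col_b: str) -> float:
    """Assumed symmetric Sequence Harmony of two columns (1 = same, 0 = disjoint)."""
    pa = {x: n / len(col_a) for x, n in Counter(col_a).items()}
    pb = {x: n / len(col_b) for x, n in Counter(col_b).items()}
    total = 0.0
    for x in set(pa) | set(pb):
        a, b = pa.get(x, 0.0), pb.get(x, 0.0)
        for p in (a, b):
            if p > 0.0:
                total -= p * log2(p / (a + b))
    return total / 2


def low_harmony_sites(group_a, group_b, cutoff=0.2):
    """Return (1-based position, SH) for ungapped columns with SH below cutoff."""
    sites = []
    for i in range(len(group_a[0])):
        col_a = "".join(seq[i] for seq in group_a)
        col_b = "".join(seq[i] for seq in group_b)
        if "-" in col_a + col_b:
            continue  # assumption: columns containing gaps are ignored
        sh = harmony(col_a, col_b)
        if sh < cutoff:
            sites.append((i + 1, sh))
    return sites


if __name__ == "__main__":
    # Invented toy alignment: positions 2 and 3 differ between the groups.
    group_a = ["MKRL", "MKRI", "MKRL"]
    group_b = ["MQEL", "MQEI", "MQDL"]
    print(low_harmony_sites(group_a, group_b))
```

Position 3 in the toy data is selected even though group B is not conserved there (E, E, D), which is the kind of site a conservation-only criterion would miss.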

23 Smad-MH2: Low Harmony Patches

24 Smad-MH2: Functional Clusters
[Structure figure. Labeled residues: R462, C463, Q400, R410, W368, Y366, A392, S269, F273, N443, Q294, Q309, L297, L440, N381, A354, V461, S460, Q407, Q364, P360, R365, T267, A272, I341, P295, S308, T298, R337, F346, P378, Q284, V325, A323, R427, M327, T430, R334]
[Labeled interaction partners: FAST1, Mixer, SARA; c-Ski/SnoN; SARA; TβR-I/ALK1/2; TβR-I/BMPR-I; ?; SARA/Mixer; TβR-I/BMPR-I/ALK1/2; ?]
[Cluster categories: receptor-binding; retention & transcription factors; co-repressors]

25 Comparison to Other Prediction Methods
[Figure comparing AMAS, SDP-pred, TreeDet and Sequence Harmony; labels: 23 sites, 8 sites]

26 Comparison to Other Prediction Methods (2)
[Figure legend: AMAS cumulative, AMAS, SDP-pred, TreeDet, SH + Entropy (inc), SH + Entropy (dec), Sequence Harmony]

27 Comparison to Other Prediction Methods (3)
[Figure legend: AMAS cumulative, AMAS, SDP-pred, TreeDet, SH + Ranges + E(inc), SH + Ranges + E(dec), SH + Entropy (inc), SH + Entropy (dec), Sequence Harmony; labels: 18 sites, 2]

28 Ras family: Rab5 vs. Rab6

29 Ras Family: Ras vs. Ral

30 MIP family: AQP vs. GLP

31 Conclusions Smad-MH2
Sequence Harmony: 40 sites of low Sequence Harmony in Smad-MH2 differ between the AR (TGF-β) and BR (BMP) sub-type Smads.
Low-harmony sites in Smad-MH2 are functionally relevant.
Other methods do not select all known (functional) sites!
→ Sequence information maps to structure.
→ Next: analyze protein-protein interactions.
14 low-harmony sites in Smad-MH2 are of unknown function:
11 putative functions from structural considerations
promising candidates that determine TGF-β/BMP specificity
confirm (or refute) putative functions?

32 General Conclusions
Lack of experimental data:
  adequate quality and quantity hard to attain
  discriminating power of test-sets varies
Conservation is not the best identifier for functional differences:
  selections too conservative and not very specific
Differences, as measured by Sequence Harmony, are a good alternative:
  selections include most known sites, but with somewhat lower specificity

33 Brajenovic, M. et al. J. Biol. Chem. 2004;279:
Connectivity map of human Par complexes based on TAP purifications and co-immunoprecipitation experiments. The TAP-tagged proteins used as baits are represented as rhomboids. Lines connecting proteins indicate presence in a TAP complex or coimmunoprecipitation (dotted lines). The width of each line represents the degree of sequence coverage of the identification, which depends on the robustness of the interaction but also on the expression level and a number of other factors. Green boxes/lines represent previously known interactors/interactions; red boxes/lines represent novel interactors/interactions. Proteins that are found specifically with only one TAP-protein are grouped in boxes (S1–S6), whereas proteins that are consistently found together with more than one TAP-protein are grouped in modules (M1 and M2).

36 Bauch A, Superti-Furga G. Charting protein complexes, signaling pathways, and networks in the immune system. Source: Immunological Reviews 210, Apr 2006.

37 Copyright ©2006 by the National Academy of Sciences
Yang, Xiaowen et al. (2006) Proc. Natl. Acad. Sci. USA 103.
Fig. 3. The selective nature of the primary interaction site
Canonical interaction motifs:
Mode I: R/K-X-X-S/T-X-P
Mode II: R/K-X-X-X-S/T-X-P
Mode III: S-W-T-Y (C-term.)
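The three motif patterns listed on this slide can be turned into a simple sequence scan. The Python sketch below is an illustration under stated assumptions: X is read as "any residue", Mode III is anchored to the end of the sequence, and phosphorylation of the S/T residue is ignored because it cannot be read from a plain sequence, so hits are candidates only. The names `MOTIFS` and `find_motifs` and the test sequences are invented.

```python
import re

# Patterns transcribed from the slide. Lookaheads are used so that
# overlapping candidate sites are all reported.
MOTIFS = {
    "Mode I": re.compile(r"(?=([RK]..[ST].P))"),
    "Mode II": re.compile(r"(?=([RK]...[ST].P))"),
    "Mode III": re.compile(r"(?=(SWTY$))"),  # C-terminal only
}


def find_motifs(sequence: str):
    """Return (mode, 1-based start position, matched residues) for each hit."""
    hits = []
    for mode, pattern in MOTIFS.items():
        for match in pattern.finditer(sequence):
            hits.append((mode, match.start() + 1, match.group(1)))
    return hits


if __name__ == "__main__":
    # Invented sequences, for illustration only.
    print(find_motifs("AARSASAPGG"))
    print(find_motifs("GGRAAASLPSWTY"))
```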

39 Copyright ©2006 by the National Academy of Sciences
Yang, Xiaowen et al. (2006) Proc. Natl. Acad. Sci. USA 103.
Fig. 5. Dynamic nature of the dimers. (A) Crystal structure of the apo-isoform looking down the peptide binding grooves, which are labeled open and closed for the individual monomers. (B) Superimposition of all seven closed state isoforms using only one monomer as the reference, with shown in blue and in green. The other monomers, which have intermediate positions, are colored transparent gray.

40 Copyright ©2006 by the National Academy of Sciences
Yang, Xiaowen et al. (2006) Proc. Natl. Acad. Sci. USA 103.
Fig. 1. Overview of the dimeric structure. Helices and loops involved in target domain interactions are labeled. Each monomer is colored blue to red from the N to C terminus. An aperture exists at the central dimeric interface, which is marked with a circle.
Yang et al. Structural basis for protein–protein interactions in the protein family. PNAS 103, 17237

41 Copyright ©2006 by the National Academy of Sciences
Yang, Xiaowen et al. (2006) Proc. Natl. Acad. Sci. USA 103.
Fig. 2. Schematic representation of the heterodimerization process involving the epsilon (green) and zeta (yellow) isoforms. The lines between identified residues indicate specific interactions.

42 HIV Differential Progression/Replication
Differences in disease progression in HIV-infected patients are based on:
immunotype (e.g., B57 vs. non-B57)
occurrence of specific 'escape' mutations
Aim: apply Sequence Harmony to find (additional) key sites that determine disease progression or viral replication rates.

43 Input: multiple sequence alignment of capsid protein

44 Comparison of multiple groups:
B57 vs. non-B57
'Progressors' (P) vs. 'Long-term non-progressors' (L)
Early stage vs. Late stage
Late stage: progressors (P) vs. non-progressors (L) is especially interesting: what defines the 'non-progression'?

45 HIV Capsid Specificity: B57 vs. non-B57
36 selected residues from the 422 residue alignment below the cutoff of sites (excluding gaps): all 7 known B57 escape mutations

46 Output nB57/B57: Structure

47 Output: 'Stereotypes'

48 Output: Distinct Specificity Regions
Comparisons shown: n-B57 vs. LP; L-early vs. L-late; L vs. P; L vs. P-late; L-late vs. P-late; P-early vs. P-late

49 Output: Detail in the sequence(s)

50 Sequence comparison by ‘Sequence Harmony’ identifies subtype-specific functional sites … end …
