Centre for Integrative Bioinformatics VU. 1-month Practical Course: Genome Analysis. Sequence comparison by ‘Sequence Harmony’ identifies subtype-specific functional sites
Significance of Alignment Positions: An observed occurrence of amino acids at some position in an alignment that deviates from the expected may indicate some (functional) significance. What ‘deviates from expected’? Unlikely occurrences. What is unlikely? A result that can be obtained in only (relatively) few ways.
Pfam Ig Family Alignment
Aquaporin: Motifs. NPA: stabilizes loops B and E. G(a)xxxG(a)xxG(a): crossing of right-hand helical bundles. Source: Andreas Engel and Henning Stahlberg, in: Current Topics in Membranes (2001), Hohmann, Agre & Nielsen (Eds.), Academic Press.
Counting…: Number of possibilities for finding some combination of amino acids: which types? how many of each? Examples: WWW (3 W): only 1 way. RHH (1 R, 2 H): three ways. SHQ (1 S, 1 H, 1 Q): six ways.
Counting… (2): ‘Real’ examples: WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW (33 W): only 1 way. RRRRRRRRRRRRRRRRHHHHHHHHHHHHHHHHH (16 R, 17 H): 33!/(16! 17!) ≈ 1.17 × 10^9 ways. SSSSSHSSCCCCCCCCEEQQEEEEEEEEEQEEE (7 S, 1 H, 8 C, 14 E, 3 Q): 33!/(7! 1! 8! 14! 3!) ≈ 8.17 × 10^16 ways. ‘Many’ ways, but we can calculate that!
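The counts above are multinomial coefficients: n! divided by the factorial of the count of each residue type. A minimal sketch of that calculation follows; the function name `arrangements` is chosen here for illustration and does not come from the slides.

```python
from collections import Counter
from math import factorial


def arrangements(column: str) -> int:
    """Count the distinct orderings of the residues in one alignment column.

    This is the multinomial coefficient n! / (k1! * k2! * ...), where n is the
    column length and k1, k2, ... are the counts of each residue type.
    """
    ways = factorial(len(column))
    for count in Counter(column).values():
        ways //= factorial(count)  # every intermediate quotient is an integer
    return ways


print(arrangements("WWW"))  # 1
print(arrangements("RHH"))  # 3
print(arrangements("SHQ"))  # 6

# The longer columns from the slide (33 residues each)
print(arrangements("R" * 16 + "H" * 17))
print(arrangements("SSSSSHSSCCCCCCCCEEQQEEEEEEEEEQEEE"))
```

A fully conserved column always gives 1, and the count grows quickly as the composition becomes more mixed, which is why a highly conserved column is an ‘unlikely’ observation.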
Shannon’s ‘Information Entropy’: ‘A Mathematical Theory of Communication’, The Bell System Technical Journal, Vol. 27, 1948. “Can we define a quantity which will measure, in some sense, how much information is ‘produced’ by such a process, or better, at what rate information is produced?” He was thinking about the transmission of information, i.e., from a source through some channel to a destination.
Solution: Entropy. The entropy H = -Σ_i p_i log p_i of a set of probabilities p_i measures information, choice and uncertainty. It is zero only if exactly one p_i is non-zero: there is only one choice. It is maximal if all p_i are equal: the most ‘uncertain’ situation, in which all options are equally possible.
Information Content: Shannon was thinking about the transmission of information, i.e., from a source through some channel to a destination, but it applies equally well to any type of ‘message’. We can use it to measure the level of conservation in the columns of an alignment.
Simple Example: Sequence Entropy. Columns: LLLLLLA, LLLLLAA, LLLLAAA, LLLAAAA, LLAAAAA, LAAAAAA, with p_1 = f(‘L’) and p_2 = f(‘A’). [Plot: entropy as a function of composition; marked points are p_1 = 0, p_2 = 0 and p_1 = p_2 = ½.]
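The entropy of such columns can be computed directly from the residue frequencies. The sketch below assumes base-2 logarithms, so that a two-letter column has a maximum of 1 bit; the slides do not state the base, and the function name `column_entropy` is illustrative.

```python
from collections import Counter
from math import log2


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits) of the residue frequencies in one column."""
    n = len(column)
    return sum(-(c / n) * log2(c / n) for c in Counter(column).values())


# The L/A columns from the example, plus a fully conserved column
for col in ["LLLLLLL", "LLLLLLA", "LLLLLAA", "LLLLAAA",
            "LLLAAAA", "LLAAAAA", "LAAAAAA"]:
    print(col, round(column_entropy(col), 3))
```

The value is zero for the fully conserved column, rises towards 1 bit as the two frequencies approach ½, and is symmetric in L and A.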
Sequence Analysis: Comparing Groups. Many biological problems relate to questions like: “Why do these proteins do this, and those proteins not?” or “Why do these patients get sick, and those not?” The answer can be related to similarities and differences between sequences. Similarities (conservation) relate to functionally critical positions. Differences can explain functional differences.
Identification of Functional Sites: Functional differences between protein (sub-)families. Current practice: use a Multiple Sequence Alignment and look for conserved sites within (sub-)families (ignoring sites that are conserved overall).
Conservation and (functional) Differences: Sequence Entropy measures conservation. But sites that are different are not always conserved:
                     Conservation in
Test-set    Known    Both    One    Not
SMAD           29      10     14      5
MIP            23       0      7     16
Rab5/6         28      10     14      4
Ras/Ral        12       1     11      0
TOTAL:         92      21     46     25
Identification of Functional Sites (2): Functional differences between protein (sub-)families. Example, binders vs. non-binders: sites crucial for binding are conserved; sites determining ‘non-binding’ are not conserved. Take non-conserved sites into account as well, by comparing amino-acid compositions!
TGF-β signalling pathway. [Diagram: TGF-β acts through the receptors TβR-II and TβR-I, which phosphorylate AR-Smads; BMP acts through BMPR-I and BMPR-II, which phosphorylate BR-Smads. After Smad association, the complexes enter the nucleus, where they activate or repress TGF-β or BMP target genes. Regulated processes: division, differentiation, motility, adhesion, programmed cell death. The receptor-Smad step is labelled ‘specificity’.]
Smad-MH2 Alignment & Functionally Specific Sites: 27 known sites of functional specificity, based mostly on site-specific mutants and characterized on BMPR-I vs. TBR-I binding affinity.
Method     Predict    %TP    %FN    %FP
AMAS             6    21%    76%     3%
TreeDet         21    52%    48%    21%
SDPpred         12    31%    59%    10%
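Percentages of this kind can be derived from two sets of alignment positions: the predicted sites and the known sites. The sketch below is an assumption about how such a comparison could be scored, with all three percentages expressed relative to the number of known sites; the slides do not spell out the normalisation, and the positions used here are hypothetical.

```python
def score_prediction(predicted: set, known: set) -> dict:
    """Score predicted sites against known sites.

    All percentages are relative to the number of known sites
    (an assumption made for this sketch).
    """
    n_known = len(known)
    return {
        "%TP": 100.0 * len(predicted & known) / n_known,  # known sites found
        "%FN": 100.0 * len(known - predicted) / n_known,  # known sites missed
        "%FP": 100.0 * len(predicted - known) / n_known,  # extra sites selected
    }


# Hypothetical alignment positions, for illustration only
known_sites = {263, 270, 281, 297}
predicted_sites = {281, 297, 305}

print(score_prediction(predicted_sites, known_sites))
```

With this normalisation %TP and %FN always sum to 100%, while %FP can grow independently as a method selects more sites outside the known set.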
Comparing Groups of Sequences: Entropy. Relative ‘Entropy’ rE_A/B of group A vs. B, using the probabilities p of amino acid type x at position i. Degenerate for p_B = 0, i.e. when A and B are fully different! Introduce Relative ‘Entropy’ rE_A/AB of A vs. all (‘AB’): not degenerate, but still without a fixed bound, because the upper bound depends on the relative size of the groups.
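Both properties can be checked numerically. The sketch below assumes the usual relative-entropy form, the sum over amino acid types x of p_A(x) log2(p_A(x) / p_B(x)), which matches the degeneracy described above; the helper names and the two toy columns are hypothetical.

```python
from collections import Counter
from math import inf, log2


def frequencies(column: str) -> dict:
    """Relative frequency of each residue type in a column."""
    n = len(column)
    return {aa: c / n for aa, c in Counter(column).items()}


def relative_entropy(p: dict, q: dict) -> float:
    """Sum over x of p[x] * log2(p[x] / q[x]); infinite when q[x] is zero."""
    total = 0.0
    for aa, px in p.items():
        qx = q.get(aa, 0.0)
        if qx == 0.0:
            return inf  # the degenerate case: a residue of p is absent from q
        total += px * log2(px / qx)
    return total


group_a = "LLLLLLLL"  # hypothetical column from group A (8 sequences)
group_b = "AAAA"      # hypothetical column from group B (4 sequences)

p_a, p_b = frequencies(group_a), frequencies(group_b)
p_ab = frequencies(group_a + group_b)  # pooled composition of all sequences

print(relative_entropy(p_a, p_b))   # A vs. B: degenerate (infinite)
print(relative_entropy(p_a, p_ab))  # A vs. all: finite
print(relative_entropy(p_b, p_ab))  # B vs. all: finite, larger for the smaller group
```

In this toy case the two groups are fully different, yet A vs. all gives log2(3/2) and B vs. all gives log2(3), showing that the maximum depends on the relative group sizes.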
Comparing Groups: Sequence Harmony. Weigh groups A and B equally: take p_A + p_B instead of p_AB. Defined on the fixed interval [0, 1]: one is complete overlap in composition (Harmony); zero is no overlap in composition (No Harmony).
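One formulation that has exactly these properties is sketched below: each group's relative entropy is taken against the sum p_A + p_B, the two are averaged, and the sign is flipped, with base-2 logarithms. This is an assumption chosen to reproduce the stated [0, 1] range and end points, and it may differ in form from the expression used by the authors.

```python
from collections import Counter
from math import log2


def frequencies(column: str) -> dict:
    """Relative frequency of each residue type in a column."""
    n = len(column)
    return {aa: c / n for aa, c in Counter(column).items()}


def sequence_harmony(col_a: str, col_b: str) -> float:
    """Overlap in composition between two groups at one position (0 to 1).

    Assumed form: SH = -1/2 * sum_x [ pA*log2(pA/(pA+pB)) + pB*log2(pB/(pA+pB)) ]
    """
    p_a, p_b = frequencies(col_a), frequencies(col_b)
    total = 0.0
    for aa in set(p_a) | set(p_b):
        a, b = p_a.get(aa, 0.0), p_b.get(aa, 0.0)
        if a > 0.0:
            total -= a * log2(a / (a + b))
        if b > 0.0:
            total -= b * log2(b / (a + b))
    return 0.5 * total


print(sequence_harmony("LLLL", "AAAA"))      # no overlap in composition
print(sequence_harmony("LLAA", "LALALALA"))  # same composition, different group size
print(sequence_harmony("LLLA", "LAAA"))      # partial overlap
```

Because only the within-group frequencies enter, a group of 4 and a group of 8 sequences with the same composition still score as complete overlap.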
Smad-MH2 Alignment & Sequence Harmony. References:
Walter Pirovano*, K. Anton Feenstra* and Jaap Heringa. “Sequence Comparison by Sequence Harmony Identifies Subtype Specific Functional Sites”, Nucleic Acids Res., in press (2006).
K. Anton Feenstra, Walter Pirovano and Jaap Heringa. “Sub-type Specific Sites for SMAD Receptor Binding Identified by Sequence Comparison using ‘Sequence Harmony’”, in: From Computational Biophysics to Systems Biology, pp. 73-78. Eds. U.H.E. Hansmann, J. Meinke, S. Mohanty and O. Zimmermann, Jülich, NIC Series, Vol. 34, 2006.
Elena Marchiori*, Walter Pirovano, Jaap Heringa and K. Anton Feenstra*. “A Feature Selection Algorithm for Detecting Subtype Specific Sites for Smad Receptor Binding”, Bio-ICMLA06, accepted (2006).
Smads: Comparing two Groups. [Alignment figure: AR vs. BR Smads, positions 262 to 310.]
Finding Low-harmony sites in Smad-MH2. [Alignment figure: positions 270 to 300.]
Comparison to Other Prediction Methods. [Plot legend: AMAS, SDP-pred, TreeDet, Sequence Harmony. Annotations: 23 sites, 8 sites.]
Comparison to Other Prediction Methods (2). [Plot legend: AMAS cumulative, AMAS, SDP-pred, TreeDet, SH + Entropy (inc), SH + Entropy (dec), Sequence Harmony.]
Comparison to Other Prediction Methods (3). [Plot legend: AMAS cumulative, AMAS, SDP-pred, TreeDet, SH + Ranges + E(inc), SH + Ranges + E(dec), SH + Entropy (inc), SH + Entropy (dec), Sequence Harmony. Annotations: 18 sites, 2.]
Ras family: Rab5 vs. Rab6
Ras Family: Ras vs. Ral
MIP family: AQP vs. GLP
Conclusions Smad-MH2 Sequence Harmony: 40 sites of low Sequence Harmony in Smad-MH2 differ between the AR (TGF-β) and BR (BMP) sub-type Smads. Low Harmony sites in Smad-MH2 are functionally relevant. Other methods do not select all known (functional) sites! Sequence information maps to structure. Next: analyze protein-protein interactions. 14 Low Harmony sites in Smad-MH2 are of unknown function; 11 have putative functions from structural considerations and are promising candidates that determine TGF-β/BMP specificity. Confirm (or refute) the putative functions?
General Conclusions: Lack of experimental data: adequate quality and quantity are hard to attain, and the discriminating power of test-sets varies. Conservation is not the best identifier for functional differences: selections are too conservative and not very specific. Differences, as measured by Sequence Harmony, are a good alternative: selections include most known sites, but with somewhat lower specificity.
Brajenovic, M. et al. J. Biol. Chem. 2004;279:12804-12811. Connectivity map of human Par complexes based on TAP purifications and co-immunoprecipitation experiments. The TAP-tagged proteins used as baits are represented as rhomboids. Lines connecting proteins indicate presence in a TAP complex or coimmunoprecipitation (dotted lines). The width of each line represents the degree of sequence coverage of the identification, which depends on the robustness of the interaction but also on the expression level and a number of other factors. Green boxes/lines represent previously known interactors/interactions; red boxes/lines represent novel interactors/interactions. Proteins that are found specifically with only one TAP-protein are grouped in boxes (S1–S6), whereas proteins that are consistently found together with more than one TAP-protein are grouped in modules (M1 and M2).
Charting protein complexes, signaling pathways, and networks in the immune system. Bauch A, Superti-Furga G. Source: Immunological Reviews 210: 187-207, Apr 2006.
HIV Differential Progression/Replication: Differences in disease progression in HIV-infected patients are based on: immunotype (e.g., B57 vs. non-B57); occurrence of specific 'escape' mutations. Aim: apply Sequence Harmony to find (additional) key sites that determine disease progression or viral replication rates.
Input: multiple sequence alignment of capsid protein
Comparison of multiple groups: B57 vs. non-B57; 'Progressors' (P) vs. 'Long-term non-progressors' (L); early stage vs. late stage. Late stage progressors (P) vs. non-progressors (L) is especially interesting: what defines the 'non-progression'?
HIV Capsid Specificity: B57 vs. non-B57. 36 residues from the 422-residue alignment were selected as falling below the cutoff of 0.9. These give 26 sites (excluding gaps), which include all 7 known B57 escape mutations.
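A selection of this kind amounts to scoring every alignment column for the two groups and keeping the positions below the cutoff. The sketch below is illustrative only: the harmony score uses an assumed formulation that ranges from 0 (no overlap) to 1 (complete overlap), columns containing gaps are simply skipped, and the toy alignment and function names are hypothetical.

```python
from collections import Counter
from math import log2


def frequencies(column: str) -> dict:
    """Relative frequency of each residue type in a column."""
    n = len(column)
    return {aa: c / n for aa, c in Counter(column).items()}


def sequence_harmony(col_a: str, col_b: str) -> float:
    """Assumed harmony score: 1 for identical compositions, 0 for no overlap."""
    p_a, p_b = frequencies(col_a), frequencies(col_b)
    total = 0.0
    for aa in set(p_a) | set(p_b):
        a, b = p_a.get(aa, 0.0), p_b.get(aa, 0.0)
        if a > 0.0:
            total -= a * log2(a / (a + b))
        if b > 0.0:
            total -= b * log2(b / (a + b))
    return 0.5 * total


def low_harmony_sites(group_a, group_b, cutoff=0.9, skip_gaps=True):
    """Return (position, score) pairs, 1-based, for columns scoring below cutoff."""
    sites = []
    columns = zip(zip(*group_a), zip(*group_b))  # walk both groups column by column
    for position, (col_a, col_b) in enumerate(columns, start=1):
        col_a, col_b = "".join(col_a), "".join(col_b)
        if skip_gaps and "-" in col_a + col_b:
            continue  # assumption: gapped columns are left out of the selection
        score = sequence_harmony(col_a, col_b)
        if score < cutoff:
            sites.append((position, score))
    return sites


# Toy aligned sequences (hypothetical), one list per group
group_a = ["MKLV", "MKLI", "MKLV"]
group_b = ["MRAV", "MRAI", "MR-V"]

print(low_harmony_sites(group_a, group_b))
print(low_harmony_sites(group_a, group_b, skip_gaps=False))
```

In the toy alignment only position 2 (K in one group, R in the other) is selected by default; position 3 is added when gapped columns are allowed.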
Output nB57/B57: Structure
Output: 'Stereotypes'
Output: Distinct Specificity Regions. Comparisons: n-B57 vs. LP; L-early vs. L-late; L vs. P; L vs. P-late; L-late vs. P-late; P-early vs. P-late.
Output: Detail in the sequence(s)
Sequence comparison by ‘Sequence Harmony’ identifies subtype-specific functional sites … end …