Centre for Integrative Bioinformatics VU. Walter Pirovano, 21 Sep 2007. Sequence Entropy Genome Analysis.


1 Centre for Integrative Bioinformatics VU. Walter Pirovano, 21 Sep 2007. Sequence Entropy Genome Analysis

2 Significance of Alignment Positions
An observed occurrence of amino acids at some position in an alignment that deviates from what is expected may indicate some (functional) significance.
What does 'deviates from expected' mean? Unlikely occurrences.
What is unlikely? A result that can be obtained in only (relatively) few ways.

3 Pfam Ig Family Alignment

4 Aquaporin: Motifs
NPA: stabilizes loops B and E
G(a)xxxG(a)xxG(a): crossing of right-hand helical bundles
Andreas Engel and Henning Stahlberg, in: Current Topics in Membranes (2001), Hohmann, Agre & Nielsen (Eds.), Academic Press

5 Counting…
Number of possibilities for finding some combination of amino acids: which types? How many of each?
Examples:
WWW: 3 W → only 1 way
RHH: 1 R, 2 H → three ways
SHQ: 1 S, 1 H, 1 Q → six ways
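The three counts above are multinomial coefficients: N residues with n_i copies of type i can be ordered in N! / (n_1! n_2! …) distinct ways. A minimal Python sketch of that calculation follows; the function name `count_arrangements` is illustrative and does not come from the slides.

```python
from collections import Counter
from math import factorial


def count_arrangements(column: str) -> int:
    """Number of distinct orderings of the residues in `column`.

    This is the multinomial coefficient N! / (n_1! * n_2! * ...),
    where n_i is the number of copies of residue type i.
    """
    ways = factorial(len(column))
    for n in Counter(column).values():
        ways //= factorial(n)  # exact: every partial quotient is an integer
    return ways


for example in ("WWW", "RHH", "SHQ"):
    print(example, count_arrangements(example))
```

Integer arithmetic is used throughout, so the same function stays exact for long columns where floating point would lose precision.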

6 Counting… (2)
'Real' examples:
WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW: 33 W → only 1 way
RRRRRRRRRRRRRRRRHHHHHHHHHHHHHHHHH: 16 R, 17 H → ? ways (~ 2^33; ~ 10^9)
SSSSSHSSCCCCCCCCEEQQEEEEEEEEEQEEE: 7 S, 1 H, 8 C, 14 E, 3 Q → ??? ways (~ 5^32)
'Many' ways, but we can calculate that!
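The formula behind "we can calculate that" is not preserved in this transcript. Assuming it is the standard multinomial coefficient, the number of ways W to arrange N residues with n_i copies of type i, and its value for the two-letter example, would be:

```latex
\[
  W = \frac{N!}{\prod_i n_i!},
  \qquad
  W_{16\,\mathrm{R},\,17\,\mathrm{H}} = \frac{33!}{16!\,17!} = 1\,166\,803\,110 \approx 1.2 \times 10^{9}
\]
\[
  \ln W \approx -N \sum_i p_i \ln p_i,
  \qquad p_i = \frac{n_i}{N} \quad (\text{Stirling's approximation, large } N)
\]
```

The second line is the usual link between counting arrangements and entropy: the logarithm of the number of ways grows in proportion to the entropy of the residue frequencies.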

7 Shannon's 'Information Entropy'
'A Mathematical Theory of Communication', The Bell System Technical Journal, Vol. 27
"Can we define a quantity which will measure, in some sense, how much information is 'produced' by such a process, or better, at what rate information is produced?"
He was thinking about the transmission of information, i.e., from a source through some channel to a destination.

8 Solution: Entropy
The entropy of a set of probabilities p_i measures information, choice and uncertainty.
It is zero only if exactly one p_i is nonzero: there is only one choice.
It is maximal if all p_i are equal: the most 'uncertain' situation, in which all options are possible.
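The two properties listed on this slide can be checked numerically. The sketch below assumes the standard Shannon definition H = -Σ p_i log2 p_i (in bits); the formula itself is not preserved in this transcript, and the function name `shannon_entropy` is illustrative.

```python
from math import log2


def shannon_entropy(probabilities):
    """Shannon entropy H = -sum(p_i * log2(p_i)), in bits.

    Terms with p_i == 0 are skipped, since p * log(p) tends to 0 as p -> 0.
    """
    return sum(-p * log2(p) for p in probabilities if p > 0)


print(shannon_entropy([1.0, 0.0, 0.0, 0.0]))      # only one choice: zero entropy
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # all equal: maximal, log2(4) bits
print(shannon_entropy([0.7, 0.1, 0.1, 0.1]))      # skewed: somewhere in between
```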

9 Information Content
Shannon was thinking about the transmission of information, i.e., from a source through some channel to a destination, but it applies equally well to any type of 'message'.
We can use it to measure the level of conservation in the columns of an alignment.

10 Simple Example: Sequence Entropy
AAAAAAL
AAAAALL
AAAALLL
AAALLLL
AALLLLL
ALLLLLL
p_1 = f('L'), p_2 = f('A')
Plot annotations: p_1 = 0; p_2 = 0; p_1 = p_2 = ½
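The per-column entropies for this toy alignment can be sketched as follows. This assumes each row is one sequence and each column one alignment position, and uses log base 2; the helper name `column_entropy` is illustrative.

```python
from collections import Counter
from math import log2

# Toy alignment from the slide: six sequences, seven positions.
alignment = [
    "AAAAAAL",
    "AAAAALL",
    "AAAALLL",
    "AAALLLL",
    "AALLLLL",
    "ALLLLLL",
]


def column_entropy(column):
    """Shannon entropy (bits) of the residue frequencies in one column."""
    total = len(column)
    return sum(-(n / total) * log2(n / total) for n in Counter(column).values())


# Transpose rows (sequences) into columns (alignment positions).
columns = ["".join(residues) for residues in zip(*alignment)]
entropies = [column_entropy(col) for col in columns]

for position, (col, h) in enumerate(zip(columns, entropies), start=1):
    print(f"{position}  {col}  {h:.3f}")
```

Under these assumptions the fully conserved first and last columns have zero entropy, and the middle column, where 'A' and 'L' are equally frequent, has the maximum.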

11 Sequence Analysis: Comparing Groups
Many biological problems relate to questions like: "Why do these proteins do this, and those proteins not?" or "Why do these patients get sick, and those not?"
→ The answer can be related to similarities and differences between sequences.
Similarities (conservation) relate to functionally critical positions.
Differences can explain functional differences.

12 TGF-β signalling pathway
[Figure: pathway diagram with two parallel branches. TGF-β branch: TGF-β, receptors TβR-II and TβR-I, phosphorylated (p) AR-Smads, Smad association, nucleus, activation/repression of TGF-β target genes. BMP branch: BMP, receptors BMPR-I and BMPR-II, phosphorylated (p) BR-Smads, Smad association, nucleus, activation/repression of BMP target genes. Cellular outcomes listed: division, differentiation, motility, adhesion, programmed cell death.]

13 AR BR Alignment & Known Functional Sites

14 Measuring Overlapping Distributions
Weigh both groups equally; take p_A + p_B instead of p_AB.
Fixed interval [0,1], but not completely symmetrical.
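The formula on this slide is not preserved in the transcript. The sketch below shows one formulation that matches the description (frequencies of both groups weighted equally, p_A + p_B in the denominator, values in [0,1], one-sided form not symmetric). It is an assumption for illustration, not necessarily the exact definition used by the authors, and all function names are hypothetical.

```python
from collections import Counter
from math import log2


def frequencies(column):
    """Relative frequency of each residue type in one alignment column."""
    total = len(column)
    return {residue: n / total for residue, n in Counter(column).items()}


def harmony_one_sided(col_a, col_b):
    """Assumed one-sided measure: -sum_i pA_i * log2(pA_i / (pA_i + pB_i)).

    0 when the groups share no residue types, 1 when the distributions are
    identical; swapping the two groups generally changes the value.
    """
    p_a, p_b = frequencies(col_a), frequencies(col_b)
    return sum(-p * log2(p / (p + p_b.get(r, 0.0))) for r, p in p_a.items())


def harmony(col_a, col_b):
    """Symmetrized variant: average of the two one-sided values."""
    return 0.5 * (harmony_one_sided(col_a, col_b) + harmony_one_sided(col_b, col_a))


print(harmony("AAAAAA", "LLLLLL"))  # no overlap between groups
print(harmony("AAALLL", "AAALLL"))  # identical distributions
print(harmony_one_sided("AAAAAA", "AAALLL"), harmony_one_sided("AAALLL", "AAAAAA"))
```

Averaging the two one-sided values is one simple way to remove the asymmetry noted on the slide.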

15 Entropy vs. Sequence Harmony: Example
AAAAAALAAAAAAL
AAAAALLAAAAALL
AAAALLLAAAALLL
AAALLLLAAALLLL
AALLLLLAALLLLL
ALLLLLLALLLLLL
AAAAAAAAAAAAAA
AAAAAAAAAAAAAA
AAAAAAALLLLLLL
Group labels: A, B

16 AR BR Smads: Comparing two Groups

17 Smad-MH2 Alignment & Functionally Specific Sites
29 known sites of functional specificity, based mostly on site-specific mutants and characterized on affinity for binding to BMPR-I vs. TβR-I receptor types.

Method    Predict  Specificity  Error
AMAS      6        21%          3%
TreeDet   21       52%          21%
SDPpred   12       31%          10%

18 Finding Low-harmony sites in Smad-MH2

Pos.  Sec.str.  SH    AR BR   Interaction
L263  B1'       0     LaVfm   SARA
T267  B1'       0     TmAcen  SARA
S269  loop      0     CShEq   ?? (putative)
A272  loop      0     AKqls   ?? (putative)
F273  loop      0     FHy     ?? (putative)
Q284  B2        0     QtN     TβR-I
Q294  loop      0.16  QSq     c-Ski/SnoN
P295  B3        0     PTrl    c-Ski/SnoN
L297  B3        0.11  LMiVi   c-Ski/SnoN
T298  B3        0     TLi     c-Ski/SnoN
S308  L1        0     SaN     c-Ski/SnoN
–     L1        0     – Nsd   c-Ski/SnoN
E309  L1        0     EKrs    c-Ski/SnoN
A323  H1        0     AeS     ALK1/2
V325  H1        0     VI      ?? (putative)
M327  H1        0     LMqN    ALK1/2
R334  Loop      0.18  RkK     ?? (putative)
R337  B5        0     RH      ?? (putative)

Method                      Predict  Specificity  Error
AMAS                        6        21%          3%
TreeDet                     21       52%          21%
SDPpred                     12       31%          10%
Sequence Harmony (SH=0)     32       79%          28%
Sequence Harmony (SH<0.2)   40       93%          33%

Pirovano, Feenstra & Heringa. "Sequence Comparison by Sequence Harmony Identifies Subtype Specific Functional Sites", Nucleic Acids Res., in press (2006)

19 Smad-MH2: Functional Clusters
[Figure: Smad-MH2 structure with labelled residues R462, C463, Q400, R410, W368, Y366, A392, S269, F273, N443, Q294, Q309, L297, L440, N381, A354, V461, S460, Q407, Q364, P360, R365, T267, A272, I341, P295, S308, T298, R337, F346, P378, Q284, V325, A323, R427, M327, T430, R334.]
Cluster labels: FAST1, Mixer, SARA; c-Ski/SnoN; SARA; TβR-I/ALK1/2; TβR-I/BMPR-I ?; SARA/Mixer; TβR-I/BMPR-I/ALK1/2 ?
Categories: receptor-binding; retention & transcription factors; co-repressors

20 Conclusions Smad-MH2
40 sites of low Sequence Harmony in Smad-MH2
  different between the AR (TGF-β) and BR (BMP) sub-type Smads
Low-harmony sites in Smad-MH2 are functionally relevant
  Other methods cannot select all known sites!
  → Functional sites are interaction surfaces on the protein surface
  → Next: analyze interaction partners in the pathway
14 low-harmony sites in Smad-MH2 of unknown function
  11 putative functions from structural considerations
  promising candidates that determine TGF-β/BMP specificity
  confirm (or refute) putative functions?

21 Sequence Harmony Webserver

22 Sequence Harmony Webserver: Groups

23 Sequence Harmony Webserver: Reference

24 Sequence Harmony Webserver: Structure

25 SMAD Sequence Harmony: Raw Table

26 SMAD Sequence Harmony: Results

27 SMAD Sequence Harmony: Structure

