Comparing and Classifying Domain Structures

Presentation on theme: "Comparing and Classifying Domain Structures"— Presentation transcript:

1 Comparing and Classifying Domain Structures
Methods for comparing protein structures.
Protein structural classifications.
How do structures and functions diverge in protein superfamilies?
What proportion of genome sequences can be predicted to belong to superfamilies of known structure?

2 Protein Domain Family Classifications
Known domain structures: Alexey Murzin, LMB, Cambridge.
Predicted domain structures: Julian Gough, Bristol University.
Known domain structures and predicted domain structures: Christine Orengo, UCL.
Domain sequences: Alex Bateman, Sanger.

3 domains are important evolutionary units
60-80% of genes in genomes code for multidomain proteins

4 Evolution gives rise to families of proteins (homologues)
Domain superfamily relatives shown: human, yeast, human, M. tuberculosis, Th. thermophilus.
Structure is more highly conserved than sequence during evolution. At least 40-50% of the structure is conserved.

5 Evolution gives rise to families of proteins (homologues)
Orthologues within a domain superfamily. Relatives shown: human, yeast, human, M. tuberculosis, Th. thermophilus.
Structure is more highly conserved than sequence during evolution. At least 40-50% of the structure is conserved.

6 Evolution gives rise to families of proteins
Paralogues within a domain superfamily. Relatives shown: human, yeast, human, M. tuberculosis, Th. thermophilus.
Structure is more highly conserved than sequence during evolution. At least 40-50% of the structure is conserved.

7 Structural diversity in the CATH Domain Family P-loop hydrolases
Examples shown: cocaine esterase, acetylcholinesterase, cutinase.
Structure is more highly conserved than sequence during evolution. At least 40-50% of the structure is conserved.

8 Challenges in comparing protein structures
Residue substitutions due to single base mutations.
Insertions or deletions (indels) of residues, usually not in the secondary structures but in the connecting loops.
Usually the structural cores are highly conserved.
Although structure is much more conserved than sequence, there can still be considerable structural differences between relatives outside the core.

9 Residue insertions usually occur in the loops connecting secondary structures
Substitutions can cause shifts in the orientations of secondary structures.

10

11 Superposition of OB fold Structures

12 Related structures: RMSD usually < 3.5 Å
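
The RMSD quoted above is the root-mean-square deviation over equivalent atom pairs once two structures have been superposed. The following is a minimal sketch of that calculation only, assuming the superposition has already been done and the residue equivalences are known; the coordinates are hypothetical C-alpha positions invented for illustration.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points.

    Assumes the structures are already superposed and that point i in
    coords_a is equivalent to point i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical C-alpha coordinates (in Å) for three equivalent residue pairs.
structure_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
structure_b = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]

print(rmsd(structure_a, structure_b))
```

In practice the optimal superposition has to be found first; this sketch deliberately leaves that step out.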

13 Coping with Insertions and Deletions
Ignore the variable loop regions and only compare the secondary structures.
Use algorithms which can explicitly handle insertions/deletions, e.g. dynamic programming, simulated annealing.

14 Fast structure comparison by secondary structures

15 Graphs can be compared using the Bron-Kerbosch algorithm to find the largest common graph
In this example the common graph contains 5 nodes. [Figure: two secondary structure graphs with nodes labelled E and H.]
Generally ~1000 times faster than residue-based methods.
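
One common way to find the largest common graph with Bron-Kerbosch is to build a correspondence graph of compatible node pairs and search it for the largest clique. The sketch below assumes that formulation; the node types (E, H), the edge labels ("anti", "pack") and both toy graphs are hypothetical and are not the graphs from the slide.

```python
from itertools import combinations

def correspondence_graph(types_a, edges_a, types_b, edges_b):
    """Nodes are pairs (i, j) of same-type elements; two pairs are linked
    when the relationship i-k in graph A matches the relationship j-l in B."""
    nodes = [(i, j) for i in types_a for j in types_b if types_a[i] == types_b[j]]
    adj = {n: set() for n in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        same_label = edges_a.get(frozenset((i, k))) == edges_b.get(frozenset((j, l)))
        if i != k and j != l and same_label:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj

def bron_kerbosch(r, p, x, adj, cliques):
    """Bron-Kerbosch with pivoting: collects every maximal clique."""
    if not p and not x:
        cliques.append(set(r))
        return
    pivot = max(p | x, key=lambda v: len(adj[v] & p))
    for v in list(p - adj[pivot]):
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, cliques)
        p = p - {v}
        x = x | {v}

def largest_common_subgraph(types_a, edges_a, types_b, edges_b):
    adj = correspondence_graph(types_a, edges_a, types_b, edges_b)
    cliques = []
    bron_kerbosch(set(), set(adj), set(), adj, cliques)
    return max(cliques, key=len)

# Hypothetical graphs: node index -> secondary structure type,
# unordered node pair -> relationship label.
types_a = {0: "E", 1: "E", 2: "H"}
edges_a = {frozenset((0, 1)): "anti", frozenset((1, 2)): "pack"}
types_b = {0: "H", 1: "E", 2: "E", 3: "E"}
edges_b = {frozenset((1, 2)): "anti", frozenset((2, 0)): "pack",
           frozenset((2, 3)): "anti"}

common = largest_common_subgraph(types_a, edges_a, types_b, edges_b)
print(len(common), sorted(common))
```

Each pair in the result maps one node of the first graph onto one node of the second. Because the search works on a handful of secondary structure elements rather than hundreds of residues, the graphs stay small.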

16 STRUCTAL
Align sequences.
Superpose structures.
Score distances between superposed residues in path matrix.
Use dynamic programming to find best path.
Use equivalences given by the best path to re-superpose the structures.
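
The dynamic programming step can be sketched as a global alignment over a matrix of distance-based scores. This is only an illustration of the idea, not the STRUCTAL implementation: the scoring function `distance_score`, its parameters `m` and `d0`, the gap penalty and the coordinates are all assumptions made for the example, and the superposition and re-superposition steps are left out.

```python
import math

def distance_score(d, m=20.0, d0=2.24):
    """Assumed similarity score: high when superposed residues are close."""
    return m / (1.0 + (d / d0) ** 2)

def align(coords_a, coords_b, gap=10.0):
    """Global dynamic programming over the residue-residue score matrix.

    Returns the best path score and the list of equivalenced index pairs.
    """
    n, k = len(coords_a), len(coords_b)
    score = [[distance_score(math.dist(a, b)) for b in coords_b] for a in coords_a]
    # best[i][j]: best score for the first i residues of A and first j of B.
    best = [[0.0] * (k + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = -gap * i
    for j in range(1, k + 1):
        best[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            best[i][j] = max(best[i - 1][j - 1] + score[i - 1][j - 1],
                             best[i - 1][j] - gap,
                             best[i][j - 1] - gap)
    # Trace the best path back to recover the equivalences.
    pairs = []
    i, j = n, k
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j - 1] + score[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best[n][k], pairs

# Hypothetical superposed C-alpha traces; B has one extra residue (index 2).
trace_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
trace_b = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.7, 3.0, 0.0),
           (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]

total, equivalences = align(trace_a, trace_b)
print(total, equivalences)
```

The inserted residue in the second trace is skipped with a gap, which is how dynamic programming copes with indels. The returned equivalences are what would be fed into the re-superposition step.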

17 Structure Comparison Algorithms
Each entry lists method, authors, structure classification.
Secondary structure based:
SSM, Henrick, PDB
GRATH, Harrison & Orengo, CATH
Residue based:
SSAP, Taylor and Orengo, CATH
DALI, Holm and Sander, SCOP
Comparer, Sali and Blundell, HOMSTRAD
FatCat, Adam Godzik, PDB
Structal, Levitt, PDB
Sources: Structural Bioinformatics, Ed: Phil Bourne, Wiley 2003; Bioinformatics: Genes, Proteins and Computers, Bios, 2003

18 CATH domain structure database
Orengo & Thornton 1993
Class
Architecture
Topology or Fold Group
Homologous Superfamily
~200,000 domains, 2600 domain superfamilies
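
CATH identifies each node in this hierarchy with a dotted four-number code, one number per level (Class.Architecture.Topology.Homologous superfamily). The sketch below shows how such codes can be parsed and grouped at a chosen level; the domain names and their code assignments are hypothetical placeholders, not entries taken from the slides.

```python
from collections import defaultdict

LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")

def parse_cath_code(code):
    """Split a dotted four-number code into its four hierarchy levels."""
    parts = code.split(".")
    if len(parts) != len(LEVELS):
        raise ValueError(f"expected 4 dot-separated numbers, got {code!r}")
    return dict(zip(LEVELS, (int(p) for p in parts)))

def group_by_level(domains, depth):
    """Group domain names by the first `depth` levels of their code."""
    groups = defaultdict(list)
    for name, code in domains.items():
        prefix = ".".join(code.split(".")[:depth])
        groups[prefix].append(name)
    return dict(groups)

# Hypothetical domains with illustrative code assignments.
domains = {
    "domain_1": "3.40.50.300",
    "domain_2": "3.40.50.620",
    "domain_3": "1.10.10.10",
    "domain_4": "3.40.50.300",
}

print(parse_cath_code("3.40.50.300"))
print(group_by_level(domains, 3))  # same fold (topology)
print(group_by_level(domains, 4))  # same homologous superfamily
```

Grouping at depth 3 puts domains with the same fold together. Depth 4 separates them into homologous superfamilies.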

19 CATH domain database
Class: 3
Architecture: ~40
Topology or Fold: ~1200
~200,000 domains

20 CATH Architectures
Orthogonal bundle, up-down bundle, α-horseshoe, α-solenoid, αα-barrel, β-ribbon, β-sheet, β-roll, β-barrel

21 CATH Architectures
Clam, 2-layer β-sandwich, trefoil, orthogonal β-prism, parallel β-prism, 3-layer β-sandwich, β-solenoid, αβ-roll, β-propeller

22 CATH Architectures
αβ-barrel, 2-layer (αβ) sandwich, 3-layer (αβα) sandwich, 3-layer (ββα) sandwich, 3-layer (βαβ) sandwich, 4-layer (αββα) sandwich, αβ-prism, αβ-box, αβ-horseshoe

23 CATH
Topology or Fold Group: ~1200
Homologous Superfamily: ~2600
Sequence Family (30%)
Domain entry counts shown on the slide: 40,000 and ~200,000

24 Divergent Evolution and Convergent Evolution
Example sequence fragments shown: ..VILST…, ..KLST…, ...SLTRF...

25 Homologous Structures
Heat labile enterotoxin compared with:
cholera toxin: SSAP score 97, sequence identity 79%
pertussis toxin: SSAP score 81, sequence identity 12%
Homologous structures have a high structure similarity score (RMSD often < 4 Å), may have detectable sequence similarity (e.g. by HMMs) and have related functions.

26 Evolutionary Ancestry Uncertain
Structural similarity, but no sequence similarity and no functional similarity.

27

28

29 How do proteins evolve new functions?

30 Evolution of Protein Functions in Domain Superfamilies
Domain duplication.
Residue mutations and domain structure embellishments.
Domain fusion, change in domain partner.
Oligomerisation.

31 Mutation of Residues TIM barrel glycosyl hydrolases
Chitinase A: Glu acts as the general acid.
Narbonin: Glu is incorporated in a salt-bridge and this blocks substrate access.

32 Changes in domain function in paralogous relatives
Pantetheine-phosphate adenyltransferase and glycerol-3-phosphate cytidylyl transferase, each shown with its EC code and binding site.
Changes in the domain structure can modify the binding site or domain surface.

33 Pantetheine-phosphate adenyltransferase and Arginyl-tRNA synthetase
Binding site comparison: Pantetheine-phosphate adenyltransferase (1od6A00) and Arginyl-tRNA synthetase (1f7uA01).

34 Arginyl-tRNA synthetase

35 changes in the domain partnerships can modify the binding site
Pantetheine-phosphate adenyltransferase and Asparagine synthetase B

36 Change in Oligomerisation
Thioredoxin superfamily: peroxidase, calsequestrin

37 The Mosaic Theory of Protein Evolution
Teichmann et al. 2001, 2003; Gerstein et al. 2001
60-80% of proteins are multi-domain.
A few thousand domain superfamilies (< 10,000 in CATH, SCOP and Pfam).
More than two million domain combinations (multi-domain architectures).

38 Similarity in Chemistry
Conserved: 19%
Semi-conserved: 67%
Poorly conserved: 7%
Unconserved: 7%
Nearly 90% of families show full or partial conservation of functions.

39 chemistry is conserved or semi-conserved across the family but the substrate can change
Examples: cytochrome P450s, FAD/NAD(P)(H)-dependent disulphide oxidoreductases, hexapeptide repeat proteins

40 blade domain

41 fulcrum domain

42 handle domain

43

44 How representative are these structural superfamilies (i.e. in CATH, SCOP) of all proteins in nature?

45 Domain structure predictions in genome sequences
Protein sequences from UniProt are scanned against a library of sequence patterns (HMM models) for CATH.
~26 million domain sequences assigned to CATH superfamilies.
~6000 annotated genomes.

46

47 Pfam-A: 10,340 curated families with annotation
Categories shown: Pfam-A, Pfam-B, Other

48 CATH and Pfam coverage of genomes
NewFam?

49 Protein Family Databases
Each family is represented by a sequence profile or HMM
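
To give a feel for what a sequence profile is, the sketch below builds a simple position-specific log-odds profile from an ungapped alignment and slides it along a query. It is a deliberately simplified stand-in for a profile HMM: there are no insert or delete states, and the family alignment, query, pseudocount and uniform background frequencies are all assumptions made for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(aligned, pseudocount=1.0):
    """One log-odds column per alignment position (uniform background)."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for pos in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in aligned:
            counts[seq[pos]] += 1
        total = sum(counts.values())
        profile.append({aa: math.log2((counts[aa] / total) / background)
                        for aa in AMINO_ACIDS})
    return profile

def score(profile, segment):
    """Sum of per-position log-odds for a segment as long as the profile."""
    return sum(column[aa] for column, aa in zip(profile, segment))

def best_window(profile, query):
    """Best-scoring ungapped window of the query; returns (score, start)."""
    width = len(profile)
    return max((score(profile, query[i:i + width]), i)
               for i in range(len(query) - width + 1))

# Hypothetical family alignment and query sequence.
family = ["VILST", "VLLST", "KILST"]
profile = build_profile(family)

hit_score, hit_start = best_window(profile, "AAVILSTGG")
print(hit_start, round(hit_score, 2))
```

A query segment that resembles the family scores well above zero, while unrelated segments score near or below zero. Real profile HMMs additionally model insertions and deletions along the family alignment.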

