Centre for Integrative Bioinformatics VU. Bioinformatics Master Course: DNA/Protein Structure-Function Analysis and Prediction.

Presentation transcript:

1 Centre for Integrative Bioinformatics VU. Bioinformatics Master Course: DNA/Protein Structure-Function Analysis and Prediction. Lecture 13: Protein Function

2 Sequence-Structure-Function. [Diagram: Sequence, Structure, Function. Labels: Threading; Homology searching (BLAST); Ab initio prediction and folding; Function prediction from structure. Annotations: "impossible but for the smallest structures"; "very difficult".]

3 Functional Genomics – Systems Biology. [Diagram labels: Genome; Expressome; Proteome; Metabolome; tertiary structure (fold); metabolomics; fluxomics.]

4 Systems Biology is the study of the interactions between the components of a biological system, and how these interactions give rise to the function and behaviour of that system (for example, the enzymes and metabolites in a metabolic pathway). The aim is to understand the system quantitatively and to be able to predict how the system behaves over time. The interactions are nonlinear. The interactions give rise to emergent properties, i.e. properties that cannot be explained by the individual components alone. Biological processes include many time-scales, many compartments and many interconnected network levels (e.g. regulation, signalling, expression, ...).

5 Systems Biology understanding is often achieved through modeling and simulation of the system’s components and interactions. Many times, the ‘four Ms’ cycle is adopted: Measuring, Mining, Modeling, Manipulating.

6 ‘The silicon cell’ (some people think ‘silly-con’ cell)

8 A system response. Apoptosis: programmed cell death. Necrosis: accidental cell death.

9 ‘Comparative metabolomics’. This pathway diagram shows a comparison of pathways in (left) Homo sapiens (human) and (right) Saccharomyces cerevisiae (baker’s yeast). Changes have occurred in the controlling enzymes (square boxes in red) and in the pathway itself (yeast has one altered (‘overtaking’) path in the graph). We need to be able to do automatic pathway comparison (pathway alignment). [Panel labels: Human; Yeast.]

10 The citric-acid cycle

11 The citric-acid cycle. Fig. 1. (a) A graphical representation of the reactions of the citric-acid cycle (CAC), including the connections with pyruvate and phosphoenolpyruvate, and the glyoxylate shunt. When there are two enzymes that are not homologous to each other but that catalyse the same reaction (non-homologous gene displacement), one is marked with a solid line and the other with a dashed line. The oxidative direction is clockwise. The enzymes with their EC numbers are as follows: 1, citrate synthase ( ); 2, aconitase ( ); 3, isocitrate dehydrogenase ( ); 4, 2-ketoglutarate dehydrogenase (solid line; and ) and 2-ketoglutarate ferredoxin oxidoreductase (dashed line; ); 5, succinyl-CoA synthetase (solid line; ) or succinyl-CoA–acetoacetate-CoA transferase (dashed line; ); 6, succinate dehydrogenase or fumarate reductase ( ); 7, fumarase ( ) class I (dashed line) and class II (solid line); 8, bacterial-type malate dehydrogenase (solid line) or archaeal-type malate dehydrogenase (dashed line) ( ); 9, isocitrate lyase ( ); 10, malate synthase ( ); 11, phosphoenolpyruvate carboxykinase ( ) or phosphoenolpyruvate carboxylase ( ); 12, malic enzyme ( or ); 13, pyruvate carboxylase or oxaloacetate decarboxylase ( ); 14, pyruvate dehydrogenase (solid line; and ) and pyruvate ferredoxin oxidoreductase (dashed line; ). M. A. Huynen, T. Dandekar and P. Bork, "Variation and evolution of the citric acid cycle: a genomic approach", Trends Microbiol, 7, (1999)

12 The citric-acid cycle. (b) Individual species might not have a complete CAC. This diagram shows the genes for the CAC for each unicellular species for which a genome sequence has been published, together with the phylogeny of the species. The distance-based phylogeny was constructed using the fraction of genes shared between genomes as a similarity criterion [29]. The major kingdoms of life are indicated in red (Archaea), blue (Bacteria) and yellow (Eukarya). Question marks represent reactions for which there is biochemical evidence in the species itself or in a related species but for which no genes could be found. Genes that lie in a single operon are shown in the same color. Genes were assumed to be located in a single operon when they were transcribed in the same direction and the stretches of non-coding DNA separating them were less than 50 nucleotides in length. M. A. Huynen, T. Dandekar and P. Bork, "Variation and evolution of the citric acid cycle: a genomic approach", Trends Microbiol, 7, (1999)
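
The operon rule stated in the caption (same transcription direction, fewer than 50 non-coding nucleotides between neighbouring genes) can be sketched in Python. This is only an illustration: the `Gene` record, the 1-based inclusive coordinates, the gene names and the example positions are assumptions, not data or code from the paper, and the sketch treats all genes as lying on one linear sequence.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    name: str
    start: int   # 1-based, inclusive (assumed convention)
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def group_into_operons(genes: List[Gene], max_gap: int = 50) -> List[List[Gene]]:
    """Group neighbouring genes that share a strand and are separated
    by fewer than max_gap non-coding nucleotides."""
    operons: List[List[Gene]] = []
    for gene in sorted(genes, key=lambda g: g.start):
        if operons:
            previous = operons[-1][-1]
            gap = gene.start - previous.end - 1  # negative if genes overlap
            if gene.strand == previous.strand and gap < max_gap:
                operons[-1].append(gene)
                continue
        operons.append([gene])
    return operons


# Hypothetical coordinates, for illustration only.
example = [
    Gene("geneA", 100, 400, "+"),
    Gene("geneB", 430, 900, "+"),    # gap of 29 nt, same strand
    Gene("geneC", 1000, 1500, "+"),  # gap of 99 nt
    Gene("geneD", 1520, 2000, "-"),  # gap of 19 nt, opposite strand
]
print([[g.name for g in operon] for operon in group_into_operons(example)])
# [['geneA', 'geneB'], ['geneC'], ['geneD']]
```

Overlapping genes give a negative gap and are therefore grouped when they share a strand; whether the original analysis handled overlaps this way is not stated in the caption.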

13 Experimental: Structural genomics; Functional genomics; Protein-protein interaction; Metabolic pathways; Expression data

14 Communicability: Functional Genomics. Interpretation of genome-scale gene expression data. [Diagram labels: External Program; DNA-chip data; Cluster of coregulated genes (gene 1, gene 2, ..., gene n); PFMP query; Pathways affected (pathway 1, pathway 2).]
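
The step from a cluster of coregulated genes to the pathways affected can be sketched as a simple lookup and count. The slide does not describe the actual PFMP query interface, so the dictionary `gene_to_pathways` below is a hypothetical stand-in for that query, and all gene and pathway names are placeholders.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def pathways_affected(
    cluster: Iterable[str],
    gene_to_pathways: Dict[str, List[str]],
) -> List[Tuple[str, int]]:
    """Count how many genes of the cluster are annotated to each pathway.

    Genes without an annotation are skipped silently.
    """
    counts: Counter = Counter()
    for gene in cluster:
        for pathway in gene_to_pathways.get(gene, ()):
            counts[pathway] += 1
    return counts.most_common()  # most frequently hit pathway first


# Placeholder annotation, standing in for the result of a database query.
annotation = {
    "gene 1": ["pathway 1"],
    "gene 2": ["pathway 1", "pathway 2"],
    "gene 3": [],
}
cluster = ["gene 1", "gene 2", "gene 3", "gene 4"]
print(pathways_affected(cluster, annotation))
# [('pathway 1', 2), ('pathway 2', 1)]
```

A raw count like this ignores pathway size; a real analysis would normally add a statistical enrichment test, which is beyond what the slide shows.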

15 Communicability: Functional Genomics. Interpretation of genome-scale gene expression data. [Diagram labels: External Programs; DNA-chip data; Cluster of coregulated genes (gene 1, gene 2, ..., gene n); PFMP query; Similarities with known regulatory sites (site 1: Factor 1; site 2: Factor 2; ...); Pattern discovery (gene 1, gene 2, ...; putative regulatory sites).]

16 Other Issues: Partial information (indirect interactions) and subsequent filling of the missing steps; Negative results (elements that have been shown not to interact, enzymes missing in an organism); Putative interactions resulting from computational analyses

17 Protein function categories:
Catalysis (enzymes)
Binding – transport (active/passive)
Protein-DNA/RNA binding (e.g. histones, transcription factors)
Protein-protein interactions (e.g. antibody-lysozyme), experimentally determined by yeast two-hybrid (Y2H) or bacterial two-hybrid (B2H) screening
Protein-fatty acid binding (e.g. apolipoproteins)
Protein – small molecules (drug interaction, structure decoding)
Structural component (e.g. crystallins)
Regulation
Signalling
Transcription regulation
Immune system
Motor proteins (actin/myosin)

18 Catalytic properties of enzymes. Reaction scheme: $E + S \rightleftharpoons ES \xrightarrow{k_{cat}} E + P$, with $K_m$ labelling the first step. Michaelis-Menten equation: $V = \frac{V_{max}\,[S]}{K_m + [S]}$. E = enzyme; S = substrate; ES = enzyme-substrate complex (transition state); P = product; $K_m$ = Michaelis constant; $k_{cat}$ = catalytic rate constant (turnover number); $k_{cat}/K_m$ = specificity constant (useful for comparison). [Plot: reaction rate (moles/s) against [S], marking $V_{max}$, $V_{max}/2$ and $K_m$, the substrate concentration at which $V = V_{max}/2$.]
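
A minimal numerical sketch of the Michaelis-Menten equation and the specificity constant follows. The function names and the parameter values are chosen here for illustration only; units are whatever the caller uses consistently.

```python
def michaelis_menten_rate(s: float, v_max: float, k_m: float) -> float:
    """Reaction rate V = V_max * [S] / (K_m + [S])."""
    if s < 0 or v_max < 0 or k_m <= 0:
        raise ValueError("expected s >= 0, v_max >= 0 and k_m > 0")
    return v_max * s / (k_m + s)


def specificity_constant(k_cat: float, k_m: float) -> float:
    """Specificity constant k_cat / K_m, used to compare enzymes or substrates."""
    if k_m <= 0:
        raise ValueError("expected k_m > 0")
    return k_cat / k_m


# Illustrative values, not measurements.
v_max, k_m = 10.0, 2.0
print(michaelis_menten_rate(k_m, v_max, k_m))   # 5.0, i.e. V_max / 2 at [S] = K_m
print(michaelis_menten_rate(1e6, v_max, k_m))   # approaches V_max at saturating [S]
```

Evaluating the rate at [S] = K_m reproduces the half-maximal point marked on the slide's plot.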

19 Protein interaction domains

20 Energy difference upon binding. Examples of protein interactions (and functional importance) include: protein-protein (pathway analysis); protein-small molecules (drug interaction, structure decoding); protein-peptides, DNA/RNA (function analysis). The change in Gibbs free energy of the protein-ligand binding interaction can be monitored and expressed as $\Delta G = \Delta H - T\Delta S$ (H = enthalpy, S = entropy, T = temperature).
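
As a worked instance of the free-energy relation, the sketch below evaluates it for one set of made-up values. The function name, the units (J/mol, K, J/(mol K)) and the numbers are assumptions for illustration, not measurements from the lecture.

```python
def gibbs_free_energy_change(delta_h: float, temperature: float, delta_s: float) -> float:
    """Return delta G = delta H - T * delta S.

    delta_h in J/mol, temperature in K, delta_s in J/(mol K).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive (kelvin)")
    return delta_h - temperature * delta_s


# Illustrative binding event: favourable enthalpy, unfavourable entropy.
delta_g = gibbs_free_energy_change(delta_h=-50_000.0, temperature=300.0, delta_s=-100.0)
print(delta_g)  # -20000.0 J/mol
```

A negative result means binding is thermodynamically favourable under the stated conditions; here the enthalpy gain outweighs the entropy loss.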

21 Experimentally measuring PPIs: Yeast two-hybrid. Bait: fused to the transcription factor (TF) DNA-binding domain. Prey: fused to the activation domain. In a TF, the DNA-binding and activation domains together set transcription in motion. Use yeast strains of opposite mating types: make the strains mate and include an easily observable reporter gene (e.g. luciferase) with an appropriate TF binding site (TFBS). Bait and prey have to interact to activate the reporter gene.

22 Experimentally measuring PPIs: Tandem affinity purification (TAP). Add a TAP tag, which contains an IgG-binding domain, at the end of the target gene. Separate the tagged protein complexes using an affinity column containing IgG beads. Wash the column; the target-IgG complex stays behind. If the target protein interacts with other proteins, these are also retained on the column. Separate the proteins using SDS-PAGE and identify them by mass spectrometry. Another protein in the complex can also be used as the target protein to verify complex formation.

23 Protein function. Many proteins combine functions. For example, some immunoglobulin structures are thought to have more than 100 different functions (and active/binding sites). Alternative splicing can generate (partially) alternative structures.

24 Protein function & Interaction: Active site / binding cleft; Shape complementarity

25 Protein function evolution: Chymotrypsin

26 How to infer function. Experiment. Deduction from sequence: multiple sequence alignment (conservation patterns); homology searching. Deduction from structure: threading; structure-structure comparison; homology modelling.

27 Cholesterol Biosynthesis: Cholesterol biosynthesis primarily occurs in eukaryotic cells. It is necessary for membrane synthesis, and is a precursor for steroid hormone production as well as for vitamin D. While the pathway had previously been assumed to be localized in the cytosol and ER, more recent evidence suggests that a good deal of the enzymes in the pathway exist largely, if not exclusively, in the peroxisome (the enzymes listed in blue in the pathway to the left are thought to be at least partly peroxisomal). Patients with peroxisome biogenesis disorders (PBDs) have a variable deficiency in cholesterol biosynthesis.

28 Cholesterol Biosynthesis: from acetyl-CoA to mevalonate. Mevalonate plays a role in epithelial cancers: it can inhibit EGFR.

29 Epidermal Growth Factor as a Clinical Target in Cancer. A malignant tumour is the product of uncontrolled cell proliferation. Cell growth is controlled by a delicate balance between growth-promoting and growth-inhibiting factors. In normal tissue the production and activity of these factors result in differentiated cells growing in a controlled and regulated manner that maintains the normal integrity and functioning of the organ. The malignant cell has evaded this control; the natural balance is disturbed (via a variety of mechanisms) and unregulated, aberrant cell growth occurs. A key driver for growth is the epidermal growth factor (EGF), and the receptor for EGF (the EGFR) has been implicated in the development and progression of a number of human solid tumours including those of the lung, breast, prostate, colon, ovary, head and neck.

30 Energy housekeeping: Adenosine diphosphate (ADP) – Adenosine triphosphate (ATP)

31 Chemical Reaction

32 Enzymatic Catalysis

33 Gene Expression

34 Inhibition

35 Metabolic Pathway: Proline Biosynthesis

36 Transcriptional Regulation

37 Methionine Biosynthesis in E. coli

38 Shortcut Representation

39 High-level Interaction

40 Levels of Resolution

41 Cholesterol Biosynthesis

42 SREBP Pathway

43 Signal Transduction. Important signalling pathways: MAP-kinase (MAPK) signalling pathway, or TGF-β pathway

44 Transport

45 Phosphate Utilization in Yeast

46 Multiple Levels of Regulation: Gene expression; Protein activity; Protein intracellular location; Protein degradation; Substrate transport

47 Graphical Representation – Gene Expression

48 Experimental Data – Gene Expression

49 Experimental Data – Transcriptional Regulation

50 Experimental Data – Transcriptional Regulation

51 Transcriptional Regulation Integrated View

52 Pathways and Pathway Diagrams. Pathways: set of nodes (entities) and edges (associations). Pathway Diagrams: XY coordinates; node splitting allowed; multiple views of the same pathway; different abstraction levels.
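
The distinction between a pathway (nodes and edges) and a pathway diagram (a layout of that pathway) can be sketched with two small Python classes. The class names, fields and example entities are hypothetical and not taken from any particular pathway database; the sketch only mirrors the properties listed on the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    kind: str  # type of association, e.g. "conversion" or "inhibition"


@dataclass
class Pathway:
    """A pathway is only a set of nodes (entities) and edges (associations)."""
    nodes: Set[str] = field(default_factory=set)
    edges: Set[Edge] = field(default_factory=set)

    def add_edge(self, source: str, target: str, kind: str) -> None:
        self.nodes.update((source, target))
        self.edges.add(Edge(source, target, kind))


@dataclass
class PathwayDiagram:
    """One view of a pathway: XY coordinates for its nodes.

    A node may be placed more than once (node splitting).
    """
    pathway: Pathway
    positions: Dict[str, List[Tuple[float, float]]] = field(default_factory=dict)

    def place(self, node: str, x: float, y: float) -> None:
        if node not in self.pathway.nodes:
            raise KeyError(node)
        self.positions.setdefault(node, []).append((x, y))


# Hypothetical example: one pathway, two independent views of it.
pathway = Pathway()
pathway.add_edge("metabolite A", "metabolite B", "conversion")
pathway.add_edge("ATP", "metabolite B", "cofactor")

detailed = PathwayDiagram(pathway)
detailed.place("metabolite A", 0.0, 0.0)
detailed.place("metabolite B", 1.0, 0.0)
detailed.place("ATP", 0.5, 1.0)
detailed.place("ATP", 2.0, 1.0)  # node splitting: ATP drawn twice

overview = PathwayDiagram(pathway)  # a second view of the same pathway
overview.place("metabolite A", 0.0, 0.0)
```

Keeping coordinates out of `Pathway` is what allows several diagrams, possibly at different abstraction levels, to share one underlying graph.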

53 KEGG database (Japan). Metabolic networks: Glycolysis and Gluconeogenesis

54 Gene Ontology (GO). Not a genome sequence database. Developing three structured, controlled vocabularies (ontologies) to describe gene products in terms of biological process, cellular component and molecular function, in a species-independent manner.

55 The GO ontology

56 Gene Ontology Members:
FlyBase - database for the fruitfly Drosophila melanogaster
Berkeley Drosophila Genome Project (BDGP) - Drosophila informatics; GO database & software, Sequence Ontology development
Saccharomyces Genome Database (SGD) - database for the budding yeast Saccharomyces cerevisiae
Mouse Genome Database (MGD) & Gene Expression Database (GXD) - databases for the mouse Mus musculus
The Arabidopsis Information Resource (TAIR) - database for the brassica family plant Arabidopsis thaliana
WormBase - database for the nematode Caenorhabditis elegans
EBI GOA project - annotation of UniProt (Swiss-Prot/TrEMBL/PIR) and InterPro databases
Rat Genome Database (RGD) - database for the rat Rattus norvegicus
DictyBase - informatics resource for the slime mold Dictyostelium discoideum
GeneDB S. pombe - database for the fission yeast Schizosaccharomyces pombe (part of the Pathogen Sequencing Unit at the Wellcome Trust Sanger Institute)
GeneDB for protozoa - databases for Plasmodium falciparum, Leishmania major, Trypanosoma brucei, and several other protozoan parasites (part of the Pathogen Sequencing Unit at the Wellcome Trust Sanger Institute)
Genome Knowledge Base (GK) - a collaboration between Cold Spring Harbor Laboratory and EBI
TIGR - The Institute for Genomic Research
Gramene - A Comparative Mapping Resource for Monocots
Compugen (with its Internet Research Engine)
The Zebrafish Information Network (ZFIN) - reference datasets and information on Danio rerio
