1
Protein Subcellular Localization
Shan Sundararaj, University of Alberta, Edmonton, AB
July 22, 2004
Lecture 4.0 (c) 2004 CGDN
2
Why is Localization Important?
Function is dependent on context
Co-localization of proteins of related function
Valuable annotation for new proteins
Design of proteins with specific targets
Drug targeting
  Accessibility: membrane-bound > cytoplasmic > nuclear
3
Why is Localization Important?
1974 Nobel Prize in Physiology/Medicine: George Palade, “for discoveries concerning the structural and functional organization of the cell”
1999 Nobel Prize in Physiology/Medicine: Günter Blobel, “for the discovery that proteins have intrinsic signals that govern their transport and localization in the cell”
Notes: George Palade pioneered the use of electron microscopy for biological samples. The analysis of the secretory pathway by Palade and his collaborators in the 1960s took advantage of advanced ultrastructural analysis and cell-fractionation procedures.
4
Bacteria
Gram positive (3-4 states): cytoplasm, cytoplasmic membrane, cell wall, extracellular
Gram negative (5 states): cytoplasm, cytoplasmic membrane, periplasm, outer membrane, extracellular
5
Eukaryotic Cell
Compartmentalized
Diverse range of specific organelles:
  Plants: chloroplasts, chromoplasts, other plastids
  Muscle: sarcoplasm
  Various endosomes, vesicles
(Figure modified from Voet & Voet, Biochemistry; Wiley-VCH 1992)
6
Yet more categories…
Chloroplast
Mitochondrion
Yeast “specific”
Notes: List here some of the more exotic subcellular localizations (e.g. some of the 22 localizations of yeast from the SGD, or some of those from the malaria parasite in the Yeh/Altman presentation).
7
Level of Annotation
As simple as two states:
  membrane protein vs. non-membrane protein
  secreted protein vs. non-secreted protein
Gross compartments:
  cytoplasm, inner membrane, periplasm, cell wall, outer membrane, extracellular
  nucleus, mitochondria, peroxisome, vacuole…
Fine compartments:
  mitochondrial matrix, bud neck, spindle pole…
  Any of 1425 GO cellular compartments
8
Localization signaling
Proteins must have intrinsic signals for their localization: a cellular address
  E.g. N-terminal signal sequences
Example "address":
  321 Nuclear Inner Membrane Lane
  Nucleus, Intracellular county
  Eukaryotic Cell CL34V3M3
9
Localization signaling
Some signals are easily recognizable
  Signal peptidase cleavage site, consensus sequence for secretion (extracellular)
  Address printed neatly, with postal code
Others are difficult to understand
  Outer membrane β-barrel proteins: no consensus sequence, few sequence restraints
  Sloppy address, in a different kind of code that we don’t understand yet
10
Experimental determination
Since we don’t fully understand the language of proteins, our knowledge must often come from inference
Predicting localization is like sorting mail based only on examples of where some mail has gone before
Important to have good data sets of proteins with known localizations
11
Datasets
Organelle_DB (http://organelledb.lsi.umich.edu/)
  25095 eukaryotic proteins from subcellular proteomics studies
DBSubLoc
  Combines SwissProt and PIR annotations (64051 proteins)
PSORTDB
  Bacterial Gram –ve proteins, 574 Gram +ve proteins
SignalP
  940 plant and 2738 human proteins
YPL
  2956 yeast proteins
12
Experimental Methods
Electron microscopy
GFP tagging / fluorescence microscopy
Subcellular fractionation + detection
  Western blotting
  Mass spectrometry
13
Electron Microscopy
Highest resolution, can work at the level of a single protein complex
Immunolabel proteins of interest in conjunction with colloidal gold, and visualize
Combined with electron tomography, can even visualize unlabeled complexes
Use different sizes of gold for multiple protein visualization
Notes: Electron tomography is microscopy including a third dimension (but at much higher resolution than physical sectioning, down to as low as 2 nm). Resolution of light microscopy is … nm, so one fluorescent spot could be several vesicles, or could actually be only part of a larger organelle. When done in conjunction with EM, however, such ambiguity can be resolved. In the picture, the protein CD63 can be localized better by EM in panels f (endosomes) and g (Golgi).
(Figure from Koster and Klumperman, Nat Rev Mol Cell Biol, Sep 2003, S6-10)
14
Fluorescence Microscopy
Tag gene at either 3’ or 5’ end
  Using GFP (or RFP, YFP, CFP, etc.)
  Using an epitope tag and a fluorescently labeled antibody
  Be careful not to remove signal peptides!
Also use a subcellular-specific marker or stain
Visualize with confocal fluorescence microscopy and analyze images for co-localization
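One common way to put a number on "analyze images for co-localization" is the Pearson correlation between the pixel intensities of the tagged protein and the compartment marker. The sketch below is only an illustration under stated assumptions: the slide does not say which measure was used, and the function name and the intensity values are hypothetical.

```python
import math

def pearson_colocalization(channel_a, channel_b):
    """Pearson correlation between two equally sized lists of pixel intensities."""
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("channels must have the same length (at least 2 pixels)")
    n = len(channel_a)
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a channel with constant intensity has no defined correlation")
    return cov / math.sqrt(var_a * var_b)

# Hypothetical flattened pixel intensities (not data from the lecture).
gfp_channel = [10, 200, 180, 15, 12, 220]      # tagged protein
marker_channel = [12, 190, 170, 20, 10, 210]   # compartment marker, same pixels bright
other_channel = [200, 15, 10, 180, 220, 12]    # marker for a different compartment

print(pearson_colocalization(gfp_channel, marker_channel))  # close to +1
print(pearson_colocalization(gfp_channel, other_channel))   # negative
```

A value near +1 suggests the two signals occupy the same pixels; values near zero or below suggest they do not.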
15
Specific co-labeling (yeast)
Early Golgi: Cop1
Endosome: Snf7
ER to Golgi: Sec13
Golgi apparatus: Anp1
Late Golgi: Chc1
Lipid particle: Erg6
Mitochondrion: MitoTracker
Nucleus: DAPI
Nucleolus: Sik1
Nuclear periphery: Nic96
Peroxisome: Pex3
Vacuole: FM4-64
Figure: nuclear-specific DAPI staining
16
Subcellular Fractionation
Tissue homogenate
  Centrifuge at 1000 g
    Pellet: unbroken cells, nuclei, chloroplasts
    Transfer supernatant
  Centrifuge at 10,000 g
    Pellet: mitochondria
    Transfer supernatant
  Centrifuge at 100,000 g
    Pellet: microsomal fraction (ER, Golgi, lysosomes, peroxisomes)
    Supernatant: cytosol, soluble enzymes
17
Detergent Fractionation
Cells
  Extraction with digitonin/EDTA
    Supernatant: cytoplasmic fraction
    Pellet: carried forward
  Extraction with Triton X-100/EDTA
    Supernatant: organelle membranes
    Pellet: carried forward
  Extraction with SDS/EDTA
    Supernatant / pellet: nuclear / cytoskeletal (in SDS)
18
Fractionation Identification
Once fractionated, take compartment of interest and separate proteins
  2D gel or chromatography
Identify separated proteins
  Mass spectrometry for high-throughput
  Western blot for specific proteins
19
Fractionation in proteomics
20
High-Throughput Experiments
Kumar et al., Genes Dev 2002, 16:
  Epitope-tagged >60% of ORFs, visualized with fluorescently labeled antibody
  2744 localizations (44% of S. cerevisiae genes)
Huh et al., Nature 2003, 425:
  GFP-tagged all ORFs, RFP-tagged compartments
  4156 localizations (75% of S. cerevisiae genes)
Combined, now nearly 87% of yeast proteins have a localization annotation
Notes: Add Ellison paper here!
21
High-Throughput Experiments
Lopez-Campistrous et al., Mol Cell Proteomics, 2005
  Subcellular fractionation of E. coli, 2D-gel separation, MS-MS
  2,160 localizations to cytoplasm, inner membrane, periplasm, and outer membrane
Notes: Add Ellison paper here!
22
Predictions from known data
Enough experimental data exists to build highly accurate computational predictors of localization
23
Predictions from known data
Different information used for predictions:
Sequence motifs
  N-terminal: secretory signal peptides, mitochondrial targeting peptide, chloroplast transit peptide
  C-terminal: peroxisome import signal, ER retention signal
  Mid-sequence: nuclear localization signals
Amino acid composition
  AA frequency, dipeptide composition
Homology
  Sequence comparison to proteins of known localization
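As a rough illustration of the C-terminal motif idea, the sketch below checks a sequence for two textbook signals: the canonical peroxisomal targeting signal SKL and the ER retention signals KDEL and HDEL. The slide does not list specific motifs, so the motif table, the function name and the example sequences are assumptions for illustration; real predictors use broader patterns and additional context.

```python
# Canonical C-terminal signals only (an illustrative, deliberately tiny table).
C_TERMINAL_SIGNALS = {
    "SKL": "peroxisome import signal (canonical PTS1)",
    "KDEL": "ER retention signal",
    "HDEL": "ER retention signal (variant common in yeast)",
}

def c_terminal_signals(sequence):
    """Return labels of the known motifs that the sequence ends with."""
    sequence = sequence.strip().upper()
    return [label for motif, label in C_TERMINAL_SIGNALS.items()
            if sequence.endswith(motif)]

# Hypothetical sequences, not real proteins.
print(c_terminal_signals("MAAGTLVPESKL"))   # peroxisome signal found
print(c_terminal_signals("MKTLLVAAKDEL"))   # ER retention signal found
print(c_terminal_signals("MAAGTLVPE"))      # no motif found
```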
24
N-terminal signal peptides
Common structure of signal peptides: positively charged n-region, followed by a hydrophobic h-region and a neutral but polar c-region.
Comparison of eukaryotes and prokaryotes (Gram-negative, Gram-positive):
  Total length (avg): eukaryotes 22.6 aa; Gram-negative 25.1 aa; Gram-positive 32.0 aa
  n-regions: eukaryotes only slightly Arg-rich; prokaryotes Lys+Arg-rich
  h-regions: eukaryotes short, very hydrophobic; Gram-negative slightly longer, less hydrophobic; Gram-positive very long, less hydrophobic
  c-regions: eukaryotes short, no pattern; Gram-negative short, Ser+Ala-rich; Gram-positive longer, Pro+Thr-rich
  -3, -1 positions: eukaryotes small, neutral residues; prokaryotes almost exclusively Ala
  +1 to +5 region: eukaryotes no pattern; prokaryotes rich in Ala, Asp/Glu, and Ser/Thr
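The n-region/h-region layout can be made concrete with a small scan: count Lys/Arg at the very N-terminus and look for the most hydrophobic window after it. This is a minimal sketch, not how SignalP or any other tool works; it assumes the Kyte-Doolittle hydropathy scale, and the window size, function names and example sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def most_hydrophobic_window(sequence, window=8, n_terminal=30):
    """Return (start, mean hydropathy) of the best window in the N-terminal region."""
    region = sequence[:n_terminal].upper()
    if len(region) < window:
        raise ValueError("sequence is shorter than the window")
    best_start, best_score = 0, float("-inf")
    for start in range(len(region) - window + 1):
        # Residues missing from the scale contribute 0.0 (an assumption).
        score = sum(KYTE_DOOLITTLE.get(aa, 0.0)
                    for aa in region[start:start + window]) / window
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score

def positive_charges_before(sequence, position):
    """Count Lys and Arg residues before the given 0-based position."""
    return sum(1 for aa in sequence[:position].upper() if aa in "KR")

# Hypothetical signal-peptide-like sequence, not a real protein.
example = "MKKRTLLLALLALVAGSAQAADTPE"
h_start, h_score = most_hydrophobic_window(example)
print(h_start, round(h_score, 2), positive_charges_before(example, h_start))
```

A charged stretch followed by a strongly hydrophobic window is consistent with the n-region/h-region pattern, but it is far from sufficient evidence of a signal peptide on its own.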
25
N-terminal signal peptides
26
More work to do
Multiple bacterial secretion pathways
C-terminal signal peptides
Internal mitochondrial transit peptides
Structural aspects of targeting
Gene re-localization
Still a lot to discover in how signaling works!
Notes: C-to-N translocation into the ER; co-translational import into mitochondria (ribosomes can bind to mitochondria)
27
Computational methods for predicting localization
Expert rule-based methods
Artificial Neural Nets (ANN)
Hidden Markov Models (HMM)
Naïve Bayes (NB)
Support Vector Machines (SVM)
Combination of above methods
28
Naïve Bayes
Assumption: features are conditionally independent, given class labels
Structure: 1-level tree
  Class labels: root
  Features: leaf nodes
Prediction: $\mathrm{class}(f) = \arg\max_{c} P(C=c)\, P(F=f \mid C=c)$
Diagram: root node C with leaf nodes F1, F2, …, F7
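The prediction rule can be sketched in a few lines of code. The example below is a toy Bernoulli naïve Bayes over sets of annotation tokens with Laplace smoothing; the smoothing choice, the function names and the four training examples are assumptions made for illustration, not details given on the slide.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """examples: list of (set_of_tokens, class_label) pairs."""
    class_counts = Counter(label for _, label in examples)
    token_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in examples:
        token_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, token_counts, vocabulary

def predict(tokens, model):
    """Return argmax_c P(C=c) * prod_i P(F_i=f_i | C=c), computed in log space."""
    class_counts, token_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        n = class_counts[label]
        score = math.log(n / total)                      # log prior
        for token in vocabulary:
            # Laplace-smoothed probability that the token is present in this class.
            p_present = (token_counts[label][token] + 1) / (n + 2)
            score += math.log(p_present if token in tokens else 1 - p_present)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data (token sets are illustrative only).
training = [
    ({"Signal", "Secreted"}, "extracellular"),
    ({"Signal", "Fungicide"}, "extracellular"),
    ({"Kinase", "ATP-binding"}, "cytoplasm"),
    ({"Glycolysis"}, "cytoplasm"),
]
model = train_naive_bayes(training)
print(predict({"Signal"}, model))   # extracellular
print(predict({"Kinase"}, model))   # cytoplasm
```

Working in log space avoids numerical underflow when many feature probabilities are multiplied together.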
29
Artificial Neural Network
Excellent for modeling non-linear input/output relationships
Robust to noise in training data
Widely used in bioinformatics
Diagram: input layer, hidden layer, output layer
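To make "non-linear input/output relationships" concrete, the sketch below runs a forward pass through a network with one hidden layer. The weights are hand-set (not trained) so that the output approximates XOR, a relationship that no single linear threshold can represent. All weights and names here are hypothetical and unrelated to any predictor in the lecture.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer with sigmoid activation."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Hand-set weights: hidden unit 1 acts like OR, hidden unit 2 like NAND,
# and the output unit like AND of the two, which together gives XOR.
HIDDEN_WEIGHTS = [[20.0, 20.0], [-20.0, -20.0]]
HIDDEN_BIASES = [-10.0, 30.0]
OUTPUT_WEIGHTS = [[20.0, 20.0]]
OUTPUT_BIASES = [-30.0]

def forward(inputs):
    """Input layer -> hidden layer -> single output value in (0, 1)."""
    hidden = layer(inputs, HIDDEN_WEIGHTS, HIDDEN_BIASES)
    return layer(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES)[0]

for pair in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(pair, round(forward(pair), 3))
```

In practice the weights are learned from training data rather than set by hand.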
30
Support Vector Machines
Input vectors are separated into positive vs. negative instances
Map to new feature space
Find the hyperplane that best separates the two classes by distance (the margin)
Diagram:
  Hyperplane: $w \cdot x + b = 0$
  Half-space $w \cdot x + b < 0$: class -1
  Half-space $w \cdot x + b > 0$: class +1
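The decision rule in the diagram can be written down directly. The sketch below classifies a point by the sign of w·x + b and computes the margin width as 2/||w||, which holds under the usual convention that the closest training points satisfy w·x + b = ±1 (an assumption; the slide does not state the scaling). The weight vector and points are hypothetical, and no training is performed.

```python
import math

def decision_value(w, x, b):
    """Signed score w.x + b; its sign says which half-space x lies in."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, x, b):
    # Points exactly on the hyperplane are assigned to +1 here (arbitrary choice).
    return 1 if decision_value(w, x, b) >= 0 else -1

def margin_width(w):
    """Distance between the planes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical separating hyperplane in two dimensions.
w, b = [1.0, -1.0], 0.0
print(classify(w, [2.0, 0.0], b))   # 1
print(classify(w, [0.0, 2.0], b))   # -1
print(margin_width([3.0, 4.0]))     # 0.4
```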
31
Evaluating Predictors - Precision
Confusion matrix (rows: true class; columns: predicted class):
              Predicted +   Predicted -
  True +      TP            FN
  True -      FP            TN
Precision = TP / (TP + FP): # of proteins correctly labeled as “cyt” divided by the total # of proteins labeled as “cyt”
How often the label is correct
If there are 90 proteins correctly labeled as “cyt”, and 10 proteins incorrectly labeled as “cyt”, then the precision is 90/100 = 0.90.
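The worked example on this slide translates directly into code. This is a minimal sketch; the function name is hypothetical.

```python
def precision(true_positives, false_positives):
    """Fraction of proteins given a label that actually deserve it."""
    labeled = true_positives + false_positives
    if labeled == 0:
        raise ValueError("no proteins were given this label")
    return true_positives / labeled

# Numbers from the slide: 90 correct "cyt" labels, 10 incorrect ones.
print(precision(90, 10))
```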
32
Evaluating Predictors - Sensitivity
Confusion matrix (rows: true class; columns: predicted class):
              Predicted +   Predicted -
  True +      TP            FN
  True -      FP            TN
Sensitivity = TP / (TP + FN): # of proteins correctly labeled as cytoplasmic divided by the total # of proteins that are cytoplasmic
“How many of the true results were retrieved” (also called “recall”; not the same as overall accuracy, which also counts true negatives)
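Sensitivity can be computed per class from paired lists of true and predicted labels. The sketch below is illustrative only; the function name and the six example labels are hypothetical.

```python
def recall_for_class(true_labels, predicted_labels, target):
    """TP / (TP + FN) for one class, from paired true and predicted labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    if tp + fn == 0:
        raise ValueError("no proteins of this class in the true labels")
    return tp / (tp + fn)

# Hypothetical labels for six proteins.
true_labels = ["cyt", "cyt", "cyt", "cyt", "nuc", "nuc"]
predicted = ["cyt", "cyt", "cyt", "nuc", "cyt", "nuc"]
print(recall_for_class(true_labels, predicted, "cyt"))  # 3 of the 4 true "cyt" retrieved
```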
33
Predictions from known data
Different information used for predictions:
Sequence motifs
  N-terminal: secretory signal peptides, mitochondrial targeting peptide, chloroplast transit peptide
  C-terminal: peroxisome import signal, ER retention signal
  Mid-sequence: nuclear localization signals
Amino acid composition
  AA frequency, dipeptide composition, hydrophobicity
Homology
  Sequence comparison to proteins of known localization
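Amino acid frequency and dipeptide composition are simple to compute and give fixed-length feature vectors for a classifier. The sketch below shows one plausible way to do it; the function names are hypothetical, and the example uses the ten-residue fragment MAKSATIVTL that appears later in these slides.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_frequency(sequence):
    """Fraction of each of the 20 standard amino acids in the sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}

def dipeptide_composition(sequence):
    """Fraction of each observed overlapping residue pair."""
    sequence = sequence.upper()
    pairs = [sequence[i:i + 2] for i in range(len(sequence) - 1)]
    if not pairs:
        raise ValueError("sequence needs at least two residues")
    counts = Counter(pairs)
    return {pair: n / len(pairs) for pair, n in counts.items()}

fragment = "MAKSATIVTL"
print(aa_frequency(fragment)["A"])            # 2 of 10 residues are Ala
print(len(dipeptide_composition(fragment)))   # 9 distinct overlapping pairs
```

A full dipeptide feature vector would have 400 entries (20 × 20), with zeros for pairs that never occur.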
34
TargetP, SignalP, *P
http://www.cbs.dtu.dk/services/
Sequence-based methods
TargetP (85-90% recall)
  Predicts mitochondria/chloroplast/secreted
  Contains SignalP and ChloroP
LipoP
  Lipoproteins and signal peptides in Gram-negative bacteria
SecretomeP
  Non-classical secretion in eukaryotes
Notes:
  Put out by the CBS (Centre for Biological Sequence Analysis at the Technical University of Denmark).
  TargetP: predicts the existence of a mitochondrial targeting peptide (mTP), chloroplast transit peptide (cTP) or signal peptide (SP); uses an ANN.
  SignalP v.3.0 just came out and is considered the most accurate signal peptide predictor. It was actually used to reannotate several bacterial genes in Swiss-Prot, as it was judged more accurate than experimental signal peptide determination.
  SecretomeP: non-classical secretion is mostly fibroblast growth factors (FGF), interleukins and galectins; an ANN method is used.
35
SignalP result
Common structure of signal peptides: positively charged n-region, followed by a hydrophobic h-region and a neutral but polar c-region.
Figure annotation: cleavage site
Prediction: Signal peptide
Signal peptide probability: 0.945
Signal anchor probability: 0.000
Max cleavage site probability: between pos. 28 and 29
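A predicted cleavage site "between pos. 28 and 29" uses 1-based residue positions, so the signal peptide is residues 1 to 28 and the mature protein starts at residue 29. The sketch below shows that bookkeeping; the function name and the placeholder sequence are hypothetical, and the sequence is not the one behind the slide's result.

```python
def split_at_cleavage(sequence, last_signal_position):
    """Split at a cleavage site given as the 1-based position of the last
    signal peptide residue (28 for a site between positions 28 and 29)."""
    if not 0 < last_signal_position < len(sequence):
        raise ValueError("cleavage site must fall inside the sequence")
    return sequence[:last_signal_position], sequence[last_signal_position:]

# Placeholder sequence: 28 residues of "signal" followed by 5 of "mature protein".
placeholder = "M" + "L" * 27 + "DTPEQ"
signal_peptide, mature_protein = split_at_cleavage(placeholder, 28)
print(len(signal_peptide), mature_protein)
```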
36
Organellar Prediction
Predotar (80% recall)
  Mitochondrial and plastid sequences; N-terminal sequences
MitoPred (82% recall)
  Mitochondrial; PFAM domains, AA composition
MitoProteome
  Database of experimentally predicted human mitochondrial proteins
MitoP
  Combines data from multiple experimental and computational sources to give a consensus score for each “mitochondrial” protein in yeast and human
Notes: MitoP is a database of yeast, human, and Neurospora mitochondrial proteins
37
The PSORT Family
PSORT: plant sequences
  Expert rule-based system
PSORT II: eukaryotic sequences
  Probabilistic tree
iPSORT: eukaryotic N-terminal signal sequences
  ANN
PSORT-B: bacterial sequences
WoLF PSORT: eukaryotic
  Updated (2005) version of PSORT II
Notes: PSORT: expert rule-based system; PSORT II: probabilistic tree; iPSORT: ANN; PSORT-B: mix of several; WoLF PSORT: sorting signal motifs and some correlative sequence features such as amino acid content and sequence homology
38
PSORT-B
http://www.psort.org/psortb/
39
PSORT-B - methods
Signal peptides: non-cytoplasmic
AA composition/patterns
  SVMs trained for each location vs. all other locations
Transmembrane helices: inner membrane
  HMMTOP
PROSITE motifs: all localizations
Outer membrane motifs: outer membrane
Homology to proteins of known localization
  SCL-BLAST
Integration with a Bayesian network
40
PSORT-B results
SeqID: Unannotated_bacterial2
Analysis Report:
  CMSVM        Unknown       [No details]
  CytoSVM      Cytoplasmic   [No details]
  ECSVM        Unknown       [No details]
  HMMTOP       Unknown       [No internal helices found]
  Motif        Unknown       [No motifs found]
  OMPMotif     Unknown       [No motifs found]
  OMSVM        Unknown       [No details]
  PPSVM        Unknown       [No details]
  Profile      Unknown       [No matches to profiles found]
  SCL-BLAST    Cytoplasmic   [matched : Cyto. protein]
  SCL-BLASTe   Unknown       [No matches against database]
  Signal       Unknown       [No signal peptide detected]
Localization Scores: Cytoplasmic, CytoplasmicMembrane, Periplasmic, OuterMembrane, Extracellular
Final Prediction:
41
Proteome Analyst
http://www.cs.ualberta.ca/~bioinfo/PA/Sub/
42
Proteome Analyst - Method
Training: Training Sequences → Machine Learning Algorithm → Classifier
  >Extracellular<AFP1_BRANA…
  MAKSATIVTL…
  >Extracellular<AFP2_RAPSA…
  ACRAGMEEP…
  …
Prediction: Unknown Sequence → Classifier → Predicted Class
  >?<Fly_01…
  MDLRATSSND…
  …
  >Cytoplasm<Fly_01
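The labeled headers on this slide look like FASTA headers with the class placed between ">" and "<", with "?" marking an unknown class. Assuming that reading of the format (an inference from the slide, not a documented specification), a parser could be sketched as follows; the function name is hypothetical.

```python
def parse_labeled_fasta(text):
    """Parse records of the form '>Label<SequenceID' followed by sequence lines.

    Returns a list of (label, sequence_id, sequence) tuples.
    """
    records = []
    label, seq_id, chunks = None, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                records.append((label, seq_id, "".join(chunks)))
            # Everything before the first '<' is the class label.
            label, _, seq_id = line[1:].partition("<")
            chunks = []
        else:
            chunks.append(line)
    if seq_id is not None:
        records.append((label, seq_id, "".join(chunks)))
    return records

# Fragments taken from the slide; the sequences are truncated there as well.
example = """>Extracellular<AFP1_BRANA
MAKSATIVTL
>?<Fly_01
MDLRATSSND
"""
for record in parse_labeled_fasta(example):
    print(record)
```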
43
Proteome Analyst - Feature Extraction
Sequence (MAKSATIVTL…) → PSI-BLAST against Swiss-Prot → Homologs (>AFP1_ARATH, >AFP1_HUMAN, >AFP1_SINAL, …) → Features
44
Proteome Analyst: Feature Extraction
Top 3 homologs: AFP1_ARATH, AFP1_BRANA, AFP2_ARATH
Swiss-Prot fields used:
  KW: Plant defense; Fungicide; Signal; Multigene Family; Pyrrolidone carboxylic acid
  DR: InterPro IPR002118; IPR003614
  CC: Subcellular location: Secreted
Token set: {Plant defense; Fungicide; Signal; Multigene Family; Pyrrolidone carboxylic acid; IPR002118; IPR003614; Secreted}
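The token set is the union of the keyword, InterPro and subcellular-location annotations of the top homologs. The sketch below builds that union from per-homolog records. The record layout, the function name and the way the slide's annotations are split across the three homologs are all hypothetical; the slide only shows the combined result.

```python
def build_token_set(homolog_annotations):
    """Union of KW, InterPro and subcellular-location tokens over all homologs."""
    tokens = set()
    for annotation in homolog_annotations:
        tokens.update(annotation.get("KW", []))
        tokens.update(annotation.get("InterPro", []))
        tokens.update(annotation.get("Subcellular location", []))
    return tokens

# Hypothetical split of the slide's annotations across the top 3 homologs.
top_homologs = [
    {"id": "AFP1_ARATH",
     "KW": ["Plant defense", "Fungicide", "Signal"],
     "InterPro": ["IPR002118"],
     "Subcellular location": ["Secreted"]},
    {"id": "AFP1_BRANA",
     "KW": ["Plant defense", "Pyrrolidone carboxylic acid"],
     "InterPro": ["IPR002118", "IPR003614"],
     "Subcellular location": ["Secreted"]},
    {"id": "AFP2_ARATH",
     "KW": ["Fungicide", "Multigene Family"],
     "InterPro": ["IPR003614"]},
]
print(sorted(build_token_set(top_homologs)))
```

Because the result is a set, a token shared by several homologs is counted once.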
45
PASub - Results
Contribution of each token
Red bar is for the token “secreted”; it has the greatest effect on the extracellular bar
“reduced residual”: combination of all other tokens
“reduced prior”: correction based on size of data sets
E-value cutoff of 0.001
Figure axes: log scale; features
46
PASub - Interpretation
Bars represent -log probability, so a little difference is a lot!
Naïve Bayes chosen as classifier because of transparency of method
  Each token contributes a log probability that can be summed and shown graphically
  Neural network actually has higher recall
Can change token set, ask to explain with different features
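Two points on this slide can be checked with a little arithmetic: per-token log probabilities add up to a class score, and a small gap between two -log bars corresponds to a large probability ratio. The sketch below assumes natural logarithms (the slide does not state the base), and the function names and numbers are hypothetical.

```python
import math

def neg_log_score(prior, token_probabilities):
    """Sum of -log terms: one for the class prior and one per token."""
    return -math.log(prior) - sum(math.log(p) for p in token_probabilities)

def probability_ratio(bar_a, bar_b):
    """How many times more probable class A is than class B, given their
    -log probability bars (a shorter bar means a more probable class)."""
    return math.exp(bar_b - bar_a)

# Hypothetical values: a prior of 0.5 and two tokens with probability 0.5 each.
print(neg_log_score(0.5, [0.5, 0.5]))   # 3 * ln(2), about 2.079
# Bars that differ by only 2 units correspond to a ratio of about 7.4.
print(probability_ratio(1.0, 3.0))
```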
47
Save Time: Pre-computed Genomes
PSORTDB
  Browse, search, BLAST, download
  103 Gram –ve bacteria, 45 Gram +ve bacteria
Proteome Analyst (PA-GOSUB)
  15 bacterial and 8 eukaryotic