
Babak Alipanahi Prof. Ming Li CS882-Fall 2006 Beta-Barrel Discrimination.


1 Babak Alipanahi Prof. Ming Li CS882-Fall 2006 Beta-Barrel Discrimination

2 2/60 Outline: A tale of two barrels; Membrane proteins; A review of β-barrels; Folding mechanism; Seven families, some examples; Literature review; What I have done; What will I do…

3 3/60 Two Kinds of Closed Barrels There are two kinds of closed barrels: –α/β barrels (globular) –β barrels (transmembrane) The two types are similar in that, in both (Branden 99): –Similar structures can have very different a.a. sequences –The function of the protein is determined by the loops and not by the strands or helices (α/β barrels only). (The strands and helices are needed only to form the barrel, and corresponding β strands and α helices are usually structurally equivalent) They differ in that: –In α/β barrels the β strands are parallel and are connected to each other by α helices, while in β barrels they are anti-parallel and are connected to each other by (usually) simple loops –They have one very fundamental difference (in fact, the important difference between all transmembrane and globular proteins). I will come back to this later…

4 4/60 An example of α/β Barrel (Branden 99) The picture on the right shows the β-core of glycolate oxidase, an enzyme containing an eight-stranded α/β barrel. Note that all β-strands are parallel. The eight-stranded α/β barrel is one of the largest and most regular of all domain structures. At least 200 a.a. are required to form this structure. Most α/β barrels are enzymes with completely different a.a. sequences and diverse functions

5 5/60 An example of α/β Barrel (cntd.) As can be seen, the parallel β strands are connected to each other by α helices. The eight β strands enclose a tightly packed hydrophobic core formed entirely by side chains of the β strands. The active site in all α/β barrels is formed by loops at one end of the barrel

6 6/60 β-Barrels β-barrel proteins are found in the outer membranes of Gram-negative bacteria, mitochondria and chloroplasts (Schulz 00). It has been hypothesized that most integral outer membrane proteins of mitochondria and chloroplasts are β-barrels, because these organelles are relics of their evolutionary history as symbiotic intracellular Gram-negative bacteria (Wimley 03). The abundant mitochondrial voltage-dependent anion channel (VDAC) has long been thought to be a β-barrel (Wimley 03)

7 7/60 Membrane Proteins The hallmark of Gram-negative bacteria is their cell envelope, which has two membranes (inner and outer, called IM and OM respectively) separated by the periplasm (Ruiz 05) Image from Nature

8 8/60 Membrane Proteins The structure, function, and composition of the IM and OM are dramatically different. The IM is in direct contact with the cytoplasm and the periplasm, while the OM is in contact with the extracellular environment (Ruiz 05) Image from Nature

9 9/60 Analysis of E. coli cell envelope: IM (Ruiz 05) The IM, which is the major permeability barrier between the cell's inside and outside (Tamm 04), is a bilayer composed of phospholipids (PL) and proteins: –Integral IM proteins: span the IM with α-helical transmembrane domains –Lipoproteins: anchored to the outer leaflet of the IM by lipid modifications of the N-terminus All of the membrane-bound biochemical processes that occur in eukaryotic cells, such as oxidative phosphorylation, lipid biosynthesis and protein translocation, occur in the IM (Ruiz 05). In other words, most membrane-associated metabolic functions are carried out in the IM (Tamm 04). Note that the surface of integral IM proteins is more hydrophobic than that of OM proteins, and that their folding mechanism is less complex (Tamm 04)

10 10/60 Analysis of E. coli cell envelope: Periplasm (Ruiz 05) About 10% of the cell volume is occupied by the periplasm, which consists of soluble proteins and the peptidoglycan layer. The periplasm is an oxidizing environment and contains enzymes that catalyse the formation of disulphide bonds. The periplasm is ATP-free, so all activities there proceed without an obvious energy source. Peptidoglycan functions as an extracytoplasmic cytoskeleton and prevents the cell from lysing in dilute environments

11 11/60 Analysis of E. coli cell envelope: OM (Ruiz 05) The OM is unique in that, unlike most other eukaryotic and prokaryotic membranes, it is asymmetric: its outer and inner leaflets are composed mainly of LPS (lipopolysaccharide) and PL respectively. The OM functions as a selective barrier that blocks the entry of toxic and unwanted molecules, which is crucial for bacterial survival in many (possibly hostile) environments. For example, E. coli is resistant to bile salts, which helps it live in the intestines. There are two kinds of proteins in the OM: –Lipoproteins: 90% of lipoproteins are in the OM –β-barrels: these are called OM proteins (OMPs). Some of them act as channels. Since the membranes are impermeable to hydrophilic solutes, these channels are necessary for nutrient intake and excretion of toxic waste products (we will revisit the diverse functions of OMPs later)

12 12/60 Barrel Construction Principles (Schulz 00) 1. “The number of β strands is even and both N and C terminal are at the periplasmic barrel end” 2. “The β-strand tilt is always around 45° and corresponds to the common β-sheet twist. Only one of the two possible tilt directions is assumed, the other one is an energetically disfavored mirror image” 3. “All β strands are anti-parallel and connected locally to their next neighbors along the chain, resulting in a maximum neighborhood correlation” Image from (Schulz 00): OmpX, a defense protein which is a toxin binder

13 13/60 Barrel Construction Principles (cntd.) 4. “The shear number of an n-stranded barrel is positive and around n+2, in agreement with the observed tilt” 5. “The strand connections at the periplasmic barrel end are short turns of a couple of residues named T1, T2 and so on” 6. “At the external barrel end, the strand connections are usually long loops named L1, L2 and so on” Images adapted from (Waldispühl 06)

14 14/60 Barrel Construction Principles (cntd.) 7. “The β-barrel surface contacting the nonpolar membrane interior consists of aliphatic side chains forming a nonpolar ribbon with a width of about” 27 Å (Tamm 04) 8. “The aliphatic ribbon is lined by two girdles of aromatic side chains, which have intermediate polarity and contact the two nonpolar–polar interface layers of the membrane” 9. “The sequence variability of all parts of the β barrel during evolution is high when compared with soluble proteins” 10. “The external loops show exceptionally high sequence variability and they are usually mobile”. “The loops exhibit the largest sequence variability and thus contain the most of functional characteristics of each protein…” (Tamm 04) Image adapted from (Wimley 02)

15 15/60 β-Barrel folding mechanism (Tamm 04) Folding and membrane insertion of OmpA: –The unfolded state U collapses hydrophobically into a water-soluble intermediate state I W –This intermediate binds to the membrane and forms intermediate state I M1 –I M1 proceeds to intermediate state I M2, the "molten disk". Some parts of the β-strands are formed in this state –Next, the four Trps on the four β-hairpins move to the center of the bilayer (intermediate state I M3) –I M3 is more globular and is called the "molten globule", but it has not yet reached its native tertiary structure Folding and membrane insertion are coupled processes. The membrane interface is involved in the folding. Image from (Tamm 04); the blue balls in the image are tryptophan (Trp) residues. The technique used to identify these steps is time-resolved distance determination by Trp fluorescence quenching (TDFQ)

16 16/60 Assisted folding of β-Barrels (Tamm 04) As mentioned before, the periplasm is ATP-free, so mechanisms have evolved that let OMPs insert spontaneously into the OM after being translocated to the periplasm. Two periplasmic proteins have been proposed to help the β-barrel folding process: –Skp is a soluble protein that can also bind to the phospholipid bilayer. Three or four Skps bind to a newly synthesized, unfolded OMP immediately after it is translocated through the IM, act as a passive chaperone (remember that the periplasm is ATP-free) and prevent aggregation. However, Skp does not assist the folding process itself –SurA is a periplasmic peptidyl-prolyl isomerase that has been shown to assist the folding of OMPs. Experiments show that “Sequences containing aromatic-random-aromatic motifs bind particularly to SurA”. It has a long (50 Å) docking cleft for accommodating unfolded peptide chains

17 17/60 Features of OMPs About 2–3% of the genes in Gram-negative bacterial genomes encode β-barrels. In the E. coli genome, 60 proteins are annotated as known or probable OMPs (Wimley 03). The average length of the β-strands is 11 a.a. residues in trimeric porins and 13–14 residues in monomeric β-barrels (Tamm 04). Given the 40–45° tilt of the β-strands relative to the membrane normal, the average rise per residue is 3.8 × sin(45°) ≈ 2.7 Å (Tamm 04). Most OMPs lack cysteines, so they cannot form disulphide bonds
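The rise-per-residue arithmetic quoted on this slide can be checked with a few lines of Python. This is only a sketch of the slide's own formula (3.8 Å spacing, 45° tilt); the function names and the strand-span helper are illustrative assumptions, not taken from (Tamm 04).

```python
import math

RESIDUE_SPACING_A = 3.8  # per-residue spacing used on the slide, in angstroms
TILT_DEG = 45.0          # strand tilt quoted on the slide


def rise_per_residue(spacing=RESIDUE_SPACING_A, tilt_deg=TILT_DEG):
    """Rise per residue as computed on the slide: spacing * sin(tilt)."""
    return spacing * math.sin(math.radians(tilt_deg))


def strand_span(n_residues, spacing=RESIDUE_SPACING_A, tilt_deg=TILT_DEG):
    """Span covered by a strand of n_residues (hypothetical helper)."""
    return n_residues * rise_per_residue(spacing, tilt_deg)


print(round(rise_per_residue(), 2))  # 2.69
print(round(strand_span(11), 1))     # 29.6
```

With these numbers, an 11-residue strand spans roughly 30 Å, which is of the same order as the 27 Å nonpolar ribbon width quoted from (Tamm 04) earlier in the talk.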

18 18/60 Features of OMPs (cntd.) The interior-facing residues of the TM β-strands of β-barrels are rich in small and polar a.a. such as glycine (Gly), threonine (Thr), serine (Ser), asparagine (Asn) and glutamine (Gln) (Tamm 04), (Wimley 03). 40% of lipid-exposed residues are aromatic (Wimley 03); the aromatic residues tyrosine (Tyr) and tryptophan (Trp) are also abundant in loop regions (Tamm 04) Images from (Wimley 03)

19 19/60 Seven families of OMPs (based on Tamm 04) General Porins: porins typically control the diffusion of small metabolites like sugars, ions, and amino acids Passive Transporters: these proteins are selective passive transporters of maltose, sucrose and fatty acids Active Transporters of Siderophores and Vitamin B12: they receive their energy through interaction with IM proteins Enzymes: proteases and phospholipases Defensive Proteins: fight hostile molecules Structural Proteins: membrane anchors Toxins (non-constitutive): kill the target cell

20 20/60 Some examples of OMPs Name: OmpA β-Strands: 8 Oligomeric State: monomer Organism: E. coli Residues: 171 Function: Structural protein Features: –The residues inside the barrel are so tightly packed that the lumen is filled with polar side chains that interact with each other through hydrogen bonds and electrostatic interactions. Groups of water molecules can also be found in the lumen –OmpA links the outer membrane to the periplasmic peptidoglycan; in other words, it is a kind of membrane anchor –“Extensive mutagenesis studies show that OmpA is quite robust against many mutations especially in the loop, turn and bilayer facing area.” A surprising fact is that the transmembrane domain of OmpA “can even be circularly permutated without impairing its assembly and functions” (Tamm 04)

21 21/60 Some examples of OMPs Name: FepA β-Strands: 22 Oligomeric State: monomer Organism: E. coli Residues: 724 Function: iron transporter (active transporter) Features: FepA, a TonB-dependent active Fe-siderophore transporter, uses metabolic energy through interaction with IM proteins. The C-terminus forms the β-barrel domain, while the N-terminus forms a hatch domain that plugs the barrel and regulates iron transport (Tamm 04), (Wimley 03)

22 22/60 MspA: a very long porin Name: MspA β-Strands: 8x2 Oligomeric State: octamer Organism: M. smegmatis Residues: 184 Function: mycobacterial porin Features: It has two sequential β-barrels of different diameters. The narrow barrel has a hydrophobic surface that is 37 Å long, because the mycobacterial outer membrane does not contain LPS but very long mycolic fatty acids. Note that some members of the mycobacteria cause tuberculosis (Tamm 04) Bottom image from (Tamm 04)

23 23/60 TolC: involved in multi-drug resistance Name: TolC β-Strands: 3x4 Oligomeric State: trimer Organism: E. coli Residues: 428 Function: active export channel Features: TolC is a small-molecule transporter that is involved in the multi-drug resistance of bacteria (it facilitates drug efflux (Bigelow 04)). It derives its energy from its interactions with IM proteins. The lumen of the β-barrel is connected to the lumen of an α-helical bundle that extends through the periplasm to the IM (i.e. a direct path to the cytoplasm) (Wimley 03), (Tamm 04)

24 24/60 OmpLA: an enzyme Name: OmpLA β-Strands: 12 Oligomeric State: dimer Organism: E. coli Residues: 269 Function: enzyme Features: The phospholipase OmpLA is active only in its dimeric form. The active site is at the outer edge of the barrels, in the interface between the two barrels. Its role is possibly to hydrolyze PL that have migrated to the extracellular leaflet of the OM, where they should not normally be (Tamm 04), (Wimley 03) Image label: active site

25 25/60 α-Hemolysin: a deadly toxin Name: α-Hemolysin β-Strands: 7x2 Oligomeric State: heptamer Organism: S. aureus Residues: 293 Function: toxin Features: This toxin is secreted as a monomeric protein that ultimately forms a 14-stranded β-barrel, with each monomer contributing a β-hairpin to the heptamer. After insertion into the victim cell's membrane, the heptamer forms an ungated pore that leads to osmotic cytolysis. Note how clean the pore is (Wimley 03), (Tamm 04)

26 26/60 β-barrel discrimination: Literature review Research on β-barrels can be categorized into two major groups (both rely only on the a.a. sequence): –Secondary structure (hereinafter S.S.) prediction –Discrimination of β-barrels from globular and IM proteins Most methods for secondary structure prediction also provide a companion algorithm for discrimination, because: –Unlike globular (water-soluble) proteins, which have a hydrophobic core and a hydrophilic surface, β-barrels have a hydrophilic core (the interior wall of the lumen) and a hydrophobic (lipid-exposed) surface –Two very similar β-barrels can have very different sequences that show hardly any sign of homology The accuracy of discriminating α-helical TM proteins from non-α-helical TM proteins is very high (99% accuracy is reported) because of their unique features (Hirokawa 98)

27 27/60 Some definitions After an a.a. sequence is fed into a discrimination algorithm, the algorithm decides whether it is an OMP (positive) or not (negative). A positive answer can be true (true positive, TP) or false (false positive, FP). Likewise, a negative answer can be true (true negative, TN) or false (false negative, FN). So, we define: –TP: # of correctly classified OMPs –TN: # of correctly classified non-OMPs –FP: # of non-OMPs classified as OMP –FN: # of OMPs classified as non-OMP

28 28/60 Some definitions (cntd) Sensitivity (SEN): the fraction of OMPs correctly identified by the algorithm, SEN = TP/(TP+FN). This shows the ability to correctly predict OMPs (Park 05). Specificity (SPC): the fraction of non-OMPs correctly rejected by the algorithm, SPC = TN/(TN+FP). This shows the ability to reject non-OMPs (Park 05). A dumb algorithm that declares every input to be an OMP will have a sensitivity of 100% and a specificity of 0%! Some people really cheat! We will see…

29 29/60 Some definitions (cntd) Overall accuracy (ACC), ACC = (TP+TN)/(TP+TN+FP+FN), is very useful for judging overall performance, but it is not enough. Our dumb algorithm will have a 50% accuracy! (assuming the # of OMPs and non-OMPs are the same)

30 30/60 Some definitions (cntd) The Matthews correlation coefficient (MCC), MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), is a very powerful measure of performance. It is zero for completely random algorithms (our dumb algorithm's MCC is 0/0, taken as zero by convention) and a perfect algorithm's MCC is one (Park 05)
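The four performance measures defined on the preceding slides can be sketched in Python as follows. The formulas are the standard ones for SEN, SPC, ACC and MCC; the function name and the convention of returning 0 when a denominator vanishes are assumptions made for this illustration.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Return SEN, SPC, ACC and MCC from confusion-matrix counts."""
    sen = tp / (tp + fn) if (tp + fn) else 0.0
    spc = tn / (tn + fp) if (tn + fp) else 0.0
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is 0/0 when a whole row or column is empty; report 0 by convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SEN": sen, "SPC": spc, "ACC": acc, "MCC": mcc}


# The "dumb" algorithm from the slides: everything is declared an OMP,
# evaluated on a balanced set of 100 OMPs and 100 non-OMPs.
dumb = classification_metrics(tp=100, tn=0, fp=100, fn=0)
print(dumb)
```

On the balanced toy set, the dumb classifier gets SEN = 1.0, SPC = 0.0, ACC = 0.5 and MCC = 0.0, matching the statements on the slides.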

31 31/60 Prediction approaches (1) Profile-based HMMs: the HMM is trained on sequence profiles computed from a multiple sequence alignment. Two major studies are: –(Martelli 02): a very successful and highly cited study. Here, every residue can be either loop or β-strand. Discrimination is done by calculating the posterior probability of the sequence given the model. S.S. prediction accuracy is 84%, discrimination accuracy (ACC) is 84% and the false positive rate is 10% (SPC=90%) –(Bigelow 04): the algorithm, PROFtmb, is mainly based on (Martelli 02) with some modifications, such as having four states for each residue: up-strand, down-strand, periplasmic loop and outer loop. S.S. prediction accuracy is 86%, SPC=100% and SEN=45%

32 32/60 Prediction approaches (2) (Zhai 02): in the β-barrel finder (BBF), hydropathy and amphipathicity values are used for discrimination. A sliding window of seven residues is used to calculate hydropathy and amphipathicity values for every a.a. in the protein sequence. Since the resulting function is noisy, it is averaged over multiple aligned sequences. The authors claim that every TM β-strand corresponds in position to a peak of hydropathy and a peak of amphipathicity
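A minimal sketch of a seven-residue sliding-window profile is given below. It is not the BBF implementation: the use of the Kyte-Doolittle hydropathy scale and the alternating-sign definition of β-strand amphipathicity (a two-residue periodicity) are assumptions made for illustration, and the averaging over aligned sequences is omitted.

```python
# Kyte-Doolittle hydropathy scale (assumed here; BBF may use another scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def window_average(values, window=7):
    """Centered moving average; windows are truncated at the sequence ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def hydropathy_profile(seq, window=7):
    """Window-averaged hydropathy for each position of seq."""
    return window_average([KD[a] for a in seq], window)


def beta_amphipathicity_profile(seq, window=7):
    """Window-averaged hydropathy with alternating sign (period of two)."""
    signed = [KD[a] if i % 2 == 0 else -KD[a] for i, a in enumerate(seq)]
    return [abs(v) for v in window_average(signed, window)]


print(round(hydropathy_profile("VSVSVSV")[3], 2))
print(round(beta_amphipathicity_profile("VSVSVSV")[3], 2))
```

A strand that alternates hydrophobic and polar residues gives a high amphipathicity value, whereas a uniformly hydrophobic stretch gives a high hydropathy value but a low amphipathicity value.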

33 33/60 Prediction approaches (3) (Waldispühl 06): this method uses pairwise interstrand residue statistical potentials derived from globular proteins to predict the super-secondary structure of OMPs. The transFOLD algorithm employs a generalized HMM (a multi-tape S-attribute grammar, MTSAG) to describe potential β-barrel structures and then computes the minimum free energy by dynamic programming –The authors claim that, unlike other approaches, they consider long-range interactions between residues –S.S. prediction accuracy is 79%, but the rate of correctly predicted structures is 93% –For OMP discrimination, they use four parameters: sequence length, folding pseudo-energy in the water-filled and non-water-filled lumen models, and overall hydrophobicity. Discrimination is performed by an SVM. SEN=88%, SPC=63% and ACC=75%

34 34/60 Prediction approaches (4) Neural-network based (Jacoboni 01): this work has been cited many times and is highly regarded as one of the first reasonably good prediction methods –A feed-forward neural network is implemented and trained with the error back-propagation algorithm to discriminate β-strands from extra-membrane regions (i.e. a two-state prediction, β-strand or non-β-strand) –Evolutionary information is given as input in the form of a sequence profile obtained from multiple sequence alignments –S.S. prediction accuracy is nearly 78%

35 35/60 Methods based on peptide and dipeptide composition In these methods, the abundance of single a.a. or a.a. pairs is used to discriminate OMPs. It has been shown that a.a. and a.a.-pair composition differs appreciably between OMPs and non-OMPs. Methods that use a.a. composition as classification features perform much better than methods that use other features, such as hydrophobicity or the posterior probability in HMM-based methods. With these features at hand, several classification techniques have been applied, such as k-nearest neighbors (k-NN), SVM, simple a.a. weighting and neural networks

36 36/60 Methods based on peptide and dipeptide composition (cntd) a.a. abundance in lipid-exposed and barrel-interior positions (Wimley 02): this study made the clever observation that the relative abundance of a.a. (relative to the whole genome) is very different in interior and lipid-exposed positions. If we denote a lipid-exposed a.a. by E and a barrel-interior a.a. by I, a β-strand will have this pattern: –…EIEIEIEIEIE… Images from (Wimley 03)

37 37/60 (Wimley 02) (cntd.) Under the assumption that A(j+i) is I, the a.a. at position j+i in the sequence is taken to face the barrel interior, so it is scored using the barrel-interior relative abundance table; under the opposite assumption (A(j+i) is E), it is scored using the lipid-exposed table. The β-strand length is assumed to be 10, which is not very realistic. No performance measure is given
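The alternating E/I scoring idea can be sketched as below. This is not the scoring function of (Wimley 02): the abundance values are toy placeholders, and summing log relative abundances over a 10-residue window and taking the better of the two phases are assumptions made only to illustrate the pattern …EIEIEI….

```python
import math

# Toy relative-abundance tables (hypothetical numbers, NOT from Wimley 02).
# Values above 1 mean "enriched", below 1 "depleted"; missing residues count as 1.
EXPOSED = {"L": 2.0, "V": 1.8, "F": 1.9, "Y": 1.5, "W": 1.6, "K": 0.3, "D": 0.3}
INTERIOR = {"G": 1.8, "T": 1.5, "S": 1.5, "N": 1.3, "Q": 1.3, "L": 0.6, "F": 0.5}


def window_score(seq, start, length=10, first_is_exposed=True):
    """Score one window assuming a strict E/I alternation from `start`."""
    total = 0.0
    for i in range(length):
        exposed = (i % 2 == 0) == first_is_exposed
        table = EXPOSED if exposed else INTERIOR
        total += math.log(table.get(seq[start + i], 1.0))
    return total


def strand_scores(seq, length=10):
    """Best score over the two possible phases, for every window start."""
    return [
        max(window_score(seq, j, length, True),
            window_score(seq, j, length, False))
        for j in range(len(seq) - length + 1)
    ]


print(strand_scores("LGLGLGLGLG"))
print(strand_scores("KKKKKKKKKK"))
```

With these toy tables, an alternating hydrophobic/small-polar window scores positively, while a poly-lysine window scores negatively.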

38 38/60 Methods based on peptide and dipeptide composition (cntd) k-NN: (Garrow 05) in TMBhunt, the features are the comp(i) values. For a new query, its k nearest neighbors are found (using the Euclidean distance) and its class is decided by majority vote. Performance is improved by including differentially weighted a.a. and evolutionary information, and by calibrating the scoring system. SEN=91%, SPC=93.8% and ACC=92.5% ((Park 05) disputes these results and reports 89.2%) Sum-of-deviations: (Gromiha 04) in this study, the average comp(i) over all proteins of each class (OMP or non-OMP) is computed. For a new query, the comp(i) values are computed, and the absolute deviations of comp(i) from each class average are summed. The query is assigned to the class from which its total deviation is smaller (the Euclidean distance could have been used instead, which is more meaningful). SPC=80%, SEN=84%
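The sum-of-deviations rule described above can be sketched as follows. The definition of comp(i) as a percentage of the sequence length, the toy training sequences and all function names are assumptions for illustration; they are not taken from (Gromiha 04).

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """comp(i): percentage of each of the 20 amino acids in seq (assumed form)."""
    return [100.0 * seq.count(a) / len(seq) for a in AMINO_ACIDS]


def mean_composition(seqs):
    """Average comp(i) over all sequences of one class."""
    comps = [composition(s) for s in seqs]
    return [sum(col) / len(comps) for col in zip(*comps)]


def sum_of_deviations(comp, class_mean):
    """Total absolute deviation of a query composition from a class average."""
    return sum(abs(c - m) for c, m in zip(comp, class_mean))


def classify(seq, omp_mean, non_omp_mean):
    """Assign the class whose average composition is closer to the query."""
    comp = composition(seq)
    d_omp = sum_of_deviations(comp, omp_mean)
    d_non = sum_of_deviations(comp, non_omp_mean)
    return "OMP" if d_omp < d_non else "non-OMP"


# Toy "training sets" (hypothetical sequences, for illustration only).
omp_mean = mean_composition(["GTSYGTSY", "GSTYNGSY"])
non_omp_mean = mean_composition(["LLAALLAA", "LAKELAKE"])
print(classify("GTYSGTYS", omp_mean, non_omp_mean))
print(classify("LALALAKE", omp_mean, non_omp_mean))
```

Replacing `sum_of_deviations` with a Euclidean distance, as the slide suggests, would only require changing that one function.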

39 39/60 Methods based on peptide and dipeptide composition (cntd) Sum-of-deviations: (Gromiha 05-a) this study is virtually the same as the previous one, but the new algorithm works only with averaged dipeptide abundance values (dipep(i,j)). For a new query, the dipep(i,j) values are computed (400 values) and then weighted according to a pre-calculated table of dipeptide abundance differences between OMPs and non-OMPs (globular proteins only). Finally, the decision is made based on the sign of the sum of the weighted terms. SEN=94.7%, SPC=79.2% and ACC=84.8%. The major problem of this method is that the training data were not filtered for homologous sequences, which gives overestimated results. Neural network: (Gromiha 05-b) the discrimination method is exactly the same as in (Gromiha 04), but a neural network is introduced for S.S. prediction, with a prediction accuracy of 73.2%
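A minimal sketch of the dipeptide-based decision rule follows. The exact weighting used in (Gromiha 05-a) is not given on the slide, so the choice of weights as the plain difference of class-average dipeptide percentages, the toy sequences and the function names are all assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 dipeptides


def dipeptide_composition(seq):
    """dipep(i,j): percentage of each adjacent residue pair in seq."""
    counts = dict.fromkeys(PAIRS, 0)
    for a, b in zip(seq, seq[1:]):
        if a + b in counts:  # skip pairs containing non-standard residues
            counts[a + b] += 1
    total = max(len(seq) - 1, 1)
    return {p: 100.0 * c / total for p, c in counts.items()}


def mean_dipeptide_composition(seqs):
    comps = [dipeptide_composition(s) for s in seqs]
    return {p: sum(c[p] for c in comps) / len(comps) for p in PAIRS}


def difference_table(omp_seqs, non_omp_seqs):
    """Weights: OMP average minus non-OMP average (assumed weighting)."""
    omp = mean_dipeptide_composition(omp_seqs)
    non = mean_dipeptide_composition(non_omp_seqs)
    return {p: omp[p] - non[p] for p in PAIRS}


def classify(seq, weights):
    """Decide by the sign of the weighted sum of dipeptide percentages."""
    comp = dipeptide_composition(seq)
    score = sum(weights[p] * comp[p] for p in PAIRS)
    return "OMP" if score > 0 else "non-OMP"


# Toy training data (hypothetical sequences, for illustration only).
weights = difference_table(["GYGYGYGY"], ["LALALALA"])
print(classify("GYGYGY", weights))
print(classify("LALALA", weights))
```

As the slide notes for the original study, any real use of such a rule would need the training sequences to be filtered for homology first.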

40 40/60 Methods based on peptide and dipeptide composition (cntd) SVM: (Park 05) (note: Gromiha is the second author!) The sequences used for training are filtered by an all-to-all sequence similarity check using CD-HIT (Li 01), which produces a non-redundant protein database. An SVM with a radial basis function (RBF) kernel is used for discrimination. This is actually the first systematic study with well-defined definitions and presentation of results. The authors use composition values (xC means that x comp(i) values have been used for discrimination) and dipeptide values (yD means that y dipep(i,j) values have been used). x and y are found using backward and forward feature selection algorithms. I have defined some notation to simplify the presentation of results: –OMP: outer membrane proteins –TMH: transmembrane α-helical proteins –GLB: globular proteins –NOM: non-outer-membrane proteins So, OMP-TMH classification means discrimination between OMP and TMH proteins

41 41/60 Results of SVM-peptide composition method
Prediction rate (%)     SEN    SPC    ACC    MCC
OMP-TMH (15C)           99     92.7   95.9   0.920
OMP-GLB (17C+8D)        88     96.4   94.4   0.846
OMP-NOM (18C+10D)       90.9   94.7   93.9   0.816
OMP-NOM (20C+400D)      79.3   99.0   95.2   0.840
The results are better than those of any previous method but are still far from the accuracy reported for the TMH set (99%). Interestingly, discrimination between OMP and NOM (which is TMH+GLB) is worse than either OMP-TMH or OMP-GLB discrimination. Also, OMP-TMH has the highest discrimination rate

42 42/60 What I have done: 1-Data Set The data set I have used is the same as in (Park 05), which has been shown to be one of the most comprehensive and challenging data sets. It contains: –208 non-homologous OMPs –206 non-homologous TMHs –673 non-homologous GLBs, consisting of 155 all-α proteins, 156 all-β proteins, 184 α+β proteins and 179 α/β proteins To find the optimal features, I started with the a.a. composition ratios (20C), then added sequence length (L), and finally found that the β-strand score (B) (as defined in (Wimley 02)) can enhance performance

43 43/60 Averaged a.a. composition

44 44/60 β-strand quality factor I have assumed a mean β-strand length of 12, because it is the best choice for covering all β-barrels (including newly discovered ones). The β-factor (called the B feature) is calculated by summing the squared values of the β-strand quality factor over all residues

45 45/60 What I have done: 2-Feature Selection A very useful and commonly used scaling-insensitive measure for linear classification, which can give some information even for non-linear classification, is the Fisher Discrimination Ratio (FDR), as defined in (Park 05)
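The defining formula is not reproduced in this transcript. A minimal sketch, assuming the usual two-class form FDR = (μ1 − μ2)² / (σ1² + σ2²) computed separately for each feature, is given below; the function name and the toy values are illustrative.

```python
from statistics import mean, pvariance


def fisher_discrimination_ratio(class_a, class_b):
    """Two-class FDR of one feature: squared mean gap over summed variances."""
    gap = (mean(class_a) - mean(class_b)) ** 2
    spread = pvariance(class_a) + pvariance(class_b)
    return float("inf") if spread == 0 else gap / spread


# Toy feature values for two classes (hypothetical numbers).
omp_feature = [1.0, 2.0, 3.0]
non_omp_feature = [4.0, 5.0, 6.0]
print(fisher_discrimination_ratio(omp_feature, non_omp_feature))  # 6.75

# Rescaling the feature leaves the ratio unchanged (scaling insensitivity).
scaled_a = [10 * v for v in omp_feature]
scaled_b = [10 * v for v in non_omp_feature]
print(fisher_discrimination_ratio(scaled_a, scaled_b))
```

Because both the numerator and the denominator scale with the square of the feature's units, multiplying a feature by a constant does not change its FDR, which is the scaling insensitivity mentioned on the slide.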

46 46/60 FDR values for all the features

47 47/60 FDR values for all features for OMP and TMH classification

48 48/60 Two good features: β-factor and sequence length

49 49/60 Another good feature: Serine composition

50 50/60 3-Algorithms used for prediction I have used several algorithms for classification, including: –Support Vector Machine (SVM): SVM with radial basis function (RBF) kernel –Locally Linear Neurofuzzy Model (LLNM): LLNM with the locally linear model tree (Lolimot) model construction method –Neural Network: multi-layer perceptron (MLP) feed-forward network with the error back-propagation learning algorithm The prediction accuracy is nearly the same for all algorithms, so none has a clear advantage over the others; however, since SVM is much faster, I have chosen it. A very real danger when using powerful algorithms is overfitting, which destroys the generalization capability. When the training data set is small, overfitting is a fatal risk. To avoid overfitting, n-fold cross-validation is usually used, especially when the training data set is small: –The data set is divided into n subsets; at each step the algorithm is trained on n-1 subsets and validated on the remaining subset. This process is repeated for all n subsets, and performance is averaged over all n experiments
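The n-fold procedure described above can be sketched in plain Python as follows. The classifier is passed in as two callables; `train_fn` and `predict_fn` are hypothetical names, and the threshold "model" in the usage example is a stand-in, not one of the algorithms used in this work.

```python
import random


def n_fold_splits(n_samples, n_folds, seed=0):
    """Yield (train_indices, validation_indices) for each of the n folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, folds[k]


def cross_validate(X, y, train_fn, predict_fn, n_folds=5):
    """Average validation accuracy over all n folds."""
    accuracies = []
    for train, val in n_fold_splits(len(X), n_folds):
        model = train_fn([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict_fn(model, X[i]) == y[i] for i in val)
        accuracies.append(correct / len(val))
    return sum(accuracies) / len(accuracies)


# Toy usage: a one-feature threshold "classifier" (illustrative only).
def train_threshold(X, y):
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


X = [0.1, 0.2, 0.3, 0.4, 0.5, 1.1, 1.2, 1.3, 1.4, 1.5]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(cross_validate(X, y, train_threshold, predict_threshold, n_folds=5))
```

Each sample is used for validation exactly once, so the averaged score reflects performance on data the model did not see during training.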

51 51/60 A little note about over-fitting

52 52/60 Another important factor: Scaling Most machine learning algorithms have been shown to be sensitive to scaling, especially when different features have different natures. For example, sequence length (with a mean of 550) is on a completely different scale from the composition ratios (mostly on the order of 0.1). The data are usually scaled to [-1, 1], so that all data points lie within an n-dimensional hypercube (n is the dimension of the data, i.e. the # of features). A really common mistake is to scale the training and validation data together, which leads to better but wrong results. Finally, performance measures depend strongly on the data set used (I will give an example of why later), so results of different studies are not easily comparable with each other
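The point about scaling can be illustrated with a short sketch: the per-feature ranges are computed from the training data only and then applied unchanged to the validation data. The function names and the toy feature values are assumptions for this example.

```python
def fit_scaler(train_rows):
    """Per-feature (min, max), computed from the TRAINING rows only."""
    return [(min(col), max(col)) for col in zip(*train_rows)]


def apply_scaler(rows, ranges):
    """Map each feature to [-1, 1] using previously fitted ranges."""
    scaled = []
    for row in rows:
        scaled.append([
            0.0 if hi == lo else 2.0 * (v - lo) / (hi - lo) - 1.0
            for v, (lo, hi) in zip(row, ranges)
        ])
    return scaled


# Toy rows of (sequence length, one composition ratio); hypothetical values.
train = [[100, 0.05], [900, 0.15]]
validation = [[500, 0.10], [1300, 0.10]]

ranges = fit_scaler(train)               # fitted on training data only
print(apply_scaler(train, ranges))       # [[-1.0, -1.0], [1.0, 1.0]]
print(apply_scaler(validation, ranges))  # validation may fall outside [-1, 1]
```

Validation points can land outside [-1, 1], as the second validation row does here; that is expected, and it is exactly the information that would leak into training if both sets were scaled together.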

53 53/60 4-Performance results
Prediction rate (%)           SEN    SPC    ACC    MCC
OMP-NOM (L+20C+B)             85     97.6   95.1   0.844
OMP-NOM (L+20C+B)-(Xdata)     92.6   96.5   95.3   0.888
OMP-NOM (20C)                 77.5   95.6   91.8   0.7347
OMP-NOM (18C+10D)-(Park)      90.9   94.7   93.9   0.816
OMP-TMH (20C+B)               96     93.8   95     0.899
OMP-GLB (L+20C+B)             88.3   98.1   95.7   0.883
TMH-NTM (L+20C+10D+B)         85.9   99.3   96.6   0.892

54 54/60 Estimated Pr(error) for all proteins The figure above is not very promising. Pr(error)=1 means that whenever the protein is not in the training set, it will be classified incorrectly

55 55/60 What will I do? I aim to raise the prediction accuracy to about 99%, the level at which, as with TMH discrimination, the discrimination problem can be considered solved. To do so: –I will examine the proteins that are always misclassified and determine the major weakness of the features that leads to the wrong decision –Perform a large and sophisticated feature selection –Possibly add some new features Also, I want to use three newly proposed state-of-the-art algorithms that improve classification accuracy: –Metric Learning –Metric Learning by Collapsing Classes –Metric Learning for Large Margin Nearest Neighbor classification

56 56/60 New classification methods Metric Learning: a linear transformation L is found such that the distance between similar transformed data points (data points in the same class) is reduced, while the distance between dissimilar transformed data points (data points in different classes) is increased. Classification is then done on the transformed data points (Xing 02) Metric Learning by Collapsing Classes: as in the previous method, a linear transformation is found, but here the final goal is that all similar transformed data points collapse to a single point, while the data points of other classes are pushed infinitely far away from this point (Globerson 05)

57 57/60 New classification methods (cntd) Metric Learning for Large Margin Nearest Neighbor classification: in this study, the linear transformation is trained so that the k nearest neighbors of each data point always belong to its own class, while data points from other classes are kept far enough away that they do not “invade” its local neighborhood (Weinberger 06) In the figure on the right, the local neighborhood is purified after training. Image from (Weinberger 06)
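All three methods share the same final step: distances are measured after applying a learned linear transformation L, and a nearest-neighbor rule is applied in the transformed space. The sketch below shows only that step. The matrix L is hand-picked here for illustration and is not learned by any of the cited algorithms; the data and function names are hypothetical.

```python
import math


def transform(L, x):
    """Apply the linear map L (a list of rows) to the vector x."""
    return [sum(l * v for l, v in zip(row, x)) for row in L]


def distance(L, x, y):
    """Euclidean distance between L*x and L*y."""
    tx, ty = transform(L, x), transform(L, y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(tx, ty)))


def knn_predict(L, train_X, train_y, query, k=1):
    """Majority vote among the k nearest neighbors in the transformed space."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: distance(L, pair[0], query))
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)


# Toy data: feature 0 carries the class, feature 1 is large-scale noise.
train_X = [[0.0, 0.0], [1.0, 10.0]]
train_y = ["a", "b"]
query = [0.2, 9.0]

identity = [[1.0, 0.0], [0.0, 1.0]]
hand_picked = [[1.0, 0.0], [0.0, 0.0]]  # suppresses the noisy feature

print(knn_predict(identity, train_X, train_y, query))     # b
print(knn_predict(hand_picked, train_X, train_y, query))  # a
```

In this toy case the plain Euclidean metric is misled by the noisy feature, while a transformation that suppresses it recovers the class of the nearest point along the informative feature; the cited methods differ in how they learn such a transformation from labeled data.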

58 58/60 References
(Bigelow 04) Bigelow,H.R., Petrey,D.S., Liu,J., Przybylski,D. and Rost,B. (2004) Predicting transmembrane beta-barrels in proteomes. Nucleic Acids Res., 32, 2566–2577.
(Branden 99) Branden,C. and Tooze,J. (1999) Introduction to Protein Structure. Garland Publishing Inc., New York.
(Globerson 05) Globerson,A. and Roweis,S. (2005) Metric learning by collapsing classes. Neural Information Processing Systems 18 (NIPS'05), pp. 451–458.
(Gromiha 05-a) Gromiha,M.M. and Suwa,M. (2005) A simple statistical method for discriminating outer membrane proteins with better accuracy. Bioinformatics, doi:10.1093/bioinformatics/bti126.
(Hirokawa 98) Hirokawa,T., Boon-Chieng,S. and Mitaku,S. (1998) SOSUI: classification and secondary structure prediction system for membrane proteins. Bioinformatics, 14, 378–379.
(Jacoboni 01) Jacoboni,I., Martelli,P.L., Fariselli,P., De Pinto,V. and Casadio,R. (2001) Prediction of the transmembrane regions of beta-barrel membrane proteins with a neural network-based predictor. Protein Sci., 10, 779–787.

59 59/60 References (cntd.)
(Martelli 02) Martelli,P.L., Fariselli,P., Krogh,A. and Casadio,R. (2002) A sequence-profile-based HMM for predicting and discriminating beta barrel membrane proteins. Bioinformatics, 18, S46–S53.
(Schulz 00) Schulz,G.E. (2000) β-Barrel membrane proteins. Curr. Opin. Struct. Biol., 10, 443–447.
(Tamm 04) Tamm,L.K., Hong,H. and Liang,B. (2004) Folding and assembly of beta-barrel membrane proteins. Biochim. Biophys. Acta, 1666, 250–263.
(Weinberger 06) Weinberger,K.Q., Blitzer,J. and Saul,L.K. Distance metric learning for large margin nearest neighbor classification. In Weiss,Y., Schoelkopf,B. and Platt,J. (eds), Advances in Neural Information Processing Systems 18. MIT Press, Cambridge, MA.
(Wimley 02) Wimley,W.C. (2002) Toward genomic identification of β-barrel membrane proteins: composition and architecture of known structures. Protein Sci., 11, 301–312.
(Wimley 03) Wimley,W.C. (2003) The versatile beta-barrel membrane protein. Curr. Opin. Struct. Biol., 13, 404–411.
(Xing 02) Xing,E.P., Ng,A.Y., Jordan,M.I. and Russell,S. (2002) Distance metric learning, with application to clustering with side-information. In Dietterich,T.G., Becker,S. and Ghahramani,Z. (eds), Advances in Neural Information Processing Systems 14. MIT Press, Cambridge, MA.
(Zhai 02) Zhai,Y. and Saier,M.H.,Jr (2002) The beta-barrel finder (BBF) program, allowing identification of outer membrane beta-barrel proteins encoded within prokaryotic genomes. Protein Sci., 11, 2196–2207.

60 60/60 Thanks for your patience! ….any questions?
