A Novel Approach for Analyzing Kinetic Data from Variants of a Calcium-Binding Protein Third Biennial Undergraduate Statistics Project Competition 2011.


1 A Novel Approach for Analyzing Kinetic Data from Variants of a Calcium-Binding Protein
Third Biennial Undergraduate Statistics Project Competition 2011

2 Research Focus
The overall focus of this project is to gain a more precise understanding of the physiological role of the sarcoplasmic calcium-binding protein (SCP) in invertebrate muscle relaxation. In the species studied, the freshwater crayfish Procambarus clarkii, this protein is composed of two subunits, each of which has three calcium-binding sites. Of these six sites, two bind calcium (Ca²⁺) exclusively, and four can bind either Ca²⁺ or magnesium.
[Figure: a single SCP subunit, with its calcium-specific sites and calcium/magnesium sites labeled.]

3 Muscle Contraction/Relaxation
In each muscle cell, contraction is triggered by the release of large quantities of Ca²⁺ from intracellular storage sites. Ca²⁺ binds to and activates a series of proteins, which leads to the generation of force. For relaxation to occur, Ca²⁺ must be returned to the storage sites; the exact pathway by which this occurs is currently unknown. In invertebrates, SCP has been proposed to assist the contraction/relaxation cycle by one of two mechanisms:
1. SCP actively transports Ca²⁺ from the cytoplasm back to storage. This role directly promotes muscle relaxation.
2. SCP acts as a Ca²⁺ buffer, meaning that it binds Ca²⁺ but does not interact with other proteins. This allows it to regulate the level of Ca²⁺ present during contraction/relaxation cycles without directly promoting relaxation.

4 Prior Work
Three variants of SCP have been identified in P. clarkii (pcSCP1a, pcSCP1b, and pcSCP1c). Reducing the amount of pcSCP in living P. clarkii causes significant deficits in their activity level and in their physical response to stimulation (two-sample t-test, t = 3.86, p-value = 0.002). pcSCP is highly expressed in tail muscle tissue. However, studies of the expression patterns of the pcSCP variants have revealed no differences among the variants across tissues (randomized block ANOVA, F = 0.9638, p-value = 0.3868).

5 This Project
The purpose of this project has been to characterize the Ca²⁺-binding kinetics of pcSCP in order to determine whether the variants differ biochemically. For all three variants of pcSCP, protein has been isolated and kinetic data have been collected. Three approaches have been used to analyze these data:
1. Formally compare the kinetic behavior of the variants using statistical inference techniques.
2. Determine and compare the kinetic parameters of pcSCP Ca²⁺-binding activity using dose-response curve fitting.
3. Compare the overall behavior of the pcSCP variants using principal components and multivariate classification techniques.

6 Biochemical Isolation of pcSCP
E. coli was transformed with variant-specific pcSCP cDNA, and high-level expression of each variant was induced with IPTG. pcSCP proteins were separated from E. coli proteins by liquid chromatography.
[Figure: analysis of pcSCP1c purity. Dark bands represent proteins of different sizes; the arrow marks the band corresponding to pcSCP1c. (A) All protein collected from E. coli before purification, including E. coli proteins (impurities). (B) pcSCP1c collected after purification.]

7 Tryptophan Fluorescence
The amino acid tryptophan, which is present in most proteins, fluoresces (a measurable phenomenon) after excitation with certain wavelengths of light. The fluorescence of molecules is highly sensitive to changes in their environment. Combining these two facts, tryptophan fluorescence is commonly used in kinetic experiments. In this study, changes in fluorescence spectra were taken to indicate Ca²⁺-binding events by pcSCP proteins.

8 Kinetic Measurements
Individual aliquots of pcSCP variants were diluted in a buffer containing EGTA. This compound selectively binds Ca²⁺ ions, allowing precise control over the concentration of free Ca²⁺. The fluorescence profile of each sample was determined by measuring fluorescence emission from 303 to 400 nm. After each spectrum was recorded, a small aliquot of Ca²⁺ was added to the solution and another measurement was taken. Fifteen measurements spanning 10⁻¹¹ to 10⁻⁵ M Ca²⁺ were obtained in triplicate for each pcSCP aliquot. Three protein samples (one of each pcSCP variant) were tested in random order each time the experiment was performed.
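As an illustration of this design, the sketch below builds a 15-point free-Ca²⁺ grid from 10⁻¹¹ to 10⁻⁵ M and an independently shuffled variant order for each experimental run. The slide does not say how the 15 concentrations were spaced, so even spacing on a log10 scale is an assumption here, as are the names `free_ca_grid` and `run_order`.

```python
import random

VARIANTS = ["pcSCP1a", "pcSCP1b", "pcSCP1c"]

def free_ca_grid(n_points=15, log_low=-11.0, log_high=-5.0):
    """Target free Ca2+ concentrations in mol/L, evenly spaced in log10 (assumed)."""
    step = (log_high - log_low) / (n_points - 1)
    return [10 ** (log_low + i * step) for i in range(n_points)]

def run_order(n_runs, seed=None):
    """One independently shuffled variant order per experimental run."""
    rng = random.Random(seed)
    orders = []
    for _ in range(n_runs):
        order = list(VARIANTS)  # copy, so the master list is never reordered
        rng.shuffle(order)
        orders.append(order)
    return orders
```

Passing a fixed `seed` makes the randomization reproducible, which is convenient when the run order has to be recorded alongside the data.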

9 Data Processing
All spectra were standardized by subtracting the spectrum of a sample containing no protein. For curve fitting and formal statistical analysis, each individual spectrum was integrated, and then all spectra obtained from the same aliquot were rescaled to lie between zero (no Ca²⁺ bound) and one (Ca²⁺-saturated).
[Figure: example data from a single fluorescence experiment with pcSCP1a. Left, standardized spectra. Right, fully standardized fluorescence.]
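The processing steps above can be sketched as three small functions: blank subtraction, integration of each spectrum over wavelength, and rescaling of one aliquot's integrated values to the 0 to 1 range. The slide does not state the integration rule or the anchor points for the rescaling, so the trapezoidal rule and anchoring zero at the lowest-Ca²⁺ measurement and one at the highest are assumptions; the function names are hypothetical. Anchoring on the first and last points works whether fluorescence rises or falls with Ca²⁺.

```python
def subtract_blank(spectrum, blank):
    """Background-correct a spectrum by subtracting a protein-free blank point by point."""
    if len(spectrum) != len(blank):
        raise ValueError("spectrum and blank must share the same wavelength grid")
    return [s - b for s, b in zip(spectrum, blank)]

def integrate(wavelengths, intensities):
    """Trapezoidal-rule area under one emission spectrum."""
    area = 0.0
    for i in range(1, len(wavelengths)):
        width = wavelengths[i] - wavelengths[i - 1]
        area += 0.5 * (intensities[i] + intensities[i - 1]) * width
    return area

def scale_titration(areas):
    """Map one aliquot's integrated values so the first (lowest Ca2+) is 0 and the last is 1."""
    f_start, f_end = areas[0], areas[-1]
    if f_end == f_start:
        raise ValueError("no change in fluorescence across the titration")
    return [(a - f_start) / (f_end - f_start) for a in areas]
```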

10 Data
[Figure: all data, shown in standardized form; points connected by a line were obtained from the same sample.]
According to biochemical theory, kinetic data for proteins with multiple sites will form a sigmoidal curve between 0 and 1 when plotted against log Ca²⁺ concentration. The kinetically relevant portion of the graph is the transition between the plateaus. The data collected in these experiments were highly reproducible but tended not to form plateaus, which is likely caused by structural changes unrelated to Ca²⁺ binding.

11 Initial Data Exploration
Two formal inference procedures were used to compare the kinetic behavior of the pcSCP variants: a Mack-Skillings test and a GLM, both blocking on concentration. Sensitivity analysis indicated that it was impossible to focus on kinetically relevant information when using the full data set, because the plateau regions contribute an overwhelming proportion of kinetically irrelevant information. For the block-design procedures, therefore, only the most kinetically informative observations (all except the first four and the last three data points) were included. An Anderson-Darling test did not reject normality of these data (AD = 0.5925, p-value = 0.1177).
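The block-design comparison can be sketched in plain Python. The helpers below trim the plateau points (first four, last three) from a 15-point titration and compute the Mack-Skillings statistic for k treatments (variants) in n blocks (concentrations) with c replicates per cell, ranking within each block and giving tied values their average rank. The sketch assumes equal replication, omits the tie correction, and uses hypothetical function names. For k = 3 the large-sample reference distribution is chi-square with 2 degrees of freedom, whose upper tail is exp(-x/2); evaluating it at the reported MS = 13.95 gives about 0.0009, in line with the p-value reported for this test.

```python
import math

def trim_plateaus(titration, n_low=4, n_high=3):
    """Drop the first n_low and last n_high points of one titration series."""
    return titration[n_low:len(titration) - n_high]

def midranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def mack_skillings(data):
    """data[block][treatment] -> list of c replicates (same c in every cell).

    Returns the Mack-Skillings statistic without a tie correction.
    """
    n, k, c = len(data), len(data[0]), len(data[0][0])
    N = n * k * c
    S = [0.0] * k
    for block in data:
        flat = [y for cell in block for y in cell]  # treatment-major order
        r = midranks(flat)                          # ranks within this block only
        for j in range(k):
            S[j] += sum(r[j * c:(j + 1) * c]) / c
    return 12.0 / (k * (N + n)) * sum(s * s for s in S) - 3.0 * (N + n)

def chi2_upper_tail_df2(x):
    """P(chi-square with 2 df > x), the reference distribution when k = 3."""
    return math.exp(-x / 2.0)
```

With c = 1 the statistic reduces to Friedman's test, which is a convenient sanity check on any implementation.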

12 Formal Statistical Inference Results
The Mack-Skillings and GLM procedures both indicated significant differences among the pcSCP variants (MS = 13.95, p-value = 0.0009; F = 12.584, p-value < 0.001, respectively). Multiple-comparison procedures for Mack-Skillings, which emphasize the consistency of differences over their magnitude, found pcSCP1a to be significantly different from both pcSCP1b and pcSCP1c (both p-values < 0.05). Multiple-comparison procedures for the GLM, which emphasize magnitude over consistency, found pcSCP1c to be significantly different from both other variants (both p-values ≤ 0.0049). These results indicate differences between the pcSCP variants. However, these analyses are not satisfying, primarily because they do not take into account the relationship between concentration and fluorescence.

13 Dose-Response Curve Fitting
To account for Ca²⁺ concentration, dose-response curve fitting was employed. This is the most common approach used by biochemists studying protein kinetics, and it allows kinetic parameters to be estimated and compared. The parameters of interest for comparing the pcSCP variants in this study were:
1. The dissociation constant, K_D, a measure of the binding affinity between protein and ligand (in this case Ca²⁺); a smaller K_D indicates tighter binding.
2. Cooperativity: interactions between binding sites that alter the K_D of one site depending on whether a ligand is bound at another.

14 Applying the Log-Logistic Model
The standardized fluorescence data were fit to the widely applied four-parameter log-logistic model, in the parameterization used by the drc package:
$$ f(x) = c + \frac{d - c}{1 + \exp\{b(\log x - \log e)\}} $$
where x is the Ca²⁺ concentration, c is the bottom plateau, d is the top plateau, b is a measure of cooperativity, and e is the K_D (the concentration halfway between the plateaus). This equation was fit simultaneously to the data for all three variants using the drc package in R. The program uses least squares to obtain initial values of b and e, based on the linearizing transformation
$$ \log\left(\frac{d - y}{y - c}\right) = b\log x - b\log e $$
To control for the aberrant behavior seen at high and low Ca²⁺ concentrations, c and d were held fixed at 0 and 1, respectively.
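A minimal pure-Python sketch of the model and of a self-starting step like the one described above. It uses the drc-style parameterization f(x) = c + (d - c) / (1 + exp(b(log x - log e))) with c = 0 and d = 1 held fixed; under this sign convention a curve that rises with Ca²⁺ has b < 0. Starting values come from an ordinary least-squares line through log((d - y)/(y - c)) against log x, using only points strictly between the plateaus, where the transform is defined. This produces starting values only: the nonlinear least-squares refinement that follows in a full fit is omitted, and the function names are hypothetical.

```python
import math

def log_logistic(x, b, e, c=0.0, d=1.0):
    """Four-parameter log-logistic curve (drc-style parameterization)."""
    return c + (d - c) / (1.0 + math.exp(b * (math.log(x) - math.log(e))))

def self_start(xs, ys, c=0.0, d=1.0):
    """Initial (b, e) from a straight-line fit of log((d - y)/(y - c)) on log(x)."""
    pts = [(math.log(x), math.log((d - y) / (y - c)))
           for x, y in zip(xs, ys) if c < y < d]  # transform undefined at the plateaus
    if len(pts) < 2:
        raise ValueError("need at least two points between the plateaus")
    n = len(pts)
    mean_u = sum(u for u, _ in pts) / n
    mean_v = sum(v for _, v in pts) / n
    sxx = sum((u - mean_u) ** 2 for u, _ in pts)
    sxy = sum((u - mean_u) * (v - mean_v) for u, v in pts)
    slope = sxy / sxx                     # estimates b
    intercept = mean_v - slope * mean_u   # estimates -b * log(e)
    return slope, math.exp(-intercept / slope)
```

With exact model values as input, `self_start` recovers b and e to rounding error; with real data its output would only seed the nonlinear fit.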

15 Curve Fitting Results
[Figure: fitted log-logistic models for the pcSCP variants.]

16 Kinetic Parameter Estimation
For all three pairwise variant comparisons, selectivity indices indicated significant differences between K_D parameters (all p-values ≤ 0.0002). Values of b were significantly different when comparing pcSCP1a to pcSCP1b and when comparing pcSCP1a to pcSCP1c (both p-values ≤ 0.0008).
Parameter      pcSCP1a           pcSCP1b           pcSCP1c
b               1.480 ± 0.736     3.651 ± 2.70      4.962 ± 4.871
log(K_D)       -7.963 ± 0.172    -7.740 ± 0.074    -7.116 ± 0.150
*± values are 95% confidence intervals.
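To make the log(K_D) estimates on this slide easier to interpret, the sketch below converts them to nanomolar K_D values and forms pairwise ratios of the K_D point estimates, the kind of ratio selectivity-index comparisons are built on (a ratio of 1 meaning no difference). It assumes the logarithms are base 10 with K_D in mol/L, which the slide does not state, and the names are hypothetical. The reported p-values would additionally require the standard errors and covariances of the fitted parameters, which are not given.

```python
import itertools

# log10(K_D) point estimates from slide 16 (assumed base 10, K_D in mol/L)
LOG_KD = {"pcSCP1a": -7.963, "pcSCP1b": -7.740, "pcSCP1c": -7.116}

def kd_nanomolar(log_kd):
    """Convert a log10 K_D in mol/L to nanomolar."""
    return 10 ** log_kd * 1e9

def kd_ratios(log_kds):
    """Pairwise point-estimate ratios K_D(second) / K_D(first)."""
    return {(a, b): 10 ** (log_kds[b] - log_kds[a])
            for a, b in itertools.combinations(log_kds, 2)}

for name, lk in LOG_KD.items():
    print(f"{name}: K_D ~ {kd_nanomolar(lk):.1f} nM")
```

On these point estimates the K_D of pcSCP1c is roughly seven times that of pcSCP1a, that is, a lower apparent Ca²⁺ affinity.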

17 Issues with Curve Fitting
Lack-of-fit tests were significant (F = 18.203, p-value < 0.001), implying that the log-logistic model does not fit the data adequately. The residual plot (right) shows a systematic linear pattern, indicating departures from the model assumptions. The non-kinetic trends in the tails are likely responsible for part of this lack of fit, but overall these results are unsatisfying.
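A lack-of-fit F test of this kind compares the scatter of replicate means around the fitted curve with the pure-error scatter of replicates around their own means. The sketch below computes it from grouped replicate data. The function name and argument layout are hypothetical, and the slide does not state exactly how the reported F = 18.203 was obtained. For a simultaneous fit with c and d fixed, `n_params` would be two per variant (b and e), with one group per variant-concentration combination.

```python
def lack_of_fit_f(groups, fitted, n_params):
    """Lack-of-fit F test from replicated data.

    groups:   one list of replicate responses per distinct design point
    fitted:   the model's prediction at each design point (same order)
    n_params: number of parameters estimated by the model
    Returns (F, df_lack_of_fit, df_pure_error).
    """
    m = len(groups)
    n_total = sum(len(g) for g in groups)
    ss_pure = 0.0  # replicates around their own group mean
    ss_lack = 0.0  # group means around the fitted curve
    for g, y_hat in zip(groups, fitted):
        mean = sum(g) / len(g)
        ss_pure += sum((y - mean) ** 2 for y in g)
        ss_lack += len(g) * (mean - y_hat) ** 2
    df_lack = m - n_params
    df_pure = n_total - m
    if df_lack <= 0 or df_pure <= 0:
        raise ValueError("need more design points than parameters, and replicates")
    f_stat = (ss_lack / df_lack) / (ss_pure / df_pure)
    return f_stat, df_lack, df_pure
```

The F value is then referred to an F distribution with (df_lack_of_fit, df_pure_error) degrees of freedom; a large value means the curve misses the replicate means by more than replicate noise can explain.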

18 Multivariate Analysis Approach
The unsatisfactory results of curve fitting, together with concerns about losing information through excessive standardization of the data, led to the use of multivariate techniques, a novel approach for studies of protein kinetics. The program Pirouette® (Infometrix, Inc.) was used to perform multivariate exploratory and classification methods, both applied to the standardized spectra. Before analysis, the spectra were preprocessed with a 15-point smooth, area normalization, and mean-centering.
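The preprocessing chain can be sketched as below. Pirouette's exact smoothing and normalization definitions are not given on the slide, so a centered 15-point moving average (with the window shrinking at the spectrum edges) and division by the sum of absolute intensities are assumptions standing in for them; mean-centering subtracts the mean spectrum across all samples. The function names are hypothetical.

```python
def moving_average(spectrum, window=15):
    """Centered moving average; the window shrinks near the spectrum edges."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    n = len(spectrum)
    out = []
    for i in range(n):
        segment = spectrum[max(0, i - half):min(n, i + half + 1)]
        out.append(sum(segment) / len(segment))
    return out

def area_normalize(spectrum):
    """Scale a spectrum so its absolute intensities sum to 1 (assumed definition)."""
    total = sum(abs(v) for v in spectrum)
    if total == 0:
        raise ValueError("cannot normalize an all-zero spectrum")
    return [v / total for v in spectrum]

def mean_center(spectra):
    """Subtract the mean spectrum (column means across samples) from every spectrum."""
    n = len(spectra)
    means = [sum(col) / n for col in zip(*spectra)]
    return [[v - m for v, m in zip(row, means)] for row in spectra]

def preprocess(spectra, window=15):
    """Smooth, area-normalize each spectrum, then mean-center the whole set."""
    return mean_center([area_normalize(moving_average(s, window)) for s in spectra])
```

Order matters here: smoothing and normalization act on each spectrum separately, while mean-centering needs the whole set, so it runs last.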

19 Principal Components Analysis
Principal components analysis (PCA) was used for exploratory analysis. This technique finds linear combinations of the variables that account for the maximal amount of variation; here the data were plotted on the first three principal components. This reduces the dimensionality of the data, optimizes the display of relationships among samples, and can reveal natural clustering patterns.
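The core of PCA can be written without a numerical library, as in this sketch: it forms the covariance matrix of mean-centered spectra, extracts the leading eigenvectors by power iteration with deflation, and reports the fraction of total variance each component captures (the quantity behind factor percentages like those on the next slide) along with the scores used for plotting. This is not necessarily how Pirouette computes PCA; dedicated tools usually use a singular value decomposition, and power iteration is chosen here only to keep the example dependency-free. It converges slowly when eigenvalues are close. The function names are hypothetical.

```python
import math

def covariance(X):
    """Sample covariance matrix of mean-centered rows X (n samples x p variables)."""
    n, p = len(X), len(X[0])
    return [[sum(row[i] * row[j] for row in X) / (n - 1) for j in range(p)]
            for i in range(p)]

def power_iteration(C, iters=500):
    """Dominant eigenpair (eigenvalue, unit eigenvector) of a symmetric matrix C."""
    p = len(C)
    v = [1.0 + 0.1 * i for i in range(p)]  # asymmetric start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v  # nothing left to extract
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(p)) for i in range(p))
    return lam, v

def pca(X, n_components=3):
    """Loadings, explained-variance fractions and scores for mean-centered X."""
    C = covariance(X)
    p = len(C)
    total = sum(C[i][i] for i in range(p))
    loadings, fractions = [], []
    for _ in range(min(n_components, p)):
        lam, v = power_iteration(C)
        loadings.append(v)
        fractions.append(lam / total)
        # Deflate: remove this component so the next pass finds the next one.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    scores = [[sum(row[j] * v[j] for j in range(p)) for v in loadings] for row in X]
    return loadings, fractions, scores
```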

20 PCA Scores Plots
Two views of the PCA scores reveal clustering of the data by variant along different factors. Each point represents a single fluorescence spectrum: pink, pcSCP1a; blue, pcSCP1b; orange, pcSCP1c. Factor 1, 94.7% of the variability in standardized spectra; Factor 2, 2.62%; Factor 3, 0.856%.

21 Impact of Ca²⁺ Concentration
PCA scores plots show separation by free Ca²⁺ concentration, confirming that fluorescence spectroscopy is measuring a response of pcSCP to increasing Ca²⁺.
[Figure: PCA scores plots in three panels (Low Ca²⁺, Intermediate Ca²⁺, High Ca²⁺); solid points mark spectra taken at the corresponding Ca²⁺ level.]

22 Classification Analyses
Two classification techniques were used to analyze these data: k-nearest neighbors (KNN) and soft independent modeling of class analogy (SIMCA). Both rest on the idea that the closer samples lie in measurement space, the more likely they are to belong to the same category. KNN classifies an unknown by computing its Euclidean distance to all categorized samples and polling the classes of the k closest samples. SIMCA builds a principal components model for each category and determines the classification of an unknown by assessing how well it fits when projected into the space of each model. These analyses provide a quantitative comparison of the variants by determining how reliably their spectra can be distinguished.
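The KNN rule described above can be sketched as follows, with k = 7 as the default since that is the value reported as optimal on the next slide. The tie-breaking rule (prefer the tied class whose member lies closest) and the leave-one-out error-rate helper are assumptions added for illustration; the slides do not say how ties were broken or how the misclassification rates were validated. `math.dist` requires Python 3.8 or later, and the function names are hypothetical.

```python
import math
from collections import Counter

def knn_classify(unknown, training, k=7):
    """Label for `unknown` by majority vote of its k nearest training samples.

    training is a list of (vector, label) pairs; distance is Euclidean.
    """
    if not 1 <= k <= len(training):
        raise ValueError("k must be between 1 and the number of training samples")
    nearest = sorted(training, key=lambda item: math.dist(unknown, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    top = max(votes.values())
    # Assumed tie-break: among tied classes, take the one with the closest member.
    for _, label in nearest:
        if votes[label] == top:
            return label

def loo_error_rate(samples, k=7):
    """Leave-one-out misclassification rate over (vector, label) pairs."""
    errors = 0
    for i, (x, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if knn_classify(x, rest, k) != label:
            errors += 1
    return errors / len(samples)
```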

23 Classification Results
Only about 4% of spectra (6 of 135) were misclassified using KNN (k = 7, the optimal value), and only about 3% (4 of 135) using SIMCA (3 factors for each variant). For every misclassified sample, the second choice was correct. This ability to identify each pcSCP variant's fluorescence spectra separately confirms that the variants differ.
KNN               Predicted pcSCP1a   Predicted pcSCP1b   Predicted pcSCP1c
Actual pcSCP1a           41                   3                   1
Actual pcSCP1b            0                  44                   1
Actual pcSCP1c            0                   1                  44
SIMCA             Predicted pcSCP1a   Predicted pcSCP1b   Predicted pcSCP1c
Actual pcSCP1a           45                   0                   0
Actual pcSCP1b            0                  45                   0
Actual pcSCP1c            0                   4                  41
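A stripped-down SIMCA can be sketched as below, using the three factors per variant reported on this slide as the default. Each class gets its own PCA model (mean spectrum plus leading axes found by power iteration) and a class residual standard deviation s0, using one common definition, and an unknown is scored in each class by the squared ratio of its own residual standard deviation to s0. Full SIMCA implementations typically compare that ratio with an F critical value, so a sample can be accepted by several classes or by none; this sketch only ranks the classes, and its names are hypothetical.

```python
import math

def _principal_axes(X, a, iters=500):
    """First `a` principal axes of mean-centered rows X (power iteration + deflation)."""
    p = len(X[0])
    C = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    axes = []
    for _ in range(a):
        v = [1.0 + 0.1 * i for i in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(p)) for i in range(p))
        axes.append(v)
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return axes

def _residual(x, mean, axes):
    """Part of x, after class-mean centering, not explained by the class's axes."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    for v in axes:
        t = sum(di * vi for di, vi in zip(d, v))
        d = [di - t * vi for di, vi in zip(d, v)]
    return d

def fit_simca(classes, a=3):
    """classes maps label -> list of training spectra; one PCA model per class."""
    models = {}
    for label, X in classes.items():
        n, p = len(X), len(X[0])
        if not (a < p and a < n - 1):
            raise ValueError("need a < p and a < n - 1")
        mean = [sum(col) / n for col in zip(*X)]
        centered = [[xi - mi for xi, mi in zip(x, mean)] for x in X]
        axes = _principal_axes(centered, a)
        ss = sum(sum(e * e for e in _residual(x, mean, axes)) for x in X)
        s0 = max(math.sqrt(ss / ((n - a - 1) * (p - a))), 1e-12)  # class residual SD
        models[label] = (mean, axes, s0)
    return models

def rank_classes(x, models):
    """Labels ordered from best to worst fit, by (s / s0)**2 in each class model."""
    ratios = {}
    for label, (mean, axes, s0) in models.items():
        dof = len(x) - len(axes)
        s = math.sqrt(sum(e * e for e in _residual(x, mean, axes)) / dof)
        ratios[label] = (s / s0) ** 2
    return sorted(ratios, key=ratios.get)
```

Because `rank_classes` returns an ordering rather than a single label, the "second choice" mentioned on this slide corresponds to the second entry of that list.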

24 Conclusions
The analyses presented here provide the first indication of significant differences between pcSCP variants. Multivariate analyses provided the most complete and valid comparison of the pcSCP variants. This novel approach is promising as a broadly applicable tool for the comparative analysis of protein kinetics, and its application should be examined further. The dose-response curve fitting presented here is a starting point for specific kinetic parameter estimation and inference. Mutants of pcSCP have now been generated, and their kinetics are being characterized in order to pinpoint the variations responsible for differences in biochemical properties.

