A Novel Approach for Analyzing Kinetic Data from Variants of a Calcium-Binding Protein
Third Biennial Undergraduate Statistics Project Competition, 2011

Research Focus
The overall focus of this project is to gain a more precise understanding of the physiological role of the sarcoplasmic calcium-binding protein (SCP) in invertebrate muscle relaxation. In the species studied (the freshwater crayfish Procambarus clarkii), this protein is composed of two subunits, each of which has three calcium-binding sites. Of these six sites, two bind calcium (Ca²⁺) exclusively, and four can bind either Ca²⁺ or magnesium.
[Figure: Single SCP subunit, showing the calcium-specific sites and the calcium/magnesium sites.]

Muscle Contraction/Relaxation
In each muscle cell, contraction is triggered by the release of large quantities of Ca²⁺ from intracellular storage sites. Ca²⁺ interacts with and activates a series of proteins, which leads to the generation of force. For relaxation to occur, Ca²⁺ must be returned to the storage sites; the exact pathway by which this occurs is currently unknown. In invertebrates, SCP has been proposed to assist with the contraction/relaxation cycle by one of two mechanisms:
1. SCP actively transports Ca²⁺ from the cytoplasm back to storage. This role directly promotes muscle relaxation.
2. SCP acts as a Ca²⁺ buffer, meaning that it binds Ca²⁺ but does not interact with other proteins. This allows it to regulate the total level of Ca²⁺ present during contraction/relaxation cycles without directly promoting relaxation.

Prior Work
Three variants of SCP have been identified in P. clarkii (pcSCP1a, pcSCP1b, and pcSCP1c). Reducing the amount of pcSCP in living P. clarkii causes significant deficits in the animals' level of activity and physical response to stimulation (two-sample t-test, t = 3.86, p-value = 0.002). pcSCP has been found to be highly expressed in tail muscle tissue. However, studies of the expression patterns of the pcSCP variants have revealed no differences between the variants across tissues (ANOVA block design, F = , p-value = ).

This Project
The purpose of this project has been to characterize the Ca²⁺-binding kinetics of pcSCP in order to determine whether the differences between these protein variants are biochemical. For all three variants of pcSCP, protein has been isolated and kinetic data have been collected. Three approaches have been used to analyze these data:
1. A formal comparison of the kinetic behavior of each variant using statistical inference techniques.
2. Determination and comparison of the kinetic parameters of pcSCP calcium-binding activity using dose-response curve fitting.
3. Comparison of the overall behavior of the pcSCP variants using principal components analysis and multivariate classification techniques.

Biochemical Isolation of pcSCP
E. coli was transformed with variant-specific pcSCP cDNA, and high-level expression of each variant was induced by exposure to IPTG. pcSCP proteins were then separated from E. coli proteins by liquid chromatography.
[Figure: Analysis of pcSCP1c purity. The dark bands signify proteins of different sizes; the arrow indicates the band that represents pcSCP. (A) All protein collected from E. coli prior to purification, including E. coli proteins (impurities). (B) pcSCP1c collected after purification.]

Tryptophan Fluorescence
The amino acid tryptophan, which is present in most proteins, fluoresces (a measurable phenomenon) after exposure to certain wavelengths of light. Because fluorescence is highly sensitive to a molecule's local environment, tryptophan fluorescence is commonly used in kinetic experiments. In this study, changes in fluorescence spectra were taken to indicate Ca²⁺ binding by pcSCP proteins.

Kinetic Measurements
Individual aliquots of each pcSCP variant were diluted in a buffer containing EGTA. This compound selectively binds Ca²⁺ ions, allowing precise control over the concentration of available Ca²⁺. The fluorescence profile of each sample was determined by measuring fluorescence emission from 303 to 400 nm. After each spectrum was obtained, a small aliquot of Ca²⁺ was added to the solution and another measurement was taken. Fifteen measurements spanning the range from to M Ca²⁺ were obtained in triplicate for each pcSCP aliquot. Three protein samples (one of each pcSCP variant) were tested in random order each time the experiment was performed.
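
The way EGTA sets the available Ca²⁺ can be made concrete with a mass-balance calculation. The sketch below assumes a single 1:1 Ca²⁺–EGTA equilibrium, ignores Ca²⁺ bound by the protein and volume changes from each addition, and solves for free Ca²⁺ from total Ca²⁺ and total EGTA. The function name, the EGTA dissociation constant, and the titration values in the example are hypothetical placeholders; the apparent K_D of EGTA depends strongly on pH, temperature, and ionic strength, and the values used in this study are not given on the slides.

```python
import math


def free_calcium(ca_total: float, egta_total: float, kd_egta: float) -> float:
    """Free Ca2+ (M) for a single 1:1 Ca-EGTA equilibrium (hypothetical helper).

    Mass balance: ca_total = free + bound and egta_total = free_egta + bound,
    with kd_egta = free * free_egta / bound. Substituting gives the quadratic
        free**2 + B * free - kd_egta * ca_total = 0,
        B = egta_total - ca_total + kd_egta,
    whose positive root is returned.
    """
    q = kd_egta * ca_total
    B = egta_total - ca_total + kd_egta
    s = math.sqrt(B * B + 4.0 * q)
    # Two algebraically equivalent forms of the positive root; choose the one
    # that avoids subtracting two nearly equal numbers.
    if B >= 0:
        return 2.0 * q / (B + s)
    return (s - B) / 2.0


if __name__ == "__main__":
    # Placeholder values only: 1 mM EGTA, an assumed apparent K_D of 150 nM,
    # and 15 equal Ca2+ additions of 80 uM each (dilution ignored).
    egta, kd = 1e-3, 150e-9
    for step in range(1, 16):
        ca = step * 80e-6
        print(f"step {step:2d}: total {ca:.2e} M, free {free_calcium(ca, egta, kd):.2e} M")
```

Free Ca²⁺ stays low while EGTA is in excess and rises steeply as total Ca²⁺ approaches the EGTA concentration, which is why the titration can step through low free concentrations in a controlled way.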

Data Processing
All spectra were standardized by subtracting the spectrum of a sample containing no protein. For curve fitting and formal statistical analysis, each spectrum was then integrated, and all spectra obtained from the same aliquot were scaled to lie between zero (no Ca²⁺ bound) and one (Ca²⁺ saturated).
[Figure: Example data from a single fluorescence experiment with pcSCP1a. Left, standardized spectra. Right, fully standardized fluorescence.]
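
The two processing steps can be sketched as follows, assuming every spectrum is sampled on the same wavelength grid (303–400 nm) and the series is ordered by increasing Ca²⁺. The slides do not say which integration rule or scaling formula was used; the trapezoidal rule and a first-point/last-point rescaling are assumptions, and the function names are hypothetical.

```python
import numpy as np


def integrate_spectra(spectra, blank, wavelengths):
    """Subtract the no-protein blank from each spectrum, then integrate it.

    spectra: shape (n_spectra, n_wavelengths); blank: shape (n_wavelengths,).
    Integration over wavelength uses the trapezoidal rule.
    """
    corrected = np.asarray(spectra, dtype=float) - np.asarray(blank, dtype=float)
    wl = np.asarray(wavelengths, dtype=float)
    return np.sum((corrected[:, 1:] + corrected[:, :-1]) / 2.0 * np.diff(wl), axis=1)


def scale_titration(integrated):
    """Rescale one aliquot's titration series: first point -> 0, last point -> 1.

    Assumes the lowest- and highest-Ca2+ points represent the unbound and
    saturated states, so intermediate points become fractional saturation.
    """
    f = np.asarray(integrated, dtype=float)
    return (f - f[0]) / (f[-1] - f[0])
```

For one aliquot, `scale_titration(integrate_spectra(spectra, blank, np.arange(303, 401)))` would give the 0-to-1 series used for curve fitting. Because only the endpoints are pinned, points in the middle of the series can fall slightly outside [0, 1] if the signal is not monotonic.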

Data
[Figure: All data, shown in standardized form. Points connected by a line were obtained from the same sample.]
According to biochemical theory, kinetic data for proteins with multiple binding sites are expected to form a sigmoidal curve between 0 and 1, and the kinetically relevant portion of the curve is the transition between the two plateaus. The data collected in these experiments were highly reproducible but tended not to form plateaus. This behavior is likely caused by structural changes unrelated to Ca²⁺ binding.

Initial Data Exploration
Two formal inference procedures were used to compare the kinetic behavior of the pcSCP variants: a Mack-Skillings test and a general linear model (GLM), both blocking on Ca²⁺ concentration. Sensitivity analysis showed that the kinetically relevant information could not be isolated when using the full data set, because the plateau regions contributed an overwhelming proportion of kinetically irrelevant observations. The block-design inferences therefore used only the most kinetically informative observations (all except the first four and the last three data points). An Anderson-Darling test found no evidence against normality of the data (AD = , p-value = ).
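
The Mack-Skillings test extends the Friedman test to block designs with several replicates per cell. In its large-sample form, observations are ranked within each block, S_j is the sum over the n blocks of treatment j's mean rank, and MS = 12/(k(N + n)) · Σ_j (S_j − (N + n)/2)² is referred to a chi-square distribution with k − 1 degrees of freedom, where N is the total number of observations. The sketch below assumes a balanced layout (here, Ca²⁺ levels as blocks, variants as treatments, triplicates as replicates) and omits the tie correction; the function name and example data are hypothetical.

```python
import numpy as np
from scipy.stats import chi2, rankdata


def mack_skillings(data):
    """Mack-Skillings statistic and large-sample p-value (hypothetical helper).

    data: array of shape (n_blocks, k_treatments, c_replicates), balanced.
    Observations are ranked within each block; tied values get average ranks,
    but the tie correction to the statistic is not applied.
    """
    data = np.asarray(data, dtype=float)
    n, k, c = data.shape
    N = n * k * c
    # Rank all k*c observations within each block separately.
    ranks = np.array([rankdata(block.ravel()).reshape(k, c) for block in data])
    # S_j: sum over blocks of treatment j's mean within-block rank.
    S = ranks.mean(axis=2).sum(axis=0)
    ms = 12.0 / (k * (N + n)) * np.sum((S - (N + n) / 2.0) ** 2)
    return ms, chi2.sf(ms, df=k - 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: 15 Ca2+ levels (blocks) x 3 variants x 3 replicates.
    full = rng.random((15, 3, 3))
    # Keep the transition region only: drop the first four and last three levels.
    ms, p = mack_skillings(full[4:-3])
    print(f"MS = {ms:.2f}, p = {p:.3g}")
```

With one replicate per cell (c = 1) the statistic reduces to the Friedman statistic, which is a useful sanity check on the implementation.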

Formal Statistical Inference Results
The Mack-Skillings and GLM procedures both indicated significant differences between the pcSCP variants (MS = 13.95, p-value = ; F = , p < 0.001, respectively). The multiple-comparisons procedure for the Mack-Skillings test, which emphasizes the consistency of comparisons over their magnitude, found pcSCP1a to differ significantly from both pcSCP1b and pcSCP1c (both p-values < 0.05). The multiple-comparisons procedure for the GLM, which emphasizes magnitude over consistency, found pcSCP1c to differ significantly from both other variants (both p-values ≤ ). These results indicate differences between the pcSCP variants. However, these analyses are not satisfying, primarily because they do not take into account the relationship between concentration and fluorescence.

Dose-Response Curve Fitting
To account for Ca²⁺ concentration, dose-response curve fitting was employed. This is the most common approach used by biochemists studying protein kinetics, and it allows kinetic parameters to be computed and compared. The parameters of interest for comparing the pcSCP variants in this study were:
1. The dissociation constant, K_D, a measure of the affinity between protein and ligand (in this case Ca²⁺); a lower K_D indicates tighter binding.
2. Cooperativity, the interaction between binding sites whereby ligand binding at one site alters the K_D of another.
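
For reference, both parameters can be written in standard textbook form; these expressions are added for illustration and are not taken from the slides. For a single site, K_D is the equilibrium constant for dissociation of the protein–Ca²⁺ complex. For several interacting sites, the Hill equation is a common empirical description of the fractional saturation θ, where K is the Ca²⁺ concentration at half-saturation and the Hill coefficient n_H summarizes cooperativity (n_H = 1, none; n_H > 1, positive; n_H < 1, negative).

```latex
K_D = \frac{[\mathrm{P}]\,[\mathrm{Ca^{2+}}]}{[\mathrm{P\,Ca^{2+}}]}
\qquad
\theta = \frac{[\mathrm{Ca^{2+}}]^{n_H}}{K^{n_H} + [\mathrm{Ca^{2+}}]^{n_H}}
```

With its plateaus fixed at 0 and 1, the log-logistic model f(x) = 1/(1 + exp{b[log(x) − log(e)]}) has the same shape, with e corresponding to K and |b| to n_H for an increasing curve.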

Applying the Log-Logistic Model
The standardized fluorescence data were fit to the widely used log-logistic model:
$$ f(x) = c + \frac{d - c}{1 + \exp\{b[\log(x) - \log(e)]\}} $$
In this model, x is the Ca²⁺ concentration; c is the bottom plateau; d is the top plateau; b is a measure of cooperativity; and e is the K_D (the halfway point between plateaus). This equation was fit simultaneously to the data for each variant, with separate values of b and e for each variant, using the drc package in R. This program uses least squares to obtain initial values of b and e from the linearizing transformation
$$ \log\left(\frac{d - y}{y - c}\right) = b\log(x) - b\log(e) $$
where y is the standardized fluorescence. To control for the aberrant behavior seen at high and low Ca²⁺ concentrations, c and d were held at 0 and 1, respectively.
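
A Python analogue of this fit can be sketched with SciPy's `curve_fit`, assuming the log-logistic form f(x) = 1/(1 + exp{b[log(x) − log(e)]}) with c = 0 and d = 1 held fixed. This sketch fits one variant at a time (the joint fit across variants done with drc is not reproduced), takes starting values from the same linearization, and estimates log(e) rather than e for numerical stability. The function names are hypothetical; under this parameterization an increasing curve has b < 0.

```python
import numpy as np
from scipy.optimize import curve_fit


def ll2(x, b, log_e):
    """Log-logistic curve with c = 0 and d = 1 held fixed.

    f(x) = 1 / (1 + exp(b * (log(x) - log(e)))). log(e) is fitted instead of e
    because Ca2+ concentrations span several orders of magnitude.
    """
    return 1.0 / (1.0 + np.exp(b * (np.log(x) - log_e)))


def fit_ll2(x, y):
    """Least-squares fit of ll2; returns ([b, log_e], covariance matrix)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Starting values from the linearization log((1 - y) / y) = b*log(x) - b*log(e),
    # using only points away from the plateaus, where the transform is stable.
    mid = (y > 0.01) & (y < 0.99)
    slope, intercept = np.polyfit(np.log(x[mid]), np.log((1.0 - y[mid]) / y[mid]), 1)
    p0 = [slope, -intercept / slope]
    params, cov = curve_fit(ll2, x, y, p0=p0)
    return params, cov
```

Approximate standard errors are `np.sqrt(np.diag(cov))`; exponentiating the fitted `log_e` gives the estimate of e (the K_D) in molar units.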

Curve Fitting Results
[Figure: Fitted log-logistic models for pcSCP variants.]

Kinetic Parameter Estimation
For all three pairwise variant comparisons, selectivity indices indicated significant differences between the K_D parameters (all three p-values ≤ ). Values of b differed significantly between pcSCP1a and pcSCP1b and between pcSCP1a and pcSCP1c (both p-values ≤ ).

Parameter     pcSCP1a          pcSCP1b     pcSCP1c
b             1.480 ± 0.736*   3.651 ±     ±
log(K_D)      ±                ±           ±
*95% confidence intervals
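
The slides do not show how the selectivity-index p-values were computed. As a rough illustration only, K_D estimates from two variants can be compared on the log scale with a Wald z-test: the ratio K_D1/K_D2 is exp of the difference in log(K_D), and equal affinities correspond to a ratio of 1. This approximation and its function name are hypothetical and are not drc's own procedure; the optional covariance term matters when both estimates come from a single joint fit.

```python
import math

from scipy.stats import norm


def compare_log_kd(log_kd1, se1, log_kd2, se2, cov12=0.0):
    """Wald z-test of H0: K_D1 = K_D2 on the log scale (hypothetical helper).

    log_kd*, se*: estimates of log(K_D) and their standard errors;
    cov12: their covariance (nonzero if both come from one joint fit).
    Returns (estimated ratio K_D1 / K_D2, two-sided p-value).
    """
    diff = log_kd1 - log_kd2
    se_diff = math.sqrt(se1**2 + se2**2 - 2.0 * cov12)
    z = diff / se_diff
    return math.exp(diff), 2.0 * norm.sf(abs(z))
```

Working on the log scale keeps the test symmetric: comparing variant 1 with variant 2 gives the same p-value as comparing 2 with 1, with the ratio simply inverted.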

Issues with Curve Fitting
Lack-of-fit tests were significant, indicating that the log-logistic model does not fit the data adequately (F = , p-value < 0.001). The residual plot (right) shows a systematic linear pattern, indicating departures from the model assumptions. The non-kinetic trends in the tails are likely responsible for part of this lack of fit, but overall these results are unsatisfying.
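
A lack-of-fit test of this kind compares the residual sum of squares of the fitted curve with the "pure error" among replicate measurements taken at the same Ca²⁺ level. The sketch below is the classical replicate-based F-test, assuming replicates share identical concentration values; the slides do not state exactly how the reported test was computed, and the function name is hypothetical.

```python
import numpy as np
from scipy.stats import f as f_dist


def lack_of_fit_test(x, y, fitted, n_params):
    """Replicate-based lack-of-fit F-test (hypothetical helper).

    x: design points (replicates share the same x); y: observations;
    fitted: model predictions at each observation; n_params: number of
    parameters estimated by the model. Returns (F, p-value).
    """
    x = np.asarray(x)
    y = np.asarray(y, dtype=float)
    sse = np.sum((y - np.asarray(fitted, dtype=float)) ** 2)
    levels = np.unique(x)
    # Pure error: scatter of replicates around their own mean at each x level.
    ss_pure = sum(np.sum((y[x == lv] - y[x == lv].mean()) ** 2) for lv in levels)
    df_pure = y.size - levels.size
    # Lack of fit: the part of the residual sum of squares not explained by pure error.
    ss_lof = sse - ss_pure
    df_lof = levels.size - n_params
    F = (ss_lof / df_lof) / (ss_pure / df_pure)
    return F, f_dist.sf(F, df_lof, df_pure)
```

A large F means the curve misses the replicate means by more than replicate scatter can explain, which matches the systematic residual pattern described above.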

Multivariate Analysis Approach
The unsatisfactory curve-fitting results, together with concerns that excessive standardization of the data discarded information, led to the use of multivariate techniques. This is a novel approach for studies of protein kinetics. The program Pirouette® (Infometrix, Inc.) was used for multivariate exploratory analysis and classification. The standardized spectra were used for both techniques. Prior to analysis, these spectra were preprocessed with a 15-point smooth, area normalization, and mean-centering.
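
The three preprocessing steps can be approximated outside Pirouette as sketched below. The moving-average smoother and the definition of area normalization (dividing each spectrum by the sum of its absolute intensities) are assumptions; Pirouette's own smoother (for example, a Savitzky–Golay filter) and normalization options may differ, and the function name is hypothetical.

```python
import numpy as np
from scipy.ndimage import uniform_filter1d


def preprocess(spectra, window=15):
    """Smooth, area-normalize, and mean-center spectra (rows = spectra).

    Assumed definitions: a moving-average smooth over `window` points,
    area = sum of absolute intensities, and centering on the mean spectrum.
    """
    X = np.asarray(spectra, dtype=float)
    # 15-point moving-average smooth along the wavelength axis.
    X = uniform_filter1d(X, size=window, axis=1, mode="nearest")
    # Area normalization: each spectrum's absolute intensities sum to 1.
    X = X / np.abs(X).sum(axis=1, keepdims=True)
    # Mean-centering: subtract the mean spectrum computed across all samples.
    return X - X.mean(axis=0)
```

Area normalization removes overall intensity differences (for example, from small differences in protein concentration between aliquots), so the subsequent analyses respond mainly to spectral shape.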

Principal Components Analysis
Principal components analysis (PCA) was used for exploratory analysis. This technique finds uncorrelated linear combinations of the variables that successively account for the maximum amount of variation; the data were then plotted using the first three principal components. This reduces the dimensionality of the data, makes relationships among samples easier to display, and can reveal natural clustering patterns.
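
A minimal PCA can be sketched with the singular value decomposition of the mean-centered data matrix: the right singular vectors give the factor directions (loadings), the scaled left singular vectors give the sample coordinates (scores), and the squared singular values give each factor's share of the total variance. This is a generic sketch with a hypothetical function name, not Pirouette's implementation.

```python
import numpy as np


def pca(X, n_components=3):
    """PCA of a mean-centered matrix (rows = spectra, columns = wavelengths).

    Returns scores (sample coordinates), loadings (factor directions), and
    the fraction of total variance captured by each retained factor.
    """
    X = np.asarray(X, dtype=float)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    return scores, loadings, explained[:n_components]
```

Multiplying `explained` by 100 gives percentages of the kind reported for Factors 1 to 3 in the scores plots, and plotting pairs of columns of `scores` gives the scores plots themselves.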

PCA Scores Plots
Two views of the PCA scores show the data clustering by variant along different factors. Each point represents a single fluorescence spectrum: pink, pcSCP1a; blue, pcSCP1b; orange, pcSCP1c. Factor 1 accounts for 94.7% of the variability in the standardized spectra; Factor 2, 2.62%; Factor 3, 0.856%.

Impact of Ca²⁺ Concentration
PCA scores plots show separation by free Ca²⁺ concentration. This confirms that fluorescence spectroscopy is measuring a response by pcSCP to increasing Ca²⁺.
[Figure: PCA scores plots at low, intermediate, and high Ca²⁺; solid points represent spectra taken at the corresponding Ca²⁺ level.]

Classification Analyses
Two classification techniques were used to analyze these data: k-nearest neighbors (KNN) and soft independent modeling of class analogy (SIMCA). Both are based on the idea that the closer samples lie in measurement space, the more likely they are to belong to the same category. KNN classifies an unknown by computing its Euclidean distance to all categorized samples and polling the classes of the k closest samples. SIMCA builds a principal components model for each category and classifies an unknown by assessing how well it fits when projected into the space of each model. These analyses provide a quantitative comparison of the variants by determining how precisely their spectra can be distinguished.
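
Both classifiers can be sketched in a few lines of NumPy. The KNN function follows the description above directly. The SIMCA functions are a simplified version: each class gets its own PCA model, and an unknown is assigned to the class whose model leaves the smallest residual relative to that class's typical residual. Full SIMCA, as in Pirouette, also applies an F-test so that a sample can be rejected by every class or accepted by several; that step is omitted here, and all function names are hypothetical.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x, k=7):
    """Classify spectrum x by majority vote among its k nearest training spectra.

    Distances are Euclidean. Ties in the vote go to the class whose member is
    nearest, because Counter keeps first-seen order among equal counts.
    """
    X_train = np.asarray(X_train, dtype=float)
    dists = np.linalg.norm(X_train - np.asarray(x, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(np.asarray(y_train)[nearest].tolist())
    return votes.most_common(1)[0][0]


def simca_fit(X_train, y_train, n_factors=3):
    """Build one PCA model per class (simplified SIMCA).

    Each model stores the class mean, its first n_factors loadings, and the
    class's typical residual standard deviation s0 (requires more than
    n_factors + 1 samples per class).
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    m = X_train.shape[1]
    models = {}
    for label in np.unique(y_train):
        Xc = X_train[y_train == label]
        mean = Xc.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xc - mean, full_matrices=False)
        P = Vt[:n_factors].T
        # Residuals: the part of each spectrum not captured by the class model.
        E = (Xc - mean) - (Xc - mean) @ P @ P.T
        s0 = np.sqrt(np.sum(E**2) / ((len(Xc) - n_factors - 1) * (m - n_factors)))
        models[label] = (mean, P, s0)
    return models


def simca_predict(models, x):
    """Assign x to the class whose model leaves the smallest scaled residual."""
    x = np.asarray(x, dtype=float)
    best, best_ratio = None, np.inf
    for label, (mean, P, s0) in models.items():
        r = (x - mean) - (x - mean) @ P @ P.T
        s = np.sqrt(np.sum(r**2) / (x.size - P.shape[1]))
        if s / s0 < best_ratio:
            best, best_ratio = label, s / s0
    return best
```

In this setting each row of `X_train` would be a preprocessed spectrum and each label a variant name; `k=7` and `n_factors=3` mirror the settings reported on the next slide.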

Classification Results
Only about 4% of spectra (6 of 135) were misclassified using KNN (k = 7, the optimal value), and only about 3% (4 of 135) were misclassified using SIMCA (three factors for each variant). For every misclassified spectrum, the second choice was correct. This ability to identify each pcSCP variant's fluorescence spectra separately confirms that the variants are different.

KNN               Predicted pcSCP1a   Predicted pcSCP1b   Predicted pcSCP1c
Actual pcSCP1a    41                  3                   1
Actual pcSCP1b    0                   44                  1
Actual pcSCP1c    0                   1                   44

SIMCA             Predicted pcSCP1a   Predicted pcSCP1b   Predicted pcSCP1c
Actual pcSCP1a    45                  0                   0
Actual pcSCP1b    0                   45                  0
Actual pcSCP1c    0                   4                   41
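
The misclassification rates follow directly from the confusion matrices. The sketch below reads the flattened counts on this slide assuming 45 spectra per variant (15 Ca²⁺ levels measured in triplicate); that reading is an interpretation, under which the KNN and SIMCA error rates are 6/135 ≈ 0.044 and 4/135 ≈ 0.030 as proportions. The function name is hypothetical.

```python
import numpy as np

# Rows: actual variant; columns: predicted variant (pcSCP1a, pcSCP1b, pcSCP1c).
# Counts as read from the flattened slide text, assuming 45 spectra per variant.
KNN = np.array([[41, 3, 1],
                [0, 44, 1],
                [0, 1, 44]])
SIMCA = np.array([[45, 0, 0],
                  [0, 45, 0],
                  [0, 4, 41]])


def misclassification_rate(confusion):
    """Fraction of off-diagonal (misclassified) counts in a confusion matrix."""
    confusion = np.asarray(confusion)
    return 1.0 - np.trace(confusion) / confusion.sum()


if __name__ == "__main__":
    for name, cm in (("KNN", KNN), ("SIMCA", SIMCA)):
        print(name, round(misclassification_rate(cm), 4))
```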

Conclusions
The analyses presented here provide the first indication of significant differences between pcSCP variants. The multivariate analyses provided the most complete and valid comparison of the variants. This novel approach is promising as a broadly applicable tool for the comparative analysis of protein kinetics, and its application should be examined further. The dose-response curve fitting presented here is a first step toward specific kinetic parameter estimation and inference. Currently, mutants of pcSCP have been generated, and their kinetics are being characterized to pinpoint the variations responsible for differences in biochemical properties.