BINF6201/8201 Principal component analysis (PCA) -- Visualization of amino acids using their physico-chemical properties 08-31-2010.


1 BINF6201/8201 Principal component analysis (PCA) -- Visualization of amino acids using their physico-chemical properties 08-31-2010

2 Physico-chemical properties and amino acids
The physico-chemical properties of the amino acids in a peptide chain determine its folding process, and therefore its 3D structure and function.
A full understanding of these properties is necessary for understanding the folding process, and therefore for designing algorithms that predict protein 3D structures from amino acid sequences.
Commonly known physico-chemical properties of amino acids that affect protein folding:
1. Volume/size: the sum of the van der Waals volumes of the atoms.
Partial volume: the increase in water volume when the amino acid is dissolved in water.
Bulkiness: the ratio of the side chain volume to its length, i.e. the cross-sectional area.

3 Physico-chemical properties and amino acids
2. Polarity index: the electrostatic force that the amino acid exerts on its surroundings at a distance of 10 Å, which combines the force due to electrical charge and the force due to the dipole moment of the polarized amino acid.
3. Isoelectric point (pI): the pH of the solution at which the net charge on the amino acid is zero.
[Figures: a test charge placed 10 Å from an amino acid; the charge state of an amino acid at pH < pI, pH = pI, and pH > pI.]

4 Physico-chemical properties and amino acids
4. Hydrophobicity: a measure of the solubility of an AA in water.
Hydrophobic: difficult to dissolve in water.
Hydrophilic: easy to dissolve in water.
Hydrophobicity scales:
Kyte and Doolittle scale: based on the free energy cost of moving an AA from the inside of a protein to its surface.
Engelman, Steitz and Goldman scale: based on the free energy cost of moving an AA from a lipid bilayer membrane to water.
Hydrophobic AAs have positive hydrophobicity values, and hydrophilic AAs have negative hydrophobicity values.
5. Water-accessible area: related to the portion of the side chain that is buried in a folded protein.

5 Physico-chemical properties and amino acids
How can we visualize and quantitatively analyze the similarities and differences among AAs based on these, and even more, physico-chemical properties?

6 Analysis of amino acids based on their physico-chemical properties
A simple visualization of the amino acids uses molecular volume and isoelectric point (pI) as the two axes.
Such a result is not satisfactory, and more sophisticated methods are needed.

7 Principal component analysis (PCA)
Simply put, PCA rescales and transforms the data based on the relationships among the properties of the data, so that the data points can be separated using a few newly computed properties (the principal components).
[Figure: a table of data points d_1, d_2, …, d_11 with properties p_1, p_2, p_3 is rescaled, then transformed into components c_1, c_2, c_3, then reduced to c_1 and c_2 (dimension reduction); plot axes y_1 and y_2.]

8 Principal component analysis (PCA)
A general dataset can be represented as an N (rows) x P (columns) matrix X, whose columns are the properties p_1, p_2, …, p_j, …, p_P and whose rows are the data points d_1, d_2, …, d_i, …, d_N. Here N is the number of objects/data points (e.g. 20 amino acids), and P is the number of properties that each object has (e.g. 8 properties of AAs).

9 Principal component analysis (PCA)
Rescale the data by normalization:
1. Compute the mean of each column j.
2. Compute the standard deviation of each column j.
3. Normalize: subtract the column mean from each entry and divide by the column standard deviation.
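The formulas for these three steps are not reproduced in the transcript. A standard z-score normalization consistent with them would read as follows, where $x_{ij}$ is the entry of X for data point i and property j. The symbols $\bar{x}_j$, $s_j$, $z_{ij}$ and the $N-1$ denominator are assumptions; dividing by $N$ is also common and works equally well as long as the same convention is used throughout.

```latex
\bar{x}_j = \frac{1}{N}\sum_{i=1}^{N} x_{ij},
\qquad
s_j = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_{ij}-\bar{x}_j\right)^{2}},
\qquad
z_{ij} = \frac{x_{ij}-\bar{x}_j}{s_j}
```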

10 Principal component analysis (PCA)
Rescale the data (matrix X) by normalization, generating matrix Z.
[Tables: matrix X and matrix Z.]
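A minimal sketch of the rescaling step in Python with NumPy. The data here are random numbers standing in for the 20 x 8 amino acid table, which is not reproduced in the transcript; the function name `normalize_columns` and the choice of the sample standard deviation (`ddof=1`) are assumptions.

```python
import numpy as np

def normalize_columns(X):
    """Return Z, the column-wise z-scores of X (N rows, P columns)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)          # mean of each column j
    std = X.std(axis=0, ddof=1)    # standard deviation of each column j (N - 1 assumed)
    return (X - mean) / std        # normalization

# Hypothetical stand-in data: 20 objects, 8 properties.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(20, 8))
Z = normalize_columns(X)
```

After this step every column of Z has mean 0 and standard deviation 1, so properties measured in different units contribute on the same scale.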

11 Principal component analysis (PCA)
Transform the matrix Z into a new matrix Y by multiplying Z by a special matrix V: $Y_{N \times P} = Z_{N \times P} V_{P \times P}$.
V is computed from the relationships among the properties of the data, and has the following properties:
1. Any two distinct columns j and k of V are orthogonal: $\mathbf{v}_j \cdot \mathbf{v}_k = 0$ for $j \neq k$.
2. Each column of V is a unit vector: $\|\mathbf{v}_j\| = 1$.

12 Principal component analysis (PCA)
Each column of Y is called a principal component of the data.
If we rank the principal components by their variance, a few columns predominate over the others, and they can be used to visualize the data in a space of reduced dimension.
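The ranking and dimension reduction described on this slide can be sketched as below. The data are random stand-ins, not the amino acid table, and V is taken from `numpy.linalg.eigh` applied to the correlation matrix, which is one way to obtain a matrix with orthonormal columns; all variable names are assumptions.

```python
import numpy as np

# Hypothetical stand-in data: 20 objects, 8 properties, normalized to Z.
rng = np.random.default_rng(3)
X = rng.normal(size=(20, 8))
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

C = np.corrcoef(Z, rowvar=False)       # P x P correlation matrix
_, V = np.linalg.eigh(C)               # columns of V are orthonormal
Y = Z @ V                              # all P principal components

variances = Y.var(axis=0, ddof=1)      # variance of each column of Y
ranking = np.argsort(variances)[::-1]  # indices, largest variance first
Y_ranked = Y[:, ranking]
coords_2d = Y_ranked[:, :2]            # coordinates for a 2-D scatter plot
```

Each row of `coords_2d` gives the position of one object on the two components with the largest variance.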

13 Principal component analysis (PCA)
The variation in the data can largely be visualized in the reduced space.
[Figure: plot of the 20 AAs on the first two components of the PCA, i.e. the two vectors in V that give the largest variance in two columns of Y.]
[Figure: plot on molecular volume and isoelectric point (pI).]

14 Principal component analysis (PCA)
To compute the matrix V, we first compute C, the P x P matrix of correlation coefficients between each pair of columns of X (or Z).
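A short sketch of this step, assuming the Pearson correlation coefficient and the N - 1 convention for the standard deviation. With normalized data the matrix reduces to Z^T Z / (N - 1). The data are random stand-ins and the function name `correlation_matrix` is hypothetical.

```python
import numpy as np

def correlation_matrix(X):
    """P x P matrix of Pearson correlation coefficients between the columns of X."""
    X = np.asarray(X, dtype=float)
    N = X.shape[0]
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # normalized data
    return Z.T @ Z / (N - 1)

# Hypothetical stand-in data: 20 objects, 8 properties.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 8))
C = correlation_matrix(X)

# NumPy's built-in routine gives the same matrix (columns as variables).
C_builtin = np.corrcoef(X, rowvar=False)
```

Because correlation is unaffected by shifting and scaling each column, computing C from X or from Z gives the same result.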

15 Principal component analysis (PCA)
The correlation coefficient matrix is symmetric, $c_{ij} = c_{ji}$, and $c_{ii} = 1$.
[Table: the correlation coefficient matrix of the 20 AAs based on the 8 properties.]

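16 Principal component analysis (PCA)
It can be shown that the columns of V are eigenvectors of C, i.e., $C\,\mathbf{v}_n = \lambda_n \mathbf{v}_n$, where $\mathbf{v}_n$ is the n-th column of V and $\lambda_n$ is the corresponding eigenvalue.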
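One way to obtain V in practice is an eigendecomposition of C. The sketch below uses `numpy.linalg.eigh`, which is intended for symmetric matrices and returns eigenvalues in ascending order, and then reorders them so that the largest comes first. The data are random stand-ins and the variable names are assumptions.

```python
import numpy as np

# Hypothetical stand-in data: 20 objects, 8 properties.
rng = np.random.default_rng(2)
X = rng.normal(size=(20, 8))
C = np.corrcoef(X, rowvar=False)        # symmetric P x P correlation matrix

eigenvalues, V = np.linalg.eigh(C)      # ascending eigenvalues, eigenvectors in columns
order = np.argsort(eigenvalues)[::-1]   # reorder: largest eigenvalue first
eigenvalues = eigenvalues[order]
V = V[:, order]

# Column n of the residual is C v_n - lambda_n v_n, which should vanish.
residual = C @ V - V * eigenvalues
```

The columns returned this way are also orthogonal unit vectors, so `V.T @ V` is the identity matrix up to rounding error.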

17 Principal component analysis
It can be shown that the variance of the n-th component (column) of the matrix Y is equal to the eigenvalue $\lambda_n$ corresponding to the n-th eigenvector: $\mathrm{Var}(\mathbf{y}_n) = \lambda_n$.
The total variance of the data is P, so the fraction of the variance carried by the first few (e.g. 2) components, $(\lambda_1 + \lambda_2)/P$, measures how well these principal components represent the dataset.
In our PCA analysis of the 20 AAs dataset,
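The two statements above can be checked numerically. This is a sketch on random stand-in data, not the amino acid dataset, so the resulting fraction says nothing about the result reported in the lecture; the N - 1 variance convention and all variable names are assumptions.

```python
import numpy as np

# Hypothetical stand-in data: 20 objects, 8 properties, normalized to Z.
rng = np.random.default_rng(4)
X = rng.normal(size=(20, 8))
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
N, P = Z.shape

C = Z.T @ Z / (N - 1)                            # correlation matrix
eigenvalues, V = np.linalg.eigh(C)               # ascending order
eigenvalues, V = eigenvalues[::-1], V[:, ::-1]   # largest eigenvalue first

Y = Z @ V                                        # principal components
component_variance = Y.var(axis=0, ddof=1)       # matches the eigenvalues

fraction = eigenvalues / P                       # share of total variance per component
fraction_first_two = fraction[:2].sum()          # (lambda_1 + lambda_2) / P
```

The eigenvalues sum to P because the trace of C is P (every diagonal entry is 1), which is why P serves as the total variance.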

