1 First Al-Khawarezmi Conference: Qatar, December 6-8, 2010, Ali Hadi. The Effects of Centering and Scaling the Rows of Multidimensional Data on Their Graphical and Correlation Structures. Ali S. Hadi and Rida Moustafa. ahadi@aucegypt.edu, ali-hadi@cornell.edu, www.aucegypt.edu/faculty/hadi

2 Outline of the Talk: 1. Introduction 2. Types of Centering and/or Scaling 3. Effects of Centering and/or Scaling 4. The Main Theoretical Results 5. Illustrative Examples 6. Conclusions

3 1. Introduction: Before applying certain statistical methods (e.g., principal components and factor analyses), it may be necessary to preprocess the data to make them suitable for the analysis.

4 1. Introduction: Examples of preprocessing: data editing; imputation of missing values; transformation; identification of outliers; centering and/or scaling (e.g., Rao, 2005); etc.

5 1. Introduction: Given an $n \times p$ data matrix $X$, which represents n multivariate observations on p variables, the columns and/or the rows of $X$ may be centered and/or scaled before a statistical method is applied to it.

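6 2. Types of Centering and/or Scaling: 1. Column (variable) centering: The ij-th element of the column-centered matrix $X_c$ can be written as $x_{ij} - \bar{x}_{\cdot j}$, where $\bar{x}_{\cdot j} = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$ is the mean of the j-th column. Hence we have $X_c = (I_n - \frac{1}{n}\mathbf{1}_n\mathbf{1}_n^T)X$, so every column of $X_c$ sums to zero.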

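7 2. Types of Centering and/or Scaling: 2. Row (observation) centering: The ij-th element of the row-centered matrix $X_r$ can be written as $x_{ij} - \bar{x}_{i\cdot}$, where $\bar{x}_{i\cdot} = \frac{1}{p}\sum_{j=1}^{p} x_{ij}$ is the mean of the i-th row. Hence we have $X_r = X(I_p - \frac{1}{p}\mathbf{1}_p\mathbf{1}_p^T)$, so every row of $X_r$ sums to zero.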

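8 2. Types of Centering and/or Scaling: 3. Column and row centering: The ij-th element of the doubly centered matrix can be written as $x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x}_{\cdot\cdot}$, where $\bar{x}_{i\cdot}$ is the mean of the i-th row, $\bar{x}_{\cdot j}$ is the mean of the j-th column, and $\bar{x}_{\cdot\cdot}$ is the mean of all elements of X.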
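
The three centering operations can be sketched in a few lines of plain Python. This is only an illustration: the function names and the small 3 x 3 matrix are hypothetical and do not come from the talk.

```python
def column_center(X):
    """Subtract each column's mean from the entries of that column."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - col_means[j] for j in range(p)] for row in X]


def row_center(X):
    """Subtract each row's mean from the entries of that row."""
    return [[x - sum(row) / len(row) for x in row] for row in X]


def double_center(X):
    """Subtract row and column means, then add back the grand mean."""
    n, p = len(X), len(X[0])
    row_means = [sum(row) / p for row in X]
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    grand = sum(row_means) / n
    return [[X[i][j] - row_means[i] - col_means[j] + grand for j in range(p)]
            for i in range(n)]


# Hypothetical 3 x 3 data matrix (rows = observations, columns = variables).
X = [[1.0, 2.0, 6.0],
     [4.0, 5.0, 9.0],
     [7.0, 2.0, 3.0]]

Xc = column_center(X)   # every column sums to zero
Xr = row_center(X)      # every row sums to zero
Xrc = double_center(X)  # rows and columns both sum to zero
```

Column centering changes only the origin of each variable, whereas the two row-based versions mix values from different variables within an observation.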

9 2. Types of Centering and/or Scaling: 4. Row scaling (each row of X or of a centered version of X): This can be obtained by: a. scaling by the L1-norm; b. scaling by the L2-norm; c. scaling by the standard deviation (SD).

10 2. Types of Centering and/or Scaling: 4a. Scaling the rows of the matrix X (or of a centered version of X) by the L1-norm of its rows: $Y = D_1 X$, where $D_1$ is an $n \times n$ diagonal matrix whose i-th diagonal element equals the reciprocal of the L1-norm of the i-th row, $1 / \sum_{j=1}^{p} |x_{ij}|$.

11 2. Types of Centering and/or Scaling: 4b. Scaling the rows of the matrix X (or of a centered version of X) by the L2-norm of its rows: $Y = D_2 X$, where $D_2$ is an $n \times n$ diagonal matrix whose i-th diagonal element equals the reciprocal of the L2-norm of the i-th row, $1 / \sqrt{\sum_{j=1}^{p} x_{ij}^2}$.

12 2. Types of Centering and/or Scaling: 4c. Scaling the rows of the matrix X (or of a centered version of X) by the standard deviation (SD) of its rows: $Y = D_s X$, where $D_s$ is an $n \times n$ diagonal matrix whose i-th diagonal element equals the reciprocal of the standard deviation of the i-th row of X.
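
A minimal sketch of the three row-scaling variants follows. Dividing each row by its own norm is the same as left-multiplying by the diagonal matrix of reciprocal norms. The helper names and the data are hypothetical, and the SD uses the divisor p - 1, which is an assumption because the slides do not state the divisor.

```python
import math


def scale_rows(X, norm):
    """Divide each row by norm(row); equivalent to D X with D diagonal."""
    return [[x / norm(row) for x in row] for row in X]


def l1_norm(row):
    return sum(abs(x) for x in row)


def l2_norm(row):
    return math.sqrt(sum(x * x for x in row))


def sample_sd(row):
    # Assumption: divisor p - 1 (the slides do not say which divisor is used).
    m = sum(row) / len(row)
    return math.sqrt(sum((x - m) ** 2 for x in row) / (len(row) - 1))


# Hypothetical data matrix with two observations on three variables.
X = [[3.0, -4.0, 12.0],
     [1.0, 2.0, 2.0]]

Y1 = scale_rows(X, l1_norm)    # each row has L1-norm 1
Y2 = scale_rows(X, l2_norm)    # each row has L2-norm 1
Y3 = scale_rows(X, sample_sd)  # each row has standard deviation 1
```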

13 2. Types of Centering and/or Scaling: 5. Standardizing the variables (each column of X): $Z = X_c D_c$, where $X_c$ is the column-centered data matrix and $D_c$ is a $p \times p$ diagonal matrix whose j-th diagonal element equals the reciprocal of the standard deviation of the j-th column of X.
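
As an illustration, column standardization can be written as below. The function name and the data (height in cm, weight in kg) are hypothetical, and the divisor n - 1 for the SD is an assumption.

```python
import math


def standardize_columns(X):
    """Center each column at 0 and scale it to sample SD 1 (divisor n - 1)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1))
           for j in range(p)]
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]


# Hypothetical data: column 0 is height in cm, column 1 is weight in kg.
X = [[150.0, 50.0],
     [160.0, 60.0],
     [170.0, 65.0],
     [180.0, 80.0]]

Z = standardize_columns(X)
```

Each column of `Z` is unit-free, because a variable is only ever compared with its own mean and SD.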

14 2. Types of Centering and/or Scaling: 6. Centering and standardizing both rows and columns of X: This is obtained by an iterative standardization process of rows and columns until the rows and columns are approximately standardized.
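
One possible reading of this iterative process is sketched below: alternate row and column standardization and stop once the rows remain standardized after a column step. The slides do not give the algorithm, so the stopping rule, the tolerance, the function names, and the data are all assumptions. The population SD (divisor equal to the number of values) is used so that rows and columns can be standardized at the same time even when n differs from p.

```python
import math


def standardize(values):
    """Shift values to mean 0 and scale them to population SD 1."""
    m = sum(values) / len(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return [(v - m) / sd for v in values]


def transpose(M):
    return [list(col) for col in zip(*M)]


def max_row_deviation(M):
    """Largest departure of any row from mean 0 and population SD 1."""
    worst = 0.0
    for row in M:
        m = sum(row) / len(row)
        sd = math.sqrt(sum((v - m) ** 2 for v in row) / len(row))
        worst = max(worst, abs(m), abs(sd - 1.0))
    return worst


def iterative_standardize(X, tol=1e-8, max_sweeps=500):
    """Alternate row and column standardization; return (matrix, sweeps used)."""
    Y = [list(row) for row in X]
    for sweep in range(1, max_sweeps + 1):
        Y = [standardize(row) for row in Y]                        # rows
        Y = transpose([standardize(col) for col in transpose(Y)])  # columns
        if max_row_deviation(Y) < tol:
            break
    return Y, sweep


# Hypothetical 4 x 4 data matrix.
X = [[2.0, 7.0, 1.0, 8.0],
     [3.0, 1.0, 4.0, 1.0],
     [5.0, 9.0, 2.0, 6.0],
     [5.0, 3.0, 5.0, 9.0]]

Y, sweeps = iterative_standardize(X)
```

Because the last step of every sweep is a column standardization, the columns of `Y` are always exactly standardized; the rows are only approximately so, to within `tol` if the loop stopped early.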

15 2. Types of Centering and/or Scaling: The above row and column transformations have been used by several authors in practical applications, for example: Holter et al. (2000), Wen et al. (2007), Pielou (1984), Jackson (1991), Pyle (1999), van der Werf, Jellema, and Hankemeier (2005), van den Berg et al. (2006).

16 2. Types of Centering and/or Scaling: We argue that, of the 11 methods of centering/scaling the data matrix mentioned above, only two (those that deal with the columns of the data) are meaningful. The other 9, which deal with the rows of the data, are not generally recommended. Why? See the next slides.

17 3. Effects of Centering and/or Scaling: There are several reasons why centering and/or scaling the rows is not a good thing to do: 1. Centering and scaling the observations is not always possible. For example, when an observation has the same numerical value on all dimensions, centering replaces the observation by zeros, and scaling is then impossible.
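
The degenerate case can be reproduced directly. The observation below is hypothetical; the point is only that its centered version has norm zero, so any norm-based or SD-based scaling divides by zero.

```python
import math

# Hypothetical observation with the same value on all dimensions.
row = [5.0, 5.0, 5.0]

mean = sum(row) / len(row)
centered = [x - mean for x in row]             # all zeros
l2 = math.sqrt(sum(x * x for x in centered))   # norm of the centered row is 0

try:
    scaled = [x / l2 for x in centered]
    scaling_failed = False
except ZeroDivisionError:
    scaled = None
    scaling_failed = True
```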

18 3. Effects of Centering and/or Scaling: 2. Even when centering the rows is possible, it creates perfect collinearity among the variables, even if the original variables are orthogonal. This is because the columns of the row-centered matrix sum to the zero vector for any data matrix X.
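
The collinearity follows from a one-line calculation. In the sketch below, $y_{ij}$ denotes an element of the row-centered matrix $Y$ and $\mathbf{1}_p$ the vector of p ones; this notation is introduced here for illustration.

```latex
\begin{aligned}
y_{ij} &= x_{ij} - \bar{x}_{i\cdot}, \qquad
\bar{x}_{i\cdot} = \frac{1}{p}\sum_{j=1}^{p} x_{ij}, \\
\sum_{j=1}^{p} y_{ij} &= \sum_{j=1}^{p} x_{ij} - p\,\bar{x}_{i\cdot} = 0
\quad \text{for every } i, \\
\text{so} \quad Y\mathbf{1}_p &= \mathbf{0}.
\end{aligned}
```

A nonzero vector that $Y$ maps to zero means the p columns of $Y$ are linearly dependent, whatever the rank of $X$.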

19 3. Effects of Centering and/or Scaling: 3. In addition, it alters the correlation structure among the variables. For example, with p = 2, two positively correlated variables turn into two perfectly negatively correlated variables after row-centering. This is because the two variables in the row-centered data are $(x_{i1} - x_{i2})/2$ and $-(x_{i1} - x_{i2})/2$, each the negative of the other.

20 3. Effects of Centering and/or Scaling: [Figure: two positively correlated variables turn into two negatively correlated variables.]
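
The sign flip is easy to check numerically. The two variables below are hypothetical; any pair whose difference is not constant behaves the same way after row centering.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


# Hypothetical positively correlated variables (p = 2, n = 5).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]

# Row centering: subtract each observation's own mean from both values.
y1 = [a - (a + b) / 2 for a, b in zip(x1, x2)]
y2 = [b - (a + b) / 2 for a, b in zip(x1, x2)]

r_before = pearson(x1, x2)  # positive
r_after = pearson(y1, y2)   # -1 up to rounding, since y2 = -y1
```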

21 3. Effects of Centering and/or Scaling: 4. Centering and/or scaling the observations is not statistically meaningful, because we cannot attach a unit of measurement to the mean or the standard deviation of an observation: computing them may mean adding variables that have very different units of measurement.

22 3. Effects of Centering and/or Scaling: 5. After row scaling, the observations on the same variable have different units of measurement. Thus, the observations on the same variable have a different origin and/or a different scale.
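
A small sketch of the unit problem, using hypothetical height and weight data: changing the height unit from cm to m leaves the column z-scores unchanged, but it changes the row-centered weights, even though the weights themselves were never touched.

```python
import math


def column_z_scores(col):
    """Standardize one variable (sample SD, divisor n - 1 assumed)."""
    m = sum(col) / len(col)
    sd = math.sqrt(sum((v - m) ** 2 for v in col) / (len(col) - 1))
    return [(v - m) / sd for v in col]


def row_center(X):
    """Subtract each row's mean from the entries of that row."""
    return [[x - sum(row) / len(row) for x in row] for row in X]


# Hypothetical data: the same four people, height recorded in two units.
height_cm = [150.0, 160.0, 170.0, 180.0]
height_m = [h / 100.0 for h in height_cm]
weight_kg = [50.0, 60.0, 65.0, 80.0]

# Column standardization does not depend on the unit of height.
z_cm = column_z_scores(height_cm)
z_m = column_z_scores(height_m)

# Row centering averages height and weight within a person, so the
# centered weights depend on the unit chosen for height.
rows_cm = row_center([list(r) for r in zip(height_cm, weight_kg)])
rows_m = row_center([list(r) for r in zip(height_m, weight_kg)])
centered_weight_cm = [r[1] for r in rows_cm]
centered_weight_m = [r[1] for r in rows_m]
```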

23 3. Effects of Centering and/or Scaling: Example: [content not preserved]

24 3. Effects of Centering and/or Scaling: 6. Finally, perhaps the most damaging effect of centering and/or scaling the observations is that it distorts the graphical structure of the observations in the multidimensional space and substantially alters the correlation structure among the variables.

26 4. The Main Theoretical Results: Theorem 1. Centering the rows of X: For simplicity of notation, let Y be the matrix obtained by centering the rows of X or of the column-centered X. Then the columns of Y are linearly dependent even if X and the column-centered X are of full column rank.

27 4. The Main Theoretical Results: Theorem 2. Scaling the rows of X by the L1-norm: For simplicity of notation, let Y be the matrix obtained by scaling the rows of X (or of a centered version of X) by the L1-norm. Then the rows of Y lie on the surface of a parallelogram with sides [length not preserved].

28 4. The Main Theoretical Results: Theorem 3. Scaling the rows of X by the L2-norm: For simplicity of notation, let Y be the matrix obtained by scaling the rows of X (or of a centered version of X) by the L2-norm. Then the rows of Y lie on the surface of a sphere of radius 1, centered at the origin, in p-dimensional space.
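
One consequence of Theorem 3 is that L2 scaling discards the length of every observation. The sketch below uses two hypothetical points that lie far apart in the same direction and shows that they collapse onto the same point of the unit sphere; the helper names are illustrative only.

```python
import math


def l2_scale(row):
    """Divide a row by its L2-norm so that it lands on the unit sphere."""
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm for x in row]


def distance(u, v):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical observations: same direction, very different magnitude.
a = [1.0, 1.0]
b = [10.0, 10.0]

before = distance(a, b)                     # about 12.73
after = distance(l2_scale(a), l2_scale(b))  # essentially 0
```

Distances between observations before and after scaling are therefore not comparable, which is one way the graphical structure gets distorted.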

29 4. The Main Theoretical Results: Theorem 4. Scaling the rows of X using the standard deviation: Let Y be the matrix obtained by centering and standardizing the rows of X. Then the rows of Y lie on the surface of a [dimension not preserved]-dimensional ellipsoid, centered at the origin.
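
The constraints that every centered and standardized row satisfies can be derived in a few lines. The sketch assumes the sample SD with divisor p - 1, which the slide does not state, and uses $y_{ij}$ and $s_i$ as illustrative notation.

```latex
\begin{aligned}
s_i^2 &= \frac{1}{p-1}\sum_{j=1}^{p} \left(x_{ij} - \bar{x}_{i\cdot}\right)^2, \qquad
y_{ij} = \frac{x_{ij} - \bar{x}_{i\cdot}}{s_i}, \\
\sum_{j=1}^{p} y_{ij} &= \frac{1}{s_i}\sum_{j=1}^{p} \left(x_{ij} - \bar{x}_{i\cdot}\right) = 0, \\
\sum_{j=1}^{p} y_{ij}^2 &= \frac{1}{s_i^2}\sum_{j=1}^{p} \left(x_{ij} - \bar{x}_{i\cdot}\right)^2 = p - 1 .
\end{aligned}
```

Every row of Y therefore lies on a hyperplane through the origin and at the fixed distance $\sqrt{p-1}$ from it, regardless of where the original observation was located.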

30-32 5. Illustrative Examples: Example 1. Two-Dimensional Data [figures not preserved]

33 5. Illustrative Examples: Example 2. DNA Microarrays Data: The DNA microarrays dataset consists of genome-wide expression measurements, where for each of n = 2467 genes, p = 79 measurements have been taken, resulting in a data matrix X of 2467 rows and 79 columns (The National Academy of Sciences website, www.pnas.org).

34 5. Illustrative Examples: Example 2. DNA Microarrays Data: The dataset has been analyzed by many authors, for example Schena et al. (1996), Shalon et al. (1996), Cho et al. (1998), Eisen et al. (1998), and Spellman et al. (1998). The authors scale the rows of X by the L2-norm.

35-37 5. Illustrative Examples: Example 2. DNA Microarrays Data, p = 3 [figures not preserved]

38 5. Illustrative Examples: Example 2. DNA Microarrays Data, p = 79 [figure not preserved]

39 6. Conclusions: 1. Centering and/or scaling the rows of X distorts the graphical structure of the observations in the multidimensional space and substantially alters the correlation structure among the variables.

40 6. Conclusions: 2. Accordingly, analysts who use such row centering and/or scaling should first demonstrate that the process results in a new, more appropriate structure for their questions.

42 Thank You!!

