SW388R7 Data Analysis & Computers II Slide 1 Computing Transformations Transforming variables Transformations for normality Transformations for linearity.

SW388R7 Data Analysis & Computers II Slide 2 Transformations: Transforming variables to satisfy assumptions When a metric variable fails to satisfy the assumption of normality, homogeneity of variance, or linearity, we may be able to correct the deficiency by using a transformation. We will consider three transformations for normality, homogeneity of variance, and linearity: the logarithmic transformation, the square root transformation, and the inverse transformation. A fourth, the square transformation, may be useful for problems of linearity.

SW388R7 Data Analysis & Computers II Slide 3 Transformations change the measurement scale In the diagram to the right, the values 5 through 20 are plotted on the different scales used in the transformations. These scales would be used for the horizontal axis of a histogram depicting the distribution. Compared with the ordinary scale to which we are accustomed, each transformation changes the distance between the benchmark values. The logarithmic, square root, and inverse transformations all increase the distance between small values and decrease the distance between large values. This pulls the long right tail of a positively skewed distribution toward the center, reducing the skew and producing a distribution that more closely resembles a normal distribution.
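To make the spacing claim concrete, here is a small sketch in Python (standing in for SPSS) that applies each of the three normalizing transformations to the benchmark values 5, 10, 15, and 20 and measures the distance between consecutive transformed values. The names `TRANSFORMS` and `gaps` are illustrative only.

```python
import math

# Benchmark values from the diagram: equal steps of 5 on the original scale.
BENCHMARKS = [5, 10, 15, 20]

# The three normalizing transformations (inverse negated to keep the order).
TRANSFORMS = {
    "log10": math.log10,
    "sqrt": math.sqrt,
    "inverse": lambda x: -1 / x,
}


def gaps(values, f):
    """Distances between consecutive values after applying f."""
    transformed = [f(v) for v in values]
    return [b - a for a, b in zip(transformed, transformed[1:])]


for name, f in TRANSFORMS.items():
    print(name, [round(g, 3) for g in gaps(BENCHMARKS, f)])
# log10 [0.301, 0.176, 0.125]
# sqrt [0.926, 0.711, 0.599]
# inverse [0.1, 0.033, 0.017]
```

In every row the gaps shrink from left to right: equal steps on the original scale become smaller steps at the high end, which is what compresses a long right tail.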

SW388R7 Data Analysis & Computers II Slide 4 Transformations: Computing transformations in SPSS In SPSS, a transformation is obtained by computing a new variable. SPSS provides functions for the logarithmic (LG10) and square root (SQRT) transformations. The inverse transformation uses a formula that divides one by the original value for each case; the reciprocal is negated (-1 divided by the value) so that cases keep their original order. For each of these calculations, some data values may not be mathematically permissible. For example, the log of zero is not defined, division by zero is not permitted, and the square root of a negative number is an “imaginary” value. We will usually adjust the values passed to the function to make certain that these illegal operations do not occur.
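The illegal operations listed above can be reproduced in Python as a rough stand-in for SPSS. This is only an illustration of which arguments are impermissible: Python raises an exception, whereas SPSS handles such cases in its own way. The helper `try_calc` is hypothetical.

```python
import math


def try_calc(label, func):
    """Run func() and report its value, or the name of the error Python raises."""
    try:
        return f"{label}: {func()}"
    except (ValueError, ZeroDivisionError) as err:
        return f"{label}: {type(err).__name__}"


# Unadjusted arguments hit the illegal operations.
print(try_calc("log of 0", lambda: math.log10(0)))      # log of 0: ValueError
print(try_calc("1 divided by 0", lambda: 1 / 0))         # 1 divided by 0: ZeroDivisionError
print(try_calc("sqrt of -4", lambda: math.sqrt(-4)))     # sqrt of -4: ValueError

# Adding 1 to a zero value, the adjustment used in these slides, avoids the error.
x = 0
print(try_calc("log of 0 + 1", lambda: math.log10(x + 1)))  # log of 0 + 1: 0.0
```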

SW388R7 Data Analysis & Computers II Slide 5 Transformations: Two forms for computing transformations There are two forms of each of the transformations used to induce normality, depending on whether the distribution is skewed negatively (to the left) or positively (to the right). Both forms use the same SPSS functions and formula to calculate the transformations. The two forms differ in the value, or argument, passed to the functions and formula. The argument is an adjustment to the original value of the variable that makes certain all of the calculations are mathematically valid.

SW388R7 Data Analysis & Computers II Slide 6 Transformations: Functions and formulas for transformations Symbolically, if we let x stand for the argument passed to the function or formula, the calculations for the transformations are:
Logarithmic transformation: compute log = LG10(x)
Square root transformation: compute sqrt = SQRT(x)
Inverse transformation: compute inv = -1 / (x)
Square transformation: compute s2 = x * x
Keeping the argument greater than zero guarantees that all of the calculations are mathematically legitimate. (Strictly, only the logarithm and the inverse fail at zero; the square root and the square are also defined for a zero argument.)
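A minimal Python sketch of the four calculations, assuming Python's `math` module as a stand-in for the SPSS functions. The names `lg10`, `root`, `inverse`, and `square` are hypothetical; the explicit checks spell out the legal range of each argument.

```python
import math


def lg10(x):
    """Logarithmic transformation, analogous to SPSS LG10(x)."""
    if x <= 0:
        raise ValueError("the log needs an argument greater than zero")
    return math.log10(x)


def root(x):
    """Square root transformation, analogous to SPSS SQRT(x)."""
    if x < 0:
        raise ValueError("the square root needs a non-negative argument")
    return math.sqrt(x)


def inverse(x):
    """Inverse transformation -1 / (x); the minus sign preserves the order of cases."""
    if x == 0:
        raise ZeroDivisionError("the inverse is undefined at zero")
    return -1 / x


def square(x):
    """Square transformation x * x."""
    return x * x


for x in (1, 4, 10):
    print(x, lg10(x), root(x), inverse(x), square(x))
```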

SW388R7 Data Analysis & Computers II Slide 7 Transformations: Transformation of positively skewed variables For positively skewed variables, the argument is an adjustment to the original value based on the minimum value for the variable. If the minimum value for a variable is zero, the adjustment adds one to each value, e.g. x + 1. If the minimum value is a negative number (e.g., -6), the adjustment adds the absolute value of the minimum (e.g., 6) plus one, e.g. x + 6 + 1, which equals x + 7.
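The rule above can be sketched as a small Python helper that returns the constant to add to x. The function name is hypothetical, and one case is an assumption: the slide does not say what to do when the minimum is already positive, so the sketch adds nothing then.

```python
def positive_skew_constant(values):
    """Constant to add to x before LG10, SQRT, or -1/(...) for a positively skewed variable.

    Slide rule: minimum 0 -> add 1; negative minimum -> add |minimum| + 1.
    ASSUMPTION: a positive minimum needs no constant (not covered by the slide).
    """
    minimum = min(values)
    if minimum <= 0:
        return abs(minimum) + 1
    return 0


print(positive_skew_constant([0, 2, 9]))    # 1, so the argument is x + 1
print(positive_skew_constant([-6, -1, 4]))  # 7, so the argument is x + 7
```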

SW388R7 Data Analysis & Computers II Slide 8 Transformations: Example of positively skewed variable Suppose our dataset contains the number of books read (books) for 5 subjects: 1, 3, 0, 5, and 2, and the distribution is positively skewed. The minimum value for the variable books is 0, so the adjustment for each case is books + 1. The transformations would be calculated as follows:
Compute logBooks = LG10(books + 1)
Compute sqrBooks = SQRT(books + 1)
Compute invBooks = -1 / (books + 1)
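The same three Compute statements can be mirrored in Python as a sketch; the lists `log_books`, `sqr_books`, and `inv_books` are illustrative counterparts of the SPSS variables logBooks, sqrBooks, and invBooks.

```python
import math

books = [1, 3, 0, 5, 2]
shift = 1  # minimum of books is 0, so the argument is books + 1

log_books = [math.log10(b + shift) for b in books]  # LG10(books + 1)
sqr_books = [math.sqrt(b + shift) for b in books]   # SQRT(books + 1)
inv_books = [-1 / (b + shift) for b in books]       # -1 / (books + 1)

for row in zip(books, log_books, sqr_books, inv_books):
    print("books=%d  log=%.3f  sqrt=%.3f  inv=%.3f" % row)
```

The subject with 0 books gets a log of 0, a square root of 1, and an inverse of -1, and all three transformations keep the subjects in their original order.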

SW388R7 Data Analysis & Computers II Slide 9 Transformations: Transformation of negatively skewed variables If the distribution of a variable is negatively skewed, the adjustment reverses, or reflects, the distribution so that it becomes positively skewed. The transformations are then computed on the values in the positively skewed distribution. Reflection is computed by subtracting each value of the variable from one plus the absolute value of the maximum value. This results in a positively skewed distribution with all values greater than zero. When an analysis uses a transformation involving reflection, we must remember that reflection reverses the direction of every relationship in which the variable is involved, and adjust our interpretation of those relationships accordingly.
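A short Python sketch of reflection and of why it reverses relationships. The data are made up for illustration, and `reflect` and `pearson` are hypothetical helpers (Pearson's r is written out by hand so the sketch has no version-specific dependencies).

```python
import math


def reflect(values):
    """Subtract each value from 1 + |maximum|, as described on the slide."""
    constant = abs(max(values)) + 1
    return [constant - v for v in values]


def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical negatively skewed score and a variable that rises with it.
score = [2, 7, 8, 9, 9, 10]
other = [1, 3, 4, 5, 5, 6]

reflected = reflect(score)
print(reflected)                  # [9, 4, 3, 2, 2, 1]
print(pearson(score, other))      # positive
print(pearson(reflected, other))  # same size, opposite sign
```

Because reflection is a constant minus the original value, it leaves the strength of a linear relationship unchanged and flips only its sign.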

SW388R7 Data Analysis & Computers II Slide 10 Transformations: Example of negatively skewed variable Suppose our dataset contains the number of books read (books) for 5 subjects: 1, 3, 0, 5, and 2, and the distribution is negatively skewed. The maximum value for the variable books is 5, so the adjustment for each case is 6 - books. The transformations would be calculated as follows:
Compute logBooks = LG10(6 - books)
Compute sqrBooks = SQRT(6 - books)
Compute invBooks = -1 / (6 - books)
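A Python sketch of the reflected version, with illustrative list names standing in for the SPSS variables:

```python
import math

books = [1, 3, 0, 5, 2]
constant = abs(max(books)) + 1           # 1 + |5| = 6
reflected = [constant - b for b in books]  # 6 - books

log_books = [math.log10(r) for r in reflected]  # LG10(6 - books)
sqr_books = [math.sqrt(r) for r in reflected]   # SQRT(6 - books)
inv_books = [-1 / r for r in reflected]         # -1 / (6 - books)

print(reflected)  # [5, 3, 6, 1, 4]
```

The subject who read the most books (5) now has the smallest argument (1) and therefore the smallest transformed values, which is the reversal of direction that slide 9 warns about.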

SW388R7 Data Analysis & Computers II Slide 11 Transformations: The Square Transformation for Linearity The square transformation is computed by multiplying the value of the variable by itself. It does not matter whether the distribution is positively or negatively skewed. It does matter if the variable has negative values, since we would not be able to distinguish their squares from the squares of the corresponding positive values (e.g., the square of -4 equals the square of +4). If the variable has negative values, we add the absolute value of the minimum value to each score before squaring it.

SW388R7 Data Analysis & Computers II Slide 12 Transformations: Example of the square transformation Suppose our dataset contains change scores (chg) for 5 subjects, giving the difference between test scores at the end of a semester and test scores at mid-term: -10, 0, 10, 20, and 30. The minimum score is -10, and its absolute value is 10. The transformation would be calculated as follows:
Compute squarChg = (chg + 10) * (chg + 10)
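The square transformation with its shift can be sketched in Python as follows; `squar_chg` is an illustrative counterpart of the SPSS variable squarChg.

```python
chg = [-10, 0, 10, 20, 30]

# Without a shift, -10 and +10 would square to the same value.
print((-10) * (-10) == 10 * 10)  # True

# Shift by |minimum| only when the minimum is negative (slides 11 and 29).
shift = abs(min(chg)) if min(chg) < 0 else 0  # |-10| = 10
squar_chg = [(c + shift) * (c + shift) for c in chg]
print(squar_chg)  # [0, 100, 400, 900, 1600]
```

After the shift every case has a distinct square, so the original order of the change scores is preserved.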

SW388R7 Data Analysis & Computers II Slide 13 Transformations: Transformations for normality Both the histogram and the normality plot for Total Time Spent on the Internet (netime) indicate that the variable is not normally distributed.

SW388R7 Data Analysis & Computers II Slide 14 Transformations: Determine whether reflection is required Skewness, in the table of Descriptive Statistics, indicates whether or not reflection (reversing the values) is required in the transformation. If Skewness is positive, as it is in this problem, reflection is not required. If Skewness is negative, reflection is required.

SW388R7 Data Analysis & Computers II Slide 15 Transformations: Compute the adjustment to the argument In this problem, the minimum value is 0, so 1 will be added to each value; that is, the argument to the SPSS functions and to the formula for the inverse will be netime + 1.
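The decision made on slides 14 and 15 can be sketched in Python: compute the skewness, reflect if it is negative, otherwise shift by the minimum. The skewness formula used here is the adjusted Fisher-Pearson coefficient, which is, to my understanding, the statistic SPSS reports; only its sign matters for this decision. The `hours` data are made up, not the netime values, and both function names are hypothetical.

```python
import math


def skewness(values):
    """Adjusted Fisher-Pearson skewness: n / ((n-1)(n-2)) * sum(z**3), sample SD."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((v - mean) / sd) ** 3 for v in values)


def transformation_argument(values):
    """Describe the argument to pass to LG10, SQRT, and -1/(...)."""
    if skewness(values) < 0:
        # Negative skew: reflect by subtracting from 1 + |maximum|.
        return f"{abs(max(values)) + 1} - x"
    minimum = min(values)
    if minimum <= 0:
        # Positive skew: shift so the smallest argument is 1.
        return f"x + {abs(minimum) + 1}"
    return "x"  # ASSUMPTION: already positive, so no constant is added


# Hypothetical right-skewed hours online with a minimum of 0.
hours = [0, 1, 1, 2, 2, 3, 4, 6, 10, 25]
print(skewness(hours) > 0)             # True, so no reflection
print(transformation_argument(hours))  # x + 1
```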

SW388R7 Data Analysis & Computers II Slide 16 Transformations: Computing the logarithmic transformation To compute the transformation, select the Compute… command from the Transform menu.

SW388R7 Data Analysis & Computers II Slide 17 Transformations: Specifying the transform variable name and function First, in the Target Variable text box, type a name for the log transformation variable, e.g. “lgnetime”. Second, scroll down the list of functions to find LG10, which calculates logarithms using a base of 10. (A base-10 logarithm is the power to which 10 must be raised to produce the original number.) Third, click on the up arrow button to move the highlighted function to the Numeric Expression text box.

SW388R7 Data Analysis & Computers II Slide 18 Transformations: Adding the variable name to the function First, scroll down the list of variables to locate the variable we want to transform. Click on its name so that it is highlighted. Second, click on the right arrow button. SPSS will replace the highlighted text in the function (?) with the name of the variable.

SW388R7 Data Analysis & Computers II Slide 19 Transformations: Adding the constant to the function Following the rules stated earlier for determining the constant that must be included, either to prevent mathematical errors or to perform reflection, we include the constant in the function argument. In this case, we add 1 to the netime variable, giving LG10(netime + 1). Click on the OK button to complete the compute request.

SW388R7 Data Analysis & Computers II Slide 20 Transformations: The transformed variable The transformed variable that we asked SPSS to compute is shown in the Data Editor in a new column to the right of the other variables in the dataset.

SW388R7 Data Analysis & Computers II Slide 21 Transformations: Computing the square root transformation To compute the transformation, select the Compute… command from the Transform menu.

SW388R7 Data Analysis & Computers II Slide 22 Transformations: Specifying the transform variable name and function First, in the Target Variable text box, type a name for the square root transformation variable, e.g. “sqnetime”. Second, scroll down the list of functions to find SQRT, which calculates the square root of a variable. Third, click on the up arrow button to move the highlighted function to the Numeric Expression text box.

SW388R7 Data Analysis & Computers II Slide 23 Transformations: Adding the variable name to the function First, scroll down the list of variables to locate the variable we want to transform. Click on its name so that it is highlighted. Second, click on the right arrow button. SPSS will replace the highlighted text in the function (?) with the name of the variable.

SW388R7 Data Analysis & Computers II Slide 24 Transformations: Adding the constant to the function Following the rules stated earlier for determining the constant that must be included, either to prevent mathematical errors or to perform reflection, we include the constant in the function argument. In this case, we add 1 to the netime variable, giving SQRT(netime + 1). Click on the OK button to complete the compute request.

SW388R7 Data Analysis & Computers II Slide 25 Transformations: The transformed variable The transformed variable that we asked SPSS to compute is shown in the Data Editor in a new column to the right of the other variables in the dataset.

SW388R7 Data Analysis & Computers II Slide 26 Transformations: Computing the inverse transformation To compute the transformation, select the Compute… command from the Transform menu.

SW388R7 Data Analysis & Computers II Slide 27 Transformations: Specifying the transform variable name and formula First, in the Target Variable text box, type a name for the inverse transformation variable, e.g. “innetime”. Second, because there is no function for computing the inverse, we type the formula, -1 / (netime + 1), directly into the Numeric Expression text box. Third, click on the OK button to complete the compute request.

SW388R7 Data Analysis & Computers II Slide 28 Transformations: The transformed variable The transformed variable that we asked SPSS to compute is shown in the Data Editor in a new column to the right of the other variables in the dataset.

SW388R7 Data Analysis & Computers II Slide 29 Transformations: Adjustment to the argument for the square transformation The adjustment to the argument for the square transformation is different. It is mathematically correct to square a value of zero; what we need to avoid are negative numbers, since the square of a negative number equals the square of the corresponding positive number. In this problem, the minimum value is 0, so no adjustment is needed for computing the square. If the minimum were a number less than zero, we would add the absolute value of the minimum (dropping the sign) to the variable.

SW388R7 Data Analysis & Computers II Slide 30 Transformations: Computing the square transformation To compute the transformation, select the Compute… command from the Transform menu.

SW388R7 Data Analysis & Computers II Slide 31 Transformations: Specifying the transform variable name and formula First, in the Target Variable text box, type a name for the square transformation variable, e.g. “s2netime”. Second, because there is no function for computing the square, we type the formula, netime * netime, directly into the Numeric Expression text box. Third, click on the OK button to complete the compute request.

SW388R7 Data Analysis & Computers II Slide 32 Transformations: The transformed variable The transformed variable that we asked SPSS to compute is shown in the Data Editor in a new column to the right of the other variables in the dataset.
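The four Compute requests walked through above can be summarized in a Python sketch that appends new variables to each case, loosely like Transform > Compute. The rows and their netime values are made up, and the `compute` helper is hypothetical, not an SPSS API.

```python
import math

# Hypothetical cases standing in for the Data Editor; netime values are made up.
dataset = [
    {"id": 1, "netime": 0},
    {"id": 2, "netime": 4},
    {"id": 3, "netime": 19},
]


def compute(rows, target, expression):
    """Add a new variable to every case.

    Python dicts keep insertion order, so each new variable becomes the
    rightmost column, as in the Data Editor.
    """
    for row in rows:
        row[target] = expression(row)


compute(dataset, "lgnetime", lambda r: math.log10(r["netime"] + 1))
compute(dataset, "sqnetime", lambda r: math.sqrt(r["netime"] + 1))
compute(dataset, "innetime", lambda r: -1 / (r["netime"] + 1))
compute(dataset, "s2netime", lambda r: r["netime"] * r["netime"])  # minimum 0: no shift

print(list(dataset[0]))
# ['id', 'netime', 'lgnetime', 'sqnetime', 'innetime', 's2netime']
```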

SW388R7 Data Analysis & Computers II Slide 33 Using the script to compute transformations When the script tests assumptions, it creates the transformations that it checks. If you want to retain the transformed variables for use in an analysis, clear the checkbox that tells the script to delete the transformed variables it created.

SW388R7 Data Analysis & Computers II Slide 34 The transformed variables The transformed variables are added to the Data Editor. The variable names attempt to identify the transformation, and the variable labels fully identify it, including the function and formula used to compute it.

SW388R7 Data Analysis & Computers II Slide 35 Which transformation to use The recommendation of which transformation to use is often summarized in a pictorial chart like the one above. In practice, it is difficult to determine which pictured distribution is most like your variable. It is often more efficient to compute all of the transformations and examine the statistical properties of each.
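The "compute all transformations and compare" strategy can be sketched in Python by computing each candidate and keeping the one whose skewness is closest to zero. This is a simplification under stated assumptions: the data are made up, skewness is the only property compared (in practice you would also look at histograms and normality plots), and the skewness formula is the adjusted Fisher-Pearson coefficient.

```python
import math


def skewness(values):
    """Adjusted Fisher-Pearson skewness based on the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((v - mean) / sd) ** 3 for v in values)


# Hypothetical positively skewed variable with a minimum of 0 (argument x + 1).
x = [0, 1, 1, 2, 2, 3, 4, 6, 10, 25]

candidates = {
    "original": list(x),
    "log": [math.log10(v + 1) for v in x],
    "sqrt": [math.sqrt(v + 1) for v in x],
    "inverse": [-1 / (v + 1) for v in x],
}

for name, values in candidates.items():
    print(f"{name:8s} skewness = {skewness(values):6.2f}")

best = min(candidates, key=lambda name: abs(skewness(candidates[name])))
print("closest to symmetric:", best)
```

For these particular values the inverse over-corrects into negative skew, which is why comparing every candidate is more reliable than guessing from a chart.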