Linear transformations

Presentation transcript:

Linear transformations A linear transformation changes the original value x into a new variable xnew. xnew is given by an equation of the form $x_{\text{new}} = a + bx$. Example (i) A distance x measured in km can be expressed in miles as follows: $x_{\text{new}} = 0.62x$. (ii) A temperature x measured in degrees Fahrenheit can be converted to degrees Celsius by $x_{\text{new}} = \frac{5}{9}(x - 32) = -\frac{160}{9} + \frac{5}{9}x$. week3
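The two conversions above can be sketched in Python as instances of the same form a + bx. The function names are hypothetical, and 0.62 is the rounded miles-per-km factor used on the slide (the more precise value is about 0.6214).

```python
def linear_transform(x, a, b):
    """Return the transformed value a + b*x."""
    return a + b * x


def km_to_miles(km):
    # a = 0, b = 0.62 (rounded conversion factor, as on the slide)
    return linear_transform(km, a=0.0, b=0.62)


def fahrenheit_to_celsius(deg_f):
    # (5/9)(x - 32) written in the form a + b*x: a = -160/9, b = 5/9
    return linear_transform(deg_f, a=-160.0 / 9.0, b=5.0 / 9.0)


miles = km_to_miles(10.0)               # about 6.2 miles
boiling = fahrenheit_to_celsius(212.0)  # about 100 degrees Celsius
freezing = fahrenheit_to_celsius(32.0)  # about 0 degrees Celsius
```

Writing both conversions through one helper makes the values of a and b explicit, which is what the later slides on center and spread rely on.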

Effect of a Linear Transformation Multiplying each observation in a data set by a number b multiplies the measures of center (mean, median, and trimmed means) by b and the measures of spread (range, standard deviation, and IQR) by |b|, the absolute value of b. Adding the same number a to each observation adds a to the measures of center and to the quartiles and percentiles, but does not change the measures of spread. Linear transformations do NOT change the overall shape of a distribution (apart from a mirror reflection when b is negative).
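These rules can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module; the data values and the choices a = 5, b = -2 are made up for illustration, and the IQR uses the default quartile method of `statistics.quantiles`, which may differ slightly from the method used in the course software.

```python
import math
import statistics as st


def linear_transform(data, a, b):
    """Apply x -> a + b*x to every observation."""
    return [a + b * x for x in data]


def data_range(data):
    return max(data) - min(data)


def iqr(data):
    q1, _, q3 = st.quantiles(data, n=4)
    return q3 - q1


# Hypothetical observations, for illustration only.
x = [30.0, 32.0, 35.0, 40.0, 41.0, 45.0, 50.0, 75.0]
a, b = 5.0, -2.0
x_new = linear_transform(x, a, b)

# Measures of center are transformed exactly like the data.
center_ok = (
    math.isclose(st.mean(x_new), a + b * st.mean(x))
    and math.isclose(st.median(x_new), a + b * st.median(x))
)

# Measures of spread are multiplied by |b|; adding a has no effect.
spread_ok = (
    math.isclose(st.stdev(x_new), abs(b) * st.stdev(x))
    and math.isclose(data_range(x_new), abs(b) * data_range(x))
    and math.isclose(iqr(x_new), abs(b) * iqr(x))
)
```

A negative b is used on purpose: it shows why the spread is multiplied by |b| rather than by b.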

Measure    x            xnew
Mean       $\bar{x}$    $a + b\bar{x}$
Median     M            a + bM
Mode       Mode         a + bMode
Range      R            |b|R
IQR        IQR          |b|IQR
Stdev      s            |b|s

Example 1 A sample of 20 employees of a company was taken and their salaries were recorded. Suppose each employee receives a $300 raise in salary for the next year. State whether the following statements are true or false.
The IQR of the salaries will
(a) be unchanged
(b) increase by $300
(c) be multiplied by $300
The mean of the salaries will

Nonlinear transformations A very common nonlinear transformation in statistics is the logarithm transformation. Recall: $\ln x = \log_e x$, where e is the base of the natural logarithm, e ≈ 2.7183. If measurements on a variable x have a right-skewed distribution, the distribution of ln x will be roughly symmetric, or at least less skewed. If measurements on a variable x have a left-skewed distribution, the distribution of ln x will be even more left-skewed.
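The effect of the log transformation on a right-skewed variable can be illustrated with simulated data. This is only a sketch: the sample is drawn from a lognormal distribution (an assumption chosen because its logarithm is exactly normal), and skewness is measured with the moment coefficient, which the slides do not define.

```python
import math
import random


def skewness(data):
    """Moment coefficient of skewness: m3 / m2**1.5."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    return m3 / m2 ** 1.5


rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated right-skewed measurements (all positive, long right tail).
x = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]
ln_x = [math.log(v) for v in x]

skew_raw = skewness(x)     # clearly positive: right skewed
skew_log = skewness(ln_x)  # close to zero: roughly symmetric
```

Real data will rarely become as symmetric as this simulated example, so the logged distribution should still be checked with a histogram or boxplot.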

Example 2 - Nonlinear transformations

Standardizing and z-scores Another measure of relative standing (in addition to percentiles) that is often used is the z-score. The sample z-score makes use of the mean and standard deviation of the data set to specify the relative location of an observation. A sample z-score tells us how many standard deviations the original observation falls away from the mean of the data. The sample z-score for an observation x is $z = \frac{x - \bar{x}}{s}$, where $\bar{x}$ is the sample mean and s is the sample standard deviation.

Population z-scores If x is an observation from a population that has mean μ and standard deviation σ, the population z-score of x is given by $z = \frac{x - \mu}{\sigma}$. A z-score is often referred to as the standardized value of x. A z-score tells us how many standard deviations the original observation falls away from the mean of the distribution. Standardizing is a linear transformation that transforms the data into the standard scale of z-scores. Therefore, standardizing does not change the shape of a distribution, but it changes the mean to 0 and the standard deviation to 1.
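Standardizing a whole sample can be sketched as follows. The scores are made up for illustration, and the helper name `standardize` is hypothetical; it uses the sample mean and sample standard deviation from Python's `statistics` module.

```python
import statistics as st


def standardize(data):
    """Return the sample z-score (x - mean) / stdev of every observation."""
    xbar = st.mean(data)
    s = st.stdev(data)
    return [(x - xbar) / s for x in data]


# Hypothetical scores, for illustration only.
scores = [52.0, 60.0, 61.0, 67.0, 70.0, 74.0, 85.0]
z = standardize(scores)

z_mean = st.mean(z)    # 0, up to rounding error
z_stdev = st.stdev(z)  # 1, up to rounding error
```

In the notation of the linear transformation slides, this is the case a = -mean/stdev and b = 1/stdev, which is why the shape is unchanged.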

Example The heights of women have mean μ = 64.5 inches and standard deviation σ = 2.5 inches. The standardized height is $z = \frac{\text{height} - 64.5}{2.5}$. The standardized value (z-score) of a height of 68 inches is $z = \frac{68 - 64.5}{2.5} = 1.4$, or 1.4 std. dev. above the mean. A woman 60 inches tall has standardized height $z = \frac{60 - 64.5}{2.5} = -1.8$, or 1.8 std. dev. below the mean.
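The arithmetic in this example can be reproduced with a short function. The name `z_score` is hypothetical; the mean and standard deviation are the ones given on the slide.

```python
def z_score(x, mu, sigma):
    """Population z-score: number of standard deviations x lies from mu."""
    return (x - mu) / sigma


MU = 64.5    # mean height in inches (from the slide)
SIGMA = 2.5  # standard deviation in inches (from the slide)

z_68 = z_score(68, MU, SIGMA)  # 1.4: above the mean
z_60 = z_score(60, MU, SIGMA)  # -1.8: below the mean
```

The sign of the z-score carries the direction: positive values lie above the mean and negative values below it.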

Looking at data - relationships Two variables measured on the same individuals are associated if some values of one variable tend to occur more often with some values of the second variable than with other values of that variable. When examining the relationship between two or more variables, we should first think about the following questions: What individuals do the data describe? What variables are present? How are they measured? Which variables are quantitative and which are categorical? Is the purpose of the study simply to explore the nature of the relationship, or do we hope to show that one variable can explain variation in the other?

Response and explanatory variables A response variable measures an outcome of a study. An explanatory variable explains or causes changes in the response variable. Explanatory variables are often called independent variables and response variables are called dependent variables. The idea behind this is that response variables depend on explanatory variables. We usually call the explanatory variable x and the response variable y. The relationship between two quantitative variables is called a bivariate relationship.

Scatterplot A scatterplot is a two-dimensional plot, with one variable’s values plotted along the vertical axis and the other along the horizontal axis. A scatterplot shows the relationship between two quantitative variables measured on the same individuals. Each individual in the data appears as a point in the plot fixed by the values of both variables for that individual. Always plot the explanatory variable, if there is one, on the horizontal axis (the x axis) of a scatterplot.

Examining and interpreting Scatterplots Look for the overall pattern and for striking deviations from that pattern. The overall pattern of a scatterplot can be described by the form, direction and strength of the relationship. An important kind of deviation is an outlier, an individual value that falls outside the overall pattern. If large values of one variable are associated with large values of the second variable, we say that the two variables are “positively related” or “positively correlated”. Alternatively, if large values of one variable are associated with small values of the second variable, we say that the two variables are “negatively correlated”.
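The direction of an association can also be checked numerically. The sketch below uses the sign of the sample covariance, a technique that is not part of these slides and is offered only as one possible check alongside the scatterplot; the data are made up for illustration.

```python
def sample_covariance(xs, ys):
    """Sample covariance of paired observations (divides by n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)


def direction(xs, ys):
    """Classify the direction of association from the covariance sign."""
    cov = sample_covariance(xs, ys)
    if cov > 0:
        return "positive"
    if cov < 0:
        return "negative"
    return "none"


# Hypothetical paired data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_up = [2.1, 2.9, 4.2, 4.8, 6.0]    # large x goes with large y
y_down = [9.0, 7.5, 6.1, 4.0, 2.2]  # large x goes with small y

dir_up = direction(x, y_up)      # "positive"
dir_down = direction(x, y_down)  # "negative"
```

A single number cannot reveal form or outliers, so it does not replace looking at the plot itself.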

Example There is some evidence that drinking moderate amounts of wine helps prevent heart attacks. A data set contains information on yearly wine consumption (liters per person) and yearly deaths from heart disease (deaths per 100,000 people) in 19 developed nations. Answer the following questions. What is the explanatory variable? What is the response variable? Examine the scatterplot below.

Interpretation of the scatterplot The pattern is fairly linear with a negative slope. There are no outliers. The direction of the association is negative. This means that higher levels of wine consumption are associated with lower death rates. This does not mean there is a causal effect. There could be lurking variables. For example, higher wine consumption could be linked to higher income, which would allow better medical care. MINITAB command for scatterplot: Graph > Scatterplot

Categorical variables in scatterplots To add a categorical variable to a scatterplot, use a different colour or symbol for each category. The scatterplot below shows the relationship between the world record times for 10,000m run and the year for both men and women.

Categorical explanatory variables Scatterplots display the association between two quantitative variables. To display a relationship between a categorical explanatory variable and a quantitative response variable, make a side-by-side comparison of the distributions of the response for each category. A back-to-back stemplot compares two distributions. Side-by-side boxplots compare any number of distributions.
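Side-by-side boxplots are drawn from one five-number summary per category, so the comparison can be sketched without plotting. The data and function names below are hypothetical, and the quartiles come from the default method of `statistics.quantiles`, which may differ slightly from MINITAB's.

```python
import statistics as st
from collections import defaultdict


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for one group."""
    q1, med, q3 = st.quantiles(values, n=4)
    return (min(values), q1, med, q3, max(values))


def summarize_by_category(pairs):
    """Group (category, response) pairs and summarize each group."""
    groups = defaultdict(list)
    for category, value in pairs:
        groups[category].append(value)
    return {c: five_number_summary(v) for c, v in sorted(groups.items())}


# Hypothetical (category, response) observations, for illustration only.
observations = [
    (1, 18.0), (1, 22.0), (1, 25.0), (1, 30.0),
    (2, 24.0), (2, 28.0), (2, 33.0), (2, 41.0),
    (3, 30.0), (3, 38.0), (3, 45.0), (3, 60.0),
]
summaries = summarize_by_category(observations)
medians = [summaries[c][2] for c in summaries]
```

Comparing the medians and the quartile spreads across categories gives the same reading as scanning the boxes from left to right.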

Example We want to investigate the association between how much education a person has and his/her income. Education appears as a categorical variable: 1 = did not reach high school, 2 = some high school but no high school diploma, and so on up to 6 = postgraduate degree. Order the categories and make side-by-side boxplots for the income.

The side-by-side boxplots show a strong positive association between education and earnings.

Question (Term Test May 98) Descriptive statistics of scores of 3 groups of students are given below.

Variable  Group   N    Mean    Median  StDev
Post1     B       22   6.682    6.500  2.767
          D       22   9.773   10.000  2.724
          S       22   7.773    7.000  3.927

Using the information above, estimate the following in some reasonable way. State any assumptions that you have to make.
(a) The 98th percentile of the post1 scores using method B.
(b) The proportion of post1 scores that would be 7 or higher for those using method D.
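One reasonable approach, sketched below, is to assume that the scores in each group are approximately normal with the tabulated mean and standard deviation. That normality assumption is not given in the question and would have to be stated in an answer; the sketch uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

# Assumption: post1 scores in each group are roughly normal, with the
# mean and standard deviation taken from the table above.
group_b = NormalDist(mu=6.682, sigma=2.767)
group_d = NormalDist(mu=9.773, sigma=2.724)

# (a) 98th percentile for method B: the score with 98% of the
#     assumed normal distribution below it.
p98_b = group_b.inv_cdf(0.98)

# (b) Proportion of method D scores that are 7 or higher.
prop_d_7_or_more = 1.0 - group_d.cdf(7.0)
```

For group B the mean and median are close, which is at least consistent with a roughly symmetric distribution; with only 22 scores per group the estimates are rough.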