STAT 211 – 019 Dan Piett West Virginia University Lecture 2.

Presentation transcript:

Last Lecture: Population/Sample. Variable Types: Discrete/Continuous Numeric and Ranked/Unranked Categorical. Displaying Small Sets of Numbers: Dot Plots, Stem and Leaf, Pie Charts. Histograms: Frequency/Density, and Symmetric vs Right/Left Skewed. Measures of Center: Mean/Median.

Overview: 2.3 Measures of Dispersion; 2.5 Boxplots; 3.1 Scatterplots; 3.2 Correlation; 3.3 Regression.

Section 2.3 Measures of Dispersion

Descriptive Statistics: Describing the Data. How do we describe data? Graphs (Last Class). Measures of Center (Last Class): Mean/Median. Measures of Dispersion/Spread (This Class): Variance, Standard Deviation, IQR.

Spread of Data Example. Data 1: 8, 8, 9, 9, 10, 11, 11, 12, 12. Data 2: -30, -20, -10, 0, 10, 20, 30, 40, 50. Data 1: Mean = Median = 10. Data 2: Mean = Median = 10. Both have the same measure of center, but how do they differ? Data 2 is much more spread out.

Sample Standard Deviation: The sample standard deviation (S) is a measure of how spread out the data are. S can be any number >= 0. A larger S indicates a larger spread. The unit associated with S is the same unit as the variable. Example: Mean of 110 lb, Standard Deviation of 10 lb. The square of the sample standard deviation is called the sample variance.

Standard Deviation Example: Data 1 (8, 8, 9, 9, 10, 11, 11, 12, 12): S = 1.58. Data 2 (-30, -20, -10, 0, 10, 20, 30, 40, 50): S = 27.39. As you can see, the standard deviation of Data 2 is much larger than that of Data 1.
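The two values of S can be checked with a short Python sketch. It assumes the usual sample formula (squared deviations from the mean, divided by n - 1, then a square root); the function name sample_std is illustrative and not from the lecture.

```python
import math

def sample_std(values):
    """Sample standard deviation: squared deviations divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_devs = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_devs / (n - 1))

data1 = [8, 8, 9, 9, 10, 11, 11, 12, 12]
data2 = [-30, -20, -10, 0, 10, 20, 30, 40, 50]

s1 = sample_std(data1)
s2 = sample_std(data2)
print(round(s1, 2))  # 1.58
print(round(s2, 2))  # 27.39
```

Squaring either result gives the corresponding sample variance (2.5 for Data 1 and 750 for Data 2).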

Population Variance/Standard Deviation: Much like the sample mean (xbar) estimates the population mean (mu), the sample variance (s^2) and the sample standard deviation (s) can be used to estimate the true population variance (sigma^2) and the true population standard deviation (sigma).

Linear Transformations and Changes of Scale: When a constant is added to or subtracted from every value in a data set, the mean is increased/decreased by the same amount, the median is increased/decreased by the same amount, and the standard deviation is unchanged. When every value is multiplied by a constant, the mean is multiplied by that constant, the median is multiplied by that constant, and the standard deviation is multiplied by the absolute value of that constant (it can never become negative).
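A quick way to see these rules is to transform Data 1 and compare the summaries. This is a minimal sketch using Python's standard statistics module; the constants 5 and -2 are arbitrary illustrative choices, not values from the lecture.

```python
import math
import statistics

data = [8, 8, 9, 9, 10, 11, 11, 12, 12]
shift = 5     # hypothetical constant added to every value
factor = -2   # hypothetical constant multiplying every value

shifted = [x + shift for x in data]
scaled = [factor * x for x in data]

def summary(values):
    """Return (mean, median, sample standard deviation)."""
    return (statistics.mean(values),
            statistics.median(values),
            statistics.stdev(values))

mean0, median0, sd0 = summary(data)
mean_s, median_s, sd_s = summary(shifted)
mean_m, median_m, sd_m = summary(scaled)

# Adding a constant moves the center but leaves the spread alone.
print(mean_s == mean0 + shift, median_s == median0 + shift, math.isclose(sd_s, sd0))
# Multiplying scales the center by the constant and the spread by its absolute value.
print(mean_m == factor * mean0, median_m == factor * median0,
      math.isclose(sd_m, abs(factor) * sd0))
```

A negative factor is used on purpose: the mean and median change sign, while the standard deviation stays non-negative.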

Section 2.5 Boxplots

Quartiles: Quartiles are numbers that partition the data into 4 subgroups (i.e., like the 4 quarters in a dollar). Q1: the value separating the lowest 25% of the data values from the rest. Q2, a.k.a. the median: the value separating the lowest 50% of the data values from the rest. Q3: the value separating the lowest 75% of the data values from the rest. Q4, a.k.a. the maximum: the largest data value.

Quartiles Example: You can think of Q1 as the median of the bottom half of the data and Q3 as the median of the top half of the data.

Interquartile Range (IQR): The IQR is another measure of spread, much like S. A larger IQR indicates more spread-out data. The IQR is calculated as Q3 - Q1. Example: Data 1 (8, 8, 9, 9, 10, 11, 11, 12, 12): IQR = 11.5 - 8.5 = 3. Data 2 (-30, -20, -10, 0, 10, 20, 30, 40, 50): IQR = 35 - (-15) = 50.
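The quartiles behind these IQR values can be reproduced with the "median of each half" idea from the quartiles example. The sketch below assumes that, for an odd number of values, the overall median is left out of both halves; other textbooks and software use slightly different quartile conventions, so results may differ elsewhere. The function name quartiles is illustrative.

```python
import statistics

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    half = len(ordered) // 2          # size of each half; middle value excluded if n is odd
    lower = ordered[:half]
    upper = ordered[-half:]
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))

data1 = [8, 8, 9, 9, 10, 11, 11, 12, 12]
data2 = [-30, -20, -10, 0, 10, 20, 30, 40, 50]

q1_a, med_a, q3_a = quartiles(data1)
q1_b, med_b, q3_b = quartiles(data2)
iqr_a = q3_a - q1_a
iqr_b = q3_b - q1_b
print(q1_a, q3_a, iqr_a)  # 8.5 11.5 3.0
print(q1_b, q3_b, iqr_b)  # -15.0 35.0 50.0
```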

Boxplots Boxplots are a graphical representation of the quartiles.

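Using IQR to Find Potential Outliers: One method to find potential outliers is as follows. 1. Find the IQR. 2. Add 1.5*IQR to Q3; anything larger than this value can be flagged as a potential outlier. 3. Likewise, subtract 1.5*IQR from Q1; anything smaller than this value can be flagged as a potential outlier. Example: Data 1 (8, 8, 9, 9, 10, 11, 11, 12, 12): Q1 = 8.5, Q3 = 11.5, IQR = 3, so the cutoffs are 8.5 - 4.5 = 4 and 11.5 + 4.5 = 16; no values are flagged. Data 2 (-30, -20, -10, 0, 10, 20, 30, 40, 50): Q1 = -15, Q3 = 35, IQR = 50, so the cutoffs are -15 - 75 = -90 and 35 + 75 = 110; no values are flagged.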
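The three steps above can be sketched in Python as follows. The quartiles are computed with the median-of-halves method (an assumption about the convention), and the third data set, which appends the value 25 to Data 1, is a made-up illustration so that something actually gets flagged.

```python
import statistics

def outlier_fences(values):
    """Return (lower cutoff, upper cutoff) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = statistics.median(ordered[:half])    # median of the bottom half
    q3 = statistics.median(ordered[-half:])   # median of the top half
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def potential_outliers(values):
    """Values strictly outside the two cutoffs."""
    low, high = outlier_fences(values)
    return [x for x in values if x < low or x > high]

data1 = [8, 8, 9, 9, 10, 11, 11, 12, 12]
data2 = [-30, -20, -10, 0, 10, 20, 30, 40, 50]
data3 = data1 + [25]   # hypothetical extra value, not from the lecture

print(outlier_fences(data1), potential_outliers(data1))  # (4.0, 16.0) []
print(outlier_fences(data2), potential_outliers(data2))  # (-90.0, 110.0) []
print(outlier_fences(data3), potential_outliers(data3))  # (4.5, 16.5) [25]
```

Flagged values are only potential outliers; whether to keep or drop them is a judgment call about the data.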

Section 3.1 Scatterplots

Bivariate Data: Bivariate data are data consisting of two variables measured on the same individual. Examples: Height and Weight; Classes Skipped and GPA. Bivariate data are graphed using a scatterplot.

Scatterplot Example

Section 3.2 Correlation

Pearson Correlation Coefficient: We have discussed ways to describe data of one variable. This section will discuss how to describe two variables measured on the same individual together. The correlation coefficient, r, is a measure of the strength of a linear (straight line) relationship between bivariate data. (You will not need to know the formula for r.) To say two variables are correlated is to say that an increase/decrease in one corresponds to an increase/decrease in the other.

More on r: r can take on values between -1 and 1. The strength of the correlation depends on how close r is to the extreme values of -1 or 1. For example, r = -.78 is a stronger correlation than r = .50. There are three types of correlation: Positive, Negative, and No Correlation.
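Although the formula for r is not required for this course, a short Python sketch may help show where the sign and size of r come from. It uses the standard Pearson formula; the x and y values are invented for illustration and do not come from the lecture.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]                 # hypothetical data
y_up = [2, 4, 5, 4, 5]              # tends to rise with x
y_down = [-v for v in y_up]         # same pattern flipped, so it tends to fall with x

r_up = pearson_r(x, y_up)
r_down = pearson_r(x, y_down)
print(round(r_up, 2))    # 0.77
print(round(r_down, 2))  # -0.77
```

Both relationships are equally strong because the two values of r are the same distance from 0; only the direction differs.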

Positive Correlation: Positive correlation exists when r is between 0 and 1. The closer r is to 1, the stronger the relationship. This implies that as one of the variables increases, the other one tends to increase as well. Examples: Height and Weight, Temperature and Ice Cream Sales.

Negative Correlation: Negative correlation exists when r is between -1 and 0. The closer r is to -1, the stronger the relationship. This implies that as one of the variables increases, the other one tends to decrease. Example: Temperature and Hot Chocolate Sales.

No Correlation: No correlation exists when r is approximately 0. This implies that there is no linear relationship: as one of the variables increases, the other one does not tend to increase or decrease. Example: Temperature and Cookie Sales.

Interpretation of r: Although we may find that two variables are correlated, this does not necessarily mean that there is a causal relationship. Example: High school teachers who are paid less tend to have students who do better on the SATs than teachers who are paid more. It has been found that there is a negative correlation between teacher salary and students' SAT scores. Therefore we should pay our teachers less so students score higher. Clearly this is not a causal relationship. There is likely a third variable that explains this. One possibility may be the age of the teacher.

Section 3.3 Regression

Regression Intro: Once we have decided that two variables are correlated, we can use the value of one of the variables, "x", to predict the value of the other variable, "y". Examples: Use height (x) to predict weight (y). Use temperature (x) to predict ice cream sales (y).
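As a rough illustration of using x to predict y, the sketch below fits a straight line of the form predicted y = intercept + slope * x by ordinary least squares, which is the usual method for simple regression (assumed here, since the transcript does not show the formulas). The data values are invented, and the function name fit_line is illustrative.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for predicting y from x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

x = [1, 2, 3, 4, 5]   # hypothetical explanatory values
y = [2, 4, 5, 4, 5]   # hypothetical response values

intercept, slope = fit_line(x, y)
predicted_at_6 = intercept + slope * 6   # prediction for a new x value
print(round(intercept, 2), round(slope, 2), round(predicted_at_6, 2))  # 2.2 0.6 5.8
```

The slope says how much the predicted y changes for each one-unit increase in x; the intercept is the predicted y when x is 0.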

Regression Equation

Calculating a Regression Equation Given the slope and intercept

Plotting a Regression Line

Notes on Regression Lines

Residuals: A residual is the signed vertical distance between a point (observed y-value) and the regression line (predicted y-value). Formula: Observed Value - Predicted Value. Using the Cholesterol Example: For TV Hours = 3, our predicted value was 212.2. The actual value on the graph is 220. The residual for this particular point is 220 - 212.2 = 7.8. A residual may be positive or negative. The interpretation is that the observed y-value is 7.8 units larger than the predicted y-value for TV Hours = 3.
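The residual calculation is a single subtraction, sketched below in Python. The predicted value of 212.2 is the value implied by the observed 220 and the stated residual of 7.8; the variable names are illustrative.

```python
def residual(observed, predicted):
    """Residual = observed value minus predicted value."""
    return observed - predicted

observed_cholesterol = 220      # actual value read from the graph at TV Hours = 3
predicted_cholesterol = 212.2   # value implied by the stated residual of 7.8

res = residual(observed_cholesterol, predicted_cholesterol)
print(round(res, 1))  # 7.8

# A point below the line gives a negative residual (hypothetical observed value).
print(round(residual(205, predicted_cholesterol), 1))  # -7.2
```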