Introduction to Statistics


1 Introduction to Statistics
Topics Nellie Hedrick

2 Topic 7 – Displaying and Describing Distribution
Center: the center of a distribution is the most important aspect to analyze.
Spread (variability, consistency): how the data are spread out is the second most important aspect.
Shape: the shape of the distribution is the third important component of data analysis.

3 Symmetric and Skewed Distributions
Figure panels: Skewed to the Left; Skewed to the Right; Symmetric, Single Peak; Symmetric, Two Peaks

4 Graphical Representations of Data Quantitative Variables
Stem plot of the data 21, 20, 40, 22, 31, 19, 25, 23, 22, 18, 10

Leaves in data order:
Stem | Leaf
1    | 980
2    | 102532
3    | 1
4    | 0

Leaves sorted:
Stem | Leaf
1    | 089
2    | 012235
3    | 1
4    | 0
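The stem plot on this slide can be reproduced with a few lines of code. The following is a minimal sketch, assuming two-digit non-negative whole numbers so that the tens digit is the stem and the ones digit is the leaf; the function name `stem_plot` is made up for illustration.

```python
from collections import defaultdict

def stem_plot(values):
    """Return the rows of a stem plot with sorted leaves.

    Assumes non-negative integers: stem = tens digit(s), leaf = ones digit.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    rows = []
    # Include every stem between the smallest and largest, even if it is empty.
    for stem in range(min(leaves), max(leaves) + 1):
        row_leaves = "".join(str(d) for d in leaves[stem])
        rows.append(f"{stem} | {row_leaves}")
    return rows

data = [21, 20, 40, 22, 31, 19, 25, 23, 22, 18, 10]
rows = stem_plot(data)
for row in rows:
    print(row)
```

Sorting the values first means the leaves on each stem come out in increasing order, which matches the sorted version of the plot.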

5 Activity 7-5 Exercise 7-10 Exercise 7-21

6 Definition
Side-by-side stem plot: a common set of stems is placed in the middle of the display, with leaves branching out to the left and to the right. The convention is to order the leaves from the middle outward, from least to greatest.
Histogram: a graphical display similar to a dot plot or stem plot, and more feasible with larger data sets. To construct one, divide the range of the data into subintervals (bins) of equal length and count the number (frequency) of observational units in each subinterval. The bar heights represent the frequencies or proportions (relative frequencies) of observational units in each subinterval.
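The histogram construction described above (equal-length bins, then counting) can be sketched as follows. This is an illustrative sketch only: the helper `histogram_counts`, the bin width of 10, and the reuse of the stem plot data are all assumptions, not part of the original slides.

```python
def histogram_counts(values, start, width, n_bins):
    """Count how many values fall in each of n_bins equal-width bins.

    Bin i covers the half-open interval [start + i*width, start + (i+1)*width).
    Values outside all bins are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts

data = [21, 20, 40, 22, 31, 19, 25, 23, 22, 18, 10]
counts = histogram_counts(data, start=10, width=10, n_bins=4)
# Relative frequency = count divided by the number of observational units.
relative = [c / len(data) for c in counts]

for i, (c, r) in enumerate(zip(counts, relative)):
    low = 10 + 10 * i
    print(f"[{low}, {low + 10}): {'#' * c}  frequency={c}  proportion={r:.2f}")
```

With bins this wide the text histogram has the same outline as the stem plot of the same data, which is one way to see that the two displays are closely related.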

7 Wrap up, Watch out and in Brief
The direction of skew is indicated by the longer tail.
Pay attention to the units of the stem plot.
Pay attention to outliers: identify them and investigate possible explanations for their occurrence. Make sure an outlier is not a typing error.
Remember context! Your description of the data should be clear enough for anyone to read.
Remember to label your graphs.
Examine different types of graphs to see which gives the better representation.
Anticipate features of the data by considering the nature of the variable involved.

8 Topic 8 – Measures of Center
Mean: the ordinary average. It is calculated by adding all the values and dividing by the number of observational units.
Median: the value of the middle observational unit when the observational units are sorted from low to high. With an odd number of observational units, the median is at location (n+1)/2. With an even number, the median is the average of the middle two values.
Resistant: a measure whose value is relatively unaffected by the presence of outliers in a distribution. The median is resistant; the mean is not.
Mode: the value that appears most often in a distribution.

9 Describing Distributions with Numbers
Example: 20, 40, 22, 22, 21, 31, 19, 25, 23
Mean: the average
Median: the middle value, a measure of center
Mode: the most repeated value
Minimum: the smallest value in the data set
Maximum: the largest value in the data set

10 Describing Distributions with Numbers
Example: 20, 40, 22, 22, 21, 31, 19, 25, 23
Sort the data: 19, 20, 21, 22, 22, 23, 25, 31, 40
Mean: 223/9 ≈ 24.78
Median: there are 9 values; 9 + 1 is 10, and 10 divided by 2 is 5, so the median is at the 5th location (22).
Minimum = 19, Maximum = 40, Mode = 22
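The hand calculation above can be checked in code. This is a small sketch using Python's standard `statistics` module (not a tool mentioned in the slides); the variable names are chosen for illustration.

```python
from statistics import mean, median, mode

data = [20, 40, 22, 22, 21, 31, 19, 25, 23]
ordered = sorted(data)
n = len(ordered)

# For an odd number of values the median sits at location (n + 1) / 2,
# counting from 1, so subtract 1 for Python's zero-based indexing.
median_location = (n + 1) // 2
median_by_location = ordered[median_location - 1]

summary = {
    "mean": mean(data),
    "median": median(data),
    "mode": mode(data),
    "minimum": min(data),
    "maximum": max(data),
}

print("sorted:", ordered)
print("median location:", median_location)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The library median and the value found by counting to the (n + 1) / 2 location agree, which is a useful check on the location rule.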

11 Describing Distributions with Numbers
Example: 20, 40, 22, 22, 21, 31, 19, 25, 23
Mean, Median, Minimum, Maximum, Mode
TI-83: Press [STAT] [1:Edit] and enter all the data from the example into L1, pressing [ENTER] after each entry. After completing data entry, press [2nd] [QUIT], then [STAT] [CALC] [1:1-Var Stats] [2nd] [L1] [ENTER]. Use the down (or up) arrow key to view all the information.

12 Median and Mean of a Density Curve
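Figure panels: Symmetric (mean, median, and mode coincide); Skewed right; Skewed left (in the skewed curves the mean, median, and mode are marked at separate positions, with the mean pulled toward the longer tail).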

13 Wrap up and Warning
Center is a property. Mean and median are two ways to measure center; neither one is synonymous with center. Each has its own properties and strengths.
Center is only one aspect of a distribution of data. Measures of center do not tell the whole story. Other important features are spread, shape, clusters, and outliers.
The mode applies to categorical as well as quantitative variables. Otherwise, the notion of center does not make sense for categorical variables.

14 Exercise 8-7 page 161 Exercise 8-9 page 161 Exercise 8-17 page 163

15 Topic 9 – Measures of Spread
Range: the difference between the maximum and the minimum.
Lower quartile: the value at the 1/4 = 25% location.
Upper quartile: the value at the 3/4 = 75% location.
Interquartile range (IQR): the difference between the upper and lower quartiles.
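The range and IQR can be computed as in the sketch below. It assumes one common textbook rule for quartiles: the lower and upper quartiles are the medians of the lower and upper halves of the sorted data, leaving out the middle value when the count is odd. Other tools may use slightly different rules. The data are the values from the Topic 8 example, and the helper names are hypothetical.

```python
def median_of_sorted(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Lower and upper quartiles as medians of the two halves (assumed rule)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # Skip the middle value itself when n is odd.
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]
    return median_of_sorted(lower_half), median_of_sorted(upper_half)

data = [20, 40, 22, 22, 21, 31, 19, 25, 23]
data_range = max(data) - min(data)
q1, q3 = quartiles(data)
iqr = q3 - q1

print("range:", data_range)
print("lower quartile:", q1)
print("upper quartile:", q3)
print("IQR:", iqr)
```

Both the range and the IQR come out as single numbers, not intervals: here the range is 40 − 19 and the IQR is the upper quartile minus the lower quartile.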

16 Measuring the Spread
The Standard Deviation (s): the square root of the variance.
Standard deviation: a measure of the spread about the mean of a distribution. The variance is an average of the squares of the deviations of the observations from their mean (the sample version divides by n − 1), and the standard deviation is the square root of the variance.
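The relationship between variance and standard deviation can be sketched in code. This is an illustrative sketch, assuming the sample versions (dividing by n − 1, which is what the symbol s usually denotes); the function names and the reuse of the Topic 8 example data are assumptions.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / (n - 1)

def sample_sd(values):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(sample_variance(values))

data = [20, 40, 22, 22, 21, 31, 19, 25, 23]
variance = sample_variance(data)
sd = sample_sd(data)

print(f"variance: {variance:.3f}")
print(f"standard deviation: {sd:.3f}")
```

Because the deviations are squared before averaging, the variance is in squared units; taking the square root brings the standard deviation back to the units of the data.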

17 Describing Distributions with Numbers
Be aware that various software packages and calculators might use slightly different rules for calculating quartiles.
It can be tempting to regard the range and IQR as intervals of values, but each should be reported as a single number that measures the spread of the distribution.
Measures of spread apply only to quantitative variables, not categorical ones.

18 Activity 9-5 page 182 Exercise 9-12 page 190 Exercise 9-22 page 193

19 Watch out
Variability can be a tricky concept to grasp, but it is absolutely fundamental to working with data.
When looking at the distribution of a variable, make sure to focus on variability in the horizontal values (the variable) and not the heights (the frequencies).
A larger number of distinct values in a histogram does not necessarily indicate greater variability. Consider how far the values fall from the center rather than the variety of their exact numerical values.

20 Mound-Shaped Distribution – Empirical rule
68% of data fall within one standard deviation of the mean.
95% of data fall within two standard deviations of the mean.
99.7% of data fall within three standard deviations of the mean.
Figure: the rule, with the central regions labeled 68%, 95%, and 99.7%.

21 Attendance at a university's basketball games follows a normal distribution with mean µ = 8,000 and standard deviation σ = 1,000. Use the 68–95–99.7 rule and give your answer as a percent.
Estimate the percentage of games that have between 6,000 and 8,000 people in attendance.
Estimate the percentage of games that have more than 7,000 people in attendance.
Estimate the percentage of games that have less than 6,000 people in attendance.
Estimate the percentage of games that have less than 8,000 people in attendance.
Estimate the percentage of games that have less than 5,000 people in attendance.
Estimate the percentage of games that have more than 10,000 people in attendance.

22 Mound-Shaped Distribution – Empirical rule
68% of data fall within one standard deviation of the mean.
95% of data fall within two standard deviations of the mean.
99.7% of data fall within three standard deviations of the mean.
Figure: the rule, with the bands between successive standard deviations labeled, from left to right: 0.15%, 2.35%, 13.5%, 34%, 34%, 13.5%, 2.35%, 0.15%.
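The band percentages can be applied to the basketball attendance questions (mean 8,000, standard deviation 1,000). The sketch below is one possible way to organize the arithmetic: it adds the bands from the left to get the percent below each whole-number z-score and then answers each question by subtraction. The table and function names are hypothetical, and the approach only works for values that fall a whole number of standard deviations from the mean.

```python
# Percent of the distribution below each whole-number z-score, obtained by
# adding the bands 0.15, 2.35, 13.5, 34, 34, 13.5, 2.35 from the left.
PERCENT_BELOW = {-3: 0.15, -2: 2.5, -1: 16.0, 0: 50.0, 1: 84.0, 2: 97.5, 3: 99.85}

def percent_below(x, mean, sd):
    z = (x - mean) / sd
    if z != int(z) or int(z) not in PERCENT_BELOW:
        raise ValueError("the rule only covers whole z-scores from -3 to 3")
    return PERCENT_BELOW[int(z)]

def percent_above(x, mean, sd):
    return 100.0 - percent_below(x, mean, sd)

def percent_between(low, high, mean, sd):
    return percent_below(high, mean, sd) - percent_below(low, mean, sd)

MEAN, SD = 8000, 1000
answers = {
    "between 6,000 and 8,000": percent_between(6000, 8000, MEAN, SD),
    "more than 7,000": percent_above(7000, MEAN, SD),
    "less than 6,000": percent_below(6000, MEAN, SD),
    "less than 8,000": percent_below(8000, MEAN, SD),
    "less than 5,000": percent_below(5000, MEAN, SD),
    "more than 10,000": percent_above(10000, MEAN, SD),
}
for question, percent in answers.items():
    print(f"{question}: {percent:.2f}%")
```

For example, "between 6,000 and 8,000" covers the two bands from two standard deviations below the mean up to the mean, 13.5% + 34% = 47.5%.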

23 The Standard Normal Distribution
As the rule suggests, all normal distributions share a common property.
Z-score: computing a z-score is the process of standardization. If x is an observation from a distribution that has mean µ and standard deviation σ, the standardized value of x is
z = (x − µ) / σ

24 Calculating Standard Normal Z
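Example: Calculate the standard normal value (z-score) for x = 120, where the mean µ = 170 and the standard deviation σ = 30.
z = (120 − 170) / 30 ≈ −1.67
Figure: x = 120 on the normal curve with µ = 170 and σ = 30 corresponds to z = −1.67 on the standard normal curve with µ = 0 and σ = 1.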
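The standardization in this example can be written as a one-line function. A minimal sketch, with the function name `z_score` chosen for illustration:

```python
def z_score(x, mean, sd):
    """Standardize x: how many standard deviations x lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

z = z_score(120, mean=170, sd=30)
print(f"z = {z:.2f}")
```

A negative z-score means the observation lies below the mean; here 120 is about 1.67 standard deviations below 170.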

25 Normal distribution
Same mean, but different standard deviations (S2 < S1): the curve with the larger standard deviation has the larger spread.
Figure: two normal curves with the same mean, labeled S2 and S1.

26 The length of human pregnancies from conception to birth is known to be normally distributed with a mean of 266 days and a standard deviation of 16 days.
1. What proportion of pregnancies last between 250 and 282 days?
2. What proportion of pregnancies last between 232 and 282 days?
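Both proportions can be computed from the normal model directly. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later), which is an assumption about tooling and not something the slides prescribe. Note that 250 and 282 are exactly one standard deviation from the mean, so the first answer should match the 68% of the empirical rule, while 232 is 2.125 standard deviations below the mean, so the second question needs the normal curve itself or a table.

```python
from statistics import NormalDist

# Normal model for pregnancy length in days, as stated in the problem.
pregnancy = NormalDist(mu=266, sigma=16)

def proportion_between(dist, low, high):
    """Area under the normal curve between low and high."""
    return dist.cdf(high) - dist.cdf(low)

p1 = proportion_between(pregnancy, 250, 282)
p2 = proportion_between(pregnancy, 232, 282)

print(f"between 250 and 282 days: {p1:.4f}")
print(f"between 232 and 282 days: {p2:.4f}")
```

The cumulative distribution function `cdf(x)` gives the proportion of the distribution at or below x, so subtracting two values gives the proportion in between.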

27 Wrap up
In the study of variability, you see that even if two data sets have similar centers, the spread of the values might differ substantially.
The z-score is a useful tool when you are comparing two or more data sets. It serves as a ruler for measuring distances.
Variability is a property of a distribution; standard deviation and IQR are two ways to measure variability.
The standard deviation, like the mean absolute deviation, can be loosely interpreted as the typical deviation of an observation from the mean.

28 Topic 10 – More Summary Measures and Graph
Five-number summary (FNS): the FNS provides a quick and convenient description of where the four quarters of the data in a distribution fall. It consists of:
Median
Quartiles (Q1, Q3)
Extremes (min, max)
Box plot: the FNS forms the basis for a graph called a box plot. Box plots are especially useful for comparing distributions of a quantitative variable across two or three groups.

29 Measuring the Center and Spread
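Two ways to measure center and spread: the five-number summary, or the mean and standard deviation.
Choosing a summary:
Five-number summary: skewed distributions or distributions with outliers.
Mean and standard deviation: reasonably symmetric distributions without outliers.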

30 The Five-Number Summary
Figure: box plot, labeled from top to bottom: Maximum, Q3, Median, Q1, Minimum.

31 Modified box plot
Modified box plot: conveys additional information by treating outliers differently. On these graphs each outlier is marked with a special symbol, and the whisker extends only to the most extreme non-outlier. Any observation falling more than 1.5 times the IQR away from the nearer quartile is called an outlier.
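The 1.5 × IQR rule and the whisker placement for a modified box plot can be sketched as follows. This example assumes the quartile rule used by `statistics.quantiles` with its default "exclusive" method; other software may give slightly different quartiles. The function name, the dictionary keys, and the reuse of the Topic 8 example data are illustrative choices.

```python
from statistics import quantiles

def modified_box_plot_stats(values):
    """Five-number summary plus outliers and whisker ends (1.5 * IQR rule)."""
    ordered = sorted(values)
    q1, med, q3 = quantiles(ordered, n=4)  # default method is "exclusive"
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in ordered if v < low_fence or v > high_fence]
    inside = [v for v in ordered if low_fence <= v <= high_fence]
    return {
        "min": ordered[0],
        "q1": q1,
        "median": med,
        "q3": q3,
        "max": ordered[-1],
        "outliers": outliers,
        # Whiskers stop at the most extreme values that are not outliers.
        "whiskers": (inside[0], inside[-1]),
    }

data = [20, 40, 22, 22, 21, 31, 19, 25, 23]
stats = modified_box_plot_stats(data)
for name, value in stats.items():
    print(f"{name}: {value}")
```

With these data the IQR is 7.5, so the upper fence is 28 + 11.25 = 39.25; the value 40 lies beyond it and is flagged as an outlier, and the upper whisker stops at 31.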

32 Activity 10-1 page Exercise 10-22 page 217

33 Watch out and Wrap up
Box plots can be tricky to read and interpret. A box plot only shows the data divided into four pieces, each containing about 25% of the data.
Box plots and modified box plots are good tools for comparing groups. Make sure to use the same scale for each group.

