2 Describing Data Week 1

3 The W’s (Where do the Numbers come from?) Who: Who was measured? By Whom: Who did the measuring? What: What was measured? Where: Where was the data measured? When: When was the measurement done? HoW: How was the data measured? Why: Why was the measurement done?

4 Always Check the W’s Anytime you see data always check the W’s. This will help spot questionable statistics. ALWAYS QUESTION DATA

5 Variables (The What) Variables are characteristics that are recorded about each individual. Categorical variables are non-numeric in nature. Quantitative variables are measurements and have units

6 Displaying and Describing Categorical Data

7 Terms Frequency table: categories and counts. Distribution: lists the frequencies of each category. Relative frequency distribution: lists the relative frequencies (proportions or percents) of each category. Contingency table: the frequencies or relative frequencies of two variables, cross-classified.

8 Terms Marginal distribution: the totals found in the margins of a contingency table; the distribution of one of the two variables on its own. Conditional distribution: the distribution of one row or column of a contingency table. Independence: two variables are independent if the conditional distribution of one variable is the same for every value of the other, and therefore the same as that variable's marginal distribution. (Huh!)
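
The terms on these two slides can be made concrete with a small sketch. The records below are invented purely for illustration (they are not from the presentation), and the variable names are hypothetical.

```python
from collections import Counter

# Hypothetical (ticket class, survived) records, invented for illustration.
records = [
    ("first", "yes"), ("first", "yes"), ("first", "no"),
    ("second", "yes"), ("second", "no"), ("second", "no"),
    ("third", "yes"), ("third", "no"), ("third", "no"), ("third", "no"),
]

# Contingency table: a count for each (row value, column value) pair.
table = Counter(records)

# Marginal totals: each variable counted on its own.
row_totals = Counter(row for row, _ in records)
col_totals = Counter(col for _, col in records)
n = len(records)

# Marginal distribution of the column variable, as proportions.
marginal = {col: col_totals[col] / n for col in col_totals}

# Conditional distribution of the column variable within each row.
conditional = {
    row: {col: table[(row, col)] / row_totals[row] for col in col_totals}
    for row in row_totals
}

def looks_independent(tol=1e-9):
    """True if every conditional distribution matches the marginal one."""
    return all(
        abs(conditional[row][col] - marginal[col]) <= tol
        for row in conditional
        for col in marginal
    )

print(marginal)
print(conditional)
print(looks_independent())
```

In this made-up data the conditional distributions differ from row to row, so the two variables do not look independent.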

9 Three Rules of Data Analysis First, make a picture!

10 Or you could

11 Why? Pictures reveal things charts don’t. Patterns can be revealed that are not readily apparent from the numbers. Pictures are the easiest way to explain the data to others.

12 To Make a Graph Make piles. Organize the data into like groups Make a frequency table Make a relative frequency table by finding the percentages
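
As a minimal sketch of these steps, the piles, frequency table, and relative frequency table can be built in a few lines. The list of colors is a hypothetical data set invented for illustration.

```python
from collections import Counter

# Hypothetical categorical data, invented for illustration.
colors = ["red", "blue", "red", "green", "blue", "red", "red", "green"]

# "Make piles": count how many observations fall in each category.
frequency = Counter(colors)

# Relative frequency table: each count as a percentage of the total.
total = sum(frequency.values())
relative = {category: 100 * count / total for category, count in frequency.items()}

for category, count in frequency.most_common():
    print(f"{category:<6} {count:>3} {relative[category]:>6.1f}%")
```

The relative frequencies should always add up to 100%, which is a quick check that no observation was dropped.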

13 Make a Graph Probably a bar chart graphing the frequencies or... A pie chart to graph the relative frequencies Beware of the area principle. Stay 2-D

14 To Make a Graph of Categorical Data Think • Check the W’s • Identify the variables • Check to see if categories overlap • Data are counts

15 To Make a Graph of Categorical Data Show • Select the appropriate graph to compare categories • Bar graph for frequencies • Pie chart for relative frequencies (percents) • A stacked bar graph can be used instead of a pie chart

16 To Make a Graph of Categorical Data Tell • Interpret the results • Describe the results in the context of the problem • Answers are sentences, not numbers

17 Displaying Quantitative Data More Graphs

18 Histograms Think: • Must be quantitative data • Want to see the distribution • Could be counts or percents

19 Stem and Leaf Plots Think • Must be quantitative data • Want to see the distribution • Usually counts • Relatively small sample size

20 Stem and Leaf Plot Show • Scale is usually vertical • Put the ‘stems’ on the vertical scale • Stems are usually the data values without the last digit • Values might be rounded first • If one stem has a lot of leaves, make dual stems and put leaves 0-4 on one and 5-9 on the other • Plot the ‘leaves’
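
A rough sketch of the construction above, assuming non-negative whole-number data and single (not dual) stems. The function name `stem_and_leaf` and the sample values are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: [leaves]} where the stem is the value without its last digit.

    Assumes non-negative integers. Empty stems are kept so gaps stay visible.
    """
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    if not plot:
        return {}
    return {stem: plot.get(stem, []) for stem in range(min(plot), max(plot) + 1)}

# Hypothetical data, invented for illustration.
data = [12, 15, 21, 23, 23, 28, 41, 45]
for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem:>2} | {''.join(str(leaf) for leaf in leaves)}")
```

Keeping the empty stem 3 in the output makes the gap between the twenties and the forties easy to see.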

21 Dot Plot Think • Must be quantitative data • Want to see the distribution • Usually counts • Relatively small sample size

22 Dot Plot Show • Scale can be vertical or horizontal • Place a dot at the appropriate location

23 Describing the Distribution Tell • Shape • How many humps? Unimodal, bimodal (maybe more than one group thrown together), or multimodal • Uniform • Symmetric • Skewed • Gaps • Clusters

24 Describing the Distribution Tell (continued) • Center • What is the middle value? • What is the middle range?

25 Describing the Distribution Tell (continued) • Spread • Range = maximum value - minimum value • Variation: how much does the data jump around?

26 Outliers Discuss any data points that do not seem to fit the overall pattern. Is there a logical explanation for them to be that different?

27 Comparing Two Distributions Compare the centers of the two distributions Compare the shapes of the two distributions Compare the spread of the two distributions Compare any extreme values (outliers) of the two distributions.

28 Time Plot Think: • Quantitative data • Looking for trends Show: • Time is the horizontal scale • Plot the data • Connect the dots • Can use a calculator

29 Describing Distributions with Numbers

30 Measurements of the Center Mean: the ‘average’. µ is the mean of a population; x̄ is the mean of a sample. Unique. Median: the middle score. Sort the data, then take the middle score or the average of the middle two scores. Unique.

31 More Center Measures Mode: the most common score • Not necessarily unique • Does not necessarily exist
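
The three measures of center can be sketched with Python's standard `statistics` module. The scores are invented for illustration; `multimode` needs Python 3.8 or later.

```python
import statistics

# Hypothetical scores, invented for illustration.
scores = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(scores)        # sum of the scores divided by how many there are
median = statistics.median(scores)    # average of the two middle scores (3 and 5)
modes = statistics.multimode(scores)  # list of the most common score(s)

print(mean, median, modes)

# The mode need not be unique: here two scores tie for most common.
print(statistics.multimode([1, 1, 2, 2]))

# When no score repeats, multimode returns every value, which is one way
# of seeing that a meaningful mode "does not necessarily exist".
print(statistics.multimode([1, 2, 3]))
```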

32 Finding Quartiles Sort the data Find the median The 1st quartile (25% mark) is the median of the smaller half of the data The 3rd quartile (75% mark) is the median of the larger half of the data

33 The Five Number Summary The minimum data point The 1st quartile The median The 3rd quartile The largest data point
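
A sketch of the quartile procedure and the five number summary follows. The slides do not say what to do with the median when the number of data points is odd; this version assumes it is left out of both halves, which is one common textbook convention. The function name and data are hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves method.

    Assumption: for an odd number of points, the median itself is excluded
    from both the lower and the upper half.
    """
    values = sorted(data)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    half = n // 2
    lower = values[:half]           # the smaller half of the data
    upper = values[half + n % 2:]   # the larger half (skips the median if n is odd)
    return (
        values[0],
        statistics.median(lower),
        statistics.median(values),
        statistics.median(upper),
        values[-1],
    )

# Hypothetical data, invented for illustration.
print(five_number_summary([1, 3, 4, 6, 7, 8, 10, 12, 15]))
```

Software packages often use other quartile conventions, so results can differ slightly from a hand calculation.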

34 InterQuartile Range and Outliers Outliers are data points that do not fit the pattern of the distribution. The interquartile range (IQR) is the difference 3rd quartile - 1st quartile. An outlier is a point more than one and a half times the IQR below the 1st quartile or more than one and a half times the IQR above the 3rd quartile.

35 Checking for Outliers Find the 5 number summary. Calculate the interquartile range: IQR = 3rd quartile - 1st quartile. Lower cut off point = 1st quartile - 1.5(IQR). Upper cut off point = 3rd quartile + 1.5(IQR). Check for data outside the cut off points.
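
The outlier check can be sketched as below. The quartiles are computed with the median-of-halves method, assuming the median is excluded from both halves when the count is odd; the function name and data are hypothetical.

```python
import statistics

def find_outliers(data):
    """Return (lower cutoff, upper cutoff, outliers) using the 1.5 x IQR rule."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    q1 = statistics.median(values[:half])          # median of the smaller half
    q3 = statistics.median(values[half + n % 2:])  # median of the larger half
    iqr = q3 - q1
    lower_cutoff = q1 - 1.5 * iqr
    upper_cutoff = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_cutoff or v > upper_cutoff]
    return lower_cutoff, upper_cutoff, outliers

# Hypothetical data with one suspiciously large value.
print(find_outliers([1, 3, 4, 6, 7, 8, 10, 12, 40]))
```

Here Q1 = 3.5 and Q3 = 11, so the IQR is 7.5 and the cut off points are -7.75 and 22.25; only the value 40 falls outside them.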

36 The Normal Model Density Curves and Normal Distributions

37 A Density Curve: Is always on or above the x axis Has an area of exactly 1 between the curve and the x axis Describes the overall pattern of a distribution The area under the curve above any range of values is the proportion of all the observations that fall in that range.

38 Mean vs Median The median of a density curve is the equal area point that divides the area under the curve in half The mean of a density function is the center of mass, the point where curve would balance if it were made of solid material

39 Normal Curves Bell shaped, symmetric, single-peaked. Mean = µ. Standard deviation = σ. Notation: N(µ, σ). The inflection points of the curve lie one standard deviation on either side of µ.

40 68-95-99.7 Rule About 68% of the data in a normal distribution is within one standard deviation of the mean. About 95% of the data in a normal distribution is within two standard deviations of the mean. About 99.7% of the data in a normal distribution is within three standard deviations of the mean.
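
The three percentages in the rule can be checked numerically. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `within` is hypothetical.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)

def within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard.cdf(k) - standard.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {within(k):.4f}")
# prints 0.6827, 0.9545 and 0.9973
```

The rule's figures are rounded versions of these values, which is why they are approximations rather than guarantees.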

41 Why are Normal Distributions Important? Good descriptions for many distributions of real data. Good approximation to the results of many chance outcomes. Many statistical inference procedures based on normal distributions work well for other roughly symmetric distributions.

42 Standard Normal Curve

43 Standardizing (z-score) If x is from a normal population with mean µ and standard deviation σ, then the standardized value z is the number of standard deviations x is from the mean: z = (x - µ)/σ. The unit on z is standard deviations.

44 Standard Normal Distribution A normal distribution with µ = 0 and σ = 1, N(0,1), is called a Standard Normal distribution. Z-scores are standard normal, where z = (x - µ)/σ.
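
Standardizing is a one-line calculation. The sketch below assumes a hypothetical population with mean 100 and standard deviation 15, chosen only to make the arithmetic easy.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma

# Hypothetical population: mean 100, standard deviation 15.
print(z_score(130, 100, 15))  # (130 - 100) / 15 = 2.0, two standard deviations above
print(z_score(85, 100, 15))   # (85 - 100) / 15 = -1.0, one standard deviation below
```

A negative z simply means the value sits below the mean.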

45 Standard Normal Tables Table B (pg 552) in your book, or your Standard Normal table, gives the percent of the data to the left of the z value. Find the first 2 digits of the z value in the left column, move over to the column for the third digit, and read off the area. To find the cut-off point given the area, find the closest value to the area ‘inside’ the chart. The row gives the first 2 digits and the column gives the last digit.

46 Solving a Normal Proportion State the problem in terms of a variable (say x) in the context of the problem. Draw a picture and locate the required area. Standardize the variable using z = (x - µ)/σ. Use the calculator/table and the fact that the total area under the curve = 1 to find the desired area. Answer the question.
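
These steps can be sketched in code, with `statistics.NormalDist().cdf` standing in for the table or calculator (it returns the area to the left of a z value). The population values and function names are hypothetical.

```python
from statistics import NormalDist

STANDARD = NormalDist()  # standard normal, N(0, 1)

def proportion_above(x, mu, sigma):
    """Area to the right of x: total area 1 minus the area to the left."""
    z = (x - mu) / sigma
    return 1 - STANDARD.cdf(z)

def proportion_between(low, high, mu, sigma):
    """Area between two values: left area at high minus left area at low."""
    z_low = (low - mu) / sigma
    z_high = (high - mu) / sigma
    return STANDARD.cdf(z_high) - STANDARD.cdf(z_low)

# Hypothetical population: mean 100, standard deviation 15.
print(f"{proportion_above(130, 100, 15):.4f}")        # z = 2, about 0.0228
print(f"{proportion_between(85, 115, 100, 15):.4f}")  # z from -1 to 1, about 0.6827
```

The final step on the slide still applies: turn the number into a sentence, for example "about 2.3% of values are above 130".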

47 Finding a Cutoff Given the Area State the problem in terms of a variable (say x) and area Draw a picture and shade the area Use the table to find the z value with the desired area Go z standard deviations from the mean in the correct direction. Answer the question.
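
Going the other way, from an area back to a cutoff, can be sketched with `NormalDist.inv_cdf`, which plays the role of reading the table "inside out". The function name and population values are hypothetical.

```python
from statistics import NormalDist

def cutoff_for_area(area_left, mu, sigma):
    """Value of x that has the given area (a proportion between 0 and 1) to its left."""
    z = NormalDist().inv_cdf(area_left)  # z value with that area to the left
    return mu + z * sigma                # go z standard deviations from the mean

# Hypothetical population: mean 100, standard deviation 15.
# Cutoff for the top 10%, i.e. 90% of the area to the left.
print(f"{cutoff_for_area(0.90, 100, 15):.2f}")  # z is about 1.28, cutoff about 119.22
```

If the problem gives an area to the right, subtract it from 1 first so that the area passed in is always the area to the left.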

48 Assessing Normality In order to use the previous techniques the population must be normal. To assess normality: • Construct a stem plot or histogram and see if the distribution is unimodal and roughly symmetric around the mean

