Definitions Data: A collection of information in context.


1 Definitions Data: A collection of information in context.
DAY 1
Data: A collection of information in context.
Population: A set of individuals that we wish to describe and/or make predictions about.
Individual: A member of a population.
Variable: A characteristic recorded about each individual in a data set.

2 Types of Data
Categorical – places individuals into groups (sometimes referred to as qualitative). Examples: gender, eye color, zip code, dominant hand.
Quantitative – consists of numerical values for which it makes sense to find an average. Examples: height, weight, income, vertical leap.
Graphs for categorical data are bar graphs and pie charts. Be sure to label the graph and use proper scales.

3 Making a Bar Graph
Here is the distribution of car colors in North America in 2008.

Color          Percent of Vehicles
White          20
Black          17
Silver         17
Blue           13
Gray           12
Red            11
Beige/Brown    5
Green          3
Yellow/Gold    2

a.) What percent of vehicles had colors other than those listed?
b.) Display the data in a bar graph.
c.) Would it be appropriate to make a pie chart? If not, explain why. If it is appropriate for this data, then create one.
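Part a.) comes down to adding the listed percents and subtracting from 100. The sketch below does that with the values from the table and then prints a rough text bar graph. The variable names and the use of `#` characters for bars are illustrative choices, not part of the original exercise.

```python
# Percent of vehicles by color, copied from the table in the exercise.
color_percent = {
    "White": 20, "Black": 17, "Silver": 17, "Blue": 13, "Gray": 12,
    "Red": 11, "Beige/Brown": 5, "Green": 3, "Yellow/Gold": 2,
}

# Part a.) whatever is not covered by the listed colors is "other".
listed_total = sum(color_percent.values())
other_percent = 100 - listed_total
print(f"Listed colors: {listed_total}%  Other colors: {other_percent}%")

# Part b.) a rough text bar graph: one '#' per percentage point,
# one bar per category.
for color, percent in color_percent.items():
    print(f"{color:<12} {'#' * percent} {percent}")
```

A pie chart only makes sense when the slices account for the whole, so the `other_percent` check is also a quick way to approach part c.).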

4 Tips for Making a Bar Graph
The vertical axis should be labeled with values. The horizontal axis should be labeled with categories. Make sure values and categories are spaced evenly along their axes. There should be spaces between the individual bars.

5

6

7 Tips for Making the Pie (or Circle) Graph
Convert each percent to degrees by multiplying the percent, written as a decimal, by 360 (for example, 20% becomes 0.20 x 360 = 72 degrees). Draw a radius somewhere on the circle, start there, and work your way counter-clockwise around the circle until you get back to the radius you started with.
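As a small sketch of the percent-to-degrees conversion, assuming each percent is written as a whole number such as 20 for 20%: the function name `percent_to_degrees` is made up for this example, and the sample percents are the car color values from slide 3.

```python
def percent_to_degrees(percent):
    """Convert a percent (20 means 20%) to a central angle in degrees."""
    return percent / 100 * 360

# Car color percents from the bar graph exercise on slide 3.
percents = [20, 17, 17, 13, 12, 11, 5, 3, 2]
angles = [percent_to_degrees(p) for p in percents]

for percent, angle in zip(percents, angles):
    print(f"{percent:>2}% -> {angle:.1f} degrees")

# The slices of a complete pie chart should add up to a full circle.
print(f"Total: {sum(angles):.1f} degrees")
```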

8

9 Measuring Centers
Mean (“average”) – Add together all of the data values and divide by the number of observations:
$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\text{sum of observations}}{\text{number of observations}}$
Median – the number in the middle of the given data once it is placed in numerical order (least to greatest).
1.) Place the data in numerical order.
2.) If there is an odd number of observations, the median is the center observation in the ordered list.
3.) If there is an even number of observations, the median is the average of the two center observations in the ordered list.
Mode – the value that appears most often in the observations.

10 Find the Mean, Median and Mode for the following observations
78, 98, 48, 63, 84, 100, 95, 86, 91, 87, 48, 94, 94, 89, 95, 95, 97, 41, 65, 85
Mean = 1633 / 20 = 81.65
Median: in order, the data is 41, 48, 48, 63, 65, 78, 84, 85, 86, 87, 89, 91, 94, 94, 95, 95, 95, 97, 98, 100. The two center values are 87 and 89, so the median is (87 + 89) / 2 = 88.
Mode is 95
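The three measures of center for this data set can be checked with Python's standard `statistics` module. This is only a quick verification sketch; the variable names are illustrative.

```python
from statistics import mean, median, mode

# The 20 observations from this slide.
scores = [78, 98, 48, 63, 84, 100, 95, 86, 91, 87,
          48, 94, 94, 89, 95, 95, 97, 41, 65, 85]

data_mean = mean(scores)      # sum of observations / number of observations
data_median = median(scores)  # average of the two center values (n is even)
data_mode = mode(scores)      # most frequently occurring value

print(f"n = {len(scores)}, sum = {sum(scores)}")
print(f"mean = {data_mean}, median = {data_median}, mode = {data_mode}")
```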

11 Use the data set below to complete each table
Measure of Center    Brief Definition    Value
Mean
Median
Mode

12 Graphs for Quantitative Data
DAY 2
Dotplots – each data value is shown as a dot above its location on a number line.
Stem and Leaf Plots – the stem is the larger place value and the leaves are the smaller place values. This graph describes the distribution while still showing the actual values. It is important to include a key for the reader.

13 Describing Data Graphically
Histograms – graph of the distribution, using ranges. There are no spaces between the bars on the graph (unless no data value was observed in that range). There is no set rule for how to set up the ranges; 6 ranges is a reasonable starting point.
Box and Whisker – gives you a graph of the five number summary: Minimum, Quartile 1, Median, Quartile 3, Maximum.

14 Dotplots A dotplot is a graph where dots are used to represent individual data points. The dots are plotted above a number line. Dotplots can be used to represent frequencies for categorical or quantitative data. Dotplots can be used to see how data items compare.

15 Draw a Dotplot Draw a dotplot for the data set below. 25, 25, 20, 25, 16, 20, 25, 30, 25, 31, 26, 28, 30
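One way to check a hand-drawn dotplot is to count how many times each value occurs and stack that many dots above it. The sketch below prints a rough text version; the helper name `text_dotplot` and the three-character column layout are choices made for this example.

```python
from collections import Counter

values = [25, 25, 20, 25, 16, 20, 25, 30, 25, 31, 26, 28, 30]
counts = Counter(values)  # value -> number of dots above it

def text_dotplot(counts):
    """Return a text dotplot: stacked dots above a number line."""
    low, high = min(counts), max(counts)
    tallest = max(counts.values())
    rows = []
    # Build the plot from the top row of dots down to the number line.
    for level in range(tallest, 0, -1):
        rows.append("".join(
            " . " if counts.get(v, 0) >= level else "   "
            for v in range(low, high + 1)
        ))
    # Number line: every whole number from the minimum to the maximum.
    rows.append("".join(f"{v:^3}" for v in range(low, high + 1)))
    return "\n".join(rows)

print(text_dotplot(counts))
```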

16 Making a Stem and Leaf Plot
1.) Separate each observation into a stem, consisting of all but the final digit, and a leaf, the final digit.
2.) Write the stems in a vertical column with the smallest on top and draw a vertical line at the right of this column. (Do not skip any stems even if there is not an observation for that particular stem.)
3.) Write each leaf in the row to the right of its stem. (The leaves should be listed in increasing order out from the stems.)
4.) Provide a key that explains in context what the stems and leaves represent.

17 Make a Stem and Leaf Plot
35, 15, 20, 25, 16, 20, 9, 25, 30, 25, 31, 36, 28, 30, 9, 16, 18
Stem | Leaf
Key: 1 | 5 means 15
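The stem and leaf steps can be sketched in a few lines: split each value into a tens-digit stem and a ones-digit leaf, then list the leaves in increasing order beside each stem. The function name `stem_and_leaf` is illustrative, and the sketch assumes non-negative whole numbers.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the rows of a stem and leaf plot for whole numbers."""
    leaves_by_stem = defaultdict(list)
    # Sorting first means each stem's leaves come out in increasing order.
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # all but the final digit, final digit
        leaves_by_stem[stem].append(leaf)
    rows = []
    # Do not skip stems, even if a stem has no leaves.
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = " ".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        rows.append(f"{stem} | {leaves}")
    return rows

data = [35, 15, 20, 25, 16, 20, 9, 25, 30, 25, 31, 36, 28, 30, 9, 16, 18]
for row in stem_and_leaf(data):
    print(row)
print("Key: 1 | 5 means 15")
```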

18 Boxplots
[Figure: boxplot labeled, from left to right, Min, Q1 (Lower Quartile), Median, Q3 (Upper Quartile), Max]

19 Finding First and Third Quartile
The first quartile lies one quarter of the way up the list. (25% of the data is below the first quartile.)
The third quartile lies three quarters of the way up the list. (75% of the data is below the third quartile.)
1.) Find the median to divide the data in half.
2.) Find the “median” of the lower half; this point will be the first quartile.
3.) Find the “median” of the upper half; this point will be the third quartile.
***If the first or third quartile falls between two data points, use the same method as for the median (add them together and divide by 2).

20 Measures of Spread How much do values typically vary from the center?
Range - is the difference of the maximum and minimum value - spread of the entire data set Interquartile Range (IQR) - is the difference of the upper quartile (Q3) & the lower quartile (Q1) – spread of the middle 50% of the data

21 Find the minimum, maximum, median (Q2), lower quartile (Q1), and upper quartile (Q3) for each of the following sets of data and draw the Box and Whisker plot (or Boxplot).
1.) 32, 40, 35, 29, 14, 32
2.) 6, 1, 7, 6, 5, 5, 0, 1, 0, 8, 4
3.) 121, 143, 98, 144, 165, 118
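A minimal sketch of the five number summary, using the "median of each half" method for the quartiles (with an odd number of observations, the median itself is left out of both halves). The function name and the split into three data sets are assumptions made for this example. Calculators and software may use slightly different quartile rules, so their answers can differ.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    # With an odd count, skip the middle value when forming the upper half.
    upper = ordered[half:] if len(ordered) % 2 == 0 else ordered[half + 1:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

set_1 = [32, 40, 35, 29, 14, 32]
set_2 = [6, 1, 7, 6, 5, 5, 0, 1, 0, 8, 4]
set_3 = [121, 143, 98, 144, 165, 118]

for name, data in [("Set 1", set_1), ("Set 2", set_2), ("Set 3", set_3)]:
    low, q1, med, q3, high = five_number_summary(data)
    print(f"{name}: min={low} Q1={q1} median={med} Q3={q3} max={high} "
          f"range={high - low} IQR={q3 - q1}")
```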

22 Checking for Outliers
An outlier is a data point that is extremely far away from the rest of the data and may affect some of the measurements we take from that data. An outlier is any point that is more than 1.5 x the IQR below the first quartile or above the third quartile. Now, check the previous data for outliers.
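The 1.5 x IQR check can be sketched as follows, again finding the quartiles as the medians of the lower and upper halves. The function names are illustrative, and the three lists are the data sets from the boxplot exercise on slide 21 as read from this transcript.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[half:] if len(ordered) % 2 == 0 else ordered[half + 1:]
    return median(lower), median(upper)

def find_outliers(values):
    """Return the values more than 1.5 x IQR below Q1 or above Q3."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

set_1 = [32, 40, 35, 29, 14, 32]
set_2 = [6, 1, 7, 6, 5, 5, 0, 1, 0, 8, 4]
set_3 = [121, 143, 98, 144, 165, 118]

for name, data in [("Set 1", set_1), ("Set 2", set_2), ("Set 3", set_3)]:
    print(name, "outliers:", find_outliers(data))
```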

23 What Is a Histogram? A histogram is a bar graph that shows the distribution of data. A histogram is a bar graph that represents a frequency table. The horizontal axis represents the intervals. The vertical axis represents the frequency. The bars in a histogram have the same width and are drawn next to each other with no gaps.

24 Constructing a Histogram
Step 1 - Count the number of data points
Step 2 - Compute the range
Step 3 - Determine the number of intervals (5-12)
Step 4 - Compute the interval width
Step 5 - Determine the interval starting & ending points
Step 6 - Summarize the data in a frequency table
Step 7 - Graph the data

25 The data below shows the number of hours per week spent playing sports by a group of students.
What are the minimum, maximum, & range? Make a frequency table using intervals you decide on. Draw a histogram.
2, 7, 17, 9, 6, 13, 8, 4, 5, 12, 3, 11, 1, 15
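The seven histogram steps can be walked through in code for this data set. The choice of 6 intervals and of starting the first interval at the minimum are assumptions made for this sketch; other choices are equally valid and will give a different frequency table.

```python
import math

hours = [2, 7, 17, 9, 6, 13, 8, 4, 5, 12, 3, 11, 1, 15]

count = len(hours)                              # Step 1: number of data points
data_range = max(hours) - min(hours)            # Step 2: range
num_intervals = 6                               # Step 3: a choice between 5 and 12
width = math.ceil(data_range / num_intervals)   # Step 4: round the width up

# Step 5: each interval covers low <= value < high, starting at the minimum.
start = min(hours)
intervals = [(start + i * width, start + (i + 1) * width)
             for i in range(num_intervals)]

# Step 6: frequency table.
frequencies = [sum(1 for h in hours if low <= h < high)
               for low, high in intervals]

# Step 7: a rough text histogram, one '#' per student.
print("Hours per week   Frequency")
for (low, high), freq in zip(intervals, frequencies):
    print(f"{low:>2} to {high - 1:>2}         {'#' * freq} {freq}")
```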

26 Hours per week playing sports Frequency
Hours per week playing sports Frequency

27 Describing Distributions
Center Shape Spread Outliers

28 Center We can describe the center of the data with either the Median or the Mean. Both are useful, but when the data is skewed one way the Median is a better choice because it is more resistant to outliers.

29 Shape
“Symmetrical/Normal” (mound shaped)
“Skewed Left” (extreme low values)
“Skewed Right” (extreme high values)
“Uniform”

30 Spread Even though we are not graphing a Box and whisker we still use these two measures of spread. Range - is the difference of the maximum and minimum value - spread of the entire data set Interquartile Range (IQR) - is the difference of the upper quartile (Q3) & the lower quartile (Q1) – spread of the middle 50% of the data We can also use the Standard deviation

31 DAY 3 Regression Line A regression line is a line that describes how a response variable y (sometimes referred to as the dependent variable) changes as an explanatory variable x (sometimes referred to as the independent variable) changes. We often use a regression line to predict the value of y for a given value of x. In Algebra, it was called the “Line of Best Fit”. In statistics, we use the equation y = a + bx, where b is the slope and a is the y-intercept or predicted value of y when x = 0.

32 Finding the regression line
We will use the calculator to find the regression line.
1.) Enter the data into the STAT function of the calculator.
2.) Turn Stat Plot on (you may have to use ZoomStat to see the values on the graph).
3.) Press STAT, arrow over to CALC and down to option 8:LinReg(a+bx).
4.) Press ENTER twice.
5.) Press GRAPH.
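For readers without the calculator, the same kind of line y = a + bx can be computed with the usual least-squares formulas. This is a sketch, not the calculator's internal method, and the data below is made up purely for illustration (the points lie exactly on y = 1 + 2x); it is not the data from the example in this presentation.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    # The least-squares line always passes through (x_bar, y_bar).
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data, chosen so the answer is easy to check by eye.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

a, b = least_squares_line(xs, ys)
print(f"y = {a:.2f} + {b:.2f}x")

# Predict y for an x inside the range of the data.
x_new = 2.5
print(f"Predicted y at x = {x_new}: {a + b * x_new:.2f}")
```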

33 Use the following values to find the regression line for the data

34 Y = x Per Pupil Expenditures

35 Using Regression Line to make Predictions
We usually only make predictions for values inside of the domain (x-values) given in the problem. It is not valid to assume the pattern continues infinitely in both directions, even though the regression line does.
To make a prediction:
1.) Press TRACE.
2.) Choose option 1:Value.
3.) Enter the x-value you want to make the prediction for.
4.) Press ENTER.
You may have to adjust the window to get your value.

36 Make Predictions
For the previous example, predict the following values.
1.) Estimate the expenditures per pupil for a state with an average salary of $32,080.
2.) Estimate the expenditures per pupil for a state with an average salary of $27,250.
3.) Estimate the expenditures per pupil for a state with an average salary of $40,500.
4.) Estimate the average salary in a state where the per pupil expenditures are $6,500.

37 Correlation, r
The correlation, r, measures the direction and strength of the linear relationship between two quantitative variables. These values fall between -1 and 1, inclusive.
r < 0 tells us there is a negative association between the variables.
r > 0 tells us there is a positive association between the variables.
r = 0 tells us there is no linear relationship; values close to 0 indicate a very weak linear relationship.
The closer the value is to 1 or -1, the stronger the linear relationship between the variables. When the points form an exact straight line, r is either -1 or 1; this very rarely (almost never) happens in real world observations.
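The correlation can also be computed directly from its definition. The sketch below uses the standard Pearson formula; the function name and all three small data sets are hypothetical, chosen only to show a perfect positive line, a perfect negative line, and a scattered positive association.

```python
from math import sqrt

def correlation(xs, ys):
    """Return the Pearson correlation r between two lists of numbers."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]

r_line_up = correlation(xs, [3, 5, 7, 9, 11])    # exact line, positive slope
r_line_down = correlation(xs, [11, 9, 7, 5, 3])  # exact line, negative slope
r_scattered = correlation(xs, [2, 4, 5, 4, 5])   # positive but not exact

print(f"exact increasing line: r = {r_line_up:.4f}")
print(f"exact decreasing line: r = {r_line_down:.4f}")
print(f"scattered points:      r = {r_scattered:.4f}")
```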

38 Finding the Correlation Coefficient, r
The calculator will find this value.
1.) Press 2nd and 0 to pull up the catalog.
2.) Arrow down until you see DiagnosticOn.
3.) Press ENTER twice.
4.) Find LinReg again; this time it should give you values for r and r².
[Figure: scale of correlation strength with the labels Very Strong, Strong, Moderate, and Weak]
Find the correlation coefficient for our example and interpret its strength.
r = 0.8582. These two variables have a strong positive correlation.

39 Standard Deviation is another measure of spread
DAY 6
Standard deviation is another measure of spread. It is the typical amount that a data value will vary from the mean.
The larger the standard deviation, the more "spread out" the data set (the data points are far from the mean).
The smaller the standard deviation, the less "spread out" the data set (the data points are clustered closely around the mean).

40 Calculating Standard Deviation
Step 1 – Find the mean of the data.
Step 2 – Subtract the mean from each value.
Step 3 – Square each of these values.
Step 4 – Find the sum of the squares.
Step 5 – Divide the sum by the number of data items minus 1. The result is called the “variance”.
Step 6 – Take the square root of this number.

41 Standard Deviation Find the standard deviation for this set of data.
3, 5, 5, 7, 8, 9, 9, 10
mean = 7

Value    Deviation from mean    Squared Deviation
3        3 – 7 = -4             16
5        5 – 7 = -2             4
5        5 – 7 = -2             4
7        7 – 7 = 0              0
8        8 – 7 = 1              1
9        9 – 7 = 2              4
9        9 – 7 = 2              4
10       10 – 7 = 3             9

sum of squared deviations = 42
variance = 42 / (8 – 1) = 6
standard deviation = √6 = 2.45
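The six steps for the standard deviation translate almost line for line into code. The function name and the choice to return the intermediate values are illustrative; the calculation is the sample standard deviation (dividing by n - 1).

```python
from math import sqrt

def standard_deviation_steps(values):
    """Follow the six steps; return (mean, sum of squares, variance, sd)."""
    n = len(values)
    mean = sum(values) / n                       # Step 1: find the mean
    deviations = [v - mean for v in values]      # Step 2: subtract the mean
    squared = [d ** 2 for d in deviations]       # Step 3: square each deviation
    sum_of_squares = sum(squared)                # Step 4: add the squares
    variance = sum_of_squares / (n - 1)          # Step 5: divide by n - 1
    return mean, sum_of_squares, variance, sqrt(variance)  # Step 6: square root

data = [3, 5, 5, 7, 8, 9, 9, 10]
mean, sum_of_squares, variance, sd = standard_deviation_steps(data)

print(f"mean = {mean}")
print(f"sum of squared deviations = {sum_of_squares}")
print(f"variance = {variance}")
print(f"standard deviation = {sd:.2f}")
```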

42 Find the Standard Deviation for the Following Data
Remember to find the mean first.
Example #1: 1, 3, 4, 4, 4, 5, 7, 8, 9
Standard deviation = 2.55
Example #2: 338, 318, 353, 313, 318, 326, 307, 317
Standard deviation = 14.98
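Hand calculations like these can be checked with `statistics.stdev` from Python's standard library, which computes the sample standard deviation (dividing by n - 1, as in the steps on the previous slide). The variable names are illustrative.

```python
from statistics import mean, stdev

example_1 = [1, 3, 4, 4, 4, 5, 7, 8, 9]
example_2 = [338, 318, 353, 313, 318, 326, 307, 317]

for name, data in [("Example #1", example_1), ("Example #2", example_2)]:
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    print(f"{name}: mean = {mean(data):.2f}, "
          f"standard deviation = {stdev(data):.2f}")
```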

43 “Normal Distribution”
A normal distribution is a distribution where the mean is at the middle of a symmetric distribution of the observations.
Normal distributions are good descriptions for some distributions of real data.
Normal distributions are good approximations to many kinds of data.
The total area under the curve equals 1.
“All models are wrong, but some are useful.” (George Box)

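44 Empirical Rule
68% of the data will be within +/- 1 standard deviation of the mean.
95% of the data will be within +/- 2 standard deviations of the mean.
99.7% of the data will be within +/- 3 standard deviations of the mean.
[Figure: Normal curve marked at each standard deviation, with 13.5% labeled between 1 and 2 standard deviations on each side and 2% in each tail]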
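The percentages in the Empirical Rule are rounded areas under the Normal curve. As a quick check, `statistics.NormalDist` from the standard library can compute those areas; the helper name `share_within` is made up for this sketch.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a Normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within +/- {k} standard deviation(s): {share_within(k):.2%}")

# Area between 1 and 2 standard deviations above the mean (one side only).
one_to_two = standard_normal.cdf(2) - standard_normal.cdf(1)
print(f"between 1 and 2 standard deviations above the mean: {one_to_two:.2%}")
```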

45 Making a “Normal Curve”
We need two pieces of information to create the Normal curve. The mean for the observations The standard deviation for the observations. For the following set of observations, find the mean and the standard deviation 2 7 16 9 6 13 8 4 5 12 3 11 1 15

46 Using the Calculator
1.) Go to the STAT function and enter the values into the L1 column.
2.) Press the STAT button, go to CALC and choose 1:1-Var Stats.
3.) Press ENTER twice.
4.) x̄ is the mean.
5.) Sx is the standard deviation.

47 Mean = 7.88 Standard deviation = 4.53
[Figure: Normal curve centered at the mean, 7.88, with the percent of data labeled in each standard deviation band]

48 Mean = 7.88 Standard deviation = 4.53
[Figure: Normal curve centered at the mean, 7.88, with the percent of data labeled in each standard deviation band]
1.) What percent of the points are more than one standard deviation above the mean?
2.) What percent of the points are more than one standard deviation away from the mean?
3.) What percent of the points are less than one standard deviation below the mean?

50 What if the values are not exactly on one of the standard deviations?
DAY 7
1.) What percent of the values are below 9?
2.) What percent of the values are above 4?
3.) What percent of the values are between 10 and 2?
First, we must standardize the data by changing the data to z-scores. Once the scores are all standardized, the graph will have 0 at the mean and the markings will be in units of standard deviations.
[Figure: Normal curve centered at the mean, 7.88, with the percent of data labeled in each standard deviation band]

51 We standardize the scores
To measure percentiles
To compare data from two different distributions (it is easy to compare data measured differently, such as centimeters vs inches, since both sets are standardized)
To find information about the values that fall between the standard deviations

52 Reading Table A
1.) Convert the actual value to a z-score.
2.) Look down the left column to find the ones and tenths digits of your z-score.
3.) Look across the top row to find the hundredths digit of your z-score.
4.) Follow the row and column to the decimal where they meet, which is your probability.
Find the probability for the given z-scores:
1.) 0.56
2.) 3.10
3.) 1.93
4.) 2.04
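Table A lists the area under the standard Normal curve to the left of each z-score, which is the cumulative distribution function. Assuming that is the table meant here, a lookup can be imitated with `statistics.NormalDist`; the helper name `table_a` is hypothetical, and the result is rounded to four decimals as in a printed table.

```python
from statistics import NormalDist

def table_a(z):
    """Area to the left of z under the standard Normal curve, to 4 decimals."""
    return round(NormalDist().cdf(z), 4)

for z in (0.56, 3.10, 1.93, 2.04):
    print(f"z = {z:.2f} -> {table_a(z):.4f}")
```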

53 1.) What percent of the values are below 9?
2.) What percent of the values are above 4?
3.) What percent of the values are between 10 and 2?
1.) First, change the data point to its standardized value:
$z = \frac{x - \mu}{S_x}$, where µ is the mean and Sx is the standard deviation
$z = \frac{9 - 7.88}{4.53} = 0.25$, so the value 9 is 0.25 standard deviations above the mean.
2.) Use Table A to find the proportion of values that are less than or equal to 0.25.
So, this means that 0.5987 (or 59.87%) of the data falls below the value 9.
Now, answer parts 2 and 3.

54 2.) Values above 4
$z = \frac{4 - 7.88}{4.53} = -0.86$
Table A value is 0.1949, so 19.49% is BELOW 4. Subtract from 1: 1 – 0.1949 = 0.8051, which means 80.51% is above the value of 4.
3.) What percent of the values are between 10 and 2?
$z = \frac{10 - 7.88}{4.53} = 0.47$, Table A value = 0.6808
$z = \frac{2 - 7.88}{4.53} = -1.30$, Table A value = 0.0968
0.6808 – 0.0968 = 0.5840, so 58.40% of the values are between 10 and 2.
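All three questions follow the same pattern: standardize, look up the area to the left, then subtract where needed. The sketch below uses the mean 7.88 and standard deviation 4.53 from the slides, rounds each z-score to two decimals as a Table A lookup would, and uses `statistics.NormalDist` in place of the printed table. The function and variable names are illustrative.

```python
from statistics import NormalDist

MEAN = 7.88  # mean from the slides
SD = 4.53    # standard deviation from the slides

def z_score(x, mean=MEAN, sd=SD):
    """Standardize x, rounded to two decimals like a Table A lookup."""
    return round((x - mean) / sd, 2)

def proportion_below(x):
    """Area to the left of x under the Normal curve with this mean and SD."""
    return NormalDist().cdf(z_score(x))

below_9 = proportion_below(9)                          # question 1
above_4 = 1 - proportion_below(4)                      # question 2
between_2_and_10 = proportion_below(10) - proportion_below(2)  # question 3

print(f"z for 9 = {z_score(9)}, below 9: {below_9:.2%}")
print(f"z for 4 = {z_score(4)}, above 4: {above_4:.2%}")
print(f"z for 10 = {z_score(10)}, z for 2 = {z_score(2)}, "
      f"between 2 and 10: {between_2_and_10:.2%}")
```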

