Module 7 to 10: Summarizing Data Graphically & Numerically


1 Module 7 to 10: Summarizing Data Graphically & Numerically
OLI, Concepts in Statistics

2 Here’s some data… Below are two sets of data. What could each represent? What couldn’t each represent? Data Set #1: 98, 94, 92, -31, 98, 93, 95, 97, 98, -98 Data Set #2: 4.0, 6, 7.1, 7.1, 7.2, 7.4, 7.7, 7.8, 12 Each group share out

3 Data is/are… Observations that you or someone else records
But data is/must be more than just numbers; it is numbers in context ... the story behind the numbers... That’s where statistics come in... According to textbook author Dr. Robert Gould, “The goal of statistics is finding meaning in data.”

4 Data, data, data... Who collects data? What data is collected?
Why do we/they collect data?

5 Think of an interesting question that involves 1 or 2 of the following topics...
Spiral back to gathering data for surveys, polls, experiments, etc. Dr. Gould, UCLA

6 It begins with a question...
Dr. Gould, UCLA

7 Remember, Context is Key!
Always, always comment, answer, compare, contrast… whatever the case… in context! How can we find meaning if we don’t have context? The goal of statistics is finding meaning in data, so interpret: What are the objects? What was measured? What are the units of measure? Think about how we started our class tonight with the 2 data sets I gave you...

8 Two types of variables…
Can we organize data into two basic types of data, two different types of variables? How?

9 Sometimes it’s in camo…
Can you think of categorical data that looks like numerical data… but isn’t? It’s really categorical. Discuss for a minute…

10 Sometimes it’s in camo…
Can you think of categorical data that looks like numerical data… but isn’t? It’s really categorical. Discuss for a minute… Always ask yourself: does finding the mean (average) of this data make sense?

11 How many... ... contacts do you have in your phone?
Write anywhere; no need to organize in any special way. If you are male, please use a blue marker. If you are female, please use a black marker.

12 Observations? One minute to talk to the person next to you about one observation you can make about our data; be prepared to share out your observation

13 Observations... First, it’s always helpful to ...
Second, and probably more importantly, it’s always helpful to ...

14 Graphical representations...

15 Graphical representations...
Talk to the person next to you for 2 minutes. What type of graphical representation would you choose to best represent this data and why (you do not actually have to create the graphical representation at this time). Be prepared to explain/justify your reasoning/your choice. Share out.

16 Graphical representations for numeric (or quantitative) data include...
Dot plots Stem (and leaf) plots Histograms Box plots (later...) (and much later) ... Density curves, scatter plots, least-squares regression lines, Normal probability plots, etc. Why didn’t I list pie charts or bar charts/graphs?

17 No matter what... We always want to create a graphical representation; visuals help us process information and identify trends more easily. We always label & scale our graphical representations. We always use technology when available (no need to create graphical representations by hand).

18 Let’s create some graphical representations using our class data...
Dot plot... What’s good about dot plots? What’s not so good? Histogram... What’s good about histograms? What’s not so good?

19 Stem (and leaf) plots... What’s good about stem plots? What’s not so good?
Box Plots... What’s good about box plots? What’s not so good? Will learn much more about box plots in a little bit...

20 Let’s practice using big data sets...
Go to my website, click on COC Math 140 Survey Data spreadsheet. Find the column ‘How much do you weigh (in pounds).’ Copy and paste into a column in Stat Crunch. Create a histogram, a stem plot, a dot plot, and a box plot of this data. Be sure to label your graphical representation. Looking at your graphical representation, what can you say about the distribution/the data? Be prepared to share out one thing you observe in the graph. Review... What are the strengths and/or weaknesses of each graphical representation? CI’s, Hyp testing, sampling distributions, etc. What’s likely? Unlikely?

21 Note on histograms... Frequency vs. Relative Frequency
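A minimal Python sketch of the difference, using Data Set #2 from the start of class. The bin width of 2 and the helper name `bin_start` are assumptions chosen for illustration; Stat Crunch chooses its own bins.

```python
from collections import Counter
import math

# Data Set #2 from the opening activity
data = [4.0, 6, 7.1, 7.1, 7.2, 7.4, 7.7, 7.8, 12]
BIN_WIDTH = 2  # assumed bin width, for illustration only


def bin_start(x, width=BIN_WIDTH):
    """Left edge of the bin [start, start + width) that contains x."""
    return math.floor(x / width) * width


# Frequency: how many observations land in each bin
frequency = Counter(bin_start(x) for x in data)

# Relative frequency: the share of all observations in each bin
relative_frequency = {b: count / len(data) for b, count in frequency.items()}

# Only bins that contain at least one observation are listed
for b in sorted(frequency):
    print(f"[{b}, {b + BIN_WIDTH}): frequency = {frequency[b]}, "
          f"relative frequency = {relative_frequency[b]:.2f}")
```

Both versions of the histogram have the same shape; only the vertical scale changes, from counts to proportions that add up to 1.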

22 Let’s look at Graphical representation of ‘weights’...
No matter which graphical representation you created with this data set, how did we describe the graphical representations? What types of characteristics did we consider when trying to describe the graph of this data?

23 SOCS...
S – Shape: Symmetric? Skewed? Unimodal, bimodal, trimodal, multimodal? Gaps?
O – Outlier(s): Is/are there unusually large or small values that are “away” from the majority of the rest of the data?
C – Center: What is the “typical*” value of the distribution/data?
S – Spread: Typically/on average*, how far apart or close together is the data/distribution?
* Different types of ‘averages’ and ‘typical’. Will discuss further and in detail soon.

24 Let’s describe our data using SOCS...
Practice: Let’s look at our contacts data with a histogram, dot plot, stem plot, and box plot; & describe the distribution using SOCS. What type of values would you consider ‘likely?’ Unlikely? If we consider ourselves a representative sample of typical COC students, what type of a statement would you feel confident in making (based on this data) about ALL COC students regarding cell phone contacts? Now let’s look at the ‘weights’ data and describe the distribution using SOCS. Look at a histogram, a dot plot, etc.

25 What types of graphical representations can we use? What can we not use? Why?
Type of first pet ... or favorite social media, favorite app for cell phone, hair color, make of car you drive, marital status, etc.

26 Graphical representations for categorical (qualitative) data...
Bar (charts) graphs (caution; very different from histograms; why?)

27 Caution... Bar graphs vs. histograms...
On left is bar graph; on right is histogram Be sure you understand the difference between the two graphical representations

28 Now back to Graphical representations for categorical (qualitative) data...
Bar (Charts) Graphs Pie Charts BIG IDEA... the same... visualizing data can be helpful in observing trends Can we analyze pie charts or bar graphs with SOCS? Why or why not? Whether categorical or numerical, always good to graph your data

29 Let’s graph SOME data... Let’s go to the Math 140 data set, and choose a set of categorical data; cut and paste into Stat Crunch; create bar graph and pie chart; make observations; ask questions

30 Deception… watch for it…

31 What’s wrong?

32 Deception…

33 … fixed ... Sort of....

34 Different bin widths in histograms... Not a good thing –very deceptive

35 More deception…

36 Deception… with the data, and the graphical representation…

37 Class/Group Activity…
Form groups of 3 randomly (how would we like to do this?) First, answer the BEFORE THE ACTIVITY questions below (1 paper for the whole group): 1. Do you think men and women will have different personal distances? Why? Will the larger distances be specified by the men or the women? 2. Which group do you think will have distances that are more spread out? 3. What do you think the shape of each of the distributions will be? Each group will have a measuring tape. The first person stands (preferably in front of a wall) and imagines that she or he is at an ATM getting cash. The second student stands behind the first. The first student tells the second student how far back he or she must stand for the first student to be just barely comfortable, saying for example, “Move back a little, now move forward just a tiny bit,” and so on. When that distance is set, the third student measures the distance from the heel of the first person’s right shoe to the toe of the second person’s right shoe. That will be called the ‘personal distance.’ For each student in your group, record the gender and personal distance. Write each of these personal distances on the board. Use blue for male and black for female. Note: Be respectful of other people’s personal space. Do not make physical contact with other students during this activity.

38 Class/Group Activity…
Input data into Stat Crunch 1-2 paragraph write up which answers the question, “Do men and women have different personal distances?” Include graphs (justify your group’s choice of graph) & numerical analysis (SOCS) of data/graphs (from Stat Crunch; cut and paste) All members of group must contribute Maximum points possible: 20 project points. From Robert Gould, Introductory Statistics

39 Modules 8, 9, & 10

40 Module 7... Four Corners: Go to your corner based on if your birthday falls in the Winter, Spring, Summer, or Fall; 1 minute In your group, come to a consensus about the three most important topics we learned and list them on the board. 5 minutes.

41

42 SOCS... Shape, Outlier(s), Center, Spread
We loosely defined ‘center’ and ‘spread’ Now we will be much more specific & detailed ... And remember, always embed context Here we go  ...

43 Word association time... When I say a word, you immediately write down what you think it means; don’t think, just write. Don’t talk; don’t say anything to anyone. Ready?

44 Word association time... Average

45 Patrons in a diner...
The annual salaries of 7 patrons in a diner are listed below. Find the mean and the median using Stat Crunch. Are the mean and the median similar? Would they represent a ‘typical’ or ‘average’ customer’s salary? Should we use the mean or the median in this case? Graph the data (let’s practice a histogram; then a box plot) using Stat Crunch. What shape is the distribution? $45,000 $48,000 $52,000 $40,000 $35,000 $58,000 $46,000

46 Now, Bill Gates walks into the diner...
Find the mean and the median using Stat Crunch Are the mean and the median similar? Would both or either represent a ‘typical’ or ‘average’ customer’s salary? Should we use the mean or the median in this case? Graph the data (histogram; box plot) using Stat Crunch. What shape is the distribution? $45,000 $48,000 $52,000 $40,000 $35,000 $58,000 $46,000 $3,710,000,000
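The diner comparison can also be checked outside Stat Crunch. The following is a small Python sketch using only the salaries listed on the two slides; the variable names are illustrative.

```python
from statistics import mean, median

# Annual salaries of the 7 diner patrons (from the slide)
patrons = [45_000, 48_000, 52_000, 40_000, 35_000, 58_000, 46_000]

# The same diner after Bill Gates walks in
with_gates = patrons + [3_710_000_000]

for label, salaries in [("Before", patrons), ("After", with_gates)]:
    print(f"{label}: mean = ${mean(salaries):,.0f}, "
          f"median = ${median(salaries):,.0f}")
```

One extra value moves the mean from about $46,286 to $463,790,500, while the median only moves from $46,000 to $47,000.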

47 What’s the moral of this story?
Means are excellent measures of central tendency if the data is (fairly) symmetric. However, means are highly influenced by outlier(s). So, if the data has outlier(s), a better measure of central tendency is the median, which is not strongly influenced by outliers; this property is called ‘resistant.’ So, consider the shape of the data/distribution, then wisely choose an appropriate measure of central tendency.

48 Which measure of central tendency should we use?

49 Which is larger: mean or median?
Which should we use to describe the ‘typical’ or middle value?

50 The ‘C’ in SOCS So, when we are analyzing a numerical distribution (like looking at a histogram, stem plot, box plot, etc.), we need to wisely choose which ‘C’ to use... mean or median Generally, if symmetric use mean (or median) as a measure of central tendency; they will be similar in value (or the same) If skewed (left or right) use median as a measure of central tendency; why?

51 Measures of Spread What is the median of each of the following data sets? What is the mean of each? (4, 4, 5, 6, 6) (5, 5, 5, 5, 5) Are they the same distribution/data set? Another characteristic that is helpful in describing distributions/data sets is the measure of spread (or the typical distance from the center)
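A quick Python check of the two data sets above. Note that `stdev` from Python's `statistics` module is the sample standard deviation (it divides by n - 1); treating that as the version used in class is an assumption.

```python
from statistics import mean, median, stdev

set_a = [4, 4, 5, 6, 6]
set_b = [5, 5, 5, 5, 5]

for name, data in [("A", set_a), ("B", set_b)]:
    # Same center for both sets, but different spread
    print(f"Set {name}: mean = {mean(data)}, median = {median(data)}, "
          f"standard deviation = {stdev(data)}")
```

Both sets have a mean of 5 and a median of 5, so the center alone cannot tell them apart. Only the spread differs.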

52 Spread... The second ‘S’ in SOCS
Another characteristic that is helpful in describing distributions/data sets is the measure of spread (or the typical distance from the center) Two measures of spread that we will focus on in this course are the standard deviation & inter-quartile range

53 Standard Deviation is ... a typical distance of the observations from their mean is a number that measures how far away the typical observation is from the center of the distribution

54 Let’s play the standard deviation game...
Your team’s task: Create a data set of four whole numbers (from 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10) with the lowest standard deviation value possible Input your four numbers (again use numbers from 0 to 10 only) into Stat Crunch, then calculate the standard deviation Change a value or values until you get the lowest possible standard deviation you can. 3 minutes. Go. Now create a data set (again only from 0 to 10) with the largest possible standard deviation.
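After the teams have made their guesses, the game can be settled by brute force. This Python sketch tries every possible entry. It assumes repeated numbers are allowed and uses the sample standard deviation (dividing by n - 1); both are assumptions about the rules.

```python
from itertools import combinations_with_replacement
from statistics import stdev

# Every possible team entry: four whole numbers from 0 to 10, repeats allowed.
# Order does not affect the standard deviation, so sorted combinations suffice.
candidates = list(combinations_with_replacement(range(11), 4))

lowest = min(candidates, key=stdev)    # first entry found with the smallest spread
highest = max(candidates, key=stdev)   # entry with the largest spread

print(f"Lowest:  {lowest}, standard deviation = {stdev(lowest):.3f}")
print(f"Highest: {highest}, standard deviation = {stdev(highest):.3f}")
```

Any four identical numbers tie for the lowest value (0), while the highest value comes from pushing two numbers to each extreme.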

55 Which distribution has the largest standard deviation? Why?

56 Calculating the standard deviation...

57 Variance... Another measure of spread
Not used very often; usually, if we use a mean as a measure of central tendency, we use the standard deviation as our measure of spread. Variance is related to standard deviation: variance = (standard deviation)², and standard deviation = √variance.
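The link between variance and standard deviation is easiest to see by doing the calculation one step at a time. The sketch below uses one possible entry from the standard deviation game and the sample formula (dividing by n - 1); the choice of formula is an assumption.

```python
import math

data = [0, 0, 10, 10]  # one possible entry from the standard deviation game

n = len(data)
mean = sum(data) / n                       # step 1: find the mean
deviations = [x - mean for x in data]      # step 2: distance of each value from the mean
squared = [d ** 2 for d in deviations]     # step 3: square each distance
variance = sum(squared) / (n - 1)          # step 4: add the squares, divide by n - 1
standard_deviation = math.sqrt(variance)   # step 5: take the square root

print(f"mean = {mean}")
print(f"variance = {variance:.3f}")
print(f"standard deviation = {standard_deviation:.3f}")
```

The variance is in squared units, which is one reason the standard deviation (back in the original units) is usually the number we report.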

58 Data gathering time again...
# siblings you have on board & enter into Stat Crunch; OR we can use ‘Age in Years’ from the Math 140 data... Which do you want to do? Numerical analysis (statistical summary in Stat Crunch) and graphical representation Describe the distribution

59 Skewed? Shouldn’t use mean & standard deviation...
But we still need to describe the center and the spread of the distribution. Center: Use the median. Spread: Use the inter-quartile range (IQR) = Q3 – Q1, the space the middle 50% of the data occupy. Median & IQR are not affected by outlier(s) (resistant).

60 Range of data... Another measure of variability (used with any distribution) is range Range = maximum value – minimum value Range for our data =

61 Boxplots ...based on 5-number summary

62 Boxplots...

63 Modified boxplot – shows outlier(s)

64 Two modified boxplots...

65 What are outliers? Boxplots are the only graphical representation where we specifically define an outlier. Potential outliers are values that are more than 1.5 IQRs below Q1 or above Q3. IQR x 1.5; add that product to Q3; any value(s) beyond that point is an outlier to the right. IQR x 1.5; subtract that product from Q1; any value(s) beyond that point is an outlier to the left.

66 Go back to our data... Using Stat Crunch, calculate descriptive statistics Let’s calculate (by hand) to see if we have any outliers Q3 – Q1 = IQR IQR x 1.5; add this product to Q3; are there any values in our data set beyond this point to the right? IQR x 1.5; subtract product from Q1; are there any values in our data set beyond this point to the left? Now use Stat Crunch to create a boxplot; are our calculations confirmed with our boxplot?
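The same by-hand check can be sketched in Python. This example reuses the diner salaries (with Bill Gates) from the earlier slide rather than our class data. Quartile conventions differ between tools, so the Q1 and Q3 values here may not match Stat Crunch exactly.

```python
from statistics import quantiles

# Diner salaries including Bill Gates (from the earlier slide)
salaries = [45_000, 48_000, 52_000, 40_000, 35_000, 58_000, 46_000,
            3_710_000_000]

# quantiles(..., n=4) returns the three cut points: Q1, median, Q3
q1, med, q3 = quantiles(salaries, n=4)

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr   # anything below this is a potential outlier
upper_fence = q3 + 1.5 * iqr   # anything above this is a potential outlier

outliers = [x for x in salaries if x < lower_fence or x > upper_fence]

print(f"5-number summary: {min(salaries)}, {q1}, {med}, {q3}, {max(salaries)}")
print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence})")
print(f"Potential outliers: {outliers}")
```

With these values the only potential outlier is the $3,710,000,000 salary, which a modified boxplot would draw as a separate point.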

67 Be careful with outliers...
Are they really an outlier? Is your data correct? Was it input accurately? COC’s recent 99-year-old graduate Don’t automatically throw out an unusual piece of data; investigate

68 Be careful... one more thing...

69 Partner Practice ...

70 Your turn... In pairs, choose a set of data from the Math 140 spreadsheet that is skewed (to left or right); you probably won’t know if the data is skewed until you copy and paste into Stat Crunch and create a graph Create a box plot; print out; put your names on it Label (on the graph) the 5-number summary (with arrows pointing to each value on the graph) Analyze through SOCS (which measure of central tendency should you use? Which measure of spread should you use?); be sure you show your work to justify that a point/points are outliers Now, using the same data, create a histogram. What characteristics of the data does the histogram show that the box plot does not?

71 Classifying Summary Statistics...
1. For each of the following sample statistics, classify it as a measure of spread (variability), a measure of center (average), or a measure of position. Then write a sentence describing what the statistic tells us. a) Mean b) Standard Deviation c) Minimum d) Range e) Median f) Quartile 3 (Q3) g) Interquartile Range (IQR) h) Maximum i) Quartile 1 (Q1) j) Mode k) Variance 2. Which measure of center is the most accurate for symmetric data sets? Which is the most accurate for skewed data sets? 3. Which measure of spread is the most accurate for symmetric data sets? Which is the most accurate for skewed data sets? 4. Use Stat Crunch and the Bear data to find all of the summary statistics we discussed for bear weights. You need to give the name of the statistic, the number and the units.

72 Let’s talk about Exam #1... Will cover Module 1 through Module 10
Topic review sheet on my website

