Exploratory Data Analysis EDA


1 Exploratory Data Analysis EDA
Displaying Quantitative Data with Graphs. Chapter 1.2: Stemplots and Back-to-Back Stemplots
Lesson Plan: Chapter 1.2, Displaying Quantitative Data with Stemplots and Back-to-Back Stemplots
Topics: Stemplots; comparing stemplots using SOCS
Points to Stress:
- Categorical variable and quantitative variable displays
- Dotplots are much less abstract than the other plots
- In the real world, distributions of data are almost never perfectly symmetric. Do not worry much about a few ups and downs.
- When talking about skew, think of it as the shape of your left or right foot
Material: MOTIVATIONAL Socks
Time: Warm-up: 20 mins; Lesson: 40 mins; Group activity: 15 mins
Homework: begin pg 41 #37 – 49 odd. Bring graphing calculator. Read ahead on histograms.

2 Quantitative Data Display
When displaying categorical data we use pie charts and/or bar graphs.
To display quantitative data we use:
- Dotplots
- Stemplots
- Histograms
- Boxplots
When analyzing the data we describe the overall pattern (shape, center, and spread) of the distribution.

3 A) Dotplot
Small datasets with a small range (max − min) can be easily displayed using a dotplot.
- Draw and label a number line from min to max
- Place one dot per observation above its value
- Stack multiple observations evenly
Describe the distribution using SOCS:
Shape: peaks, clusters, tail direction, symmetry or skew
Center: midpoint
Spread: range (max − min)
Outliers: values that stand out from the overall distribution
Example: How good was the 2012 U.S. women's soccer team? With players like Abby Wambach, Megan Rapinoe, and Hope Solo, the team put on an impressive showing en route to winning the gold medal at the 2012 Olympics in London. These are the data on the number of goals scored by the team in the 12 months prior to the 2012 Olympics. Draw a dotplot. Analyze the minimum and maximum number of goals.
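The dotplot steps above can be sketched in a few lines of Python. This is only an illustration: the function name `text_dotplot` is our own, and the goal counts are made-up values, not the team's actual data. The sketch assumes non-negative whole-number observations.

```python
from collections import Counter

def text_dotplot(values):
    """Return a text dotplot: one '*' per observation, stacked above its value."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    width = len(str(hi)) + 1  # column width; assumes non-negative integers
    rows = []
    # Build the plot from the tallest stack down to the number line.
    for level in range(max(counts.values()), 0, -1):
        row = "".join(
            ("*" if counts[v] >= level else " ").rjust(width)
            for v in range(lo, hi + 1)
        )
        rows.append(row.rstrip())
    # Number line labeled from min to max.
    axis = "".join(str(v).rjust(width) for v in range(lo, hi + 1))
    return "\n".join(rows + [axis])

# Hypothetical goal counts, invented for illustration only.
goals = [1, 3, 1, 2, 4, 3, 0, 2, 3, 5, 1, 3]
print(text_dotplot(goals))
```

Because every value between the minimum and maximum gets a column, gaps and stand-out values remain visible, which is what the SOCS description relies on.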

4 S- Shape: Symmetry and Skew
A distribution is roughly symmetric if the right and left sides of the graph are approximately mirror images of each other. A distribution is skewed to the right (right-skewed) if the right side of the graph (containing the half of the observations with larger values) is much longer than the left side. It is skewed to the left (left-skewed) if the left side of the graph is much longer than the right side.
- Do not expect perfection; look for "roughly symmetric"
- Another bimodal example is the length of hair in the class
Graphs shown: Symmetric, Skewed-left, Skewed-right

5 Comparing Distributions
In statistics we are mostly interested in comparing two or more groups. Which of two diets works best? Which North American university should one attend? What improves memory? ...and of course: who gets more detentions per year at CDS, girls or boys?

6 Compare the distribution of Household size for U.K. and South Africa
Don't forget your SOCS: Shape, Outliers, Center (midpoint), Spread.
Use the SOCS strategy. Do not list the features of each graph separately; compare them to each other using comparative words: "-er" words, "about the same as", "much greater than".
These two graphs are easy to compare because they are stacked vertically and have the same scale.
Household size example:
Shape: The South Africa distribution is skewed right and unimodal, while the U.K. distribution is skewed left and unimodal.
Outliers: South Africa has two outliers (15 and 26 members); the U.K. has no outliers.
Center: A typical South African household has 6 members, which is larger than the typical U.K. household of 4 members. (It is important not to say "the center of South Africa"; the correct wording is "the center of the distribution for South Africa".) We will learn inference for a difference between two means in Chapter 10.
Spread: Household sizes in South Africa have more variability than in the U.K.: South Africa has a range of 23 members, while the U.K. has a range of about 4 members.

7 TRY: Energy Cost Example Top VS Bottom Freezers
How do the annual energy costs (in dollars) compare for refrigerators with top freezers and refrigerators with bottom freezers? The data below are from the May 2010 issue of Consumer Reports.
Problem: Compare the distributions of energy cost for these two types of refrigerators.
Solution:
Shape: The distribution for bottom freezers looks skewed to the right and possibly bimodal, with modes near $58 and $70 per year. The distribution for top freezers looks roughly symmetric, with its main peak centered near $55.
Center: The typical energy cost for the bottom freezers is greater than the typical cost for the top freezers (midpoint of $69 vs. midpoint of $56).
Spread: There is much more variability in the energy costs for bottom freezers.
Outliers: There are a couple of bottom freezers with unusually high energy costs (over $140 per year). There are no outliers for the top freezers.

8 B)

9 How MANY pairs of shoes does a typical teenager have?
Random sample of 20 students

10 In the case of males? Split the stem

11 Back-to-Back Stemplots
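When comparing two groups, use a back-to-back stemplot. If the data are bunched up, split the stems.
Females: 50 26 26 31 57 19 24 22 23 38 13 50 13 34 23 30 49 13 15 51
Males: 14 7 6 5 12 38 8 7 10 10 10 11 4 5 22 7 5 10 35 7

Females |   | Males
        | 0 | 4
        | 0 | 555677778
    333 | 1 | 0000124
     95 | 1 |
   4332 | 2 | 2
     66 | 2 |
    410 | 3 |
      8 | 3 | 58
        | 4 |
      9 | 4 |
    100 | 5 |
      7 | 5 |

The stems are split: each stem appears twice, first for leaves 0 to 4 and then for leaves 5 to 9.
Key: 4|9 represents a student who reported having 49 pairs of shoes.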

12 AP Exam Common Errors
When describing a distribution: forgetting to address all four characteristics of SOCS.
When comparing distributions: not explicitly comparing the characteristics. Discussing the SOCS for each distribution separately WILL NOT earn partial credit. Use phrases like "about the same as" and "is much greater than".
When making stemplots: forgetting the key or labels.

13 TRY: Back-to-Back Stemplot
Who's Taller?
Who is taller, males or females? A sample of 14-year-olds from the United Kingdom was randomly selected using the CensusAtSchool Web site. Here are the heights of the students (in cm):
Male: 154, 157, 187, 163, 167, 159, 169, 162, 176, 177, 151, 175, 174, 165, 165, 183, 180
Female: 160, 169, 152, 167, 164, 163, 160, 163, 169, 157, 158, 153, 161, 165, 165, 159, 168, 153, 166, 158, 158, 166
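One way to check a hand-drawn answer is to build the back-to-back stemplot in code. The sketch below uses the heights listed above; the function name `back_to_back_stemplot` and the choice to put females on the left are our own assumptions, and the stems are not split here.

```python
from collections import defaultdict

def back_to_back_stemplot(left, right, left_label, right_label):
    """Build a back-to-back stemplot as a list of text lines.

    Stems are the leading digits (value // 10); leaves are the final digit.
    Left-hand leaves grow outward from the stem, so they appear in
    descending order when read left to right.
    """
    left_leaves, right_leaves = defaultdict(list), defaultdict(list)
    for v in left:
        left_leaves[v // 10].append(v % 10)
    for v in right:
        right_leaves[v // 10].append(v % 10)

    # Include every stem between the smallest and largest, even if empty.
    stems = range(min(left + right) // 10, max(left + right) // 10 + 1)
    left_rows = {s: "".join(str(d) for d in sorted(left_leaves[s], reverse=True))
                 for s in stems}
    right_rows = {s: "".join(str(d) for d in sorted(right_leaves[s]))
                  for s in stems}
    pad = max(len(left_label), *(len(r) for r in left_rows.values()))

    lines = [f"{left_label.rjust(pad)} |    | {right_label}"]
    for s in stems:
        lines.append(f"{left_rows[s].rjust(pad)} | {s:>2} | {right_rows[s]}".rstrip())
    return lines

male = [154, 157, 187, 163, 167, 159, 169, 162, 176, 177, 151,
        175, 174, 165, 165, 183, 180]
female = [160, 169, 152, 167, 164, 163, 160, 163, 169, 157, 158,
          153, 161, 165, 165, 159, 168, 153, 166, 158, 158, 166]

lines = back_to_back_stemplot(female, male, "Female", "Male")
print("\n".join(lines))
print("Key: 16 | 5 represents a student who is 165 cm tall.")
```

The key is printed with the plot on purpose, since a missing key is one of the common stemplot errors. Empty stems are kept so that gaps in either group stay visible when the two sides are compared.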

14 TRY these ones: #1 Correct Answer
In general, it appears that females have more pairs of shoes than males. The median for the males was 9 pairs, while the female median was 26 pairs. The females also have a larger range, 57 − 13 = 44, compared with the male range of 38 − 4 = 34. Finally, both males and females have distributions that are skewed to the right, though the skew is more pronounced for the males, as evidenced by the three likely outliers at 22, 35, and 38. The females do not have any likely outliers.
Answer for #2, #3, and #4: b)
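The female figures quoted above can be checked against the female leaves of the shoe stemplot on slide 11. The sketch below is a rough check under one assumption: the assignment of each leaf group to a split stem (leaves 0 to 4 on the first copy of a stem, 5 to 9 on the second) is our reading of the slide. The helper name `stemplot_to_values` is hypothetical.

```python
from statistics import median

# Female leaves from the slide 11 stemplot, keyed by (stem, half):
# half 0 holds leaves 0-4 and half 1 holds leaves 5-9.
# Which split stem each leaf group belongs to is our reading of the slide.
female_leaves = {
    (1, 0): "333", (1, 1): "95",
    (2, 0): "4332", (2, 1): "66",
    (3, 0): "410", (3, 1): "8",
    (4, 1): "9",
    (5, 0): "100", (5, 1): "7",
}

def stemplot_to_values(leaves_by_stem):
    """Turn stem/leaf pairs back into a sorted list of observations."""
    values = []
    for (stem, _half), leaves in leaves_by_stem.items():
        values.extend(10 * stem + int(leaf) for leaf in leaves)
    return sorted(values)

females = stemplot_to_values(female_leaves)
female_median = median(females)
female_range = max(females) - min(females)
print(len(females), female_median, female_range)
```

Under that reading, the stemplot decodes to 20 observations with a median of 26 and a range of 57 − 13 = 44, which agrees with the answer above.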

15 Group Work: MOTIVATING SOCS

16 Homework! pg. 41
#37 – #49 odd. You must bring your graphing calculator. Read ahead if you can: pages 33-40.

