1
**Categorical Data Example: Marada Inn**

Guests staying at Marada Inn were asked to rate the quality of their accommodations as being excellent, above average, average, below average, or poor. The ratings provided by a sample of 20 guests are:

Above Average, Below Average, Poor, Average, Below Average, Above Average, Average, Average, Above Average, Below Average, Poor, Excellent

2
**Categorical Data Example: Marada Inn**

| Rating | Frequency |
|---|---|
| Poor | 2 |
| Below Average | 3 |
| Average | 5 |
| Above Average | 9 |
| Excellent | 1 |
| Total | 20 |

3
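**Categorical Data Example: Marada Inn**

| Rating | Frequency | Relative Frequency | Percent Frequency |
|---|---|---|---|
| Poor | 2 | .10 | 10 |
| Below Average | 3 | .15 | 15 |
| Average | 5 | .25 | 25 |
| Above Average | 9 | .45 | 45 |
| Excellent | 1 | .05 | 5 |
| Total | 20 | 1.00 | 100 |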
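The relative and percent frequencies follow directly from the counts: each relative frequency is the class frequency divided by the sample size, and each percent frequency is 100 times that. A minimal Python sketch, assuming the counts from the Marada Inn frequency table (the variable names are illustrative only):

```python
# Frequencies taken from the Marada Inn frequency table.
frequency = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

n = sum(frequency.values())  # sample size (20 guests)

# Relative frequency = class frequency / n.
relative = {rating: count / n for rating, count in frequency.items()}
# Percent frequency = 100 * relative frequency.
percent = {rating: 100 * rel for rating, rel in relative.items()}

# Print one row per rating, then a totals row.
for rating, count in frequency.items():
    print(f"{rating:<14}{count:>4}{relative[rating]:>7.2f}{percent[rating]:>6.0f}")
print(f"{'Total':<14}{n:>4}{sum(relative.values()):>7.2f}{sum(percent.values()):>6.0f}")
```

The relative frequencies should always sum to 1.00 and the percent frequencies to 100, which is a quick check on the arithmetic.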

4
**Marada Inn Quality Ratings**

Bar chart: Marada Inn Quality Ratings (x-axis: Rating, in the order Poor, Below Average, Average, Above Average, Excellent; y-axis: Frequency, scaled 1 to 10).

What is the scale of the Marada Inn quality rating? Ordinal, because the ratings have a natural order from Poor to Excellent.

- When a variable has nominal scale, the shape of the distribution is meaningless.
- When a variable has ordinal scale, the shape of the distribution is important.

5
**Marada Inn Quality Ratings**

Pie chart: Marada Inn Quality Ratings. Slices: Poor 10%, Below Average 15%, Average 25%, Above Average 45%, Excellent 5%.

INSIGHTS:

- One-half of the customers surveyed gave Marada a quality rating of “above average” or “excellent” (45% + 5% = 50%). This might please the manager.
- For each customer who gave an “excellent” rating, there were two customers who gave a rating of “poor” (10% versus 5%). This should displease the manager.

6
**Customer 5 purchased 2 items,**

Customer 5 purchased 2 items, … which cost her $54

7
**Pelican Stores (data_pelican.xls)**

Pelican Stores is a chain of women's apparel stores. It recently ran a promotion in which discount coupons were sent to customers of other National Clothing stores. Data collected for a sample of 100 in-store credit card transactions at Pelican Stores during one day while the promotion was running are shown in the accompanying table (data_pelican.xls).

Customers who made a purchase using a discount coupon are referred to as promotional customers, and customers who made a purchase but did not use a discount coupon are referred to as regular customers. Because the promotional coupons were not sent to regular Pelican Stores customers, management considers the sales made to people presenting the promotional coupons as sales it would not otherwise make. Pelican's management would like to use this sample data to learn about its customer base and to evaluate the promotion involving discounts.

Managerial Report

Using graphs and tables, summarize the qualitative variables.

- A large majority of the customers use National Clothing's proprietary credit card.
- The overwhelming majority of customers are female.
- Most of the customers are married.

8
**Quantitative Data Example: Hudson Auto Repair**

The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in the shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are:

91 78 93 57 75 52 99 80 97 62 71 69 72 89 66 79 76 104 74 68 105 77 65 109 85 88 83 67 82 98 101 73

9
**Quantitative Data**

Example: Hudson Auto Repair. The values sorted from the minimum (52) to the maximum (109):

52 57 62 65 66 67 68 69 71 72 73 74 75 76 77 78 79 80 82 83 85 88 89 91 93 97 98 99 101 104 105 109

10
**Quantitative Data**

Frequency distribution

| Cost ($) | Frequency |
|---|---|
| 50-59 | 2 |
| 60-69 | 13 |
| 70-79 | 16 |
| 80-89 | 7 |
| 90-99 | 7 |
| 100-109 | 5 |
| Total | 50 |

- Use between 5 and 20 classes.
- Data sets with a larger number of elements usually require a larger number of classes.
- Smaller data sets usually require fewer classes.

Approximate class width = (largest value - smallest value) / number of classes = (109 - 52)/6 = 9.5, rounded up to 10.
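The class-width rule can be sketched in a few lines of Python. This is an illustration only: the starting lower limit of 50 is an assumption chosen to match the 50-59 class, and `class_of` is a hypothetical helper, not something from the slides.

```python
smallest, largest = 52, 109  # minimum and maximum parts cost in the sorted data
num_classes = 6              # number of classes used for this example

# Approximate class width = (largest value - smallest value) / number of classes.
raw_width = (largest - smallest) / num_classes  # 9.5
class_width = 10                                # rounded up to a convenient width

# Assumption: the first class starts at 50 so that the limits read 50-59, 60-69, ...
first_lower_limit = 50
classes = [
    (first_lower_limit + i * class_width,
     first_lower_limit + (i + 1) * class_width - 1)
    for i in range(num_classes)
]

def class_of(cost):
    """Return the (lower, upper) class limits that contain a whole-dollar cost."""
    for lower, upper in classes:
        if lower <= cost <= upper:
            return (lower, upper)
    raise ValueError(f"{cost} falls outside every class")

print(raw_width, classes)
print(class_of(52), class_of(109))
```

Rounding the width up rather than down ensures that six classes cover the full range from the smallest to the largest value.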

11
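**Quantitative Data**

| Cost ($) | Frequency | Relative Frequency | Percent Frequency |
|---|---|---|---|
| 50-59 | 2 | 2/50 = .04 | 4 |
| 60-69 | 13 | 13/50 = .26 | 26 |
| 70-79 | 16 | 16/50 = .32 | 32 |
| 80-89 | 7 | 7/50 = .14 | 14 |
| 90-99 | 7 | 7/50 = .14 | 14 |
| 100-109 | 5 | 5/50 = .10 | 10 |
| Total | 50 | 1.00 | 100 |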

12
**Quantitative Data**

Histogram: Tune-up Parts Cost (x-axis: Parts Cost ($); y-axis: Frequency, scaled 2 to 18).

13
**Quantitative Data**

Histogram shapes illustrated: Symmetric, Moderately Skewed Left, Highly Skewed Right.

14
**Quantitative Data**

Ogive (pronounced "oh-jive") for Hudson Auto Repair

| Parts Cost ($) | Frequency | Cost ($) | Cumulative Frequency |
|---|---|---|---|
| 50-59 | 2 | < 60 | 2 |
| 60-69 | 13 | < 70 | 15 |
| 70-79 | 16 | < 80 | 31 |
| 80-89 | 7 | < 90 | 38 |
| 90-99 | 7 | < 100 | 45 |
| 100-109 | 5 | < 110 | 50 |
| Total | 50 | | |

15
**Quantitative Data**

Ogive for Hudson Auto Repair

| Cost ($) | Cumulative Frequency | Cumulative Relative Frequency | Cumulative Percent Frequency |
|---|---|---|---|
| < 60 | 2 | .04 | 4 |
| < 70 | 15 | .30 | 30 |
| < 80 | 31 | .62 | 62 |
| < 90 | 38 | .76 | 76 |
| < 100 | 45 | .90 | 90 |
| < 110 | 50 | 1.00 | 100 |
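Each cumulative value is a running total of the class frequencies, divided by the sample size for the relative and percent versions. A minimal Python sketch, assuming the six class frequencies of the Hudson Auto Repair distribution (the starting point at $50 with 0% is the lower limit of the first class):

```python
from itertools import accumulate

upper_limits = [60, 70, 80, 90, 100, 110]  # "< upper limit" for each class
frequency = [2, 13, 16, 7, 7, 5]           # class frequencies, 50-59 through 100-109

n = sum(frequency)  # 50 invoices

# Running totals of the class frequencies.
cumulative = list(accumulate(frequency))
cumulative_relative = [c / n for c in cumulative]
cumulative_percent = [100 * c / n for c in cumulative]

# Points to plot for an ogive: start at the first lower limit with 0%.
ogive_points = [(50, 0.0)] + list(zip(upper_limits, cumulative_percent))

for limit, c, rel, pct in zip(upper_limits, cumulative, cumulative_relative, cumulative_percent):
    print(f"< {limit:<4}{c:>4}{rel:>7.2f}{pct:>6.0f}")
```

The last cumulative frequency should equal the sample size, and the last cumulative percent should equal 100.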

16
**Quantitative Data Example: Hudson Auto Repair**

Ogive: Tune-up Parts Cost (x-axis: Parts Cost ($); y-axis: Cumulative Percent Frequency, scaled 20 to 100).

Plotted points: ($50, 0%), ($60, 4%), ($70, 30%), ($80, 62%), ($90, 76%), ($100, 90%), ($110, 100%).

17
**Pelican Stores -- continued**

Managerial Report

Using graphs and tables, summarize the qualitative variables.

Using graphs and tables, summarize the quantitative variables.

- Over half of the customers purchase 1 or 2 items, but a few make numerous purchases.
- The percent frequency distribution of net sales shows that 61% of the customers spent $50 or more.
- Customers are distributed across all adult age groups.

Data file: data_pelican.xls

18
**Summarizing Two variables**

Example: Finger Lakes Homes.xls

The number of Finger Lakes homes sold for each style and price for the past two years is shown below. Home style is a qualitative variable; price range is a quantitative variable.

| Price Range | Colonial | Log | Split | A-Frame | Total |
|---|---|---|---|---|---|
| < $99,000 | 18 | 6 | 19 | 12 | 55 |
| > $99,000 | 12 | 14 | 16 | 3 | 45 |
| Total | 30 | 20 | 35 | 15 | 100 |

19
**Summarizing Two variables**

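Example: Finger Lakes Homes.xls

The row totals give the frequency distribution for price range; the column totals give the frequency distribution for home style.

| Price Range | Colonial | Log | Split | A-Frame | Total |
|---|---|---|---|---|---|
| < $99,000 | 18 | 6 | 19 | 12 | 55 |
| > $99,000 | 12 | 14 | 16 | 3 | 45 |
| Total | 30 | 20 | 35 | 15 | 100 |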

20
**Summarizing Two variables**

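Frequencies:

| Price Range | Colonial | Log | Split | A-Frame | Total |
|---|---|---|---|---|---|
| < $99,000 | 18 | 6 | 19 | 12 | 55 |
| > $99,000 | 12 | 14 | 16 | 3 | 45 |
| Total | 30 | 20 | 35 | 15 | 100 |

Row percentages (each cell divided by its row total):

| Price Range | Colonial | Log | Split | A-Frame | Total |
|---|---|---|---|---|---|
| < $99,000 | 0.3273 | 0.1091 | 0.3455 | 0.2182 | 1.0000 |
| > $99,000 | 0.2667 | 0.3111 | 0.3556 | 0.0667 | 1.0000 |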

21
**Summarizing Two variables**

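Frequencies:

| Price Range | Colonial | Log | Split | A-Frame |
|---|---|---|---|---|
| < $99,000 | 18 | 6 | 19 | 12 |
| > $99,000 | 12 | 14 | 16 | 3 |
| Total | 30 | 20 | 35 | 15 |

Column percentages (each cell divided by its column total):

| Price Range | Colonial | Log | Split | A-Frame |
|---|---|---|---|---|
| < $99,000 | 0.6000 | 0.3000 | 0.5429 | 0.8000 |
| > $99,000 | 0.4000 | 0.7000 | 0.4571 | 0.2000 |
| Total | 1.0000 | 1.0000 | 1.0000 | 1.0000 |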
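Row and column percentages are the same cell counts divided by different totals. The sketch below is a plain-Python illustration under one assumption: the cell counts are the ones implied by the percentages and marginal totals (for example 0.6000 × 30 = 18 Colonial homes under $99,000).

```python
styles = ["Colonial", "Log", "Split", "A-Frame"]

# Assumption: cell counts implied by the percentages and the marginal totals.
counts = {
    "< $99,000": [18, 6, 19, 12],
    "> $99,000": [12, 14, 16, 3],
}

# Marginal totals: one per price range (rows) and one per home style (columns).
row_totals = {price: sum(row) for price, row in counts.items()}
col_totals = [sum(col) for col in zip(*counts.values())]

# Row percentages: each cell divided by its row total.
row_pct = {
    price: [cell / row_totals[price] for cell in row]
    for price, row in counts.items()
}

# Column percentages: each cell divided by its column total.
col_pct = {
    price: [cell / total for cell, total in zip(row, col_totals)]
    for price, row in counts.items()
}

for price in counts:
    print(price, "row %:", [f"{p:.4f}" for p in row_pct[price]])
    print(price, "col %:", [f"{p:.4f}" for p in col_pct[price]])
```

Row percentages answer "of the homes in this price range, what share is each style?", while column percentages answer "of the homes of this style, what share falls in each price range?".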

22
**Summarizing Two variables**

Simpson’s Paradox

Example: In the Fall of 1973, an observational study on possible gender bias was conducted at UC-Berkeley. There were 12,763 applicants for graduate admission, of whom 8442 were male and 4321 were female. Of the male applicants, 3738 were admitted, whereas of the female applicants only 1494 were admitted.

The crosstabulation for the aggregated UC-Berkeley data is

| | Admitted | Denied | Total |
|---|---|---|---|
| Male | 3738 | 4704 | 8442 |
| Female | 1494 | 2827 | 4321 |
| Total | 5232 | 7531 | 12763 |

The male acceptance rate is higher when the data are aggregated.

Dividing all of the frequencies above by the number of observations yields the joint probability table below.

| | Admitted | Denied | Total |
|---|---|---|---|
| Male | 0.2929 | 0.3686 | 0.6614 |
| Female | 0.1171 | 0.2215 | 0.3386 |
| Total | 0.4099 | 0.5901 | 1.0000 |
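The joint probabilities and the aggregated acceptance rates can be checked with a short Python sketch. It uses only the four counts stated in the example; the dictionary layout and names are illustrative.

```python
# Aggregated admission counts from the 1973 example.
counts = {
    "Male": {"Admitted": 3738, "Denied": 4704},
    "Female": {"Admitted": 1494, "Denied": 2827},
}

n = sum(sum(row.values()) for row in counts.values())  # 12,763 applicants

# Joint probability = cell frequency / total number of observations.
joint = {
    gender: {outcome: freq / n for outcome, freq in row.items()}
    for gender, row in counts.items()
}

# Acceptance rate = admitted / applicants of that gender (a row percentage).
acceptance = {
    gender: row["Admitted"] / sum(row.values())
    for gender, row in counts.items()
}

for gender in counts:
    print(gender,
          {k: round(v, 4) for k, v in joint[gender].items()},
          "acceptance rate:", round(acceptance[gender], 4))
```

The acceptance rate is a row percentage, not a joint probability: it conditions on gender, which is why it is the right quantity for comparing how male and female applicants fared.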

23
**Summarizing Two variables**

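At UC-Berkeley, graduate admission decisions are made at the department level. In 1973, UC-Berkeley had 101 different graduate departments. For simplicity, we look at only the 2 largest departments (A and B). Data file: data_simpson.xls

Male applicants:

| Department | Admitted | Denied | Total |
|---|---|---|---|
| A | 512 | 313 | 825 |
| B | 313 | 207 | 520 |
| Total | 825 | 520 | 1345 |

Female applicants:

| Department | Admitted | Denied | Total |
|---|---|---|---|
| A | 89 | 19 | 108 |
| B | 17 | 8 | 25 |
| Total | 106 | 27 | 133 |

Compute the row percentages to show Simpson’s Paradox.

Male applicants:

| Department | Admitted | Denied | Total |
|---|---|---|---|
| A | 0.6206 | 0.3794 | 1.0000 |
| B | 0.6019 | 0.3981 | 1.0000 |

Female applicants:

| Department | Admitted | Denied | Total |
|---|---|---|---|
| A | 0.8241 | 0.1759 | 1.0000 |
| B | 0.6800 | 0.3200 | 1.0000 |

Within each department the female acceptance rate is higher than the male rate, even though the aggregated male acceptance rate is higher.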
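The reversal can be verified numerically. The Python sketch below rests on one assumption: the male Department B counts (313 admitted, 207 denied) are inferred from the row percentages 0.6019 and 0.3981. The aggregated counts are the campus-wide figures from the 1973 example.

```python
# (admitted, denied) per department.
# Assumption: male Department B counts are inferred from its row percentages.
by_department = {
    "Male": {"A": (512, 313), "B": (313, 207)},
    "Female": {"A": (89, 19), "B": (17, 8)},
}

# Campus-wide (admitted, denied) counts across all departments.
aggregated = {"Male": (3738, 4704), "Female": (1494, 2827)}

def admit_rate(admitted, denied):
    """Share of applicants admitted (the 'Admitted' row percentage)."""
    return admitted / (admitted + denied)

department_rates = {
    gender: {dept: admit_rate(*cells) for dept, cells in depts.items()}
    for gender, depts in by_department.items()
}
aggregate_rates = {gender: admit_rate(*cells) for gender, cells in aggregated.items()}

for dept in ("A", "B"):
    print(f"Department {dept}: male {department_rates['Male'][dept]:.4f}, "
          f"female {department_rates['Female'][dept]:.4f}")
print(f"Aggregated: male {aggregate_rates['Male']:.4f}, "
      f"female {aggregate_rates['Female']:.4f}")
```

The comparison by department and the comparison on the aggregated data point in opposite directions, which is the defining feature of Simpson's Paradox.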

24
**Summarizing Two variables**

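Scatter diagram: A Negative Relationship (x-axis: price of Big Macs, PBigMacs, marked at 0.50 and 5.00; y-axis: quantity of Big Macs, QBigMacs, marked at 2 and 21).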

25
**Summarizing Two variables**

Scatter diagram: No Apparent Relationship (x-axis: price of Big Macs, PBigMacs; y-axis: quantity of nose hair trimmers, QNoseHairTrimmers).

26
**Summarizing Two variables**

Example: Panthers Football Team

The Panthers football team is interested in investigating the relationship, if any, between interceptions made and points scored.

x = Number of Interceptions: 1 3 2

y = Number of Points Scored: 14 24 18 17 30

27
**Summarizing Two variables**

Scatter diagram (x-axis: Number of Interceptions, scaled 1 to 4; y-axis: Number of Points Scored, scaled 5 to 35).

- The scatter diagram indicates a positive relationship between the number of interceptions and the number of points scored.
- Higher points scored are associated with a higher number of interceptions.
- The relationship is not perfect; not all plotted points in the scatter diagram lie on a straight line.

28
**Pelican Stores -- continued**

Managerial Report

Using graphs and tables, summarize the qualitative variables.

Using graphs and tables, summarize the quantitative variables.

Using pivot tables and scatter plots, summarize the variables.

- From the crosstabulation it appears that net sales are larger for promotional customers.
- Age is not a factor in determining net sales.

Data file: data_pelican.xls

