Categorical Data Example: Marada Inn

Guests staying at Marada Inn were asked to rate the quality of their accommodations as excellent, above average, average, below average, or poor. The ratings provided by a sample of 20 guests include: Above Average, Below Average, Poor, Average, Below Average, Above Average, Average, Average, Above Average, Below Average, Poor, Excellent.

Categorical Data Example: Marada Inn

Rating          Frequency
Poor                    2
Below Average           3
Average                 5
Above Average           9
Excellent               1
Total                  20
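A minimal Python sketch of how such a frequency distribution can be tallied. The transcript does not list all 20 individual responses, so the list below is rebuilt from the counts in the table and its order is an assumption; `RATING_ORDER` and `frequency_distribution` are illustrative names, not part of the original material.

```python
from collections import Counter

# Natural (ordinal) order of the rating categories
RATING_ORDER = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

# Assumption: responses rebuilt from the frequency table; original order unknown
ratings = (["Poor"] * 2 + ["Below Average"] * 3 + ["Average"] * 5
           + ["Above Average"] * 9 + ["Excellent"] * 1)

def frequency_distribution(values, categories):
    """Return (category, count) pairs listed in the given category order."""
    counts = Counter(values)
    return [(category, counts[category]) for category in categories]

freq = frequency_distribution(ratings, RATING_ORDER)
for category, count in freq:
    print(f"{category:<15}{count:>3}")
print(f"{'Total':<15}{len(ratings):>3}")
```

Listing the categories explicitly, rather than relying on the order in which `Counter` sees them, keeps the table in Poor-to-Excellent order.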

Categorical Data Example: Marada Inn

Rating          Frequency  Relative Frequency  Percent Frequency
Poor                    2                 .10                 10
Below Average           3                 .15                 15
Average                 5                 .25                 25
Above Average           9                 .45                 45
Excellent               1                 .05                  5
Total                  20                1.00                100
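The relative and percent frequencies follow from dividing each count by the sample size of 20. A short Python sketch using the counts from the Marada Inn table (variable names are illustrative):

```python
# Frequencies from the Marada Inn frequency distribution
freq = {"Poor": 2, "Below Average": 3, "Average": 5, "Above Average": 9, "Excellent": 1}
n = sum(freq.values())  # 20 guests

# Relative frequency = frequency / n; percent frequency = relative frequency * 100
relative = {rating: count / n for rating, count in freq.items()}
percent = {rating: 100 * rel for rating, rel in relative.items()}

for rating, count in freq.items():
    print(f"{rating:<15}{count:>4}{relative[rating]:>6.2f}{percent[rating]:>6.0f}")
print(f"{'Total':<15}{n:>4}{sum(relative.values()):>6.2f}{sum(percent.values()):>6.0f}")
```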

Marada Inn Quality Ratings
[Bar chart: Marada Inn Quality Ratings; x-axis: Rating (Poor, Below Average, Average, Above Average, Excellent); y-axis: Frequency]

What is the scale of the Marada Inn quality rating? Ordinal: the ratings have a natural order from Poor to Excellent. When a variable has a nominal scale, the shape of the distribution is meaningless, because the categories can be listed in any order. When a variable has an ordinal scale, the shape of the distribution is important.

Marada Inn Quality Ratings
[Pie chart: Marada Inn Quality Ratings; Poor 10%, Below Average 15%, Average 25%, Above Average 45%, Excellent 5%]

INSIGHTS: One-half of the customers surveyed gave Marada a quality rating of “above average” or “excellent” (45% + 5% = 50%). This might please the manager. For each customer who gave an “excellent” rating, there were two customers who gave a rating of “poor.” This should displease the manager.
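Both insights are simple ratios of the counts. A small Python check, assuming the frequencies from the Marada Inn frequency distribution (names are illustrative):

```python
freq = {"Poor": 2, "Below Average": 3, "Average": 5, "Above Average": 9, "Excellent": 1}
n = sum(freq.values())

# Share of guests rating "above average" or "excellent"
top_two_share = (freq["Above Average"] + freq["Excellent"]) / n
# Number of "poor" ratings for each "excellent" rating
poor_per_excellent = freq["Poor"] / freq["Excellent"]

print(f"Above average or excellent: {top_two_share:.0%}")
print(f"Poor ratings per excellent rating: {poor_per_excellent:.0f}")
```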

Pelican Stores

Pelican Stores is a chain of women’s apparel stores. It recently ran a promotion in which discount coupons were sent to customers of other National Clothing stores. Data collected for a sample of 100 in-store credit card transactions at Pelican Stores during one day while the promotion was running are shown in the table. Customers who made a purchase using a discount coupon are referred to as promotional customers, and customers who made a purchase but did not use a discount coupon are referred to as regular customers. Because the promotional coupons were not sent to regular Pelican Stores customers, management considers the sales made to people presenting the promotional coupons as sales it would not otherwise make.

Customer 5 purchased 2 items, … which cost her $54.

Pelican Stores (data file: data_pelican.xls)
Pelican’s management would like to use this sample data to learn about its customer base and to evaluate the promotion involving discounts.

Managerial Report
Using graphs and tables, summarize the qualitative variables.
A large majority of the customers use National Clothing’s proprietary credit card. The overwhelming majority of customers are female. Most of the customers are married.

Data file: data_pelican.xls

Quantitative Data Example: Hudson Auto Repair

The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in the shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, include the following values: 91 78 93 57 75 52 99 80 97 62 71 69 72 89 66 79 76 104 74 68 105 77 65 109 85 88 83 67 82 98 101 73

Quantitative Data Example: Hudson Auto Repair (sorted data)

Sorted values, from the minimum ($52) to the maximum ($109):
52 57 62 65 66 67 68 69 71 72 73 74 75 76 77 78 79 80 82 83 85 88 89 91 93 97 98 99 101 104 105 109

Quantitative Data: Frequency Distribution

Cost ($)   Frequency
50-59              2
60-69             13
70-79             16
80-89              7
90-99              7
100-109            5
Total             50

Use between 5 and 20 classes. Data sets with a larger number of elements usually require a larger number of classes; smaller data sets usually require fewer classes. With 6 classes, the approximate class width is

$$\text{Class width} \approx \frac{\text{Largest value} - \text{Smallest value}}{\text{Number of classes}} = \frac{109 - 52}{6} = 9.5 \approx 10$$
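A minimal Python sketch of the class-width rule and of assigning a whole-dollar cost to its class. It assumes the first class starts at $50 (as in the table) and that the width is rounded up to a whole number; `class_label` is a hypothetical helper, not from the original slides.

```python
import math

# Smallest and largest parts costs ($) from the sorted data
smallest, largest = 52, 109
num_classes = 6

# Approximate class width, rounded up to a whole-dollar width
raw_width = (largest - smallest) / num_classes
class_width = math.ceil(raw_width)

def class_label(value, first_lower=50, width=class_width):
    """Return the label (e.g. '70-79') of the class containing an integer value.

    Assumption: classes start at first_lower and have equal integer widths.
    """
    lower = first_lower + ((value - first_lower) // width) * width
    return f"{lower}-{lower + width - 1}"

print(raw_width, class_width)                              # 9.5 10
print(class_label(52), class_label(79), class_label(109))  # 50-59 70-79 100-109
```

Rounding the width up (rather than to the nearest integer) guarantees that 6 classes of that width cover the whole range from the minimum to the maximum.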

Quantitative Data

Cost ($)   Frequency  Relative Frequency  Percent Frequency
50-59              2          2/50 = .04                  4
60-69             13         13/50 = .26                 26
70-79             16         16/50 = .32                 32
80-89              7          7/50 = .14                 14
90-99              7          7/50 = .14                 14
100-109            5          5/50 = .10                 10
Total             50                1.00                100
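Each relative frequency is the class frequency divided by the 50 invoices, and each percent frequency is that value times 100. Sketched in Python with the class frequencies from the table (names are illustrative):

```python
classes = ["50-59", "60-69", "70-79", "80-89", "90-99", "100-109"]
frequency = [2, 13, 16, 7, 7, 5]
n = sum(frequency)  # 50 invoices

relative = [f / n for f in frequency]       # relative frequency
percent = [100 * f / n for f in frequency]  # percent frequency

for c, f, r, p in zip(classes, frequency, relative, percent):
    print(f"{c:<9}{f:>4}{r:>6.2f}{p:>5.0f}")
```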

Quantitative Data

[Histogram: Tune-up Parts Cost; x-axis: Parts Cost ($); y-axis: Frequency]

Quantitative Data

[Figure: distribution shapes; panels: Symmetric, Moderately Skewed Left, Highly Skewed Right]

Quantitative Data: Ogive for Hudson Auto Repair (ogive is pronounced "oh-jive")

Parts Cost ($)  Frequency    Parts Cost ($)  Cumulative Frequency
50-59                   2    < 60                               2
60-69                  13    < 70                              15
70-79                  16    < 80                              31
80-89                   7    < 90                              38
90-99                   7    < 100                             45
100-109                 5    < 110                             50
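A cumulative frequency is a running total of the class frequencies. In Python this can be sketched with `itertools.accumulate`; the `upper_limits` list mirrors the "less than" bounds in the table.

```python
from itertools import accumulate

# "Less than" upper bounds for the classes 50-59, 60-69, ..., 100-109
upper_limits = [60, 70, 80, 90, 100, 110]
frequency = [2, 13, 16, 7, 7, 5]

# Cumulative frequency: running total of the class frequencies
cumulative = list(accumulate(frequency))

for bound, cum in zip(upper_limits, cumulative):
    print(f"< {bound:<5}{cum:>4}")
```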

Quantitative Data: Ogive for Hudson Auto Repair

                Cumulative   Cumulative Relative   Cumulative Percent
Cost ($)         Frequency             Frequency            Frequency
< 60                     2                   .04                    4
< 70                    15                   .30                   30
< 80                    31                   .62                   62
< 90                    38                   .76                   76
< 100                   45                   .90                   90
< 110                   50                  1.00                  100
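Dividing each cumulative frequency by the sample size of 50 gives the cumulative relative frequency; multiplying by 100 gives the cumulative percent frequency. A brief Python sketch:

```python
from itertools import accumulate

frequency = [2, 13, 16, 7, 7, 5]   # classes 50-59 through 100-109
n = sum(frequency)                 # 50 invoices
cumulative = list(accumulate(frequency))

cum_relative = [c / n for c in cumulative]       # cumulative relative frequency
cum_percent = [100 * c / n for c in cumulative]  # cumulative percent frequency

for bound, c, r, p in zip(range(60, 120, 10), cumulative, cum_relative, cum_percent):
    print(f"< {bound:<5}{c:>4}{r:>7.2f}{p:>6.0f}")
```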

Quantitative Data Example: Hudson Auto Repair

[Ogive: Tune-up Parts Cost; x-axis: Parts Cost ($); y-axis: Cumulative Percent Frequency; plotted points ($50, 0%), ($60, 4%), ($70, 30%), ($80, 62%), ($90, 76%), ($100, 90%), ($110, 100%)]
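The ogive plots each cumulative percent against the upper limit of its class, starting from ($50, 0%). The Python sketch below builds those points and, as an illustration not taken from the slides, reads the ogive by straight-line interpolation between plotted points; `percent_below` is a hypothetical helper name.

```python
from bisect import bisect_right
from itertools import accumulate

frequency = [2, 13, 16, 7, 7, 5]
n = sum(frequency)

# The ogive starts at the lower limit of the first class with 0%
xs = [50, 60, 70, 80, 90, 100, 110]
ys = [0.0] + [100 * c / n for c in accumulate(frequency)]
points = list(zip(xs, ys))

def percent_below(x):
    """Approximate cumulative percent below x by linear interpolation on the ogive."""
    if x <= xs[0]:
        return 0.0
    if x >= xs[-1]:
        return 100.0
    i = bisect_right(xs, x) - 1          # segment [xs[i], xs[i + 1]] containing x
    x0, x1, y0, y1 = xs[i], xs[i + 1], ys[i], ys[i + 1]
    return y0 + (x - x0) / (x1 - x0) * (y1 - y0)

print(points)
print(percent_below(75))  # halfway between 30% and 62%
```

Interpolation assumes the costs are spread evenly within each class, so readings between plotted points are approximations.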

Pelican Stores -- continued
Managerial Report (continued)
Using graphs and tables, summarize the quantitative variables.
Over half of the customers purchase 1 or 2 items, but a few make numerous purchases. The percent frequency distribution of net sales shows that 61% of the customers spent $50 or more. Customers are distributed across all adult age groups.

Data file: data_pelican.xls

Summarizing Two variables
Example: Finger Lakes Homes.xls
The number of Finger Lakes homes sold for each style and price range over the past two years is shown below. Home style is a qualitative variable; price is a quantitative variable, grouped here into two price ranges.

                               Home Style
Price Range    Colonial     Log   Split   A-Frame   Total
< $99,000            18       6      19        12      55
> $99,000            12      14      16         3      45
Total                30      20      35        15     100
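A minimal Python sketch of building the crosstabulation's margins. The cell counts are an assumption, recovered from the row proportions on a following slide multiplied by the row totals (for example, 0.3273 × 55 ≈ 18); variable names are illustrative.

```python
# Cell counts: rows are price ranges, columns are home styles
styles = ["Colonial", "Log", "Split", "A-Frame"]
counts = {
    "< $99,000": [18, 6, 19, 12],
    "> $99,000": [12, 14, 16, 3],
}

# Margins of the crosstabulation
row_totals = {price: sum(row) for price, row in counts.items()}
col_totals = [sum(col) for col in zip(*counts.values())]
grand_total = sum(row_totals.values())

print(f"{'Price Range':<12}" + "".join(f"{s:>10}" for s in styles) + f"{'Total':>8}")
for price, row in counts.items():
    print(f"{price:<12}" + "".join(f"{c:>10}" for c in row) + f"{row_totals[price]:>8}")
print(f"{'Total':<12}" + "".join(f"{c:>10}" for c in col_totals) + f"{grand_total:>8}")
```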

Summarizing Two variables
Row proportions (each cell count divided by its row total):

                               Home Style
Price Range    Colonial     Log   Split   A-Frame   Total
< $99,000        0.3273  0.1091  0.3455    0.2182  1.0000
> $99,000        0.2667  0.3111  0.3556    0.0667  1.0000
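Row proportions divide each cell by its row total, so each price range sums to 1. A Python sketch using cell counts recovered from the row proportions and the row totals of 55 and 45 (an assumption, e.g. 0.3273 × 55 ≈ 18):

```python
# Assumed cell counts: Colonial, Log, Split, A-Frame
counts = {
    "< $99,000": [18, 6, 19, 12],
    "> $99,000": [12, 14, 16, 3],
}

# Row proportion = cell count / row total, so each row sums to 1
row_props = {price: [c / sum(row) for c in row] for price, row in counts.items()}

for price, props in row_props.items():
    print(f"{price:<12}" + "".join(f"{p:>8.4f}" for p in props))
```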

Summarizing Two variables
Column proportions (each cell count divided by its column total):

                               Home Style
Price Range    Colonial     Log   Split   A-Frame
< $99,000        0.6000  0.3000  0.5429    0.8000
> $99,000        0.4000  0.7000  0.4571    0.2000
Total            1.0000  1.0000  1.0000    1.0000
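Column proportions instead divide each cell by its column total, so each home style sums to 1. Sketched in Python with cell counts recovered from the row proportions and row totals (an assumption):

```python
# Assumed cell counts: Colonial, Log, Split, A-Frame
counts = {
    "< $99,000": [18, 6, 19, 12],
    "> $99,000": [12, 14, 16, 3],
}

# Column totals: 30, 20, 35, 15
col_totals = [sum(col) for col in zip(*counts.values())]

# Column proportion = cell count / column total
col_props = {price: [c / t for c, t in zip(row, col_totals)]
             for price, row in counts.items()}

for price, props in col_props.items():
    print(f"{price:<12}" + "".join(f"{p:>8.4f}" for p in props))
```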

Summarizing Two variables

Simpson’s Paradox Example: In the fall of 1973, an observational study on possible gender bias was conducted at UC-Berkeley. There were 12,763 applicants for graduate admission, of whom 8442 were male and 4321 were female. Of the male applicants, 3738 were admitted, whereas of the female applicants only 1494 were admitted.

The crosstabulation for the aggregated UC-Berkeley data is:

          Admitted   Denied    Total
Male          3738     4704     8442
Female        1494     2827     4321
Total         5232     7531    12763

The male acceptance rate is higher when the data are aggregated: 3738/8442 = 44.3% of male applicants were admitted, versus 1494/4321 = 34.6% of female applicants.

Dividing each frequency above by the total number of observations (12,763) yields the joint probability table below:

          Admitted   Denied    Total
Male        0.2929   0.3686   0.6614
Female      0.1171   0.2215   0.3386
Total       0.4099   0.5901   1.0000
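The joint probabilities and the aggregated acceptance rates can be reproduced from the four counts. A Python sketch (names are illustrative):

```python
# Aggregated UC-Berkeley counts: (admitted, denied)
table = {"Male": (3738, 4704), "Female": (1494, 2827)}
n = sum(a + d for a, d in table.values())   # 12,763 applicants

# Joint probability = cell frequency / total number of observations
joint = {g: (a / n, d / n, (a + d) / n) for g, (a, d) in table.items()}
# Acceptance rate within each gender = admitted / applicants of that gender
acceptance = {g: a / (a + d) for g, (a, d) in table.items()}

for g in table:
    print(g, [round(p, 4) for p in joint[g]], f"acceptance {acceptance[g]:.1%}")
```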

Summarizing Two variables
At UC-Berkeley, graduate admission decisions are made at the department level. In 1973, UC-Berkeley had 101 different graduate departments. For simplicity, we look at only the 2 largest departments. The tables represent the top two departments (A and B).

Male      Admitted   Denied    Total
A              512      313      825
B              313      207      520
Total          825      520     1345

Female    Admitted   Denied    Total
A               89       19      108
B               17        8       25
Total          106       27      133

Compute the row percentages to show Simpson’s paradox:

Male      Admitted   Denied    Total
A           0.6206   0.3794   1.0000
B           0.6019   0.3981   1.0000

Female    Admitted   Denied    Total
A           0.8241   0.1759   1.0000
B           0.6800   0.3200   1.0000

Within each of these departments, the female admission rate is higher than the male rate (A: 0.8241 vs. 0.6206; B: 0.6800 vs. 0.6019), the reverse of the comparison for the aggregated data.

Data file: data_simpson.xls
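A Python sketch that computes the department-level admission rates and checks the reversal. The Department B male counts (313 admitted, 207 denied) are read from the tables' totals and row percentages (0.6019 × 520 ≈ 313), so treat them as a reconstruction; `admit_rate` is an illustrative helper.

```python
# (admitted, denied) counts by department
male = {"A": (512, 313), "B": (313, 207)}   # B reconstructed from totals and row percentages
female = {"A": (89, 19), "B": (17, 8)}

def admit_rate(admitted, denied):
    """Row percentage for 'Admitted': admitted / total applicants."""
    return admitted / (admitted + denied)

for dept in ("A", "B"):
    m = admit_rate(*male[dept])
    f = admit_rate(*female[dept])
    print(f"Dept {dept}: male {m:.4f}, female {f:.4f}, female higher: {f > m}")
```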

Summarizing Two variables
[Scatter diagram: A Negative Relationship; y-axis: QBigMacs; x-axis: PBigMacs (0.50 to 5.00)]

Summarizing Two variables
[Scatter diagram: No Apparent Relationship; y-axis: QNoseHairTrimmers; x-axis: PBigMacs]

Summarizing Two variables
Example: Panthers Football Team
The Panthers football team is interested in investigating the relationship, if any, between interceptions made and points scored.
x = Number of Interceptions
y = Number of Points Scored
Values listed: x: 1, 3, 2; y: 14, 24, 18, 17, 30

Summarizing Two variables
[Scatter diagram: y-axis: Number of Points Scored (5 to 35); x-axis: Number of Interceptions (1 to 4)]

The scatter diagram indicates a positive relationship between the number of interceptions and the number of points scored: a higher number of interceptions is associated with more points scored. The relationship is not perfect; not all of the plotted points lie on a straight line.

Pelican Stores -- continued
Managerial Report (continued)
Using pivot tables and scatter plots, summarize the relationships between variables.
From the crosstabulation, it appears that net sales are larger for promotional customers. Age does not appear to be a factor in determining net sales.

Data file: data_pelican.xls