Example 8.7 Cluster Analysis

CLUSTERS.XLS
• This file contains demographic data on 49 of the largest cities in the United States.
• Some of the data appears in the shaded region of the figure on the next slide.
• For example, Atlanta is 67% Black, 2% Hispanic, and 1% Asian. It has a median age of 31, a 5% unemployment rate, and a per capita income of $22,000.
• We would like to group these 49 cities into four clusters of cities that are demographically similar.

CLUSTERS.XLS -- continued
• The basic idea is to choose a city to “anchor” or “center” each cluster.
• We then assign each city to the “nearest” cluster center, where “nearest” is defined in terms of the six demographic variables.
• The objective is then to minimize the sum of the squared distances from each city to its cluster anchor.

Solution
• The first problem is that if we use raw units, the percentages Black and Hispanic will drive everything, because these values are more spread out than the other demographic attributes.
• We can see this by calculating the means and standard deviations of the characteristics with the AVERAGE and STDEV functions.
• The figure on the next slide shows these calculations.

Solution -- continued
• To remedy this problem, we “standardize” each demographic attribute by subtracting the attribute’s mean and dividing the difference by the attribute’s standard deviation.
• For example, the percentage Black attribute has a mean μ (computed with AVERAGE) and a standard deviation of 18.11%.
• Thus, on a standardized basis, Atlanta is larger than a typical city by (67 − μ)/18.11 standard deviations on the percentage Black attribute.

Solution -- continued
• By working with standardized values for each attribute, we ensure that the analysis will be unit-free.
• To create the standardized values shown in the table, enter the formula =(C15-AVERAGE(C$15:C$63))/STDEV(C$15:C$63) in cell I15 and copy it across to column N and down to row 63.
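
For readers working outside Excel, the standardization step can be sketched in Python. This is only an illustration: the function name `standardize` and the five sample percentages are hypothetical and are not taken from CLUSTERS.XLS. The sketch uses the sample standard deviation, which is what Excel's STDEV computes.

```python
import statistics


def standardize(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # n - 1 denominator, like Excel's STDEV
    return [(v - mean) / stdev for v in values]


# Hypothetical percentages for five cities (NOT the CLUSTERS.XLS data).
pct = [67.0, 10.0, 25.0, 3.0, 40.0]
z = standardize(pct)
```

After standardizing, every attribute has mean 0 and standard deviation 1, so no single attribute dominates the distance calculation.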

Developing the Model
• Now that we have standardized values for all of the attributes, we can develop the spreadsheet model as follows.
• The model is shown in two parts on the next two slides.

Developing the Model -- continued
• The model can be created by following these steps.
– Lookup table. One key to the model is to have an index (1 to 49) for the cities so that we can refer to them by index and then look up their characteristics with a VLOOKUP function. Therefore, name the range A15:N63 as Ltable.
– Decision variables. The only changing cells appear in the Centers range of the figure. They are the indexes of the four cities chosen as cluster centers. Enter any four integers from 1 to 49 in these cells.

Developing the Model -- continued
– Corresponding cities and standardized attributes. We find the names and standardized attributes of the cluster centers with VLOOKUP functions. First, enter the formula =VLOOKUP(B6,Ltable,2) in cell A6 and copy it to the range A6:A9. Then enter the formula =VLOOKUP($B6,Ltable,C$4) in C6 and copy it to the range C6:H9. Note, for example, that the standardized PctBlack is the 9th column of the lookup table. This explains the “column offset” entries in row 4.
– Squared distances to centers. The next step is to see how “far” each city is from each of the cluster centers. Let z_i be standardized attribute i for a typical city, and let c_i be standardized attribute i for a typical cluster center.

Developing the Model -- continued
– We measure the distance from this city to this cluster center with the usual “Euclidean” distance formula $\sqrt{\sum_{i}(z_i - c_i)^2}$, where the sum is over all six attributes. We can work just as well with squared distances, which appear in columns P through S of the last figure. For example, the value in cell P15 is the squared distance from Albuquerque to the first cluster center (Los Angeles), the value in Q15 is the squared distance from Albuquerque to the second cluster center, and so on. These squared distances can be calculated in several equivalent ways. Probably the quickest way is to enter the formula =SUMPRODUCT(I15:N15-$C$6:$H$6,I15:N15-$C$6:$H$6) in cell P15 and copy it down column P.

Developing the Model -- continued
– This rather novel use of the SUMPRODUCT function sums the products of the differences with the differences; that is, it sums the squares of the differences, which is exactly what we want. Then enter similar formulas in columns Q, R, and S. For example, the formula in column Q refers to row 7 instead of row 6 in the absolute references.
– Assignments to cluster centers. Each city will be assigned to the cluster center that has the smallest squared distance. Therefore, find the minimum squared distances in column T by entering the formula =MIN(P15:S15) in cell T15 and copying it down. Then identify the cluster index (1 through 4) and city name of the cluster center that yields the minimum. We can use the MATCH function to obtain the cluster index. Enter the formula =MATCH(T15,P15:S15,0) in cell U15 and copy it down.
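
The SUMPRODUCT calculation of a squared distance can be mirrored by a small Python function. This is a minimal sketch: the name `squared_distance` and the three-attribute vectors are hypothetical, chosen only to keep the arithmetic easy to check (the spreadsheet uses six attributes).

```python
def squared_distance(z, c):
    """Sum of squared differences between two equal-length attribute vectors.

    Mirrors SUMPRODUCT(z - c, z - c): each difference is multiplied by itself.
    """
    return sum((zi - ci) * (zi - ci) for zi, ci in zip(z, c))


# Hypothetical standardized attributes for one city and one cluster center.
city = [0.5, -1.0, 2.0]
center = [0.0, 1.0, 1.0]
d2 = squared_distance(city, center)  # 0.25 + 4.0 + 1.0 = 5.25
```

Working with squared distances avoids the square root and does not change which center is nearest, because the square root is an increasing function.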

Developing the Model -- continued
– For example, the 4.47 minimum squared distance for Albuquerque corresponds to the second squared distance, so Albuquerque is assigned to the second cluster center. Finally, to get the name of the second cluster center, we can use the INDEX function. Enter the formula =INDEX(CenterNames,U15,1) in cell V15 and copy it down.
– Sum of squared distances. The objective is to minimize the sum of squared distances from all cities to the cluster centers to which they are assigned. Calculate this objective in the SumSqDists cell with the formula =SUM(MinSqDists).
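
The assignment and objective steps (MIN, MATCH, INDEX, and SUM) can be sketched together in Python. All data below are hypothetical except the value 4.47 in the second position of the first row, which echoes the Albuquerque example. The center names are placeholders rather than the cities in the model.

```python
def assign(sq_dists_row):
    """Return (minimum squared distance, 1-based cluster index).

    Plays the role of MIN(P15:S15) followed by MATCH(T15, P15:S15, 0);
    like MATCH with match type 0, list.index returns the first exact match.
    """
    best = min(sq_dists_row)
    return best, sq_dists_row.index(best) + 1


# Placeholder center names (hypothetical), indexed 1 through 4.
center_names = ["Center A", "Center B", "Center C", "Center D"]

# Hypothetical squared distances from three cities to the four centers.
rows = [
    [9.0, 4.47, 6.0, 12.0],
    [1.5, 3.0, 0.5, 2.0],
    [2.0, 2.0, 7.0, 8.0],
]

assignments = [assign(row) for row in rows]
# INDEX(CenterNames, U15, 1): turn the 1-based cluster index into a name.
assigned_names = [center_names[idx - 1] for _, idx in assignments]
# SUM(MinSqDists): the objective to be minimized.
sum_sq_dists = sum(best for best, _ in assignments)
```

Note that the third row has a tie between the first two centers; like MATCH, the sketch assigns the city to the first of the tied centers.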

Using the Evolutionary Solver
• The Solver dialog box should be set up as shown here.

Using the Evolutionary Solver -- continued
• Because the changing cells represent indexes of cluster centers, they must be integer-constrained, and suitable lower and upper limits are 1 and 49.
• Make sure you set the Evolutionary Solver options as we described in Example 8.1. This problem is considerably harder to solve, and we want to allow the Solver plenty of time to search through a lot of potential solutions.
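
To give a feel for what a search over integer center indexes involves, here is a small Python sketch. It is not the Evolutionary Solver's algorithm. It is a plain random-mutation search, swapped in purely for illustration, and the function names, the iteration count, and the randomly generated data are all assumptions. Indexes here run from 0 to 48 rather than 1 to 49.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two attribute vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def objective(points, centers):
    """Sum over all points of the squared distance to the nearest center."""
    return sum(min(sq_dist(p, points[c]) for c in centers) for p in points)


def random_search(points, k, iterations, seed=0):
    """Keep the best set of k center indexes found by mutating one at a time."""
    rng = random.Random(seed)
    n = len(points)
    best = rng.sample(range(n), k)
    best_val = objective(points, best)
    for _ in range(iterations):
        cand = list(best)
        cand[rng.randrange(k)] = rng.randrange(n)  # change one center index
        val = objective(points, cand)
        if val < best_val:  # accept only strict improvements
            best, best_val = cand, val
    return best, best_val


# Hypothetical standardized data: 49 "cities" with 6 attributes each.
data_rng = random.Random(1)
points = [[data_rng.gauss(0, 1) for _ in range(6)] for _ in range(49)]
centers, value = random_search(points, k=4, iterations=200)
```

As with Solver, a longer run (more iterations) gives the search more chances to improve, and different seeds can end at different sets of centers with similar objective values.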

Solution
• The solution shown, which uses Los Angeles, Omaha, Memphis, and San Francisco, is the best we found.
• You might find a slightly different solution, depending on your Solver settings and how long you let Solver run, but you should obtain a similar value in the target cell.
• If you look closely at the cities assigned to each cluster center, this solution begins to make intuitive sense, as seen in the figure on the next slide.

Solution -- continued
• The San Francisco cluster consists of rich, older, highly Asian cities. The Memphis cluster consists of highly Black cities with high unemployment rates. The Omaha cluster consists of average-income cities with few minorities. The Los Angeles cluster consists of highly Hispanic cities with high unemployment rates.
• Why four clusters? We could easily try three clusters or five clusters. Note that when we add a cluster, the sum of squared distances will certainly decrease.

Solution -- continued
• In fact, we could obtain an objective value of 0 by using 49 clusters, one for each city, but this would hardly provide much information!
• Therefore, to choose the “optimal” number of clusters, we would stop adding clusters when the sum of squared distances failed to decrease by a substantial amount.
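
The stopping rule described above can be made concrete with a short Python sketch. The slides do not say what counts as a “substantial” decrease, so the 10% relative-drop threshold, the function name `choose_k`, and the objective values below are all assumptions made for illustration.

```python
def choose_k(objective_by_k, min_relative_drop=0.10):
    """Pick the number of clusters at which adding one more stops paying off.

    objective_by_k[i] is the best sum of squared distances found with
    i + 1 clusters. Returns the smallest k for which moving to k + 1
    clusters reduces the objective by less than min_relative_drop.
    """
    for i in range(len(objective_by_k) - 1):
        current, nxt = objective_by_k[i], objective_by_k[i + 1]
        if current == 0 or (current - nxt) / current < min_relative_drop:
            return i + 1
    return len(objective_by_k)


# Hypothetical best objective values for 1 through 6 clusters.
values = [300.0, 200.0, 150.0, 120.0, 115.0, 112.0]
k = choose_k(values)  # drops: 33%, 25%, 20%, then about 4%, so k = 4
```

A different threshold can give a different answer, so in practice the choice of k still involves judgment about how interpretable the resulting clusters are.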