Purpose
The star plot is a method of displaying multivariate data. Each star represents a single observation. Typically, star plots are generated in a multi-plot format with many stars on each page. Star plots are used to examine the relative values for a single data point (e.g., point 3 is large for variables 2 and 4 and small for variables 1, 3, 5, and 6) and to locate similar or dissimilar points.
Sample Plot
The plot below contains the star plots of 16 cars. The variables for the sample star plot are:
1. Price
2. Mileage (MPG)
3. 1978 Repair Record (1 = Worst, 5 = Best)
4. 1977 Repair Record (1 = Worst, 5 = Best)
5. Headroom
6. Rear Seat Room
7. Trunk Space
8. Weight
9. Length
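The code below sketches how the star for one observation can be built from a list of variables. It assumes a common construction that the text above does not spell out:

* Each variable is min-max scaled to [0, 1] across all observations.
* The p variables become p equally spaced rays. The first ray sits at angle 0 and the rest follow counterclockwise.

The function names `scale_columns` and `star_vertices` are hypothetical. Real plotting packages may choose a different starting angle or scaling.

```python
import math


def scale_columns(rows):
    """Min-max scale each variable (column) to [0, 1] across all observations.

    Assumption: a variable that is constant across observations maps to 0.5.
    """
    ncol = len(rows[0])
    lo = [min(r[j] for r in rows) for j in range(ncol)]
    hi = [max(r[j] for r in rows) for j in range(ncol)]
    return [
        [0.5 if hi[j] == lo[j] else (r[j] - lo[j]) / (hi[j] - lo[j]) for j in range(ncol)]
        for r in rows
    ]


def star_vertices(scaled_row):
    """Return the (x, y) tip of each ray for one observation.

    Ray j points at angle 2*pi*j/p and its length equals the scaled value,
    so joining the tips in order outlines the star.
    """
    p = len(scaled_row)
    return [
        (v * math.cos(2 * math.pi * j / p), v * math.sin(2 * math.pi * j / p))
        for j, v in enumerate(scaled_row)
    ]


# Hypothetical data: 3 observations, 2 variables (not the car data above).
scaled = scale_columns([[1, 10], [3, 20], [2, 30]])
print(scaled)  # [[0.0, 0.0], [1.0, 0.5], [0.5, 1.0]]
```

With the nine car variables, `star_vertices` would return nine ray tips spaced 40 degrees apart.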
We can look at these plots individually, or we can use them to identify clusters of cars with similar features. For example, the star plot of the Cadillac Seville shows that it is one of the most expensive cars, gets below-average (but not among the worst) gas mileage, has an average repair record, and has average-to-above-average roominess and size. We can then compare the Cadillac models (the last three plots) with the AMC models (the first three plots). The AMC models tend to be inexpensive, have below-average gas mileage, and are small in height, weight, and roominess. The Cadillac models are expensive, have poor gas mileage, and are large in both size and roominess.
Questions
The star plot can be used to answer the following questions:
1. What variables are dominant for a given observation?
2. Which observations are most similar, i.e., are there clusters of observations?
3. Are there outliers?
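The same three questions can be approximated numerically, as a rough complement to reading the stars by eye. This sketch assumes that:

* The observations have already been scaled to [0, 1] per variable, as a star plot does.
* Similarity is measured by Euclidean distance between scaled observations.

The distance choice and the function names are illustrative assumptions, not part of the technique as described.

```python
import math
from itertools import combinations


def dominant_variables(scaled_row, k=2):
    """0-based indices of the k largest scaled values, i.e. the longest rays."""
    order = sorted(range(len(scaled_row)), key=lambda j: scaled_row[j], reverse=True)
    return order[:k]


def most_similar_pair(scaled_rows):
    """Indices of the two observations whose scaled values are closest (Euclidean)."""
    return min(
        combinations(range(len(scaled_rows)), 2),
        key=lambda ij: math.dist(scaled_rows[ij[0]], scaled_rows[ij[1]]),
    )


def outlier_scores(scaled_rows):
    """Mean distance from each observation to all others; large scores flag candidates."""
    n = len(scaled_rows)
    return [
        sum(math.dist(scaled_rows[i], scaled_rows[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


# Hypothetical scaled data for three observations and three variables.
rows = [[0.1, 0.9, 0.2], [0.15, 0.85, 0.25], [0.9, 0.1, 0.8]]
print(dominant_variables(rows[0], k=1))  # [1]
print(most_similar_pair(rows))           # (0, 1)
```

Here observations 0 and 1 would form a cluster, and observation 2 receives the largest outlier score. `math.dist` requires Python 3.8 or later.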
Weakness in Technique
Star plots are helpful for small-to-moderate-sized multivariate data sets. Their primary weakness is that their effectiveness is limited to data sets with fewer than a few hundred points; beyond that, they tend to be overwhelming.
Box-and-Whisker Plot (3)
There is a useful variation of the box plot that more specifically identifies outliers. To create this variation:
1. Calculate the median and the lower and upper quartiles.
2. Plot a symbol at the median and draw a box between the lower and upper quartiles.
3. Calculate the interquartile range (the difference between the upper and lower quartiles) and call it IQ.
4. Calculate the following points:
   L1 = lower quartile - 1.5*IQ
   L2 = lower quartile - 3.0*IQ
   U1 = upper quartile + 1.5*IQ
   U2 = upper quartile + 3.0*IQ
5. The line from the lower quartile, which previously ran to the minimum, is now drawn to the smallest point that is greater than L1. Likewise, the line from the upper quartile, which previously ran to the maximum, is now drawn to the largest point smaller than U1.
6. Points between L1 and L2 or between U1 and U2 are drawn as small circles. Points less than L2 or greater than U2 are drawn as large circles.

Questions
The box plot can provide answers to the following questions:
1. Is a factor significant?
2. Does the location differ between subgroups?
3. Does the variation differ between subgroups?
4. Are there any outliers?

Importance: Check the significance of a factor
The box plot is an important EDA tool for determining whether a factor has a significant effect on the response with respect to either location or variation. The box plot is also an effective tool for summarizing large quantities of information.
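The code below sketches the fence arithmetic of the outlier-identifying box plot described above. It rests on three assumptions:

* The text does not say how quartiles are computed. This sketch uses Python's `statistics.quantiles` with its default `'exclusive'` method, so other software may report slightly different quartiles and fences.
* Points lying exactly on L1 or U1 are classed as small circles, because the whiskers stop at points strictly inside those limits.
* The function name `box_plot_summary` is hypothetical.

```python
import statistics


def box_plot_summary(data):
    """Quartiles, fences, whisker ends, and circle classes for the outlier box plot."""
    # Default method='exclusive'; other quartile definitions shift the fences slightly.
    q1, median, q3 = statistics.quantiles(data, n=4)
    iq = q3 - q1                                  # interquartile range
    l1, l2 = q1 - 1.5 * iq, q1 - 3.0 * iq         # inner and outer lower limits
    u1, u2 = q3 + 1.5 * iq, q3 + 3.0 * iq         # inner and outer upper limits

    # Whiskers stop at the most extreme points strictly inside (L1, U1);
    # if no such point exists, the whisker collapses onto the quartile.
    lower_whisker = min((x for x in data if x > l1), default=q1)
    upper_whisker = max((x for x in data if x < u1), default=q3)

    ordered = sorted(data)
    small_circles = [x for x in ordered if l2 <= x <= l1 or u1 <= x <= u2]
    large_circles = [x for x in ordered if x < l2 or x > u2]
    return {
        "median": median, "q1": q1, "q3": q3, "iq": iq,
        "L1": l1, "L2": l2, "U1": u1, "U2": u2,
        "whiskers": (lower_whisker, upper_whisker),
        "small_circles": small_circles, "large_circles": large_circles,
    }


# Hypothetical sample: ten ordinary values plus two high ones.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 25, 60])
print(summary["q1"], summary["q3"])                        # 3.25 9.75
print(summary["whiskers"])                                 # (1, 10)
print(summary["small_circles"], summary["large_circles"])  # [25] [60]
```

With IQ = 6.5, U1 = 19.5 and U2 = 29.25. The value 25 therefore falls between U1 and U2 and is drawn as a small circle, while 60 lies beyond U2 and is drawn as a large circle.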