Exploring, Displaying, and Examining Data

Presentation transcript:

Chapter 17: Exploring, Displaying, and Examining Data
This chapter presents the use of charts to display data and the initial exploration of data using tools such as cross-tabulation.

Learning Objectives
Understand . . .
how exploratory data analysis techniques provide insights and data diagnostics by emphasizing visual representations of the data.
how cross-tabulation is used to examine relationships involving categorical variables, serves as a framework for later statistical testing, and makes an efficient tool for data visualization and later decision-making.

Exploratory Data Analysis
This Booth Research Services ad suggests that the researcher’s role is to make sense of data displays. Great data exploration and analysis delivers insight from data.

Data Analysis: Exploratory and Confirmatory
In exploratory data analysis, the researcher has the flexibility to respond to the patterns revealed in the preliminary analysis of the data. Patterns in the collected data guide the data analysis or suggest revisions to the preliminary data analysis plan. This flexibility is an important attribute of this approach. When the researcher is attempting to show causation, confirmatory data analysis is required. Confirmatory data analysis is an analytical process guided by classical statistical inference in its use of significance and confidence.

Exhibit 17-1 Data Exploration, Examination, and Analysis in the Research Process
Exhibit 17-1 is a reminder that data visualization is an integral element of the data analysis process and a necessary step prior to hypothesis testing.

Exhibit 17-2 Frequency of Ad Recall
Table columns: Value Label, Value, Frequency, Percent, Valid Percent, Cumulative Percent.
A frequency table is a simple device for arraying data. It arrays category codes from lowest value to highest value, with columns for count (frequency), percent, valid percent (the percent computed after missing data are excluded), and cumulative percent. Ad recall, a nominal variable, describes the ads research participants remembered seeing or hearing without being prompted by the researcher or the measurement instrument. Although there are 100 observations, the small number of media placements makes the variable easily tabled. The same data are presented using a pie chart on the next slides.
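The four columns of a frequency table can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS procedure behind the exhibit: the function name `frequency_table`, the category codes, and the use of `None` for a missing response are all assumptions made for the example.

```python
from collections import Counter

def frequency_table(codes):
    """Build frequency-table rows from category codes; None marks a missing response.

    Returns (rows, n_missing), where each row is
    (value, frequency, percent, valid_percent, cumulative_percent).
    """
    counts = Counter(codes)
    n_total = len(codes)
    n_missing = counts.pop(None, 0)      # missing cases count toward Percent only
    n_valid = n_total - n_missing
    rows = []
    cumulative = 0.0
    for value in sorted(counts):         # lowest code to highest code
        freq = counts[value]
        percent = 100 * freq / n_total
        valid_percent = 100 * freq / n_valid
        cumulative += valid_percent      # cumulative percent runs over valid cases
        rows.append((value, freq, percent, valid_percent, cumulative))
    return rows, n_missing

# Hypothetical codes: 1 = TV, 2 = radio, 3 = print; one respondent gave no answer.
codes = [1, 1, 1, 1, 2, 2, 2, 3, 3, None]
rows, n_missing = frequency_table(codes)
for value, freq, pct, valid_pct, cum_pct in rows:
    print(f"{value:>5} {freq:>4} {pct:6.1f} {valid_pct:6.1f} {cum_pct:6.1f}")
print(f"missing: {n_missing}")
```

Percent uses all cases as its base, while valid percent and cumulative percent use only the cases with a recorded answer, which is why the two columns differ whenever data are missing.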

Exhibit 17-3 Pie Chart
This portion of Exhibit 17-3 illustrates the observations of ad recall in the form of a pie chart. Data may be more readily understood when presented graphically.

Exhibit 17-3 Bar Chart
In this slide, the same data are presented in the form of a bar chart.

Exhibit 17-4 Frequency Table
When the variable of interest is measured on an interval-ratio scale and is one with many potential values, these techniques are not particularly informative. Exhibit 17-4, shown in the slide, is a condensed frequency table of the average annual purchases of PrimeSell’s top 50 customers. Only two values, 59.9 and 66, have a frequency greater than 1. Thus, the primary contribution of this table is an ordered list of values. If the table were converted to a bar chart, it would have 48 bars of equal length and two bars with two occurrences.

Exhibit 17-5 Histogram
The histogram is the conventional solution for the display of interval-ratio data. Histograms are used when it is possible to group the variable’s values into intervals. A histogram is a bar chart that groups continuous data values into equal intervals, with one bar for each interval. Data analysts find histograms useful for (1) displaying all intervals in a distribution, even those without observed values, and (2) examining the shape of the distribution for skewness, kurtosis, and the modal pattern. The values for the average annual purchases variable presented in Exhibit 17-4 were measured on a ratio scale and are easily grouped. Histograms are not useful for nominal variables like ad recall, whose categories have no order.
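The grouping step behind a histogram can be sketched as follows. The function name `histogram_counts` and the interval settings are assumptions for illustration. The sample values are illustrative too: twenty purchase figures in the 50s and 60s (the ones the stem-and-leaf slide spells out) plus one hypothetical value, 95, added so that the empty intervals show up.

```python
def histogram_counts(values, start, width, n_bins):
    """Count values in n_bins equal intervals [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:          # values outside the covered range are ignored
            counts[index] += 1
    return counts

purchases = [54, 55, 55, 56, 56, 56, 57, 58, 58, 58, 58, 59,
             61, 62, 63, 66, 66, 67, 69, 69,
             95]                         # 95 is a hypothetical value, not from the exhibit
counts = histogram_counts(purchases, start=50, width=10, n_bins=5)

# Text rendering: one bar per interval, including intervals with no observations.
for i, count in enumerate(counts):
    low = 50 + 10 * i
    print(f"{low:>3}-{low + 9:<3} {'*' * count}")
```

Because every interval gets a row even when its count is zero, gaps in the distribution stay visible, which is the first of the two uses named above.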

Exhibit 17-6 Stem-and-Leaf Display
Stems: 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
Leaves: 455666788889 12466799 02235678 02268 24 018 3 1 06 36
The stem-and-leaf display is a technique that is closely related to the histogram. It shares some of the histogram’s features but offers several unique advantages. In contrast to histograms, which lose information by grouping data values into intervals, the stem-and-leaf presents actual data values that can be inspected directly, without the use of enclosed bars or asterisks as the representation medium. Visualization is the second advantage of stem-and-leaf displays. The range of values is apparent at a glance, and both shape and spread impressions are immediate. Patterns in the data are easily observed.
Each line or row in the display is referred to as a stem, and each piece of information on the stem is called a leaf. In the first stem, there are 12 items (leaves) in the data set whose first digit is 5: 455666788889, representing 54, 55, 55, 56, 56, 56, 57, 58, 58, 58, 58, 59. The second line shows that there are eight average annual purchase values whose first digit is 6: 12366799, representing 61, 62, 63, 66, 66, 67, 69, 69.
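A stem-and-leaf display is straightforward to build by splitting each value into its leading digits (the stem) and its last digit (the leaf). The sketch below uses only the twenty values spelled out in the text above. The function name `stem_and_leaf` and the assumption of whole-number values are illustrative choices.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return display lines; the stem is value // 10 and the leaf is value % 10."""
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(int(v), 10)
        stems[stem].append(leaf)
    lines = []
    # Walk every stem in the range so that stems without leaves still get a row.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines

values = [54, 55, 55, 56, 56, 56, 57, 58, 58, 58, 58, 59,
          61, 62, 63, 66, 66, 67, 69, 69]
lines = stem_and_leaf(values)
print("\n".join(lines))
```

Unlike the histogram counts, each row keeps the individual values, so the original data can be read back from the display.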

Exhibit 17-7 Pareto Diagram
Pareto diagrams represent frequency data as a bar chart, ordered from most to least, overlaid with a line graph denoting the cumulative percentage at each variable level. The percentages sum to 100 percent. The data are derived from a multiple-choice-single-response scale, a multiple-choice-multiple-response scale, or frequency counts of words or themes from content analysis. Exhibit 17-7, shown in the slide, depicts an analysis of MindWriter customer complaints as a Pareto diagram.
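The two ingredients of a Pareto diagram, bars ordered from most to least and a running cumulative percentage, can be computed as below. The complaint categories and counts are hypothetical and are not the MindWriter figures from the exhibit; `pareto_table` is an illustrative name.

```python
def pareto_table(counts):
    """Return (label, count, cumulative_percent) rows ordered from most to least frequent."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0
    for label, count in ordered:
        running += count
        rows.append((label, count, 100 * running / total))
    return rows

# Hypothetical complaint counts, for illustration only.
complaints = {
    "Shipping damage": 10,
    "Problem not resolved": 50,
    "Slow repair": 25,
    "Rude phone support": 15,
}
pareto_rows = pareto_table(complaints)
for label, count, cum_pct in pareto_rows:
    print(f"{label:<22} {count:>3} {cum_pct:6.1f}%")
```

The count column supplies the bar heights and the last column supplies the overlaid line, which always ends at 100 percent.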

Exhibit 17-8 Boxplot Components
The boxplot, or box-and-whisker plot, is another technique used frequently in exploratory data analysis. A boxplot reduces the detail of the stem-and-leaf display and provides a different visual image of the distribution’s location, spread, shape, tail length, and outliers. Boxplots are extensions of the five-number summary of a distribution. This summary consists of the median, the upper and lower quartiles, and the largest and smallest observations. The median and quartiles are used because they are particularly resistant statistics. Resistance is a characteristic that provides insensitivity to localized misbehavior in data. The mean and standard deviation are considered nonresistant statistics, because they are susceptible to the effects of extreme values in the tails of the distribution and do not represent typical values well under conditions of asymmetry. Boxplots may be constructed easily by hand or by computer programs.
The ingredients of the plot are:
The rectangular plot that encompasses 50% of the data values.
A center line marking the median and spanning the width of the box.
The edges of the box, called hinges.
The whiskers that extend from the right and left hinges to the largest and smallest values. These values may be found within 1.5 times the interquartile range (IQR) from either edge of the box.
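The components listed above can be computed directly. The sketch below is one possible convention, not necessarily the one used by any particular plotting program: hinges are taken as the medians of the lower and upper halves of the sorted data (with the overall median included in both halves when the sample size is odd), and anything beyond 1.5 times the IQR from a hinge is treated as an outlier. The data values and function names are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (smallest, lower hinge, median, upper hinge, largest)."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2                  # the median joins both halves when n is odd
    lower_hinge = statistics.median(data[:half])
    upper_hinge = statistics.median(data[n - half:])
    return data[0], lower_hinge, statistics.median(data), upper_hinge, data[-1]

def boxplot_parts(values):
    """Return whisker ends and outliers using the 1.5 x IQR rule."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {"whisker_low": min(inside), "whisker_high": max(inside), "outliers": outliers}

# Hypothetical purchase values with one extreme observation.
sample = [54, 56, 58, 59, 61, 62, 66, 66, 69, 120]
summary = five_number_summary(sample)
parts = boxplot_parts(sample)
print(summary)
print(parts)
```

In this sample the extreme value changes the largest observation but leaves the median and hinges where they were, which illustrates why those statistics are called resistant.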

Exhibit 17-9 Diagnostics with Boxplots
Exhibit 17-9 summarizes several comparisons that are of help to the analyst. Boxplots are an excellent diagnostic tool, especially when graphed on the same scale. The upper two plots in the exhibit are both symmetric, but one is larger than the other. Larger box widths are sometimes used when the second variable, from the same measurement scale, comes from a larger sample size. The box widths should be proportional to the square root of the sample size, but not all plotting programs account for this. Right- and left-skewed distributions and those with reduced spread are also presented clearly in the plot comparison. Groups may be compared by means of multiple plots.

Exhibit 17-10 Boxplot Comparison
In Exhibit 17-10, multiple boxplots compare five sectors of PrimeSell’s customers by their average annual purchases data. The overall impression is one of potential problems for the analyst: unequal variances, skewness, and extreme outliers. Note the similarities of the profiles of finance and retailing in contrast to the high-tech and insurance sectors.

Mapping
With mapping, colors and patterns denoting knowledge, attitude, behavior, or demographic data arrays are superimposed over street maps, block-group maps, or county, state, or country maps to help identify the best locations for stores based on demographic, psychographic, and life-stage segmentation data. The PCensus ad points out that mapping can help determine whether a site has the potential to attract sufficient members of a market and offers facilitating infrastructure and appropriate traffic patterns.

Digital Camera Map
This map, developed by American Demographics and Claritas, illustrates the penetration of digital cameras by geographic location.

Exhibit 17-11 SPSS Cross-Tabulation
Cross-tabulation is a technique for comparing data from two or more categorical variables. It is used with demographic variables and the study’s target variables. The technique uses tables having rows and columns that correspond to the levels or code values of each variable’s categories. Exhibit 17-11 is an example of a computer-generated cross-tabulation. This table has two rows for gender and two columns for assignment selection. The combination produces four cells. Depending on what you request for each cell, it can contain a count of the cases of the joint classification and also the row, column, and/or the total percentages. The number of row cells and column cells is often used to designate the size of the table, as in this 2 x 2 table. Row and column totals, called marginals, appear at the bottom and right “margins” of the table. When tables are constructed for statistical testing, we call them contingency tables and the test determines if the classification variables are independent of each other. This is discussed in Chapter 20.

Exhibit 17-12 Percentages in Cross-Tabulation
Percentages serve two purposes in data presentation. They simplify the data by reducing all numbers to a range from 0 to 100. They also translate the data into standard form with a base of 100 for relative comparisons. One can see in Exhibit 17-12 that the percentage of females selected for overseas assignments rose from 15.8 to 22.5 percent of their respective samples. Among all overseas selectees, in the first study, 21.4% were women, while in the second study, 37.5% were women. The tables verify an increase in women with overseas assignments, but we cannot conclude that their gender had anything to do with the increase.
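Cell counts, marginals, and the row and column percentages of a 2 x 2 table can be sketched as follows. The cell counts here are assumptions chosen so that the percentages reproduce the 15.8% and 21.4% quoted for the first study; they are not copied from the exhibit, and `crosstab` is an illustrative name.

```python
from collections import Counter

def crosstab(pairs):
    """Return (cells, row_totals, col_totals, grand_total) for (row, column) pairs."""
    cells = Counter(pairs)
    row_totals, col_totals = Counter(), Counter()
    for (row, col), count in cells.items():
        row_totals[row] += count         # marginal at the right of the table
        col_totals[col] += count         # marginal at the bottom of the table
    return cells, row_totals, col_totals, sum(cells.values())

# Assumed counts: (gender, selected for overseas assignment).
records = ([("Male", "Yes")] * 22 + [("Male", "No")] * 40 +
           [("Female", "Yes")] * 6 + [("Female", "No")] * 32)
cells, row_totals, col_totals, grand_total = crosstab(records)

# Row percentage: share of women who were selected.
row_pct = 100 * cells[("Female", "Yes")] / row_totals["Female"]
# Column percentage: share of selectees who were women.
col_pct = 100 * cells[("Female", "Yes")] / col_totals["Yes"]
print(f"row %: {row_pct:.1f}, column %: {col_pct:.1f}, n = {grand_total}")
```

The same cell count yields two different percentages depending on which marginal is used as the base, so a reported percentage should always state its base.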

Guidelines for Using Percentages
Averaging percentages
Use of too large percentages
Using too small a base
Percentage decreases can never exceed 100%
Percentages are used by virtually everyone dealing with numbers, but these guidelines will help to prevent errors in reporting. Percentages cannot be averaged unless each is weighted by the size of the group from which it is derived. In other words, a simple average is inappropriate but a weighted average may be used. A large percentage is difficult to understand. For instance, if a 1,000 percent increase is experienced, it is better to describe the increase as a 10-fold increase. Percentages hide the base from which they have been computed. A figure of 60% when contrasted with 30% seems sizable, but there may be only 3 cases in one category and 6 in another. The final guideline should not need stating, but the error does occur: percentage decreases can never exceed 100%. The higher figure should always be used as the denominator or base. For instance, if a price is reduced from $1 to $.25, the decrease is 75% (75/100).
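Two of these guidelines come down to simple arithmetic, sketched below. The group sizes in the weighted-average example are hypothetical; the price example uses the $1 to $.25 figures from the text. The function names are illustrative.

```python
def weighted_average_percentage(groups):
    """Average percentages, weighting each by the size of its group.

    groups is a list of (percentage, group_size) tuples.
    """
    total_size = sum(size for _, size in groups)
    return sum(pct * size for pct, size in groups) / total_size

def percent_decrease(old, new):
    """Percentage decrease, with the higher (original) figure as the base."""
    return 100 * (old - new) / old

# Hypothetical groups: 60% in a group of 10 and 30% in a group of 90.
groups = [(60, 10), (30, 90)]
simple = sum(pct for pct, _ in groups) / len(groups)
weighted = weighted_average_percentage(groups)
print(f"simple average: {simple}, weighted average: {weighted}")

decrease = percent_decrease(1.00, 0.25)
print(f"price decrease: {decrease}%")
```

The simple average (45) overstates the combined figure (33) because it gives the small group the same weight as the large one. Using the original price as the base keeps any decrease at or below 100%.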

Exhibit 17-13 Cross-Tabulation with Control and Nested Variables
A control variable is a variable introduced to help interpret the relationship between variables. Statistical packages like SPSS have the option of constructing n-way tables with the provision of multiple control variables. Exhibit 17-13 presents an example in which all three variables are handled under the same banner.

Exhibit 17-14 AID Example
An advanced variation on n-way tables is automatic interaction detection (AID). AID is a computerized statistical process that requires that the researcher identify a dependent variable and a set of predictors or independent variables. The computer then searches among up to 300 variables for the best single division of the data according to each predictor variable, chooses one, and splits the sample using a statistical test to verify the appropriateness of this choice. Exhibit 17-14 shows the tree diagram that resulted from an AID study of customer satisfaction with MindWriter’s CompleteCare repair service. The initial dependent variable is the overall impression of the repair service. The variable was measured on an interval scale of 1 to 5. The variables that contribute to perceptions of repair effectiveness were also measured on the same scale but were rescaled to ordinal data for this example. The top box shows that 62% of the respondents rated the repair service as excellent. The best predictor of repair effectiveness is “resolution of the problem.”
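The search step, finding the predictor that gives the best single division of the sample, can be illustrated with a deliberately simplified sketch. It is not the AID procedure itself: it scores each predictor by the between-group sum of squares of the dependent variable (an assumed criterion), performs only one split, and omits the statistical test. The respondent records, variable names, and function names are all hypothetical.

```python
def between_group_ss(pairs):
    """Between-group sum of squares for (group_label, y) pairs."""
    ys = [y for _, y in pairs]
    grand_mean = sum(ys) / len(ys)
    groups = {}
    for label, y in pairs:
        groups.setdefault(label, []).append(y)
    return sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in groups.values())

def best_predictor(rows, dependent, predictors):
    """Return the predictor whose groups differ most on the dependent variable."""
    scores = {p: between_group_ss([(r[p], r[dependent]) for r in rows])
              for p in predictors}
    return max(scores, key=scores.get), scores

# Hypothetical respondents: overall impression (1 to 5) and two candidate predictors.
rows = [
    {"overall": 5, "resolved": "yes", "speed": "fast"},
    {"overall": 5, "resolved": "yes", "speed": "slow"},
    {"overall": 4, "resolved": "yes", "speed": "fast"},
    {"overall": 2, "resolved": "no", "speed": "fast"},
    {"overall": 1, "resolved": "no", "speed": "slow"},
    {"overall": 2, "resolved": "no", "speed": "slow"},
]
best, scores = best_predictor(rows, "overall", ["resolved", "speed"])
print(best, scores)
```

A full tree would repeat the same search within each resulting subgroup, which is how the branches of a diagram like the one in the exhibit are produced.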

Key Terms
Automatic interaction detection (AID)
Boxplot
Cell
Confirmatory data analysis
Contingency table
Control variable
Cross-tabulation
Exploratory data analysis (EDA)
Five-number summary
Frequency table
Histogram
Interquartile range (IQR)
Marginals
Nonresistant statistics
Outliers
Pareto diagram
Resistant statistics
Stem-and-leaf display