Chapter 3 – Data Visualization © Galit Shmueli and Peter Bruce 2010 Data Mining for Business Intelligence Shmueli, Patel & Bruce.

Presentation transcript:

Graphs for Data Exploration. Basic plots: line graphs, bar charts, scatterplots. Distribution plots: boxplots, histograms.

Line Graph for Time Series

Bar Chart for Categorical Variable. 95% of tracts do not border the Charles River. Excel can confuse: the y-axis is actually “% of records that have a value for CATMEDV” (i.e., “% of all records”).

Scatterplot. Displays the relationship between two numerical variables.

Distribution Plots. Display “how many” of each value occur in a data set or, for continuous data or data with many possible values, “how many” values fall in each of a series of ranges, or “bins”.
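The binning idea above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function name `bin_counts`, the bin width, and the sample values are made up for the example and do not come from the Boston Housing data.

```python
from collections import Counter

def bin_counts(values, bin_width, start):
    """Count values per bin [start + k*w, start + (k+1)*w).

    Assumes every value is >= start (an assumption of this sketch).
    """
    # Map each value to the index of the bin that contains it.
    counts = Counter(int((v - start) // bin_width) for v in values)
    last = max(counts)
    # Report empty bins too, so gaps in the distribution stay visible.
    return [
        (start + k * bin_width, start + (k + 1) * bin_width, counts[k])
        for k in range(last + 1)
    ]

# Made-up illustrative values.
values = [5.0, 7.2, 11.5, 12.0, 13.9, 21.4, 22.0, 24.8, 50.0]
bins = bin_counts(values, bin_width=10, start=0)

for low, high, count in bins:
    print(f"[{low}, {high}): {'*' * count}")
```

Each printed row corresponds to one histogram bar; the bar height is the count for that bin.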

Histograms. Boston Housing example: the histogram shows the distribution of the outcome variable (median house value).

Boxplots. Boston Housing example: display the distribution of the outcome variable (MEDV) for neighborhoods on the Charles River (1) and not on the Charles River (0). Side-by-side boxplots are useful for comparing subgroups.

Box Plot. Top outliers are defined as those above Q3 + 1.5(Q3 - Q1); “max” is the maximum of the non-outliers. Analogous definitions apply to bottom outliers and to “min”. Details may differ across software. [Figure labels: outliers, “max”, Quartile 3, median, mean, Quartile 1, “min”]
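The boxplot quantities defined above can be computed as follows. This is a sketch, not the procedure of any particular package: it assumes the "inclusive" quartile method of Python's `statistics.quantiles`, and other software may compute quartiles slightly differently. The function name `boxplot_stats` and the sample data are hypothetical.

```python
import statistics

def boxplot_stats(values):
    """Return the numbers a boxplot draws, using the 1.5 * IQR outlier rule."""
    # Quartile conventions vary across software; "inclusive" is one choice.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    upper_fence = q3 + 1.5 * iqr   # top outliers lie above this
    lower_fence = q1 - 1.5 * iqr   # bottom outliers lie below this
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "mean": statistics.mean(values),
        "max": max(inside),        # largest non-outlier
        "min": min(inside),        # smallest non-outlier
        "outliers": sorted(v for v in values
                           if v < lower_fence or v > upper_fence),
    }

# Made-up illustrative values with one obvious high outlier.
example = boxplot_stats([10, 12, 14, 15, 16, 18, 20, 22, 50])
print(example)
```

With these values the quartiles are 14, 16, and 20, so the upper fence is 29: the value 50 is flagged as an outlier and the “max” whisker stops at 22.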

Heat Maps. Color conveys information. In data mining, heat maps are used to visualize correlations and missing data.

Heatmap to highlight correlations (Boston Housing): in Excel (using conditional formatting) and in Spotfire.
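A correlation heatmap colors the cells of a correlation table. The sketch below computes such a table in plain Python; the coloring step is left to whatever tool draws the chart. The column names `x`, `y`, `z` and their values are toy data chosen for the example, not Boston Housing variables.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def correlation_matrix(columns):
    """Return {row_name: {col_name: correlation}} for every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}

# Toy columns: y rises with x, z falls as x rises.
columns = {
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 6, 8, 10],
    "z": [5, 4, 3, 2, 1],
}
matrix = correlation_matrix(columns)

for name, row in matrix.items():
    print(name, [round(value, 2) for value in row.values()])
```

In a heatmap, cells near +1 and -1 would get the strongest colors and cells near 0 the faintest.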

Multidimensional Visualization

Scatterplot with color added. Boston Housing, NOX vs. LSTAT: red = low median value, blue = high median value.

Matrix Plot. Shows scatterplots for variable pairs. Example: scatterplots for 3 Boston Housing variables.

Rescaling to log scale (on right) “uncrowds” the data
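To see why a log scale uncrowds the data, compare where the same values land on a linear axis and on a log axis. This is a minimal sketch; the helper `axis_position` and the values are hypothetical, and the log version assumes strictly positive data.

```python
import math

def axis_position(value, low, high, log_scale=False):
    """Return where value sits along an axis from low to high, as a fraction."""
    if log_scale:
        # Log scale requires strictly positive inputs.
        value, low, high = (math.log10(value), math.log10(low),
                            math.log10(high))
    return (value - low) / (high - low)

# Made-up values spanning several orders of magnitude.
values = [1, 10, 100, 1000, 10000]
linear = [axis_position(v, 1, 10000) for v in values]
logged = [axis_position(v, 1, 10000, log_scale=True) for v in values]

print([round(p, 3) for p in linear])
print([round(p, 3) for p in logged])
```

On the linear axis the four smallest values are squeezed into the first tenth of the axis; on the log axis all five are evenly spaced.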

Aggregation

Amtrak Ridership – Monthly Data

Aggregation – Monthly Average

Aggregation – Yearly Average
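The two aggregations can be sketched as a group-and-average step. This assumes "monthly average" means averaging each calendar month across years, and "yearly average" means averaging the months within each year. The records below are invented for the example and are not Amtrak figures.

```python
from collections import defaultdict
from statistics import mean

def average_by(records, key_index):
    """Average the ridership field, grouped by the field at key_index."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key_index]].append(record[2])
    return {key: mean(vals) for key, vals in sorted(groups.items())}

# Invented (year, month, ridership) records.
records = [
    (2001, 1, 100), (2001, 2, 120), (2001, 3, 140),
    (2002, 1, 110), (2002, 2, 130), (2002, 3, 150),
]

yearly_average = average_by(records, key_index=0)   # one value per year
monthly_average = average_by(records, key_index=1)  # one value per calendar month

print(yearly_average)
print(monthly_average)
```

Averaging by year smooths out the seasonal pattern and exposes the trend; averaging by calendar month does the opposite.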

Scatter Plot with Labels (Utilities)

Scaling: Smaller markers, jittering, color contrast (Universal Bank; red = accept loan)

Jittering. Moving markers by a small random amount uncrowds the data by allowing more markers to be seen.
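Jittering as described above amounts to adding a small random offset to each coordinate before plotting. The sketch below assumes uniform noise and a fixed seed for repeatability; the function name `jitter`, the noise amount, and the points are all illustrative choices.

```python
import random

def jitter(points, amount, seed=0):
    """Shift each (x, y) by uniform noise in [-amount, amount] per axis."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    return [
        (x + rng.uniform(-amount, amount), y + rng.uniform(-amount, amount))
        for x, y in points
    ]

# Five records that share the same coordinates would overplot as one marker.
points = [(2, 3)] * 5
jittered = jitter(points, amount=0.1)

print(len(set(points)), "distinct marker position before jittering")
print(len(set(jittered)), "distinct marker positions after jittering")
```

The noise amount should stay small relative to the axis range so the jittered picture still reflects the original values.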

Without jittering (for comparison)

Parallel Coordinate Plot (Boston Housing). Legend: CATMEDV = 1, CATMEDV = 0.

Linked plots (same record is highlighted in each plot)

Network Graph – eBay Auctions (sellers on left, buyers on right). Circle size = # of transactions for the node. Line width = # of auctions for the buyer-seller pair. Arrows point from buyer to seller.
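The circle sizes and line widths in such a network graph come from simple counts over the auction list. The sketch below shows one way to derive them; the buyer and seller names are hypothetical placeholders, not real eBay data.

```python
from collections import Counter

# Hypothetical (buyer, seller) pairs, one per auction.
auctions = [
    ("buyer_a", "seller_x"),
    ("buyer_a", "seller_x"),
    ("buyer_b", "seller_x"),
    ("buyer_b", "seller_y"),
]

# Line width: number of auctions for each buyer-seller pair.
edge_width = Counter(auctions)

# Circle size: number of transactions each node takes part in.
node_size = Counter()
for buyer, seller in auctions:
    node_size[buyer] += 1
    node_size[seller] += 1

print(dict(edge_width))
print(dict(node_size))
```

Each key of `edge_width` is a (buyer, seller) tuple, which also gives the arrow direction from buyer to seller.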

Treemap – eBay Auctions (hierarchical eBay data: category > sub-category > brand). Rectangle size = average closing price (= item value). Color = % of sellers with negative feedback (darker = more).

Map Chart (comparing countries’ well-being with GDP). Darker = higher value.