
Univariate EDA

Quantitative Univariate EDA, Slide #2
Exploratory Data Analysis
Univariate EDA
– Describe the distribution
– Distribution is concerned with what values a variable takes and how often it takes each value
Univariate EDA (for quantitative data)
– Graphically
– Numerically
– Model

Quantitative Univariate EDA, Slide #3
– What is this graph called?
– How many lake trout were in the mm bin?
– What is the most common range of lengths?
– Which range of lengths has the fewest lake trout?
– How many lake trout were exactly 108 mm?

Quantitative Univariate EDA, Slide #4
What four things are described?
– Shape
– Outliers
– Center
– Dispersion

Quantitative Univariate EDA, Slide #5
Shape: what are these three shapes?
– Symmetric
– Left-skewed
– Right-skewed

Quantitative Univariate EDA, Slide #6
Outliers: what is an outlier?
– Individual(s) that is/are distinctly separate* from the main cluster of individuals
* at least one or two bars removed
* only one or two individuals
* on the margins of the distribution

Quantitative Univariate EDA, Slide #7
Center: what are the two measures of center?
– Mean (arithmetic average)
– Median (value in the middle of ordered data)
μ = population mean
x̄ = sample mean
M = sample median

Quantitative Univariate EDA, Slide #8
Compute x̄ and M for the values (faculty salaries) below, with and without the red value (139).
38, 46, 42, 44, 44, 43, 45, 45, 46, 44, 139
Examine meanMedian() graphic
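
The exercise above can be checked with a short sketch. The handout works in R; Python's standard library is used here only to illustrate the arithmetic, and the sketch assumes the red value is 139, the one salary far from the rest.

```python
from statistics import mean, median

# Faculty salaries from the slide. Assumption: the "red value" is 139.
salaries = [38, 46, 42, 44, 44, 43, 45, 45, 46, 44, 139]
salaries_without = [x for x in salaries if x != 139]

mean_with = mean(salaries)
median_with = median(salaries)
mean_without = mean(salaries_without)
median_without = median(salaries_without)

print(f"with 139:    mean = {mean_with:.1f}, median = {median_with}")
print(f"without 139: mean = {mean_without:.1f}, median = {median_without}")
```

Dropping the single large value moves the mean from about 52.4 to 43.7, while the median stays at 44. The mean is sensitive to outliers and the median is resistant to them.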

Quantitative Univariate EDA, Slide #9
Adequacy of Mean?
18, 19, 20, 21, 22: x̄ = 20
5, 15, 20, 25, 35: x̄ = 20
Does the mean adequately relate all pertinent information for these samples? If not, what is missing?

Quantitative Univariate EDA, Slide #10
Dispersion: variability among individuals
What are the three measures of dispersion?
– Range (minimum, maximum)
– Inter-Quartile Range (IQR; Q1, Q3)
– Standard Deviation (average difference from mean)
σ = population standard deviation
s = sample standard deviation

Quantitative Univariate EDA, Slide #11
Standard Deviation: Calculation Steps
1) Find the sample mean
2) Find each difference from the mean
3) Square each difference
4) Sum squared differences
5) Divide by n-1
6) Square root
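
The six steps can be sketched as a small function. This is an illustration in Python rather than the R used in the handout. The name `sample_sd` is hypothetical, and the two samples are the ones from the "Adequacy of Mean?" slide.

```python
from math import sqrt

def sample_sd(values):
    """Sample standard deviation, following the six calculation steps."""
    n = len(values)
    xbar = sum(values) / n                # 1) find the sample mean
    diffs = [x - xbar for x in values]    # 2) find each difference from the mean
    squared = [d ** 2 for d in diffs]     # 3) square each difference
    total = sum(squared)                  # 4) sum the squared differences
    variance = total / (n - 1)            # 5) divide by n-1
    return sqrt(variance)                 # 6) take the square root

narrow = [18, 19, 20, 21, 22]
wide = [5, 15, 20, 25, 35]

sd_narrow = sample_sd(narrow)
sd_wide = sample_sd(wide)

print(f"narrow sample: s = {sd_narrow:.2f}")
print(f"wide sample:   s = {sd_wide:.2f}")
```

Both samples have a mean of 20, but their standard deviations differ (about 1.58 versus about 11.18). Dispersion is the information the mean alone leaves out.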

Quantitative Univariate EDA, Slide #12
Compute s from the values below (use table 3.4 in the book as a model).
5, 8, 9, 11, 12
Compute the IQR of the values (faculty salaries) below, with and without the red value (139).
38, 46, 42, 44, 44, 43, 45, 45, 46, 44, 139
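
A hedged way to check both exercises, written in Python rather than the R of the handout. Two assumptions are made: the red value is 139, and quartiles are computed with the `inclusive` method of `statistics.quantiles`. Quartile conventions differ between textbooks and software, so the book's answer may differ slightly.

```python
from statistics import quantiles, stdev

# Exercise 1: sample standard deviation (divides by n-1).
values = [5, 8, 9, 11, 12]
s = stdev(values)

def iqr(data):
    """Q3 - Q1 using the 'inclusive' quartile convention (an assumption)."""
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    return q3 - q1

# Exercise 2: IQR with and without the assumed red value, 139.
salaries = [38, 46, 42, 44, 44, 43, 45, 45, 46, 44, 139]
salaries_without = [x for x in salaries if x != 139]

iqr_with = iqr(salaries)
iqr_without = iqr(salaries_without)
range_with = max(salaries) - min(salaries)
range_without = max(salaries_without) - min(salaries_without)

print(f"s = {s:.2f}")
print(f"IQR with 139 = {iqr_with}, without 139 = {iqr_without}")
print(f"range with 139 = {range_with}, without 139 = {range_without}")
```

Under this convention the IQR barely changes when 139 is removed (2.0 versus 1.75), whereas the range drops from 101 to 8. The IQR, like the median, is resistant to outliers.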

Quantitative Univariate EDA, Slide #13
Quantitative Univariate EDA in R
Examine Handout
– hist()
– Summarize()

Quantitative Univariate EDA, Slide #14
Overall Numerical Summaries
– If outliers exist, then use the median and IQR.
– If outliers do not exist but the distribution is strongly skewed, then use the median and IQR.
– If outliers do not exist and the distribution is symmetric or only slightly skewed, then use the mean and standard deviation.
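
The rule above can be written as a small decision function. This is only a sketch: the function name and the three shape labels are hypothetical, and judging whether outliers exist or how strong the skew is remains a visual call made from the histogram.

```python
def choose_summaries(has_outliers, shape):
    """Return (center, dispersion) measures following the slide's rule.

    shape is one of "symmetric", "slightly skewed", "strongly skewed"
    (labels chosen for this sketch).
    """
    allowed = {"symmetric", "slightly skewed", "strongly skewed"}
    if shape not in allowed:
        raise ValueError(f"unknown shape: {shape!r}")
    # Outliers or strong skew: use the resistant measures.
    if has_outliers or shape == "strongly skewed":
        return ("median", "IQR")
    # No outliers and symmetric or only slightly skewed.
    return ("mean", "standard deviation")

print(choose_summaries(True, "symmetric"))
print(choose_summaries(False, "strongly skewed"))
print(choose_summaries(False, "slightly skewed"))
```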

Quantitative Univariate EDA, Slide #15
What four items are described in a univariate EDA for quantitative data?
Describe a univariate EDA for the data in Figure 1 and Table 1.

Quantitative Univariate EDA, Slide #16
Describe a univariate EDA for the data in Figure 2 and Table 2.

Quantitative Univariate EDA, Slide #17
Describe a univariate EDA for the data in Figure 3.
Figure 3. Histogram of 1996 tuition for 30 public and 50 private colleges and universities.

Quantitative Univariate EDA, Slide #18
Figure 4. Boxplot of 1996 tuition for 30 public and 50 private colleges and universities.
The distribution of tuition for private schools is left-skewed with no obvious outliers, centered on a median of 25430, with an IQR from … to … (Figure 4; Table 3). The distribution of tuition for public schools is right-skewed with one outlier at a tuition of 23460, centered on a median of 13590, with an IQR from … to … (Figure 4; Table 3). I chose to use the median and IQR as measures of center and dispersion because of the outlier and the skewness of the distributions.
Table 3. Summary statistics of 1996 tuition for 30 public and 50 private colleges and universities.
Columns: Statistic, Public, Private. Rows: Mean, Std. Dev, Min, 1st Qu, Median, 3rd Qu, Max.

Categorical Univariate EDA, Slide #19
Quantitative vs. Categorical
Do NOT describe shape, center, dispersion, or outliers with CATEGORICAL data.
Identify the most outstanding characteristics.

Categorical Univariate EDA, Slide #20
Numerical Summaries
Red, Blonde, Brunette, Blonde, Red, Blonde, Red
Frequency Table. Columns: Hair Color, Freq. Rows: Blonde, Brunette, Red.
Percentages Table. Columns: Hair Color, Perc. Rows: Blonde, Brunette, Red.

Categorical Univariate EDA, Slide #21
Graphical Summaries
Bar chart
– Bars over category label
– Height is frequency of individuals in that category
Hair Color  Freq
Blonde      4
Brunette    1
Red         3
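
The frequency table, the percentages table, and a rough text bar chart can be built from these counts with a short sketch. It is written in Python rather than the R functions the handout uses, and the list of individual hair colors is reconstructed from the counts on this slide (Blonde 4, Brunette 1, Red 3), not copied from the original data.

```python
from collections import Counter

# One entry per student, reconstructed from the slide's frequency table.
hair = ["Blonde"] * 4 + ["Brunette"] * 1 + ["Red"] * 3

freq = Counter(hair)                      # frequency table
n = sum(freq.values())                    # total number of individuals
perc = {color: 100 * count / n for color, count in freq.items()}

# Print both tables side by side, with one '#' per individual as a bar.
print(f"{'Hair Color':<11}{'Freq':>5}{'Perc':>7}")
for color in sorted(freq):
    bar = "#" * freq[color]
    print(f"{color:<11}{freq[color]:>5}{perc[color]:>7.1f}  {bar}")
```

The bar lengths are frequencies and the bars are labeled by category, which is all a bar chart encodes. No shape, center, or dispersion is implied.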

Categorical Univariate EDA, Slide #22
Graphical Summaries
Bar chart
Pie chart
– Circle with pieces proportional to category frequencies
Hair Color  Freq
Blonde      4
Brunette    1
Red         3

Categorical Univariate EDA, Slides #23–26
no, No, NO!!!

Categorical Univariate EDA, Slide #27
Overall Summary
Identify most outstanding characteristic(s)
Blondes were the most common hair color (4 of 8 students) and very few students were brunettes (1 of 8).
Hair Color  Freq
Blonde      4
Brunette    1
Red         3

Categorical Univariate EDA, Slide #28
Describe a univariate EDA for the data in Figure 4.
Figure 4. Bar chart of the number of KNOWN species by organism type.

Categorical Univariate EDA, Slide #29
Describe a univariate EDA for the data in Figure 5.
Figure 5. Bar chart of the types of organizations that received funding by the Invasive Alien Species Partnership Program (Canada),

Categorical Univariate EDA, Slide #30
Categorical Univariate EDA in R
Examine Handout
– xtabs()
– percTable()
– barplot()