Stat 31, Section 1, Last Time Course Organization & Website https://www.unc.edu/%7Emarron/UNCstat31-2005/Stat31sec1Home.html What is Statistics? Data types.

Presentation transcript:

Stat 31, Section 1, Last Time Course Organization & Website What is Statistics? Data types and structure Get going in EXCEL Exploratory Data Analysis Bar Graphs

Stat 31, Student Poll Results As indicated on “Student Info” form: Big changes from the past: More biology More diversity

Stat 31, Student Poll Results “Have you taken an AP Exam?” Only ~10% had & grades generally low So don’t worry if you haven’t…

Major Concept: Distributions “Distribution” = “Patterns of data” = “way data is spread out” e.g. Bar Graph is visual display of categorical “distribution”

Exploratory Data Analysis 3 Visual Display of Quantitative Distributions: 1.Stem and Leaf Plots Not Recommended (Main motivation was pencil and paper statistical analysis, but now have better graphical methods readily accessible) A limited special case of….

Visual Disp: Quantitative Dist’ns 2.Histograms Idea: Apply bar graph idea, By creating categories, Called “class intervals” or “classes” or “bins”

Histograms Idea: put numbers into “bins”, bar heights are counts, or “frequencies”
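
The binning idea above can be sketched in a few lines of Python (the course itself uses EXCEL). This is only a minimal sketch under assumptions: the name `histogram_counts`, the toy values, and the convention that each bin includes its left edge (with the last bin also including its right edge) are illustrative choices, not taken from the lecture. EXCEL's Histogram tool may treat values that fall exactly on an edge differently.

```python
def histogram_counts(data, edges):
    """Return one count per bin; bin i covers edges[i] <= x < edges[i + 1].

    The last bin also includes its right edge, so the maximum is not lost.
    Values outside the edges are ignored.
    """
    counts = [0] * (len(edges) - 1)
    last = len(counts) - 1
    for x in data:
        for i in range(len(counts)):
            in_bin = edges[i] <= x < edges[i + 1]
            on_top_edge = i == last and x == edges[-1]
            if in_bin or on_top_edge:
                counts[i] += 1
                break
    return counts


# Made-up toy values, for illustration only (not course data).
values = [1, 2, 2, 3, 5, 7, 8, 8, 9, 10]
edges = [0, 5, 10]
counts = histogram_counts(values, edges)

# Draw the histogram sideways: one row per bin, bar length = count.
for i, count in enumerate(counts):
    print(f"{edges[i]:>3} to {edges[i + 1]:<3} {'#' * count}")
```

Each bar length is the "frequency" of its bin, and the counts add up to the number of data values that lie inside the edges.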

Class Histogram Example Buffalo, N. Y. (Annual) Snowfall Data Raw Data: 63 years, ranging from ~30 - ~120 (inches)

Buffalo Snowfall Data Buffalo, N. Y. (Annual) Snowfall Data Raw Data: 63 years, ranging from ~30 - ~120 (inches) Histogram Analysis (pre-done):

Buffalo Snowfall Data, I A. EXCEL default (of bin edges) Unround numbers for bin edges Data “centered around 90” Most data between 50 and 130 Asymmetric Distribution

Buffalo Snowfall Data, II B.Smaller bins Chosen by me Binwidth = 5, << ~13 from EXCEL default Nicer edge numbers Data centered around 84 (now more precise) Bar graph rougher (fewer points in each bin) Suggests 3 main groups (called “modes”) (can’t see this above: bin width counts)

Buffalo Snowfall Data, III C.Larger bins Chosen by me Binwidth = 30, >> ~13 from EXCEL default Bar graph is “smooth” (since many points in each bin) Only one mode??? Quite symmetric? (different from above: bin width counts)
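
The effect of binwidth on the apparent number of modes can be sketched in Python. This is a hedged illustration, not the lecture's analysis: the data below are made up (they only mimic the ~30 to ~120 range, they are not the Buffalo snowfall values), and `bin_counts` and `count_peaks` are hypothetical helper names. The peak count is a crude stand-in for "number of modes": it counts bars strictly taller than both neighbors.

```python
def bin_counts(data, start, width, nbins):
    """Count values in nbins equal-width bins starting at `start`."""
    counts = [0] * nbins
    top = start + nbins * width
    for x in data:
        i = int((x - start) // width)
        if x == top:            # put the top edge into the last bin
            i = nbins - 1
        if 0 <= i < nbins:      # ignore values outside the bins
            counts[i] += 1
    return counts


def count_peaks(counts):
    """Count bars strictly taller than both neighbors (a crude mode count)."""
    padded = [-1] + list(counts) + [-1]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i - 1] < padded[i] > padded[i + 1]
    )


# Made-up values in three loose clusters, for illustration only.
data = [41, 47, 48, 52, 53, 54, 58,
        76, 78, 81, 82, 83, 84, 87, 88,
        104, 106, 107, 108, 112, 118]

narrow = bin_counts(data, start=30, width=5, nbins=18)   # bins 30, 35, ..., 120
wide = bin_counts(data, start=30, width=30, nbins=3)     # bins 30, 60, 90, 120

print("binwidth  5:", narrow, "peaks:", count_peaks(narrow))
print("binwidth 30:", wide, "peaks:", count_peaks(wide))
```

With these made-up values the narrow bins show three separate peaks while the wide bins show only one, which is the kind of difference the slides describe.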

Buffalo Snowfall Data, IV D. What’s under the hood (how to do this): i. Tools → Data Analysis → Histogram (& Chart Out) (may need Data Analysis “Add-in”) ii. Massage pic (especially bar width) iii. Sigma → min, max iv. Bin range: create first two & drag v. Histogram, using input bin edges
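
The "min, max, then drag out a bin range" steps have a simple analogue outside EXCEL. The Python sketch below is one possible way to do it, under assumptions: `bin_edges` is a hypothetical helper, the sample values are made up, and rounding the edges outward to multiples of the binwidth is a choice made here to get "nice edge numbers".

```python
import math


def bin_edges(data, width):
    """Evenly spaced edges that cover the data, rounded to multiples of width."""
    lo = math.floor(min(data) / width) * width   # round the minimum down
    hi = math.ceil(max(data) / width) * width    # round the maximum up
    if hi == lo:                                 # all values on one edge
        hi = lo + width
    nbins = round((hi - lo) / width)
    # Same idea as typing the first two edges and dragging the pattern down.
    return [lo + i * width for i in range(nbins + 1)]


# Made-up values, for illustration only (not the Buffalo data).
sample = [31.2, 64.0, 84.5, 118.4]
edges = bin_edges(sample, 5)
print(edges)
```

The resulting list plays the role of the input bin range handed to the histogram step.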

Histogram HW HW:1.21 Use Excel and histograms Get data from CDrom Do both: –Excel Default bins –Bins set to: 0,10,20,…,240 Which gives answers closer to answers in back of book? Turn in only one page

Histogram Binwidths Nice example from Webster West, U.S.C.: Control Binwidth with slider: Undersmoothing? About right? Oversmoothing? (critical to visual impression)

Histogram Binwidth Example Hidalgo Stamp Data From Mexico in 1800s How many sources of paper? How many modes: 1, 2, 5, 7, 10?

Histogram Binwidth Example How many modes? Caution: Answer depends on binwidth (a serious and current statistical research problem)

Stamps Data Histogram How many modes? 2nd Caution: Answer also depends on bin location (i.e. “shift” of bins)
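
The dependence on bin location can be seen with the binwidth held fixed and only the starting point of the bins moved. The Python sketch below is an illustration under assumptions: the values are made up (they are not the stamp data) and `shifted_counts` is a hypothetical helper name.

```python
def shifted_counts(data, origin, width, nbins):
    """Count values in nbins bins of equal width, the first starting at origin."""
    counts = [0] * nbins
    for x in data:
        i = int((x - origin) // width)
        if 0 <= i < nbins:      # values outside the bins are ignored
            counts[i] += 1
    return counts


# Made-up values, for illustration only.
data = [7, 8, 11, 12, 13, 17, 23, 24, 27, 28, 29, 32, 33]

# Same binwidth (10), two different bin locations.
unshifted = shifted_counts(data, origin=0, width=10, nbins=4)   # edges 0, 10, 20, 30, 40
shifted = shifted_counts(data, origin=5, width=10, nbins=3)     # edges 5, 15, 25, 35

print("edges at 0, 10, ...:", unshifted)
print("edges at 5, 15, ...:", shifted)
```

With these values the unshifted bins rise to a single highest bar, while the shifted bins give two equally tall bars with a dip between them, so the visual answer to "how many modes?" changes even though the data and the binwidth do not.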

Histogram Bins For this course: Try several binwidths, to “get the idea” Weakness of EXCEL (we will see several): This is inconvenient

Comparison of Histograms Class Example: Study Habits Data Idea: Compare Study Habits of Males vs. Females (measured by some “survey score”, perhaps of questionable value?)

Study Habits Data EXCEL default histograms: Populations look similar??? Careful: Binwidth very big… Careful: Different bin ranges… Need smaller binwidths, and common scales

Study Habits Data Better Choice: Binwidths = 10, same bins for both Clear difference, easy to see Females higher “on average” Males are “more spread” 1 “exceptional value”, really true???
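
Putting two groups on the same bins can be sketched as follows. This is a hedged Python illustration, not the class analysis: the scores are made up (they are not the Study Habits data), and `common_edges` and `counts_for` are hypothetical helper names. The edges are built from the pooled data so both histograms share one scale.

```python
import math
from statistics import mean


def common_edges(group_a, group_b, width):
    """One set of bin edges covering both groups, at multiples of width."""
    pooled = list(group_a) + list(group_b)
    lo = math.floor(min(pooled) / width) * width
    hi = math.ceil(max(pooled) / width) * width
    nbins = round((hi - lo) / width)
    return [lo + i * width for i in range(nbins + 1)]


def counts_for(data, edges):
    """Counts per bin; assumes equally spaced edges that cover the data."""
    width = edges[1] - edges[0]
    nbins = len(edges) - 1
    counts = [0] * nbins
    for x in data:
        i = min(int((x - edges[0]) // width), nbins - 1)  # top edge -> last bin
        counts[i] += 1
    return counts


# Made-up survey scores, for illustration only.
females = [130, 135, 142, 148, 150, 155, 158, 163]
males = [85, 100, 118, 125, 140, 152, 170, 190]

edges = common_edges(females, males, width=10)
f_counts = counts_for(females, edges)
m_counts = counts_for(males, edges)

# Same bins for both groups, so the rows line up and can be compared directly.
for i in range(len(edges) - 1):
    print(f"{edges[i]:>4} F {'#' * f_counts[i]:<4} M {'#' * m_counts[i]}")

print("mean F:", mean(females), " mean M:", mean(males))
print("range F:", max(females) - min(females), " range M:", max(males) - min(males))
```

Because both groups use identical edges, differences in center and spread show up as differences in where the bars sit and how widely they are scattered, rather than as artifacts of different scales.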

Things to look for (in histo’s) 1. Population Center Point (Study Habits Data) 2. Population Spread (Study Habits Data) 3. Shape - Symmetric vs. Skewed Right Skewed: long tail to the right Left Skewed: long tail to the left 4. Modes - Unexpected clusters 5. Outliers - “unusual data points”

Comparison of Histograms HW HW: 1.25b, 1.27, 1.29, 1.22 Work in this order Get data from CDrom Use EXCEL and histograms Odd answers in back You choose the bins (if you miss something in answers, change this) Turn in at most one page for each

Plotting Bivariate Data Toy Example: (1,2) (3,1) (-1,0) (2,-1)
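
For readers without EXCEL at hand, the toy points can be placed on a grid with a few lines of Python. This is only a rough sketch under assumptions: `text_scatter` is a hypothetical helper, it works only for integer coordinates, and it draws a character grid rather than a real chart.

```python
def text_scatter(points):
    """Return rows of a character grid with '*' at each (x, y) point.

    Columns run from the smallest x to the largest x (left to right);
    rows run from the largest y down to the smallest y (top to bottom).
    """
    marked = set(points)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    rows = []
    for y in range(max(ys), min(ys) - 1, -1):
        row = "".join(
            "*" if (x, y) in marked else "."
            for x in range(min(xs), max(xs) + 1)
        )
        rows.append(row)
    return rows


# The toy example from the slide.
points = [(1, 2), (3, 1), (-1, 0), (2, -1)]
for row in text_scatter(points):
    print(row)
```

Each point is plotted at its horizontal position x and vertical position y, which is all a scatterplot does; a charting tool adds axes, scales, and optional connecting lines.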

Plotting Bivariate Data Sometimes: Can see more insightful patterns by connecting points

Plotting Bivariate Data Sometimes: Useful to switch off points, and only look at lines/curves

Plotting Bivariate Data Common Name: “Scatterplot” A look under the hood: EXCEL: Chart Wizard (colored bar icon) Chart Type: XY (scatter) Subtype controls points only, or lines Later steps similar to above (can massage the pic!)