Chapter 1: Looking at Data - Distributions:


Chapter 1: Looking at Data - Distributions: http://anengineersaspect.blogspot.com/2013_05_01_archive.html

What is Statistics? http://vadlo.com/cartoons.php?id=71

What is Statistics?
Statistics is the science of learning from data. Its components are collection, organization, analysis, and interpretation.

Applications of Statistics
Computer science: client-server performance, image processing
Chemistry/physics: determining outliers in your data, linear regression, propagation of error, dealing with large populations and approximations
Engineering: is one process or technique better than another one?
Business: making good decisions
Everyday life: medical information, average cell phone usage of Purdue students

Branches of Statistics
Collection of data, descriptive statistics, and inferential statistics.

1.1: Data: Goals
Give examples of cases in a data set.
Identify the variables in a data set.
Demonstrate how a label can be used as a variable in a data set.
Identify the values of a variable.
Classify variables as categorical or quantitative.
Information on histograms: slides 15-19; book pp. 15-20.

Basic Definitions
Cases: objects that are described by the data.
Label: a special variable used to tell the cases apart.
Variable: a characteristic of a case.

Types of Variables
By number: univariate, bivariate, multivariate.
By type: categorical, quantitative.
Distribution of a variable: the values the variable takes and how often it takes each value.

To better understand a data set, ask:
Who? What cases do the data describe? How many cases are there?
What? How many variables are there? What is the exact definition of each variable? What is the unit of measurement for each variable?
Why? What is the purpose of the data? What questions are being asked? Are the variables suitable for answering them?

1.2: Displaying Distributions with Graphs: Goals
Analyze the distribution of a categorical variable: bar graphs, pie charts.
Analyze the distribution of a quantitative variable: histograms, time plots.
Identify the shape, center, and spread.
Identify and describe any outliers.

Categorical Variables - Display
The distribution of a categorical variable lists the categories and gives the count, percent, or frequency of individuals who fall into each category. Pie charts show the distribution of a categorical variable as a "pie" whose slices are sized by the counts or percents for the categories. Bar graphs represent categories as bars whose heights show the category counts or percents.
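
The distribution of a categorical variable (categories plus count and percent for each) can be tabulated with a few lines of Python. This is a minimal sketch; the function name `categorical_distribution` and the phone-survey responses are hypothetical, not data from these slides, and the row of `#` marks is only a text stand-in for a bar graph.

```python
from collections import Counter

def categorical_distribution(values):
    """Map each category to (count, percent), most frequent category first."""
    counts = Counter(values)
    n = len(values)
    return {cat: (count, 100 * count / n) for cat, count in counts.most_common()}

# Hypothetical responses to "Which phone do you use?" (not data from the slides).
responses = ["Android", "iPhone", "iPhone", "Other", "Android", "iPhone"]
dist = categorical_distribution(responses)
for category, (count, pct) in dist.items():
    # One '#' per individual: a sideways text version of a bar graph.
    print(f"{category:<8} {count:>2} {pct:5.1f}%  {'#' * count}")
```

The percents always sum to 100, which is what makes them usable as pie-chart slice sizes.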

Categorical Variables – Display (STAT 311)

Quantitative Variable: Stemplot
Stemplots separate each observation into a stem and a leaf that are then plotted to display the distribution while keeping the original values of the variable.
Procedure:
1. Separate each observation into a stem (the first part of the number) and a leaf (the remaining part of the number).
2. Write the stems in a vertical column; draw a vertical line to the right of the stems.
3. Write each leaf in the row to the right of its stem; order the leaves if desired.
4. Put in the units.

Stemplot: Example
The actual percentages are: 77 83 66 91 75 76 78 57 95 90 86 99 63 71 73 52 68
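
A minimal Python sketch of the stemplot procedure applied to the percentages above. It assumes non-negative whole numbers, with the tens digit as the stem and the units digit as the leaf; the function name `stemplot` is hypothetical.

```python
from collections import defaultdict

def stemplot(values):
    """Stems = tens digit, leaves = units digit (assumes non-negative integers)."""
    leaves = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    rows = []
    # Include every stem in the range, even stems with no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(leaf) for leaf in sorted(leaves[stem]))  # ordered leaves
        rows.append(f"{stem} | {row}")
    return rows

percentages = [77, 83, 66, 91, 75, 76, 78, 57, 95, 90, 86, 99, 63, 71, 73, 52, 68]
for row in stemplot(percentages):
    print(row)
print("Units: 5 | 2 represents 52%")
```

Keeping empty stems matters: a gap in the data should show up as a gap in the plot.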

Quantitative Variable: Histograms
Histograms show the distribution of a quantitative variable by using bars. The height of a bar represents the number of individuals whose values fall within the corresponding class.
Procedure (discrete):
1. Calculate the frequency and/or relative frequency of each x value.
2. Mark the possible x values on the x-axis.
3. Above each value, draw a rectangle whose height is the frequency (or relative frequency) of that value.
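
Step 1 of the discrete procedure (frequency and relative frequency of each x value) is sketched below in Python. The data are hypothetical (number of children for 10 couples), not the slide's data set, and `frequency_table` is an illustrative name; each printed row corresponds to one bar of the histogram.

```python
from collections import Counter

def frequency_table(xs):
    """(x, frequency, relative frequency) for each distinct x, smallest first."""
    counts = Counter(xs)
    n = len(xs)
    return [(x, counts[x], counts[x] / n) for x in sorted(counts)]

# Hypothetical data: number of children for 10 couples (not the slide's data set).
kids = [0, 1, 2, 2, 3, 1, 2, 0, 3, 2]
for x, freq, rel in frequency_table(kids):
    # Each row is one bar: x on the axis, bar height = frequency.
    print(f"{x}  {freq:>2}  {rel:.2f}  {'#' * freq}")
```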

Histogram - Discrete
100 married couples between 30 and 40 years of age are studied to see how many children each couple has. The table below is the frequency table of this data set.
Kids # of Couples Rel. Freq 11 0.11 1 22 0.22 2 24 0.24 3 30 0.30 4 5 0.01 6 0.00 7 100 1.00

Quantitative Variable: Histograms - Continuous
Procedure (continuous):
1. Divide the x-axis into a number of class intervals, or classes, such that each observation falls into exactly one interval.
2. Calculate the frequency or relative frequency for each interval.
3. Above each interval, draw a rectangle whose height is the frequency (or relative frequency) of that interval.
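
A minimal Python sketch of steps 1 and 2 for continuous data. It assumes equal-width, half-open classes [start, start + width), which is one common way to guarantee that each observation falls into exactly one class; the name `class_counts` and the measurements are hypothetical (not the furnace.txt data). Running it with two bin widths shows how a narrower width produces more classes.

```python
import math

def class_counts(data, start, width):
    """Counts for half-open classes [start + k*width, start + (k+1)*width).

    Half-open intervals guarantee each observation lands in exactly one class.
    """
    if start > min(data):
        raise ValueError("start must not exceed the smallest observation")
    n_classes = math.floor((max(data) - start) / width) + 1
    counts = [0] * n_classes
    for x in data:
        counts[math.floor((x - start) / width)] += 1
    return [(start + k * width, start + (k + 1) * width, c)
            for k, c in enumerate(counts)]

# Hypothetical measurements (not the furnace.txt data).
data = [10.2, 11.5, 11.9, 12.0, 12.4, 13.7, 14.1]
for width in (1.0, 0.5):
    table = class_counts(data, start=10.0, width=width)
    print(f"width {width}: {len(table)} classes, counts {[c for _, _, c in table]}")
```

Choosing the width is a judgment call: too wide hides the shape, too narrow leaves many nearly empty classes.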

Visual Display: Continuous Histogram
Power companies need information about customer usage to obtain accurate forecasts of demand. Investigators from Wisconsin Power and Light determined the energy consumption (BTUs) during a particular period for a sample of 90 gas-heated homes and calculated an adjusted consumption value for each home. The data are listed in furnace.txt under extra files on the computer web page.

Example (cont)
[Figure: histograms of the furnace data drawn with bin widths of 0.5, 0.25, and 1; two of the panels are labeled 63 classes and 32 classes.]

Examining Distributions
In any graph of data, look for the overall pattern and for striking deviations from that pattern. You can describe the overall pattern by its shape, center, and spread. An important kind of deviation is an outlier, an individual that falls outside the overall pattern.

Shapes of Histograms - Number of Modes
Symmetric unimodal, bimodal, multimodal.
http://www.particleandfibretoxicology.com/content/6/1/6/figure/F1?highres=y

Shapes of Histograms (cont)
Symmetric, positively skewed (skewed right), negatively skewed (skewed left).

Shapes of Histograms (cont)

Outliers http://ewencp.org/blog/url-reshorteners/

Time Plots
A time plot shows behavior over time. Time is always on the x-axis; the other variable is on the y-axis. Look for a trend and for deviations from the trend; connecting the data points with lines may emphasize the trend. Also look for patterns that repeat at known regular intervals.

Example: Time Plots
We are interested in the temperature (°F) of effluent at a sewage treatment plant.
1. Plot a histogram of the data.
2. Plot a time plot of the data.
Data: 47 54 53 50 46 51 52

Example: Time Plots (cont)

1.3: Describing Distributions with Numbers: Goals
Describe the center of a distribution by the mean and the median.
Compare the mean and the median.
Describe the spread by the quartiles and the standard deviation.
Describe a distribution by a boxplot (five-number summary and outliers).
Determine which summary statistics are appropriate for a given situation.
Determine the effects of a linear transformation on the above summary statistics.

Sample Mean
$$\bar{x} = \frac{\text{sum of observations}}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Sample Mean: Example
The following data give the time in months from hire to promotion to manager for a random sample of 20 software engineers from all software engineers employed by a large telecommunications firm.
a) What is the mean time for this sample?
b) Suppose that instead of x₂₀ = 69, we had chosen another engineer who took 483 months to be promoted. What is the mean time for this new sample?
5 7 12 14 18 22 21 25 23 24 34 37 49 64 47 67 69
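
The mean formula translates directly into Python. Note that the transcript lists only 17 of the 20 promotion times, so this sketch works with those 17 values and its numbers are illustrative rather than the slide's answer; `sample_mean` is a hypothetical helper name. Part (b) is modeled by swapping the 69-month value for 483.

```python
def sample_mean(xs):
    """x-bar = (sum of observations) / n."""
    return sum(xs) / len(xs)

# Only the 17 values that survive in the transcript (the slide says n = 20).
months = [5, 7, 12, 14, 18, 22, 21, 25, 23, 24, 34, 37, 49, 64, 47, 67, 69]
print(sample_mean(months))      # about 31.65

# Part (b): replace the engineer who took 69 months with one who took 483.
months_b = [483 if m == 69 else m for m in months]
print(sample_mean(months_b))    # 56.0
```

A single extreme value moves the mean by more than 24 months here, which is why the mean is described as not resistant.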

Sample Median, M or x̃
Procedure:
1. Sort the n observations from smallest to largest.
2. If n is odd, x̃ is the center observation.
3. If n is even, x̃ is the average of the two center observations.

Sample Median: Example
The following data give the time in months from hire to promotion to manager for a random sample of 20 software engineers from all software engineers employed by a large telecommunications firm.
a) What is the median time for this sample?
b) Suppose that instead of x₂₀ = 69, we had chosen another engineer who took 483 months to be promoted. What is the median time for this new sample?
5 7 12 14 18 21 22 23 24 25 34 37 47 49 64 67 69
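
A minimal Python sketch of the median procedure (sort, then take the center observation or the average of the two centers). As with the mean, only 17 of the 20 values appear in the transcript, so the printed medians are illustrative; `sample_median` is a hypothetical name.

```python
def sample_median(xs):
    s = sorted(xs)                      # step 1: sort
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                   # odd n: the center observation
    return (s[mid - 1] + s[mid]) / 2    # even n: average of the two centers

# Only the 17 values that survive in the transcript (the slide says n = 20).
months = [5, 7, 12, 14, 18, 21, 22, 23, 24, 25, 34, 37, 47, 49, 64, 67, 69]
months_b = [483 if m == 69 else m for m in months]
print(sample_median(months), sample_median(months_b))
```

Replacing 69 by 483 leaves the median unchanged, in contrast with the mean: the median is resistant to outliers.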

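Mean and Median
[Figure: positions of the mean and median in left-skewed and right-skewed distributions.]
In a left-skewed distribution the mean is usually smaller than the median; in a right-skewed distribution the mean is usually larger than the median.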

Variability of Data
[Figure: dot plots of Set 1, Set 2, and Set 3, drawn on scales from -15 to 15, -1 to 1, and -3 to 3, illustrating different amounts of spread.]

Quartiles
[Figure: the quartiles Q1, Q2 (the median), and Q3.]

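Quartiles
Procedure:
1. Sort the values from lowest to highest and locate the median.
2. The first quartile, Q1, is the median of the lower half (the observations below the median's position).
3. The third quartile, Q3, is the median of the upper half (the observations above the median's position).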

Quartiles: Example
The following data give the time in months from hire to promotion to manager for a random sample of 19 software engineers from all software engineers employed by a large telecommunications firm. Find the median and the quartiles. What is the interquartile range (IQR)? Are there any outliers in this data set?
7 12 14 18 21 22 23 24 25 34 37 47 49 64 100 150

Boxplots
Procedure:
1. Draw and label a number line that includes the range of the distribution.
2. Draw a central box from Q1 to Q3.
3. Draw a line in the box at the median.
4. Extend lines (whiskers) from the box to the smallest and largest values that are not outliers.
5. Mark each outlier with a dot, asterisk, or other symbol.
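
The numbers a boxplot needs (quartiles, IQR, outliers, whisker ends) can be computed as in the Python sketch below, using the promotion-time values from the quartiles example. Assumptions, flagged here because the slides do not state them: outliers are flagged with the common 1.5 × IQR rule, and the quartiles follow the halves-of-the-data procedure above (statistical software often uses other quartile conventions and may give slightly different values). The transcript lists 16 values although the example says 19, so the results are illustrative. `boxplot_summary` is a hypothetical name.

```python
def median(sorted_xs):
    n = len(sorted_xs)
    mid = n // 2
    if n % 2:
        return sorted_xs[mid]
    return (sorted_xs[mid - 1] + sorted_xs[mid]) / 2

def boxplot_summary(xs):
    """Quartiles, IQR, 1.5*IQR outliers, and whisker ends for a boxplot."""
    s = sorted(xs)
    n = len(s)
    q1 = median(s[: n // 2])            # lower half (median excluded if n is odd)
    q3 = median(s[(n + 1) // 2 :])      # upper half
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in s if low_fence <= x <= high_fence]
    return {
        "min": s[0], "Q1": q1, "M": median(s), "Q3": q3, "max": s[-1],
        "IQR": iqr,
        "outliers": [x for x in s if x < low_fence or x > high_fence],
        "whiskers": (inside[0], inside[-1]),  # smallest/largest non-outliers
    }

months = [7, 12, 14, 18, 21, 22, 23, 24, 25, 34, 37, 47, 49, 64, 100, 150]
print(boxplot_summary(months))
```

With these values the whiskers stop at 7 and 64, and 100 and 150 are drawn as separate outlier symbols.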

Boxplot: Example

Side-by-side Boxplot: Example

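Sample Standard Deviation
$$\text{variance} = s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2$$
$$\text{standard deviation} = s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$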

Properties of Standard Deviation
s measures spread about the mean, so use it only when you are using the mean to measure the center.
s = 0 only when all of the observations are the same; otherwise s > 0.
s is not resistant to outliers.
s has the same units of measurement as the original observations.

Sample Standard Deviation: Example
The following data give the time in months from hire to promotion to manager for a random sample of 20 software engineers from all software engineers employed by a large telecommunications firm. What is the standard deviation of the times in this sample?
5 7 12 14 18 22 21 25 23 24 34 37 49 64 47 67 69
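
A direct Python translation of the n - 1 formula for s, cross-checked against the standard library's `statistics.stdev`, which uses the same sample (n - 1) definition. Only 17 of the 20 values appear in the transcript, so the printed value is illustrative; `sample_sd` is a hypothetical name.

```python
import math
import statistics

def sample_sd(xs):
    """s_x = sqrt( sum((x_i - x_bar)^2) / (n - 1) )."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(xs) / n
    squared_deviations = sum((x - x_bar) ** 2 for x in xs)
    return math.sqrt(squared_deviations / (n - 1))

# Only the 17 values that survive in the transcript (the slide says n = 20).
months = [5, 7, 12, 14, 18, 22, 21, 25, 23, 24, 34, 37, 49, 64, 47, 67, 69]
print(f"s = {sample_sd(months):.2f} months")
print(f"statistics.stdev: {statistics.stdev(months):.2f}")  # same n - 1 formula
```

When every observation is identical, every deviation is zero and s = 0, matching the properties listed above.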

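Choosing Measures of Center and Spread
Choices: the mean and standard deviation, or the median and IQR. The median and IQR (five-number summary) are usually better for skewed distributions or distributions with strong outliers; use the mean and standard deviation for reasonably symmetric distributions that are free of outliers.
ALWAYS PLOT YOUR DATA!
http://freshspectrum.com/wp-content/uploads/2012/09/Hans-Rosling-Bubble-Plot-Cartoon.jpg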

Change of Measurement
Linear transformation: x_new = a + bx
Effects:
No change to the shape of the distribution (for b > 0).
Adding a adds a to measures of center but doesn't affect measures of spread.
Multiplying by b multiplies measures of center by b and measures of spread (s, IQR) by |b|.
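
A quick numerical check of these effects in Python, using a Celsius-to-Fahrenheit conversion (a = 32, b = 1.8) on hypothetical temperatures. The last lines use a negative multiplier to show that spread scales by |b|, not b.

```python
import statistics

# Hypothetical Celsius temperatures, converted with x_new = a + b*x.
temps_c = [12.0, 15.5, 9.0, 20.0, 17.5]
a, b = 32.0, 1.8
temps_f = [a + b * x for x in temps_c]

mean_c, sd_c = statistics.mean(temps_c), statistics.stdev(temps_c)
mean_f, sd_f = statistics.mean(temps_f), statistics.stdev(temps_f)
print(mean_f, a + b * mean_c)   # center: a + b * (old center)
print(sd_f, b * sd_c)           # spread: scaled by |b|, unaffected by a

# A negative multiplier still scales spread by |b|.
sd_neg = statistics.stdev([-2.0 * x for x in temps_c])
print(sd_neg, 2.0 * sd_c)
```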

1.4: Density Curves and Normal Distributions: Goals
State the definition and practical importance of a density curve.
State the physical meaning of the measures of center and spread for density curves.
Normal distributions:
Sketch the normal distribution.
State the importance of the 68-95-99.7 rule.
Standardize a value.
Use the Z-table.
Calculate percentages.
Calculate percentiles (inverse calculations).
Determine whether a distribution is normal (normal quantile plots).

Exploring Quantitative Data
1. Always plot your data.
2. Look for the overall pattern.
3. Calculate a numerical summary.
4. Sometimes the overall pattern is so regular that we can describe it by a smooth curve.

Density Curve
[Figure: panels (a), (b), and (c).]

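Properties of Density Curve
A density curve y = f(x) is always on or above the horizontal axis and has an area of exactly 1 underneath it:
$$f(x) \ge 0 \quad \text{and} \quad \int_{-\infty}^{\infty} f(x)\,dx = 1$$
The area under the curve above any range of values is the proportion of all observations that fall in that range.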

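Density Curves – Median and Mean
The median of a density curve is the equal-areas point, the point that divides the area under the curve in half:
$$p = 0.5 = \int_{-\infty}^{\text{median}} f(x)\,dx$$
The mean of a density curve is the balance point, the point at which the curve would balance if it were made of solid material. If the distribution is symmetric, the median and mean are the same and lie at the center of the curve.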
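
The two definitions can be checked numerically. This Python sketch assumes a specific right-skewed density, the exponential f(x) = e^(-x) for x ≥ 0 (not a density from the slides), approximates integrals with the midpoint rule, and finds the equal-areas point by bisection. For this density the exact answers are area 1, mean 1, and median ln 2 ≈ 0.693, so the mean sits to the right of the median, as expected for right skew. The helper names are hypothetical.

```python
import math

def f(x):
    """Assumed example density: exponential, f(x) = e^(-x) for x >= 0, else 0."""
    return math.exp(-x) if x >= 0 else 0.0

def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

UPPER = 50.0   # the area beyond x = 50 is e^(-50), negligible here

total_area = integrate(f, 0.0, UPPER)               # about 1 for any density
mean = integrate(lambda x: x * f(x), 0.0, UPPER)    # balance point

def equal_areas_point(tol=1e-7):
    """Bisection for the median: the x with area 0.5 to its left."""
    lo, hi = 0.0, UPPER
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if integrate(f, 0.0, mid, n=10_000) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

median = equal_areas_point()
print(f"area = {total_area:.4f}, mean = {mean:.4f}, median = {median:.4f}")
```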

Mean http://isc.temple.edu/economics/notes/descprob/descprob.htm

Sample vs. Population
Terms for samples (actual observations): mean x̄, median x̃, standard deviation s.
Terms for populations (density curves): mean μ, median μ̃, standard deviation σ.

Normal Distribution
[Cartoon: a visual comparison of the normal and the "paranormal" distribution.]
http://stats.stackexchange.com/questions/423/what-is-your-favorite-data-analysis-cartoon

Normal Distribution
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < \mu < \infty,\quad \sigma > 0$$
X ~ N(μ, σ)

Shapes of Normal Density Curve
http://resources.esri.com/help/9.3/arcgisdesktop/com/gp_toolref/process_simulations_sensitivity_analysis_and_error_analysis_modeling/distributions_for_assigning_random_values.htm

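68-95-99.7 Rule (Empirical Rule)
In a normal distribution, approximately 68% of the observations fall within σ of the mean μ, 95% fall within 2σ of μ, and 99.7% fall within 3σ of μ.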
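
The rule's percentages can be confirmed with the standard library's `statistics.NormalDist` (Python 3.8+). This sketch uses the standard normal; because every normal distribution becomes the standard normal after standardizing, the same shares apply for any μ and σ.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mu = 0, sigma = 1
# Area within k standard deviations of the mean, for k = 1, 2, 3.
within = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}
for k, share in within.items():
    print(f"within {k} sd: {share:.4f}")
```

The exact shares (about 0.6827, 0.9545, 0.9973) are why the rule's 68, 95, and 99.7 are approximations.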

Standard Normal or z Curve
The normal density with μ = 0 and σ = 1:
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$$

Cumulative z curve area

Z-table

Using the Z-table
Area to the right of z = 1 - area to the left of z.
Area between z1 and z2 (with z1 < z2) = area to the left of z2 - area to the left of z1.

Procedure for Normal Distribution Problems
1. Sketch the situation and shade the area to be found.
2. Standardize X to state the problem in terms of Z, using z = (x - μ)/σ.
3. Use Table A to find the area to the left of z.
4. Calculate the final answer.
5. Write your conclusion in the context of the problem.

Normal Distribution: Example
A particular rash has shown up in an elementary school. It has been determined that the length of time the rash lasts is normally distributed with mean 6 days and standard deviation 1.5 days.
a) What percentage of students have the rash for longer than 8 days?
b) What percentage of students have the rash for between 3.7 and 8 days?
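
A Python sketch of the procedure for this example, with `statistics.NormalDist().cdf` standing in for Table A (area to the left of z). It standardizes each value, then applies the area-to-the-right and area-between rules. Answers read from a printed table, with z rounded to two decimals, will differ slightly in the third decimal place; the results come out at roughly 9% for (a) and 85% for (b).

```python
from statistics import NormalDist

mu, sigma = 6.0, 1.5          # rash duration in days
Z = NormalDist()              # standard normal: mu = 0, sigma = 1

def standardize(x):
    """z = (x - mu) / sigma."""
    return (x - mu) / sigma

z8, z37 = standardize(8), standardize(3.7)
p_longer = 1 - Z.cdf(z8)              # (a) area to the right of z
p_between = Z.cdf(z8) - Z.cdf(z37)    # (b) left of z2 minus left of z1
print(f"P(X > 8)       = {p_longer:.4f}")
print(f"P(3.7 < X < 8) = {p_between:.4f}")
```

`NormalDist(mu, sigma).cdf(x)` gives the same areas without an explicit standardizing step; standardizing is shown here because it mirrors the table-based procedure.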

Percentiles

Normal Distribution: Example
A particular rash has shown up in an elementary school. It has been determined that the length of time the rash lasts is normally distributed with mean 6 days and standard deviation 1.5 days. How long would a student's rash have to last to be in the top 10% of rash durations?
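
The inverse calculation runs the procedure backwards: find the z with the required area to its left, then un-standardize with x = μ + zσ. In this sketch `statistics.NormalDist().inv_cdf` plays the role of reading the Z-table in reverse.

```python
from statistics import NormalDist

mu, sigma = 6.0, 1.5
z = NormalDist().inv_cdf(0.90)   # z with 90% of the area to its left (about 1.28)
cutoff = mu + z * sigma          # un-standardize: x = mu + z * sigma
print(f"z = {z:.3f}, cutoff = {cutoff:.2f} days")
```

A rash lasting longer than about 7.92 days is in the top 10%.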

Symmetrically Located Areas

Normal Distribution: Example
A particular rash has shown up in an elementary school. It has been determined that the length of time the rash lasts is normally distributed with mean 6 days and standard deviation 1.5 days. What interval, placed symmetrically about the mean, will capture 95% of the rash durations?
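
For a symmetric interval, 95% in the middle leaves 2.5% in each tail, so the needed z is the 97.5th percentile of the standard normal. A minimal Python sketch:

```python
from statistics import NormalDist

mu, sigma = 6.0, 1.5
# 95% in the middle leaves 2.5% in each tail: use the 97.5th percentile of Z.
z = NormalDist().inv_cdf(0.975)
low, high = mu - z * sigma, mu + z * sigma
print(f"z = {z:.2f}; interval: ({low:.2f}, {high:.2f}) days")
```

The 68-95-99.7 rule gives the rougher answer 6 ± 2(1.5), that is (3, 9) days, because it rounds z = 1.96 to 2.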

Procedure: Normal Quantile Plot
1. Arrange the data from smallest to largest.
2. Record the corresponding percentiles (quantiles).
3. Find the z value corresponding to each quantile calculated in step 2.
4. Plot the original data points (from step 1) against the z values (from step 3).
If the points lie close to a straight line, the data are approximately normal.
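
The four steps can be sketched in Python as a function that returns the (z, x) pairs to plot. Assumption, stated because the slides do not specify it: the quantile for the i-th smallest of n values is taken as (i - 0.5)/n, one of several plotting-position conventions in use. The effluent temperatures from the time-plot example serve as sample input; `normal_quantile_points` is a hypothetical name.

```python
from statistics import NormalDist

def normal_quantile_points(data):
    """Return (z, x) pairs for a normal quantile plot.

    Assumes the plotting position (i - 0.5) / n for the i-th smallest value.
    """
    xs = sorted(data)                   # step 1: smallest to largest
    n = len(xs)
    Z = NormalDist()
    points = []
    for i, x in enumerate(xs, start=1):
        p = (i - 0.5) / n               # step 2: quantile of the i-th value
        z = Z.inv_cdf(p)                # step 3: matching z value
        points.append((z, x))           # step 4: plot x against z
    return points

temps = [47, 54, 53, 50, 46, 51, 52]   # effluent temperatures from the time-plot example
for z, x in normal_quantile_points(temps):
    print(f"{z:6.3f}  {x}")
```

Passing these pairs to any plotting tool gives the quantile plot; a roughly straight pattern suggests the data are approximately normal.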