

AN OVERVIEW OF STATISTICS

WHAT IS STATISTICS?
What does a statistician do?
[Table of basketball statistics with columns Player, Games, Minutes, Points, Rebounds, FG% and rows Bob, Andy, Larry, Michael; the cell values are missing here.]

JOB OF A STATISTICIAN
- Collects numbers or data.
- Systematically organizes or arranges the data.
- Analyzes the data: extracts relevant information to provide a complete numerical description.
- Infers general conclusions about the problem using this numerical description.

POLITICS
- Forecasting and predicting winners of elections.
- Deciding where to concentrate campaign appearances, advertising and money.
Poll question: If the election for president of the United States were held today, who would you be more likely to vote for?
Rudy Giuliani      45%
Hillary Clinton    43%
Someone else        2%
Wouldn’t vote       4%
Unsure              6%

INDUSTRY
- To market a product, we are interested in the average length of life of a light bulb.
- Cannot test all the bulbs.

USES OF STATISTICS
- Statistics is a theoretical discipline in its own right.
- Statistics is a tool for researchers in other fields.
- Used to draw general conclusions in a large variety of applications.

COMMON PROBLEM
A decision or prediction must be made about a large body of measurements which cannot be totally enumerated.
Examples:
- Light bulbs (enumerating the population is destructive)
- Forecasting the winner of an election (population too big; people change their minds)
Solution:
- Collect a smaller set of measurements that will (hopefully) be representative of the larger set.

DATA AND STATISTICS
- Data consists of information coming from observations, counts, measurements, or responses.
- Statistics is the science of collecting, organizing, analyzing, and interpreting data in order to make decisions.
- A population is the collection of all outcomes, responses, measurements, or counts that are of interest.
- A sample is a subset of a population.

Introduction to Probability and Statistics Thirteenth Edition Chapter 1 Describing Data with Graphs

Introduction to Statistical Terms
- Variable: something that can assume some type of value.
- Data: information coming from observations, counts, measurements, or responses.
- Data set: a collection of data values.
- Observation: the value, at a particular period, of a particular variable.
- Experimental unit: the individual or object on which a variable is measured.
- Measurement: the result when a variable is actually measured on an experimental unit.
- A set of measurements, called data, can be either a sample or a population.

Example
- Variable: time until a light bulb burns out
- Experimental unit: light bulb
- Typical measurements: 1500 hours, [value lost] hours, etc.

Populations and Samples
A population is the set of all items or individuals of interest. Examples:
- All likely voters in the next election
- All parts produced today
- All sales receipts for November
A sample is a subset of the population. Examples:
- 1000 voters selected at random for interview
- A few parts selected for destructive testing
- Every 100th receipt selected for audit

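[Diagram: sampling techniques lead from the population (described by parameters) to a sample (described by statistics); statistical procedures support inference from the sample back to the population.]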

Parameters & Statistics
- A parameter is a numerical description of a population characteristic.
- A statistic is a numerical description of a sample characteristic.
Parameter goes with Population; Statistic goes with Sample.
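
The distinction can be made concrete with a small sketch. The population below is simulated and entirely hypothetical (10,000 light-bulb lifetimes with an assumed mean of 1500 hours and spread of 100 hours); the point is only that the parameter is computed from every unit, while the statistic is computed from a subset.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: lifetimes (hours) of 10,000 light bulbs.
# The mean of 1500 and spread of 100 are assumptions for illustration.
population = [random.gauss(1500, 100) for _ in range(10_000)]

# Parameter: a numerical description of the whole population.
mu = mean(population)

# Statistic: the same description computed on a sample (a subset).
sample = random.sample(population, 50)
x_bar = mean(sample)

print(f"population mean (parameter): {mu:.1f}")
print(f"sample mean (statistic):     {x_bar:.1f}")
```

In practice the parameter is usually unknown, which is why the statistic is used to estimate it.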

- Univariate data: One variable is measured on a single experimental unit.
- Bivariate data: Two variables are measured on a single experimental unit.
- Multivariate data: More than two variables are measured on a single experimental unit.

- Nominal: categories that are mutually exclusive (non-overlapping), with no order or ranking. For example: gender (male or female), religion.
- Ordinal: categories can be ordered, but the differences between ranks are not precise. For example: health quality (excellent, good, adequate, bad, terrible).
- Interval: involves measurements with precise differences, but there is no meaningful zero. For example: temperature in °C or °F.
- Ratio: involves measurements that can be ranked, with precise differences between the ranks and a meaningful zero. For example: height, time, or weight.

Types of Variables
- Qualitative
- Quantitative: Discrete or Continuous

Qualitative variables measure a quality or characteristic on each experimental unit. Examples:
- Hair color (black, brown, blonde, ...)
- Make of car (Dodge, Honda, Ford, ...)
- Gender (male, female)
- State of birth (California, Arizona, ...)
Quantitative variables measure a numerical quantity on each experimental unit.
- Discrete if it can assume only a finite or countable number of values.
- Continuous if it can assume the infinitely many values corresponding to the points on a line interval.

Examples
- For each orange tree in a grove, the number of oranges is measured: quantitative discrete.
- For a particular day, the number of cars entering a college campus is measured: quantitative discrete.
- Time until a light bulb burns out: quantitative continuous.

Statistical Methods
Two branches: Descriptive Statistics and Inferential Statistics.
Descriptive statistics utilizes numerical and graphical methods to look for patterns in the data set. The data can be either a representation of the entire population or a sample.

Descriptive Statistics
Graphical
- Qualitative: Bar Chart, Pie Chart
- Quantitative: Bar/Pie Chart, Line Plot (Time Series), Dotplot, Stem-and-Leaf Plot, Histogram, Ogive, Boxplot
- Note: Some graphs require a tabular representation (frequency distribution).
Numerical
- Qualitative: Tables (frequency, percentage, cumulative percentage), Cross tabulation
- Quantitative: Central Tendency, Dispersion (Variability)

Graphing Qualitative Variables
Use a data distribution to describe:
- What values of the variable have been measured
- How often each value has occurred
“How often” can be measured 3 ways:
- Frequency
- Relative frequency = Frequency/n
- Percent = 100 x Relative frequency
Graphs: Bar Chart, Pie Chart

Example
A bag of M&Ms contains 25 candies.
Statistical Table:
Color    Frequency  Relative Frequency  Percent
Red      3          3/25 = .12          12%
Blue     6          6/25 = .24          24%
Green    4          4/25 = .16          16%
Orange   5          5/25 = .20          20%
Brown    3          3/25 = .12          12%
Yellow   4          4/25 = .16          16%
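
As a rough sketch, the three "how often" measures for the M&M example can be computed as follows. The raw data list is rebuilt from the color counts on the slide, so the order of the candies is an assumption; `Counter` comes from the Python standard library.

```python
from collections import Counter

# Raw data rebuilt from the slide's frequencies. The original order of the
# candies is not recorded, so this ordering is an assumption.
counts = {"Red": 3, "Blue": 6, "Green": 4, "Orange": 5, "Brown": 3, "Yellow": 4}
raw_data = [color for color, k in counts.items() for _ in range(k)]

freq = Counter(raw_data)   # frequency of each color
n = len(raw_data)          # number of measurements

table = []
for color, f in freq.items():
    rel = f / n                              # relative frequency = frequency / n
    table.append((color, f, rel, 100 * rel)) # percent = 100 x relative frequency

for color, f, rel, pct in table:
    print(f"{color:<8}{f:>3}{rel:>8.2f}{pct:>7.0f}%")
```

The relative frequencies always sum to 1 and the percents to 100, which is a quick check on any hand-built table.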

Graphs
[Figures: Bar Chart and Pie Chart]

Graphing Quantitative Variables
- Bar/Pie Chart
- Line Plot (Time Series)
- Dotplot
- Stem-and-Leaf Plot
- Histogram
- Ogive
- Boxplot

Graphing Quantitative Variables (1)
A single quantitative variable measured for different population segments or for different categories of classification can be graphed using a bar or pie chart.
Example: A Big Mac hamburger costs $4.90 in Switzerland, $2.90 in the U.S. and $1.86 in South Africa.

Graphing Quantitative Variables (2)
A single quantitative variable measured over time is called a time series. It can be graphed using a line or bar chart.
[Figure: CPI: All Urban Consumers, Seasonally Adjusted, plotted monthly from Sept through Mar; the values are not preserved.]

Graphing Quantitative Variables (3): Dotplot
- The simplest graph for quantitative data.
- Plots the measurements as points on a horizontal axis, stacking the points that duplicate existing points.
Example: The set 4, 5, 5, 7, ... [the remaining values and the plot are not preserved]
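
A text-only sketch of the stacking idea follows. The helper `dotplot` is a hypothetical name, the data list is hypothetical (the slide's full set is not available), and the layout assumes single-digit integer values so the columns line up.

```python
from collections import Counter

def dotplot(values):
    """Return a text dotplot: one column per integer value, dots stacked upward."""
    counts = Counter(values)
    axis = range(min(values), max(values) + 1)
    rows = []
    # Build the plot from the tallest stack down to the axis.
    for level in range(max(counts.values()), 0, -1):
        row = " ".join("*" if counts[v] >= level else " " for v in axis)
        rows.append(row.rstrip())
    rows.append(" ".join(str(v) for v in axis))  # horizontal axis labels
    return "\n".join(rows)

data = [4, 5, 5, 7, 6]  # hypothetical measurements for illustration
print(dotplot(data))
```

The duplicated value 5 appears as two stacked points above the same axis position.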

Stem and Leaf Plots (4)
A simple graph for quantitative data that uses the actual numerical values of each data point.
- Divide each measurement into two parts: the stem and the leaf.
- List the stems in a column, with a vertical line to their right.
- For each measurement, record the leaf portion in the same row as its matching stem.
- Order the leaves from lowest to highest in each stem.
- Provide a key to your coding.
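
The steps above can be sketched in a few lines. The function name `stem_and_leaf` and the price list are hypothetical; the sketch assumes non-negative integers and takes the ones digit as the leaf and the remaining digits as the stem.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf rows for non-negative integers (leaf = ones digit)."""
    stems = defaultdict(list)
    for v in sorted(values):           # sorting orders the leaves within each stem
        stems[v // 10].append(v % 10)  # split each measurement into stem and leaf
    rows = []
    for stem in range(min(stems), max(stems) + 1):  # keep stems with no leaves
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem:>2} | {leaves}".rstrip())
    return rows

prices = [65, 70, 74, 68, 90, 75, 60, 85, 40]  # hypothetical prices ($)
for row in stem_and_leaf(prices):
    print(row)
print("Key: 6 | 5 means 65")
```

Empty stems are kept on purpose so that gaps in the data remain visible in the plot.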

Example: Stem-and-Leaf Plot
The prices ($) of 18 brands of walking shoes: [data and plot not preserved]

Relative Frequency Histograms (5)
A relative frequency histogram for a quantitative data set is a bar graph in which the height of the bar shows “how often” (measured as a proportion or relative frequency) measurements fall in a particular class or subinterval.
- Divide the range of the data into 5-12 subintervals of equal length.
- Calculate the approximate width of the subinterval as Range/number of subintervals.
- Round the approximate width up to a convenient value.
- Use the method of left inclusion, including the left endpoint, but not the right, in your tally.
- Create a statistical table including the subintervals, their frequencies and relative frequencies.

Relative Frequency Histograms (5): cont’d
Draw the relative frequency histogram, plotting the subintervals on the horizontal axis and the relative frequencies on the vertical axis.
The height of the bar represents:
- The proportion of measurements falling in that class or subinterval.
- The probability that a single measurement, drawn at random from the set, will belong to that class or subinterval.

Example 1
The ages of 50 tenured faculty at a state university. [raw data not preserved]
- We choose to use 6 intervals.
- Range = 70 - 26 = 44
- Minimum class width = (70 - 26)/6 = 7.33
- Convenient class width = 8
- Use 6 classes of length 8, starting at 25.
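
The class-width arithmetic in Example 1 can be sketched as below. The helper name `class_boundaries` is hypothetical, and rounding up to the next whole number is only one possible reading of "a convenient value".

```python
import math

def class_boundaries(minimum, maximum, k, start):
    """Return (width, boundaries) for k equal-width classes beginning at start."""
    # Approximate width = range / number of classes, rounded up.
    # Rounding to the next integer is an assumption about what is "convenient".
    width = math.ceil((maximum - minimum) / k)
    boundaries = [start + i * width for i in range(k + 1)]
    return width, boundaries

# Example 1: ages range from 26 to 70, 6 classes, starting at 25.
width, bounds = class_boundaries(26, 70, 6, 25)
print(width)   # class width
print(bounds)  # left endpoints of the 6 classes plus the final right endpoint
```

Starting slightly below the minimum (25 rather than 26) ensures that, with left inclusion, the largest measurement (70) still falls inside the last class.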

Age          Frequency  Relative Frequency  Percent
25 to < 33   5          5/50 = .10          10%
33 to < 41   14         14/50 = .28         28%
41 to < 49   13         13/50 = .26         26%
49 to < 57   9          9/50 = .18          18%
57 to < 65   7          7/50 = .14          14%
65 to < 73   2          2/50 = .04          4%

Class        Frequency  Relative Frequency  Percent
25 to < 34   5          5/50 = .10          10%
34 to < 43   16         16/50 = .32         32%
43 to < 52   14         14/50 = .28         28%
52 to < 61   10         10/50 = .20         20%
61 to < 70   4          4/50 = .08          8%
70 to < 79   1          1/50 = .02          2%
[The Class Boundaries and Midpoint columns are not preserved.]

Describing the Distribution
- Shape? Skewed right.
- Outliers? No.
- What proportion of the tenured faculty are younger than 42.5? (16 + 5)/50 = 21/50 = .42 = 42%
- What is the probability that a randomly selected faculty member is 52 or older? (10 + 4 + 1)/50 = 15/50 = .30
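
The two proportions can be checked with a short sketch. It assumes the classes of width 9 starting at 25 with frequencies 5, 16, 14, 10, 4 and 1 (each frequency is the slide's relative frequency times 50), that ages are whole numbers so "younger than 42.5" means the first two classes, and that the last class ends at 79 by the same width pattern.

```python
from fractions import Fraction

# (left endpoint, right endpoint) of each class, left inclusion.
# The final right endpoint of 79 is inferred from the class width of 9.
classes = [(25, 34), (34, 43), (43, 52), (52, 61), (61, 70), (70, 79)]
freqs = [5, 16, 14, 10, 4, 1]
n = sum(freqs)

# Younger than 42.5: classes that end at or before 43 (ages assumed integer).
younger = sum(f for (lo, hi), f in zip(classes, freqs) if hi <= 43)
# 52 or older: classes that start at 52 or later.
older = sum(f for (lo, hi), f in zip(classes, freqs) if lo >= 52)

p_younger = Fraction(younger, n)
p_older = Fraction(older, n)
print(p_younger, float(p_younger))
print(p_older, float(p_older))
```

Exact fractions are used so the proportions can be compared with the hand calculation without rounding concerns.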

How Many Class Intervals?
- Many (narrow class intervals): may yield a very jagged distribution with gaps from empty classes; can give a poor indication of how frequency varies across classes.
- Few (wide class intervals): may compress variation too much and yield a blocky distribution; can obscure important patterns of variation.
(X axis labels are upper class endpoints)

Example 2
A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature:
24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27

Example 2: Solution (Frequency Distribution)
- Sort raw data in ascending order: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
- Find range: 58 - 12 = 46
- Select number of classes: 5 (usually between 5 and 12)
- Compute class interval (width): 10 (46/5 = 9.2, then round up)
- Determine class boundaries (limits): 10, 20, 30, 40, 50, 60
- Compute class midpoints: 15, 25, 35, 45, 55
- Count observations and assign to classes

Example 2: Solution (Frequency Distribution) (continued)
Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Class          Frequency  Relative Frequency  Percentage
10 ≤ X < 20    3          .15                 15
20 ≤ X < 30    6          .30                 30
30 ≤ X < 40    5          .25                 25
40 ≤ X < 50    4          .20                 20
50 ≤ X < 60    2          .10                 10
Total          20         1.00                100
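
A minimal sketch of the tally for Example 2 follows, using the 20 temperatures and the class boundaries given in the solution steps. The variable names are illustrative; the comparison `lower <= t < upper` implements left inclusion.

```python
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]
boundaries = [10, 20, 30, 40, 50, 60]

# Count observations per class: left endpoint included, right endpoint excluded.
freqs = [0] * (len(boundaries) - 1)
for t in temps:
    for i in range(len(freqs)):
        if boundaries[i] <= t < boundaries[i + 1]:
            freqs[i] += 1
            break

n = len(temps)
rel_freqs = [f / n for f in freqs]
midpoints = [(boundaries[i] + boundaries[i + 1]) / 2 for i in range(len(freqs))]

for i, f in enumerate(freqs):
    print(f"{boundaries[i]} <= X < {boundaries[i + 1]}: "
          f"midpoint {midpoints[i]:.0f}, frequency {f}, relative {rel_freqs[i]:.2f}")
```

Because every temperature lies in exactly one class, the frequencies sum to the number of observations.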

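Histogram: Example 2
(No gaps between bars)
Class          Class Midpoint  Frequency
10 ≤ X < 20    15              3
20 ≤ X < 30    25              6
30 ≤ X < 40    35              5
40 ≤ X < 50    45              4
50 ≤ X < 60    55              2
[Figure: histogram with class midpoints on the horizontal axis and frequency on the vertical axis.]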

Ogive (6)
An ogive is a curve drawn for the cumulative frequency distribution by joining with straight lines the dots marked above the upper boundaries of classes at heights equal to the cumulative frequencies of the respective classes.
Two types of ogive:
(i) ogive less than
(ii) ogive greater than
First, build a table of cumulative frequency.

Cumulative Frequency
Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Class          Frequency  Percentage  Cumulative Frequency  Cumulative Percentage
10 ≤ X < 20    3          15          3                     15
20 ≤ X < 30    6          30          9                     45
30 ≤ X < 40    5          25          14                    70
40 ≤ X < 50    4          20          18                    90
50 ≤ X < 60    2          10          20                    100
Total          20         100
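
The cumulative columns can be sketched with a running sum. The class frequencies 3, 6, 5, 4, 2 are the tallies of the Example 2 temperatures in the classes 10 to 60; the list `ogive_points` is a hypothetical name for the points a "less than" ogive would join, under the assumption that the curve starts at zero at the lowest boundary.

```python
from itertools import accumulate

upper_boundaries = [20, 30, 40, 50, 60]
freqs = [3, 6, 5, 4, 2]  # tallied from the Example 2 temperatures

cum_freq = list(accumulate(freqs))           # running total of frequencies
n = cum_freq[-1]                             # total number of observations
cum_pct = [100 * c / n for c in cum_freq]    # cumulative percentage

# Points for a "less than" ogive: cumulative percentage above each upper
# class boundary, starting from 0% at the lowest boundary (an assumption).
ogive_points = [(10, 0.0)] + list(zip(upper_boundaries, cum_pct))

print(cum_freq)
print(cum_pct)
print(ogive_points)
```

The last cumulative frequency equals the number of observations, and the last cumulative percentage is 100.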

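Graphing Cumulative Frequencies: The Ogive
[Figure: ogive plotting cumulative percentage against class boundaries (not midpoints). The accompanying table lists Class, Lower class boundary and Cumulative Percentage for a "less than" class followed by five classes; the values are not preserved.]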

Interpreting Graphs: Location and Spread Where is the data centered on the horizontal axis, and how does it spread out from the center?

Interpreting Graphs: Shapes
- Mound shaped and symmetric (mirror images)
- Skewed right: a few unusually large measurements
- Skewed left: a few unusually small measurements
- Bimodal: two local peaks

Interpreting Graphs: Outliers
Are there any strange or unusual measurements that stand out in the data set?
[Figures: one distribution with an outlier, one with no outliers]

Example
A quality control process measures the diameter (cm) of a gear being made by a machine. The technician records 15 diameters, but inadvertently makes a typing mistake on the second entry.