A Review of Probability and Statistics

Presentation transcript:

A Review of Probability and Statistics Descriptive statistics Probability Random variables Sampling distributions Estimation and confidence intervals Test of Hypothesis For mean, variances, and proportions Goodness of fit

Key Concepts Population (finite or infinite) -- "parameters" Sample -- "statistics" Random samples - Your MOST important decision!

Data Deterministic vs. Probabilistic (Stochastic) Discrete or Continuous: Whether a variable is continuous (measured) or discrete (counted) is a property of the data, not of the measuring device: weight is a continuous variable, even if your scale can only measure values to the pound. Data description: Category frequency Category relative frequency

Data Types Qualitative (Categorical) Nominal -- IE = 1 ; EE = 2 ; CE = 3 Ordinal -- poor = 1 ; fair = 2 ; good = 3 ; excellent = 4 Quantitative (Numerical) Interval -- temperature, viscosity Ratio -- weight, height The type of statistics you can calculate depends on the data type. The mean and variance make no sense if the data are categorical (proportions do); a median is meaningful only when the categories are ordered.

Data Presentation for Qualitative Data Rules: Each observation MUST fall in one and only one category. All observations must be accounted for. Table -- Provides greater detail Bar graphs -- Consider Pareto presentation! Pie charts (do not need to be round)

Data Presentation for Quantitative Data Consider a Stem-and-Leaf Display Use 5 to 20 classes (intervals, groups). Cell width, boundaries, limits, and midpoint Histograms Discrete Continuous (frequency polygon - plot at class mark) Cumulative frequency distribution (Ogive - plot at upper boundary)

Statistics Measures of Central Tendency Arithmetic Mean Median Mode Weighted mean Measures of Variation Range Variance Standard Deviation Coefficient of Variation The Empirical Rule

Arithmetic Mean and Variance -- Raw Data
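
The formulas on this slide did not survive in the transcript. As a minimal sketch, assuming the usual definitions (sample mean, and sample variance with divisor n - 1), the raw-data calculation could look like this. The data values are hypothetical.

```python
from statistics import mean, variance

# Hypothetical raw sample (illustrative values only).
data = [4.0, 7.0, 5.0, 9.0, 10.0]

n = len(data)
y_bar = sum(data) / n                                 # arithmetic mean
s2 = sum((y - y_bar) ** 2 for y in data) / (n - 1)    # sample variance, divisor n - 1
s = s2 ** 0.5                                         # sample standard deviation
cv = 100 * s / y_bar                                  # coefficient of variation, in percent

# Cross-check against the standard library.
assert abs(y_bar - mean(data)) < 1e-12
assert abs(s2 - variance(data)) < 1e-12
print(y_bar, s2, round(s, 3), round(cv, 1))
```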

Arithmetic Mean and Variance -- Grouped Data
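
For grouped data the same quantities are usually approximated from class midpoints (class marks) and class frequencies. The sketch below assumes that approach; the classes and frequencies are hypothetical.

```python
# Hypothetical frequency table: class midpoints and their frequencies.
midpoints = [5.0, 15.0, 25.0, 35.0]
freqs = [2, 5, 2, 1]

n = sum(freqs)

# Grouped mean: frequency-weighted average of the class midpoints.
y_bar_grouped = sum(f * m for f, m in zip(freqs, midpoints)) / n

# Grouped variance: frequency-weighted squared deviations, divisor n - 1.
s2_grouped = sum(f * (m - y_bar_grouped) ** 2
                 for f, m in zip(freqs, midpoints)) / (n - 1)

print(y_bar_grouped, round(s2_grouped, 3))
```

Because every observation in a class is treated as if it sat at the midpoint, the result only approximates the raw-data values.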

Percentiles and Box-Plots 100pth percentile: value such that 100p% of the area under the relative frequency distribution lies below it. Q1: lower quartile (25th percentile) Q3: upper quartile (75th percentile) IQR: interquartile range, Q3 - Q1 Box-Plots: box limited by lower and upper quartiles Whiskers mark lowest and highest values within 1.5*IQR of Q1 or Q3 Outliers: More than 1.5*IQR below Q1 or above Q3 (mark with *) z-scores - deviation from mean in units of standard deviation. Outlier: absolute value of z-score > 3
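
A short sketch of both outlier rules on a hypothetical sample. It assumes Python's `statistics.quantiles` with its default "exclusive" method; other quartile conventions give slightly different Q1 and Q3.

```python
from statistics import mean, quantiles, stdev

data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 30]   # hypothetical sample

# Box-plot rule: fences at 1.5 * IQR beyond the quartiles.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [y for y in data if y < lower_fence or y > upper_fence]

# z-score rule: flag values more than 3 standard deviations from the mean.
y_bar, s = mean(data), stdev(data)
z_scores = [(y - y_bar) / s for y in data]
z_outliers = [y for y, z in zip(data, z_scores) if abs(z) > 3]

print(q1, q3, outliers, z_outliers)
```

The two rules need not agree: in this sample the box-plot rule flags 30, while its z-score stays below 3 because the extreme value inflates the standard deviation.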

Probability: Basic Concepts Experiment: A process of OBSERVATION Simple event - An OUTCOME of an experiment that cannot be decomposed “Mutually exclusive” “Equally likely” Sample Space - The set of all possible outcomes Event “A” - The set of all possible simple events that result in the outcome “A”

Probability A measure of uncertainty of an estimate The reliability of an inference Theoretical approach - “A Priori” Pr (Ai) = n/N n = number of possible ways “Ai” can be observed N = total number of possible outcomes Historical (empirical) approach - “A Posteriori” Pr (Ai) = n/N n = number of times “Ai” was observed N = total number of observations Subjective approach An “Expert Opinion”

Probability Rules Multiplication Rule: Number of ways to draw one element from set 1 which contains n1 elements, then an element from set 2, ..., and finally an element from set k (ORDER IS IMPORTANT!): n1 * n2 * ... * nk

Permutations and Combinations Permutations: Number of ways to draw r out of n elements WHEN ORDER IS IMPORTANT: n! / (n - r)! Combinations: Number of ways to select r out of n items when order is NOT important: n! / (r! (n - r)!)
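
The counting rules can be checked with the standard library (`math.perm`, `math.comb`, and `math.prod`, available from Python 3.8). The set sizes and the values of n and r are hypothetical.

```python
import math

# Multiplication rule: one element from each of k sets of sizes n1, ..., nk.
set_sizes = [3, 4, 2]
ways_multiplication = math.prod(set_sizes)   # 3 * 4 * 2

# Permutations: ordered draws of r out of n elements, n! / (n - r)!
n, r = 5, 3
ways_ordered = math.perm(n, r)

# Combinations: unordered selections of r out of n, n! / (r! (n - r)!)
ways_unordered = math.comb(n, r)

# Each unordered selection corresponds to r! orderings.
assert ways_ordered == ways_unordered * math.factorial(r)
print(ways_multiplication, ways_ordered, ways_unordered)
```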

Compound Events

Conditional Probability

Other Probability Rules Mutually Exclusive Events: P(A and B) = 0, so P(A or B) = P(A) + P(B) Independence: A and B are said to be statistically INDEPENDENT if and only if: P(A and B) = P(A) P(B), or equivalently P(A | B) = P(A)

Bayes’ Rule
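
The formula on this slide is missing from the transcript. As a sketch of the standard form, P(Ai | B) = P(B | Ai) P(Ai) / sum over j of P(B | Aj) P(Aj), here is a small example. The two-machine scenario and all probabilities are hypothetical.

```python
# Hypothetical setup: two machines produce all parts; B = "part is defective".
prior = [0.6, 0.4]           # P(A1), P(A2): share of output from each machine
likelihood = [0.02, 0.05]    # P(B | A1), P(B | A2): defect rate of each machine

# Total probability of B (the denominator of Bayes' rule).
p_b = sum(p * l for p, l in zip(prior, likelihood))

# Posterior P(Ai | B) for each machine.
posterior = [p * l / p_b for p, l in zip(prior, likelihood)]

print(round(p_b, 4), [round(x, 4) for x in posterior])
```

Although machine 1 makes most of the parts, a defective part is more likely to have come from machine 2 because of its higher defect rate.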

Random Variables Random variable: A function that maps every possible outcome of an experiment into a numerical value. Discrete random variable: The function can assume only a finite or countably infinite number of values. Continuous random variable: The function can assume any value in an interval (between two limits).

Probability Distribution for a Discrete Random Variable A function p(y) that assigns a probability to each possible value y of the random variable. Each p(y) lies between 0 and 1, and the p(y) sum to 1.

Poisson Process Events occur over time (or in a given area, volume, weight, distance, ...) Probability of observing an event in a given unit of time is constant Able to define a unit of time small enough so that we can’t observe two or more events simultaneously. Tables usually give CUMULATIVE values!

The Poisson Distribution

Poisson Approximation to the Binomial In a binomial situation where n is large (n > 25) and p is small (p < 0.30, and np < 15), we can approximate b(x, n, p) by a Poisson distribution with mean lambda = np
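
A quick numerical check of the approximation, using the standard binomial and Poisson probability functions. The values n = 100 and p = 0.03 are hypothetical and chosen to satisfy the rule of thumb above.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial with n trials and success probability p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

n, p = 100, 0.03          # hypothetical binomial setting
lam = n * p               # matching Poisson mean

# Largest absolute gap between the two probability functions over x = 0..10.
max_gap = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(11))

print(round(binomial_pmf(2, n, p), 4), round(poisson_pmf(2, lam), 4), round(max_gap, 4))
```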

Probability Distribution for a Continuous Random Variable F(y0) is the cumulative distribution function: the probability of observing a value less than or equal to y0.

Probability Calculations

Expectations Properties of Expectations

The Uniform Distribution A frequently used model when no data are available.

The Triangular Distribution A good model to use when no data are available. Just ask an expert to estimate the minimum, maximum, and most likely values.
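
A sketch of the triangular model built from an expert's three estimates. The cumulative distribution function below is the standard one for a triangular distribution with low < mode < high; the numeric estimates are hypothetical.

```python
import random

def triangular_cdf(y, low, mode, high):
    """Cumulative probability P(Y <= y) for a triangular(low, mode, high) model."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    if y <= mode:
        return (y - low) ** 2 / ((high - low) * (mode - low))
    return 1.0 - (high - y) ** 2 / ((high - low) * (high - mode))

# Hypothetical expert estimates of a task duration, in hours.
low, mode, high = 2.0, 4.0, 10.0

tri_mean = (low + mode + high) / 3.0            # mean of a triangular distribution
p_at_mode = triangular_cdf(mode, low, mode, high)

# Note the argument order of the standard-library sampler: (low, high, mode).
random.seed(1)
draws = [random.triangular(low, high, mode) for _ in range(5)]
print(round(tri_mean, 3), p_at_mode, [round(d, 2) for d in draws])
```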

The Normal Distribution

The Lognormal Distribution Consider this model when 80 percent of the data values lie in the first 20 % of the variable’s range.

The Gamma Distribution

The Erlang Distribution A special case of the Gamma Distribution when the shape parameter is a positive integer k. Arises in a Poisson process where we are interested in the time to observe k events

The Exponential Distribution A special case of the Gamma Distribution when the shape parameter equals 1. In a Poisson process, it models the time until the next event.

The Weibull Distribution A good model for failure time distributions of manufactured items. It has a closed expression for F ( y ).
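
The closed form of F(y) makes probabilities and random draws easy to compute. The slide's own parameterization is not in the transcript, so this sketch assumes the common shape/scale form F(y) = 1 - exp(-(y / scale)^shape); other textbooks write the same distribution with different parameters. The parameter values are hypothetical.

```python
import math

def weibull_cdf(y, shape, scale):
    """P(Y <= y) under the assumed shape/scale parameterization."""
    if y <= 0:
        return 0.0
    return 1.0 - math.exp(-((y / scale) ** shape))

def weibull_inverse_cdf(u, shape, scale):
    """Value y with F(y) = u, for 0 <= u < 1 (useful for inverse-transform sampling)."""
    return scale * (-math.log(1.0 - u)) ** (1.0 / shape)

shape, scale = 2.0, 100.0    # hypothetical failure-time model, in hours

p_fail_by_scale = weibull_cdf(scale, shape, scale)       # equals 1 - exp(-1)
median_life = weibull_inverse_cdf(0.5, shape, scale)     # time by which half have failed

print(round(p_fail_by_scale, 4), round(median_life, 2))
```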

The Beta Distribution A good model for proportions. You can fit almost any data. However, the data set MUST be bounded!

Bivariate Data (Pairs of Random Variables) Covariance: measures strength of linear relationship Correlation: a standardized version of the covariance Autocorrelation: For a single time series: Relationship between an observation and those immediately preceding it. Does current value (Xt) relate to itself lagged one period (Xt-1)?
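
A minimal sketch of the three measures on hypothetical data. It assumes the sample covariance with divisor n - 1 and one common estimator of the lag-1 autocorrelation (lagged cross-products divided by the total sum of squares); other textbooks use slightly different divisors.

```python
def sample_covariance(x, y):
    """Sample covariance with divisor n - 1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Covariance standardized by the two standard deviations; lies in [-1, 1]."""
    return sample_covariance(x, y) / (sample_covariance(x, x) * sample_covariance(y, y)) ** 0.5

def lag1_autocorrelation(x):
    """Relationship between X_t and X_(t-1) within a single series."""
    n = len(x)
    x_bar = sum(x) / n
    num = sum((x[t] - x_bar) * (x[t - 1] - x_bar) for t in range(1, n))
    den = sum((v - x_bar) ** 2 for v in x)
    return num / den

# Hypothetical paired observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

cov_xy = sample_covariance(x, y)
r_xy = correlation(x, y)
r1_x = lag1_autocorrelation(x)
print(cov_xy, round(r_xy, 4), r1_x)
```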

Sampling Distributions See slides 8 and 9 for formulas to calculate sample means and variances (raw data and grouped data, respectively).

The Sampling Distribution of the Mean (Central Limit Theorem)
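
The Central Limit Theorem can be illustrated by simulation: means of samples drawn from a skewed population cluster around the population mean with standard deviation sigma / sqrt(n). The exponential population, the sample size, the number of replications, and the seed below are all arbitrary choices for illustration.

```python
import random
from statistics import mean, stdev

random.seed(12345)           # fixed seed so the run is repeatable

n, reps = 30, 5000           # sample size and number of simulated samples
mu, sigma = 1.0, 1.0         # mean and standard deviation of an exponential(rate=1)

# Draw many samples of size n and record each sample mean.
sample_means = [mean([random.expovariate(1.0) for _ in range(n)])
                for _ in range(reps)]

mean_of_means = mean(sample_means)     # should be close to mu
sd_of_means = stdev(sample_means)      # should be close to sigma / sqrt(n)

print(round(mean_of_means, 3), round(sd_of_means, 3), round(sigma / n ** 0.5, 3))
```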

The Sampling Distribution of Sums

Distributions Related to Variances

The t Distribution

Estimation Point and Interval Estimators Properties of Point Estimators Unbiased: E (estimator) = estimated parameter Note: S2 is unbiased if the sum of squared deviations is divided by n - 1 (rather than n) MVUE: Minimum Variance Unbiased Estimators Most frequently used method to estimate parameters: MLE - Maximum Likelihood Estimators.

Interval Estimators -- Large sample CI for mean
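
The slide's formula is missing from the transcript. A sketch assuming the usual large-sample interval, y_bar +/- z(alpha/2) * s / sqrt(n), with the z value taken from `statistics.NormalDist`. The summary statistics are hypothetical.

```python
from statistics import NormalDist

# Hypothetical summary statistics from a large sample.
n, y_bar, s = 64, 50.0, 8.0
confidence = 0.95

alpha = 1.0 - confidence
z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # about 1.96 for 95% confidence

half_width = z * s / n ** 0.5
ci_mean = (y_bar - half_width, y_bar + half_width)

print(round(z, 3), tuple(round(v, 2) for v in ci_mean))
```

For small samples the same structure applies with a t critical value (n - 1 degrees of freedom) in place of z.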

Interval Estimators -- Small sample CI for mean

Sample Size

CI for proportions (large samples)
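
A sketch assuming the standard large-sample (Wald) interval, p_hat +/- z(alpha/2) * sqrt(p_hat (1 - p_hat) / n). The counts are hypothetical.

```python
from statistics import NormalDist

# Hypothetical sample: x successes out of n trials.
x, n_trials = 120, 400
confidence = 0.95

p_hat = x / n_trials
z_crit = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)

std_error = (p_hat * (1.0 - p_hat) / n_trials) ** 0.5
ci_prop = (p_hat - z_crit * std_error, p_hat + z_crit * std_error)

print(p_hat, tuple(round(v, 4) for v in ci_prop))
```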

Sample Size (proportions)
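
Sample-size formulas follow from solving the interval half-width for n. This sketch assumes the usual results, n = (z * sigma / E)^2 for a mean and n = z^2 * p (1 - p) / E^2 for a proportion, where E is the desired half-width. The planning values are hypothetical.

```python
import math
from statistics import NormalDist

z95 = NormalDist().inv_cdf(0.975)    # two-sided 95% confidence

# Mean: planning value for sigma and desired half-width E (same units as the data).
sigma_guess, e_mean = 8.0, 1.0
n_for_mean = math.ceil((z95 * sigma_guess / e_mean) ** 2)

# Proportion: p = 0.5 is the most conservative planning value (largest n).
p_guess, e_prop = 0.5, 0.03
n_for_prop = math.ceil(z95 ** 2 * p_guess * (1.0 - p_guess) / e_prop ** 2)

print(n_for_mean, n_for_prop)
```

Rounding up with `math.ceil` guarantees the half-width does not exceed E.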

CI for the variance

CI for the Difference of Two Means -- large samples --

CI for (p1 - p2) --- (large samples)

CI for the Difference of Two Means -- small samples, same variance --

CI for the Difference of Two Means -small samples, different variances-

CI for the Difference of Two Means -- matched pairs --

CI for two variances

Prediction Intervals

Hypothesis Testing Elements of a Statistical Test. Focus on decisions made when comparing the observed sample to a claim (hypotheses). How do we decide whether the sample disagrees with the hypothesis? Null Hypothesis, H0. A claim about one or more population parameters. What we want to REJECT. Alternative Hypothesis, Ha: What we test against. Provides criteria for rejection of H0. Test Statistic: computed from sample data. Rejection (Critical) Region, indicates values of the test statistic for which we will reject H0.

Errors in Decision Making
                    True State of Nature
Decision            H0 (Dishonest client)    Ha (Honest client)
Do not lend         Correct decision         Type II error
Lend                Type I error             Correct decision

Statistical Errors

Statistical Tests

The Critical Value

The observed significance level for a test
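
The observed significance level (p-value) is the probability, computed under H0, of a test statistic at least as extreme as the one observed. A sketch for a large-sample z test of a mean, with a hypothetical null value and hypothetical summary statistics:

```python
from statistics import NormalDist

# Hypothetical test of H0: mu = 50 against Ha: mu != 50.
mu_0 = 50.0
n_obs, y_bar_obs, s_obs = 64, 52.0, 8.0
alpha = 0.05

# Test statistic: distance of the sample mean from mu_0 in standard errors.
z_stat = (y_bar_obs - mu_0) / (s_obs / n_obs ** 0.5)

# Two-sided p-value: area in both tails beyond |z_stat|.
p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z_stat)))

reject_h0 = p_value < alpha
print(z_stat, round(p_value, 4), reject_h0)
```

Comparing the p-value with alpha gives the same decision as comparing the test statistic with the critical value.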

Testing proportions (large samples)

Testing a Normal Mean

Testing a variance

Testing Differences of Two Means -- large samples --

Testing Differences of Two Means -- small samples, same variance --

Testing Differences of Two Means -small samples, different variances-

Testing Difference of Two Means -- matched pairs --

Testing a ratio of two variances

Testing (p1 - p2) --- (large samples)

Categorical Data

One-way Tables (Cont.)

Categorical Data Analysis

Example of a Contingency Table

Testing for Independence
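
A sketch of the chi-square statistic for a contingency table, assuming the usual expected counts (row total * column total / n) and (rows - 1) * (columns - 1) degrees of freedom. The 2 x 2 table is hypothetical.

```python
# Hypothetical 2 x 2 contingency table of observed counts.
observed = [[20, 30],
            [30, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n_total = sum(row_totals)

# Sum of (observed - expected)^2 / expected over all cells.
chi2_indep = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n_total
        chi2_indep += (obs - expected) ** 2 / expected

df_indep = (len(observed) - 1) * (len(observed[0]) - 1)

# Tabled chi-square critical value for df = 1 at alpha = 0.05 is 3.841.
reject_independence = chi2_indep > 3.841
print(chi2_indep, df_indep, reject_independence)
```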

Distributions: Model Fitting Steps
1. Collect data. Make sure you have a random sample. You will need at least 30 valid cases.
2. Plot data. Look for familiar patterns.
3. Hypothesize several models for distribution.
4. Using part of the data, estimate model parameters.
5. Using the rest of the data, analyze the model’s accuracy.
6. Select the “best” model and implement it.
7. Keep track of model accuracy over time. If warranted, go back to 6 (or to 3, if data (population?) behavior keeps changing).

Chi-Square Test of Goodness of Fit
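
A sketch of the goodness-of-fit statistic, assuming the usual form: the sum over classes of (observed - expected)^2 / expected, with (number of classes - 1 - number of estimated parameters) degrees of freedom. The die-roll counts are hypothetical and the model (a fair die) has no estimated parameters.

```python
# Hypothetical counts from 60 rolls of a die, tested against a fair-die model.
observed_counts = [8, 12, 9, 11, 10, 10]
model_probs = [1 / 6] * 6

n_rolls = sum(observed_counts)
expected_counts = [n_rolls * p for p in model_probs]

chi2_gof = sum((o - e) ** 2 / e for o, e in zip(observed_counts, expected_counts))

estimated_params = 0                       # nothing was estimated from the data
df_gof = len(observed_counts) - 1 - estimated_params

# Tabled chi-square critical value for df = 5 at alpha = 0.05 is 11.070.
reject_model = chi2_gof > 11.070
print(round(chi2_gof, 3), df_gof, reject_model)
```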

Kolmogorov-Smirnov Test of Goodness of Fit
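
The Kolmogorov-Smirnov statistic D is the largest vertical gap between the empirical distribution function of the sample and the hypothesized cumulative distribution function F. A sketch with hypothetical data tested against a Uniform(0, 1) model; the critical value for D would come from a K-S table and is not computed here.

```python
def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF and the hypothesized CDF."""
    ordered = sorted(sample)
    n = len(ordered)
    d = 0.0
    for i, y in enumerate(ordered, start=1):
        f = cdf(y)
        # The empirical CDF jumps from (i - 1) / n to i / n at y; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def uniform01_cdf(y):
    """Hypothesized model: Uniform(0, 1)."""
    return min(max(y, 0.0), 1.0)

sample = [0.1, 0.4, 0.7, 0.9]    # hypothetical observations
d_stat = ks_statistic(sample, uniform01_cdf)
print(round(d_stat, 3))
```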
