Review of Chapters 1- 5 We review some important themes from the first 5 chapters 1.Introduction Statistics- Set of methods for collecting/analyzing data.

Slides:



Advertisements
Similar presentations
AP Statistics Course Review.
Advertisements

5. Statistical Inference: Estimation
Sampling Distributions (§ )
QUANTITATIVE DATA ANALYSIS
1 BA 555 Practical Business Analysis Housekeeping Review of Statistics Exploring Data Sampling Distribution of a Statistic Confidence Interval Estimation.
4. Probability Distributions Probability: With random sampling or a randomized experiment, the probability an observation takes a particular value is the.
Chapter 8 Estimation: Single Population
Introduction to Educational Statistics
Chapter 7 Estimation: Single Population
Review for Exam 2 Some important themes from Chapters 6-9 Chap. 6. Significance Tests Chap. 7: Comparing Two Groups Chap. 8: Contingency Tables (Categorical.
Review of Statistics 101 We review some important themes from the course 1.Introduction Statistics- Set of methods for collecting/analyzing data (the art.
Chapter 2: The Normal Distribution
Probability and the Sampling Distribution Quantitative Methods in HPELS 440:210.
Chap. 5, 6: Statistical inference methods
1. Homework #2 2. Inferential Statistics 3. Review for Exam.
Chapter 1 Descriptive Analysis. Statistics – Making sense out of data. Gives verifiable evidence to support the answer to a question. 4 Major Parts 1.Collecting.
Census A survey to collect data on the entire population.   Data The facts and figures collected, analyzed, and summarized for presentation and.
Numerical Descriptive Techniques
6.1 What is Statistics? Definition: Statistics – science of collecting, analyzing, and interpreting data in such a way that the conclusions can be objectively.
MBA7020_04.ppt/June 120, 2005/Page 1 Georgia State University - Confidential MBA 7020 Business Analysis Foundations Descriptive Statistics June 20, 2005.
STA Lecture 161 STA 291 Lecture 16 Normal distributions: ( mean and SD ) use table or web page. The sampling distribution of and are both (approximately)
AP Statistics Chapter 9 Notes.
Population All members of a set which have a given characteristic. Population Data Data associated with a certain population. Population Parameter A measure.
LECTURE 16 TUESDAY, 31 March STA 291 Spring
© Copyright McGraw-Hill CHAPTER 3 Data Description.
University of Ottawa - Bio 4118 – Applied Biostatistics © Antoine Morin and Scott Findlay 08/10/ :23 PM 1 Some basic statistical concepts, statistics.
NOTES The Normal Distribution. In earlier courses, you have explored data in the following ways: By plotting data (histogram, stemplot, bar graph, etc.)
Agresti/Franklin Statistics, 1e, 1 of 139  Section 6.4 How Likely Are the Possible Values of a Statistic? The Sampling Distribution.
M07-Numerical Summaries 1 1  Department of ISM, University of Alabama, Lesson Objectives  Learn when each measure of a “typical value” is appropriate.
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 2 Modeling Distributions of Data 2.2 Density.
Chapter 21 Basic Statistics.
Review of Chapters 1- 6 We review some important themes from the first 6 chapters 1.Introduction Statistics- Set of methods for collecting/analyzing data.
Determination of Sample Size: A Review of Statistical Theory
June 11, 2008Stat Lecture 10 - Review1 Midterm review Chapters 1-5 Statistics Lecture 10.
Agresti/Franklin Statistics, 1 of 87  Section 7.2 How Can We Construct a Confidence Interval to Estimate a Population Proportion?
Sampling distributions rule of thumb…. Some important points about sample distributions… If we obtain a sample that meets the rules of thumb, then…
Agresti/Franklin Statistics, 1 of 88 Chapter 11 Analyzing Association Between Quantitative Variables: Regression Analysis Learn…. To use regression analysis.
Chapter 7: Sampling Distributions Section 7.1 How Likely Are the Possible Values of a Statistic? The Sampling Distribution.
© 2008 Pearson Addison-Wesley. All rights reserved Chapter 6 Putting Statistics to Work.
Summarizing Risk Analysis Results To quantify the risk of an output variable, 3 properties must be estimated: A measure of central tendency (e.g. µ ) A.
Chapter 13 Sampling distributions
Review Confidence Intervals Sample Size. Estimator and Point Estimate An estimator is a “sample statistic” (such as the sample mean, or sample standard.
1 Chapter 12: Analyzing Association Between Quantitative Variables: Regression Analysis Section 12.1: How Can We Model How Two Variables Are Related?
Sampling Distributions Chapter 18. Sampling Distributions A parameter is a number that describes the population. In statistical practice, the value of.
Chapter 2 The Normal Distributions. Section 2.1 Density curves and the normal distributions.
CHAPTER 2 Modeling Distributions of Data
CHAPTER 2 Modeling Distributions of Data
Sampling Distributions and Estimation
Week 10 Chapter 16. Confidence Intervals for Proportions
CHAPTER 2 Modeling Distributions of Data
Descriptive Statistics
Descriptive and inferential statistics. Confidence interval
Review for Exam 2 Some important themes from Chapters 6-9
STA 291 Spring 2008 Lecture 5 Dustin Lueker.
STA 291 Spring 2008 Lecture 5 Dustin Lueker.
Estimation Goal: Use sample data to make predictions regarding unknown population parameters Point Estimate - Single value that is best guess of true parameter.
CHAPTER 2 Modeling Distributions of Data
CHAPTER 2 Modeling Distributions of Data
CHAPTER 2 Modeling Distributions of Data
Welcome!.
1. Homework #2 (not on posted slides) 2. Inferential Statistics 3
Sampling Distribution Models
CHAPTER 2 Modeling Distributions of Data
CHAPTER 2 Modeling Distributions of Data
CHAPTER 2 Modeling Distributions of Data
CHAPTER 2 Modeling Distributions of Data
Advanced Algebra Unit 1 Vocabulary
CHAPTER 2 Modeling Distributions of Data
Introductory Statistics
CHAPTER 2 Modeling Distributions of Data
Presentation transcript:

Review of Chapters 1- 5 We review some important themes from the first 5 chapters 1.Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods for Design – Planning / Implementing a study Description – Graphical and numerical methods for summarizing the data Inference – Methods for making predictions about a population (total set of subjects of interest, real or conceptual), based on a sample

2. Sampling and Measurement Variable – a characteristic that can vary in value among subjects in a sample or a population. Types of variables Categorical Quantitative Categorical variables can be ordinal (ordered categories) or nominal (unordered categories) Quantitative variables can be continuous or discrete Classifications affect the analysis; e.g., for categorical variables we make inferences about proportions and for quantitative variables we make inferences about means

Randomization – obtaining reliable data by reducing potential bias Simple random sample: In a sample survey, each possible sample of size n has the same chance of being selected. Randomization in a survey used to get a good cross- section of population. With such probability sampling methods, standard errors tell us how close sample statistics tend to be to population parameters. (Otherwise, the sampling error is unpredictable.)

Experimental vs. observational studies Sample surveys are examples of observational studies (merely observe subjects without any experimental manipulation) Experimental studies: Researcher assigns subjects to experimental conditions. –Subjects should be assigned at random to the conditions (“treatments”) –Randomization “balances” treatment groups with respect to lurking variables that could affect response (e.g., demographic characteristics, SES), makes it easier to assess cause and effect

3. Descriptive Statistics Numerical descriptions of center (mean and median), variability (standard deviation – typical distance from mean), position (quartiles, percentiles) Bivariate description uses regression/correlation (quantitative variable), contingency table analysis (categorical variables). Graphics include histogram, box plot, scatterplot

Mean drawn toward longer tail for skewed distributions, relative to median. Properties of the standard deviation s: s increases with the amount of variation around the mean s depends on units of the data (e.g. measure euro vs $) Like mean, affected by outliers Empirical rule: If distribution approx. bell-shaped,  about 68% of data within 1 std. dev. of mean  about 95% of data within 2 std. dev. of mean  all or nearly all data within 3 std. dev. of mean

Sample statistics / Population parameters We distinguish between summaries of samples (statistics) and summaries of populations (parameters). Denote statistics by Roman letters, parameters by Greek letters: Population mean =  standard deviation =  proportion   are parameters. In practice, parameter values are unknown, we make inferences about their values using sample statistics.

4. Probability Distributions Probability: With random sampling or a randomized experiment, the probability an observation takes a particular value is the proportion of times that outcome would occur in a long sequence of observations. A probability distribution lists all the possible values and their probabilities (which add to 1.0)

Like frequency dist’s, probability distributions have mean and standard deviation Standard Deviation - Measures the “typical” distance of an outcome from the mean, denoted by σ

Normal distribution Symmetric, bell-shaped Characterized by mean (  ) and standard deviation (  ), representing center and spread Prob. within any particular number z of standard deviations of  is same for all normal distributions An individual observation from an approximately normal distribution satisfies: –Probability 0.68 within 1 standard deviation of mean –0.95 within 2 standard deviations –0.997 (virtually all) within 3 standard deviations

Notes about z-scores z-score represents number of standard deviations that a value falls from mean of dist. A value y is z = (y - µ)/σ standard deviations from µ The standard normal distribution is the normal dist with µ = 0, σ = 1

Sampling dist. of sample mean is a variable, its value varying from sample to sample about population mean µ. Sampling distribution of a statistic is the probability distribution for the possible values of the statistic Standard deviation of sampling dist of is called the standard error of For random sampling, the sampling dist. of has mean µ and standard error

Central Limit Theorem: For random sampling with “large” n, sampling dist of sample mean is approximately a normal distribution Approx. normality applies no matter what the shape of the population distribution How “large” n needs to be depends on skew of population dist, but usually n ≥ 30 sufficient Can be verified empirically, by simulating with “sampling distribution” applet at Following figure shows how sampling distribution depends on n and shape of population distribution.

5. Statistical Inference: Estimation Point estimate: A single statistic value that is the “best guess” for the parameter value (such as sample mean as point estimate of popul. mean) Interval estimate: An interval of numbers around the point estimate, that has a fixed “confidence level” of containing the parameter value. Called a confidence interval. (Based on sampling dist. of the point estimate, has form point estimate plus and minus a margin of error that is a z or t score times the standard error)

Confidence Interval for a Proportion (in a particular category) Sample proportion is a mean when we let y=1 for observation in category of interest, y=0 otherwise Population prop. is mean µ of prob. dist having The standard deviation of this prob. dist. is The standard error of the sample proportion is

Finding a CI in practice Complication: The true standard error itself depends on the unknown parameter! In practice, we estimate and then find 95% CI using formula

CI for a population mean For a random sample from a normal population distribution, a 95% CI for µ is where df = n-1 for the t-score Normal population assumption ensures sampling dist. has bell shape for any n (Recall figure on p. 93 of text and next page). Method is robust to violation of normal assumption, more so for large n because of CLT.

Questions you should be able to answer: What is the difference between descriptive statistics and inferential statistics? Give an example of each. Why do we distinguish between quantitative variables and categorical (nominal and ordinal) variables? What is a simple random sample? Why should a survey use random sampling, and why should an experiment use randomization in assigning subjects to treatments? What is the difference between an experiment and an observational study? Define the mean and median, and give an advantage and disadvantage of each. Give an example when they would be quite different? Which would be larger, and why? How do you interpret a standard deviation?

Which measures are resistant to outliers: Mean, median, range, standard deviation, inter-quartile range? What information does a box plot give you? How can you describe an association between two (i) quantitative variables, (ii) categorical variables What is the correlation used for, and how do you interpret its value? If you know or can estimate P(A) and P(B), how do you find P(A and B) when A and B are independent? How do you use the normal distribution to get tail probabilities, central probabilities, z-scores? What does the z-score represent? What is a sampling distribution? What is a standard error? What’s the difference between s, σ, se. Which is the smallest? Why is the normal distribution important in statistics?

What does the Central Limit Theorem say? Why is it important in statistical inference? What are the differences among (i) population distribution, (ii) sample data distribution, (iii) sampling distribution? How do you interpret a confidence interval? Why do standard errors have to be estimated, and what impact does that have (i) for inference about a proportion, (ii) for inference about a mean? How does the t distribution depend on the df value, and how does it compare to the standard normal distribution? How does a confidence interval depend on the (i) sample size, (ii) confidence level? How do you find the z-score or t- score for a particular confidence? Why do you use a t-score instead of a z-score when you find the margin of error for a confidence interval for a mean?