Exploratory Data Analysis Observations of a single variable

Example In 1798 Cavendish made 29 determinations of the density of the Earth, relative to that of water. His results are stored in the R vector density, which can be printed by typing > density. Source: The Data and Story Library. Note that these are observations of a continuous variable, as in general are measurements of all kinds.

Of interest, of course, is to estimate the true density of the Earth. A useful simple display is given by stem(density), while simple summary statistics are produced by the functions mean, median, sd, summary, etc. In particular we have > mean(density) [1] 5.42 > median(density) [1] 5.46 The standard deviation of the observations is 0.34.
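As a minimal sketch (assuming the 29 measurements are in the vector density), the summaries quoted above can be reproduced at the prompt:
> stem(density)      # stem-and-leaf display
> mean(density)      # 5.42
> median(density)    # 5.46
> sd(density)        # about 0.34, as quoted above
> summary(density)   # five-number summary plus the mean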

A histogram is given by > hist(density,breaks=seq(4,6,0.2), xlab = "relative density of Earth")

Clearly there is at least one low outlier in the data, so the median may give a better estimate of the true density than the mean. Now let us investigate the extent to which the data can be modelled as a random sample from some underlying normal distribution.

A normal Q-Q plot can be used to examine this. Recall that this is a plot of the sorted observations against what is effectively an idealised sample of the same size from the N(0, 1) distribution. The fitted line corresponds to the normal distribution with the same first and third quartiles as the data.

The plot and the fitted line are constructed with > qqnorm(density) > qqline(density)

The line has intercept 5.46 and slope 0.23, which provide reasonable estimates of the mean and standard deviation of the best-fitting normal distribution. The plot again suggests that at least the lowest observation should be ignored.
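As a sketch of where these numbers come from (assuming the measurements are in density): by default qqline draws the line through the first- and third-quartile points, so its slope and intercept can be recovered by hand.
> q.data <- quantile(density, c(0.25, 0.75))   # quartiles of the data
> q.norm <- qnorm(c(0.25, 0.75))               # quartiles of N(0, 1)
> slope <- diff(q.data) / diff(q.norm)         # about 0.23
> intercept <- q.data[1] - slope * q.norm[1]   # about 5.46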

An approximate 95% confidence interval for the true mean of the underlying distribution of the data, based on using all the data, is given by > mean(density) + c(-1, 1) * qnorm(0.975) * sqrt(var(density) / length(density))

R prints the two endpoints of this interval. To correct for the fact that the sample variance is only an estimate of the underlying true variance, we can instead use t.test(density), which gives a slightly wider 95% confidence interval. The generally accepted modern-day value for the relative density of the Earth is 5.52.
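A minimal sketch of the hand calculation behind t.test, assuming the measurements are in density:
> n <- length(density)   # 29 observations
> mean(density) + c(-1, 1) * qt(0.975, df = n - 1) * sd(density) / sqrt(n)
> t.test(density)$conf.int   # the same interval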

Example The R variable photons contains the number of photons produced in each of 60 successive seconds by a very weak light source; the observations can be printed by typing > photons. Here the variable is a count (and so discrete), and we have 60 observations of it.

In addition to the usual R summary functions, the function table gives a frequency table of the counts: > table(photons)

A histogram can be produced with > hist(photons, breaks = seq(-1, 6)). However, since this variable is a count, it is interesting to compare its distribution with that of the Poisson distribution with the same mean (2.08). The appropriate diagrams are produced with the commands:

> barplot(table(photons), xlab = "photon count", ylab = "frequency", ylim = c(0, 20))

> barplot(60 * dpois(0:8, 2.08), names.arg = 0:8, xlab = "photon count", ylab = "Poisson expected frequency", ylim = c(0, 20))
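To make the comparison easier, the two barplots can be placed side by side (a sketch using the same commands as above):
> par(mfrow = c(1, 2))
> barplot(table(photons), xlab = "photon count", ylab = "frequency", ylim = c(0, 20))
> barplot(60 * dpois(0:8, 2.08), names.arg = 0:8, xlab = "photon count", ylab = "Poisson expected frequency", ylim = c(0, 20))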

The chi-squared (χ²) distribution can be used to check whether there is a significant difference between the observed and the expected frequencies.

Tabulating the observed and Poisson expected frequencies for each count, together with the contributions (observed − expected)² / expected, the sum of the last column gives the observed value of the χ² statistic.

This value of χ² can then be compared with tabulated values of the χ² distribution with the appropriate number of degrees of freedom (here 6).

It turns out that the observed frequencies are not significantly different from the Poisson expected frequencies at the 5% level of significance.
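A sketch of this calculation, assuming the counts are in photons. The grouping of cells into 0, 1, ..., 6 and 7-or-more is an illustrative choice: with 8 cells, losing one degree of freedom for the totals and one for the estimated Poisson mean leaves the 6 degrees of freedom quoted above.
> lambda <- mean(photons)   # about 2.08
> observed <- table(factor(pmin(photons, 7), levels = 0:7))
> expected <- 60 * c(dpois(0:6, lambda), 1 - ppois(6, lambda))
> chi.sq <- sum((observed - expected)^2 / expected)
> chi.sq > qchisq(0.95, df = 6)   # compare with the 5% critical value; the slides report no significant difference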