Sample size vs. Error A tutorial By Bill Thomas, Colby-Sawyer College.


Introduction

In the pipetting tutorial, you explored the utility of the mean, the standard deviation and the relative error in describing the reproducibility and accuracy of a sample. You also learned a few tricks for working more efficiently within Excel. In this tutorial, we are going to explore the relationship between sample size and variability. How well do our descriptors (mean, st. dev.) tell us whether our sample is representative of the population we are trying to describe or evaluate? To answer this we will introduce a new descriptor, the Standard Error of the Mean (SEM), and we will see how it varies with the sample size. Along the way, we will also gain a bit more experience using Excel.

First, what do we mean by the “population” and the “sample”? Let’s suppose that our population were the numbers from 1 to 10. There would be 10 members of the population, and it would not be difficult to sample (e.g., consider, do an experiment on, take data on, or do our calculations on) every member of the population. However, what if our population had 10,000 elements (or even more)? It would be impractical, verging on impossible, to treat every member of the population separately. Thus we frequently select a sample population to represent the full population. This approach makes the work easier, but it raises the question of how representative the sample population is of the full population. Consider the images on the following slide.

[Figure: three diagrams comparing the full population with the sample population]

1. Full population = sample population. The sample population IS the full population; this is ideal.
2. The sample population is much smaller than the full population, but the full population is uniform, so the sample is representative of the full population. This is rarely the case, unfortunately.
3. The full population is non-uniform, so a small sample cannot be truly representative of the full population. This is most often the case we face in reality, and the question is how to sample appropriately under these circumstances.

Let’s begin by getting a sense for the kind of variability we might encounter. For this we will use a new tool in Excel, a random number generator, that will allow us to create randomly generated populations of numbers of any size and within any limits. The function looks like this:

=RANDBETWEEN(x,y)

It will generate, in the cell in which it is written, a random whole number between the set limits x and y, with both limits included. Thus, if you specify 2 and 4, it will produce 2, 3, or 4 in the chosen cell. Let’s use this function.
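For readers who want to check this behavior outside Excel, here is a minimal Python sketch of the same idea. The helper name `randbetween` is hypothetical; it simply wraps the standard library's `random.randint`, which, like RANDBETWEEN, returns whole numbers and includes both limits.

```python
import random


def randbetween(low, high):
    """Return a random whole number from low to high, both limits included."""
    return random.randint(low, high)


# Ten draws between 2 and 4: every value is 2, 3, or 4.
values = [randbetween(2, 4) for _ in range(10)]
print(values)
```

Running it several times gives a different list each time, just as each new RANDBETWEEN cell shows a different number.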

Open a new Excel file, and in cell A1 type in the function =RANDBETWEEN(9,11). When you press “enter”, a number between 9 and 11 (that is, 9, 10, or 11) will appear. Click and drag the fill handle of this cell down to fill 9 more cells. Each of the 10 cells should now contain a number between 9 and 11. Highlight the 10 vertical cells and drag them across to column P. You should have an array of 16 columns by 10 rows, each number randomly generated, that looks like this: Note that the number in each position of your set will be different from that shown here.

Now, there is a little quirk about the function that you just used. RANDBETWEEN recalculates every time the worksheet recalculates, so the numbers change whenever you edit, copy, or paste cells. While this feature is useful for some purposes, it makes what you are about to do a bit more challenging. We need to have numbers that do not change as you work with them, so here is what to do. Select all 160 numbers and copy them. Then put the cursor on cell C13. If you have a PC, right click on the cell and select the option “paste special”. If you have a Mac, highlight cell C13 and, under the edit option on the menu bar, select “paste special”. In each case, you will be given a menu, the first section of which is “paste”. Under the paste submenu, select “values” and click OK. The numbers that appear will now be fixed; they will not vary when you manipulate them.
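The same 16-by-10 array can be sketched in Python. In a script the values stay fixed once they are stored in a list, so no "paste values" step is needed. The seed (42) is an arbitrary choice made here so that the same numbers come back on every run; it is not part of the tutorial.

```python
import random

ROWS, COLS = 10, 16        # 10 values in each of 16 columns, as in A1:P10
rng = random.Random(42)    # arbitrary seed (an assumption) for repeatable numbers

# columns[c] holds the 10 values of column c; each value is 9, 10, or 11.
columns = [[rng.randint(9, 11) for _ in range(ROWS)] for _ in range(COLS)]

# Print the array row by row, the way it appears on the worksheet.
for r in range(ROWS):
    print(" ".join(f"{columns[c][r]:2d}" for c in range(COLS)))
```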

Next, as preparation for the steps to come, color the cells and add in the other details shown below. Remember: A cell with a yellow background color is a cell into which you type a value, a number (like all the values above). A cell with a salmon background color is a cell in which you must write an equation to generate the number shown (which will be the case with most of the steps to follow).

Now, let’s think for a minute about the numbers that you have generated. The range allowed was from 9 to 11 (or 10 +/- 1), so you can see that the average of these numbers ought to be midway between 9 and 11, or 10. Let’s use Excel to calculate the mean of the first 3 numbers in each vertical column of 10 numbers to see how close it is to 10. Set your spreadsheet up as shown below, being sure to calculate the mean and (below it) the standard deviation for the first three numbers in each column.
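As a cross-check on the spreadsheet, the calculation for the first three numbers of each column can be sketched in Python. This assumes the sample standard deviation (n - 1 in the denominator), which is what `statistics.stdev` computes; the tutorial does not say which Excel function it uses. The seed is arbitrary.

```python
import random
import statistics

rng = random.Random(1)  # arbitrary seed (an assumption) for repeatable numbers
columns = [[rng.randint(9, 11) for _ in range(10)] for _ in range(16)]

# Mean and sample standard deviation of the first 3 values in each column.
means3 = [statistics.mean(col[:3]) for col in columns]
sds3 = [statistics.stdev(col[:3]) for col in columns]

for i, (m, s) in enumerate(zip(means3, sds3), start=1):
    print(f"column {i:2d}   mean = {m:.2f}   st. dev. = {s:.2f}")
```

With only three values per sample, the 16 means typically scatter noticeably around 10.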

Now look at your 16 means. How similar are they? How close to the expected value of 10 are they? Are you surprised? Let’s visualize the distribution. Create a scatter plot (without a line connecting the data points) of the means with the axes shown below:

There is a number that describes the variation within each data set, and that number is the standard deviation. Calculate the standard deviation for each of your sets of 3 as shown below. Generate a scatter plot of the standard deviations, as well. What does it tell you about the variation within your data?

The standard deviation expresses the variation within the sample set, but it does not really tell us how well the sample represents (estimates the values for) the full population that we are trying to evaluate. For that we need a new term, the standard error of the mean.

S.E.M. = (standard deviation of the sample) / (square root of the sample size)

Use the standard deviations that you just calculated to calculate the standard error for each sample of 3. Use a scatter plot to visualize the distribution of these values as well.
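The formula translates directly into code. The sketch below is an illustration rather than part of the tutorial: the function name `sem` is hypothetical, and it assumes the sample standard deviation (n - 1 in the denominator).

```python
import math
import statistics


def sem(sample):
    """Standard error of the mean: sample st. dev. / sqrt(sample size)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


# Worked example: for the sample 9, 10, 11 the standard deviation is 1,
# so the SEM is 1 / sqrt(3), roughly 0.577.
sample = [9, 10, 11]
print(f"st. dev. = {statistics.stdev(sample):.3f}   SEM = {sem(sample):.3f}")
```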

It should be clear that there is quite a bit of variation in the data when they are sampled three at a time. Are samples of this size good at representing the full population from which they are taken? What would happen if we took larger samples from the full population? Begin by doing the same calculations for each full column of 10 (not just the first three values in the column). Add these new data to each of your three scatter plots.

Now take the data twenty values at a time; calculate the mean, st. dev. and SEM for each pair of adjacent columns. You will have half as many calculated values as before. Again, add these new data to each of your three scatter plots.

Complete this exercise in three more successive steps for the data taken 40 values at a time, 80 values at a time, and 160 values at a time, as shown below. Note that 160 values constitute the entire population. Conclude by adding these new data to each of your three scatter plots. Do your plots show you a pattern? What is that pattern?
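The whole sequence of groupings (3, 10, 20, 40, 80 and 160 values at a time) can be sketched in Python as follows. Two assumptions are made here: samples larger than one column are formed by joining adjacent columns, and the standard deviation is the sample version (n - 1). The name `describe` and the seed are hypothetical choices for this illustration.

```python
import math
import random
import statistics

rng = random.Random(7)  # arbitrary seed (an assumption) for repeatable numbers
columns = [[rng.randint(9, 11) for _ in range(10)] for _ in range(16)]
flat = [v for col in columns for v in col]  # all 160 values, column by column


def describe(sample):
    """Return (mean, st. dev., SEM) for one sample."""
    sd = statistics.stdev(sample)
    return statistics.mean(sample), sd, sd / math.sqrt(len(sample))


# Samples of 3 are the first three values of each column; larger samples
# are whole columns joined together (assumed: adjacent columns).
groups = {3: [col[:3] for col in columns]}
for size in (10, 20, 40, 80, 160):
    groups[size] = [flat[i:i + size] for i in range(0, len(flat), size)]

for size, samples in groups.items():
    sems = [describe(s)[2] for s in samples]
    print(f"{size:3d} values at a time: {len(samples):2d} samples, "
          f"SEM from {min(sems):.3f} to {max(sems):.3f}")
```

The number of samples halves at each step after 10, ending with a single sample that is the entire population.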

In the scatter plots that you have generated thus far, you have illustrated the distributions of your various calculations, showing each descriptor for each population evaluated. It is possible, and often very helpful, to visualize the trends in the data with summary graphs. To this end, calculate at the far right, as shown, the mean of all the values grouped horizontally across each line. Thus the value to the right of the first line will be the mean of all 16 means of the data taken three values at a time. The second line will be the mean of all the st. devs. of the data taken three values at a time, and so on down the column to the last values, which will be just those calculated for the entire population.

Graph the mean standard errors calculated against the sample size as indicated on the axes below. Do you see a trend? How big does the data set have to be to fulfill the trend? Let’s explore that a bit.
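For comparison with the graph, the trend that theory predicts can be computed directly. This sketch assumes that RANDBETWEEN(9,11) returns 9, 10 and 11 with equal probability; under that assumption the population standard deviation is the square root of 2/3 (about 0.816), and the expected SEM is roughly that value divided by the square root of the sample size.

```python
import math
import statistics

# Assumption: 9, 10 and 11 are equally likely.
population_values = [9, 10, 11]
sigma = statistics.pstdev(population_values)  # population st. dev., sqrt(2/3)

sample_sizes = [3, 10, 20, 40, 80, 160]
expected_sem = {n: sigma / math.sqrt(n) for n in sample_sizes}

for n, value in expected_sem.items():
    print(f"n = {n:3d}   expected SEM = {value:.3f}")
```

Because of the square root, quadrupling the sample size only halves the expected SEM.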

Let’s create a really big data set. Use the random number generator to create a column of 1000 values. Then calculate the mean, st. dev., and SEM for the sample sets having 3, 10, 20, 50, 100, 200, 500 and 1000 values (the last being the entire population). Then make two graphs, one of the mean, the other of the SEM, plotted against sample size. Do you remember that we assumed at the beginning that a population distributed randomly about 10 should have a mean of 10? What do the data show you? Similarly, how big does the sample have to be before the SEM of the sample approximates the SEM of the entire population?
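A Python sketch of this exercise is given below. It assumes each sample is simply the first n values of the column, which is one reasonable reading of the instructions, and it uses the sample standard deviation. The seed is arbitrary.

```python
import math
import random
import statistics

rng = random.Random(2024)  # arbitrary seed (an assumption) for repeatable numbers
population = [rng.randint(9, 11) for _ in range(1000)]

sample_sizes = [3, 10, 20, 50, 100, 200, 500, 1000]
results = {}
for n in sample_sizes:
    sample = population[:n]  # assumption: each sample is the first n values
    sd = statistics.stdev(sample)
    results[n] = (statistics.mean(sample), sd, sd / math.sqrt(n))

for n, (mean, sd, sem) in results.items():
    print(f"n = {n:4d}   mean = {mean:.3f}   st. dev. = {sd:.3f}   SEM = {sem:.3f}")
```

The two columns to graph against sample size are the mean and the SEM.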

Some things to try: Now that you have the calculator and graphs all set up, you can “play” pretty easily. What happens to your values and graphs from the previous slide if you change the spread of the data? For example, what if you used RANDBETWEEN(7,13)? How about RANDBETWEEN(5,15)? Does changing the size of the data set (2000, 5000, 10,000 values) have any interesting effect on these outcomes? (Don’t be intimidated. The numbers look huge, but you can create such sets in seconds.) How big does each population have to be for the mean to resolve to the expected value? Finally, what have you learned about sampling a population with inherent variability?
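These variations are quick to try in a script as well. The sketch below compares the three spreads mentioned above for a set of 1000 values; the set size, the seed and the variable names are choices made for this illustration, and `N` can be changed to 2000, 5000 or 10000 to explore the effect of size.

```python
import math
import random
import statistics

rng = random.Random(5)  # arbitrary seed (an assumption) for repeatable numbers
N = 1000                # size of each data set; try 2000, 5000, 10000
limits = [(9, 11), (7, 13), (5, 15)]

spread = {}
for low, high in limits:
    data = [rng.randint(low, high) for _ in range(N)]
    sd = statistics.stdev(data)
    spread[(low, high)] = (statistics.mean(data), sd, sd / math.sqrt(N))

for (low, high), (mean, sd, sem) in spread.items():
    print(f"RANDBETWEEN({low},{high}): mean = {mean:.3f}   "
          f"st. dev. = {sd:.3f}   SEM = {sem:.3f}")
```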