Daniela Stan Raicu School of CTI, DePaul University

CSC 323 Quarter: Winter 02/03 Daniela Stan Raicu School of CTI, DePaul University 1/18/2019 Daniela Stan - CSC323

Outline
Chapter 3: Producing Data
- Common Terms: Population, Individual, Sampling Frame, Sample, Sample Survey, Census
- Sampling Design
- Towards Statistical Inference
- Experimental Design

Population
The way that the data are collected is very important. The amount and quality of useful information in a data set depend directly on how the data were gathered.
The entire group of individuals that we want information about is called the population.

Census
A census attempts to contact every individual in the entire population.

Samples and Sample Surveys
A sample is the part of the population that we actually examine in order to gather information.

Sampling Design
How do we choose the sample?
Sample design: the method used to choose the sample. The sample should be representative of the entire population, that is, it should not be biased.
Sources of bias:
- Voluntary response sample: consists of people who choose themselves by responding to a general appeal.
- Convenience sampling: selecting the individuals who are easiest to reach, for example choosing a sample of shoppers at a mall; this also tends to give biased data sets.

Sampling Design
The simplest way of getting an unbiased sample is to use a simple random sample.
A Simple Random Sample (SRS) of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected.
Steps (similar to experimental randomization):
- Label all the individuals in the population
- Use a table of random digits to select a sample of the desired size
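As a minimal sketch of drawing an SRS with software instead of a table of random digits, the Python snippet below uses the standard library's `random.sample`, which samples without replacement so that every set of n individuals is equally likely. The population of 30 labeled students, the helper name `simple_random_sample`, and the seed are assumptions made only for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return an SRS of size n: every set of n individuals is equally likely."""
    rng = random.Random(seed)  # the seed only makes the illustration reproducible
    return rng.sample(population, n)  # sampling without replacement

# Hypothetical population of 30 labeled individuals
population = [f"student_{i:02d}" for i in range(1, 31)]
sample = simple_random_sample(population, 5, seed=323)
print(sample)
```

Running it again with a different seed gives a different sample, which is exactly the sample-to-sample variation discussed later in the chapter.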

Table of Random Digits
Experimenters use software to carry out randomization. Without software, use a table of random digits (Table B in the textbook).
A table of random digits is a list of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 that has the following properties:
1. The digit in any position in the list has the same chance of being any one of 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
2. The digits in different positions are independent, in the sense that the value of one has no influence on the value of any other.

How to randomize?
Randomization requires two steps:
1. Assign labels to the individuals:
   - all labels should have the same length
   - use the shortest possible labels: one digit for 9 or fewer individuals, two digits for 10 to 100 individuals, and so on.
2. Use Table B to select labels at random:
   - you can read digits from Table B in any order: along a row, down a column, and so on.
Example: Problem 3.41
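The two steps can be sketched in Python as follows. This is only an illustration under stated assumptions: the digits are generated by software as a stand-in for the textbook's Table B (they are not the actual table), individuals are labeled 0, 1, 2, ... with leading zeros implied, and the helper names `random_digits` and `select_labels` are hypothetical.

```python
import random

def random_digits(count, seed=None):
    """Simulated stand-in for a table of random digits (not the textbook's Table B)."""
    rng = random.Random(seed)
    return [rng.randrange(10) for _ in range(count)]

def select_labels(digits, population_size, n):
    """Read fixed-width groups of digits; keep the first n distinct valid labels."""
    # Labels run 0 .. population_size - 1, e.g. 00..29 needs two digits
    width = len(str(population_size - 1))
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        label = int("".join(str(d) for d in digits[i:i + width]))
        # Skip groups that are not valid labels and labels already chosen
        if label < population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == n:
                break
    return chosen

digits = random_digits(400, seed=1)
picked = select_labels(digits, population_size=30, n=5)
print(picked)
```

For example, reading the digits 6 9 0 5 1 6 4 8 1 7 in pairs gives 69, 05, 16, 48, 17; with 30 individuals the groups 69 and 48 are skipped and the labels 05, 16, 17 are selected.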

Toward Statistical Inference
Statistical inference uses a fact about a sample to estimate the truth about the whole population.
A parameter p is a number that describes the population; it is a fixed number whose value we don’t know.
A statistic is a number that describes a sample. The value of a statistic is known once we have taken a sample, but it can change from sample to sample.
A statistic is often used to estimate a parameter.

How good is the statistic?
The value of a statistic will vary from one sample to another; sampling variability is the variation of the values of the statistic in repeated random sampling.
If the statistic varies too much from sample to sample, the results of any one sample cannot be trusted.
A statistical inference is trustworthy if the statistic shows little variability across repeated samples of the same size.

How does the statistic vary with repeated samples?
To understand the variability of the statistic:
- Take a large number of samples (simulation can be used to obtain the samples)
- Calculate the statistic for each sample
- Make a histogram of the values of the statistic
- Examine the distribution displayed in the histogram: shape, center, spread, outliers
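The steps above can be sketched as a small simulation. The snippet is an illustration, not the course's own procedure: it assumes a population proportion of 0.6, models each individual as an independent draw (reasonable when the population is much larger than the sample), and uses hypothetical helper names `sample_proportions` and `text_histogram`.

```python
import random
import statistics

def sample_proportions(p_true, n, num_samples, seed=None):
    """Simulate num_samples samples of size n; return each sample proportion."""
    rng = random.Random(seed)
    props = []
    for _ in range(num_samples):
        successes = sum(rng.random() < p_true for _ in range(n))
        props.append(successes / n)
    return props

def text_histogram(values):
    """Print a crude histogram with bins 2 percentage points wide."""
    counts = {}
    for v in values:
        b = int(round(v * 100)) // 2  # bin index from the rounded percentage
        counts[b] = counts.get(b, 0) + 1
    for b in sorted(counts):
        print(f"{b * 2:3d}% {'#' * (counts[b] // 10)}")  # one '#' per 10 samples

props = sample_proportions(p_true=0.6, n=100, num_samples=1000, seed=323)
text_histogram(props)
print("center:", round(statistics.mean(props), 3))
print("spread:", round(statistics.stdev(props), 3))
```

The printed histogram can then be examined for shape, center, spread, and outliers.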

The distribution of a statistic
Example: Suppose that 60% of all American adult residents find clothes shopping time-consuming and frustrating. The true value of the parameter we want to estimate is p = 0.6.
Suppose that we don’t know the true value of the parameter and we take different samples in order to estimate the value of p:
- we take 1000 simple random samples (SRS), each of size 100, and we estimate the value of p from each sample

Example (cont.): If we instead choose 1000 samples of size 2500 each, the figure on the original slide shows the variation in the estimates of p.
The sampling distribution of a statistic is the distribution of values taken by the statistic in all samples of the same size from the same population.

Interpretation of the sampling distribution
Shape: approximately normal.
Center: the values are centered at 0.6. Since the true value of the parameter is 0.6, the estimator of p (obtained from repeated SRS) is said to be unbiased: the mean of the statistic’s values is equal to the true value of the parameter.
Spread: the values of the estimator from samples of size 2500 are much less spread out than those from samples of size 100; statistics from larger samples have smaller spread (less variability).
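The center and spread described above agree with the standard result for a sample proportion from an SRS of size n drawn from a large population. The notation \hat{p} for the sample proportion and the formula itself are not on the slide; they are added here as a worked check with p = 0.6.

```latex
\mu_{\hat{p}} = p = 0.6,
\qquad
\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}

n = 100:  \quad \sigma_{\hat{p}} = \sqrt{\frac{0.6 \times 0.4}{100}}  \approx 0.049
\qquad
n = 2500: \quad \sigma_{\hat{p}} = \sqrt{\frac{0.6 \times 0.4}{2500}} \approx 0.0098
```

Multiplying the sample size by 25 divides the spread by 5.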

Managing Bias and Variability
Simple random sampling produces unbiased estimates of the value of a population parameter; therefore, use random sampling to reduce bias.
To reduce the variability of a statistic from an SRS, use a larger sample.
The variability of a statistic from a random sample does not depend on the size of the population, as long as the population is at least 100 times larger than the sample.
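One way to see why the population size hardly matters is the finite population correction for a sample proportion \hat{p} from an SRS of size n drawn without replacement from a population of size N. This formula is a standard result that is not stated on the slide; it is included as a sketch, with N and \hat{p} as notation introduced here.

```latex
\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \cdot \sqrt{\frac{N-n}{N-1}},
\qquad
N \ge 100\,n \;\Longrightarrow\; \sqrt{\frac{N-n}{N-1}} > \sqrt{0.99} \approx 0.995
```

When the population is at least 100 times larger than the sample, the correction factor changes the spread by less than 1%, so only the sample size n matters in practice.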

Example on bias and variability
Problem 3.62: Label each distribution relative to the others as:
- Low bias, high variability
- High bias, low variability
- Low bias, low variability
- High bias, high variability

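Bias and Variability
True value of the parameter = bull’s eye on a target.
Bias and variability describe what happens when an archer fires many arrows at the target. Bias means the arrows land consistently off the bull’s eye in the same direction; high variability means the arrows are widely scattered.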
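The archery picture can be made concrete with a short simulation. This is a sketch under simplifying assumptions: shots are one-dimensional, each hypothetical archer is described by a made-up systematic offset and scatter, and the helper `bias_and_variability` is introduced only for illustration.

```python
import random
import statistics

def bias_and_variability(estimates, true_value):
    """Bias = average miss from the true value; variability = spread of the values."""
    bias = statistics.mean(estimates) - true_value
    variability = statistics.stdev(estimates)
    return bias, variability

rng = random.Random(323)
true_value = 0.0  # the bull's eye
# Hypothetical archers: (systematic offset of the aim, scatter of the shots)
archers = {
    "low bias, low variability": (0.0, 0.5),
    "low bias, high variability": (0.0, 2.0),
    "high bias, low variability": (3.0, 0.5),
    "high bias, high variability": (3.0, 2.0),
}
results = {}
for name, (offset, scatter) in archers.items():
    shots = [rng.gauss(offset, scatter) for _ in range(1000)]
    results[name] = bias_and_variability(shots, true_value)
    bias, variability = results[name]
    print(f"{name}: bias={bias:.2f}, variability={variability:.2f}")
```

Bias and variability are separate properties: an archer can be tightly grouped yet consistently off target, or centered on the bull's eye yet widely scattered.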

Probability Sampling Plans
- Stratified random sampling
- Multistage sampling
Stratified samples (Example: Problem 3.47): it is often useful to sample important groups within the population separately, then combine these samples.
Steps of the stratified sample design:
- divide the population into groups of similar individuals, called strata; examples: female versus male; urban, suburban and rural
- choose a separate SRS in each stratum
- combine the SRSs to form the full sample.
A multistage sample design selects successively smaller groups from the population in stages, resulting in a sample consisting of clusters of individuals; each stage may employ an SRS, a stratified sample or another type of sample.
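A stratified sample can be sketched in Python as one SRS per stratum followed by a merge. The strata, their sizes, the per-stratum sample sizes, and the helper name `stratified_sample` are all hypothetical and chosen only to illustrate the three steps.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw a separate SRS from each stratum and combine them into one sample."""
    rng = random.Random(seed)
    full_sample = []
    for name, members in strata.items():
        # SRS within this stratum only (sampling without replacement)
        full_sample.extend(rng.sample(members, sizes[name]))
    return full_sample

# Hypothetical strata and per-stratum sample sizes
strata = {
    "urban": [f"U{i}" for i in range(50)],
    "suburban": [f"S{i}" for i in range(30)],
    "rural": [f"R{i}" for i in range(20)],
}
sizes = {"urban": 5, "suburban": 3, "rural": 2}
sample = stratified_sample(strata, sizes, seed=323)
print(sample)
```

Unlike a single SRS of size 10 from the whole population, this design guarantees that every stratum is represented with the chosen number of individuals.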

Observation versus Experiment
An observational study observes individuals and measures variables of interest but does not attempt to influence the responses.
An experiment deliberately imposes some treatment on individuals in order to observe their responses.
Terms:
- Experimental units: the individuals on which the experiment is done; they are called subjects when the experimental units are human beings
- Factors: the explanatory variables
- Treatment: a combination of a specific value (often called a level) of each of the factors
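Since a treatment is one combination of levels, one per factor, the full set of treatments is the Cartesian product of the factor levels. The sketch below uses `itertools.product`; the two factors and their levels are made-up examples, not taken from the course.

```python
from itertools import product

# Hypothetical factors and their levels
factors = {
    "dose": ["low", "high"],
    "schedule": ["daily", "weekly", "monthly"],
}

# Each treatment assigns one level to every factor
treatments = [dict(zip(factors, levels)) for levels in product(*factors.values())]

for number, treatment in enumerate(treatments, start=1):
    print(number, treatment)
```

With 2 levels of one factor and 3 of the other, the experiment has 2 x 3 = 6 treatments.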

Design of Experiments
The design of an experiment refers to the choice of treatments and the manner in which the experimental units or subjects are assigned to the treatments.
The principles of statistical design:
1. Control: comparison of several treatments in the same environment is the simplest form of control.
2. Randomization: uses chance to assign experimental units to treatment groups that are similar (except for chance variation).
   - Randomization and comparison together prevent bias (systematic favoritism in experiments).
3. Replication: repeating the treatments on many units reduces the role of chance variation in the results.
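Randomization for a simple comparative experiment can be sketched by shuffling the units and dealing them out to the treatment groups in turn. The unit names, treatment names, and the helper `randomize` are assumptions made for illustration.

```python
import random

def randomize(units, treatments, seed=None):
    """Use chance to split the units into treatment groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # chance alone decides the order
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# Hypothetical experiment: 20 subjects, two treatments
units = [f"subject_{i:02d}" for i in range(1, 21)]
groups = randomize(units, ["treatment", "control"], seed=323)
for name, members in groups.items():
    print(name, sorted(members))
```

Ten subjects per group also illustrates replication: each treatment is applied to many units rather than just one.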

Block design
A second form of control is forming blocks of experimental units that are similar in some way that is important to the response.
In a block design, the random assignment of units to treatments is carried out separately within each block.
Block designs can have blocks of any size; blocks allow us to draw separate conclusions about each block.
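A block design differs from plain randomization only in where the shuffling happens: once per block rather than once overall. The sketch below assumes two hypothetical blocks of different sizes and two treatments; the helper name `block_randomize` is illustrative.

```python
import random

def block_randomize(blocks, treatments, seed=None):
    """Randomly assign units to treatments separately within each block."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)  # randomization happens inside this block only
        for i, unit in enumerate(shuffled):
            assignment[unit] = (block_name, treatments[i % len(treatments)])
    return assignment

# Hypothetical blocks of different sizes and two treatments
blocks = {
    "female": ["F1", "F2", "F3", "F4"],
    "male": ["M1", "M2", "M3", "M4", "M5", "M6"],
}
assignment = block_randomize(blocks, ["A", "B"], seed=323)
for unit, (block, treatment) in sorted(assignment.items()):
    print(unit, block, treatment)
```

Because each block is split evenly between the treatments, the treatments can be compared within each block as well as overall.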

Matched pairs designs
Matched pairs are a common form of blocking for comparing just two treatments.
There are two types of matched pairs designs:
- Each subject receives both treatments in a random order.
- The subjects are matched in pairs as closely as possible, and each subject in a pair receives one of the two treatments.
Reading Assignment: Chapter 3
