Data Collection.


Presentation on theme: "Data Collection."— Presentation transcript:

1 Data Collection

2 Study vs. Experiment
Observational Study: Based on data in which no manipulation of factors has been employed.
Experiment: Manipulates factors to create treatments, randomly assigns subjects to the treatments, and compares the responses of the subjects across treatment levels.

3 Study or Experiment?
Researchers have linked an increase in the incidence of breast cancer in Italy to dioxin released by an industrial accident in The study identified 981 women who lived near the site of the accident and were under age 40 at the time. Fifteen of the women had developed breast cancer at an unusually young average age of 45. Medical records showed that they had heightened concentrations of dioxin in their blood and that each tenfold increase in dioxin level was associated with a doubling of the risk of breast cancer. Observational study

4 Study or Experiment?
Is diet or exercise effective in combating insomnia? Some believe that cutting out desserts can help alleviate the problem, while others recommend exercise. Forty volunteers suffering from insomnia agreed to participate in a month-long test. Half were randomly assigned to a special no-desserts diet; the others continued desserts as usual. Half of the people in each of these groups were randomly assigned to an exercise program, while the others did not exercise. Those who ate no desserts and engaged in exercise showed the most improvement. Experiment

5 The Cycle of Statistics
Population Sample Parameter Statistic

6 Principles of Experimental Design
Control aspects of the experiment that we know may have an effect on the response, but that are not the factors being studied.
Randomize to even out effects that we cannot control.
Replicate over as many subjects as possible.

7 Types of Sampling
Random Sample: Simple Random Sample, Stratified Random Sample, Probability Random Sample

8 Random Sampling
Simple Random Sample (SRS): Every member of the population has an equal chance of being chosen for the sample.
Method: Assign a random number to each individual in the sampling frame. Select only those whose random numbers satisfy some rule.

9 Simple Random Sample Example
There are 80 students enrolled in an introductory Statistics course; you are to select a sample of 5.
Sampling frame: The roster of all students enrolled in the course.
Label each student. Use a random number generator and choose the first 5 students from the list that match the random numbers. Ignore numbers not on the list and repeats.
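The procedure on this slide can be sketched in Python. This is only an illustration: the roster labels, the `seed` argument, and the function name `simple_random_sample` are assumptions made for the example, not part of the original slides.

```python
import random

def simple_random_sample(frame, k, seed=None):
    """Pick k distinct members of the sampling frame using random numbers."""
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < k:
        number = rng.randint(1, 99)   # two-digit random number
        if number > len(frame):       # ignore numbers not on the list
            continue
        if number in chosen:          # ignore repeats
            continue
        chosen.append(number)
    return [frame[n - 1] for n in chosen]

# Hypothetical roster: students labeled 01 through 80.
roster = [f"Student {i:02d}" for i in range(1, 81)]
sample = simple_random_sample(roster, 5, seed=1)
print(sample)
```

In practice `random.sample(roster, 5)` gives an equivalent simple random sample in one call; the loop above just mirrors the "ignore numbers not on the list and repeats" rule.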

10 Stratified Random Sample
The population is divided into groups of similar individuals, called strata. An SRS is then taken within each stratum, and these are combined to form the overall sample.
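A minimal sketch of a stratified random sample, assuming two made-up strata (class years) and the same number drawn from each. The strata names, sizes, and the helper `stratified_sample` are hypothetical.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Take an SRS inside each stratum, then combine the results."""
    rng = random.Random(seed)
    combined = []
    for members in strata.values():
        combined.extend(rng.sample(members, per_stratum))  # SRS within one stratum
    return combined

# Hypothetical strata: 40 freshmen and 30 sophomores.
strata = {
    "freshmen": [f"F{i}" for i in range(1, 41)],
    "sophomores": [f"S{i}" for i in range(1, 31)],
}
sample = stratified_sample(strata, 3, seed=2)
print(sample)
```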

11 Probability Random Sample
A sample is chosen by chance. Each possible sample has a probability of being chosen, and we have to know this probability.

12 What is the population? What is the sample?
Which random sample was used? A company packaging snack foods maintains quality control by randomly selecting 10 cases from each day’s production and weighing the bags. Then they open one bag from each case and inspect the contents.
Population: All snack foods produced at the company
Sample: 10 cases from each day’s production
Random Sample: SRS

13 What is the population? What is the sample?
Which random sample was used? Dairy inspectors visit farms unannounced and take samples of the milk to test for contamination. If the milk is found to contain dirt, antibiotics, or other foreign matter, the milk will be destroyed and the farm re-inspected until purity is restored.
Population: All milk at the dairy (in the tank)
Sample: Sample from the milk tank
Random Sample: SRS

14 Terminology
Factor: What is being manipulated
Response: What is being measured
Experimental Units: Individuals on which the experiment is done
Subjects: Human experimental units
Treatment: Specific experimental condition applied
Control group: Group that receives no treatment or a placebo
Placebo: A treatment known to have no effect

15 Analyzing Experiments
Aspirin Study
Factor: Aspirin
Response: Number of heart attacks
Subjects/Units: 1000 male volunteers
Treatment: Aspirin
Levels: Low dose and none (placebo)
Blinding: Patients do not know which pill they are taking
Control: A group will take a placebo pill
Randomization: The men will be randomly assigned to either the treatment group or the placebo group
Replication: Each treatment will be replicated 500 times
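The randomization step of the aspirin study could look like the sketch below. The subject IDs 1 to 1000 and the function `assign_groups` are assumptions for illustration; the slide only says the men are randomly assigned to two groups of 500.

```python
import random

def assign_groups(subjects, seed=None):
    """Shuffle the subjects and split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)       # copy so the original order is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subject IDs for the 1000 volunteers.
subjects = list(range(1, 1001))
aspirin_group, placebo_group = assign_groups(subjects, seed=3)
print(len(aspirin_group), len(placebo_group))
```

Shuffling first and then cutting the list in half guarantees 500 replicates per treatment, which a coin flip per subject would not.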

16 Displaying Data

17 Types of Variables
Categorical: Places an individual into one of several groups or categories. Ex: Eye color, favorite food
Quantitative: Takes a range of numeric values. Ex: Height, weight, income
Discrete: Finite (countable) possible values. Ex: Number of goals in soccer
Continuous: Infinite possible values (any value in a range). Ex: Height of males at Enloe

18 What kind of variable? Does it make sense to average the values?
Gender; Telephone area code; Amount of electricity used; Zip code; Ticket sales at Miley Cyrus concert; Number of chicken eggs hatched on Nov. 17, 2006 at 3:00 am
Possible answers: Categorical, Quantitative (C), Quantitative (D)

19 Every graph I ever make will always
Have a title, axes labeled, units identified, and a legend (for categorical data)

20 Graphs for categorical data
Bar Chart: Bars never touch!

21 Graphs for categorical data
Pie Chart: Used for comparing parts to a whole

22 Create a graph for the following…
A survey was conducted of 1000 individuals regarding their favorite color. The results are as follows:
Red 367
Yellow 100
Green 68
Blue 159
Purple 200
Grey 26
Pink 80
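One way to start on this graph is to turn the counts into relative frequencies (the slice sizes for a pie chart) and a rough text bar chart. The sketch below uses only the survey counts from the slide; the scale of one `#` per 10 responses is an arbitrary choice.

```python
# Survey counts from the slide.
counts = {
    "Red": 367, "Yellow": 100, "Green": 68, "Blue": 159,
    "Purple": 200, "Grey": 26, "Pink": 80,
}

total = sum(counts.values())
relative = {color: n / total for color, n in counts.items()}

# Print the categories from most to least popular with a text bar.
for color, n in sorted(counts.items(), key=lambda item: item[1], reverse=True):
    bar = "#" * round(n / 10)   # one '#' per 10 responses (arbitrary scale)
    print(f"{color:<7}{n:>4} {relative[color]:>6.1%} {bar}")
```

The counts add up to the 1000 people surveyed, so the relative frequencies sum to 100%.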

23 Data Representation of Favorite color survey

24 Graphs for Quantitative Variables
Dot Plot: Useful for small sets of data
Stem and Leaf Plot: More information than dot plot
Histograms
Box Plots
More about these tomorrow!

25 Creating a Stem and Leaf plot
Sort the data.
Identify the min and max values to establish what kind of stems and leaves to use.
If leaves become too long, split them.
Create a legend.
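The steps above can be sketched for whole-number data, using the tens as stems and the ones as leaves. The pulse-like values and the function `stem_and_leaf` are made up for illustration, and splitting long leaf rows is left out.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Build the rows of a stem and leaf plot (stem = tens, leaf = ones)."""
    groups = defaultdict(list)
    for v in sorted(values):                 # step 1: sort the data
        groups[v // 10].append(v % 10)
    rows = []
    # Step 2: the min and max stems set the range; empty stems are kept.
    for stem in range(min(groups), max(groups) + 1):
        leaves = " ".join(str(leaf) for leaf in groups.get(stem, []))
        rows.append(f"{stem:>2} | {leaves}".rstrip())
    rows.append("Key: 5 | 4 means 54")       # last step: the legend
    return rows

# Hypothetical data set.
data = [72, 54, 61, 85, 58, 67, 61, 70]
rows = stem_and_leaf(data)
print("\n".join(rows))
```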

26 Back to back stem and leaf plots
Used for comparing two similar sets of data. Stems are in the middle, and the leaves expand to the left for one data set and to the right for the other data set.

27 Histograms
Groups nearby values and displays frequencies
National SAT scores 2007

28 How to construct Histograms
Determine the bin size: divide the range into equal sections (min of 5 bins).
Create a frequency table.
Draw the graph.
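A rough sketch of the frequency-table step, assuming whole-number scores and a bin width of 100. The scores below are invented for the example and are not real SAT data.

```python
def frequency_table(values, bin_width):
    """Count how many values fall in each equal-width bin."""
    first = (min(values) // bin_width) * bin_width   # left edge of the first bin
    table = {}
    low = first
    while low <= max(values):                        # create every bin, even empty ones
        table[(low, low + bin_width - 1)] = 0
        low += bin_width
    for v in values:
        low = (v // bin_width) * bin_width
        table[(low, low + bin_width - 1)] += 1
    return table

# Hypothetical integer scores.
scores = [1510, 1540, 1600, 1625, 1660, 1690, 1710, 1730, 1805, 1850, 1920, 1980]
table = frequency_table(scores, 100)
for (low, high), count in table.items():
    print(f"{low} - {high}: count {count}, relative frequency {count / len(scores):.2f}")
```

With this width the made-up data produces exactly 5 bins, the minimum the slide asks for; a smaller width would give more bins.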

29 Wake County 2008 SAT scores
Sort the data.
Identify the range of the data.
Identify a bin size that makes sense and will produce at least 5 bins.

30 Relative Frequency Table
Score          Count
1500 – 1599
1600 – 1699
1700 – 1799
Use this table to help draw your histogram!

32 Graphs can be MISLEADING!
Number of deaths in Iraq, as published by AOL News in March of 2006

35 Describing Data

36 Describing Data
Shape: Mound, symmetrical, skewed, single peak, multiple peaks
Outlier: Any observation that appears to not belong with the others
Center: The middle of the data
Spread: Min value to max value (including or excluding outliers)

37 Describing Graphs (Shape)
Symmetric: If the right and left sides of the histogram are approximately mirror images
Skewed right: If the right side has outliers
Skewed left: If the left side has outliers
Bi-modal: If there are 2 peaks
Uniform: There are the same number of observations for each value

38 Measures of center
Median: Exact middle of a set of data
Mean: Arithmetic average of all of the observations in a data set

39 Ex: 1, 2, 3, 4, 5, 6, 7, 8, 9
What is the median? What is the mean?
What if 10 is added to the data set? What is the median and mean?
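The answers to this example can be checked with Python's standard `statistics` module. The variable names are just for the example.

```python
from statistics import mean, median

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print("median:", median(data), "mean:", mean(data))

# Add 10 to the data set and recompute both measures.
extended = data + [10]
print("median:", median(extended), "mean:", mean(extended))
```

For the original nine values the median and mean are both 5. With 10 added there is an even number of values, so the median is the average of the two middle values (5 and 6), and both measures become 5.5.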

40 Resistant measures
Def: A measure is resistant if it is not easily influenced by extreme observations.
Is the median a resistant measure? Yes
Is the mean a resistant measure? No
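To see resistance in action, the sketch below adds one extreme observation to the data set 1 through 9 and compares how far each measure moves. The value 100 is a made-up extreme observation chosen for illustration.

```python
from statistics import mean, median

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_extreme = data + [100]   # hypothetical extreme observation

# How much does each measure of center move?
median_shift = median(with_extreme) - median(data)
mean_shift = mean(with_extreme) - mean(data)
print("median moves by", median_shift)
print("mean moves by", mean_shift)
```

The median moves from 5 to 5.5, while the mean jumps from 5 to 14.5, which is why the median counts as resistant and the mean does not.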

41 Measures of spread
Standard deviation: Find this in your calculator under 1-variable stats!
Quartiles: IQR (interquartile range) = Q3 - Q1. These are found in your 5-number summary!
Range = Max - Min

42 5 number summary
Min
Q1: quartile 1, median of the lower half
Median (Q2)
Q3: quartile 3, median of the upper half
Max
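A sketch of the 5 number summary using the "median of each half" definition from this slide. One assumption to flag: when the number of observations is odd, this version leaves the middle value out of both halves. Calculators and software do not all follow the same convention, so quartiles can differ slightly.

```python
from statistics import median

def five_number_summary(values):
    """Return (Min, Q1, Median, Q3, Max) using medians of the two halves."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]          # values below the middle
    upper = data[(n + 1) // 2 :]    # values above the middle
    return min(data), median(lower), median(data), median(upper), max(data)

summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary)
```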

43 Components of a box plot
5 number summary: Min, Q1, Median, Q3, Max
Outliers: Observations below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR)

44 Box plot diagram with labels: Min, Q1, Med, Q3, Max

45 Where’s the data?
25% between Min and Q1, 50% between Q1 and Q3, 25% between Q3 and Max

46 What about outliers?
Lower whisker: Smallest obs. that is not an outlier
Upper whisker: Largest obs. that is not an outlier
Diagram labels: Min, Q1, Med, Q3, Max

47 Standard Deviation and Normal Distributions

48 Standard deviation
Gives a measure of how far the data varies from the mean “on average”.
Is only used if the mean is the chosen measure of center.
Is the standard deviation a resistant measure? No!
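A sketch of what the calculator is doing, assuming the sample standard deviation (dividing by n - 1, usually shown as Sx under 1-variable stats). The data set and the function name `sample_std_dev` are for illustration only.

```python
from math import sqrt
from statistics import mean, stdev

def sample_std_dev(values):
    """Square root of the sum of squared deviations from the mean over n - 1."""
    m = mean(values)
    squared_deviations = [(x - m) ** 2 for x in values]
    return sqrt(sum(squared_deviations) / (len(values) - 1))

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
by_hand = sample_std_dev(data)
built_in = stdev(data)      # the standard library's sample standard deviation
print(round(by_hand, 4), round(built_in, 4))
```

For this data the squared deviations add up to 60, so the result is the square root of 60/8 = 7.5, roughly 2.74.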

49 Beginning pulse in class (n=23)
Min = 50 Q1 = 66 Median = 79 Q3 = 87 Max = 110

50 End pulse in class (n=24)
Min = 54 Q1 = 64.5 Median = 70 Q3 = 77.5 Max = 109

51 Outliers
Interquartile range (IQR) = Q3 – Q1
An observation is an outlier if it lies more than 1.5(IQR) above Q3 or more than 1.5(IQR) below Q1.
End Class Data
Q1 = 64.5, Q3 = 77.5
IQR = 77.5 – 64.5 = 13
1.5(13) = 19.5
Q1 – 1.5(IQR) = 64.5 – 19.5 = 45
Q3 + 1.5(IQR) = 77.5 + 19.5 = 97

52 Outliers
Any observation below 45 or above 97 will be an outlier.
109 is an outlier.
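The same outlier check can be run on both pulse data sets from their 5 number summaries. This is a sketch; the function name `outlier_fences` is made up for the example, and the quartiles, minimums, and maximums are the ones given on the pulse slides.

```python
def outlier_fences(q1, q3):
    """Return the cutoffs Q1 - 1.5(IQR) and Q3 + 1.5(IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# End pulse: Q1 = 64.5, Q3 = 77.5, Min = 54, Max = 109
end_low, end_high = outlier_fences(64.5, 77.5)
print(end_low, end_high, "max is an outlier:", 109 > end_high)

# Beginning pulse: Q1 = 66, Q3 = 87, Min = 50, Max = 110
begin_low, begin_high = outlier_fences(66, 87)
print(begin_low, begin_high, "max is an outlier:", 110 > begin_high)
```

The end pulse cutoffs come out to 45 and 97, matching the slide. For the beginning pulse the cutoffs are 34.5 and 118.5, so neither the min of 50 nor the max of 110 is an outlier.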

53 What is Normal?
A bell-shaped curve
Standard Normal distribution is when… Mean = 0, Standard Deviation = 1

54 68-95-99.7 Rule
The normal curve can give us an idea of how extreme a value is based on how far away from the mean it is.
About 68% of values fall within 1 standard deviation of the mean, 95% within 2, and 99.7% within 3.
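The three percentages can be checked against the standard normal curve with `statistics.NormalDist` from the Python standard library. The helper name `share_within` is an assumption for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)   # Mean = 0, Standard Deviation = 1

def share_within(k):
    """Area under the curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

The exact areas are slightly different from the rounded rule (for example, about 95.4% within 2 standard deviations), which is why the rule is stated as an approximation.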

55 Homework
P. 65 # 12, 13
Make graph (box plot for #12 and histogram for #13)
Describe the shape
Find any outliers
Find mean and median
Find range, standard deviation, and IQR

