Presentation is loading. Please wait.

Presentation is loading. Please wait.

Lecture 3 Dustin Lueker. 2  Suppose the population can be divided into separate, non-overlapping groups (“strata”) according to some criterion ◦ Select.

Similar presentations


Presentation on theme: "Lecture 3 Dustin Lueker. 2  Suppose the population can be divided into separate, non-overlapping groups (“strata”) according to some criterion ◦ Select."— Presentation transcript:

1 Lecture 3 Dustin Lueker

2 2  Suppose the population can be divided into separate, non-overlapping groups (“strata”) according to some criterion ◦ Select a simple random sample independently from each group  Usefulness ◦ We may want to draw inference about population parameters for each subgroup ◦ Sometimes, (“proportional stratified sample”) estimators from stratified random samples are more precise than those from simple random samples STA 291 Summer 2008 Lecture 32

3 3  The proportions of the different strata are the same in the sample as in the population  Mathematically ◦ Population size N ◦ Subpopulation N i ◦ Sample size n ◦ Subpopulation n i STA 291 Summer 2008 Lecture 33

4 4  Summarize data ◦ Condense the information from the dataset  Graphs  Table  Numbers  Interval data ◦ Histogram  Nominal/Ordinal data ◦ Bar chart ◦ Pie chart

5  Difficult to see the “big picture” from these numbers ◦ We want to try to condense the data STA 291 Summer 2008 Lecture 35 Alabama 11.6Alaska 9.0 Arizona 8.6Arkansas 10.2 California 13.1Colorado 5.8 Connecticut 6.3Delaware 5.0 D C 78.5Florida 8.9 Georgia 11.4Hawaii 3.8 ……

6 STA 291 Summer 2008 Lecture 36  A listing of intervals of possible values for a variable  Together with a tabulation of the number of observations in each interval.

7 STA 291 Summer 2008 Lecture 37 Murder RateFrequency 0-2.95 3-5.916 6-8.912 9-11.912 12-14.94 15-17.90 18-20.91 >211 Total51

8  Conditions for intervals ◦ Equal length ◦ Mutually exclusive  Any observation can only fall into one interval ◦ Collectively exhaustive  All observations fall into an interval  Rule of thumb: ◦ If you have n observations then the number of intervals should approximately STA 291 Summer 2008 Lecture 38

9 9  Relative frequency for an interval ◦ Proportion of sample observations that fall in that interval  Sometimes percentages are preferred to relative frequencies

10 STA 291 Summer 2008 Lecture 310 Murder RateFrequencyRelative Frequency Percentage 0-2.95.1010 3-5.916.3131 6-8.912.2424 9-11.912.2424 12-14.94.088 15-17.9000 18-20.91.022 >211.022 Total511100

11 STA 291 Summer 2008 Lecture 311  Notice that we had to group the observations into intervals because the variable is measured on a continuous scale ◦ For discrete data, grouping may not be necessary  Except when there are many categories  Intervals are sometimes called classes ◦ Class Cumulative Frequency  Number of observations that fall in the class and in smaller classes ◦ Class Relative Cumulative Frequency  Proportion of observations that fall in the class and in smaller classes

12 STA 291 Summer 2008 Lecture 312 Murder RateFrequencyRelative Frequency Cumulative Frequency Relative Cumulative Frequency 0-2.95.105 3-5.916.3121.41 6-8.912.2433.65 9-11.912.2445.89 12-14.94.0849.97 15-17.90049.97 18-20.91.0250.99 >211.02511 Total511 1

13 STA 291 Summer 2008 Lecture 313  Use the numbers from the frequency distribution to create a graph ◦ Draw a bar over each interval, the height of the bar represents the relative frequency for that interval ◦ Bars should be touching  Equally extend the width of the bar at the upper and lower limits so that the bars are touching.

14 STA 291 Summer 2008 Lecture 314

15 STA 291 Summer 2008 Lecture 315

16 STA 291 Summer 2008 Lecture 316  Histogram: for interval (quantitative) data  Bar graph is almost the same, but for qualitative data  Difference: ◦ The bars are usually separated to emphasize that the variable is categorical rather than quantitative ◦ For nominal variables (no natural ordering), order the bars by frequency, except possibly for a category “other” that is always last

17  First Step ◦ Create a frequency distribution STA 291 Summer 2008 Lecture 317 Highest Degree ObtainedFrequency (Number of Employees) Grade School15 High School200 Bachelor’s185 Master’s55 Doctorate70 Other25 Total550

18  Bar graph ◦ If the data is ordinal, classes are presented in the natural ordering STA 291 Summer 2008 Lecture 318

19  Pie is divided into slices ◦ Area of each slice is proportional to the frequency of each class STA 291 Summer 2008 Lecture 319

20 STA 291 Summer 2008 Lecture 320

21 21  Write the observations ordered from smallest to largest ◦ Looks like a histogram sideways ◦ Contains more information than a histogram, because every single observation can be recovered  Each observation represented by a stem and leaf  Stem = leading digit(s)  Leaf = final digit STA 291 Summer 2008 Lecture 321

22 22STA 291 Summer 2008 Lecture 322 Stem Leaf # 20 3 1 19 18 17 16 15 14 13 135 3 12 7 1 11 334469 6 10 2234 4 9 08 2 8 03469 5 7 5 1 6 034689 6 5 0238 4 4 46 2 3 0144468999 10 2 039 3 1 67 2 ----+----+----+----+

23 23  Useful for small data sets ◦ Less than 100 observations  Practical problem ◦ What if the variable is measured on a continuous scale, with measurements like 1267.298, 1987.208, 2098.089, 1199.082 etc. ◦ Use common sense when choosing “stem” and “leaf”  Can also be used to compare groups ◦ Back-to-Back Stem and Leaf Plots, using the same stems for both groups.  Murder Rate Data from U.S. and Canada  Note: it doesn’t really matter whether the smallest stem is at top or bottom of the table STA 291 Summer 2008 Lecture 323

24 24STA 291 Summer 2008 Lecture 324 PRESIDENTAGEPRESIDENTAGEPRESIDENTAGE Washington67Fillmore74Roosevelt60 Adams90Pierce64Taft72 Jefferson83Buchanan77Wilson67 Madison85Lincoln56Harding57 Monroe73Johnson66Coolidge60 Adams80Grant63Hoover90 Jackson78Hayes70Roosevelt63 Van Buren79Garfield49Truman88 Harrison68Arthur56Eisenhower78 Tyler71Cleveland71Kennedy46 Polk53Harrison67Johnson64 Taylor65McKinley58Nixon81 Reagan 93 Ford 93 StemLeaf

25 25  Discrete data ◦ Frequency distribution  Continuous data ◦ Grouped frequency distribution  Small data sets ◦ Stem and leaf plot  Interval data ◦ Histogram  Categorical data ◦ Bar chart ◦ Pie chart  Grouping intervals should be of same length, but may be dictated more by subject-matter considerations STA 291 Summer 2008 Lecture 325

26 26  Present large data sets concisely and coherently  Can replace a thousand words and still be clearly understood and comprehended  Encourage the viewer to compare two or more variables  Do not replace substance by form  Do not distort what the data reveal STA 291 Summer 2008 Lecture 326

27 27  Don’t have a scale on the axis  Have a misleading caption  Distort by using absolute values where relative/proportional values are more appropriate  Distort by stretching/shrinking the vertical or horizontal axis  Use bar charts with bars of unequal width STA 291 Summer 2008 Lecture 327

28 28  Frequency distributions and histograms exist for the population as well as for the sample  Population distribution vs. sample distribution  As the sample size increases, the sample distribution looks more and more like the population distribution STA 291 Summer 2008 Lecture 328

29 29  The population distribution for a continuous variable is usually represented by a smooth curve ◦ Like a histogram that gets finer and finer  Similar to the idea of using smaller and smaller rectangles to calculate the area under a curve when learning how to integrate  Symmetric distributions ◦ Bell-shaped ◦ U-shaped ◦ Uniform  Not symmetric distributions: ◦ Left-skewed ◦ Right-skewed ◦ Skewed STA 291 Summer 2008 Lecture 329

30 Symmetric Right-skewed Left-skewed STA 291 Summer 2008 Lecture 330


Download ppt "Lecture 3 Dustin Lueker. 2  Suppose the population can be divided into separate, non-overlapping groups (“strata”) according to some criterion ◦ Select."

Similar presentations


Ads by Google