
1 Copyright (c) Bani Mallick1 Lecture 1 STAT 651

2 Topics in Lecture #1: welcome and basic mechanics of the course; samples and populations; relative frequency histograms; the sample mean.

3 Book Sections Covered in Lecture #1: Chapter 1; Chapter 3.3, pages 46-53.

4 The Web Site: Go to http://stat.tamu.edu/~bmallick/651/651.html. Please make sure to check the web site regularly for notes from me and the TA. I apologize in advance for any typos.

5 Emails and all that: The TA will answer detailed questions about homework. Check the web site for the TA name, office, email address and office hours. My email (bmallick@stat.tamu.edu) should only be used as a last resort, or to set appointments (Spring 2004 only).

6 Office Hours (Spring 2003 only): My office hours are as follows. Tuesdays: 11:00-12:30, 4:00-5:00. Thursdays: 11:00-12:30. The TA will also have office hours. I am not available outside the office hours.

7 Printing The Lectures: The lectures are set up as PowerPoint files. You can download them from the STAT651 web site. You can print them 2 or 3 per page. Go to “file”, then “print”. A little box will open, and in the bottom left you will see “print what”. It should simply say “slides”, but you can click to open the available “handout” options.

8 Other Web Material: all homework assignments; all data sets.

9 Who Am I? You can check out my personal web site: http://stat.tamu.edu/~bmallick

10 Course Mechanics: Exams (3). You are encouraged to prepare “cheat sheets”, 3 pages for each exam. No formulae memorization: this is an applied statistics class.

11 Course Mechanics: You may bring the book to exams, but the cheat sheets will be more useful. I will expect you to be able to interpret computer output, both mechanically and conceptually. The exams are multiple choice. Exam scores are curved.

12 Course Mechanics: We will use SPSS. SPSS is available throughout campus. Once you learn SPSS, other packages such as SAS will be easy. The TA can give you help with SPSS.

13 SPSS: You are entitled to get SPSS at no additional cost. Go to http://cis.tamu.edu/customer-sales/sell/student.php (ignore any statement about cost). Go to http://stat.tamu.edu/~mspeed/spss for help.

14 Course Mechanics: Course rules (dates of exams, percent they count, policy on late homework, policy on missed exams) are available at the class web site. Please print them out and please read them.

15 NHANES: National Health and Nutrition Examination Survey. This is the major survey whereby the federal government monitors nutrition and health in the U.S. I will focus on women aged 30-50. First, some important definitions.

16 NHANES. Population: the entire collection of individuals of interest. In NHANES, the population is all women in the U.S. aged 30-50. Since there are millions of such women, it is impractical to figure out the health and nutrition for all of them: it would cost billions of dollars to do so.

17 NHANES. Sample: a subset of the population that is measured in lieu of measuring everyone in the population. Since we want the sample to represent the population, the goal is to make sure we sample a representative subset of the population. In NHANES, women were sampled at random from the population, the randomness meant to ensure that the sample is representative.

18 Samples and Populations. Warning: I will make a big deal about the difference between samples and populations. You will be asked multiple questions on every exam about this distinction. They will be phrased in various ways. This is the conceptually hardest part of this course. The sample is not the population: learn this!

19 Variables. What we measure: variables are things that we measure in a sample and population. They can be numerical: your height. They can be binary: your gender. They can be categorical: preference in soft drinks (Pepsi, Coke, Dr. Pepper, None, Other).

20 Random Variation. Different samples lead to different outcomes: this is a hard conceptual point. First we will do an experiment, then discuss the implications.

21 Random Variation. Different samples lead to different outcomes: consider heights of males in this class. Sample #1: males whose SSNs end in 1, 2, 3 or 4. Sample #2: males whose SSNs end in 6, 7, 8 or 9. Note how the numbers will not be identical.

22 Random Variation. Different samples lead to different outcomes: samples do not equal populations. One of the main goals of statistics: ascertain how far a sample result is from the population result. For example, how far is the sample mean height of 10 males from the population mean height? This will require probability statements.
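The point that different samples give different results can be sketched with a small simulation. This is only an illustration, not the in-class experiment: the population of 10,000 heights, its mean of 70 inches, its standard deviation of 3 inches, and the seed are all assumptions made up for the example.

```python
import random
import statistics

# Hypothetical population: heights (in inches) of 10,000 males.
# The mean of 70 and standard deviation of 3 are assumed for illustration.
rng = random.Random(651)
population = [rng.gauss(70, 3) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Draw two different random samples of 10 males each.
sample_1 = rng.sample(population, 10)
sample_2 = rng.sample(population, 10)

mean_1 = statistics.mean(sample_1)
mean_2 = statistics.mean(sample_2)

# Each sample mean differs from the population mean, and from the other.
print(f"Population mean: {population_mean:.2f}")
print(f"Sample #1 mean:  {mean_1:.2f} (off by {mean_1 - population_mean:+.2f})")
print(f"Sample #2 mean:  {mean_2:.2f} (off by {mean_2 - population_mean:+.2f})")
```

In a real study the population mean is unknown, which is why statements about how far a sample mean is likely to be from it have to be probability statements.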

23 A Warning! Fancy statistical methods cannot rescue garbage data. Fancy statistical methods can help you gain insight into your data, over and above what seems obvious on its face. You should always worry about whether the sampled results are representative of the population, and whether your sample allows you to make inferences about the population.

24 Histograms: A graphical means of looking at a sample from a population. Can be used to compare two populations. Allows you to judge central tendency, variation, and other odd features of the data. A very useful graphical tool.

25 Relative Frequency Histograms: Simplest graphical technique to describe a sample. Divide the range of the variable into intervals of nearly equal length. Plot the % of the data that falls in each interval. Computers have various ways of choosing the intervals. You’ll not do these by hand, ever, with me.

26 Relative Frequency Histograms. Numerical example: ages 26, 29, 30, 34, 37, 38, 39, 41, 43, 45. Intervals (selected arbitrarily by me): 26-30, 31-35, 36-40, 41-45, 46-50. Count in each interval: 3, 1, 3, 3, 0. Percent in each interval (relative frequencies): 30, 10, 30, 30, 0.
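The counting in this numerical example can be checked with a few lines of Python. This is a minimal sketch using the ages and intervals from the slide; the function name `relative_frequencies` is made up for the illustration, and the course itself uses SPSS rather than code.

```python
ages = [26, 29, 30, 34, 37, 38, 39, 41, 43, 45]
intervals = [(26, 30), (31, 35), (36, 40), (41, 45), (46, 50)]


def relative_frequencies(data, intervals):
    """Return the percent of data falling in each closed interval [low, high]."""
    n = len(data)
    percents = []
    for low, high in intervals:
        count = sum(1 for x in data if low <= x <= high)
        percents.append(100 * count / n)
    return percents


percents = relative_frequencies(ages, intervals)
for (low, high), pct in zip(intervals, percents):
    print(f"{low}-{high}: {pct:.0f}%")
```

The percentages always add up to 100 as long as every observation falls in exactly one interval.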

27 Numerical Example. Intervals: 26-30, 31-35, 36-40, 41-45, 46-50. Percent in interval: 30, 10, 30, 30, 0.

28 NHANES: Two subpopulations (yes, populations can have subpopulations). Subpopulation #1: all women in the U.S. aged 30-50 and healthy in 1980 who developed breast cancer by 1995. Subpopulation #2: all women in the U.S. aged 30-50 and healthy in 1980 who did not develop breast cancer by 1995.

29 NHANES: Two samples. Sample #1: 59 women in the U.S. aged 30-50 and healthy in 1980 who developed breast cancer by 1995. Sample #2: 60 women in the U.S. aged 30-50 and healthy in 1980 who did not develop breast cancer by 1995.

30 NHANES: One variable measured on each (sub)population. Saturated fat intake in diet: this was measured by a 24-hour recall. Each woman was asked once what she had eaten the previous day, and saturated fat was computed from that. This is a terrible measure of saturated fat intake (garbage data?), but it is all that is available. I would have done multiple days, at least.

31 NHANES: What do we Expect? Saturated fat intake in diet: one would expect that the women who developed breast cancer tended to have higher levels of saturated fat in their diet. What do the relative frequency histograms say?

32 NHANES Saturated Fat: Relative frequency histograms (the scales are the same). What do you see? [Figure: two histograms, panels "Cancer" and "Healthy"; horizontal axis ticks at 25, 50, 75, 100; vertical axis "Percent", 0% to 30%.]

33 NHANES log(Saturated Fat): Relative frequency histograms (the scales are the same). What do you see? [Figure: two histograms, panels "Cancer" and "Healthy"; horizontal axis "Log(Saturated Fat)", ticks at 2.00, 3.00, 4.0; vertical axis "Percent", 0% to 15%.]

34 Construction in SPSS: I will now show you a few things about SPSS.

35 Construction in SPSS: Select "Graphs" in the SPSS menu. Select "Interactive". Select "Histogram". Select percent instead of count for a relative frequency histogram. Place the variable of interest on the X-axis.

36 Construction in SPSS: Select the variable defining the populations and put it in “Panel Variables”. The histograms will be side-by-side. I like them one on top of the other. Double click on the graph (you may need to do this twice). A menu will pop up; go to “Arrangement”.

37 Construction in SPSS: Select “Down then Across”. Then take it over to PowerPoint (copy and paste). Click on the histogram in your PowerPoint presentation, and convert it to a Microsoft picture. Change sizes, and edit as you wish.

38 What Histograms Say: Because each box is a relative frequency (percentage), you can use a histogram to learn a few things about the population. You can also use them to compare two populations: whether one population has generally larger values, and whether one population is more closely clumped.
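Comparing two samples this way only works when both histograms use the same intervals. The sketch below illustrates that with two small samples that are invented for the example (they are not NHANES data); the helper name `binned_percents` and the bin edges are also assumptions.

```python
def binned_percents(data, edges):
    """Percent of data in each bin [edges[i], edges[i+1])."""
    n = len(data)
    return [
        100 * sum(1 for x in data if lo <= x < hi) / n
        for lo, hi in zip(edges, edges[1:])
    ]


# Hypothetical samples, invented for this sketch.
sample_a = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35]
sample_b = [32, 35, 38, 40, 42, 45, 47, 50, 53, 55]  # sample_a shifted up by 20

# Use the same bin edges for both samples so the histograms share a scale.
edges = [0, 20, 40, 60]
freq_a = binned_percents(sample_a, edges)
freq_b = binned_percents(sample_b, edges)

# Crude text histograms, one on top of the other: one '#' per 10%.
for label, freqs in (("A", freq_a), ("B", freq_b)):
    print(f"Sample from population {label}")
    for lo, hi, pct in zip(edges, edges[1:], freqs):
        print(f"  {lo:>2}-{hi:<2} {'#' * round(pct / 10)} {pct:.0f}%")
```

With shared bins, the shift shows up directly: the bars for sample B sit one interval higher than those for sample A.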

39 What percentage of the healthy women ate less than 25 grams of saturated fat? [Figure: two histograms, panels "Cancer" and "Healthy"; horizontal axis ticks at 25, 50, 75, 100; vertical axis "Percent", 0% to 30%.]

40 What percentage of the healthy women ate less than 25 grams of saturated fat? Look at the 3 bars, of about 18%, 20% and 28%, for a total of about 66%. [Figure: two histograms, panels "Cancer" and "Healthy"; horizontal axis ticks at 25, 50, 75, 100; vertical axis "Percent", 0% to 30%.]

41 Histograms and Shifts: note how the bottom plot has higher values. [Figure: two histograms, panels "Sample from population A" and "Sample from population B".]

42 Histograms and Variability: note how the top plot has more concentrated values. [Figure: two histograms, panels "Sample from population A" and "Sample from population B".]

43 The Population Mean: In many problems, the goal is to make inference about the population mean of a numerical variable, e.g., saturated fat intake. Define in words what you mean by the population mean!

44 The Population Mean: In many problems, the goal is to make inference about the population mean of a numerical variable, e.g., saturated fat intake. You’re right! The population mean is the average of all the outcomes in the population. It cannot be measured, hence we take samples. BTW, what’s an average?

45 The Sample Mean. Formal definition: if the sample is of size $n$ and the data are $X_1, \ldots, X_n$, then the sample mean is $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$. This is the sum over all the observed values, divided by the number of observations.

46 Sample Mean: Example. $\bar{X}$ = the sum over all the observed values, divided by the number of observations. Data: -4, -2, -2, -1, 0, 0, 0, 0, 2, 2, 3, 5. n = ___, sum = ___, $\bar{X}$ = ___

47 Sample Mean: Example. $\bar{X}$ = the sum over all the observed values, divided by the number of observations. Data: -4, -2, -2, -1, 0, 0, 0, 0, 2, 2, 3, 5. n = 12, sum = ___, $\bar{X}$ = ___

48 Sample Mean: Example. $\bar{X}$ = the sum over all the observed values, divided by the number of observations. Data: -4, -2, -2, -1, 0, 0, 0, 0, 2, 2, 3, 5. n = 12, sum = 3, $\bar{X}$ = ___

49 Sample Mean: Example. $\bar{X}$ = the sum over all the observed values, divided by the number of observations. Data: -4, -2, -2, -1, 0, 0, 0, 0, 2, 2, 3, 5. n = 12, sum = 3, $\bar{X}$ = 3/12 = 0.25
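The same arithmetic can be reproduced in a few lines of Python, as a quick check on the worked example. The variable names are chosen for this sketch only.

```python
# The twelve observations from the slide.
data = [-4, -2, -2, -1, 0, 0, 0, 0, 2, 2, 3, 5]

n = len(data)            # number of observations
total = sum(data)        # sum over all the observed values
sample_mean = total / n  # sum divided by the number of observations

print(f"n = {n}, sum = {total}, sample mean = {sample_mean}")
# prints: n = 12, sum = 3, sample mean = 0.25
```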

