0AP03: Methods and models in behavioral research Part 2: Understanding statistics using SPSS (Field) Chris Snijders

1 0AP03: Methods and models in behavioral research Part 2: Understanding statistics using SPSS (Field) Chris Snijders c.c.p.snijders@gmail.com www.tue-tm.org/moodle

2 2 EXAMPLE: NETFLIX DVD RENTAL

3 3 Example: The Netflix Prize www.netflixprize.com $1,000,000

4 4 Example: the Netflix prize (3)
input = kind of previous rentals
input = number of previous rentals
input = day of the week
input = ...
output = extent to which you like a movie

5 5 Example: The Netflix Prize (2)
Predict the extent to which a person will like a movie, from previous ratings by others.
NB
–Measurement: Root Mean Square Error (RMSE)
–Large prizes!
–You have about 2 GB of data to work on...
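A minimal sketch of the Root Mean Square Error named on this slide. The ratings are made-up illustration values (not Netflix data), and `rmse` is a hypothetical helper name.

```python
import math

def rmse(predicted, actual):
    """Root Mean Square Error between two equal-length sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical ratings on a 1-5 scale (illustration only).
predicted = [3.5, 4.0, 2.0, 5.0]
actual = [4, 4, 1, 3]
print(round(rmse(predicted, actual), 3))
```

A lower RMSE means the predicted ratings are closer to the actual ones. Large errors weigh more heavily because they are squared before averaging.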

6 6 0AP03: two parts
Blumberg et al.   Gerrit Rooks     Blocks A and B
Field             Chris Snijders   Blocks B and C
{LANGUAGE=ENGLISH}

7 7 Understanding statistics using SPSS
http://www.sagepub.co.uk/field/field.htm
-CD-ROM material
--data sets
--some software (G*Power)
-answers to (some) assignments in the book
-test banks (note: not identical to exam)

8 8 www.tue-tm.org/moodle enrolment key = "fieldspss"

9 9 http://www.tue-tm.org/moodle Course home-page

10 10 Let’s get acquainted …
Technische InnovatieWetenschappen
–Bachelor’s
–Pre-Master program
Technische Bedrijfskunde
–Bachelor’s
–Pre-Master program
= = =
Some key concepts:
-Stochastic variables, distributions, normal distribution
-SPSS usage (StatGraphics users?)
-Mean, median, skewness, kurtosis
-Correlation
-Simple regression: Y = a + b X
-Factor analysis
-A chi2 test
A: never heard of it
B: was a topic in previous lectures, but don’t ask me what it is or how to do it
C: was covered and understood

11 11 Understanding statistics using SPSS
About: Style
About: Content
1 About statistics
2 SPSS
3 Exploring data
4 Correlation
5 Multiple regression
6 Logistic Regression
7 The t-test
8 ANOVA
9-12, 14 More ANOVAs
13 Non-param. tests
15 Factor analysis
16 Chi2-tests etc

12 12 T-test, chi2-test
We have two groups of students: one group that started early and worked regularly, and one group that started late (in the last three lectures or later).
Are the grades of the students in the regular group higher? (t-test)
          Average   Max
REGULAR   6.3       8.4
LATE      3.0       3.9
Are the regular students more likely to pass the course? (chi2 test)
          Pass   No
REGULAR   30     10
LATE 238
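The pass/no-pass question can be sketched as a Pearson chi-square test on a 2x2 table. This is a minimal illustration only: the counts below are hypothetical (not the slide's numbers), `chi2_2x2` is a made-up helper name, and no continuity correction is applied.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]]; returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts (NOT the slide's numbers): pass / no pass per group.
stat, p = chi2_2x2(30, 10, 20, 20)
print(f"chi2 = {stat:.2f}, p = {p:.3f}")
```

A small p-value suggests that the pass rate differs between the two groups. The t-test on grades would need the individual grades, which the slide does not list.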

13 13 Exam for the Field-part [tentative: check the course website later]
Chapters 1, 2, 3, 4, 7: assumed to be common knowledge
Chapters 5, 8, 15 + additional material supplied with the course (such as the PS software)
= = =
Exam on laptop:
1 – multiple choice questions
2 – you are given data and must be able to handle the data sensibly

14 14 The average (quantitative) paper …
Problem formulation
–What are sensible questions?
Theory-development and hypotheses
–“What do I expect to be the answer to my question, and what are the implications from the theory that I want to test” (NB: different in exploratory work)
Choice of research design
–Experiment
–Survey
–Case study
–Participant observation
–…
Data collection
–Designing questionnaires
–Designing experimental procedures
–Finding your respondents. Sampling (how and how many?)
–…
Analysis of results
–Measurement: from raw data to measured constructs
–Relational claims: X → Y?
Conclusions
–What can we conclude, given our analyses?

15 15 About the course setup
Mainly on the moodle-site; studyweb is only used to send mail to you
“Do-it-yourself course”: mastering SPSS, getting up to speed with SPSS, keeping up with the material is up to you
–Extra material and links on the website
–Practice material for the exam
If you do not practice in between, you will not be able to pass the exam.
Part 1-Rooks: “Think, then do”
Part 2-me: “Do, then think”. We have data, now what do we do? (and partly we collect these data from you)
Hybrid setup:
–English/Dutch
–business administration / social sciences

16 16 THE ART OF SAMPLING

17 17 Sampling
We want conclusions about the population, but we only have (enough time and money to collect) data from part of the population, a sample.
From sample data to population statement: STATISTICAL INFERENCE
[Diagram labels: sample, population]

18 18 Two parts to every analysis
Calculate some property of the sample
–Mean (mean height of soccer players)
–Difference between the means of two groups (difference in height between groups of soccer players)
–Correlation between two things measured (correlation between height and number of goals you score)
Calculate a confidence interval around the property, creating a statement about the property in the population
[Diagram labels: sample, population]
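The two parts can be sketched for the simplest case, a mean with an approximate 95% confidence interval. The heights below are hypothetical, `mean_with_ci` is a made-up helper name, and the normal approximation with z = 1.96 is an assumption (for small samples a t-based critical value would be somewhat larger).

```python
import math
import statistics

def mean_with_ci(sample, z=1.96):
    """Return (mean, lower, upper) using the normal approximation
    mean +/- z * s / sqrt(n); z = 1.96 gives roughly a 95% interval."""
    n = len(sample)
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return mean, mean - z * standard_error, mean + z * standard_error

# Hypothetical player heights in cm (illustration only).
heights = [178, 182, 185, 176, 190, 181, 179, 184]
mean, low, high = mean_with_ci(heights)
print(f"mean = {mean:.1f} cm, 95% CI = [{low:.1f}, {high:.1f}]")
```

The first part is the sample mean. The second part is the interval, which is the statement about where the population mean plausibly lies.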

19 19 On sampling "analog cheese"
Analog cheese = palm oil + starch
"Keuringsdienst van waarde" took a sample of 11 products and found 5 to contain "analog cheese" → Estimate of the percentage of products containing analog cheese = 5/11 = 45%
What is the (approximate) confidence interval?
A 40 – 50 %
B 32 – 58 %
C 25 – 65 %
D 17 – 77 %
E 9 – 81 %
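As a rough check on the question above, the interval can be sketched with the normal-approximation (Wald) formula for a proportion. This is an assumption about the intended method; with only 11 products the approximation is crude, and `proportion_ci` is a hypothetical helper name.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Normal-approximation (Wald) interval for a sample proportion."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

p, low, high = proportion_ci(5, 11)
print(f"estimate = {p:.0%}, approx. 95% CI = [{low:.0%}, {high:.0%}]")
# The cruder 1/sqrt(n) rule gives a similar half-width:
print(f"1/sqrt(11) = {1 / math.sqrt(11):.2f}")
```

Both calculations give a half-width of roughly 30 percentage points, which shows how little a sample of 11 pins down the true percentage.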

20 20 Applying the 1/sqrt(n) rule
You want to predict how many seats in congress a certain Dutch political party will get. You allow for a range of plus or minus 2 seats. Say you expect the number of seats to be around 50. You intend to call a representative sample of people. About how many do you need?
A 50
B 100
C 500
D 5,000
E 50,000
F more than 50,000
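A minimal sketch of the 1/sqrt(n) rule turned around to give a sample size. It assumes that the margin of error is approximately 1/sqrt(n) and that there are 150 seats in total (the size of the Dutch House of Representatives, which the slide does not state). `sample_size_for_margin` is a hypothetical helper name.

```python
def sample_size_for_margin(margin):
    """Approximate n such that the 95% margin of error is `margin`,
    using the rule of thumb margin ~ 1 / sqrt(n)."""
    if margin <= 0:
        raise ValueError("margin must be positive")
    return 1.0 / margin ** 2

# Assumption (not stated on the slide): 150 seats in total, so plus or
# minus 2 seats corresponds to a margin of 2/150 of the vote.
margin = 2 / 150
print(round(sample_size_for_margin(margin)))
```

Because n grows with the inverse square of the margin, halving the allowed margin quadruples the number of people to call.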

21 21 Some more sampling
Suppose you want to know, say, the percentage of people in The Netherlands who support the recent foreign policy of the US government. The Netherlands has 12,000,000 voters. According to your (correct) calculations you need a sample of 2,000 people. Now you want to do the same, but in France (population = 36,000,000 voters). How large should your sample size be in France?
A less than 2,000
B about 2,000
C about 6,000
D more than 6,000
E you need more information
Rule of thumb: For large populations, the required sample size is independent of the population size
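The rule of thumb can be illustrated with the standard finite population correction, n = n0 / (1 + (n0 - 1) / N). This is a sketch under the assumption that 2,000 is the requirement for an effectively infinite population; `adjusted_sample_size` is a hypothetical helper name.

```python
def adjusted_sample_size(n_infinite, population):
    """Apply the finite population correction to a sample size that was
    computed for an (effectively) infinite population."""
    return n_infinite / (1 + (n_infinite - 1) / population)

# Population sizes from the slide; n_infinite = 2000 is taken as the
# infinite-population requirement for illustration.
for country, voters in [("Netherlands", 12_000_000), ("France", 36_000_000)]:
    print(country, round(adjusted_sample_size(2000, voters)))
```

With populations in the millions the correction changes the required sample by less than one person, so tripling the population leaves the sample size essentially unchanged.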

22 22 Explanation: Mean and variance of the mean
We measure x and get measurements x_1, …, x_n
Expectation and variance give the 95% confidence interval:
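The confidence interval referred to here follows from standard results. As a sketch, assuming independent measurements with common mean \mu and variance \sigma^2 (symbols introduced here for illustration):

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
E(\bar{x}) = \mu, \qquad
\mathrm{Var}(\bar{x}) = \frac{\sigma^2}{n}
\quad\Longrightarrow\quad
\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}}
```

For a proportion, \sigma is at most 1/2, so the half-width of the interval is at most about 1/\sqrt{n}, which is the 1/sqrt(n) rule applied on slide 20.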

23 23 Sample size determined by: Are white soccer players smaller?
–How precisely do you want to measure your statistic? [what is the height difference you would find interesting enough to report about]
–What is the probability of Type I error that you will allow? (rejecting the H0-hypothesis when in fact it is true) Usually 5% [How small do you want the probability to be that you reject “(on average) black and non-black players are equally tall” when in fact it is true?]
–How likely do you want it to be that you will find an effect, assuming that it exists in the population? Power, usually 80% or 90%.
–One-sided or two-sided tests?
You need special purpose software for this, for instance G*Power (on the disc), or PS
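How these ingredients combine can be sketched with the textbook normal approximation for comparing two means, n per group = 2 (z_alpha + z_power)^2 / d^2. This is a swapped-in approximation, not what G*Power or PS compute internally (they use exact t-based calculations and typically give slightly larger numbers). `n_per_group` is a hypothetical helper name and the effect size of 0.5 standard deviations is an assumed illustration value.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80, two_sided=True):
    """Approximate sample size per group for comparing two means.

    effect_size is the difference in means divided by the common standard
    deviation (Cohen's d). Uses the normal approximation
    n = 2 * (z_alpha + z_power)^2 / d^2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)

print(n_per_group(0.5))                    # two-sided, 5% alpha, 80% power
print(n_per_group(0.5, power=0.90))        # higher power needs more players
print(n_per_group(0.5, two_sided=False))   # one-sided test needs fewer
```

Each of the four choices on the slide maps to one argument: the precision to `effect_size`, the Type I error to `alpha`, the power to `power`, and the sidedness to `two_sided`.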

24 24 X → Y

25 25 All the same, but different
Problem formulation
–What are sensible questions?
Theory-development and hypotheses
–“What do I expect to be the answer to my question, and what are the implications from the theory that I want to test” (NB: different in exploratory work)
Choice of research design
–Experiment
–Survey
–Case study
–Participant observation
–…
Data collection
–Designing questionnaires
–Designing experimental procedures
–Finding your respondents. Sampling (how and how many?)
–…
Analysis of results
–Measurement: from raw data to measured constructs
–Relational claims: X → Y?
Conclusions
–What can we conclude?
[Slide annotations: X1 → Y1, X2 → Y2, …; X1 → Y1, X2 → Y2: HOW?; X1 → Y1, X2 → Y2: AND?]

26 26 About $80 / hour

27 27 It is all about X → Y
“white soccer player” → “height”
“being a woman” → “sensitive to alcohol”
“being bald” → “prob. of a heart-attack”
“left handed” → “die early”
“listen to Mozart” → “score higher on IQ-test”
Y = dependent variable, response variable, target variable, Y-variable, explanandum
X = independent variable, X-variable, predictor variable, explanans
Usually we want to say something like “X causes Y”, but often we have to settle for “X is related to Y”.

28 28 Survey vs experiment (Milgram)
Y = which voltage do you apply?
measured X's:
–subject is male
–subject is young
manipulated X's:
–experimenter wears white coat
–experimenter is older (vs young)
Experiment: researcher determines X
Survey: researcher measures X

29 29 X → Y

30 30 Kinds of variables (in case you forgot)
Categorical / Nominal: two or more categories, without intrinsic ordering (ex.: “kind of movie”: action/drama/...). When there are only two categories, also called a binary variable (ex.: gender, “age over 40”, etc.)
Ordinal: two or more categories, with intrinsic ordering (ex.: 5-point ratings such as never/sometimes/often/always, …)
Interval: ordinal + intervals between values are evenly spaced (age, income, number of movies rented).
NB: Not always easy to classify. Categorical and interval are the most important (ordinal variables are often treated as either categorical or interval).

31 31 Statistics at UCLA {http://www.ats.ucla.edu/stat/mult_pkg/whatstat/default.htm}
[Table with axes Y and X]

32 32 Dealing with data
1. Import SPSS file
2. Check your data
   –To get acquainted with it
   –For outliers and coding errors
3. Determine the kind of analysis
4. Recode your data so that you have the variables in the appropriate format
5. Check the assumptions for the analysis of choice (1)
6. Run your analysis
7. Check the assumptions for the analysis of choice (2)
8. If necessary, back to 3.
until CONCLUSION

33 33 Fact and fiction Are white soccer players smaller?

34 34 Example data: soccer players
File: soccer_0AP03.sav. All players from WC2002.
Let’s see what the data looks like:
–Variable view vs Data view
–Run a “Frequencies”
–Check histograms
–Create new variables (Transform > Compute)
–Recode variables (Transform > Recode)
–Run analyses
USE SYNTAX FILES (*.SPS)!
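The same sequence of steps (check, compute, recode, frequencies) can be sketched outside SPSS. This is an illustration only: the records, variable names, and cut-off values below are hypothetical, since the actual contents of soccer_0AP03.sav are not shown here.

```python
from collections import Counter

# Hypothetical records; the real variables in soccer_0AP03.sav are not
# listed here, so these names and values are assumptions.
players = [
    {"name": "A", "height_cm": 178, "goals": 2},
    {"name": "B", "height_cm": 191, "goals": 0},
    {"name": "C", "height_cm": 18,  "goals": 1},   # likely a coding error
    {"name": "D", "height_cm": 185, "goals": 3},
]

# "Check your data": flag implausible heights before analysing.
suspect = [p["name"] for p in players if not 150 <= p["height_cm"] <= 220]

# "Compute": derive a new variable from an existing one.
for p in players:
    p["height_m"] = p["height_cm"] / 100

# "Recode": turn the interval variable into a binary category.
for p in players:
    p["tall"] = 1 if p["height_cm"] >= 185 else 0

# "Frequencies": count how often each category occurs.
frequencies = Counter(p["tall"] for p in players)
print(suspect, dict(frequencies))
```

Keeping these steps in a script has the same benefit as an SPSS syntax file: the whole path from raw data to analysis can be rerun and checked.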

35 35 Weekly not-on-the-exam fact
Suppose you have a handful of numerical inputs and want to use these to predict some output. For instance: chance of survival of a firm based on firm characteristics, probability of job success based on credentials, probability of surgery survival based on medical records, …
We compare experts in the field with computer models (both have the same amount of data). Out of 160 studies of this kind, how often do the experts perform significantly better?
(source: see “Super Crunchers” by Ayres)

36 36 To Do
Get familiar with SPSS: reading data, recoding variables, and running a t-test or a correlation. Recoding variables and the syntax window are especially important. You should be able to do the assignments on the web page fairly quickly.
Check chapters 1 through 4 (up to 4.5.4) of the Field-book for anything that looks unfamiliar to you.
Don’t wait until the last couple of weeks!
Add to the WIKIs

