
Cross-sectional LCA Patterns of first response to cigarettes.




1 Cross-sectional LCA Patterns of first response to cigarettes

2 First smoking experience
Have you ever tried a cigarette (including roll-ups), even a puff?
How old were you when you first tried a cigarette?
When you FIRST ever tried a cigarette can you remember how it made you feel? (tick as many as you want)
–It made me cough
–I felt ill
–It tasted awful
–I liked it
–It made me feel dizzy

3 Aim To categorise the subjects based on their pattern of responses To assess the relationship between first-response and current smoking behaviour To try not to think too much about the possibility of recall bias

4 Step 1 Look at your data!!!

5 Examine your data structure
LCA converts a large number of response patterns into a small number of ‘homogeneous’ groups
If the responses in your data are fairly mutually exclusive then there’s no point doing LCA
Don’t just dive in

6 How many items endorsed?
    numresp |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         69        2.75        2.75
          1 |      1,597       63.70       66.45
          2 |        569       22.70       89.15
          3 |        202        8.06       97.21
          4 |         68        2.71       99.92
          5 |          2        0.08      100.00
------------+-----------------------------------
      Total |      2,507      100.00

7 Frequency of each item (n ~ 2500)

8 Examine pattern frequency
     +-----------------------------------------+
     | cough   ill   taste   liked   dizzy num |
     |-----------------------------------------|
  1. |     0     0       1       0       0 468 |
  2. |     0     0       0       1       0 452 |
  3. |     1     0       0       0       0 449 |
  4. |     1     0       1       0       0 279 |
  5. |     0     0       0       0       1 194 |
     |-----------------------------------------|
  6. |     1     1       1       0       0  94 |
  7. |     1     0       0       1       0  87 |
  8. |     1     0       0       0       1  76 |
  9. |     0     0       0       0       0  69 |
 10. |     1     1       1       0       1  59 |
     |-----------------------------------------|
 11. |     0     0       0       1       1  56 |
 12. |     1     0       1       0       1  47 |
 13. |     1     0       0       1       1  35 |
 14. |     0     1       0       0       0  34 |
 15. |     0     0       1       0       1  27 |
     |-----------------------------------------|
 16. |     0     1       1       0       0  17 |
 17. |     0     0       1       1       0  13 |
 18. |     1     1       0       0       1   9 |
 19. |     1     1       0       0       0   8 |
 20. |     0     1       1       0       1   7 |
     |-----------------------------------------|
 21. |     1     0       1       1       1   7 |
 22. |     1     0       1       1       0   6 |
 23. |     0     1       0       0       1   5 |
 24. |     1     1       1       1       1   2 |
 25. |     0     1       0       1       1   2 |
     |-----------------------------------------|
 26. |     0     1       0       1       0   1 |
 27. |     1     1       1       1       0   1 |
 28. |     1     1       0       1       1   1 |
 29. |     0     0       1       1       1   1 |
 30. |     1     1       0       1       0   1 |
     +-----------------------------------------+

9 Examine correlation structure
Polychoric correlation matrix
           cough      ill    taste    liked    dizzy
cough      1
ill        0.371    1
taste      0.049    0.468    1
liked     -0.510   -0.542   -0.786    1
dizzy     -0.030    0.246   -0.241   -0.158    1

10 Step 2 Now you can fit a latent class model

11 Latent Class models
Work with observations at the pattern level rather than the individual (person) level
     +-----------------------------------------+
     | cough   ill   taste   liked   dizzy num |
     |-----------------------------------------|
  1. |     0     0       1       0       0 468 |
  2. |     0     0       0       1       0 452 |
  3. |     1     0       0       0       0 449 |
  4. |     1     0       1       0       0 279 |
  5. |     0     0       0       0       1 194 |
     |-----------------------------------------|

12 Latent Class models
For a given number of latent classes, Bayes’ rule plus an assumption of conditional independence gives the probability that each pattern falls into each class
Derive the likelihood of the observed data under each model (i.e. assuming different numbers of classes) and use this, plus other fit statistics, to determine the optimal model, i.e. the optimal number of classes

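13 Latent Class models
Bayes’ rule: P(class = i | pattern) = P(pattern | class = i) * P(class = i) / P(pattern), where P(pattern) is the sum of P(pattern | class = j) * P(class = j) over all classes j
Conditional independence: P(pattern = ‘01’ | class = i) = P(pat(1) = ‘0’ | class = i) * P(pat(2) = ‘1’ | class = i)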
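The two ingredients on this slide can be combined in a few lines of Python. This is only a sketch: the function name, the two-class and two-item setup and all the numbers are made up for illustration, and none of it comes from the fitted smoking models.

```python
from math import prod


def pattern_posterior(pattern, class_weights, item_probs):
    """Posterior P(class | pattern) under conditional independence.

    pattern       -- tuple of 0/1 responses, one per item
    class_weights -- prior class proportions P(class = i), summing to 1
    item_probs    -- item_probs[i][j] = P(item j endorsed | class = i)
    """
    joint = []
    for weight, probs in zip(class_weights, item_probs):
        # Conditional independence: multiply the per-item probabilities.
        likelihood = prod(p if x == 1 else 1.0 - p for x, p in zip(pattern, probs))
        joint.append(weight * likelihood)
    total = sum(joint)  # P(pattern), the denominator in Bayes' rule
    return [j / total for j in joint]


# Hypothetical values: two classes, two items, pattern '01'.
posterior = pattern_posterior((0, 1), [0.6, 0.4], [[0.9, 0.2], [0.1, 0.7]])
print([round(p, 3) for p in posterior])
```

The posterior probabilities for a pattern always sum to 1, which is what later allows them to be used as weights.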

14 How many classes can I have? ~ degrees of freedom
5 binary items give 2^5 = 32 possible patterns, i.e. 31 df (the pattern probabilities must sum to 1)
Each additional class requires
–5 df to estimate the prevalence of each of the 5 items within that class (i.e. 5 thresholds)
–1 df for an additional cut of the latent variable defining the class distribution
Hence a 5-class model uses up 5*5 + 4 = 29 degrees of freedom, leaving 31 - 29 = 2 df to test the model
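The counting can be sketched in Python. This is a minimal illustration, assuming the saturated model for J binary items has 2^J - 1 free pattern probabilities (they must sum to 1), which agrees with the 14 degrees of freedom Mplus reports for the 17-parameter 3-class model; the function names are hypothetical.

```python
def lca_free_parameters(n_items, n_classes):
    """Thresholds (one per item per class) plus class-proportion parameters."""
    return n_classes * n_items + (n_classes - 1)


def lca_degrees_of_freedom(n_items, n_classes):
    """Free pattern probabilities minus estimated parameters (binary items)."""
    n_patterns = 2 ** n_items
    return (n_patterns - 1) - lca_free_parameters(n_items, n_classes)


# Five binary items, one to six classes.
for k in range(1, 7):
    print(k, lca_free_parameters(5, k), lca_degrees_of_freedom(5, k))
```

A negative result, as for six classes with five items, means the model asks for more parameters than the data can support.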

15 Standard thresholds Mplus thinks of binary variables as being a dichotomised continuous latent variable The point at which a continuous N(0,1) variable must be cut to create a binary variable is called a threshold A binary variable with 50% cases corresponds to a threshold of zero A binary variable with 2.5% cases corresponds to a threshold of 1.96
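The two worked values on this slide can be checked with the Python standard library. This is a sketch only: the function name is hypothetical, and it covers the N(0,1) cut described here, not the logit scale that appears later in the mixture output.

```python
from statistics import NormalDist


def probit_threshold(prevalence):
    """Cut point on a standard normal that leaves `prevalence` above it."""
    return NormalDist().inv_cdf(1.0 - prevalence)


print(round(probit_threshold(0.5), 3))    # half the cases above the cut
print(round(probit_threshold(0.025), 3))  # 2.5% of cases above the cut
```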

16 Standard thresholds Figure from Uebersax webpage

17 Data:
  File is "..\smoking_experience.dta.dat";
  listwise is on;
Variable:
  Names are sex cough ill taste liked dizzy numresp less_12 less_13;
  categorical are cough ill taste liked dizzy;
  usevariables are cough ill taste liked dizzy;
  Missing are all (-9999);
  classes = c(3);
Analysis:
  proc = 2 (starts);
  type = mixture;
  starts = 1000 500;
  stiterations = 20;
Output:
  tech10;

18 What you’re actually doing
model:
  %OVERALL%
  [c#1 c#2];       ! Defines the latent class variable
  %c#1%
  [cough$1];       ! Defines the within-class thresholds,
  [ill$1];         ! i.e. the prevalence of the endorsement of each item
  [taste$1];
  [liked$1];
  [dizzy$1];
+ five more threshold parameters for each of %c#2% and %c#3%

19 SUMMARY OF CATEGORICAL DATA PROPORTIONS
    COUGH
      Category 1    0.537
      Category 2    0.463
    ILL
      Category 1    0.904
      Category 2    0.096
    TASTE
      Category 1    0.590
      Category 2    0.410
    LIKED
      Category 1    0.735
      Category 2    0.265
    DIZZY
      Category 1    0.789
      Category 2    0.211

20 RANDOM STARTS RESULTS RANKED FROM THE BEST TO THE WORST LOGLIKELIHOOD VALUES
Final stage loglikelihood values at local maxima, seeds, and initial stage start numbers:
    -6343.937  685561  9973
    -6343.937  172907  9395
    -6343.937  497824  9464
    -6343.937  770684  7725
    -6343.937  584663  5193
    -6343.937  872295  2899
    -6343.937  116150  3570
    -6343.937  271339  4768
    -6343.937  472383  9650
    -6343.937  707126  3683
    Etc.

21 How many random starts?
Depends on
–Sample size
–Complexity of model: number of manifest variables, number of classes
Aim to reach the best (highest) loglikelihood consistently, i.e. replicated across several starts within each run

22 Loglikelihood values at local maxima, seeds, and initial stage start numbers
    Success                          Not there yet
    -10148.718  987174  1689         -10153.627   23688  4596
    -10148.718  777300  2522         -10153.678  150818  1050
    -10148.718  406118  3827         -10154.388  584226  4481
    -10148.718   51296  3485         -10155.122  735928   916
    -10148.718  997836  1208         -10155.373  309852  2802
    -10148.718  119680  4434         -10155.437  925994  1386
    -10148.718  338892  1432         -10155.482  370560  3292
    -10148.718  765744  4617         -10155.482  662718   460
    -10148.718  636396   168         -10155.630  320864  2078
    -10148.718  189568  3651         -10155.833  873488  2965
    -10148.718  469158  1145         -10156.017  212934   568
    -10148.718   90078  4008         -10156.231   98352  3636
    -10148.718  373592  4396         -10156.339   12814  4104
    -10148.718   73484  4058         -10156.497  557806  4321
    -10148.718  154192  3972         -10156.644  134830   780
    -10148.718  203018  3813         -10156.741   80226  3041
    -10148.718  785278  1603         -10156.793  276392  2927
    -10148.718  235356  2878         -10156.819  304762  4712
    -10148.718  681680  3557         -10156.950  468300  4176
    -10148.718   92764  2064         -10157.011   83306  2432
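The difference between the two listings can be checked mechanically: in the successful run the best loglikelihood is reached again and again, in the other it appears only once. A rough Python sketch follows; the function name, the `min_repeats` rule and the tolerance are assumptions for illustration, not Mplus settings.

```python
def best_loglik_replicated(logliks, min_repeats=2, tol=1e-3):
    """True if the highest loglikelihood is reached by at least min_repeats starts."""
    best = max(logliks)
    repeats = sum(1 for ll in logliks if abs(ll - best) <= tol)
    return repeats >= min_repeats


# First few final-stage values from each listing on this slide.
success = [-10148.718, -10148.718, -10148.718, -10148.718]
not_there_yet = [-10153.627, -10153.678, -10154.388, -10155.122]

print(best_loglik_replicated(success))
print(best_loglik_replicated(not_there_yet))
```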

23 Scary “warnings”
IN THE OPTIMIZATION, ONE OR MORE LOGIT THRESHOLDS APPROACHED AND WERE SET AT THE EXTREME VALUES. EXTREME VALUES ARE -15.000 AND 15.000.
THE FOLLOWING THRESHOLDS WERE SET AT THESE VALUES:
  * THRESHOLD 1 OF CLASS INDICATOR TASTE FOR CLASS 3 AT ITERATION 11
  * THRESHOLD 1 OF CLASS INDICATOR DIZZY FOR CLASS 3 AT ITERATION 12
  * THRESHOLD 1 OF CLASS INDICATOR ILL FOR CLASS 3 AT ITERATION 16
  * THRESHOLD 1 OF CLASS INDICATOR LIKED FOR CLASS 1 AT ITERATION 34
  * THRESHOLD 1 OF CLASS INDICATOR TASTE FOR CLASS 1 AT ITERATION 93
WARNING: WHEN ESTIMATING A MODEL WITH MORE THAN TWO CLASSES, IT MAY BE NECESSARY TO INCREASE THE NUMBER OF RANDOM STARTS USING THE STARTS OPTION TO AVOID LOCAL MAXIMA.

24 THE MODEL ESTIMATION TERMINATED NORMALLY
TESTS OF MODEL FIT
Loglikelihood
    H0 Value                            -6343.937
    H0 Scaling Correction Factor            1.006
      for MLR
Information Criteria
    Number of Free Parameters                  17
    Akaike (AIC)                        12721.873
    Bayesian (BIC)                      12820.930
    Sample-Size Adjusted BIC            12766.916
      (n* = (n + 2) / 24)
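The three information criteria follow directly from the loglikelihood, the number of free parameters and the sample size. A short Python sketch, assuming n = 2,507 (the total from the item-count table) and using a hypothetical function name, reproduces the values in this output to within rounding:

```python
from math import log


def information_criteria(loglik, n_params, n_obs):
    """Return (AIC, BIC, sample-size adjusted BIC)."""
    deviance = -2.0 * loglik
    aic = deviance + 2 * n_params
    bic = deviance + n_params * log(n_obs)
    # Sample-size adjusted BIC replaces n with n* = (n + 2) / 24.
    abic = deviance + n_params * log((n_obs + 2) / 24)
    return aic, bic, abic


aic, bic, abic = information_criteria(-6343.937, 17, 2507)
print(round(aic, 2), round(bic, 2), round(abic, 2))
```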

25 Chi-Square Test of Model Fit for the Binary and Ordered Categorical (Ordinal) Outcomes
    Pearson Chi-Square
        Value                   623.040
        Degrees of Freedom           14
        P-Value                  0.0000
    Likelihood Ratio Chi-Square
        Value                   563.869
        Degrees of Freedom           14
        P-Value                  0.0000

26 FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES BASED ON THE ESTIMATED MODEL
    Latent Classes
       1        600.41143    0.23949
       2       1517.83320    0.60544
       3        388.75538    0.15507
CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP
    Latent Classes
       1        630          0.25130
       2       1396          0.55684
       3        481          0.19186

27 Entropy (fuzziness)
CLASSIFICATION QUALITY
    Entropy    0.832
Average Latent Class Probabilities for Most Likely Latent Class Membership (Row) by Latent Class (Column)
             1        2        3
    1    0.952    0.048    0.000
    2    0.000    0.979    0.021
    3    0.000    0.252    0.748
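For reference, the entropy figure summarises how sharply the posterior probabilities separate respondents into classes. The Python sketch below uses the usual relative-entropy formula, 1 - sum(-p ln p) / (n ln K), which is assumed here to be the quantity Mplus reports; the function name and the toy inputs are illustrative only.

```python
from math import log


def relative_entropy(posteriors):
    """1 = perfectly crisp classification, 0 = completely fuzzy.

    posteriors -- one row per respondent, one posterior probability per class
                  (requires at least two classes).
    """
    n = len(posteriors)
    k = len(posteriors[0])
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0:  # treat 0 * ln(0) as 0
                total += -p * log(p)
    return 1.0 - total / (n * log(k))


crisp = relative_entropy([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
fuzzy = relative_entropy([[1 / 3, 1 / 3, 1 / 3]])
print(crisp, round(fuzzy, 6))
```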

28 Model results
                                              Two-Tailed
                 Estimate     S.E.  Est./S.E.    P-Value
Latent Class 1
 Thresholds
    COUGH$1         1.604    0.133     12.103      0.000
    ILL$1           7.371    4.945      1.490      0.136
    TASTE$1        15.000    0.000    999.000    999.000
    LIKED$1       -15.000    0.000    999.000    999.000
    DIZZY$1         1.890    0.139     13.604      0.000

29 Categorical Latent Variables
                                              Two-Tailed
                 Estimate     S.E.  Est./S.E.    P-Value
 Means
    C#1             0.435    0.124      3.500      0.000
    C#2             1.362    0.135     10.058      0.000
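These means are on a multinomial logit scale with the last class as the reference (fixed at 0), so they convert to class proportions as sketched below. The function name is hypothetical; the result agrees with the estimated class proportions of the 3-class model (0.239, 0.605, 0.155) to within rounding.

```python
from math import exp


def class_proportions(means):
    """Convert latent class means (logits vs the last class) to proportions."""
    logits = list(means) + [0.0]  # reference class has logit 0
    expd = [exp(v) for v in logits]
    total = sum(expd)
    return [e / total for e in expd]


props = class_proportions([0.435, 1.362])
print([round(p, 3) for p in props])
```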

30 RESULTS IN PROBABILITY SCALE
Latent Class 1
    COUGH
      Category 1    0.833    0.018     45.072    0.000
      Category 2    0.167    0.018      9.059    0.000
    ILL
      Category 1    0.999    0.003    321.448    0.000
      Category 2    0.001    0.003      0.202    0.840
    TASTE
      Category 1    1.000    0.000      0.000    1.000
      Category 2    0.000    0.000      0.000    1.000
    LIKED
      Category 1    0.000    0.000      0.000    1.000
      Category 2    1.000    0.000      0.000    1.000
    DIZZY
      Category 1    0.869    0.016     54.848    0.000
      Category 2    0.131    0.016      8.284    0.000
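The probability-scale results are a transformation of the logit thresholds in the model results: P(Category 2) = 1 / (1 + exp(threshold)). A small Python sketch, with a hypothetical function name, shows the conversion for the Class 1 thresholds:

```python
from math import exp


def endorsement_probability(threshold):
    """P(Category 2), i.e. item endorsed, for a logit threshold."""
    return 1.0 / (1.0 + exp(threshold))


# Class 1 thresholds from the 3-class model results.
for item, threshold in [("COUGH", 1.604), ("ILL", 7.371), ("TASTE", 15.0),
                        ("LIKED", -15.0), ("DIZZY", 1.890)]:
    print(item, round(endorsement_probability(threshold), 3))
```

Thresholds fixed at the extreme values of 15 and -15 therefore correspond to endorsement probabilities of essentially 0 and 1.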

31 Class 1 from 3-class model

32 Conditional independence The latent class variable accounts for the covariance structure in your dataset Conditional on C, any pair of manifest variables should be uncorrelated Harder to achieve for a cross-sectional LCA With a longitudinal LCA there tends to be a more ordered pattern of correlations based on proximity in time

33 Tech10 – response patterns
MODEL FIT INFORMATION FOR THE LATENT CLASS INDICATOR MODEL PART
RESPONSE PATTERNS
    No. Pattern    No. Pattern    No. Pattern    No. Pattern    No. Pattern
      1 10000        2 00100        3 00010        4 11100        5 11101
      6 00001        7 10101        8 10010        9 10100       10 00101
     11 10001       12 00000       13 00011       14 01101       15 10011
     16 00110       17 11000       18 10111       19 11011       20 01100
     21 10110       22 01000       23 01001       24 11111       25 01010
     26 11001       27 01011       28 11010       29 00111       30 11110

34 Tech10 – Bivariate model fit
5 manifest variables → number of pairs = 5 × 4 / 2 = 10
    Overall Bivariate Pearson Chi-Square           215.353
    Overall Bivariate Log-Likelihood Chi-Square    214.695
Compare with the 5% critical value of χ² (10 df) = 18.307

35 Tech10 – Bivariate model fit
Not bad:-
                                       Estimated Probabilities
                                                         Standardized
    Variable      Variable          H1        H0    Residual (z-score)
    COUGH         ILL
      Category 1    Category 1    0.511     0.506       0.457
      Category 1    Category 2    0.026     0.031      -1.321
      Category 2    Category 1    0.393     0.398      -0.467
      Category 2    Category 2    0.070     0.065       0.925
    Bivariate Pearson Chi-Square                        2.726
    Bivariate Log-Likelihood Chi-Square                 2.798

36 Tech10 – Bivariate model fit
Terrible:-
                                       Estimated Probabilities
                                                         Standardized
    Variable      Variable          H1        H0    Residual (z-score)
    COUGH         ILL
      Category 1    Category 1    0.566     0.534       3.149
      Category 1    Category 2    0.338     0.370      -3.255
      Category 2    Category 1    0.024     0.056      -6.850
      Category 2    Category 2    0.072     0.040       7.977
    Bivariate Pearson Chi-Square                      116.657
    Bivariate Log-Likelihood Chi-Square               117.162

37 Conditional Independence violated Need more classes

38 Obtain the ‘optimal’ model
Assess the following for models with increasing classes
–aBIC
–Entropy
–BLRT (Bootstrap LRT)
–Conditional Independence (Tech10)
–Ease of interpretation
–Consistency with previous work / theory

39 Model fit stats
                     1 class   2 class   3 class   4 class   5 class
Estimated params           5        11        17        23        29
H0 Likelihood        -6962.1   -6458.7   -6343.9   -6200.1   -6100.8
aBIC                 13947.4   12968.5   12766.9   12507.1   12336.5
Entropy                    -     0.944     0.832     0.894     0.844
Tech 10                625.2     228.1     214.7     135.9      17.6
BLRT statistic             -    1006.8     229.5     287.8     198.4
BLRT p-value               -    < 0.0001 (2 to 5 classes)
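The BLRT statistic in this table is twice the gain in loglikelihood from adding one class. The Python sketch below, with a hypothetical function name, recovers the statistics from the H0 likelihood row to within rounding; the p-values come from a parametric bootstrap in Mplus and are not reproduced here.

```python
def lrt_statistic(loglik_smaller, loglik_larger):
    """Twice the loglikelihood gain of the larger (k-class) model over the (k-1)-class model."""
    return 2.0 * (loglik_larger - loglik_smaller)


# H0 loglikelihoods for the 1- to 5-class models.
logliks = {1: -6962.1, 2: -6458.7, 3: -6343.9, 4: -6200.1, 5: -6100.8}

statistics = {k: lrt_statistic(logliks[k - 1], logliks[k]) for k in range(2, 6)}
for k, value in statistics.items():
    print(k, round(value, 1))
```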

40 5-class model
aBIC values are still decreasing
Tech 10 is still quite high – residual correlations between ill and both liked and dizzy
BLRT rejects the 4-class model in favour of 5 classes
Not enough df to fit a 6-class model, so we cannot use BLRT to test whether 5 classes are enough
That seems unlikely, as the BLRT values are decreasing only slowly

41 Cross-sectional LCA Patterns of first response to cigarettes: Attempt 2

42 What to do?
We need more degrees of freedom
There were only 5 questions on response to smoking
Add something else:
–How old were you when you first tried a cigarette?
–Split into pre-teen / teen
6 binary variables means 64 possible response patterns (63 d.f.) to play with

43 Model fit stats – attempt 2
                     3 class   4 class   5 class   6 class   7 class
Estimated params          20        27        34        41        48
H0 Likelihood        -7866.3   -7720.2   -7616.0   -7582.4   -7576.2
aBIC                 15825.6   15565.7   15389.9   15355.1   15375.2
Entropy                0.823     0.893     0.812     0.876     0.850
Tech 10                228.9     144.6      16.8       1.2      0.29
BLRT statistic         123.3     146.1     104.2      67.3      12.4
BLRT p-value           < 0.0001 (3 to 6 classes)              0.2100


45 6-class model results
CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES BASED ON THE ESTIMATED MODEL
    Latent classes
       1         53.23894     2.1%
       2        541.96140    21.7%
       3        396.04196    15.9%
       4        454.89294    18.2%
       5        750.87470    30.1%
       6        295.99007    11.9%
CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP
    Latent classes
       1         34           1.4%
       2        540          21.7%
       3        403          16.2%
       4        447          17.9%
       5        840          33.7%
       6        229           9.2%

46 Examine entropy in more detail
Model-level entropy = 0.876
Class level entropy:
             1        2        3        4        5        6
    1    0.953    0.000    0.000    0.000    0.026    0.020
    2    0.000    0.997    0.000    0.000    0.002    0.001
    3    0.000    0.000    0.958    0.000    0.017    0.025
    4    0.000    0.000    0.000    0.949    0.041    0.011
    5    0.025    0.005    0.000    0.036    0.851    0.083
    6    0.000    0.000    0.043    0.003    0.036    0.918

47 Pattern level entropy Save out the model-based probabilities Open in another stats package Collapse over response patterns

48 Save out the model-based probabilities savedata: file is "6-class-results.dat"; save cprobabilities;

49 Varnames shown at end of output
SAVEDATA INFORMATION
  Order and format of variables
    COUGH      F10.3
    ILL        F10.3
    TASTE      F10.3
    LIKED      F10.3
    DIZZY      F10.3
    LESS_13    F10.3
    ALN        F10.3
    QLET       F10.3
    SEX        F10.3
    CPROB1     F10.3
    CPROB2     F10.3
    CPROB3     F10.3
    CPROB4     F10.3
    CPROB5     F10.3
    CPROB6     F10.3
    C          F10.3

50 Open / process in Stata
Remove excess spaces from data file, then:
insheet using 6-class-results.dat, delim(" ")
local i = 1
local varnames "COUGH ILL TASTE LIKED DIZZY LESS_13 ALN QLET SEX CPROB1 CPROB2 CPROB3 CPROB4 CPROB5 CPROB6 C"
foreach x of local varnames {
    rename v`i' `x'
    local i = `i' + 1
}
gen num = 1
collapse (mean) CPROB* C (count) num, by(COUGH ILL TASTE LIKED DIZZY LESS_13)

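51 Check the assignment probabilities for each class
cough ill taste liked dizzy <13 |  P_c1   P_c2   P_c3   P_c4   P_c5   P_c6 | Mod class   n
    1   1     1     0     0   0 |     0      0      0      0  0.052  0.948 |         6  64
    1   1     1     0     1   0 |     0      0  0.003      0  0.001  0.996 |         6  34
    1   1     1     0     0   1 |     0      0      0      0  0.027  0.973 |         6  30
    1   0     1     0     1   0 |     0      0  0.135      0  0.062  0.803 |         6  29
    1   1     1     0     1   1 |     0      0  0.003      0  0.001  0.996 |         6  25
    1   0     1     0     1   1 |     0      0  0.154      0  0.032  0.815 |         6  18
    1   1     0     0     0   0 |     0      0      0  0.071  0.054  0.874 |         6   6
    0   1     1     0     1   1 |     0      0  0.073      0  0.012  0.915 |         6   4
    1   1     0     0     1   0 |     0      0  0.303      0  0.001  0.696 |         6   4
    1   1     0     0     1   1 |     0      0  0.329      0      0  0.671 |         6   4
    0   1     1     0     0   1 |     0      0      0      0  0.411  0.589 |         6   3
    0   1     1     0     1   0 |     0      0  0.065      0  0.024  0.912 |         6   3
    1   1     0     0     0   1 |     0      0      0  0.055  0.029  0.917 |         6   2
    1   1     1     1     0   1 |     0  0.001      0      0  0.023  0.977 |         6   1
    1   1     1     1     1   0 |     0      0  0.039      0  0.001   0.96 |         6   1
    1   1     1     1     1   1 |     0      0  0.044      0      0  0.955 |         6   1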



54 Bad taste (30.1%)

55 Positive experience (21.7%)

56 Coughed (18.2%)

57 Dizziness (15.9%)

58 V negative experience (11.9%)

59 Felt ill (2.1%)

60 Well that was a complete waste of time! You might think that those resulting classes could have been derived just looking at the response patterns and making some arbitrary decisions e.g. –Group all of those who had >1 negative experience –Keep separate each group who had 1 experience You would have ended up with a bunch of weird patterns with no clue of what to do with them Strange patterns likely to be measurement error? LCA incorporates ALL patterns and deals with uncertainty through the posterior probabilities

61 Conclusions / warning Like EFA, LCA is an exploratory tool with the aim of summarising the variability in the dataset in a simple/interpretable way These results do not prove that there are 6 groups of young people in real life. LCA will find groupings in the data even if there is no reason to think such groups might exist. It’s just mathematics and it knows no better

62 Remember, we are dealing with probabilities
                  Model-based         “Modal assignment”
    Ill            53.24    2.1%          34    1.4%
    Positive      541.96   21.7%         540   21.7%
    Dizzy         396.04   15.9%         403   16.2%
    Coughed       454.89   18.2%         447   17.9%
    Bad taste     750.87   30.1%         840   33.7%
    V negative    295.99   11.9%         229    9.2%
Working with modal assignment is easy
–chuck each pattern into its most likely class and pretend everything is OK
–Equivalent to doing a single imputation for missing data – shudder!
Unless entropy is V high, stick with the probabilities
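The two columns of counts differ only in how the posterior probabilities are used: model-based counts sum the probabilities, whereas modal counts assign each respondent wholly to their most likely class. A toy Python sketch, where the function name and the three made-up respondents are for illustration only:

```python
def class_counts(posteriors):
    """Return (model_based, modal) class counts from posterior probabilities."""
    k = len(posteriors[0])
    # Model-based: each respondent contributes fractionally to every class.
    model_based = [sum(row[c] for row in posteriors) for c in range(k)]
    # Modal: each respondent counts once, in their most likely class.
    modal = [0] * k
    for row in posteriors:
        modal[row.index(max(row))] += 1
    return model_based, modal


model_based, modal = class_counts([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]])
print([round(x, 2) for x in model_based], modal)
```

The fuzzier the classification, the further the modal counts can drift from the model-based ones.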

63 Covariates and outcomes

64 Merging the classes with other data In the “olden days”, you could pass your ID variable through Mplus so when you saved your class probabilities you could merge this with other data. Now you can pass other data through Mplus as well – hurrah! Variable: auxiliary are ID sex;

65 Reshaping the dataset To account for the uncertainty in our class variable we will need to weight by the posterior probabilities obtained from Mplus Weighted model requires a reshaping of the dataset so that each respondent has n-rows (for an n-class model) rather than just 1

66 Pre-shaped – first 20 kids
  +------------------------------------------------------------------------------------------+
  |    ID    sex   dev_18   dev_42   pclass1   pclass2   pclass3   pclass4   pclass5   modclass |
  |------------------------------------------------------------------------------------------|
  | 30004   male        3        .      .001         0      .803         0      .197          3 |
  | 30008   male        2        1      .908         0         0      .007      .085          1 |
  | 30010   male        2        2      .053      .001      .052         0      .894          5 |
  | 30023   male        1        3      .115         0      .596      .001      .288          3 |
  | 30031   male        3        4         0         0      .983         0      .016          3 |
  |------------------------------------------------------------------------------------------|
  | 30033   male        4        4      .392         0      .397         0      .211          3 |
  | 30042   male        1        3         0         0      .983         0      .016          3 |
  | 30050   male        3        2         0         0      .983         0      .016          3 |
  | 30051   male        2        2         0         0         0         1         0          4 |
  | 30057   male        1        3      .135         0      .002         0      .864          5 |
  |------------------------------------------------------------------------------------------|
  | 30058   male        1        4         0         0      .958         0      .041          3 |
  | 30064   male        2        4         0         0      .983         0      .016          3 |
  | 30068   male        4        3      .001         0      .803         0      .197          3 |
  | 30070   male        3        4         0         0      .983         0      .016          3 |
  | 30072   male        1        1         0         0      .983         0      .016          3 |
  |------------------------------------------------------------------------------------------|
  | 30075   male        3        3         0         0      .982         0      .018          3 |
  | 30088   male        3        4       .03      .002      .889      .003      .076          3 |
  | 30095   male        3        .         0         0      .983         0      .016          3 |
  | 30098   male        3        .      .068      .158      .173      .018      .583          5 |
  | 30104   male        4        1      .008         0      .775         0      .217          3 |
  +------------------------------------------------------------------------------------------+

67 Pre-shaped – first 20 kids (same listing as slide 66, annotated)
ID, sex, dev_18, dev_42: covariates
pclass1 to pclass5: posterior probs
modclass: modal class

68 The reshaping
. reshape long pclass, i(id) j(class)
(note: j = 1 2 3 4 5)
Data                               wide   ->   long
---------------------------------------------------------
Number of obs.                     5584   ->   27920
Number of variables                  66   ->      63
j variable (5 values)                     ->   class
xij variables:
          pclass1 pclass2 ... pclass5     ->   pclass
---------------------------------------------------------
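For readers not using Stata, the same wide-to-long step can be sketched in plain Python. The function name is hypothetical and only the covariates `id` and `sex` are carried along, for brevity; the single example row uses the first respondent from the pre-shaped listing.

```python
def reshape_long(wide_rows, n_classes):
    """One output row per respondent per class, carrying the posterior probability."""
    long_rows = []
    for row in wide_rows:
        for c in range(1, n_classes + 1):
            long_rows.append({
                "id": row["id"],
                "sex": row["sex"],            # covariates are constant within respondent
                "class": c,
                "pclass": row[f"pclass{c}"],  # weight for this respondent-class row
            })
    return long_rows


wide = [{"id": 30004, "sex": "male", "pclass1": 0.001, "pclass2": 0.0,
         "pclass3": 0.803, "pclass4": 0.0, "pclass5": 0.197}]
long_rows = reshape_long(wide, 5)
for r in long_rows:
    print(r)
```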

69 Re-shaped – first 3 kids
     +--------------------------------------------------+
     |    id    sex   dev_18   dev_42   pclass   class |
     |--------------------------------------------------|
  1. | 30004   male        3        .     .001       1 |   First kid
  2. | 30004   male        3        .        0       2 |
  3. | 30004   male        3        .     .803       3 |
  4. | 30004   male        3        .        0       4 |
  5. | 30004   male        3        .     .197       5 |
     |--------------------------------------------------|
  6. | 30008   male        2        1     .908       1 |   Second kid
  7. | 30008   male        2        1        0       2 |
  8. | 30008   male        2        1        0       3 |
  9. | 30008   male        2        1     .007       4 |
 10. | 30008   male        2        1     .085       5 |
     |--------------------------------------------------|
 11. | 30010   male        2        2     .053       1 |   Third kid
 12. | 30010   male        2        2     .001       2 |
 13. | 30010   male        2        2     .052       3 |
 14. | 30010   male        2        2        0       4 |
 15. | 30010   male        2        2     .894       5 |
     +--------------------------------------------------+
Covariates (sex, dev_18, dev_42): constant within child
pclass: sum = 1 within child

70 Similar with our data:
. list id SEX CPROB class C in 1/12
     +---------------------------------+
     |    id   SEX   CPROB   class   C |
     |---------------------------------|
  1. | 30012     2       0       1   4 |   First respondent
  2. | 30012     2       0       2   4 |
  3. | 30012     2       0       3   4 |
  4. | 30012     2    .945       4   4 |
  5. | 30012     2    .045       5   4 |
  6. | 30012     2     .01       6   4 |
     |---------------------------------|
  7. | 30024     2       0       1   5 |   Second respondent
  8. | 30024     2       0       2   5 |
  9. | 30024     2       0       3   5 |
 10. | 30024     2       0       4   5 |
 11. | 30024     2    .991       5   5 |
 12. | 30024     2    .009       6   5 |
     |---------------------------------|

71 Simple crosstab
. tab class SEX, row nofreq
           |         SEX
     class |       1        2 |   Total
-----------+------------------+--------
       Ill |   40.87    59.13 |  100.00
  Positive |   40.87    59.13 |  100.00
     Dizzy |   40.87    59.13 |  100.00
   Coughed |   40.87    59.13 |  100.00
 Bad taste |   40.87    59.13 |  100.00
V negative |   40.87    59.13 |  100.00
-----------+------------------+--------
     Total |   40.87    59.13 |  100.00
Oops!

72 Simple crosstab – take 2
. tab class SEX [iw = CPROB], row nofreq
           |         SEX
     class |    Male   Female |   Total
-----------+------------------+--------
       Ill |   52.9%    47.1% |    100%
  Positive |   32.9%    67.1% |    100%
     Dizzy |   43.2%    56.8% |    100%
   Coughed |   40.8%    59.2% |    100%
 Bad taste |   45.2%    54.8% |    100%
V negative |   39.3%    60.7% |    100%
-----------+------------------+--------
     Total |   40.9%    59.1% |    100%
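What the importance weights do here can be sketched in plain Python: each respondent-by-class row contributes its posterior probability, rather than a count of 1, to the relevant cell. The function name and the two toy respondents below are made up for illustration.

```python
from collections import defaultdict


def weighted_row_percentages(rows):
    """Row percentages of sex within class, weighting each row by its posterior probability."""
    totals = defaultdict(lambda: defaultdict(float))
    for r in rows:
        totals[r["class"]][r["sex"]] += r["weight"]
    result = {}
    for cls, by_sex in totals.items():
        row_total = sum(by_sex.values())
        if row_total > 0:  # skip classes that received no weight at all
            result[cls] = {sex: 100.0 * w / row_total for sex, w in by_sex.items()}
    return result


# Two hypothetical respondents in long format (one row per class).
rows = [
    {"class": 1, "sex": "male", "weight": 0.9},
    {"class": 2, "sex": "male", "weight": 0.1},
    {"class": 1, "sex": "female", "weight": 0.2},
    {"class": 2, "sex": "female", "weight": 0.8},
]
table = weighted_row_percentages(rows)
print(table)
```

Without the weights every row would count equally, which is why the unweighted crosstab gives identical percentages in every class.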

73 Compare with modal class assignment
. tab C SEX if (class==1), row nofreq
           |        SEX
         C |    Male   Female |
-----------+------------------+
       Ill |   50.0%    50.0% |
  Positive |   33.0%    67.0% |
     Dizzy |   43.4%    56.6% |
   Coughed |   40.7%    59.3% |
 Bad taste |   45.4%    54.6% |
V negative |   37.6%    62.4% |
-----------+------------------+
     Total |   40.9%    59.1% |
. tab class SEX [iw = CPROB], row nofreq
           |        SEX
     class |    Male   Female |
-----------+------------------+
       Ill |   52.9%    47.1% |
  Positive |   32.9%    67.1% |
     Dizzy |   43.2%    56.8% |
   Coughed |   40.8%    59.2% |
 Bad taste |   45.2%    54.8% |
V negative |   39.3%    60.7% |
-----------+------------------+
     Total |   40.9%    59.1% |

74 Multinomial logistic
. xi: mlogit class i.SEX [iw = CPROB], rrr
Multinomial logistic regression                   Number of obs   =       2493
                                                  LR chi2(5)      =      24.52
                                                  Prob > chi2     =     0.0002
Log likelihood = -4053.3746                       Pseudo R2       =     0.0030
------------------------------------------------------------------------------
       class |        RRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Ill          |
     _ISEX_2 |   .7322787   .2081189    -1.10   0.273     .4195259    1.278186
-------------+----------------------------------------------------------------
Positive     |
     _ISEX_2 |   1.677364   .1965463     4.41   0.000     1.333175    2.110413
-------------+----------------------------------------------------------------
Dizzy        |
     _ISEX_2 |   1.082775   .1355213     0.64   0.525     .8472297    1.383807
-------------+----------------------------------------------------------------
Coughed      |
     _ISEX_2 |   1.194885   .1437877     1.48   0.139     .9438344    1.512712
-------------+----------------------------------------------------------------
V negative   |
     _ISEX_2 |   1.274734   .1782148     1.74   0.083     .9692081    1.676572
------------------------------------------------------------------------------
(class==Bad taste is the base outcome)
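With a single binary covariate, each relative risk ratio can be recovered, up to rounding, from the weighted crosstab: it is the female-to-male ratio within a class divided by the same ratio in the base class (Bad taste). A Python sketch with a hypothetical function name, assuming SEX = 2 codes female as the crosstab labels suggest:

```python
def relative_risk_ratio(pct_female, pct_male, base_pct_female, base_pct_male):
    """Female:male ratio in a class relative to the same ratio in the base class."""
    return (pct_female / pct_male) / (base_pct_female / base_pct_male)


# Row percentages from the weighted crosstab; Bad taste (54.8% female) is the base.
rrr_ill = relative_risk_ratio(47.1, 52.9, 54.8, 45.2)
rrr_positive = relative_risk_ratio(67.1, 32.9, 54.8, 45.2)
print(round(rrr_ill, 3), round(rrr_positive, 3))
```

The small differences from the mlogit output reflect the one-decimal rounding of the percentages.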

75 Class predicts binary outcome
Outcome = weekly smoker at age of 15
. char class[omit] 5
. xi: logistic sm1100 i.class [iw = CPROB]
Logistic regression                               Number of obs   =       2493
                                                  LR chi2(5)      =     229.03
                                                  Prob > chi2     =     0.0000
Log likelihood = -1168.697                        Pseudo R2       =     0.0892
------------------------------------------------------------------------------
      sm1100 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         Ill |   2.132652   .9125838     1.77   0.077     .9218961    4.933531
    Positive |   7.190203   1.231216    11.52   0.000     5.140265    10.05766
       Dizzy |   7.899915   1.413907    11.55   0.000     5.562583    11.21937
     Coughed |   3.686492   .6831946     7.04   0.000     2.563689    5.301041
  V negative |   2.243034    .497619     3.64   0.000     1.452099     3.46478
------------------------------------------------------------------------------

76 Compare with modal class
Posterior probabilities
------------------------------------------------------------------------------
      sm1100 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         Ill |   2.132652   .9125838     1.77   0.077     .9218961    4.933531
    Positive |   7.190203   1.231216    11.52   0.000     5.140265    10.05766
       Dizzy |   7.899915   1.413907    11.55   0.000     5.562583    11.21937
     Coughed |   3.686492   .6831946     7.04   0.000     2.563689    5.301041
  V negative |   2.243034    .497619     3.64   0.000     1.452099     3.46478
------------------------------------------------------------------------------
Modal assignment
------------------------------------------------------------------------------
      sm1100 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         Ill |   2.560182   1.291868     1.86   0.062     .9522577     6.88315
    Positive |   7.802047   1.313428    12.20   0.000     5.609367    10.85184
       Dizzy |     8.3454   1.467249    12.07   0.000     5.912796    11.77881
     Coughed |   4.224301   .7686958     7.92   0.000     2.957071    6.034592
  V negative |   2.861537   .6548723     4.59   0.000     1.827254    4.481255
------------------------------------------------------------------------------

77 Conclusions Young people at 15yrs can report a variety of responses to their first cigarette Certain responses are associated with current regular smoking behaviour 15 year-old girls are more likely to retrospectively report a positive experience Recall bias is likely to play a part in these associations

78 Conclusions
LCA is an exploratory tool which can be used to simplify a set of binary responses
Extension to ordinal responses is straightforward
The use of ordinal data is an alternative way to boost degrees of freedom
Resulting probabilities can be used to model the latent class variable as a risk factor or outcome
A modal class variable should be used with caution

