Modeling Multiple Source Risk Factor Data and Health Outcomes in Twins


1 Modeling Multiple Source Risk Factor Data and Health Outcomes in Twins
Andy Bogart, MS; Jack Goldberg, PhD

2 Multiple Informant Data
Military Service in Vietnam, as reported by two sources

id   pairid   PTSD   self report   military rec.
1    1        45     yes           no
2    1        17     yes           yes
3    2        66     no            yes
4    2        58     yes           yes

3

4 Vietnam service is defined as having a self report or a military report (or both) of having served in Vietnam.
All veterans provided valid post-traumatic stress disorder (PTSD) data and at least one valid report of Vietnam service (yes/no).
All Vietnam veterans also provided at least one valid report of Purple Heart receipt (yes/no).

6 Self Report
Command: regress ptsd sr, robust
[Stata output: Linear regression with robust standard errors; F( 1, 10794); coefficient rows sr and _cons; most numeric values not preserved]
sr | Coef. .1793066   Std. Err. .0070909

7 Self Report and Military Record
Command: regress ptsd mr, robust
[Stata output: Linear regression with robust standard errors; F( 1, 10710); coefficient rows mr and _cons; numeric values not preserved. The slide sets the Self Report (sr) and Military Record (mr) estimates side by side.]

8 Model 1: The General Multiple Source Model
Equation labels: expected outcome; intercept; source indicators; source by exposure interaction terms
Generates the same estimates as the k marginal source-specific models
Allows testing for a difference in sources
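One way to write the model these labels annotate is sketched below. The notation is introduced here and is not taken from the slide: Y_ij is the outcome for twin i in pair j, X_ijk is that twin's exposure as reported by source k, and there are K sources. The second line is the K = 2 case written with the variable names the deck uses in its later commands (s1, z1, z2).

```latex
\begin{align*}
E[Y_{ij} \mid X_{ijk}] &= \beta_0 + \beta_k + \gamma_k X_{ijk},
  \qquad k = 1, \dots, K, \quad \beta_K \equiv 0 \\
\text{for } K = 2:\quad
E[\text{ptsd}] &= \beta_0 + \beta_1\, s_1 + \gamma_1\, z_1 + \gamma_2\, z_2,
  \qquad z_k = \text{service} \times s_k
\end{align*}
```

Under this reading, beta_0 + beta_k is the intercept for source k, gamma_k is the source-specific exposure effect, and "testing for a difference in sources" corresponds to the hypothesis gamma_1 = ... = gamma_K.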

9 Multiple Informant Data

id   pairid   PTSD   self report   military rec.
1    1        45     yes           no
2    1        17     yes           yes
3    2        66     no            yes
4    2        58     yes           yes

10 Command: expand 2
[Data listing before and after the command: columns id, pairid, PTSD, sr, mr; each twin's row is duplicated]

11 Command: expand 2
[Data listing: columns id, pairid, PTSD, sr, mr; two rows per twin]

12 Command: generate service=0
[Data listing: columns id, pairid, PTSD, sr, mr, service]

13 Command: by id: replace service = sr if _n==1
[Data listing: columns id, pairid, PTSD, sr, mr, service; the first row for each twin now carries the self report]

14 Command: by id: replace service = mr if _n==2
[Data listing: columns id, pairid, PTSD, sr, mr, service; the second row for each twin now carries the military record]

15 Command
[Data listing: columns id, pairid, PTSD, service]

16 Command: generate s1 = 0
Command: generate s2 = 0
[Data listing: columns id, pairid, PTSD, service, s1, s2]

17 Command: by id: replace s1 = 1 if _n==1
Command: by id: replace s2 = 1 if _n==2
[Data listing: columns id, pairid, PTSD, service, s1, s2]

18 Command: generate z1 = service * s1
Command: generate z2 = service * s2
[Data listing: columns id, pairid, PTSD, service, s1, s2, z1, z2]
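The data steps on slides 10 to 18 can be collected into one do-file. The sketch below is an illustration rather than the authors' actual script. It makes two assumptions: the four-twin toy values are inferred from the example table (1 = yes, 0 = no), and an explicit sort id is added because expand appends the copies at the end of the dataset while the by id: prefix needs data sorted by id.

```stata
* Toy data inferred from the four-twin example (1 = yes, 0 = no)
clear
input id pairid ptsd sr mr
1 1 45 1 0
2 1 17 1 1
3 2 66 0 1
4 2 58 1 1
end

* One row per twin per source (two sources, so two rows per twin)
expand 2

* Assumption: sort added here so that the by id: prefix below is valid
sort id

* Row 1 within twin carries the self report, row 2 the military record
generate service = 0
by id: replace service = sr if _n == 1
by id: replace service = mr if _n == 2

* Source indicators
generate s1 = 0
generate s2 = 0
by id: replace s1 = 1 if _n == 1
by id: replace s2 = 1 if _n == 2

* Source-by-exposure interaction terms
generate z1 = service * s1
generate z2 = service * s2

list id pairid ptsd service s1 s2 z1 z2, sepby(id)
```

Because the two rows for a twin are identical right after expand, it does not matter which copy ends up first within id; the data are not re-sorted afterwards, so _n==1 and _n==2 keep referring to the same rows.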

19 Self Report and Military Record
Command: xtgee ptsd s1 z1 z2, i(pin) corr(ind) family(gau) robust
[Stata output: GEE population-averaged model; group variable pin; link identity; family Gaussian; correlation independent; Iteration 1: tolerance = 7.894e-14; Wald chi2(3); Pearson chi2(21508); semi-robust standard errors adjusted for clustering on pin; coefficient rows s1, z1, z2, _cons; numeric values not preserved. The slide sets these beside the source-specific sr and mr estimates.]

20 But wait . . . these guys are twins!
Data within twin pairs might be correlated . . .

21 All subjects provided at least one valid source report for the specified analyses.
Both twins provided valid PTSD data.
We created sham PTSD data for the ineligible twin, and set his sampling probability to zero.
We created a sham twin for the existing twin, and set the sham twin's sampling probability to zero.

22 Command: svyset id [pweight = sampweight], strata(pairid)
pweight: sampweight
VCE: linearized
Strata 1: pairid
SU 1: id
FPC 1: <zero>

23 Self Report and Military Record
Command: svy: regress ptsd s1 z1 z2
[Stata output: Survey: Linear regression; model F test with 3 numerator degrees of freedom; linearized standard errors; outcome labeled logptsd2 in the output; coefficient rows s1, z1, z2, _cons; numeric values not preserved. The slide sets these beside the source-specific sr and mr estimates.]
Note: 35 strata omitted because they contain no population members

24 Self Report and Military Record
Command: test z1 = z2
[Stata output: test of ( 1) z1 - z2 = 0, reported both as chi2( 1) and as an Adjusted Wald test; statistics and p-values not preserved]
Moral of the story: The two sources contain different information. We should not combine them. Or, should we??

25 Model 2: Multiple Source Model of Within- and Between-pair exposure effects
Equation labels: intercept; source indicators; source by within-pair effect interaction terms; source by between-pair effect interaction terms
Same estimates as k separate marginal within & between models
Allows testing for a difference in reports of within effects & between effects
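A possible way to write Model 2 is sketched here. The notation is an assumption introduced for illustration: Y_ij is the outcome for twin i in pair j, X_ijk is the source-k exposure report, and the bar denotes the pair mean of that report. In the deck's variable names, the deviation term corresponds to z1diff and z2diff and the pair mean to z1bar and z2bar.

```latex
\begin{align*}
E[Y_{ij} \mid X_{ijk}] &= \beta_0 + \beta_k
  + \gamma^{W}_k \,\bigl(X_{ijk} - \bar{X}_{jk}\bigr)
  + \gamma^{B}_k \,\bar{X}_{jk},
  \qquad k = 1, \dots, K, \quad \beta_K \equiv 0 \\
\bar{X}_{jk} &= \tfrac{1}{2}\,\bigl(X_{1jk} + X_{2jk}\bigr)
\end{align*}
```

For a pair whose two reports from one source are 0 and 1, the pair mean is 0.5 and the within-pair deviations are -0.5 and 0.5. Testing for a difference in within effects is then the hypothesis that the gamma^W_k are equal across sources, and likewise gamma^B_k for between effects.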

26 Command
[Data listing: columns id, pairid, s1, z1]

27 Command: bysort pairid: egen z1bar = mean(z1) if s1==1
[Data listing: columns id, pairid, s1, z1, z1bar; z1bar is missing (.) on the rows where s1 is 0]

28 Command: bysort pairid: egen z1bar = mean(z1) if s1==1
Command: bysort pairid: replace z1bar=0 if s1==0
[Data listing: columns id, pairid, s1, z1, z1bar]

29 Command: bysort pairid: egen z1bar = mean(z1) if s1==1
Command: bysort pairid: replace z1bar=0 if s1==0
[Data listing for pair 2 (ids 3 and 4): columns id, pairid, s1, z1, z1bar; z1bar is 0.5 on the rows where s1 is 1]

30 Command: bysort pairid: egen z1bar = mean(z1) if s1==1
Command: bysort pairid: replace z1bar=0 if s1==0
[Data listing for both pairs: columns id, pairid, s1, z1, z1bar]

31 Command: bysort pairid: egen z1bar = mean(z1) if s1==1
Command: bysort pairid: replace z1bar=0 if s1==0
Command: generate z1diff = z1 - z1bar
[Data listing: columns id, pairid, s1, z1, z1bar, z1diff; for id 3 the row where s1 is 1 shows z1bar 0.5 and z1diff -0.5]

32 Command (Repeat that procedure to make z2bar and z2diff)
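Spelled out, the repeat for the second source might look like the following. This is a sketch that mirrors the z1 commands exactly and assumes the stacked dataset already holds pairid, s2, and z2 as built on the earlier slides.

```stata
* Pair mean of the military-record exposure, defined on the s2 rows only
bysort pairid: egen z2bar = mean(z2) if s2 == 1

* Rows belonging to the other source get 0 instead of missing
replace z2bar = 0 if s2 == 0

* Within-pair deviation for source 2 (0 on the rows where s2 is 0)
generate z2diff = z2 - z2bar
```

On rows where s2 is 0, both z2 and z2bar are 0, so z2diff is 0 there and contributes nothing to the fit for the other source.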

33 Command: svy: regress ptsd s1 z1diff z1bar z2diff z2bar
[Stata output: Survey: Linear regression; model F test with 5 numerator degrees of freedom; linearized standard errors; coefficient rows s1, z1diff, z1bar, z2diff, z2bar, _cons; numeric values not preserved. The Model 1 output (rows s1, z1, z2, _cons) is shown alongside for comparison.]
Note: 35 strata omitted because they contain no population members

36 Command: test z1diff = z2diff
[Stata output: Adjusted Wald test; statistic and Prob > F not preserved. The Model 2 coefficient table (rows s1, z1diff, z1bar, z2diff, z2bar, _cons) is shown alongside.]

37 Command: test z1diff = z2diff
Command: test z1bar = z2bar
Adjusted Wald test
( 1) z1diff - z2diff = 0: F( 1, 6172) [statistic and Prob > F not preserved]
( 1) z1bar - z2bar = 0: F( 1, 6172) [statistic and Prob > F not preserved]
Within-pair estimates don't differ much. Between-pair estimates do!!
Moral of the story: Combine the within-pair info. Keep between-pair info. separate.

38 Model 3: Multiple Source Model with a Combined within-pair effect
Equation labels: intercept; source indicators; combined source within-pair effect; source by between-pair effect interaction terms
Assumes within-pair effect to be common to all k sources
Often yields a more precise estimate of the within-pair effect
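Written out, Model 3 might take the form below. As before, the notation is an assumption made for illustration: twin i in pair j, source-k exposure report X_ijk, pair mean denoted by a bar. The only change from a source-specific within-pair model is that a single within-pair coefficient is shared by all sources.

```latex
\begin{align*}
E[Y_{ij} \mid X_{ijk}] &= \beta_0 + \beta_k
  + \gamma^{W} \,\bigl(X_{ijk} - \bar{X}_{jk}\bigr)
  + \gamma^{B}_k \,\bar{X}_{jk},
  \qquad k = 1, \dots, K, \quad \beta_K \equiv 0
\end{align*}
```

In the deck's variables the shared deviation term is wservice = z1diff + z2diff. On any given row only one of the two summands can be nonzero, because the deviation for the other source is 0 on that row, so the sum picks out the deviation for the row's own source.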

39 Command
[Data listing: columns id, pairid, z1diff, z2diff; the nonzero entries shown are -0.5 and 0.5]

40 Command: generate wservice = z1diff + z2diff
[Data listing: columns id, pairid, z1diff, z2diff, wservice]

41 Command: svy: regress ptsd s1 wservice z1bar z2bar
[Stata output: Survey: Linear regression; model F test with 4 numerator degrees of freedom; linearized standard errors; outcome labeled logptsd2 in the output; coefficient rows s1, wservice, z1bar, z2bar, _cons; numeric values not preserved. The Model 1 output (rows s1, z1, z2, _cons) is shown alongside for comparison.]
Note: 35 strata omitted because they contain no population members

43

44

45 Conclusions from VET Registry analysis
Sources differed in Model 1, so we did not combine them overall.
Within-pair estimates in Model 2 did not differ much by source, so . . .
Model 3 combined within-pair estimates.
Within-pair estimate: Combined Record (0.14, 0.19)
7-14% gain in efficiency over individual sources

46 Conclusions from VET Registry analysis
Between-pair estimates in Model 2 differed significantly.
Model 3 estimates separate between-pair effects for each source.
Source-specific between-pair estimates:
Self Report (0.17, 0.20)
Military Record (0.13, 0.16)

47 Future Directions
Accommodate covariate adjustment
Compare pooled estimators to "AND" and "OR" type derived exposure variables
Address zygosity within regression models

48 Acknowledgements & References
Jack Goldberg at UW
Margaret Pepe at UW
Pepe MS, Whitaker RC, Seidel K. Estimating and comparing univariate associations with application to the prediction of adult obesity. Statistics in Medicine 1999; 18:
Nicholas Horton at Harvard
Horton NJ, Fitzmaurice GM. Regression analysis of multiple source and multiple informant data from complex survey samples. Statistics in Medicine 2004; 23:

49 Thank you for listening

50

