Genomics Workshop Demography of Aging Centers Biomarker Network Meeting in Conjunction with the Annual Meeting of the PAA April 14, 9:00 AM to 3:30 PM.

1 Genomics Workshop Demography of Aging Centers Biomarker Network Meeting in Conjunction with the Annual Meeting of the PAA April 14, 9:00 AM to 3:30 PM – Hyatt Regency, Dallas, Texas Sponsored by USC/UCLA Center of Biodemography and Population Health Organized by Teresa Seeman, Steven Cole, Eileen Crimmins

2
- Tactical aspects of study administration and sample capture/storage
- Biological overview of genetics & functional genomics
- Strategic aspects of study design and data analysis
- Lunch
- Technical aspects of study design and data analysis
- Perspectives on the State of the Field
- Application clinic

3 Tactical aspects of study administration and sample capture/storage
DNA
1. New sample capture
   Methods: e.g., Oragene, leukocytes
   Consent & administrative issues
2. Retrospective analyses
   Sources: blood spots, cheek swabs, etc.
   Consent & administrative issues
3. Epigenetics
   DNA methylation
   Histone acetylation & chromatin dynamics
   Tissue specificity (vs DNA)
4. Tactical issues – Reports from the Field
   I wish I’d known then…
RNA
1. Identifying appropriate target tissues
   Whole blood, PBMC, saliva, hair, path specim.
2. Sample capture/storage
3. Consent & administrative issues

11-14 [Diagram built up across slides; labels: Gene, IL6, DNA, RNA, Health]

16 Biological overview of genetics & functional genomics
Theoretical framework: Genes, Environments, transcription, and health
1. “Genetic” influences (missing h², penetrance R-square, etc.)
2. Functional genomics
   Transcription factors
   Epigenetics
3. Gene-Environment interactions
   Regulatory polymorphism
   Coding polymorphism
System dynamics
1. Feedback, network pleiotropy
2. Recursive developmental trajectories

17-27 [Diagram built up across slides; labels: Social Environment, Gene, Health, IL6, RNA, DNA]

28-33 [Diagram built up across slides: IL6 promoter sequence (TCT TGCGATGCTA AAG), labeled "IL6 gene transcription"; labels added in turn: NE, PKA, GATA1, P. Slide 33 adds a plot of IL6 promoter activity (fold-change) by norepinephrine (µM)]

34 Socio-environmental regulation of IL6 [Figure: Non-depressed vs. Depressed, p = .008]

36-45 [Diagram built up across slides; labels: Social Environment, Gene, Health, IL6, RNA, DNA; slides 43-45 add the polymorphism label … [G/C] …]

46-51 Gene x Environment Interaction
[Figure built up across slides.
In silico: IL6 promoter sequence TCT TGCGATGCTA AAG with variant allele C; scores V$GATA1_01 = .943 and V$GATA1_01 = .619.
In vitro: transcriptional activity (fold-change) of the IL6 promoter, WT vs. -174C, by norepinephrine (µM); difference: p < .0001]

52-53 Gene x Environment Interaction [Figure: Non-depressed vs. Depressed, in panels labeled IL GG and IL CC/GC; p = .008 and p = .439]

55-57 [Diagram built up across slides; labels: Social Environment, Gene, Health, IL6, RNA, DNA, … [G/C] …; slide 57 adds the labels Health 2 and RNA 2]

61-67 [Diagram built up across slides; labels: Social Environment, Gene, Health, IL6, RNA, DNA, Behavior; captions added in turn: Gene-Environment Correlation, Recursive Molecular Remodeling]

68-76 Recursive developmental remodeling
[Diagram built up across slides: Environment, Body, RNA, and Behavior at Time 1, Time 2, and Time 3]
RNA = intra-organismic adaptation
Cole (2009) Current Directions in Psychological Science

78 Strategic aspects of study design and data analysis
Basic substantive objectives & study designs
1. “Gene discovery” (e.g., genetic epidemiology)
2. Environmental regulation of health (via transcription)
3. Gene-Environment interaction

79-86 [Diagrams shown between repeats of the objectives list; labels: Gene, Health, IL6, DNA, RNA, … [G/C] …]

87-90 Antagonistic pleiotropy
[Figure: CRP mg/L / Adversity SD by IL6 -174 genotype (CC, GC, GG), shown for Older Adult and Adolescent; p = .007 and p = .032]
Evolution deletes disadvantage, particularly to the young

91-98 Fisher’s regression
[Figure: Outcome by genotype (GG, GC, CC), overall and separately for Environment A and Environment B]
y = a + b(#G) + e
y = a + b(#G) + c(Env) + d(#G x Env) + e
y = a + b(#G) + e’, where e’ ← c(Env) + d(#G x Env) + e
↓ power
↑ parameter estimate bias
Marginal: 0
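A minimal sketch of the point these slides build toward, using simulated data rather than anything from the workshop. It assumes the genotype effect has opposite signs in Environment A and Environment B. Under that assumption the marginal model y = a + b(#G) + e’ should return a slope near zero, while the full model recovers both the genotype slope and the interaction. The function names, allele frequency, effect size, and sample size are illustrative assumptions.

```python
import numpy as np

def simulate_crossover(n=20000, effect=0.5, seed=0):
    """Simulate an outcome whose genotype effect flips sign between two environments."""
    rng = np.random.default_rng(seed)
    g = rng.binomial(2, 0.5, size=n)        # number of G alleles: 0, 1, or 2
    env = rng.integers(0, 2, size=n)        # 0 = Environment A, 1 = Environment B
    slope = np.where(env == 0, effect, -effect)
    y = slope * (g - 1) + rng.normal(0.0, 1.0, size=n)
    return g, env, y

def fit(columns, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

g, env, y = simulate_crossover()
marginal = fit([g], y)               # y = a + b(#G) + e'
full = fit([g, env, g * env], y)     # y = a + b(#G) + c(Env) + d(#G x Env) + e

print("marginal model b:", round(float(marginal[1]), 3))
print("full model b and d:", round(float(full[1]), 3), round(float(full[3]), 3))
```

In this simulated setup the true values are b = 0.5 in Environment A and d = -1.0, so a near-zero marginal slope reflects the omitted environment terms, not the absence of a genetic effect.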

99-100 Strategic aspects of study design and data analysis
Valid statistical models are one major reason that substantive interests (environments) matter.
OK, then, let’s have lunch.

101 Technical aspects of study design and data analysis
Study designs, assay technologies, and statistical methods
1. “Gene discovery” (e.g., genetic epidemiology)
   Candidate gene studies
   Genome-wide association studies
   The bioinformatic middle road
2. Environmental regulation of health (via transcription)
   Candidate transcript studies
   Genome-wide approaches
3. Gene-Environment interaction
   Statistical issues
   Revisiting the bioinformatic middle road

102 Technical aspects of study design and data analysis
Study designs, assay technologies, and statistical methods
1. “Gene discovery” (e.g., genetic epidemiology)
   Candidate gene studies
   - Candidate identification
   - Targeted genotyping
     a. PCR
     b. High-throughput approaches
   - Statistical models
     a. Fisher’s basic regression model
     b. Multivariate mapping / association / recombination
        i. Recombination
        ii. Haplotype blocks
     c. Confounding
        i. Linkage disequilibrium & haplotype analyses
        ii. Ethnic stratification
            Phenotypic ascertainment
            Genetic ancestry
        iii. Mendelian randomization

104-109 Gene x Environment Interaction [Figures repeated from slides 46-53: in silico V$GATA1_01 scores, in vitro IL6 promoter activity (WT vs. -174C), and Non-depressed vs. Depressed by genotype]

113 [Table: genotyping output with columns Well, ID1, ID2, RFU1, RFU2, Ct1, Ct2, Call; the calls shown are Heterozygote, Allele2, Heterozygote, Allele1, Allele1; numeric values not recoverable]

119-125 Fisher’s regression
[Figure: Outcome by genotype (GG, GC, CC)]
y = a + b(#G)
y = a + b(GG) + c(GC) + d(CC)
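The two model forms on these slides can be sketched as two ways of coding genotype into a design matrix. This is an illustration only, and the outcome values below are made up. It assumes reference coding for the genotype-specific model (CC as the reference level), because an intercept plus three genotype indicator columns would be collinear. The function names are hypothetical.

```python
import numpy as np

# Made-up genotypes and outcomes, for illustration only.
genotypes = ["GG", "GC", "CC", "GC", "GG", "CC", "GC", "GG"]
outcome = np.array([2.0, 1.1, 0.0, 0.9, 2.1, 0.1, 1.0, 1.9])

def additive_design(genos, allele="G"):
    """Intercept plus allele count: y = a + b(#G)."""
    count = np.array([g.count(allele) for g in genos], dtype=float)
    return np.column_stack([np.ones(len(genos)), count])

def categorical_design(genos, reference="CC", levels=("GG", "GC", "CC")):
    """Intercept plus one indicator per non-reference genotype."""
    cols = [np.ones(len(genos))]
    for level in levels:
        if level != reference:
            cols.append(np.array([float(g == level) for g in genos]))
    return np.column_stack(cols)

def ols(X, y):
    """Least-squares coefficients for design matrix X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

additive = ols(additive_design(genotypes), outcome)
categorical = ols(categorical_design(genotypes), outcome)

print("additive: intercept, per-allele slope =", additive)
print("categorical: CC mean, GG - CC, GC - CC =", categorical)
```

The additive coding spends one parameter and assumes each extra G allele shifts the outcome by the same amount. The categorical coding spends two and lets the heterozygote fall anywhere between or outside the homozygotes.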

131-135 Fisher’s regression
[Figure: Outcome by genotype (GG, GC, CC)]
y = a + b(#G rs…)
y = a + b(#G rs…) + c(#T rs20937) + …
y = a + b(Haplotype containing rs…)
y = a + b(ATTCGTAC)
HapMap Tag SNP

137 Linkage-driven indirect association gradients
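Indirect association depends on how strongly a typed marker is correlated with an untyped variant. As a hedged illustration, the sketch below computes two standard linkage disequilibrium summaries, D and r², from haplotype and allele frequencies. The function name and the example frequencies are assumptions, not values from the slides.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and allele B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denom

# Hypothetical frequencies: partial, perfect, and no linkage disequilibrium.
for p_ab in (0.4, 0.5, 0.25):
    d, r2 = ld_stats(p_ab, 0.5, 0.5)
    print(f"p_ab={p_ab}: D={d:.3f}, r^2={r2:.3f}")
```

An r² near 1 means the marker is close to a perfect proxy for the untyped variant; an r² near 0 means the marker carries little information about it.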

141 Culture/behavior/exposure “Environment”

144 Ancestry classification via mitochondrial haplogroups (also Y haplogroups for paternal lineage)

148-151 [Diagram built up across slides; labels: CRP, CVD, IL-6]

153-155 Technical aspects of study design and data analysis
Study designs, assay technologies, and statistical methods
1. “Gene discovery” (e.g., genetic epidemiology)
   Candidate gene studies
   Genome-wide association studies
   - Marker selection for blind search: tag SNPs
   - Massively parallel genotyping
     a. Array-based strategies
     b. Deep resequencing
   - Statistical models
     a. Main effect models
     b. Interaction models
     c. Managing Type I error
        - Bonferroni & FDR
        - Internal cross-validation
        - External replication

162-164 Fisher’s regression
[Figure: Outcome by genotype (GG, GC, CC), overall and separately for Environment A and Environment B]
y = a + b(#G)
y = a + b(GG) + c(GC) + d(CC)
y = a + b(#G) + c(Env) + d(#G x Env)
y = a + b(GG) + c(GC) + d(CC) + e(Env) + f(Env x GG) + g(Env x GC) + h(Env x CC)

166-169 Type 1 / false positive error:
Confirmatory hypothesis testing (candidate genes)
   1 hypothesis = 1 t-test = 1 p-value = no problem: p < .05 = p < .05
Gene mapping (exploratory association testing)
   Gene expression: 22,000 p-values = 1,100 false positives (p < .05); p(false discovery > 0) = […]
   Gene polymorphism: 10,000,000 p-values = 500,000 false positives (p < .05); p(false discovery > 0) = […]
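The arithmetic behind these counts can be sketched as follows. The expected number of false positives is the number of tests times alpha. The probability of at least one false positive is computed here under two assumptions the slides do not state: that the tests are independent and that every null hypothesis is true. The function names are illustrative.

```python
import math

def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of p-values below alpha when every null hypothesis is true."""
    return n_tests * alpha

def prob_at_least_one(n_tests, alpha=0.05):
    """P(at least one false positive), assuming independent tests and all nulls true.

    Computed as 1 - (1 - alpha)**n_tests using log1p/expm1 for numerical stability.
    """
    return -math.expm1(n_tests * math.log1p(-alpha))

for label, m in [("gene expression", 22_000), ("gene polymorphism", 10_000_000)]:
    print(label, expected_false_positives(m), prob_at_least_one(m))
```

With a single test the probability is simply alpha; with tens of thousands of tests it is indistinguishable from 1 under these assumptions.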

170-181 What to do?
1. Increase stringency (intra-study)
   Bonferroni correct (p = .05/22,000 = […])
   Choice: huge samples or massive Type 2 “false negative” error
   Model/simulate error
   Randomization test or FDR modeling = less conservative bias
   Unimpressive yield: p = […] if you’re lucky. Still too conservative, and biased (omitted true effects in error term)
   Use a better sampling design
   Population prevalence design
   Outcome-stratified design
2. Replicate (inter-study or intra-study cross-validation)
   .05 x .05 x .05 x 22,000 = 2.75 false positives (vs. 1,100)
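A hedged sketch of the options listed above. The Bonferroni threshold and the replication arithmetic follow the slide directly. For "FDR modeling" the sketch swaps in the Benjamini-Hochberg step-up procedure as one common choice; the slides do not say which FDR method was meant. The replication figure assumes three independent studies with all nulls true. Function names and the example p-values are illustrative.

```python
import numpy as np

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

def benjamini_hochberg(p_values, q=0.05):
    """Boolean mask of discoveries under the Benjamini-Hochberg step-up procedure."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    below = ranked <= q * np.arange(1, m + 1) / m
    mask = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest rank meeting its threshold
        mask[order[: k + 1]] = True        # reject every hypothesis up to that rank
    return mask

def false_positives_after_replication(n_tests, alpha=0.05, n_studies=3):
    """Expected false positives surviving n_studies independent tests (all nulls true)."""
    return n_tests * alpha ** n_studies

# Hypothetical p-values, for illustration only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
bonf = np.array(p_values) <= bonferroni_threshold(0.05, len(p_values))
bh = benjamini_hochberg(p_values, q=0.05)

print("Bonferroni threshold for 22,000 tests:", bonferroni_threshold(0.05, 22_000))
print("Bonferroni discoveries:", int(bonf.sum()), "BH discoveries:", int(bh.sum()))
print("False positives after 3-fold replication:", false_positives_after_replication(22_000))
```

On these made-up p-values Benjamini-Hochberg keeps one more result than Bonferroni, which is the "less conservative" trade-off the slide refers to.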

183-185 Technical aspects of study design and data analysis
Study designs, assay technologies, and statistical methods
1. “Gene discovery” (e.g., genetic epidemiology)
   Candidate gene studies
   Genome-wide association studies
   The bioinformatic “middle road” – biological hypotheses buy power
   - Candidate set selection
     a. Regulatory polymorphism
     b. Coding polymorphism
   - Statistical considerations
     a. Power
     b. Differential enrichment

186-189 In silico prediction of Gene x Environment Interaction
[Figures repeated from slides 46-53, here labeled In silico (V$GATA1_01 = .943 and .619), In vitro (IL6 promoter activity, WT vs. -174C, by norepinephrine (µM); difference: p < .0001), and In vivo (Non-depressed vs. Depressed in panels labeled IL GG and IL CC/GC; p = .008 and p = .439)]

190 [Slide listing a long series of gene symbols, many paired with a negative numeric offset (e.g., PPARG -584, NFKBIA -963, CEBPB -978, HLA-C -512); most entries are truncated in the transcript]
FLJ LOC SIGLEC ZNF ZNF ZNF ZNF NALP PRDM LOC PADI FLJ DJ462O PPP1R8 5 ATPIF LOC CGI FLJ UROD -715 LOC DKFZp761D DKFZp761D IL23R -322 CTH -6 AK DNAJB CDC LOC DCLRE1B -406 LOC LOC LOC LOC LOC BNIPL -420 BNIPL -419 SPRR1B -826 IL6R -110 CKS1B -983 SYT PMF LOC FY -397 NCSTN -809 HSPA HSPA CGI-01 7 DKFZP564J HFL HFL NEK MGC OR2AK LOC LOC LOC LOC ARID3A -913 LOC MGC TRAPPC LOC OR7C OR10H OR10H LOC HSPC PGLS -935 LOC ZNF CLECSF GRE-modifying SNPs

191 Gene set enrichment analysis
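To make the enrichment idea concrete, here is a minimal sketch. It is not the rank-based gene set enrichment procedure the slide title may refer to; it swaps in the simpler over-representation (hypergeometric) test, which asks whether a gene set overlaps a selected gene list more than chance would predict. The function name and all counts are hypothetical.

```python
from math import comb


def overrepresentation_p_value(
    n_genome: int, n_in_set: int, n_selected: int, n_overlap: int
) -> float:
    """P(X >= n_overlap) for X ~ Hypergeometric(n_genome, n_in_set, n_selected)."""
    max_overlap = min(n_in_set, n_selected)
    total = comb(n_genome, n_selected)
    # Sum the upper tail: every overlap at least as large as the one observed.
    tail = sum(
        comb(n_in_set, k) * comb(n_genome - n_in_set, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 20,000 genes, 200 in the annotated set,
# 100 differentially expressed genes, 8 of which fall in the set.
p_value = overrepresentation_p_value(20_000, 200, 100, 8)
print(f"over-representation p-value: {p_value:.3g}")
```

A small p-value here only says the overlap is larger than random draws would give; it does not account for correlation among genes.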

192 Technical aspects of study design and data analysis
Study designs, assay technologies, and statistical methods
1. “Gene discovery” (e.g., genetic epidemiology)
   Candidate gene studies
   Genome-wide association studies
   The bioinformatic “middle road” – biological hypotheses buy power
   - Candidate set selection
     a. Regulatory polymorphism
     b. Coding polymorphism
   - Statistical considerations
     a. Power
     b. Differential enrichment

193 Population prevalence design Outcome-stratified design

194 Population prevalence design GEscan Outcome-stratified design

199 Technical take-home points: Strengths & weaknesses of alternative approaches
1. Candidate gene studies: focus on 1 candidate
   Advantages
   - Scientifically tractable: incremental & cross-validatable
   - Maximal statistical power (focused hypothesis)
   Disadvantages
   - Can only “discover” what we already know (i.e., biased)
2. Genome-wide association studies: focus on all candidates
   Advantages
   - Unbiased de novo discovery
   Disadvantages
   - Minimal statistical power, particularly for interactions
3. The bioinformatic “middle road”: focus on a small set of causally plausible candidates (unbiased search of regulatory and coding SNPs)
   Advantages
   - Scientifically tractable: “short leap of inference” & cross-validatable
   - Relatively high statistical power (focus on 1-10% of plausible SNPs)
   Disadvantages
   - Likely missing some true causal genetic influences
   - Bioinformatically intensive – thought (and programming) required
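The power trade-off among the three approaches can be sketched numerically. The example below is an illustration under stated assumptions, not the presenters' calculation: it approximates the power of a two-sided z-test at a Bonferroni-corrected level, and the test counts and effect size are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def bonferroni_power(effect_z: float, n_tests: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test at a Bonferroni-corrected level.

    effect_z is the expected z-statistic of a true effect (effect / standard error).
    """
    per_test_alpha = alpha / n_tests
    critical_z = _STD_NORMAL.inv_cdf(1.0 - per_test_alpha / 2.0)
    # Probability that |Z| exceeds the critical value when Z ~ N(effect_z, 1).
    upper = 1.0 - _STD_NORMAL.cdf(critical_z - effect_z)
    lower = _STD_NORMAL.cdf(-critical_z - effect_z)
    return upper + lower


# Hypothetical test counts for the three strategies.
strategies = {
    "candidate gene (1 test)": 1,
    "middle road (1% of 1e6 SNPs)": 10_000,
    "genome-wide (1e6 SNPs)": 1_000_000,
}

for label, n_tests in strategies.items():
    print(f"{label}: power = {bonferroni_power(3.0, n_tests):.3f}")
```

For a fixed true effect, power falls as the number of tests grows, which is the sense in which a focused hypothesis "buys" power.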

208 Take-home points for this group:
1. Gene-Environment interactions are likely far more…
   - ubiquitous
   - large in effect size
   - clinically/socially meaningful
   …than current genetic analyses presume.
   There is plenty left for you to find.
2. If you have the study you have (i.e., can’t alter sampling design), your major opportunities for increasing power/discovery involve:
   - focusing on substantive effects that are true/big (e.g., GxE, not G, given antagonistic pleiotropy; E, ExE, GxG, etc.)
   - modeling biological mechanisms to focus power/impose constraints (e.g., candidate systems, functional themes, regulatory themes)
   - combinatorial data-mining (e.g., machine learning in discovery sample)
   - sequential testing designs (low stringency discovery, medium stringency test, high stringency confirm)
   Your advantage is smart data analysis.
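The sequential testing design (low stringency discovery, medium stringency test, high stringency confirm) can be sketched as a filter applied across independent samples. This is a minimal illustration; the function name, candidate labels, p-values, and thresholds are all hypothetical.

```python
from typing import Dict, List, Sequence


def sequential_screen(
    stage_p_values: Sequence[Dict[str, float]],
    thresholds: Sequence[float],
) -> List[str]:
    """Return candidates that pass every stage, each stage stricter than the last.

    stage_p_values[i] maps candidate name -> p-value observed in the i-th
    (independent) sample; thresholds[i] is the cutoff applied at that stage.
    """
    if len(stage_p_values) != len(thresholds):
        raise ValueError("need one threshold per stage")

    survivors = list(stage_p_values[0])  # every candidate enters stage 1
    for p_values, cutoff in zip(stage_p_values, thresholds):
        # Only candidates that survived earlier stages are tested again.
        survivors = [c for c in survivors if c in p_values and p_values[c] <= cutoff]
    return survivors


# Hypothetical p-values for four SNP x environment terms in three samples.
discovery = {"snpA:stress": 0.04, "snpB:stress": 0.20, "snpC:diet": 0.08, "snpD:diet": 0.01}
test = {"snpA:stress": 0.003, "snpC:diet": 0.30, "snpD:diet": 0.009}
confirm = {"snpA:stress": 0.0004, "snpD:diet": 0.02}

confirmed = sequential_screen([discovery, test, confirm], thresholds=[0.10, 0.01, 0.001])
print(confirmed)
```

Because only survivors are carried forward, the later, stricter stages face far fewer tests than an exhaustive sweep would.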

209 Follow-up references
Overview of genetics / biology
- Attia, J., et al. (2009) How to use an article about genetic association: A: Background concepts. JAMA, 301.
Genetic association studies
- Hirschhorn, J., & Daly, M. (2005) Genome-wide association studies for common diseases and complex traits. Nature Reviews Genetics, 6.
- Attia, J., et al. (2009) How to use an article about genetic association: B: Are the results of the study valid? JAMA, 301.
- Cordell, H., & Clayton, D. (2005) Genetic epidemiology 3: Genetic association studies. Lancet, 366.
Basic statistical modeling for genetics
- Siegmund, D., & Yakir, B. (2007) The statistics of gene mapping. New York, Springer.
Sampling & statistical approaches for GxE discovery
- Thomas, D. (2010) Gene-environment-wide association studies: emerging approaches. Nature Reviews Genetics, 11.
Statistical strategies for combinatorial discovery
- Hastie, T., Tibshirani, R., & Friedman, J. (2001) The elements of statistical learning. New York, Springer.

210 Perspectives on the State of the Field How can we best promote the integration of genetic and demographic approaches?

211 Application clinic
Open microphone
1. What do you want to accomplish?
2. At what stage are you now?
   i. Study design?
   ii. Data collection?
   iii. Analysis and reporting?
3. How can we be of help?

214 Technical aspects of study design and data analysis
Study designs, assay technologies, and statistical methods
1. “Gene discovery” (e.g., genetic epidemiology)
   Candidate gene studies
   Genome-wide association studies
   The bioinformatic “middle road” – biological hypotheses buy power
2. Environmental regulation of health (via transcription)
   Candidate transcript studies
   - RT-PCR
   - Statistical analyses incorporating temporal & spatial heterogeneity
   Genome-wide approaches
   - Microarrays
   - Theme discovery
     a. Functional (Gene Ontology)
     b. Regulatory (TELiS)
     c. Spatial (SpAnGEL)

215 RNA DNA RT

216 Antiviral cytokine mRNA [Figure: two panels of interferon (IFN) mRNA, fold-induction over baseline, vs. exposure (hrs.), one labeled “consensus”; conditions: CpG and CpG + NE] Collado-Hidalgo et al (2006) Brain, Behavior and Immunity

218 SIV RNA (in situ hybridization) [Figure: SIV replication (sites / spatial quadrat) by social stress (- vs. +), p < .0001, and by SNS neurons (- vs. +), p < .0001] Sloan et al. (2006) Journal of Virology; Sloan et al. (2007) Journal of Neuroscience

221 Lonely Integrated Social isolation J. Cacioppo Genome Biology,

222 Palmer et al. BMC Genomics (2006)

224 Social Environment Gene Biological function IL6 RNA DNA

231 Lonely Integrated Social isolation J. Cacioppo Genome Biology, Inflammation Cell growth/differentiation Transcription control Immunoglobulin production Type I interferon antiviral response

233 TRIM54 ACSBG2 HIST4H4 KLHL32 FLJ35773 GPC4 TRPV4 LBP C20ORF200 ASB15 OCLM

238 Sp1 CREB NF-κB

241 Sp1 CREB NF-κB Environment Sequence Expression Promoter Sequence

246 Sp1 CREB NF-κB Environment Sequence Expression Promoter Sequence ?

251 Cole et al (2005) Bioinformatics, 21, 803

256 Lonely Integrated Social isolation J. Cacioppo Genome Biology, NF-κB GRE

262 NaB de-repression - fibroblast

271 [Figure: schematic of gene 1 through gene 41 and their regulators, built up across slides: TF 1, TF 2, TF 3; miRNA 1, miRNA 2, miRNA 3; DNMT1, DNMT2, DNMT3]

274 Technical aspects of study design and data analysis
Study designs, assay technologies, and statistical methods
1. “Gene discovery” (e.g., genetic epidemiology)
   Candidate gene studies
   Genome-wide association studies
   The bioinformatic “middle road” – biological hypotheses buy power
2. Environmental regulation of health (via transcription)
   Candidate transcript studies
   Genome-wide approaches
3. Gene-Environment interaction
   Statistical considerations
   - Main effects and antagonistic pleiotropy
   - Interaction models
   - Combinatorial discovery
   Revisiting the “bioinformatic” middle road
   - Candidate set selection
     a. Regulatory polymorphism
     b. Coding polymorphism

275 Fisher’s regression:
[Figure: Outcome by genotype (GG, GC, CC)]
y = a + b(#G)
y = a + b(GG) + c(GC) + d(CC)

276 Fisher’s regression:
[Figure: Outcome by genotype (GG, GC, CC) in Environment A and Environment B]
y = a + b(#G) + c(Env) + d(#G x Env)
y = a + b(GG) + c(GC) + d(CC) + e(Env) + f(Env x GG) + g(Env x GC) + h(Env x CC)
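The allele-count models above can be fitted by ordinary least squares. The sketch below is an illustration only: the data are hypothetical and noise-free, constructed so that the G allele raises the outcome in Environment A and lowers it in Environment B, and the small solver stands in for whatever statistical package one would normally use.

```python
from itertools import product
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def ols(design: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients via the normal equations (X'X) beta = X'y."""
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical balanced data: #G in {0, 1, 2}, Env in {0 (A), 1 (B)}.
cells = list(product([0, 1, 2], [0, 1]))
outcome = [1.0 + 0.5 * g - 1.0 * g * env for g, env in cells]

# y = a + b(#G)
main_only = ols([[1.0, g] for g, _ in cells], outcome)
# y = a + b(#G) + c(Env) + d(#G x Env)
with_gxe = ols([[1.0, g, env, g * env] for g, env in cells], outcome)

print("main-effect slope b:", round(main_only[1], 6))
print("GxE model [a, b, c, d]:", [round(v, 6) for v in with_gxe])
```

In this constructed example the pooled main-effect slope is zero even though the genotype matters in both environments, which is why the interaction term is needed to see it. Note that the genotype-indicator form with an intercept is only identifiable if one genotype is dropped as the reference category.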

278 Combinatorial explosion
10^7 SNPs x environments = interaction terms
N = 2,000-20,000 for current main effect studies
Given that power/effect size, need 2 million subjects for interaction sweep.
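The arithmetic behind the explosion is simple to sketch. In the example below only the 10^7 SNP count comes from the slide; the number of environments is a hypothetical placeholder, and the function names are illustrative.

```python
def interaction_term_count(n_snps: int, n_environments: int) -> int:
    """Number of pairwise SNP x environment terms in an exhaustive sweep."""
    return n_snps * n_environments


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


n_snps = 10**7       # from the slide
n_environments = 10  # hypothetical value chosen for illustration
n_terms = interaction_term_count(n_snps, n_environments)
print(n_terms, bonferroni_threshold(n_terms))
```

Each added environment multiplies the number of tests, so the per-test threshold shrinks and the sample size needed to clear it grows.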

279 What to do?
1. Increase stringency (intra-study)
   Bonferroni correct / FDR correct
   Model/simulate error
   Use a better sampling design
2. Replicate (inter-study or intra-study cross-validation)
3. Get a hypothesis
   - Biological
   - Empirical
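The two corrections named under "Increase stringency" can be sketched side by side. This is a minimal illustration with hypothetical p-values; the FDR function implements the Benjamini-Hochberg step-up procedure, which is one common FDR method and an assumption about what the slide intends.

```python
from typing import List, Sequence


def bonferroni_reject(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Reject H0 where p <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg_reject(p_values: Sequence[float], q: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k (1-based) with p_(k) <= k * q / m; reject ranks 1..k.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            max_rank = rank
    reject = [False] * m
    for idx in order[:max_rank]:
        reject[idx] = True
    return reject


# Hypothetical p-values for five candidate interaction terms.
p_values = [0.001, 0.012, 0.02, 0.03, 0.60]
print(bonferroni_reject(p_values))
print(benjamini_hochberg_reject(p_values))
```

Bonferroni guards against any false positive and is the stricter of the two; the FDR approach tolerates a controlled fraction of false discoveries and therefore retains more candidates.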

286 Combinatorial discovery strategies
Smart study design + smart statistics + biological constraint
Stratified sampling
Multi-stage testing
Cross-validation
Data-mining / Machine learning
- CART/forests
- MARS
- PRIM
Functional pathways
Regulatory pathways
Chromosomal units

287 Transcriptional activity (fold-change) IL6 promoter: WT -174C Norepinephrine (  M): Difference: p <.0001 IL6 TCT TGCGATGCTA AAG C V$GATA1_01 =.943 V$GATA1_01 =.619 In silico prediction of Gene x Environment Interaction In silico In vitro

288 [Figure: genome-wide list of gene symbols with SNP positions] GRE-modifying SNPs

289 Population prevalence design GEscan Outcome-stratified design

290 Coding sequence polymorphisms

291 [Figure: schematic of gene 1 through gene 41 with regulators TF 1, TF 2, TF 3]

297 Combinatorial discovery strategies
Smart study design + smart statistics + biological constraint
Why is this critical?
Antagonistic pleiotropy is the norm → GxE
Epistatic interaction is the norm → GxG
High-order interactions are likely normal → GxGxExE
Low power, “replication failure”, and epistemological slop - the missing “h”, and the missing “E”

298 Technical aspects of study design and data analysis Study designs, assay technologies, and statistical methods 1.“Gene discovery” (e.g., genetic epidemiology) Candidate gene studies Genome-wide association studies The bioinformatic “middle road” – biological hypotheses buy power 2.Environmental regulation of health (via transcription) Candidate transcript studies Genome-wide approaches 3.Gene-Environment interaction Statistical considerations - Main effects and antagonistic pleiotropy - Interaction models - Combinatorial discovery Revisiting the “bioinformatic” middle road - Candidate set selection a.Regulatory polymorphism b.Coding polymorphism

299 Technical aspects of study design and data analysis
Study designs, assay technologies, and statistical methods
1. “Gene discovery” (e.g., genetic epidemiology)
   - Candidate gene studies
   - Genome-wide association studies
   - The bioinformatic “middle road” – biological hypotheses buy power
2. Environmental regulation of health (via transcription)
   - Candidate transcript studies
   - Genome-wide approaches
3. Gene-Environment interaction
   - Statistical considerations
   - Revisiting the “bioinformatic” middle road

308 Take-home points for this group:
1. Gene-Environment interactions are likely far more…
   - ubiquitous
   - large in effect size
   - clinically/socially meaningful
   …than current genetic analyses presume. There is plenty left for you to find.
2. If you have the study you have (i.e., can’t alter sampling design), your major opportunities for increasing power/discovery involve:
   - focusing on substantive effects that are true/big (e.g., GxE, not G, given antagonistic pleiotropy; E, ExE, GxG, etc.)
   - modeling biological mechanisms to focus power/impose constraints (e.g., candidate systems, functional themes, regulatory themes)
   - combinatorial data-mining (e.g., machine learning in discovery sample)
   - sequential testing designs (low stringency discovery, medium stringency test, high stringency confirm)
Your advantage is smart data analysis.
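The sequential testing idea (low stringency discovery, medium stringency test, high stringency confirm) can be sketched as a three-stage screen. This is a minimal illustration under stated assumptions, not the presenters' procedure: the candidate names, effect sizes, per-stage sample size, and the thresholds 0.10 / 0.01 / 0.001 are all hypothetical, each stage is simulated as an independent sample, and a simple z-test on a difference in means stands in for whatever test a real study would use.

```python
import random
from statistics import NormalDist, mean, stdev


def two_sided_p(xs, ys):
    """Approximate two-sided p-value for a difference in means (z-test)."""
    se = (stdev(xs) ** 2 / len(xs) + stdev(ys) ** 2 / len(ys)) ** 0.5
    z = (mean(xs) - mean(ys)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_stage(effects, n, rng):
    """Draw a fresh sample of n carriers and n non-carriers per candidate."""
    pvals = {}
    for name, effect in effects.items():
        carriers = [rng.gauss(effect, 1.0) for _ in range(n)]
        noncarriers = [rng.gauss(0.0, 1.0) for _ in range(n)]
        pvals[name] = two_sided_p(carriers, noncarriers)
    return pvals


def sequential_screen(effects, n_per_stage, thresholds, seed=0):
    """Carry forward only candidates that pass each successively stricter stage."""
    rng = random.Random(seed)
    survivors = dict(effects)
    history = []
    for alpha in thresholds:
        pvals = simulate_stage(survivors, n_per_stage, rng)
        survivors = {k: v for k, v in survivors.items() if pvals[k] < alpha}
        history.append(sorted(survivors))
    return history


# Hypothetical candidate set: 95 null candidates and 5 with a true effect.
effects = {f"null_{i}": 0.0 for i in range(95)}
effects.update({f"true_{i}": 0.5 for i in range(5)})

thresholds = (0.10, 0.01, 0.001)  # illustrative stringency levels only
history = sequential_screen(effects, n_per_stage=200, thresholds=thresholds)
for alpha, kept in zip(thresholds, history):
    print(f"alpha={alpha}: {len(kept)} candidates remain")
```

The design choice here is that a lenient first stage keeps true but modest effects alive, while the later, stricter stages on fresh samples are what weed out the false positives the first stage lets through.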

309 Follow-up references
Overview of genetics / biology
- Attia, J., et al. (2009) How to use an article about genetic association: A: Background concepts. JAMA, 301.
Genetic association studies
- Hirschhorn, J., & Daly, M. (2005) Genome-wide association studies for common diseases and complex traits. Nature Reviews Genetics, 6.
- Attia, J., et al. (2009) How to use an article about genetic association: B: Are the results of the study valid? JAMA, 301.
- Cordell, H., & Clayton, D. (2005) Genetic epidemiology 3: Genetic association studies. Lancet, 366.
Basic statistical modeling for genetics
- Siegmund, D., & Yakir, B. (2007) The statistics of gene mapping. New York, Springer.
Sampling & statistical approaches for GxE discovery
- Thomas, D. (2010) Gene-environment-wide association studies: emerging approaches. Nature Reviews Genetics, 11.
Statistical strategies for combinatorial discovery
- Hastie, T., Tibshirani, R., & Friedman, J. (2001) The elements of statistical learning. New York, Springer.

310 Perspectives on the State of the Field How can we best promote the integration of genetic and demographic approaches?

311 Application clinic
Open microphone
1. What do you want to accomplish?
2. At what stage are you now?
   i. Study design?
   ii. Data collection?
   iii. Analysis and reporting?
3. How can we be of help?

314 Richlin et al. Brain, Behavior & Immunity (2004)

