1 ARIC Why Large Databases as a Topic?
  Publicly available
  Many relevant questions unanswered
  Information about health disparities
  Good for testing/validating new methods
  Health care, not econometrics; cohort, not claims

2 ARIC Outline ARIC Study Description Science Results Health disparities Genomic data Other large databases

3 ARIC The Atherosclerosis Risk in Communities (ARIC) Study is an NHLBI-sponsored study of cardiovascular disease in four communities in the United States. Includes a Community Surveillance and a Cohort Component.

6 Cohort Component
  Probability samples of 4 communities
  15,792 men and women 45-64 yrs at baseline examination (1987-1989)
  Re-examined every three years: 1987-1989, 1990-1992, 1993-1995, 1996-1998
  Extensive examinations include medical, social and demographic data
  Annual follow-ups by telephone to maintain contact and assess health status

7 ARIC Community Surveillance Component CVD endpoint surveillance of all residents of the 4 communities, ages 35-74 years Ascertainment and classification of coronary and cerebral clinical events, trends over time


9 Characteristics of the Four ARIC Communities
  Study Community            Population,   Population,   % Black   % >12 education
                             ages 35-74    total
  Forsyth County, NC             95,863       243,683        24          63
  Jackson, MS                    68,303       202,895        48          71
  Minneapolis suburbs, MN        69,338       192,004         1          85
  Washington County, MD          45,539       113,068         4          60
  US Total                      279,043       751,650

10 ARIC Measure Variation in Cardiovascular Risk Factors, Medical Care & Disease by Race, Sex, Place & Time
  ARIC communities differ in their reported cardiovascular mortality rates; atherosclerosis prevalence rates may also differ
  Ecologic comparison of community rates with factors that may influence these rates
  Age-adjusted mortality rates* for men & women aged 35-74 years in ARIC study communities, 1980
  Study Community            All-Cause Mortality     Heart Disease Mortality
                             Men       Women         Men       Women
  Forsyth County, NC         16.
  Jackson, MS                20.8      10.0          6.6       2.9
  Minneapolis suburbs, MN    9.
  Washington County, MD      16.
  US Total                   14.
  *indirect age adjustments; annual rate per 1,000 population
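
The footnote on this slide mentions indirect age adjustment. A minimal sketch of that calculation follows, assuming the usual approach: apply a standard population's age-specific rates to the community's age structure to get expected deaths, form the ratio of observed to expected deaths, and scale the standard population's crude rate by that ratio. The function name and all numbers in the example are hypothetical and are not ARIC data.

```python
from typing import Sequence


def indirect_adjusted_rate(
    observed_deaths: float,
    community_population_by_age: Sequence[float],
    standard_rates_by_age: Sequence[float],
    standard_crude_rate_per_1000: float,
) -> float:
    """Indirectly age-adjusted rate per 1,000 population.

    standard_rates_by_age are deaths per person per year in the
    standard population, one entry per age stratum.
    """
    if len(community_population_by_age) != len(standard_rates_by_age):
        raise ValueError("age strata must line up")
    # Deaths expected if the community had the standard age-specific rates.
    expected = sum(
        pop * rate
        for pop, rate in zip(community_population_by_age, standard_rates_by_age)
    )
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    smr = observed_deaths / expected  # standardized mortality ratio
    return smr * standard_crude_rate_per_1000


# Hypothetical example: two age strata of 1,000 people each.
rate = indirect_adjusted_rate(
    observed_deaths=30,
    community_population_by_age=[1000, 1000],
    standard_rates_by_age=[0.005, 0.015],
    standard_crude_rate_per_1000=10.0,
)
print(rate)
```

Here the expected count is 20 deaths, so 30 observed deaths give a ratio of 1.5 and an adjusted rate 1.5 times the standard crude rate.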

11 ARIC Sampling Framework
  Probability sample from the previous census, except for Jackson, MS, which is an all-black sample
  In Forsyth, the original sampling unit was a household; in the other three locations, the sampling unit was an individual
  Jackson, MS – driver's license database
  Minneapolis, MN – eligible for jury duty (driver's license and voters)
  Washington County, MD – driver's license database

13 ARIC Achilles' Heel
  Given what I just said, what is ARIC's Achilles' heel?
  Confounding between race and geography! Terrible decision!

14 ARIC Elements of Baseline Examination Sitting blood pressure – 3 measurements w/ random zero sphygmomanometer Anthropometry – weight, standing & sitting height, triceps & subscapular skinfolds, waist, hip, arm & calf girths, wrist breadth Venipuncture – fasting blood samples for lipids, hemostasis, hematology & chemistry Electrocardiogram – digitally recorded 12-lead electrocardiogram & 2-minute rhythm strip

15 ARIC Lipid Determinations
  8 or 12 hour (overnight) fast information
  Central laboratory CDC certification
  Cholesterol measured enzymatically
  HDL measured by precipitation
  LDL estimated by Friedewald formula: LDL = Total - HDL - (Trigs/5)
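
A minimal sketch of the Friedewald estimate named on this slide, assuming all inputs are in mg/dL. The function name, the example values, and the 400 mg/dL triglyceride cutoff (above which the estimate is generally considered unreliable) are assumptions added for illustration and do not come from the slides.

```python
from typing import Optional


def friedewald_ldl(
    total_chol: float,
    hdl: float,
    triglycerides: float,
    max_triglycerides: float = 400.0,  # assumed validity limit, not from the slide
) -> Optional[float]:
    """Estimate LDL cholesterol (mg/dL) as Total - HDL - (Trigs / 5).

    Returns None when triglycerides are at or above the assumed limit,
    because the Trigs/5 approximation is not used in that range.
    """
    if triglycerides >= max_triglycerides:
        return None
    return total_chol - hdl - triglycerides / 5.0


# Illustrative values only: 210 - 50 - 150/5
ldl = friedewald_ldl(total_chol=210.0, hdl=50.0, triglycerides=150.0)
print(ldl)  # 130.0
```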

16 ARIC ATP III Classification (mg/dL)
  LDL Cholesterol      Description
  <100                 Optimal
  100-129              Near optimal
  130-159              Borderline high
  160-189              High
  ≥190                 Very high
  Total Cholesterol    Description
  <200                 Desirable
  200-239              Borderline high
  ≥240                 High
  HDL Cholesterol      Description
  <40                  Low
  ≥60                  High
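
The cutoffs on this slide can be expressed as a small lookup. This is a sketch only: the function names are hypothetical, values are assumed to be in mg/dL, and the top category is taken to start at the cutoff itself (LDL of 190 and above, total cholesterol of 240 and above).

```python
def classify_ldl(ldl_mg_dl: float) -> str:
    """Map an LDL cholesterol value (mg/dL) to its ATP III description."""
    if ldl_mg_dl < 100:
        return "Optimal"
    if ldl_mg_dl < 130:
        return "Near optimal"
    if ldl_mg_dl < 160:
        return "Borderline high"
    if ldl_mg_dl < 190:
        return "High"
    return "Very high"


def classify_total_cholesterol(total_mg_dl: float) -> str:
    """Map a total cholesterol value (mg/dL) to its ATP III description."""
    if total_mg_dl < 200:
        return "Desirable"
    if total_mg_dl < 240:
        return "Borderline high"
    return "High"


print(classify_ldl(138.0))               # Borderline high
print(classify_total_cholesterol(205.0))  # Borderline high
```

The first call uses 138.0 mg/dL, the mean LDL reported for black women in the cohort baseline table, which falls in the borderline-high band.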

17 ARIC Elements of Baseline Examination (contd) Ultrasound, postural change – B-mode scan for wall & lumen measurements in both carotid arteries & 1 popliteal artery; supine brachial & ankle blood pressures, heart rate & blood pressures as participant rises Interview – medical history, physical activity, TIA & respiratory symptoms, reproductive history, medication use, food frequency Pulmonary function – digitally recorded forced vital capacity & timed expiratory volumes

18 ARIC Elements of Baseline Examination (contd) Physical exam – brief exam including heart, lungs & extremities; neurologic & breast exam Medical data review – verify selected positive findings, report selected results to participants, refer for diagnosis or treatment Reporting of results (deferred) – mail results from routine medical tests to participants & their physicians

19 ARIC Definition of Hypertension
  Systolic BP > 140 mmHg, or
  Diastolic BP > 90 mmHg, or
  Regular use of medications for high blood pressure or hypertension (participants brought all medications with them to the examination)
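
A sketch of this definition as a predicate, assuming that meeting any one of the three criteria is sufficient and using the strict inequalities as written on the slide. The function and parameter names are hypothetical; whether the study's own coding treats readings of exactly 140 or 90 mmHg as hypertensive is not stated here, so the cutoffs are left as parameters.

```python
def has_hypertension(
    systolic_mmhg: float,
    diastolic_mmhg: float,
    on_bp_medication: bool,
    systolic_cutoff: float = 140.0,
    diastolic_cutoff: float = 90.0,
) -> bool:
    """Return True if any of the three slide criteria is met."""
    return (
        systolic_mmhg > systolic_cutoff
        or diastolic_mmhg > diastolic_cutoff
        or on_bp_medication
    )


# Normal readings, but the participant brought antihypertensive medication.
print(has_hypertension(124, 78, on_bp_medication=True))   # True
print(has_hypertension(124, 78, on_bp_medication=False))  # False
```

Including medication use matters because treated participants can have controlled readings and would otherwise be misclassified as normotensive.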

20 ARIC Measurements of the Environment Smoking Alcohol Diet Exercise Education and income Psychosocial Employment GIS Some previous exposures Medications Biomarkers

21 ARIC Data Collection & Quality Control Immediate entry of data from interviews & exams into computer-assisted data collection system; data monitoring Trained & certified staff; monitored performance; implement recertification & retraining as needed Selected measures repeated during exams by same & different technicians Duplicate blood samples drawn & shipped to labs with separate IDs; duplicate electrocardiograms transmitted blindly to ECG center

22 ARIC Study Questions It is better to know some of the questions than all of the answers. James Thurber

23 ARIC Study Questions Diversity of measurements included in ARIC permits many important questions to be addressed 3 primary objectives Investigate the etiology and natural history of atherosclerosis Investigate the etiology of clinical atherosclerotic diseases (especially incident diseases) Measure variation in cardiovascular risk factors, medical care and disease by race, sex, place and time

24 ARIC Investigate the Etiology & Natural History of Atherosclerosis Ultrasound used to identify signs of early arterial disease Arterial wall dimensions; Arterial distensibility Expect atherosclerosis to be associated with the following lipid parameters Elevated levels of total cholesterol, LDL-C, apoB, Lp(a), TGs Reduced levels of HDL-C, apoA-I Predominance of small LDL DNA variations in specific genes (apolipoprotein E)

25 ARIC Investigate the Etiology & Natural History of Atherosclerosis (contd) Evaluate associations of atherosclerosis with factors that are less directly related to lipid and thrombosis theories Established risk factors (hypertension, smoking) Fasting insulin and glucose levels Routine hematologic measures (WBC, RBC and platelet counts, hematocrit) Lifestyle factors (diet, physical activity)

26 ARIC Investigate the Etiology of Clinical Atherosclerotic Diseases Study both risk factors and indicators of pre-clinical disease in relation to subsequent incident CHD and stroke Risk factors measured in ARIC permit testing of new hypotheses Indications of preclinical disease include not only ultrasound measurements but also Ankle-arm index of peripheral vascular disease Subtle changes in digitized electrocardiogram

27 ARIC Processed CCA and Plaque Images Fibrous cap segmentationPlaque segmentation CCA segmentation

29 ARIC Measurement/ascertainment of incident disease Study both risk factors and indicators of pre-clinical disease in relation to subsequent incident CHD and stroke Ascertainment of incident disease Limited to CHD, CVD, Stroke (hospitalized) Goal is 100% ascertainment Annual telephone contact Hospital record abstraction Death certificates, death indices Adjudication

30 ARIC Effects of Study Design
  ARIC's ability to meet its objectives is enhanced by several design features
  Consistency is evaluated by studying associations in four geographic locations among men, women, blacks and whites
  Generalizability is examined by nesting cohorts into communities covered by broad surveillance
  Permits interpretation of study results in terms of representativeness of cohort participants & their CHD events in their communities & the characteristics of those communities

31 ARIC Effects of Study Design (contd) Surveillance rates are monitored and validated by each community cohort in two ways Replication of event identification, investigation and diagnosis activity Greater effort for accuracy that is afforded each potential cohort event Cohorts also provide information on risk factors, preclinical disease and medical care which are used to interpret the rates of clinical disease found in surveillance

32 ARIC Effects of Study Design (contd) ARIC cohort study is prospective Design of choice for identifying precursors of disease Important for studying any potential risk factor that may be influenced by disease or by changes in medications, diet or habits resulting from disease ARIC observes directly the early signs of atherosclerosis, assessing the association of factors with atherosclerosis in particular Attempts to unravel some complexity by investigating risk factor associations with both atherosclerosis and its clinical sequelae

36 A Sampling of ARIC Cohort Publications Risk factors / predictors of prevalent and incident: Coronary heart disease Stroke Diabetes Obesity Hypertension Venous Thromboembolism Renal dysfunction

37 ARIC A Sampling of ARIC Cohort Publications Risk factors / predictors of subclinical vascular diseases: Carotid atherosclerosis Cerebral infarcts, white matter disease Peripheral arterial disease Microvascular retinal disease Arterial stiffness Cardiac autonomic tone

38 ARIC ARIC Ancillary Studies To enhance the value of ARIC, welcome proposals from individual investigators to carry out ancillary studies and to promote the advancement of science An ancillary study is one based on information from ARIC participants in an investigation that is not described in the ARIC protocol Involves data collection or data analyses under additional funding that are not included as part of the routine ARIC data set or data analyses

39 ARIC Active ARIC Ancillary Studies Intimately tied to ARIC, with new data collection and external funding Periodontal disease, subclinical atherosclerosis and CVD Chronic inflammation of endodontic origin Longitudinal investigation of venous thromboembolism Life course SES and CVD Using historical records to reconstruct SES exposures in decedents Physical activity in context of the environment Cardiovascular responses to particulate air pollution

43 ARIC Cohort Baseline Characteristics
  Mean risk factor level   Black Women   Black Men   White Women   White Men
  HDL (mg/dl)                     57.8        50.4          57.4        42.6
  LDL (mg/dl)                    138.0       137.3         135.6       140.0
  BMI (kg/m^2)                    30.8        27.6          26.6        27.4

44 ARIC Summary of Incident Events 1987-2002
              Black Women   Black Men    White Women   White Men
  Stroke      153 (6%)      111 (8%)     135 (2%)      178 (4%)
  CHD         184 (8%)      190 (13%)    389 (7%)      857 (18%)
  *prevalent cases excluded

45 ARIC ARIC Baseline Characteristics: Gender/Racial Differences in Drinking, Smoking and BMI

46 ARIC Gender/Racial Differences in HDL Levels, by Drinking Status
  Low-Mod Drinker = ≤2 drinks/day
  Heavy Drinker = >2 drinks/day
  *P<0.001

47 ARIC Gender/Racial Differences in TG Levels, by Drinking Status
  Low-Mod Drinker = ≤2 drinks/day
  Heavy Drinker = >2 drinks/day
  *P<0.05

48 ARIC CHD Risk is Influenced by Interaction between Drinking Status & Genotype

49 ARIC Stroke Risk is Influenced by Interaction between Drinking Status & Genotype

50 ARIC ARIC: Sustainable Philosophy Role of epidemiologic research in the investigation of etiologic hypotheses is one of active interchange with other disciplines Basic discoveries often come first in epidemiology Importance of specific lipoprotein fractions was found first in population studies, leading to specific investigations of cholesterol transport Multidisciplinary team of ARIC investigators hopes to promote such scientific interchange

51 ARIC ARIC: Future Goals Another examination of the entire cohort. Healthy aging Cognitive decline Imaging Some day we will all know our DNA sequence? First population-based cohort with the complete DNA sequence? Analysis

52 ARIC Genome-wide Association
  300,000 – 1,000,000 markers
  [Figure: cases and controls genotyped at SNP1, SNP2, SNP3, ..., SNPn]
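
A minimal sketch of a single-marker case/control comparison of the kind repeated across every SNP in a genome-wide scan. The slide does not say which test was used; the allelic chi-square test below (1 degree of freedom, on a 2x2 table of allele counts) and the Bonferroni threshold are common choices and are assumptions here. Function names and allele counts are hypothetical.

```python
import math
from typing import Tuple


def allelic_chi_square(
    case_alt: int, case_ref: int, control_alt: int, control_ref: int
) -> Tuple[float, float]:
    """Pearson chi-square (1 df) and p-value for a 2x2 allele-count table."""
    n = case_alt + case_ref + control_alt + control_ref
    row_case = case_alt + case_ref
    row_control = control_alt + control_ref
    col_alt = case_alt + control_alt
    col_ref = case_ref + control_ref
    if 0 in (row_case, row_control, col_alt, col_ref):
        raise ValueError("every row and column total must be non-zero")
    chi2 = (
        n
        * (case_alt * control_ref - case_ref * control_alt) ** 2
        / (row_case * row_control * col_alt * col_ref)
    )
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def bonferroni_threshold(n_markers: int, alpha: float = 0.05) -> float:
    """Per-marker significance threshold when testing n_markers SNPs."""
    return alpha / n_markers


chi2, p = allelic_chi_square(case_alt=60, case_ref=40, control_alt=40, control_ref=60)
print(chi2)  # 8.0
print(p < bonferroni_threshold(1_000_000))  # False
```

With up to 1,000,000 markers, a difference that looks convincing at one SNP (chi-square of 8, p of about 0.005) is far from the corrected threshold, which is one reason replication in independent samples is used.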

53 ARIC Genome-wide Association Scan for CHD
  Genome-wide scan: Ottawa Heart Institute #1
    Cases (n=323): CABG, MI < 60 yrs, no FH, no DM
    Controls (n=312): asymptomatic, > 65 yr
  Replicate 1: Ottawa Heart Institute #2 (304 cases/326 controls)
  Replicate 2: Atherosclerosis Risk in Communities (ARIC) (n=15,782)
  SNPs at successive stages: 2,586 SNPs, 50 SNPs, 2 SNPs

54 ARIC SNP 107 and CHD risk
  [Figure: relative risk and absolute risk of CHD by genotype (AA, AG, GG)]

55 ARIC Predictive Ability of 9p21
  Individual risk factors do not cause large changes in the area under the CHD Risk Score ROC curve.
  ROC curves plot sensitivity against one minus specificity, and they are used by regulatory agencies to evaluate new diagnostics.
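
A minimal sketch of how the area under an ROC curve can be computed, using the fact that it equals the probability that a randomly chosen case has a higher risk score than a randomly chosen non-case (ties count one half). The function name and the scores are hypothetical; comparing this value for a risk score with and without an added marker is the kind of check the slide describes.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve; labels are 1 for cases, 0 for non-cases."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    case_scores = [s for s, y in zip(scores, labels) if y == 1]
    noncase_scores = [s for s, y in zip(scores, labels) if y == 0]
    if not case_scores or not noncase_scores:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for c in case_scores:
        for n in noncase_scores:
            if c > n:
                concordant += 1.0
            elif c == n:
                concordant += 0.5  # ties count as half
    return concordant / (len(case_scores) * len(noncase_scores))


# Hypothetical risk scores for two non-cases and two cases.
auc = roc_auc(scores=[0.10, 0.40, 0.35, 0.80], labels=[0, 0, 1, 1])
print(auc)  # 0.75
```

The pairwise loop is quadratic in sample size, which is fine for illustration; a rank-based calculation would be preferable for a cohort-sized dataset.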

56 ARIC ATP III Guidelines
  ATP III risk categories
    High: CHD and CHD risk equivalents; 10-year risk >20%; LDL-C goal <100 mg/dL
    Mid-high: Multiple (2+) risk factors; 10-year risk 10–20%; LDL-C goal <130 mg/dL
    Mid: Multiple (2+) risk factors; 10-year risk <10%; LDL-C goal <130 mg/dL
    Low: 0–1 risk factor; 10-year risk <10%; LDL-C goal <160 mg/dL
  Rows: ATP III classification using ACRS alone. Columns: ATP III classification using ACRS + 9p21 allele.
  ACRS alone   N (events), % of total   High               Mid-high            Mid                Low
  High         1,870 (372) 18.69%       1760 (360)         109 (12) 3.95%*     0                  0
  Mid-high     2,049 (219) 20.48%       217 (27) 10.59%*   1,701 (179)         131 (13) 6.39%*    0
  Mid          1,737 (80) 17.36%        0                  179 (17) 10.31%*    1,558 (63)         0
  Low          4,349 (107) 43.47%       0                  0                   0                  4,349 (107)
  Total        10,004 (778) (100%)      1,977 (19.76%)     1989 (19.88%)       1,689 (16.88%)     4,349 (43.47%)
  * Percentage of people re-classified. (Number of events on 10 years of follow-up.)
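
The starred percentages can be read as the share of a risk category (under ACRS alone) that moves to a different category once the 9p21 allele is added. A small sketch of that arithmetic for the Mid-high row, using the counts from this slide; the function name is hypothetical, and the interpretation of the denominator as the row total is an assumption.

```python
def percent_reclassified(row_total: int, moved: int) -> float:
    """Percentage of a risk category that lands in a different category."""
    if row_total <= 0:
        raise ValueError("row_total must be positive")
    return 100.0 * moved / row_total


mid_high_total = 2049  # classified Mid-high using ACRS alone

moved_up = percent_reclassified(mid_high_total, 217)    # Mid-high -> High
moved_down = percent_reclassified(mid_high_total, 131)  # Mid-high -> Mid

print(round(moved_up, 2))    # 10.59
print(round(moved_down, 2))  # 6.39
```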

57 ARIC The Future is Here!

58 ARIC The field of human genetics: the amount of data is growing
  [Figure: number of variants (10s, 100s, 1000s, 1x10^5, 1x10^6, 10x10^6) by year (1980s, 1990s, 2000, 2007, 2010); labeled approaches: Candidate Genes, Linkage, GWAS, Exome and Whole-genome sequencing]

59 ARIC Potential to survey all genetic variation in the genome (or at least ~2.5 M variants!) Individual researchers can access this data Genome-wide association and whole-genome sequencing

60 ARIC NIH Genome-Wide Association Studies Policy
  As a part of funding and generating GWAS data, public repositories have been developed
  [Diagram: Research Participants (informed consent); Submitting Investigators (data collection; submission & management of data); Data Submission; GWAS Data Repository (de-identified, coded data); Data Access Request; Recipient Investigators (distribution & secondary use of data)]

61 ARIC NIH Genome-Wide Association Studies Policy dbGaP is one of the central repositories

62 ARIC Open Access (summary level) Search for studies, review protocols and questionnaires View summary phenotype and genotype data View pre-computed or published genetic associations (after embargo) Identify studies of interest, view their consent conditions, and review terms for data access Locate potential collaborators for follow up studies No individual data! NIH Genome-Wide Association Studies Policy

63 ARIC Controlled Access (individual level)
  [Diagram: dbGaP Database (Genotype & Phenotype Data); Public Access; Study Protocol; Descriptive Information; Coded Genotypes; Phenotypes; Pre-computes; Controlled Access (Specific Research Use)]
  Request data for specific research use
  Agreement by PI and institution to terms of access in the Data Use Certification
  Data Access Committee
  Specific access rights
  NIH Genome-Wide Association Studies Policy

64 ARIC Data Release

65 ARIC Framingham Heart Study

66 ARIC Framingham Heart Study
  In 1948, the Framingham Heart Study embarked on an ambitious project in health research. At the time, little was known about the general causes of heart disease and stroke, but the death rates for CVD had been increasing steadily since the beginning of the century and had become an American epidemic.
  Since 1971, the Framingham Heart Study has been conducted in collaboration with Boston University.
  Objective: to identify the common factors or characteristics that contribute to CVD by following its development over a long period of time in a large group of participants who had not yet developed overt symptoms of CVD or suffered a heart attack or stroke.
  Recruited 5,209 men and women between the ages of 30 and 62 from the town of Framingham, Massachusetts.

70 ARIC Framingham SHARe

72 ARIC Women's Health Initiative (WHI)

73 ARIC Women's Health Initiative (WHI)
  WHI is a long-term national health study (1993-2005)
  Objective: strategies for preventing heart disease, breast and colorectal cancer and osteoporotic fractures in postmenopausal women
  161,000 women ages 50-79
  Two major parts: a randomized Clinical Trial and an Observational Study
  Clinical Trial (CT) enrolled 68,132 postmenopausal women between the ages of 50-79 into trials testing three prevention strategies. If eligible, women could choose to enroll in one, two, or all three of the trial components:
    Hormone Replacement Trials
    Dietary Modification Trial
    Calcium / Vitamin D Trial
  The Observational Study (OS) examines the relationship between lifestyle, health and risk factors and specific disease outcomes. This component involves tracking the medical history and health habits of 93,676 women. Recruitment for the observational study was completed in 1998 and participants were followed for 8 to 12 years.

78 ARIC Large datasets are not limited to genetic datasets

