Published by Matthew Holcombe. Modified about 1 year ago.

1
Network Mapping of Large Data Sets
Al Ozonoff, Ph.D.
Joel Bernanke, M.Sc.
Boston University School of Public Health

2
May 22, 2008, Interface - RISK : Reality
Network Analyses of Linked Data Sets
─ Yook (2002) developed network generators that captured the Internet's topology and postulated preferential attachment and linear distance dependence.
  Yook, S.-H., Jeong, H., & Barabasi, A.-L. (2002). Modeling the Internet's large-scale topology. PNAS, 99.
─ Schwikowski (2000) built a protein-protein interaction network in yeast to predict protein function.
  Schwikowski, B., Uetz, P., & Fields, S. (2000). A network of protein-protein interactions in yeast. Nature Biotechnology, 18(12).

3
Networks in Public Health
─ Jones (2003) reported power-law scaling in sexual contact networks, relating the scaling coefficient to the rate of disease transmission and the threat of an epidemic.
  Jones, J. H., & Handcock, M. S. (2003). An assessment of preferential attachment as a mechanism for human sexual network formation. Proc. R. Soc. Lond. B, 270.
─ De (2004) used network centrality measures to identify key individuals in a gonorrhea outbreak.
  De, P., Singh, A. E., Wong, T., Yacoub, W., & Jolly, A. M. (2004). Sexual network analysis of a gonorrhoea outbreak. Sex Transm Infect, 80.

4
Natural Mapping of a Data Set
When linkages are not predefined, suitable criteria for identifying linkages must be developed.
We propose a natural mapping of a data set onto a network: variables map to nodes, and the associations among variables map to edges.

5
The NHANES Data Set
The National Health and Nutrition Examination Survey (NHANES) assesses the health and nutritional status of adults and children in the United States through interviews and physical examinations.
The NHANES data set includes:
─ Demographics
─ Laboratory test results
─ Dietary records
─ Physiological measurements
─ General health information

6
Selecting Data to Map
A selected subset of continuous measures from all four of the NHANES modules was included in the analysis. Continuous measures with fewer than 20 observations were excluded.
Examples:
─ Age (years)
─ Blood titers
─ Number of green vegetables eaten per month
─ Cardiovascular stress test measurements
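The exclusion rule above can be illustrated with a minimal Python sketch. It is not the authors' SAS code: the function name, the variable names, and the toy data are hypothetical, and missing values are assumed to be stored as `None`.

```python
# Hypothetical toy data: variable name -> list of observations (None = missing).
def select_variables(data, min_obs=20):
    """Keep variables with at least min_obs non-missing observations."""
    kept = {}
    for name, values in data.items():
        observed = [v for v in values if v is not None]
        if len(observed) >= min_obs:
            kept[name] = values
    return kept

data = {
    "age_years": list(range(30)),              # 30 observations, kept
    "rare_titer": [1.0] * 5 + [None] * 25,     # only 5 observations, excluded
}
selected = select_variables(data)
print(sorted(selected))
```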

7
Generating a Correlation Matrix
We generated a correlation matrix containing the Spearman correlation between every pair of variables.
All correlations were converted to their absolute values.
We included correlations in the matrix regardless of their significance.
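As a rough illustration of this step, the sketch below computes absolute Spearman correlations in plain Python by ranking each variable (ties receive their average rank) and taking the Pearson correlation of the ranks. The presentation used SAS for this step; the function names and toy data here are hypothetical, and the sketch assumes complete, non-constant variables of equal length.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def abs_spearman_matrix(data):
    """Return {(a, b): |Spearman rho|} for every pair of variables with a < b."""
    names = sorted(data)
    ranks = {name: average_ranks(data[name]) for name in names}
    return {(a, b): abs(pearson(ranks[a], ranks[b]))
            for a in names for b in names if a < b}

# Hypothetical toy variables.
toy = {"x": [1, 2, 3, 4, 5], "z": [5, 4, 3, 2, 1], "w": [1, 3, 2, 5, 4]}
matrix = abs_spearman_matrix(toy)
```

Because the absolute value is taken, the perfectly inverse pair `x` and `z` ends up with the same strength as a perfectly concordant pair would.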

8
Mapping the NHANES Data Set
Variables were mapped to nodes. Spearman correlations among the variables were mapped to edges.
The exact correlation was either retained as a measure of the strength of an association or dichotomized (0, 1) based on a cutoff.
[Figure: Age (years) and Body Mass Index joined by an edge of weight 0.6; the same pair shown with cutoff = 0.7.]
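A minimal sketch of the two edge encodings follows. The function name is hypothetical, and whether a correlation exactly equal to the cutoff counts as an edge is an assumption (here it does); the slides do not say.

```python
def build_edges(corr, cutoff=None):
    """Map pairwise correlations to edges.

    cutoff=None keeps every correlation as an edge weight;
    otherwise only pairs with correlation >= cutoff get an edge (value 1).
    """
    edges = {}
    for pair, r in corr.items():
        if cutoff is None:
            edges[pair] = r        # weighted network
        elif r >= cutoff:          # assumption: ties at the cutoff are kept
            edges[pair] = 1        # dichotomized network
    return edges

# Values taken from the slide's example pair.
corr = {("Age (years)", "Body Mass Index"): 0.6}
weighted = build_edges(corr)
dichotomized = build_edges(corr, cutoff=0.7)
```

With a correlation of 0.6 and a cutoff of 0.7, the dichotomized network has no edge between the two variables, while the weighted network keeps the edge with weight 0.6.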

9
Software
─ SAS 9.1 – Integrate NHANES data modules and generate the correlation matrix.
─ UCINET – Convert correlation data to network data.
─ NetDraw – Visualize and analyze network data.
─ KeyPlayer – Identify key players.

10
Networks by Cutoff
[Figure: three network diagrams, for cutoff = 0.2, cutoff = 0.5, and cutoff = 0.8.]

11
Distribution of Connections by Cutoff
[Figure: three panels, for cutoff = 0.2, cutoff = 0.5, and cutoff = 0.8.]

12
Degrees and Unlinked Nodes
[Figure: mean number of connections per node (degree) and percentage of unlinked nodes (isolates), plotted against cutoff.]
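The two summary quantities on this slide can be computed for any cutoff along the lines of the sketch below. The function name and the toy correlations are hypothetical and are not NHANES results; the sketch again assumes that a correlation equal to the cutoff counts as an edge.

```python
def degree_summary(nodes, corr, cutoff):
    """Return (mean degree, percentage of isolates) for a dichotomized network."""
    degree = {n: 0 for n in nodes}
    for (a, b), r in corr.items():
        if r >= cutoff:
            degree[a] += 1
            degree[b] += 1
    mean_degree = sum(degree.values()) / len(nodes)
    pct_isolates = 100 * sum(1 for d in degree.values() if d == 0) / len(nodes)
    return mean_degree, pct_isolates

# Hypothetical absolute correlations among four variables.
nodes = ["a", "b", "c", "d"]
corr = {
    ("a", "b"): 0.9, ("a", "c"): 0.6, ("a", "d"): 0.05,
    ("b", "c"): 0.3, ("b", "d"): 0.15, ("c", "d"): 0.1,
}
for cutoff in (0.2, 0.5, 0.8):
    print(cutoff, degree_summary(nodes, corr, cutoff))
```

Raising the cutoff can only remove edges, so the mean degree never increases and the share of isolates never decreases as the cutoff grows.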

13
Hubs and Key Players
Hubs – Nodes with many connections (edges).
Key Players – A set of N nodes that, in this case, is maximally correlated with the rest of the network.
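One possible reading of "maximally correlated with the rest of the network" is sketched below as a simple greedy search. This is an illustrative assumption, not the algorithm implemented in the KeyPlayer software: the scoring rule (sum, over nodes outside the set, of each node's strongest correlation with any set member), the function names, and the toy weights are all hypothetical.

```python
def weight(corr, a, b):
    """Look up an undirected edge weight; missing pairs count as 0."""
    return corr.get((a, b), corr.get((b, a), 0.0))

def set_score(nodes, corr, chosen):
    """Assumed score: each outside node contributes its strongest link into the set."""
    return sum(max(weight(corr, n, k) for k in chosen)
               for n in nodes if n not in chosen)

def greedy_key_players(nodes, corr, size):
    """Grow the set one node at a time, always adding the best-scoring candidate."""
    chosen = []
    for _ in range(size):
        candidates = [n for n in nodes if n not in chosen]
        best = max(candidates, key=lambda n: set_score(nodes, corr, chosen + [n]))
        chosen.append(best)
    return chosen

# Hypothetical weighted network with two clusters: {a, b, c} and {d, e}.
nodes = ["a", "b", "c", "d", "e"]
corr = {("a", "b"): 0.9, ("a", "c"): 0.8, ("d", "e"): 0.7}
players = greedy_key_players(nodes, corr, size=2)
```

In this toy network the second pick comes from the smaller cluster rather than from a second node next to the hub, which mirrors the point made on later slides that key players are not necessarily the largest hubs. A greedy search is not guaranteed to find the best possible set.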

14
10 Key Players
For the entire weighted network:
─ Age (years)
─ CD4 count (cells/mm³)
─ Urine creatinine (mg/dl)
─ CD8 count (cells/mm³)
─ Upper arm length (cm)
─ Alcohol fasting time (min)
─ Antacid / laxative fasting time (min)
─ Number of years taking insulin
─ How often wore hearing aid in the past year (number)
─ Lipid adjusted dioxin (pg/g)

15
Hubs and Key Players – Creatinine
Nodes with higher degrees are larger. The purple squares are the 10 key players. Notice that the key players are not necessarily the largest hubs.

16
Urine Creatinine Ego Network
[Figure: ego network of urine creatinine; labeled neighbors include urine elements (e.g., molybdenum), urine phthalates, and urine phosphates.]

17
Hubs and Key Players – CD4, CD8
Nodes with higher degrees are larger. The blue squares are the 10 key players. Notice that the key players are not necessarily the largest hubs.

18
CD4, CD8, and Immunotoxins
[Figure: network view with labeled nodes: isoflavones, CD4 counts, CD8 counts, PCBs, TCDDs.]

19
Conclusion
Future directions:
─ Further explore the scale-free (power-law) properties of the NHANES data network.
─ Extend the methodology to binary outcomes.
─ Account for negative correlations.
─ Investigate confounding.
─ Analyze additional data sets.

20
Network Terms
Node – a junction point.
Edge – a line connecting two nodes.
Degree – the number of edges a node has.
Hub – a node with many connections (edges).
Key players – a group of nodes that together are connected to the maximum number of distinct nodes.
Power-law distribution – f(x) ~ x^(-γ)
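For reference, the power-law form in the last definition becomes a straight line on log-log axes, which is the usual visual check for scale-free behavior in a degree distribution. The constants C and c below are introduced here only for illustration and do not appear in the slides.

```latex
f(x) = C\,x^{-\gamma}
\quad\Longrightarrow\quad
\log f(x) = -\gamma \log x + c, \qquad c = \log C
```

The slope of that line is -γ, so a larger γ means the frequency of highly connected nodes falls off more quickly.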

21
A Basic Undirected Network
Isolate – a node that is not connected to the rest of the network.
Pendant – a node that is connected to the rest of the network by only one edge.
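In terms of degree, an isolate has degree 0 and a pendant has degree 1. A small sketch of that classification for an undirected network follows; the function name and the toy edge list are hypothetical.

```python
def classify_nodes(nodes, edges):
    """Return (isolates, pendants): nodes with degree 0 and degree 1."""
    degree = {n: 0 for n in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    isolates = sorted(n for n, d in degree.items() if d == 0)
    pendants = sorted(n for n, d in degree.items() if d == 1)
    return isolates, pendants

# Hypothetical network: a triangle a-b-c, d hanging off c, and e unconnected.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
isolates, pendants = classify_nodes(nodes, edges)
```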
