Published by Gladys Cobb. Modified over 8 years ago.
1
Data Mining in Schizophrenia Research - preliminary
Stefan Arnborg, KTH, SICS
Ingrid Agartz, Håkan Hall, Erik Jönsson, Anna Sillén, Göran Sedvall, Karolinska Institutet
http://www.nada.kth.se/~stefan
2
Data mining in science? Hasty decision making or eternal truth? Underpinning of findings?...
4
Human Brain Informatics - HUBIN A project to accelerate research and development to find new treatments for human brain disease
5
Human Brain Informatics - HUBIN
Intentions:
- To develop a uniform database for brain information from defined human subject groups
- To bring data from many research areas ("data domains") into the database
- To perform statistical and data mining analyses using data from all the data domains
6
Leading causes of disability in the world, WHO (1990)

Cause of disability                           Total (millions)   % of world total
 1. Unipolar major depression                 50.8               10.7
 2. Iron deficiency anemia                    22.0                4.7
 3. Falls                                     22.0                4.6
 4. Alcohol use                               15.8                3.3
 5. Chronic obstructive pulmonary disease     14.7                3.1
 6. Bipolar disorder                          14.1                3.0
 7. Congenital anomalies                      13.5                2.9
 8. Osteoarthritis                            13.3                2.8
 9. Schizophrenia                             12.1                2.6
10. Obsessive compulsive disorder             10.2                2.2
7
Schizophrenia - Questions and Clues
- Cause(s) of schizophrenia not known.
- Does not appear in animals, so there are no experimental clues.
- Explanation models vary over time.
- Disturbed neuronal circuitry in schizophrenia? (currently the hottest hypothesis)
- Influenced by genotype and/or environment? (clustering in families, but epidemiologic studies and studies of adopted twins suggest both causes)
8
Schizophrenia - Questions and Clues
- Which processes result in disease?
- Traces of disturbed development visible in MRI (anatomy) and blood tests?
- Genetic risk factors?
- Causal pathways?
9
Hubin organization
- Ethical group: Göran Sedvall, Chairman
- Hubin AB: Stig Larsson, Chairman; Håkan Hall, CEO
- Management group: Håkan Hall, Assoc. Prof. (project manager); Stig Larsson, T.D. hc; Göran Sedvall, Prof.; Stefan Arnborg, Prof.; Tom McNeil, Prof.; Lars Therenius, Prof.
- Scientific advisory board: Göran Sedvall, Chairman; Nancy Andreasen, Univ of Iowa; Paul Greengard, Rockefeller Univ; Tomas Hökfelt, Karolinska Inst.
- Project staff
- Data domain responsibles
10
Preliminary analysis
Test case: 144 subjects (61 affected, 83 controls)
Variables:
- Diagnosis
- Demography
- Blood tests
- Genetics
- Anatomy (MRI)
11
Types of images used in HUBIN
In vivo imaging:
- Magnetic resonance images (MRI)
- Functional magnetic resonance images (fMRI)
- Positron emission tomography (PET)
- Single photon emission tomography (SPECT)
In vitro imaging (whole hemispheres):
- Autoradiography
- In situ hybridization
[Figure: example images labelled MRI, PET, ISHH, LAR]
12
[Figure: T1 and T2 MR images; tissue classes CSF, white, gray]
13
Brain boxes Picture from BRAINS II manual, Magnotta et al, University of Iowa
14
Manually drawn vermis regions ROIs drawn by Gaku Okugawa
15
Single Nucleotide Polymorphism
RNA:  AUG UUC CAU UAU UGU   ->  Protein A:  Phe His Tyr Cys
RNA:  AUG UUU CAU UAU UAU   ->  Protein A′: Phe His Tyr Tyr
- UUC -> UUU: non-coding SNP (Phe unchanged)
- UGU -> UAU: coding SNP (Cys becomes Tyr)
Protein A can be slightly different from A′
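The distinction on this slide can be checked mechanically. The sketch below translates the two RNA strings from the slide with a deliberately partial codon table (only the codons that occur here) and labels each differing base using the slide's own terms. The function names are illustrative, and the classification assumes at most one changed base per codon.

```python
# Partial codon table: only the codons that appear on the slide.
CODON_TABLE = {
    "AUG": "Met",  # start codon
    "UUC": "Phe",
    "UUU": "Phe",
    "CAU": "His",
    "UAU": "Tyr",
    "UGU": "Cys",
}

def translate(rna):
    """Translate an RNA string codon by codon into amino-acid names."""
    usable = len(rna) - len(rna) % 3
    return [CODON_TABLE[rna[i:i + 3]] for i in range(0, usable, 3)]

def classify_snps(reference, variant):
    """Return (position, ref_base, var_base, kind) for each differing base.

    kind is "coding" when the amino acid changes, "non-coding" otherwise
    (the slide's terminology).
    """
    ref_protein = translate(reference)
    var_protein = translate(variant)
    found = []
    for pos, (r, v) in enumerate(zip(reference, variant)):
        if r != v:
            codon = pos // 3
            changed = ref_protein[codon] != var_protein[codon]
            found.append((pos, r, v, "coding" if changed else "non-coding"))
    return found

reference = "AUGUUCCAUUAUUGU"
variant = "AUGUUUCAUUAUUAU"
print(translate(reference))
print(translate(variant))
print(classify_snps(reference, variant))
```

Positions are 0-based offsets into the RNA string.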
16
Genes studied
- DBH: dopamine beta-hydroxylase
- DRD2: dopamine receptor D2 +
- DRD3: dopamine receptor D3
- HTR5A: serotonin receptor 5A
- NPY: neuropeptide Y
- SLC6A4: serotonin transporter
- BDNF: brain derived neurotrophic factor
- NRG1: neuregulin +
17
Elementary Visualizations: MRI intracranial volume
[Figure: cumulative distribution of intracranial volume (ml); + = schiz, o = controls]
18
Elementary Visualizations: MRI data
[Figure: cumulative distribution of total CSF volume (ml); + = schiz, o = controls; p < 0.0002]
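A cumulative distribution plot of this kind is built from an empirical CDF per group. The sketch below is a minimal standard-library version. The sample values are made up for illustration and are not HUBIN data. The largest vertical gap between the two curves is the two-sample Kolmogorov-Smirnov statistic; the slides do not say which test produced the quoted p-value.

```python
from bisect import bisect_right

def ecdf(sample):
    """Return F, where F(x) is the fraction of sample values <= x."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n

def max_cdf_gap(a, b):
    """Largest vertical distance between the two empirical CDFs."""
    fa, fb = ecdf(a), ecdf(b)
    points = sorted(set(a) | set(b))
    return max(abs(fa(x) - fb(x)) for x in points)

# Made-up volumes in ml, for illustration only (not HUBIN data).
controls = [95.0, 102.0, 110.0, 118.0, 121.0]
affected = [108.0, 119.0, 127.0, 133.0, 140.0]
print(max_cdf_gap(controls, affected))
```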
19
Blood data: Gamma GT (alcohol marker)
[Figure: cumulative distribution of Gamma GT; + = schiz, o = controls; p < 0.01]
20
Gender differences, MRI
[Figure: two panels, Men and Women, each showing subcortical white; + = schiz, o = controls]
21
Which methods to use?
- Visualizations (cdf and scatter plots) give an intuitive grasp of variables, but run into problems with many interrelated variables
- Statistical modelling is required to decide the significance of a visible trend, and to rank effects
22
Statistical methods
- Bayesian methods: intuitive and rational, but conventional testing is required for publications
- Linear models: need to account for mixing and over-dispersion
- Discretization and Bayesian analysis of discrete distributions: intuitive, but information is lost
- Non-parametric randomization tests: most sensitive, and accommodate modern multiple testing paradigms
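As an illustration of the last item, here is a minimal sketch of a two-sample randomization (permutation) test. The test statistic (absolute difference of group means), the number of permutations and the seed are assumptions chosen for the example; the slides do not specify which statistic was used.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided randomization test on the difference of group means.

    Group labels are shuffled n_perm times; the p-value is the share of
    shuffles whose statistic is at least as large as the observed one.
    The +1 terms count the observed labelling itself, so p is never 0.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if abs(mean(perm_a) - mean(perm_b)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Made-up example: two clearly separated groups.
low = list(range(0, 10))
high = list(range(10, 20))
print(permutation_test(high, low, n_perm=2000))
```

Because no distributional form is assumed, the same function works for any statistic that can be recomputed on shuffled labels.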
24
Bayes' factor
Choice between two hypotheses, H1 and H2, given experimental/observational data D:

$$\frac{P(H_1 \mid D)}{P(H_2 \mid D)} = \frac{P(D \mid H_1)}{P(D \mid H_2)} \cdot \frac{P(H_1)}{P(H_2)}$$

Posterior odds = Bayes factor × prior odds
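A small numeric sketch of the odds form of Bayes' rule. The numbers are invented for illustration, and the helper names are not from the presentation.

```python
def posterior_odds(bayes_factor, prior_odds=1.0):
    """Posterior odds P(H1|D)/P(H2|D) = Bayes factor * prior odds."""
    return bayes_factor * prior_odds

def odds_to_probability(odds):
    """Convert odds for H1 against H2 into P(H1|D), assuming H1 and H2 are exhaustive."""
    return odds / (1.0 + odds)

# Illustrative values: the data favour H1 by a factor of 20, but H1 was
# considered half as likely as H2 beforehand.
odds = posterior_odds(bayes_factor=20.0, prior_odds=0.5)
print(odds, odds_to_probability(odds))
```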
25
Hypotheses in test matrix
- H1 (no effect): a data column is generated independently of diagnosis (composite model)
- H2: the data for controls are generated by one composite model, the data for affected by another
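For a discretized variable, one way to compare these two hypotheses is with Dirichlet-multinomial marginal likelihoods. This is a sketch under stated assumptions: the slides do not say which composite model was used, and the symmetric Dirichlet prior with `alpha=1.0` and the example counts are illustrative choices.

```python
from math import lgamma

def log_marginal(counts, alpha=1.0):
    """Log marginal likelihood of a sequence of categorical observations
    with the given bin counts, under a symmetric Dirichlet(alpha) prior."""
    k = len(counts)
    n = sum(counts)
    total = lgamma(k * alpha) - lgamma(k * alpha + n)
    for c in counts:
        total += lgamma(alpha + c) - lgamma(alpha)
    return total

def log_bayes_factor_h1_vs_h2(controls, affected, alpha=1.0):
    """log [P(D|H1) / P(D|H2)].

    H1: one common distribution generates both groups (no effect).
    H2: controls and affected each have their own distribution.
    Negative values favour H2, i.e. an association with diagnosis.
    """
    pooled = [c + a for c, a in zip(controls, affected)]
    log_h1 = log_marginal(pooled, alpha)
    log_h2 = log_marginal(controls, alpha) + log_marginal(affected, alpha)
    return log_h1 - log_h2

# Made-up bin counts (low / medium / high) for illustration only.
print(log_bayes_factor_h1_vs_h2([30, 10, 2], [5, 10, 25]))
print(log_bayes_factor_h1_vs_h2([10, 10, 10], [10, 10, 10]))
```

The ratio is oriented as P(D|H1)/P(D|H2), so multiplying its exponential by the prior odds of H1 against H2 gives the posterior odds.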
26
Non-parametric Bayesian methods
Do the three point sets have the same underlying distribution, or not? Which is the alternative hypothesis?
27
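Graphical models
[Figure: three graphs over the variables X, Y, Z, with the corresponding factorizations]
- f(x,y,z) = f(x) f(y) f(z)
- f(x,y,z) = f(x,z) f(x,y) / f(x)
- f(x,y,z) (no factorization)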
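The second factorization holds exactly when Y and Z are conditionally independent given X. The sketch below builds such a joint distribution over three binary variables from invented probability tables and checks both factorizations numerically.

```python
from itertools import product

# Hypothetical probability tables for binary X, Y, Z (illustrative numbers).
f_x = {0: 0.6, 1: 0.4}
f_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}
f_z_given_x = {0: {0: 0.2, 1: 0.8}, 1: {0: 0.5, 1: 0.5}}

# Joint distribution in which Y and Z depend on X but not on each other.
joint = {
    (x, y, z): f_x[x] * f_y_given_x[x][y] * f_z_given_x[x][z]
    for x, y, z in product((0, 1), repeat=3)
}

def marginal(table, keep):
    """Sum out every variable whose index is not listed in keep."""
    out = {}
    for key, p in table.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + p
    return out

m_x = marginal(joint, (0,))
m_y = marginal(joint, (1,))
m_z = marginal(joint, (2,))
m_xy = marginal(joint, (0, 1))
m_xz = marginal(joint, (0, 2))

def chain_error():
    """Max deviation from f(x,y,z) = f(x,z) f(x,y) / f(x)."""
    return max(
        abs(p - m_xz[(x, z)] * m_xy[(x, y)] / m_x[(x,)])
        for (x, y, z), p in joint.items()
    )

def independence_error():
    """Max deviation from f(x,y,z) = f(x) f(y) f(z)."""
    return max(
        abs(p - m_x[(x,)] * m_y[(y,)] * m_z[(z,)])
        for (x, y, z), p in joint.items()
    )

print(chain_error(), independence_error())
```

The first error is zero up to rounding, while the second is clearly non-zero: the data are consistent with the middle graph but not with the fully disconnected one.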
28
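V-structures, causality
[Figure: graphs over X, Y and over A, B, C, annotated with (in)dependence statements for A and C, marginally and given B]
- V-structures are detectable from observational data
- The direction of a single edge between X and Y is indistinguishable: f(x,y) = f(y|x) f(x) = f(x|y) f(y)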
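Why a v-structure leaves a detectable trace can be shown with an exact toy distribution. In the sketch below, A and C are independent fair coins and B is their logical OR; this A -> B <- C mechanism is invented for the example. A and C are independent marginally but become dependent once B is known, which is the signature that distinguishes a v-structure from other three-node graphs.

```python
from itertools import product

def v_structure_joint(p_a=0.5, p_c=0.5):
    """Joint distribution over (a, b, c) with a, c independent and b = a OR c."""
    joint = {}
    for a, c in product((0, 1), repeat=2):
        b = 1 if (a or c) else 0
        pa = p_a if a else 1.0 - p_a
        pc = p_c if c else 1.0 - p_c
        joint[(a, b, c)] = pa * pc
    return joint

NAMES = ("a", "b", "c")

def prob(joint, **fixed):
    """Probability that the named variables take the given values."""
    return sum(
        p for key, p in joint.items()
        if all(key[NAMES.index(name)] == value for name, value in fixed.items())
    )

def marginal_gap(joint):
    """|P(a=1, c=1) - P(a=1) P(c=1)|; zero means marginal independence."""
    return abs(prob(joint, a=1, c=1) - prob(joint, a=1) * prob(joint, c=1))

def conditional_gap(joint, b=1):
    """|P(a=1, c=1 | b) - P(a=1 | b) P(c=1 | b)|; non-zero means dependence given b."""
    pb = prob(joint, b=b)
    p_ac = prob(joint, a=1, c=1, b=b) / pb
    p_a = prob(joint, a=1, b=b) / pb
    p_c = prob(joint, c=1, b=b) / pb
    return abs(p_ac - p_a * p_c)

joint = v_structure_joint()
print(marginal_gap(joint), conditional_gap(joint))
```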
29
Pairs associated with Diagnosis
[Figure: four graphs over the variables Y, Z and D]
Y and Z co-vary differently for Affected and Controls
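One simple way to look for such pairs is to compute the correlation between Y and Z separately within each diagnosis group. The sketch below uses the Pearson correlation and made-up data; the presentation does not state which measure of co-variation was used.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_by_group(y, z, labels):
    """Return {label: correlation of y and z within that group}."""
    result = {}
    for label in sorted(set(labels)):
        ys = [a for a, g in zip(y, labels) if g == label]
        zs = [b for b, g in zip(z, labels) if g == label]
        result[label] = pearson(ys, zs)
    return result

# Made-up data: Y and Z move together for controls ("C") but not for
# affected ("A").
y = [1, 2, 3, 4, 1, 2, 3, 4]
z = [2, 4, 6, 8, 1, -1, -1, 1]
labels = ["C", "C", "C", "C", "A", "A", "A", "A"]
print(correlation_by_group(y, z, labels))
```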
30
Age-dependency of Posterior Superior Vermis
[Figure: posterior superior vermis plotted against age at MRI; + = schiz, o = controls]
31
No co-variation between posterior inferior vermis and parietal white for affected
[Figure: posterior inferior vermis plotted against parietal white; + = schiz, o = controls]
32
Multivariate characterization by graphical models
MRI volumes, blood, demography
[Figure: graphical model with nodes Dia, BrsCSF, TemCSF, SubCSF, TotCSF]
33
Adding Vermis variables
[Figure: graphical model with nodes Dia, BrsCSF, TemCSF, PSV]
34
PSV has best explanatory power
[Figure: posterior superior vermis, affected vs. healthy; + = schiz, o = controls]
35
Decision tree for Diagnosis (MRI data)
[Figure: decision tree; A = schiz, C = controls, () = misclassified]
36
Classification explains data!
[Figure: graphs over the variables X, Y, Z, with additional nodes H and W]
37
Autoclass1
[Figure: total gray; A = schiz, C = controls]
38
Weak signals in genetics data
- Numerous investigations have indicated 'almost significant' associations of SNPs with diagnosis
- Typically, these findings cannot be confirmed in other studies: populations are genetically heterogeneous and measurements are not standardized
- We try to connect SNPs both to diagnosis and to other phenotypic variables
- Multiple testing and weak signal problems
39
Empirical distribution by genotype
Gene BDNF (schiz + controls)
[Figure: cumulative distribution of frontal CSF by genotype A/A, A/G, G/G]
40
Compensating for multiple comparisons
- Bonferroni 1937: for level α and n tests, use level α/n
- Hochberg 1988: step-up procedure
- Benjamini, Hochberg 1996: False Discovery Rate
- J. Storey, 2002: pFDRi, pFDRd
- Bayesian interpretations being developed (Wasserman & Genovese, 2002)
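The first three procedures can be written in a few lines each. The following is a minimal sketch of their textbook forms, not code from the HUBIN project; the p-values in the example are invented. Each function returns one reject/keep decision per input p-value, in the original order.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject every hypothesis whose p-value is at most alpha / n."""
    n = len(pvalues)
    return [p <= alpha / n for p in pvalues]

def _step_up(pvalues, thresholds):
    """Generic step-up rule: find the largest rank k whose sorted p-value
    is below its threshold, then reject the k smallest p-values."""
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= thresholds[rank - 1]:
            k = rank
    rejected = [False] * len(pvalues)
    for idx in order[:k]:
        rejected[idx] = True
    return rejected

def hochberg(pvalues, alpha=0.05):
    """Hochberg step-up: threshold alpha / (n - rank + 1) at each rank."""
    n = len(pvalues)
    return _step_up(pvalues, [alpha / (n - r + 1) for r in range(1, n + 1)])

def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg FDR control: threshold alpha * rank / n."""
    n = len(pvalues)
    return _step_up(pvalues, [alpha * r / n for r in range(1, n + 1)])

# Invented p-values for illustration.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(sum(bonferroni(pvals)), sum(hochberg(pvals)), sum(benjamini_hochberg(pvals)))
```

Bonferroni and Hochberg control the probability of any false rejection, whereas the false discovery rate bounds the expected share of false rejections among those made, which is why it typically rejects more.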
41
2-var associations, MRI and lab data
Bonferroni-Hochberg-Benjamini methods
[Figure: number of p-values plotted against p-value, observed p-values compared with the 'no effect' line; axis label abs(a c); FDRi 71, FDRd 62]
44
Significance model-dependent!
Linear additive effect on variable: heterozygote midway between homozygotes

Variable              Gene    p-value
Frontal CSF           BDNF    0.001
Serum T4              BDNF    0.001
Subcortical Gray      BDNF    0.001
Frontal Gray          BDNF    0.002
LPK 01                NPY     0.002
Corpuscular volume    BDNF    0.003
….
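Under the additive model, a genotype is coded as the number of copies of one allele (0, 1 or 2), so the heterozygote falls midway between the two homozygotes, and the variable is regressed on that count. The sketch below shows the coding and an ordinary least-squares fit. The genotype labels follow the A/A, A/G, G/G notation used for BDNF in this presentation, while the measurements and the choice of "G" as the counted allele are invented.

```python
def additive_code(genotype, counted_allele="G"):
    """Number of copies of counted_allele in a genotype such as 'A/G'."""
    return genotype.split("/").count(counted_allele)

def ols_slope_intercept(x, y):
    """Least-squares line y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Made-up genotypes and measurements, for illustration only.
genotypes = ["A/A", "A/G", "G/G", "A/G", "A/A", "G/G"]
values = [1.0, 2.0, 3.0, 2.0, 1.0, 3.0]

codes = [additive_code(g) for g in genotypes]
slope, intercept = ols_slope_intercept(codes, values)
print(codes, slope, intercept)
```

The slope is the estimated change in the variable per copy of the counted allele. A different genotype model, such as dominant or recessive coding, would give different p-values for the same data.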
45
SNP genotypes not all equal

Variable              Gene     p-value
Subcortical white     HTR5A    0.008
Temporal white        HTR5A    0.01
Diagnosis             DRD2     0.01
                      NRG1     0.005
46
That's all, folks!
- High-quality databases for medical research of the HUBIN type open the way for intelligent data analysis methods used in engineering and business
- Already with the limited data presently available, interesting clues emerge
- This is a long-term effort: stable funding and engagement are vital