Stefan Arnborg, KTH, SICS Ingrid Agartz, Håkan Hall, Erik Jönsson, Anna Sillén, Göran Sedvall, Karolinska Institutet Data.

1 Data Mining in Schizophrenia Research - preliminary
Stefan Arnborg, KTH, SICS
Ingrid Agartz, Håkan Hall, Erik Jönsson, Anna Sillén, Göran Sedvall, Karolinska Institutet
http://www.nada.kth.se/~stefan

2 Data mining in science? Hasty decision making or eternal truth? Underpinning of findings?...

4 Human Brain Informatics - HUBIN A project to accelerate research and development to find new treatments for human brain disease

5 Human Brain Informatics - HUBIN
Intentions:
- To develop a uniform database for brain information from defined human subject groups
- To bring data from many research areas ("data domains") into the database
- To perform statistical and data mining analyses using data from all the data domains

6 Leading causes of disability in the world, WHO (1990)

Cause of disability                          Total (millions)   % of world total
 1. Unipolar major depression                50.8               10.7
 2. Iron deficiency anemia                   22.0                4.7
 3. Falls                                    22.0                4.6
 4. Alcohol use                              15.8                3.3
 5. Chronic obstructive pulmonary disease    14.7                3.1
 6. Bipolar disorder                         14.1                3.0
 7. Congenital anomalies                     13.5                2.9
 8. Osteoarthritis                           13.3                2.8
 9. Schizophrenia                            12.1                2.6
10. Obsessive compulsive disorder            10.2                2.2

7 Schizophrenia - Questions and Clues
- Cause(s) of schizophrenia not known.
- Does not appear in animals, so no experimental clues.
- Explanation models vary over time.
- Disturbed neuronal circuitry in schizophrenia? (currently the hottest hypothesis)
- Influenced by genotype and/or environment? (clustering in families, but epidemiologic studies and studies on adopted twins suggest both causes)

8 Schizophrenia - Questions and Clues Which processes result in disease? Traces of disturbed development visible in MRI (anatomy) and blood tests? Genetic risk factors? Causal pathways?

9 Hubin organization
Ethical group: Göran Sedvall, Chairman
Hubin AB: Stig Larsson, Chairman; Håkan Hall, CEO
Project staff
Data domain responsibles
Management group: Håkan Hall, Assoc. Prof. (project manager); Stig Larsson, T.D. hc; Göran Sedvall, Prof.; Stefan Arnborg, Prof.; Tom McNeil, Prof.; Lars Therenius, Prof.
Scientific advisory board: Göran Sedvall, Chairman; Nancy Andreasen, Univ of Iowa; Paul Greengard, Rockefeller Univ; Tomas Hökfelt, Karolinska Inst.

10 Preliminary analysis Test case: 144 subjects: 61 affected, 83 controls Variables: Diagnosis Demography Blood tests Genetics Anatomy (MRI)

11 Types of images used in HUBIN
In vivo imaging:
- Magnetic resonance images (MRI)
- Functional magnetic resonance images (fMRI)
- Positron emission tomography (PET)
- Single photon emission tomography (SPECT)
In vitro imaging (whole hemispheres):
- Autoradiography
- In situ hybridization
[Figure: example images labeled MRI, PET, ISHH, LAR]

12 [Figure: T1 and T2 MR images; tissue classes CSF, white, gray]

13 Brain boxes Picture from BRAINS II manual, Magnotta et al, University of Iowa

14 Manually drawn vermis regions ROIs drawn by Gaku Okugawa

15 Single Nucleotide Polymorphism
RNA for Protein A:  AUG UUC CAU UAU UGU  (codons after AUG: Phe His Tyr Cys)
RNA for Protein A': AUG UUU CAU UAU UAU  (codons after AUG: Phe His Tyr Tyr)
- Non-coding SNP: UUC to UUU, still Phe
- Coding SNP: UGU to UAU, Cys becomes Tyr
Protein A can be slightly different from A'

16 Genes studied
- DBH: dopamine beta-hydroxylase
- DRD2: dopamine receptor D2 +
- DRD3: dopamine receptor D3
- HTR5A: serotonin receptor 5A
- NPY: neuropeptide Y
- SLC6A4: serotonin transporter
- BDNF: brain derived neurotrophic factor
- NRG1: neuregulin +

17 Elementary Visualizations: MRI intracranial volume. [Figure: cumulative distribution of intracranial volume (ml); + = schiz, o = controls]

18 Elementary Visualizations: MRI data. [Figure: cumulative distribution of total CSF volumes (ml); + = schiz, o = controls; p < 0.0002]

19 Blood data: Gamma GT, alcohol marker. [Figure: cumulative distribution of Gamma GT; + = schiz, o = controls; p < 0.01]

20 Gender differences, MRI. [Figure: two panels, Men and Women, showing subcortical white; + = schiz, o = controls]

21 Which methods to use? Visualizations, cdf and scatter plots, give intuitive grasp of variables - problems with many interrelated variables Statistical modelling required to decide significance of visible trend, and to rank effects
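
The cdf plots used on the visualization slides can be sketched numerically as follows. This is a minimal illustration in plain Python, not the tooling used in HUBIN; the function names `ecdf` and `max_cdf_gap` and the sample volumes are made up for the example.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return F, where F(x) is the fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def F(x):
        return bisect_right(ordered, x) / n

    return F


def max_cdf_gap(a, b):
    """Largest vertical distance between the two empirical cdfs.

    This is the two-sample Kolmogorov-Smirnov statistic; it only needs
    to be evaluated at the observed data points.
    """
    Fa, Fb = ecdf(a), ecdf(b)
    return max(abs(Fa(x) - Fb(x)) for x in set(a) | set(b))


# Hypothetical volumes in ml (illustrative only, not HUBIN data).
affected = [152.0, 171.5, 160.2, 188.9, 175.3]
controls = [140.1, 151.7, 149.0, 163.4, 158.8]
print(max_cdf_gap(affected, controls))
```

A large gap between the two curves is the kind of visible trend that, as the slide notes, still needs statistical modelling before its significance can be judged.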

22 Statistical methods
- Bayesian methods: intuitive and rational, but conventional testing required for publications
- Linear models: need to account for mixing and over-dispersion
- Discretization and Bayesian analysis of discrete distributions: intuitive, but information lost
- Non-parametric randomization tests: most sensitive, and accommodate modern multiple testing paradigms
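
A non-parametric randomization test of the kind listed above might look like the sketch below. The choice of test statistic (absolute difference in group means), the function name `permutation_p_value`, and the data are assumptions for illustration; the slides do not say which statistic was used.

```python
import random


def permutation_p_value(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided randomization test for a difference in group means.

    Group labels are shuffled n_perm times; the p-value is the share of
    shuffles whose statistic is at least as large as the observed one.
    The +1 terms keep the estimate strictly above zero.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def statistic(values):
        a, b = values[:n_a], values[n_a:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = statistic(pooled)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Small tolerance so that ties are counted despite rounding error.
        if statistic(pooled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical measurements (illustrative only, not HUBIN data).
affected = [10, 11, 12, 13, 14, 15]
controls = [1, 2, 3, 4, 5, 6]
print(permutation_p_value(affected, controls, n_perm=2000))
```

Because the test yields an ordinary p-value for each variable, its output can be passed directly to multiple-testing corrections.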

24 Bayes' factor
Choice between two hypotheses, H1 and H2, given experimental/observational data D:

$$\frac{P(H_1 \mid D)}{P(H_2 \mid D)} = \frac{P(D \mid H_1)}{P(D \mid H_2)} \cdot \frac{P(H_1)}{P(H_2)}$$

Posterior odds = Bayes factor × prior odds

25 Hypotheses in test matrix H1: (no effect) a data column is generated independently of diagnosis (composite model) H2: the data for controls are generated by one composite model, for affected by another one.
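
For a discretized data column, the comparison of H1 and H2 can be sketched with a Dirichlet-multinomial model. This is one possible reading of "composite model", not necessarily the one used in the study; the symmetric Dirichlet prior, the parameter `alpha`, and all function names are assumptions made for the example.

```python
from math import exp, lgamma


def log_marginal(counts, alpha=1.0):
    """Log marginal likelihood of category counts under a symmetric
    Dirichlet(alpha) prior on the category probabilities."""
    k = len(counts)
    n = sum(counts)
    out = lgamma(k * alpha) - lgamma(k * alpha + n)
    for c in counts:
        out += lgamma(alpha + c) - lgamma(alpha)
    return out


def log_bayes_factor_h1_vs_h2(affected_counts, control_counts, alpha=1.0):
    """log [ P(D|H1) / P(D|H2) ].

    H1: one distribution generates the column for everyone (pooled counts).
    H2: affected and controls each have their own distribution.
    """
    pooled = [a + c for a, c in zip(affected_counts, control_counts)]
    log_h1 = log_marginal(pooled, alpha)
    log_h2 = log_marginal(affected_counts, alpha) + log_marginal(control_counts, alpha)
    return log_h1 - log_h2


def posterior_odds(log_bayes_factor, prior_odds=1.0):
    """Posterior odds = Bayes factor x prior odds."""
    return exp(log_bayes_factor) * prior_odds


# Hypothetical counts in two bins (illustrative only, not HUBIN data).
same = log_bayes_factor_h1_vs_h2([10, 10], [10, 10])
different = log_bayes_factor_h1_vs_h2([20, 0], [0, 20])
print(posterior_odds(same), posterior_odds(different))
```

Values above 1 favor H1 (no effect of diagnosis); values below 1 favor H2.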

26 Non-parametric Bayesian methods- Do the three point sets have the same underlying distribution, or not? Which is the alternative hypothesis?

27 Graphical models
[Figure: three graphs over nodes X, Y, Z, with the factorizations]
- f(x,y,z) = f(x)f(y)f(z)
- f(x,y,z) = f(x,z)f(x,y)/f(x)
- f(x,y,z)
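
The middle factorization, f(x,y,z) = f(x,z)f(x,y)/f(x), says that Y and Z are independent given X. The sketch below checks it numerically on a small binary example; the probability tables and helper names are invented for illustration.

```python
from itertools import product

# Hypothetical tables: X is a common parent of Y and Z.
p_x = {0: 0.6, 1: 0.4}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}
p_z_given_x = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.2, 1: 0.8}}

# Joint distribution keyed by (x, y, z).
joint = {
    (x, y, z): p_x[x] * p_y_given_x[x][y] * p_z_given_x[x][z]
    for x, y, z in product((0, 1), repeat=3)
}


def marginal(table, keep):
    """Sum the joint table down to the coordinates listed in keep."""
    out = {}
    for key, p in table.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + p
    return out


def satisfies_factorization(table, tol=1e-12):
    """True if f(x,y,z) == f(x,z) f(x,y) / f(x) for every cell.

    Assumes every value of x has positive probability.
    """
    f_x = marginal(table, (0,))
    f_xy = marginal(table, (0, 1))
    f_xz = marginal(table, (0, 2))
    return all(
        abs(p - f_xz[(x, z)] * f_xy[(x, y)] / f_x[(x,)]) <= tol
        for (x, y, z), p in table.items()
    )


print(satisfies_factorization(joint))
```

A joint table in which Y and Z depend on each other directly, for example one where Y always equals Z, fails the same check.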

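28 V-structures, causality
[Figure: two-node graphs over X and Y, and three-node graphs over A, B, C, each annotated with the relation between A and C, and between A and C given B]
- Indistinguishable: f(x,y) = f(y|x)f(x) = f(x|y)f(y)
- V-structures detectable from observational data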

29 Pairs associated to Diagnosis: Y and Z co-vary differently for Affected and Controls. [Figure: graphs over nodes Y, Z, and D]

30 Age-dependency of Posterior Superior Vermis. [Figure: posterior superior vermis vs. age at MRI; + = schiz, o = controls]

31 No co-variation between posterior inferior vermis and parietal white for affected. [Figure: posterior inferior vermis vs. parietal white; + = schiz, o = controls]

32 Multivariate characterization by graphical models: MRI volumes, blood, demography. [Figure: graph over nodes Dia, BrsCSF, TemCSF, SubCSF, TotCSF]

33 Adding Vermis variables. [Figure: graph over nodes Dia, BrsCSF, TemCSF, PSV]

34 PSV has best explanatory power. [Figure: posterior superior vermis, affected vs. healthy; + = schiz, o = controls]

35 Decision tree for Diagnosis, MRI data. [Figure: decision tree; A = schiz, C = controls, ( ) = misclassified]

36 Classification explains data! [Figure: graphs over nodes X, Y, Z, W, and H]

37 Autoclass1. [Figure: total gray; A = schiz, C = controls]

38 Weak signals in genetics data
- Numerous investigations have indicated 'almost significant' associations between SNPs and diagnosis.
- Typically, these findings cannot be confirmed in other studies: populations are genetically heterogeneous and measurements are not standardized.
- We try to connect SNPs both to diagnosis and to other phenotypical variables.
- Multiple testing and weak signal problems.

39 Empirical distribution by genotype. [Figure: cumulative distribution of frontal CSF by genotype of gene BDNF (A/A, A/G, G/G), schiz + controls]

40 Compensating for multiple comparisons
- Bonferroni 1937: for level α and n tests, use level α/n
- Hochberg 1988: step-up procedure
- Benjamini, Hochberg 1995: False Discovery Rate
- J. Storey, 2002: pFDRi, pFDRd
- Bayesian interpretations being developed (Wasserman & Genovese, 2002)
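
The first three procedures can be sketched in a few lines of plain Python. This is an illustrative sketch with made-up p-values and function names; Storey's pFDR estimators are not covered here.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / n."""
    n = len(p_values)
    return [p <= alpha / n for p in p_values]


def _step_up(p_values, threshold):
    """Generic step-up rule: find the largest rank k whose sorted p-value
    meets threshold(k, n), then reject the k smallest p-values."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= threshold(rank, n):
            k_max = rank
    rejected = [False] * n
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


def hochberg(p_values, alpha=0.05):
    """Hochberg step-up: the k-th smallest p-value is compared with alpha / (n - k + 1)."""
    return _step_up(p_values, lambda k, n: alpha / (n - k + 1))


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg FDR control: the k-th smallest p-value is compared with k * alpha / n."""
    return _step_up(p_values, lambda k, n: k * alpha / n)


# Hypothetical p-values (illustrative only).
p = [0.039, 0.004, 0.6, 0.011, 0.035]
print(bonferroni(p))
print(hochberg(p))
print(benjamini_hochberg(p))
```

On these p-values Bonferroni rejects one hypothesis, Hochberg two, and Benjamini-Hochberg four, which shows how controlling the false discovery rate trades strictness for sensitivity.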

41 Bonferroni-Hochberg-Benjamini methods, MRI and lab data. [Figure: observed p-values for 2-variable associations, compared with the 'no effect' line; axes: p-values vs. number of p-values; labels: abs(a  c), FDRi 71, FDRd 62]

44 Significance model-dependent!
Linear additive effect on variable: heterozygote midway between homozygotes

Variable             Gene   p-value
Frontal CSF          BDNF   0.001
Serum T4             BDNF   0.001
Subcortical Gray     BDNF   0.001
Frontal Gray         BDNF   0.002
LPK 01               NPY    0.002
Corpuscular volume   BDNF   0.003
….
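
The linear additive model can be sketched by coding each genotype as an allele count (0, 1, 2) and fitting a straight line, so the heterozygote is forced midway between the homozygotes. The genotype coding, the function name `fit_additive`, and the data are assumptions for illustration; significance would still need to be assessed separately, for example with a randomization test on the slope.

```python
# Number of G alleles per genotype (hypothetical coding for a biallelic SNP).
GENOTYPE_DOSE = {"A/A": 0, "A/G": 1, "G/G": 2}


def fit_additive(genotypes, values):
    """Least-squares fit of value = intercept + slope * dose.

    Returns (intercept, slope). Raises ValueError if all subjects share
    one genotype, since the slope is then undefined.
    """
    x = [GENOTYPE_DOSE[g] for g in genotypes]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(values) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("need at least two distinct genotypes")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, values))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical measurements (illustrative only, not HUBIN data).
genotypes = ["A/A", "A/A", "A/G", "A/G", "G/G", "G/G"]
volumes = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
intercept, slope = fit_additive(genotypes, volumes)
print(intercept, slope)
```

Under this model the predicted value for A/G is `intercept + slope`, exactly halfway between the predictions for A/A and G/G. A model that lets each genotype have its own mean can give a different p-value for the same data, which is the point of the slide title.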

45 SNP genotypes not all equal

Variable            Gene    p-value
Subcortical white   HTR5A   0.008
Temporal white      HTR5A   0.01
Diagnosis           DRD2    0.01
                    NRG1    0.005

46 That's all, folks!
- High-quality databases for medical research of the HUBIN type open the way for intelligent data analysis methods used in engineering and business.
- Already with the limited data presently available, interesting clues emerge.
- Long-term effort: stable funding and commitment are vital.

