Comparative network analysis of neurological disorders focuses the genome-wide search for autism genes. Dennis P. Wall, PhD Center for Biomedical Informatics.


1 Comparative network analysis of neurological disorders focuses the genome-wide search for autism genes. Dennis P. Wall, PhD Center for Biomedical Informatics dpwall@hms.harvard.edu http://wall.hms.harvard.edu

2 Outline Rationale & Biological Significance (30 mins) Present status (5 mins) Project Plan (25 mins)

3 Introduction Polygenic & Multigenic. Many genes have been linked to autism. Few genes have been replicated across studies. Difficult for a single researcher to grasp the complexity of the autism gene landscape.

4 Statistics U.S. number of cases 1992-2006 http://www.fightingautism.org

5 Behavioral overlap with other disorders Autism: Fragile X, Rett Syndrome, Tuberous Sclerosis, Angelman, Schizophrenia, Epilepsy, Seizure Disorder, Mental Retardation, Others??

6 Approach Build the network of all genes implicated in Autism to date. Conduct large comparative analysis of Autism and other neurological disorders at the level of genes, biological processes, and networks. Leverage existing research on Autism-related disorders to find new genetic leads.

7 Building Gene Lists for All Neurological Disorders (433) OMIM GeneCards NINDS Asperger Fragile X Tourette’s OCD… Autism OCD Epilepsy Ataxia Gene Lists Disease source Gene-Disease sources Disease gene database

8 Autism Cluster [Figure: binary matrix with axes labeled Genes and Disorders (example rows 1100100101…, 1110101011…, 1001010100…, 1001011101…, 1101011101…), from which the Autism Cluster is identified]
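The binary encoding on this slide can be illustrated with a small sketch. The slide does not say which clustering method was used, so the Jaccard similarity and the 0.5 threshold below are assumptions chosen for illustration, and the gene names, profiles, and the "Unrelated disorder" entry are hypothetical toy data.

```python
# Hypothetical toy data: each disorder is a 0/1 vector over the same gene list.
GENES = ["G1", "G2", "G3", "G4", "G5"]
PROFILES = {
    "Autism": [1, 1, 0, 1, 0],
    "Rett": [1, 1, 0, 0, 0],
    "Fragile X": [1, 0, 0, 1, 0],
    "Unrelated disorder": [0, 0, 1, 0, 1],
}


def jaccard(a, b):
    """Jaccard similarity of two equal-length binary vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def cluster_of(seed, profiles, threshold=0.5):
    """Disorders whose gene profile is at least `threshold` similar to the seed.

    A simple stand-in for a real clustering procedure (assumption).
    """
    return sorted(
        name
        for name, vec in profiles.items()
        if name != seed and jaccard(profiles[seed], vec) >= threshold
    )


autism_cluster = cluster_of("Autism", PROFILES)
```

With the toy profiles, the disorders that share most of their implicated genes with Autism end up in `autism_cluster`, while the disorder with no shared genes is left out.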

9 Network Construction Data derived from STRING (http://string.embl.de/). Integration of p-p interaction (interactome), co-expression (transcriptome), orthology (orthologome), text (bibliome), and other lines of evidence. Focus on creating networks of possible interactions within a normal cell using classification methods (random forests).

10 [Figure: evidence types feeding a Random Forest Decision (Yes/No) on whether proteins A and B interact: Sequence coEvolution, P-P Interaction, Correlated Expression, and Text (aka Bibliome), e.g. "FXYD1 is identified as a MeCP2 target gene whose de-repression may directly contribute to Rett syndrome neuronal pathogenesis". Decision trees split on features D1 to D5; example feature vector {1,0,2,1,0}] http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.0030043

11 Networks for all AC disorders Hypoxia (586N/4359E) FragileX (97N/100E) Tuberous Sclerosis (110N/204E) Hypotonia (154N/208E) Autism (145N/164E) Microcephaly (135N/166E) Rett (48N/74E) Angelman (51N/57E) Spasticity (62N/40E) Mental Retardation (573N/1035E) Ataxia (428N/1489E) Seizure Disorder (35N/13E) Inf. Hypotonia (29N/16E) Asperger (15N/9E) autworks.hms.harvard.edu

12 Multi-disorder component of autism (MDAG) 66 out of 127 involved in at least one member of the autism cluster Highly connected component of the autism network

13 Biological Process | p value | MDAG genes
transmission of nerve impulse | 3.00E-11 | ABAT, ALDH5A1, APOE, CHRNA4, CHRNA7, GABRA5, GABRB3, GABRG2, GATA3, GRIN2A, MAOA, MET, NF1, NTF5, SCN1A, SLC6A4, TH, TPH1, TSC1
nervous system development | 3.29E-11 | ALDH5A1, APOE, ARX, BTD, CHRNA4, DAB1, DCX, FMR1, FOXP2, GABRA5, GATA3, GRIN2A, HOXA1, MAP2, MECP2, MET, NDN, NF1, NTF5, PAX3, PTEN, RELN, TSC1, UBE3A, VLDLR
synaptic transmission | 7.68E-10 | ABAT, ALDH5A1, APOE, CHRNA4, CHRNA7, GABRA5, GABRB3, GABRG2, GATA3, GRIN2A, MAOA, MET, NF1, NTF5, SLC6A4, TH, TPH1
cell-cell signaling | 3.12E-09 | ABAT, ADM, ALDH5A1, APOE, CHRNA4, CHRNA7, GABRA5, GABRB3, GABRG2, GATA3, GRIN2A, MAOA, MET, NF1, NTF5, SCN1A, SLC6A4, SSTR5, TH, TPH1, TSC1
brain development | 2.64E-06 | ARX, DAB1, DCX, FOXP2, GABRA5, HOXA1, MET, NF1, RELN, TSC1, UBE3A
generation of neurons | 2.43E-05 | APOE, ARX, DAB1, DCX, MAP2, MECP2, MET, NDN, NF1, NTF5, PTEN, RELN, VLDLR
regulation of cell proliferation | 2.45E-04 | ADM, ARX, CHRNA7, DHCR7, FOXP2, GRPR, MECP2, MET, NDN, NF1, PAX3, PTEN, SSTR5, TSC1
cell migration | 3.93E-03 | ARX, DAB1, DCX, MET, NDN, NF1, PAX3, PTEN, RELN, VLDLR
homeostasis | 1.90E-02 | ADM, APOE, ARX, CHRNA4, CHRNA7, GRIN2A, MBD1, NDN, NF1, SCN1A, SLC40A1, SSTR5, TH
cell morphogenesis | 1.94E-02 | APOE, ARX, ATP10A, DCX, MAP2, MECP2, NDN, PTEN, RELN, TSC1
ion transport | 2.74E-02 | ARX, CACNA1D, CHRNA4, CHRNA7, GABRA5, GABRB3, GABRG2, GRIN2A, MECP2, MET, SCN1A, SLC40A1, TSC1
cell differentiation | 4.35E-02 | ADM, APOE, ARX, DAB1, DCX, DHCR7, EXT2, FXR1, GATA3, GLO1, GRIN2A, MAP2, MECP2, MET, NDN, NF1, NTF5, PAX3, PTEN, RELN, TSC1, VLDLR

14 Significantly enriched MDAG processes Ion Transport (P = 2.7E-02), Cell Proliferation (P = 2.45E-04), CNS Development (P = 3.29E-11), Synaptic Transmission (P = 7.68E-10). Fisher’s exact test, Bonferroni adjustment. 14648 biological processes from Gene Ontology tested.
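A minimal sketch of the enrichment test named on this slide: a one-sided Fisher's exact test computed as a hypergeometric upper tail, followed by a Bonferroni adjustment for the 14648 processes tested. The slide does not say whether the test was one- or two-sided, so the one-sided form is an assumption, and the counts in the example call are hypothetical, not taken from the presentation.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test for over-representation.

    k: genes in the list annotated with the process
    n: size of the gene list
    K: genes in the background annotated with the process
    N: size of the background
    Returns P(X >= k) under the hypergeometric distribution.
    """
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def bonferroni(p, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)


# Hypothetical counts: 17 of 66 list genes carry an annotation that
# 300 of 20000 background genes carry.
raw_p = fisher_enrichment_p(17, 66, 300, 20000)
adjusted_p = bonferroni(raw_p, 14648)
```

Exact integer arithmetic with `math.comb` avoids underflow for very small tail probabilities; only the final division produces a float.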

15 Process-Driven Predictions Biological Processes: CNS development, Synaptic Transmission, Ion Transport, Cell Proliferation, … Autism Cluster Disorders: Fragile X, Tuberous Sclerosis, Seizure Disorder, Mental Retardation. Putative New Genes: 64 new genes, all of which occur in 2 or more of the Autism Cluster Disorders.

16 Experimental Validation GEO6575 (from UC Davis M.I.N.D. institute) White blood cell Affymetrix U133plus2.0 17 samples of autistic children without regression 18 children with regression 9 children with mental retardation or developmental delay 12 typically developing children from the general population

17 Blood for Brain

18 Autism without regression (17) Autism with regression (18)

19 Experimental Validation GEO6575 (from U.C. Davis M.I.N.D. institute) White blood cell Affymetrix U133plus2.0 17 samples of autistic patients without regression 18 patients with regression 9 patients with mental retardation or developmental delay 12 typically developing children from the general population

20 Data-driven approach to FDR detection can be ineffective. Standard data-driven application of false discovery rate control yields few genes below an FDR threshold of 0.05 (with these data, only 2 genes survive). This is a frequent circumstance in instances of weak signal and large background noise (e.g. microarray experiments).
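The effect described on this slide can be sketched with a standard FDR procedure. The slides do not name the procedure used, so Benjamini-Hochberg is assumed here, and the p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical: the same raw p-value tested genome-wide vs. in a focused list.
genome_wide = bh_adjust([0.0005] + [0.5] * 19999)
focused = bh_adjust([0.0005] + [0.5] * 63)
```

In this toy case the gene with raw p = 0.0005 adjusts to 0.5 when tested among 20,000 genes but to 0.032 when tested among 64 candidates, which is the motivation for restricting the search with prior knowledge.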

21 Results of process-driven search 43 Process-derived gene predictions had FDR-adjusted p values <0.05 Highly significant rate of validation -- 65% of predictions confirmed by expression data

22 Network-Driven Predictions

23 Results of network-driven search 267 occurred in 1 autism cluster disorder; 58 occurred in 2; 17 in 3; 3 in 4 sibling disorders. A total of 345 new predictions.

24 Results of network-driven search 301 had FDR-adjusted p values <0.05 90% (!) of predictions verified by expression data

25 Prior knowledge focuses whole-genomic search 43 Process-derived gene predictions had FDR-adjusted p values <0.05 (65%). 301 Network-derived gene predictions had FDR-adjusted p values <0.05 (90%). The rate of validation in both cases is significantly non-random.

26 Top 20 genes occurring in 3 or more Autism Sibling Disorders For many of these candidates, their roles in neurological impairment have been studied in autism cluster disorders, but not in autism.

27 Molecular Triangulation Genes: SLC16A2, OPHN1, AR, PAFAH1B1, FLNA, SLC6A8, MYO5A, FXN, L1CAM. Disorders: Mental Retardation, Fragile X, Hypotonia, Ataxia, Hypoxia, Microcephaly, Rett Syndrome, Spasticity, Tuberous Sclerosis. GO biological process enrichment: cytoskeleton organization, cell organization/biogenesis, cell communication, cell motility.

28 Conclusions Previous research has implicated between 100 and 1500 genes as contributors to the molecular physiology of Autism. Our knowledge-driven approach provides a logical means to filter the genome-wide search.

29 Conclusions Global “ask” swamped by noisy signal Informed, knowledge-driven “ask” results in biologically significant gene predictions Comparative analysis of Autism with related neurological disorders provides a focused search for novel gene candidates

30 Autworks Autworks is a web-driven navigation system that allows any researcher to view and search through the network of genes implicated in autism and related neurological disorders Built to aid and abet the role of serendipity and inspiration for researchers working on autism and other complex neuro diseases. http://autworks.hms.harvard.edu

31 Autworks now

32 The Plan Bring our analytical strategies and Autworks to the cloud –Beef up underbelly using AWS storage and the Amazon “Turkforce” –Scale up comparative network analysis –Enlarge validation database, verify/re-verify computational predictions, robustify the candidates

33 Aim 1: Build the neurological disease “gene core” of the Autworks relational database
Database | Description | Stats
Database of Genomic Variants | A curated catalogue of structural variation in the human genome | ~31615 total entries (indels, inversions, and copy number variation)
dbSNP | NCBI’s central repository for both single base nucleotide substitutions and short deletion and insertion polymorphisms | ~6,136,008 SNPs for Human
Chromosomal Variation in Man* | Searchable reference of chromosomal variation | > 3000 links to publications describing 30 different types of chromosomal variation in human disease
Human Gene Mutation Database* | Established for the study of mutational mechanisms in human genes | 62901 mutations in public release
OMIM* | NCBI’s compendium of human genes and genetic phenotypes | 12,634 genes for ~2459 phenotypes/diseases
GeneCards* | Searchable database of human genes that provides concise genomic, proteomic, transcriptomic, genetic and functional information | All known and predicted human genes with summaries of known disease association
SNPedia* | SNPedia shares information about the effects of variations in DNA, citing peer-reviewed scientific publications | 4621 SNPs
* Can be queried with a disease or gene term

34 Aim 1: Steps (1) Extract the entire set of neurological disorders listed by NINDS (currently 433) to ensure that we can find any and all commonalities to Autism. (2) Mine all databases in the table above that can be searched using a disease term as the query, specifically the Online Mendelian Inheritance in Man (OMIM), GeneCards, Chromosomal Variation in Man, the Human Gene Mutation Database (HGMD), and SNPedia. (3) Combine and import the features from each of the online resources into a relational database that will become the backend of Autworks, being careful to remove any redundancies. (4) Cross-reference resources to comprehensively populate the data model.
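Step (3), combining records from several sources while removing redundancies, might look like the following sketch. The record layout (gene, disease, source) and the normalization rules are assumptions for illustration; the slides do not describe the actual import code.

```python
from collections import defaultdict


def merge_records(records):
    """Collapse (gene, disease, source) records into one entry per gene-disease pair.

    Gene symbols are upper-cased and disease names lower-cased so that the
    same pair reported by two sources is stored once (assumed normalization).
    """
    merged = defaultdict(set)
    for gene, disease, source in records:
        key = (gene.strip().upper(), disease.strip().lower())
        merged[key].add(source)
    return {key: sorted(sources) for key, sources in merged.items()}


# Illustrative input: the same gene-disease pair reported by two sources.
records = [
    ("MECP2", "Rett Syndrome", "OMIM"),
    ("mecp2 ", "rett syndrome", "GeneCards"),
    ("FMR1", "Fragile X", "OMIM"),
]
gene_core = merge_records(records)
```

Keeping the set of contributing sources per pair preserves provenance, which step (4) can then use when cross-referencing resources.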

35 Gene-disease data model “Gene Core”
Field | Description
Gene | official gene symbol from HUGO
Variant ID | unique identifier (e.g., RS#, SS#, etc.)
Variant Type | SNP, CNV, Indel, etc.
Genomic Location | chromosomal coordinates (hg build 36)
Source | Database(s) from where gene and/or gene variant was derived
OMIM score | Confidence score used by OMIM
Polyphen score | Score indicating severity of mutation
Disease | Autism and related neurological disorders
PubmedID | Article(s) describing the genetic variant
This data model will share much in common with Variome project’s database
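One way the "Gene Core" fields could map onto a relational table is sketched below with SQLite. The table name, column names, column types, and uniqueness constraint are all assumptions; the slides list only the fields and their descriptions. The variant identifier in the example row is a placeholder, not a real ID.

```python
import sqlite3

# Column types and the UNIQUE constraint are assumptions for illustration.
SCHEMA = """
CREATE TABLE gene_core (
    gene             TEXT NOT NULL,  -- official gene symbol from HUGO
    variant_id       TEXT NOT NULL,  -- unique identifier (e.g. RS#, SS#)
    variant_type     TEXT,           -- SNP, CNV, Indel, etc.
    genomic_location TEXT,           -- chromosomal coordinates (hg build 36)
    source           TEXT NOT NULL,  -- database the record was derived from
    omim_score       TEXT,           -- confidence score used by OMIM
    polyphen_score   REAL,           -- severity of mutation
    disease          TEXT NOT NULL,  -- autism or a related disorder
    pubmed_id        TEXT,           -- article describing the variant
    UNIQUE (gene, variant_id, disease, source)
);
"""


def open_gene_core(path=":memory:"):
    """Create the table in a new SQLite database and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = open_gene_core()
row = ("SHANK3", "VAR_PLACEHOLDER", None, None, "Medline",
       None, None, "Autism", "17173049")
insert = "INSERT OR IGNORE INTO gene_core VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)"
conn.execute(insert, row)
conn.execute(insert, row)  # duplicate is ignored by the UNIQUE constraint
n_rows = conn.execute("SELECT COUNT(*) FROM gene_core").fetchone()[0]
```

The uniqueness constraint is one simple way to enforce the "remove any redundancies" requirement at the database level rather than in import code.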

36 ABI1 Medline PMID: 17173049 SHANK3 (also known as ProSAP2) regulates the structural organization of dendritic spines and is a binding partner of neuroligins; genes encoding neuroligins are mutated in autism and Asperger syndrome. Here, we report that a mutation of a single copy of SHANK3 on chromosome 22q13 can result in language and/or social communication disorders... GeneTagger PMID: 17304222 We identified an important component for controlled actin assembly, abelson interacting protein-1 (Abi-1), as a binding partner for the postsynaptic density (PSD) protein ProSAP2/Shank3. During early neuronal development, Abi-1 is localized in neurites and growth cones; at later stages, the protein is enriched in dendritic spines and PSDs… Candidate gene filtered MeSH Major Topics MeSH term filtered Annotator Checks Accuracy through BioNotate system Shank3 Autism Results: Gene-Gene Gene-Disease Corpora Can we Turkify this process???

37 Aim 2: Build interaction & network cores for Autworks
Database | Description
Protein-Protein Interaction | Derived directly from STRING [18]. STRING incorporates >80,000 p-p interactions from numerous sources including MINT [24], HPRD [25], BIND [26], DIP [27], BioGrid [28, 29], KEGG [30], and Reactome [31]. These databases contain records from two-hybrid assays, synthetic lethality assays, mass spectrometry, co-Immunoprecipitation, and more.
Phylogenetic Profiles | We will take the union of evidence from STRING and evidence from RoundUP, which was built by the PI and has greater coverage than STRING’s orthology information (21,000+ unique phylogenetic profiles for more than 30 Eukaryotic organisms [2]). Phylogenetic profiles are commonly used to predict functional relationships between proteins [32, 33].
Gene Ontology (GO) | GO [34, 35] contains >923034 unique biological process, function, and cellular component terms. Same process, function, and/or cellular location can be used to predict protein-protein interaction. This has been incorporated into STRING.
Co-Expression | We will combine data from STRING with our own in house Co-Expression database, ChipperDB [23]. ChipperDB contains a sizable portion of NCBI’s Gene Expression Omnibus [36]. Co-expression is a proven method for predicting shared function and protein-protein interaction [37].
Bibliome | Statistically relevant co-occurrences of gene names, and semantically specified interactions found via Natural Language Processing [16].

38 GO Co-Ex P-P intx Bibliome Phylo-profiles Classifier Interaction Core Network core Ataxia Mental Retardation Autism Can we “cloud” it up???

39 Aim 3: comparative network analysis on the cloud Schizophrenia Autism - Find disease-filtered interacting partners - Find shortest paths between candidates - Find minimal subnetworks - Verify and reconstruct networks appropriately
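Two of the operations listed on this slide, disease-filtered interacting partners and shortest paths between candidates, can be sketched on a small adjacency-set graph. The graph representation, the function names, and the node names A to E are hypothetical; the slides do not describe how Autworks implements these queries.

```python
from collections import deque


def filtered_partners(graph, gene, disease_genes):
    """Interacting partners of `gene` that are also implicated in a disease."""
    return sorted(graph.get(gene, set()) & set(disease_genes))


def shortest_path(graph, start, goal):
    """Breadth-first search on an unweighted graph stored as a dict of sets.

    Returns the list of nodes from start to goal, or None if unreachable.
    """
    if start == goal:
        return [start]
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in parents:
                parents[nbr] = node
                if nbr == goal:
                    # Walk parent links back to the start, then reverse.
                    path = [nbr]
                    while parents[path[-1]] is not None:
                        path.append(parents[path[-1]])
                    return path[::-1]
                queue.append(nbr)
    return None


# Hypothetical toy network; node names are placeholders, not real genes.
graph = {
    "A": {"B", "D"},
    "B": {"A", "C"},
    "C": {"B"},
    "D": {"A"},
    "E": set(),
}
path_a_to_c = shortest_path(graph, "A", "C")
partners = filtered_partners(graph, "A", {"B", "C"})
```

Breadth-first search is sufficient when edges are unweighted; if edges carried confidence scores, a weighted method such as Dijkstra's algorithm would be the natural substitute.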

40 Rett Syndrome Angelman Syndrome Mental Retardation Genetic Landscape of Autism

41 Autism Diseaseome

42 Acknowledgments Zak Kohane Matt Huyck Tom Monaghan Todd DeLuca Nieves Mendizabel Paco Esteban Joaquin Goni Alal Eran Michal Galdzicki Lou Kunkel Alexa McCray Leon Peshkin

