1 Phenotype Prediction by Integrative Network Analysis of SNP and Gene Expression Microarrays Hsun-Hsien Chang (1), Michael McGeachie (1,2). (1) Children’s Hospital Informatics Program, Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School. (2) Channing Lab, Brigham and Women’s Hospital. September 3, 2011
2 Genetic Information Flows from DNA to RNA Central dogma of molecular biology. Research goals: –Decipher how genetic variants influence RNA transcript expression, leading to disease formation. –Create clinical tools to perform diagnosis & prognosis, design treatment strategies, etc.
3 Measure Genetic Variants and RNA Abundance by Microarrays Genetic variants are measured by single nucleotide polymorphisms (SNPs). Modeled by discrete (multinomial) random variables. Microarrays can assess 500K SNPs in parallel. RNA abundance is measured by transcriptional expression levels. Modeled by continuous (log-normal) random variables. Microarrays can assess 50K transcripts in parallel.
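The two variable types above can be illustrated with a minimal sketch that fits each model by maximum likelihood. This is an illustration only, not the authors' code: the function names `fit_multinomial` and `fit_lognormal` and the toy inputs are assumptions, and genotype labels such as "AA" are hypothetical encodings.

```python
from collections import Counter
from math import log
from statistics import fmean, pstdev


def fit_multinomial(genotypes):
    """Estimate genotype frequencies for one SNP (discrete variable)."""
    counts = Counter(genotypes)
    n = len(genotypes)
    return {g: c / n for g, c in counts.items()}


def fit_lognormal(expression):
    """Estimate log-normal parameters for one transcript (continuous variable).

    Returns the mean and (population) standard deviation of log-expression.
    """
    logs = [log(x) for x in expression]
    return fmean(logs), pstdev(logs)
```

For example, `fit_multinomial(["AA", "AB", "AA", "BB"])` gives the relative frequency of each genotype, and `fit_lognormal` is applied to strictly positive expression intensities.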
4 Identify SNP-Transcript Dependence [Figure: samples grouped by high, medium, and low expression level] Challenges: –Need an intelligent method to compare pairs drawn from 500K SNPs and 50K transcripts (about 2.5 × 10^10 candidate pairs). –Need a network analysis to capture molecular interactions between SNPs and transcripts.
5 Reduce Dimensionality by Phenotypes SNP microarrays (discrete variables) and expression microarrays (continuous variables) are each filtered by phenotype (Bayes factor).
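The slides do not say which Bayes factor is used, so the following is only a sketch under stated assumptions: for a single SNP it compares a model in which the genotype distribution differs by phenotype against one in which it does not, using a Dirichlet-multinomial marginal likelihood with a symmetric prior. The function names, the prior strength `alpha`, and the three-state genotype coding are all assumptions, and the continuous (expression) case is not covered.

```python
from collections import Counter
from math import lgamma


def log_marginal(counts, n_states, alpha=1.0):
    """Log marginal likelihood of observed state counts under a
    symmetric Dirichlet(alpha) prior over n_states categories."""
    counts = list(counts)
    total = sum(counts)
    out = lgamma(n_states * alpha) - lgamma(n_states * alpha + total)
    for c in counts:
        out += lgamma(alpha + c) - lgamma(alpha)
    return out


def log_bayes_factor(genotypes, phenotypes, n_states=3, alpha=1.0):
    """Log Bayes factor: genotype depends on phenotype vs. independent."""
    # Dependent model: one genotype distribution per phenotype class.
    by_pheno = {}
    for g, p in zip(genotypes, phenotypes):
        by_pheno.setdefault(p, Counter())[g] += 1
    log_dep = sum(log_marginal(c.values(), n_states, alpha)
                  for c in by_pheno.values())
    # Independent model: a single genotype distribution for all samples.
    log_ind = log_marginal(Counter(genotypes).values(), n_states, alpha)
    return log_dep - log_ind
```

Under this sketch, a SNP would be kept when its log Bayes factor is positive (or exceeds some chosen threshold), which shrinks the variable set before any network is learned.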
6 Model SNP-Transcript Dependence Reduced SNP data: S_1, ..., S_M. Reduced expression data: G_1, ..., G_N. A SNP can be influenced by other SNPs. A transcript can be influenced by SNPs and other transcripts.
7 Interplay of Phenotypes, SNPs, and Transcripts Network analysis is performed on the reduced data set. For each variable, find the set of modulating variables with the highest likelihood. Implement a greedy search algorithm to search for the best network.
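The per-variable greedy search described above could look roughly like the sketch below. It is not the authors' implementation: it handles discrete variables only, scores a parent set with a Dirichlet-multinomial marginal likelihood (an assumed choice), and does not check for cycles across variables. The names `family_score`, `greedy_parents`, `max_parents`, and `alpha` are hypothetical.

```python
from collections import Counter, defaultdict
from math import lgamma


def family_score(data, child, parents, n_states, alpha=1.0):
    """Log marginal likelihood of `child` given a candidate parent set.

    `data` is a list of dicts mapping variable name -> discrete state.
    """
    groups = defaultdict(Counter)
    for row in data:
        config = tuple(row[p] for p in parents)
        groups[config][row[child]] += 1
    k = n_states[child]
    score = 0.0
    for counts in groups.values():
        total = sum(counts.values())
        score += lgamma(k * alpha) - lgamma(k * alpha + total)
        score += sum(lgamma(alpha + c) - lgamma(alpha)
                     for c in counts.values())
    return score


def greedy_parents(data, child, candidates, n_states, max_parents=3):
    """Add one parent at a time while the score keeps improving."""
    parents = []
    best = family_score(data, child, parents, n_states)
    while len(parents) < max_parents:
        scored = [(family_score(data, child, parents + [c], n_states), c)
                  for c in candidates if c not in parents]
        if not scored:
            break
        top_score, top = max(scored)
        if top_score <= best:
            break  # no candidate improves the score; stop searching
        parents.append(top)
        best = top_score
    return parents, best
```

Running `greedy_parents` once per variable yields one set of modulating variables per node; a full network search would additionally need to keep the resulting graph acyclic, for instance by restricting which variables may serve as candidates for which.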
8 Pediatric Acute Lymphoblastic Leukemia (ALL) Mutation of lymphoblasts leads to acute lymphoblastic leukemia (ALL). Two types of ALL have different responses to chemotherapies: –B-cell precursor ALL (BCP-ALL) –common ALL (C-ALL)
9 A SNP-Transcript Network Distinguishes Pediatric Acute Lymphoblastic Leukemia Database from GEO with access # GSE patients; 8 with BCP-ALL and 20 with C-ALL. Genotyped at 100k SNPs by Affymetrix Human Mapping 100K Set microarrays. Expression patterns of 50k genes were profiled using Affymetrix HG-U133 Plus 2.0 platforms. 96% phenotype classification accuracy.
11 Conclusions Use phenotypes to reduce data dimensionality. Capture genetic flow by modeling SNP-transcript dependence networks. Create phenotype-dependent SNP-transcript networks. Apply the analysis to pediatric acute lymphoblastic leukemia.