Thanks to the Lipper Center for Computational Genetics Government and private grant agencies: NHLBI, NSF, ONR, DOE, DARPA, HHMI, Armenise Corporate collaborators.

Presentation transcript:

1 CHI Macroresults through Microarrays 3. George Church, 1-May-02. Array quantitation for modeling mutations affecting RNA, protein interactions & cell proliferation. Thanks to the Lipper Center for Computational Genetics; government and private grant agencies: NHLBI, NSF, ONR, DOE, DARPA, HHMI, Armenise; corporate collaborators & sponsors: Affymetrix, GTC, Mosaic, Aventis, Dupont, Cistran.

2 Post-300 genomes & 3D structures: gggatttagctcagtt gggagagcgccagact gaa gat ttg gag gtcctgtgttcgatcc acagaattcgcacca

3 Biosystems Measures & Models: DNA, RNA, protein (in vivo & in vitro interactions), metabolites, replication rate, environment; microbes, cancer & stem cells, Darwinian in vitro replication, small multicellular organisms; RNAi, insertions, SNPs.

4 Functional Genomics Challenges. Systems dynamics and optimality modeling. Multiple genetic domains per gene: high-density readout of whole-genome mutant phenotypes. Multiple RNAs & regulatory proteins per gene. Many causative genes & haplotypes per disease. Polony RNA exon-typing. Multiplex in situ RNA & protein analyses. Automated differentiation. Homologous recombination genome engineering.

5 Human Red Blood Cell ODE model: 200 measured parameters. [Figure: RBC metabolic network covering glycolysis (GLC to LAC, with 2,3 DPG), the pentose phosphate pathway, glutathione (GSH/GSSG), adenine nucleotide metabolism and salvage (ADO, INO, AMP, IMP, ADE, HYPX, PRPP, R1P), cofactors (ATP/ADP, NADH/NAD, NADPH/NADP), and K+, Na+, Cl-, HCO3- and pH.] Jamshidi, Edwards, Fahland, Church, Palsson (2001) Bioinformatics 17: 286. (http://atlas.med.harvard.edu/gmc/rbc.html)
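The RBC model is a system of ordinary differential equations whose right-hand sides are reaction rates built from measured kinetic parameters. As a much smaller illustration of that idea (not the RBC model itself), the sketch below integrates a hypothetical two-step pathway S -> I -> P with first-order rate constants `k1`, `k2` using classical fourth-order Runge-Kutta; the pathway, the constants and the choice of integrator are all assumptions.

```python
import math
from typing import Callable, List

Vector = List[float]
RHS = Callable[[float, Vector], Vector]

def rk4_step(f: RHS, t: float, y: Vector, h: float) -> Vector:
    """One classical fourth-order Runge-Kutta step for dy/dt = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * k for yi, k in zip(y, k2)])
    k4 = f(t + h, [yi + h * k for yi, k in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def toy_pathway(k1: float, k2: float) -> RHS:
    """Hypothetical linear pathway S -> I -> P with first-order rate constants."""
    def f(t: float, y: Vector) -> Vector:
        s, i, _p = y
        v1, v2 = k1 * s, k2 * i  # reaction rates (fluxes)
        return [-v1, v1 - v2, v2]
    return f

def simulate(f: RHS, y0: Vector, t_end: float, h: float) -> Vector:
    """Integrate from t = 0 to t_end with fixed step h; return the final state."""
    y, t = list(y0), 0.0
    for _ in range(int(round(t_end / h))):
        y = rk4_step(f, t, y, h)
        t += h
    return y

s, i, p = simulate(toy_pathway(k1=1.0, k2=0.5), [1.0, 0.0, 0.0], t_end=10.0, h=0.01)
print(f"S={s:.6f} (analytic {math.exp(-10.0):.6f}), I={i:.6f}, P={p:.6f}, total={s + i + p:.6f}")
```

Because every flux leaves one pool and enters another, the total S + I + P stays constant, which is a cheap sanity check on any such model.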

6 Modeling suboptimality: Segre, Edwards, Vitkup

7 Calculated & observed fluxes in wild type (C 0.4-limited): CC = 0.97. [Figure: calculated flux vs. observed flux in wt.]
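The agreement between calculated and observed fluxes is summarized as a correlation coefficient (CC). Assuming CC denotes the ordinary Pearson correlation, a minimal sketch is below; the flux values are hypothetical, not the data behind the slide.

```python
import math
from typing import Sequence

def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical calculated vs. observed fluxes (illustrative numbers only).
calculated = [1.0, 0.8, 0.5, 0.2, 0.1]
observed = [0.95, 0.85, 0.45, 0.25, 0.05]
print(round(pearson_r(calculated, observed), 3))
```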

8 Replication rate of a whole-genome set of mutants. Badarinarayana et al. (2001) Nature Biotech. 19: 1060
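The slides do not give the formula Badarinarayana et al. used to turn array signals into replication rates. A common population-genetics definition, used here as an assumption, treats the mutant-to-reference abundance ratio as changing exponentially, r_t = r_0 exp(-s t), so the per-generation selective disadvantage is s = -ln(r_t / r_0) / t.

```python
import math

def selection_coefficient(ratio_start: float, ratio_end: float, generations: float) -> float:
    """Per-generation selective disadvantage s from a mutant:reference ratio.

    Assumes r_t = r_0 * exp(-s * t); positive s means the mutant is depleted.
    """
    if ratio_start <= 0 or ratio_end <= 0 or generations <= 0:
        raise ValueError("ratios and generations must be positive")
    return -math.log(ratio_end / ratio_start) / generations

# Hypothetical example: a mutant's ratio to the reference halves over 10 generations.
print(f"s = {selection_coefficient(1.0, 0.5, 10):.4f} per generation")
```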

9 Replication rate challenge met: multiple homologous domains. [Figure: array probes along homologous domains 1, 2 and 3 of thrA, metL and lysC, with selective disadvantage in minimal media.]

10 Multiple mutations per gene: correlation between two selection experiments. Badarinarayana et al. (2001) Nature Biotech. 19: 1060

11 Comparison of selection data with Flux Balance Optimization predictions on 488 genes:

Prediction            Genes   Negatively selected   Not negatively selected
Essential               143                    80                        63
Reduced growth rate      46                    24                        22
Non-essential           299                   119                       180

Chi-square P-value = 0.004. Open questions: novel duplicates? Position effects, toxin accumulation, non-optimality?
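The reported P-value can be reproduced from the table with a standard chi-square test of independence on the 3 x 2 counts (this assumes that is the test the slide refers to). With 2 degrees of freedom the chi-square survival function is exactly exp(-x/2), so no statistics library is needed.

```python
import math

# Counts from the table: (negatively selected, not negatively selected) per prediction.
observed = {
    "essential": (80, 63),
    "reduced growth rate": (24, 22),
    "non-essential": (119, 180),
}

def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    n = sum(row_totals)
    chi2 = 0.0
    for row, rt in zip(rows, row_totals):
        for o, ct in zip(row, col_totals):
            e = rt * ct / n  # expected count under independence
            chi2 += (o - e) ** 2 / e
    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return chi2, dof

chi2, dof = chi_square_statistic(observed)
if dof != 2:
    raise ValueError("the closed-form p-value below is valid only for 2 degrees of freedom")
p_value = math.exp(-chi2 / 2)  # chi-square survival function for dof = 2
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")
```

This gives chi2 of about 11.0 and p of about 0.004, consistent with the slide.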

12 Biosystems Measures & Models: DNA, RNA, protein (in vivo & in vitro interactions), metabolites, replication rate, environment; microbes, cancer & stem cells, in vitro replication, small multicellular organisms; RNAi, insertions, SNPs.

13 RNA quantitation issues: small fold changes in RNA are important (example: 1.5-fold in trisomies, i.e., three gene copies instead of two); cross-hybridizing RNAs; alternative RNAs and gene families; mixed tissues; in situ hybridization has low multiplexing capacity.

14 Gene Expression database (Aach, Rindone, Church (2000) Genome Research 10: 431-445), holding: microarrays [1]: experiment and control R/G ratios, R and G values, quality indicators; Affymetrix [2]: PM and MM 25-mer probes per ORF, averaged PM-MM, “presence” calls, feature statistics; Lynx MPSS [3] and SAGE [4]: counts of 14-mer sequence tags for each ORF.
[1] DeRisi et al., Science 278:680-686 (1997).
[2] Lockhart et al., Nat Biotech 14:1675-1680 (1996).
[3] Brenner et al., Massively Parallel Signature Sequencing, Nat Biotechnol. 18:630-4 (2000).
[4] Velculescu et al., Serial Analysis of Gene Expression, Science 270:484-487 (1995).
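For the Affymetrix data type, the “averaged PM-MM” value is the mean of perfect-match minus mismatch intensities over an ORF's probe pairs. The sketch below is simplified (no outlier trimming or background correction, which production software applies), and the `fraction_positive` helper is a hypothetical stand-in for a “presence” indicator, not Affymetrix's actual call.

```python
from statistics import mean
from typing import Sequence

def average_difference(pm: Sequence[float], mm: Sequence[float]) -> float:
    """Mean of PM - MM over one ORF's probe pairs (simplified, no outlier trimming)."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("need matching, non-empty PM and MM intensity lists")
    return mean(p - m for p, m in zip(pm, mm))

def fraction_positive(pm: Sequence[float], mm: Sequence[float]) -> float:
    """Hypothetical crude presence indicator: fraction of pairs with PM > MM."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("need matching, non-empty PM and MM intensity lists")
    return sum(p > m for p, m in zip(pm, mm)) / len(pm)

# Hypothetical intensities for one ORF's probe pairs.
pm = [100, 200, 300]
mm = [50, 50, 100]
print(average_difference(pm, mm), fraction_positive(pm, mm))
```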

15 RNA cluster analyses: cell cycle. [Figure: for one cluster (N = 186 ORFs), number of MCB and SCB sites and number of ORFs vs. distance from ATG (bp).] Tavazoie et al. (1999) Nature Genetics 22:281.
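The figure plots where motif sites fall relative to the start codon. A minimal fixed-string scan that produces such distances is sketched below; the consensus cores MCB = ACGCGT and SCB = CACGAAA are assumptions used for illustration, and the published analysis discovered motifs computationally rather than with an exact-match search like this one.

```python
from typing import List

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_all(seq: str, motif: str) -> List[int]:
    """Start indices of every exact (possibly overlapping) match of motif in seq."""
    hits, i = [], seq.find(motif)
    while i != -1:
        hits.append(i)
        i = seq.find(motif, i + 1)
    return hits

def motif_distances(upstream: str, motif: str) -> List[int]:
    """Distances (bp) from motif sites on either strand to the ATG, nearest first.

    `upstream` must end immediately 5' of the ATG. A site's distance is counted
    from its leftmost base to the A of the ATG, so a site ending right next to
    the ATG has distance len(motif). Palindromic sites are counted once.
    """
    upstream, motif = upstream.upper(), motif.upper()
    starts = set(find_all(upstream, motif)) | set(find_all(upstream, reverse_complement(motif)))
    return sorted(len(upstream) - s for s in starts)

# Hypothetical upstream regions.
print(motif_distances("TTTACGCGTAAAA", "ACGCGT"))   # MCB core (palindromic)
print(motif_distances("TTTCGTGAAAAA", "CACGAAA"))   # SCB core, found on the reverse strand
```

Pooling these distances over all ORFs in a cluster and binning them gives a histogram like the one on the slide.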

16 Combining mouse knockouts with RNA array analysis (homeobox gene Crx-/-). Livesey, Furukawa, Steffen, Church, Cepko (2000) Current Biol. 10:301.

17 Biosystems Measures & Models: DNA, RNA, protein (in vivo & in vitro interactions), metabolites, replication rate, environment; microbes, cancer & stem cells, in vitro replication, small multicellular organisms; RNAi, insertions, SNPs.

18 Combinatorial arrays for binding constants: human/mouse EGR1 on a ds-DNA array. HMS: Martha Bulyk, Xiaohua Wang, Martin Steffen; MRC: Yen Choo.

19 Combinatorial arrays for binding constants. [Figure: combinatorial DNA-binding protein domains displayed on phage (pVIII, pIII) binding a ds-DNA array; antibodies for detection.]

20 Combinatorial arrays for binding constants (Martha Bulyk et al.). [Figure: phage displaying combinatorial DNA-binding protein domains on a ds-DNA array, detected with phycoerythrin-labeled 2º IgG.]

21 Interactions of adjacent base pairs in EGR1 zinc finger DNA recognition. Isalan et al., Biochemistry (1998) 37:12026-12033.

22 Wild-type EGR1 microarray. [Figure: array layout with high [DNA] spots, a (+) control sequence for wild-type binding, alignment oligos, etc.]

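23 Motif weights for all 64 triplets and apparent binding constants (Ka app) for wild-type EGR1 (RSDHLTT) and the helices RGPDLAR, REDVLIR, LRHNLET and KASNLVS. Sites and constants shown: TGG 2.8 nM; GCG 16 nM; 2.5 nM; TAT 5.7 nM; AAA, AAT, ACT, AGA, AGC, AGT, CAT, CCT, CGA, CTT, TTC, TTT; AAT 240 nM.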
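To see why constants spanning 2.8 nM to 240 nM matter on an array, one can compute site occupancy. The sketch assumes the nM values behave as apparent dissociation constants in a simple 1:1 binding model with protein in excess, so occupancy is [P] / ([P] + Kd); the 10 nM protein concentration is hypothetical, and the two values are not claimed to belong to the same protein.

```python
def fraction_bound(protein_nM: float, kd_nM: float) -> float:
    """Fractional occupancy of a DNA site for 1:1 binding with protein in excess.

    Assumes free protein is not depleted by binding: theta = [P] / ([P] + Kd).
    """
    if protein_nM < 0 or kd_nM <= 0:
        raise ValueError("protein must be non-negative and Kd positive")
    return protein_nM / (protein_nM + kd_nM)

# Two of the apparent constants on the slide, at a hypothetical 10 nM protein.
for kd in (2.8, 240.0):
    print(f"Kd = {kd:g} nM -> fraction bound {fraction_bound(10.0, kd):.2f}")
```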

24 Biosystems Measures & Models: DNA, RNA, protein (in vivo & in vitro interactions), metabolites, replication rate, environment; microbes, cancer & stem cells, in vitro replication, small multicellular organisms; RNAi, insertions, SNPs.

25 Common diseases: billions of “new” alleles plus millions of balanced polymorphisms. 60 new mutations per generation × 5,000 generations since the major bottleneck(s) that set up the linkage patterns = 300,000 per genome. Each of the 3 Gbp in the genome exists in all SNP forms (A, C, G, T), with ~600,000 copies of each SNP on earth (spread over the common haplotypes). The population frequency will be <0.01%. (Aach et al., 2001 Nature 409: 856) Functional genomics (FG) may provide better leads for therapies & diagnostics. (Accuracy goal: 1 ppb?)
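One back-of-envelope reading that reproduces the slide's numbers is sketched below. The world population of ~6 billion is an assumption not stated on the slide, and each person is treated as one genome (diploidy ignored), as a rough estimate would.

```python
# Figures from the slide, plus one labeled assumption.
NEW_MUTATIONS_PER_GENERATION = 60
GENERATIONS_SINCE_BOTTLENECK = 5_000
GENOME_SIZE_BP = 3_000_000_000
WORLD_POPULATION = 6_000_000_000  # assumption: ~6 billion people, one genome each

mutations_per_genome = NEW_MUTATIONS_PER_GENERATION * GENERATIONS_SINCE_BOTTLENECK
total_mutations = WORLD_POPULATION * mutations_per_genome     # across all people
copies_per_site = total_mutations // GENOME_SIZE_BP           # spread over every base
population_frequency = copies_per_site / WORLD_POPULATION     # carriers per person

print(mutations_per_genome, copies_per_site, f"{population_frequency:.2%}")
```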

26 Projected costs affect our view of what is possible. In 1985, at the dawn of the genome project, sequencing at $10 per bp would have cost $30B per genome. In 2002, Perlegen or Lynx: $3M per genome (10^3 bits/$, a 4-log improvement). In 2001, the cost of video data collection? 10^13 bits/$. Genotyping & functional genomics demand will probably be as high as costs permit.
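The slide's cost figures can be checked with a few lines of arithmetic. The sketch assumes 2 bits of information per base pair (one of four bases); that conversion is not stated on the slide.

```python
import math

GENOME_SIZE_BP = 3_000_000_000
BITS_PER_BP = 2  # assumption: one of four bases = log2(4) bits

cost_1985 = 10 * GENOME_SIZE_BP  # $10 per bp
cost_2002 = 3_000_000            # $3M per genome (Perlegen or Lynx)
bits = GENOME_SIZE_BP * BITS_PER_BP

print(f"1985: ${cost_1985 / 1e9:.0f}B per genome, {bits / cost_1985:.1f} bits/$")
print(f"2002: ${cost_2002 / 1e6:.0f}M per genome, {bits / cost_2002:.0f} bits/$")
print(f"improvement: {math.log10(cost_1985 / cost_2002):.0f} logs")
```

About 2,000 bits/$ in 2002 is the “10^3 bits/$” on the slide, and $30B to $3M is the 4-log drop.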

27 Why lower-cost, high-quality “sequencing”? Environmental, food, & biodiversity monitoring; human genome haplotyping; RNA splicing & editing; immune B & T cell receptor spectra. How? Polymerase DNA colonies (polonies); fluorescent in situ sequencing (FISSEQ); femtoliter (10^-15 L) scale & low-cost scanners. Mitra & Church, Nucleic Acids Res. 27: e34.

28 Polony amplification. Primer A has a 5’ immobilizing (Acrydite) modification. [Figure: a single molecule from the library (ends A’ and B) in the 1st round of PCR; the primer is extended by polymerase.]

29

30 Sequence polonies by sequential, fluorescent single-base extensions: 1. Remove one strand of DNA. 2. Hybridize universal primer. 3. Add red (Cy3) dTTP. 4. Wash; scan red channel. [Figure: two polony templates shown 3’ to 5’, reading A G T… and G C G…; dTTP extends only the first.]

31 Sequence polonies by sequential, fluorescent single-base extensions: 5. Add green (FITC) dCTP. 6. Wash; scan green channel. [Figure: dCTP now extends both templates; the first has added T then C, the second C.]
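The cycle on the two slides (add one labeled nucleotide, wash, scan, repeat) can be simulated for a single polony. This sketch assumes unblocked nucleotides, so a homopolymer run is filled in one cycle; the T-then-C start of the flow order follows the slides, while the later A and G steps and the function name are hypothetical.

```python
from typing import List, Tuple

PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}  # template base -> incorporated base

def sequential_extension(template_3to5: str, flow_order: str,
                         max_cycles: int = 100) -> List[Tuple[str, int]]:
    """Simulate cycles of single-nucleotide-species primer extension on one polony.

    The template is given 3'->5', starting at the first base past the primer.
    Each cycle supplies one nucleotide (cycling through flow_order); the primer
    is extended across every consecutive template base that pairs with it.
    Returns (nucleotide, bases_added) per cycle until the template is copied or
    max_cycles is reached.
    """
    template = template_3to5.upper()
    if set(template) - set(PAIR) or not flow_order:
        raise ValueError("template must be A/C/G/T and flow_order non-empty")
    pos, calls = 0, []
    for cycle in range(max_cycles):
        if pos >= len(template):
            break
        nt = flow_order[cycle % len(flow_order)]
        added = 0
        while pos < len(template) and PAIR[template[pos]] == nt:
            pos += 1
            added += 1
        calls.append((nt, added))  # a scan sees signal only if added > 0
    return calls

# The two templates drawn on the slides, read 3'->5'.
print(sequential_extension("AGT", "TCAG"))
print(sequential_extension("GCG", "TCAG"))
```

The first template lights up in the red (T) scan and then the green (C) scan; the second stays dark for T and lights up for C, matching the figures.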

32 Polony template (3’ P’ P 5’): AATACAATTCACACAGGAAACAGCTATGACATTC / TATTGTTAAAGTGTGTCCTTTGTCGATACTGGTA…5’. Mean intensity, FITC (C) and Cy3 (T): 58, 0.5; 40, 6.5; 0.3, 48; 0.4, 43. Primer extension: 26 cycles, 34 nucleotides.

33 Why lower-cost, high-quality “sequencing”? Environmental, food, & biodiversity monitoring; human genome haplotyping; RNA splicing & editing; immune B & T cell receptor spectra. How? Polymerase DNA colonies (polonies); fluorescent in situ sequencing (FISSEQ); femtoliter (10^-15 L) scale & low-cost scanners. Mitra & Church, Nucleic Acids Res. 27: e34.

34

35 Why lower-cost, high-quality “sequencing”? Environmental, food, & biodiversity monitoring; human genome haplotyping; RNA splicing & editing; immune B & T cell receptor spectra. How? Polymerase DNA colonies (polonies); fluorescent in situ sequencing (FISSEQ); femtoliter (10^-15 L) scale & low-cost scanners. Mitra & Church, Nucleic Acids Res. 27: e34.

36 RNA exon typing: single molecules of RNA are dispersed; multiplex polonies span all likely variable exons; each exon is probed sequentially.
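Because each polony grows from one RNA molecule, the set of exon probes that light up on a polony tells which variable exons were linked on that transcript. A minimal tally of those patterns is sketched below; the exon names and probe results are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

def exon_type(probe_hits: Dict[str, bool], exons: Tuple[str, ...]) -> Tuple[str, ...]:
    """Isoform label for one polony: the variable exons whose probe gave a signal."""
    return tuple(e for e in exons if probe_hits.get(e, False))

def tally_isoforms(polonies: Iterable[Dict[str, bool]], exons: Tuple[str, ...]) -> Counter:
    """Count polonies (single RNA molecules) per exon-inclusion pattern."""
    return Counter(exon_type(p, exons) for p in polonies)

# Hypothetical data: three variable exons probed sequentially on four polonies.
EXONS = ("e2", "e3", "e4")
polonies = [
    {"e2": True, "e3": False, "e4": True},
    {"e2": True, "e3": False, "e4": True},
    {"e2": True, "e3": True, "e4": True},
    {"e2": False, "e3": False, "e4": True},
]
counts = tally_isoforms(polonies, EXONS)
for pattern, n in counts.most_common():
    print("+".join(pattern) or "(none)", n)
```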

37 Functional Genomics Challenges. Systems dynamics and optimality modeling. Multiple genetic domains per gene: high-density readout of whole-genome mutant phenotypes. Multiple RNAs & regulatory proteins per gene. Many causative genes & haplotypes per disease. Polony RNA exon-typing. Multiplex in situ RNA & protein analyses. Automated differentiation. Homologous recombination genome engineering.

38 For more information: arep.med.harvard.edu

