Structural Genomics and the Protein Folding Problem George N. Phillips, Jr. University of Wisconsin-Madison February 15, 2006.


1 Structural Genomics and the Protein Folding Problem George N. Phillips, Jr. University of Wisconsin-Madison February 15, 2006

2 From DNA to biological function: High-throughput DNA Sequencing; Gene Model; Functional Assignments; Basic Understanding / Applications (e.g. therapeutics); Structure Determination & Experimental Analysis; Modeling & Inference

3 Developing a gene model
Genome sequencing; genome assembly; identification of ORFs; regulatory elements.
Glimmer (Gene Locator and Interpolated Markov ModelER); GlimmerHMM for eukaryotic genomes (more advanced).
All but the simplest genomes are works in progress. It is estimated that 80% of gene models have errors at present! Comparative genomics should help the process, as will sequencing of expressed sequence tags and other genomics projects.
W.H. Majoros, M. Pertea, and S.L. Salzberg. Efficient implementation of a generalized pair hidden Markov model for comparative gene finding. Bioinformatics 21:9 (2005), 1782-88.
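The "identification of ORFs" step named on this slide can be illustrated with a naive scan for ATG-to-stop stretches in each forward reading frame. This is only a minimal sketch and not how Glimmer or GlimmerHMM work (those score candidate genes with Markov-model statistics); the function name `find_orfs`, the `min_codons` parameter, and the toy sequence are assumptions made for illustration. It ignores the reverse strand and introns, which is one reason simple scans like this are not enough for eukaryotic gene models.

```python
# Naive open reading frame (ORF) scan on the forward strand only.
# Assumes the standard start codon ATG and stop codons TAA/TAG/TGA.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return a list of (start, end, frame) tuples.

    start is the 0-based index of the ATG, end is exclusive and
    includes the stop codon, frame is 0, 1, or 2.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Keep the ORF only if it is long enough (stop codon counted).
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


if __name__ == "__main__":
    toy = "CCATGAAATTTTAGGG"  # hypothetical sequence, not from the talk
    for start, end, frame in find_orfs(toy):
        print(frame, start, end, toy[start:end])
```

A real gene finder would also weigh codon usage, splice sites, and comparative evidence, which is the role of the tools listed on the slide.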

4 The “sequence-space” of proteins: universe of all protein sequences
Pfam; many others…
PSI-BLAST; HMM
Example family alignment (repeated across the figure):
HYSIELNASLLERGV…
HLNIEDNPSCNAMGV…
PLNIELNASLNEPGV…
WERIELNASLNER--…
HQRIEL--SLMMRG-…

5 PFAM “domains” Alex Bateman, Lachlan Coin, Richard Durbin, Robert D. Finn, Volker Hollich, Sam Griffiths-Jones, Ajay Khanna, Mhairi Marshall, Simon Moxon, Erik L. L. Sonnhammer, David J. Studholme, Corin Yeats and Sean R. Eddy, Nucleic Acids Research (2004) Database Issue 32:D138-D141

6 Flow of information from DNA to functional understanding: High-throughput DNA Sequencing; Gene Model; Functional Assignments; Basic Understanding / Applications (e.g. therapeutics); Structure Determination & Experimental Analysis; Modeling & Inference

7 X-ray Laboratory

8 Crystallography reveals the locations of the electron ‘clouds’ of the atoms, and the polypeptide chain can be traced through space

9 The “fold-space” of proteins: universe of all protein structures. SCOP; CATH

10 Murzin et al. http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.html

11 Glimpses of the “fold space” of proteins Hou, Sims, Zhang, and Kim, PNAS 100:2386 (2003)

12 Flow of information from DNA to functional understanding: High-throughput DNA Sequencing; Gene Model; Functional Assignments; Basic Understanding / Applications (e.g. therapeutics); Structure Determination & Experimental Analysis; Modeling & Inference

13 Connections between sequence and structure: Universe of sequences; Universe of structures

14 Connections between sequence and structure: Universe of sequences; Universe of structures; ?

15 At what level of homology can one trust a structural inference? Redfern, Orengo et al., J. Chromatography B 815:97 (2005)

16 What is structural genomics?
Experimental determination of key structures (target selection is a key part of the idea)
Modeling of family members
Inferring function (note “infer”)
Making direct use of the new structures

17 Protein Sequences and Folds
~100,000 families of proteins that cannot be reliably modeled at present (modeling families: <30% identity over a large fraction to a known structure)
~50% of all domain families can be assigned to a structure under CATH
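The "<30% identity" criterion on this slide depends on how identity is computed from an alignment. The sketch below shows one common convention, assuming two pre-aligned sequences of equal length and counting only columns where neither has a gap. The function names and the choice of denominator are assumptions for illustration; other definitions (for example dividing by the shorter sequence length) give different numbers, and the slide does not say which one is meant.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of identical residues over columns without a gap.

    Both inputs must already be aligned (same length, gaps as '-').
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def below_modeling_threshold(seq_a, seq_b, threshold=30.0):
    """True when identity falls under the cutoff quoted on the slide."""
    return percent_identity(seq_a, seq_b) < threshold


if __name__ == "__main__":
    # Two rows taken from the example alignment on the sequence-space slide.
    row1 = "HYSIELNASLLERGV"
    row2 = "PLNIELNASLNEPGV"
    print(round(percent_identity(row1, row2), 1))
    print(below_modeling_threshold(row1, row2))
```

The slide also requires the identity to hold "over a large fraction" of the sequence, so a real check would track alignment coverage as well as identity.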

18 Protein Structure Initiative (PSI) Mission Statement “To make the three-dimensional atomic level structures of most proteins easily available from knowledge of their corresponding DNA sequences.”

19 Generation of new structures Chandonia and Brenner, Science 311:347 (2006).

20 Center for Eukaryotic Structural Genomics
Exclusively eukaryotic targets:
60% fold-space targets (emphasis on eukaryote-only families)
20% disease relevant
20% outreach – targets from the community
Overall goals are to reduce the costs of determining structures of proteins from eukaryotes by refining all steps in the pipeline.
Supported by National Institutes of Health. John Markley, PI; George Phillips and Brian Fox, Co-PIs

21 University of Wisconsin’s Center for Eukaryotic Structural Genomics (~75 total, 3/4 unique)

22 How does one clone, express, purify, and solve structures not previously studied? An industry-style pipeline

23 Pipeline details: cell-based and cell-free protein production for X-ray and NMR Note: project involves sequencing, which aids gene modeling!

24 Sesame: integrated LIMS in use at CESG. Open access to the public: structures, protocols, reagents, progress… http://www.uwstructuralgenomics.org Zolnai et al., J. Struct. Func. Genomics 4:11 (2003)

25 At1g18200 Mis-annotated prior to our work, but structure led to discovery of function.

26 >>Alignment of GalP_UDP_transf vs 1Z84:A|PDBID|CHAIN|SEQUENCE/15-196

             *->kkfsplDhvhrrynpLtlvwilVsphrakRPikqsqsLidlkkeLwq
                ++ ++ + +r p t +w+ sp+rakRP
1Z84:A|PDB   15 GDSVENQSPELRKDPVTNRWVIFSPARAKRP---------------- 45

                gavetpkvptdplhdp.dcysakLcpg........atratgevNPdyest
                + ++k p+ p p++c+ c g++++ ++ r++ ++ P +
1Z84:A|PDB   46 -TDFKSKSPQNPNPKPsSCP---FCIGreqecapeLFRVP-DHDPNWKLR 90

                yvLkspkkftndFyalseDnpyikvsvSNeaIaknplfqlksvrGhelci
                + +n ++als+ +++ +++++ G +++
1Z84:A|PDB   91 VI-------ENLYPALSRN---LETQ------------STQPETG--TSR 116

                VI...CF......SKPehDptlpalakeeirevvdaWqlcteelGyegre
                +I + F++ +S P h+ l + i+ ++ a + +
1Z84:A|PDB  117 TIvgfGFhdvvieS-PVHSIQLSDIDPVGIGDILIAYKKRINQIA----- 160

                nhpayqnvqIFEmNkGaemGcsnpHPYaYFnEHGQvwatsfiP<-*
                h + + q+F N Ga G s H H Q a++ +P
1Z84:A|PDB  161 QHDSINYIQVFK-NQGASAGASMSHS------HSQMMALPVVP 196

Pfam B: 13 and 136 matches to #’s 7198 and 11634 http://www.sanger.ac.uk/Software/Pfam/

27 Blind prediction of structure: CASP and At5g18200

28 Flow of information from DNA to functional understanding: High-throughput DNA Sequencing; Gene Model; Functional Assignments; Basic Understanding / Applications (e.g. therapeutics); Structure Determination & Experimental Analysis; Modeling & Inference

29 Function space of proteins
KEGG = Kyoto Encyclopedia of Genes and Genomes
The Gene Ontology project (GO)
Metabolism; Cellular Processes; Signal Processing; Enzymes
Don’t forget protein-protein interactions exist also!

30 At2g17340 Related to a human protein associated with Hallervorden-Spatz syndrome, a neurological disorder?

31 Parallel Enzyme Activity Testing (Collaboration with University of Toronto)
81 protein samples sent to Toronto: 8 solved CESG structures, 73 randomly chosen
Generalized assays for: phosphatase, esterase, phosphodiesterase, protease, amino acid dehydrogenase, alcohol dehydrogenase, organic acid dehydrogenase, amino acid oxidase, alcohol oxidase, organic acid oxidase, beta-lactamase, beta-galactosidase, arylsulfatase, lipase.
Results:
- Solid hits: 3 phosphatases, 5 esterases
- Weaker hits: 9 more esterases, 6 phosphodiesterases
- No hits: all others
A. Yakunin et al. Current Opinion in Chemical Biology, 8:42 (2004)

32 Initial Assay: Wide-spectrum
Target: At2g17340/JR5670

Activity Assay      Substrate       JR5670
Phosphodiesterase   bis-pNPP         0.016
Dehydrogenase       Amino Acids      0.032
Dehydrogenase       Acids            0.016
Dehydrogenase       Alcohols         0.022
Dehydrogenase       Aldehyde        -0.045
Dehydrogenase       Sugars           0.003
Thioesterase        palmitoyl-CoA    0.108
Oxidase             NAD(P)H Ox      -0.115
Protease            Protease Mix     0.118
Phosphatase         pNPP             > 1

Absorbance >0.25 is a tentative signal, >0.5 is a strong signal.
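The thresholds quoted on this slide (absorbance above 0.25 is tentative, above 0.5 is strong) can be applied to the listed readings with a few lines of code. This is a minimal sketch, not the screening software used in the collaboration; the names `READINGS`, `classify_signal`, and `summarize` are assumptions for illustration. Two readings are taken as negative values based on the flattened slide text, and the phosphatase reading, reported only as "> 1", is entered as the lower bound 1.0.

```python
# Absorbance readings for target At2g17340/JR5670, as listed on the slide.
# Assumption: the phosphatase value is given only as "> 1"; 1.0 is a lower bound.
READINGS = [
    ("Phosphodiesterase", "bis-pNPP", 0.016),
    ("Dehydrogenase", "Amino Acids", 0.032),
    ("Dehydrogenase", "Acids", 0.016),
    ("Dehydrogenase", "Alcohols", 0.022),
    ("Dehydrogenase", "Aldehyde", -0.045),
    ("Dehydrogenase", "Sugars", 0.003),
    ("Thioesterase", "palmitoyl-CoA", 0.108),
    ("Oxidase", "NAD(P)H Ox", -0.115),
    ("Protease", "Protease Mix", 0.118),
    ("Phosphatase", "pNPP", 1.0),
]


def classify_signal(absorbance, tentative=0.25, strong=0.5):
    """Label a reading using the slide's strictly-greater-than cutoffs."""
    if absorbance > strong:
        return "strong"
    if absorbance > tentative:
        return "tentative"
    return "none"


def summarize(readings):
    """Map each (assay, substrate) pair to its signal label."""
    return {(assay, substrate): classify_signal(value)
            for assay, substrate, value in readings}


if __name__ == "__main__":
    for (assay, substrate), label in summarize(READINGS).items():
        if label != "none":
            print(assay, substrate, label)
```

With these cutoffs only the phosphatase assay on pNPP registers as a signal, which matches the slide's point that the wide-spectrum screen singles out one activity for follow-up.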

33 Flow of information from DNA to functional understanding: High-throughput DNA Sequencing; Gene Model; Functional Assignments; Basic Understanding / Applications (e.g. therapeutics); Structure Determination & Experimental Analysis; Modeling & Inference

34 At2g17340 Enzyme of unknown specificity.

35 A functional annotation lesson

36 Functional Annotation by Inference
From raw DNA sequences, one looks for genomic features such as promoters, alternative splicing of mRNAs, retrotransposons, pseudogenes, tandem duplications, synteny, and homology. It is homology, both from sequence and from structure, that allows functional inferences to be made.
Prosite, Dali, VAST, FFAS03
Some tools integrate knowledge from many sources into one place, acting as meta-servers of clues.

37 Connections between structure and function Universe of structures Universe of functions

38 Connections between structure and function Universe of structures Universe of functions Convergent evolution

39 Connections between structure and function Universe of structures Universe of functions Divergent evolution

40 At1g18200 Misleading annotation prior to our work, but structure led to discovery of function.

41 Flow of information from DNA to functional understanding: High-throughput DNA Sequencing; Gene Model; Functional Assignments; Basic Understanding / Applications (e.g. therapeutics); Structure Determination & Experimental Analysis; Modeling & Inference

42 Summary Structural genomics efforts are gaining momentum and helping to assign new functions to ORFs and to fill in the space of all possible protein folds.

43 The Center for Eukaryotic Structural Genomics (supported by NIH GM64598 and GM074901)
Administration: Madison (Primm, Troestler, Markley, Phillips, Fox)
Cloning/sequencing pipeline: Madison (Wrobel, Fox)
Expression pipeline: Madison (Frederick, Fox, Riters)
E. coli cell growth pipeline: Madison (Sreenath, Burns, Seder, Fox)
Cell-Free System: Madison (Vinarov, Markley, Newman)
Protein purification pipeline: Madison (Vojtik, Phillips, Fox, Ellefson, Jeon)
Mass spectrometry: Madison (Aceti, Sabat, Sussman)
NMR spectroscopy: Madison NMRFAM (Song, Tyler, Cornilescu, Markley); Milwaukee MCW (Peterson, Volkman, Lytle)
Crystallization / crystallography: Madison (Bingman, Phillips, Bitto, Han, Bae, Meske); Argonne (Advanced Photon Source)
Bioinformatics: Madison (Bingman, Sun, Phillips, Wesenberg); Indianapolis (Dunker); Milwaukee MCW (Twigger, de la Cruz)
Computational support: Madison (Bingman, Ramirez, Phillips)
Sesame: Madison (Zolnai, Markley, Lee)

