Protein structure, domains, and interactions
Curtis Huttenhower, 04-11-16
Harvard T.H. Chan School of Public Health, Department of Biostatistics



Outline
Identification
–2D gels (!)
–MS (mass/seq) vs. MS/MS (mass/charge)
  –Spectrum search
–Modifications (later)
–Resources: ExPASy
–DBs: UniProt/SwissProt, GenPept, IPI
Structure
–Primary/secondary/tertiary
–Crystallography vs. NMR
–Disordered proteins (cool!)
–Structure alignment + search: DALI, SALAMI, VAST
–Prediction (CASP)
  –Homology vs. threading vs. ab initio
–DBs: PDB, SCOP, CATH, InterPro
Domains + families
–Family (large/2+ domain similarity), domain (local independent unit), motif (small target site)
–Charge, accessibility, and conservation
–DBs: Pfam, SMART, PROSITE
Localization
–Direct assessment vs. prediction
  –TM, secreted, localization signals
–DBs: PSORT (insanely comprehensive), YPL

Identifying proteins: 2D gels
–Gel axes: pH/isoelectric point (zero charge) vs. SDS-PAGE (kDa)
–Stain with YFA/D/R/etc.
Bioinformatic challenges?
–Image alignment: warping (commercial, see Berth 2007; MELANIE)
–Spot quantification
–Database search (Make2D-DB/WORLD-2DPAGE)
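The isoelectric point on the first gel axis is the pH at which a protein carries zero net charge. A minimal sketch of how it can be estimated from sequence is below. The pKa values are approximate textbook-style numbers chosen for illustration (published scales differ slightly), and the function names are hypothetical rather than taken from any particular tool.

```python
# Approximate pKa values; an assumption for illustration, not a curated scale.
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
N_TERM_PKA = 9.0
C_TERM_PKA = 2.0


def net_charge(sequence, ph):
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))    # free N-terminus (+)
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))   # free C-terminus (-)
    for aa in sequence:
        if aa in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge


def isoelectric_point(sequence, tol=1e-4):
    """Bisect on pH in [0, 14]; net charge decreases monotonically with pH."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(sequence, mid) > 0:
            lo = mid      # still positive: the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2.0
```

A lysine-rich peptide should come out with a basic pI and an aspartate-rich one with an acidic pI, which is what places them at opposite ends of the pH axis.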

Identifying proteins: MS
Single purified sample: run through MS
–Common representative technology: MALDI-TOF
–Matrix-Assisted Laser Desorption/Ionization, Time Of Flight

Identifying proteins: MS
Peptide Mass Fingerprinting (PMF)
One protein = one vector of m/z peak abundances
–m/z = mass/charge ratio as measured by MS
–Peaks = prob. of fracturing protein at location(s)
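To make the "vector of m/z peaks" idea concrete, here is a rough sketch of the theoretical side of peptide mass fingerprinting: digest a sequence in silico and list the expected m/z of each peptide. It assumes trypsin (cleave after K or R, not before P), no missed cleavages, no modifications, and standard monoisotopic residue masses. The function names are illustrative and not from any of the search engines mentioned in this deck.

```python
# Monoisotopic residue masses in daltons (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mz(peptide, charge=1):
    """m/z of a peptide carrying `charge` extra protons."""
    mass = sum(MONO[aa] for aa in peptide) + WATER
    return (mass + charge * PROTON) / charge


def fingerprint(protein):
    """Theoretical (peptide, m/z) list for a singly charged tryptic digest."""
    return [(p, peptide_mz(p)) for p in tryptic_digest(protein)]


def count_matches(observed_mz, theoretical, tol=0.01):
    """Number of observed peaks within `tol` of any theoretical m/z."""
    return sum(
        any(abs(obs - mz) <= tol for _, mz in theoretical)
        for obs in observed_mz
    )
```

A database search would compute such a fingerprint for every candidate protein and rank candidates by how many observed peaks they explain; real scoring schemes are more elaborate than this simple count.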

Identifying proteins: MS
You run the machine; let the computer analyze spectra
–This has been going on for a long time now
–Most software solutions are commercial and good
MSight: ExPASy software for 1D/2D MS spectra
OMSSA: Open Mass Spectrometry Search Algorithm
Mascot + SEQUEST DB search algorithms, SORCERER, Proteome Discoverer, ProteinPilot: $$$

Identifying proteins: MS/MS
What if you don’t have purified protein?
–LC/MS: Separate mixture by liquid chromatography first
–MS/MS: Separate fragments by MS, then re-fragment
–Either approach results in two-dimensional fragment spectra

Proteomics resources

Proteomics databases


Protein Structure: The Basics

Protein Structure: Alignment and Search
Alignment: find best pairwise 3D match
Search: find best global 3D match(es)
–3D-BLAST
–DALI: ekhidna.biocenter.helsinki.fi/dali_server
–SALAMI: public.zbh.uni-hamburg.de/salami
–VAST: structure.ncbi.nlm.nih.gov/Structure/VAST
Sam 2008

Protein Structure: Prediction
Why let the experimentalists have all the fun?
–Primary structure prediction
  –Also known as “DNA sequencing”
–Secondary structure prediction
  –Typically recognize α-helix, β-sheet, coiled-coil, TM helices, signal peptides, and/or fold classes: search (homology) + seq. features
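As one example of a "seq. feature" used in this kind of prediction, a sliding-window hydropathy profile can flag candidate transmembrane helices. The sketch below uses the Kyte-Doolittle scale. The 19-residue window and the 1.6 threshold are the commonly quoted Kyte-Doolittle heuristics and are assumptions here; modern predictors combine many more features, so treat this as a teaching aid rather than a real predictor.

```python
# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full-length window, indexed by window start."""
    if len(seq) < window:
        return []
    scores = [KD[aa] for aa in seq]
    total = sum(scores[:window])
    profile = [total / window]
    for i in range(window, len(seq)):
        total += scores[i] - scores[i - window]   # slide the window by one
        profile.append(total / window)
    return profile


def candidate_tm_windows(seq, window=19, threshold=1.6):
    """Start positions of windows hydrophobic enough to be TM candidates."""
    profile = hydropathy_profile(seq, window)
    return [i for i, value in enumerate(profile) if value >= threshold]
```

For a toy sequence with a 19-residue leucine stretch flanked by aspartates, the windows overlapping the leucine run are the ones reported.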

pbil.univ-lyon1.fr

Protein Structure: Prediction
Tertiary structure prediction
–Comparative/homology modeling
  –Also known as “search”
  –Primary (BLAST/PSI-BLAST) + secondary (as above) sequence search + alignment
  –Local 3D sequences are individually tweaked
–Threading
  –Fragmentary/domain-based homology modeling
  –Used when only portions of a protein are recognizable
–Ab initio
  –Physical modeling: super slow and hard to get right

Protein Structure: Prediction
CASP: Critical Assessment of Structure Prediction

[Figure: CASP results, percent of residues (Cα) vs. distance cutoff (Å)]
–most groups had the right prediction for this structure (but not those at arrow 2)
–most groups had the wrong prediction for this structure (but arrow 3 did better)
–half the groups got it wrong (arrow 4), half got it right (arrow 5); the key difference is the multiple sequence alignment
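The curves in this kind of plot can be sketched as follows: given per-residue Cα distances between a predicted and an experimental structure, report the percent of residues within each distance cutoff. The example assumes the two structures have already been superimposed once and the distances computed. The real CASP GDT_TS measure searches over many superpositions per cutoff, so the `gdt_ts_like` helper (a hypothetical name) is only a simplified approximation.

```python
def percent_within(distances, cutoff):
    """Percent of residues whose C-alpha deviation is <= cutoff (angstroms)."""
    if not distances:
        return 0.0
    return 100.0 * sum(d <= cutoff for d in distances) / len(distances)


def cutoff_curve(distances, cutoffs):
    """(cutoff, percent of residues) pairs: one curve of the plot."""
    return [(c, percent_within(distances, c)) for c in cutoffs]


def gdt_ts_like(distances):
    """Average of the percentages at 1, 2, 4 and 8 angstroms.

    Simplification: uses one fixed superposition instead of optimizing
    the superposition separately for each cutoff.
    """
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    return sum(percent_within(distances, c) for c in cutoffs) / len(cutoffs)
```

A good prediction rises steeply at small cutoffs, while a poor one only accumulates residues at large cutoffs, which is how groups can be compared at a glance.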

Proteins: Structure and Sequence

The CATH Hierarchy Fig Page 444

Viewing hemoglobin (accession 2H35) at PDB


Definitions
Family: a group of proteins with overall structural homology
–can share 1+ domain, typically more
Domain: a region of a protein that can adopt a 3D structure
–a fold
–examples: zinc finger domain, immunoglobulin domain
Motif: a short, conserved region of a protein
–typically 10 to 20 contiguous amino acid residues
–often a modification or active site
Page 390

Definition of a domain
According to InterPro at EBI: A domain is an independent structural unit, found alone or in conjunction with other domains or repeats. Domains are evolutionarily related.
According to SMART: A domain is a conserved structural entity with distinctive secondary structure content and a hydrophobic core. Homologous domains with common functions usually show sequence similarities.
Page 390

Varieties of protein domains
–Extending along the length of a protein
–Occupying a subset of a protein sequence
–Occurring one or more times
Page 393

Result of an MeCP2 blastp search: a methyl-binding domain shared by several proteins
Page 393

ProDom entry for HIV-1 pol shows many related proteins
Page 395

Definition of a motif
A motif (or fingerprint) is a short, conserved region of a protein. Its size is often 10 to 20 amino acids.
Simple motifs include transmembrane domains and phosphorylation sites. These do not imply homology when found in a group of proteins.
PROSITE is a dictionary of motifs (there are currently 1600 entries). In PROSITE, a pattern is a qualitative motif description (a protein either matches a pattern, or not). In contrast, a profile is a quantitative motif description. We will encounter profiles in Pfam, ProDom, SMART, and other databases.
Page 394
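Because a PROSITE pattern is a yes/no description, it maps naturally onto a regular expression. The sketch below converts the core of the PROSITE pattern syntax (`x` for any residue, `[..]` for allowed residues, `{..}` for excluded residues, `(n)` or `(n,m)` for repeats, `<` and `>` for terminal anchors) into a Python regex. It is a simplified converter written for illustration, not the official ScanProsite implementation, and it does not handle rarer constructs such as anchors placed inside brackets.

```python
import re


def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = repeat = ""
        if element.startswith("<"):          # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # C-terminal anchor
            suffix, element = "$", element[:-1]
        if element.endswith(")"):            # repeat count: (n) or (n,m)
            element, count = element[:-1].split("(")
            repeat = "{" + count + "}"
        if element == "x":                   # any residue
            core = "."
        elif element.startswith("{"):        # excluded residues
            core = "[^" + element[1:-1] + "]"
        else:                                # one residue or an [ABC] class
            core = element
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_motif(pattern, sequence):
    """0-based start positions of all (possibly overlapping) matches."""
    regex = prosite_to_regex(pattern)
    return [m.start() for m in re.finditer("(?=" + regex + ")", sequence)]
```

For example, the N-glycosylation site pattern `N-{P}-[ST]-{P}` becomes `N[^P][ST][^P]`, and a sequence either contains a match or it does not. A profile would instead assign each position a score.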

Proteins can have both domains and motifs (patterns)
[Figure labels: Domain (aspartyl protease); Domain (reverse transcriptase); Motif (several residues); Motif (several residues)]

Proteins: Structure and Sequence
smart.embl-heidelberg.de
prosite.expasy.org
pfam.xfam.org
tigrfams/index.cgi

Proteins: Families, Domains, and Motifs

Protein Structure: Charge, Accessibility, and Conservation
–Surface AA charge
–Surface AA solvent/substrate accessibility
–Sequence AA conservation
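Sequence conservation is often summarized per column of a multiple sequence alignment. One simple option, sketched below, is Shannon entropy rescaled so that 1.0 means fully conserved. This is only one of several conservation measures in use, and the choices to treat gaps as an ordinary symbol and to normalize by log2(20) are assumptions made for brevity.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def conservation(alignment):
    """Per-column score 1 - H / log2(20); 1.0 means fully conserved.

    `alignment` is a list of equal-length aligned sequences. Gap
    characters are counted like any other symbol (a simplification).
    """
    max_entropy = math.log2(20)
    return [
        1.0 - column_entropy(column) / max_entropy
        for column in zip(*alignment)
    ]
```

Columns scoring near 1.0 are candidates for functionally important positions, especially when they also sit at charged or solvent-accessible surface sites.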
