Protein structure, domains, and interactions Curtis Huttenhower 04-11-16 Harvard T.H. Chan School of Public Health Department of Biostatistics.

1 Protein structure, domains, and interactions Curtis Huttenhower 04-11-16 Harvard T.H. Chan School of Public Health Department of Biostatistics

2 Outline
Identification
– 2D gels (!)
– MS (mass/seq) vs. MS/MS (mass/charge)
  Spectrum search
– Modifications (later)
– Resources: ExPASy
– DBs: UniProt/SwissProt, Genpept, IPI
Structure
– Primary/secondary/tertiary
– Crystallography vs. NMR
– Disordered proteins (cool!)
– Structure alignment + search: DALI, SALAMI, VAST
– Prediction (CASP)
  Homology vs. threading vs. ab initio
– DBs: PDB, SCOP, CATH, InterPro
Domains + families
– Family (large/2+ domain similarity), domain (local independent unit), motif (small target site)
– Charge, accessibility, and conservation
– DBs: Pfam, SMART, ProSite
Localization
– Direct assessment vs. prediction
  TM, secreted, localization signals
– DBs: PSORT (insanely comprehensive), YPL

3 Identifying proteins: 2D gels
Gel axes: pH/isoelectric point (zero charge) and SDS-PAGE (kDa)
Stain with YFA/D/R/etc.
Bioinformatic challenges?
– Image alignment: warping (commercial, see Berth 2007; MELANIE)
– Spot quantification
– Database search (Make2D-DB/WORLD-2DPAGE)
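The first gel dimension separates proteins by isoelectric point, the pH at which net charge is zero. A rough sketch of how that value could be estimated from sequence is shown here. The pKa values are approximate and are an assumption (published scales differ), and the function names are hypothetical; real 2D gel software is not claimed to work this way.

```python
# Approximate pKa values (assumed for illustration; published scales differ).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6


def net_charge(seq, ph):
    """Henderson-Hasselbalch estimate of net charge at a given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))   # free N-terminus
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))  # free C-terminus
    for aa in seq:
        if aa in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return charge


def isoelectric_point(seq, tol=1e-4):
    """Bisect on pH until the net charge crosses zero."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positive: the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2
```

Bisection works because the estimated net charge only decreases as pH rises, so there is a single zero crossing between pH 0 and 14.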

4 Identifying proteins: MS
Single purified sample: run through MS
– Common representative technology: MALDI-TOF
– Matrix Assisted Laser Desorption/Ionization, Time Of Flight

5 Identifying proteins: MS
Peptide Mass Fingerprinting (PMF)
One protein = one vector of m/z peak abundances
– m/z = mass/charge ratio as measured by MS
– Peaks = prob. of fracturing protein at location(s)
http://www.weddslist.com/ms/tandem.html
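To make the "vector of m/z peaks" idea concrete, here is a minimal sketch of building a theoretical peptide mass fingerprint. It assumes trypsin digestion (cleave after K or R unless the next residue is P), which is a common choice for PMF but is not stated on the slide. It uses standard monoisotopic residue masses, and the function names are hypothetical.

```python
# Monoisotopic residue masses in daltons.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free termini
PROTON = 1.00728   # mass of each charge-carrying proton


def trypsin_digest(seq):
    """Split after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of a peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def mz(mass, charge=1):
    """Mass/charge ratio of the protonated ion."""
    return (mass + charge * PROTON) / charge


def fingerprint(seq, charge=1):
    """Sorted theoretical m/z values for the tryptic peptides of seq."""
    return sorted(mz(peptide_mass(p), charge) for p in trypsin_digest(seq))
```

For example, `trypsin_digest("AKRPGK")` yields the peptides `AK` and `RPGK`, because the R is followed by P and is therefore not cleaved.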

6 Identifying proteins: MS
You run the machine; let the computer analyze spectra
– This has been going on for a long time now
– Most software solutions are commercial ($$$) and good
MSight: ExPASy software for 1D/2D MS spectra
OMSSA: Open Mass Spectrometry Search Algorithm
Mascot + SEQUEST: DB search algorithms
SORCERER
Proteome Discoverer
ProteinPilot
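The core idea behind a spectrum database search can be sketched as a toy peak-matching score. This is an illustration only and is not the scoring used by Mascot, SEQUEST, OMSSA, or any other tool named above. The function names, the tolerance, and the scoring rule (fraction of theoretical peaks observed) are all assumptions.

```python
def count_matches(observed, theoretical, tol=0.5):
    """Count theoretical peaks with an observed peak within tol (m/z units)."""
    return sum(
        1 for t in theoretical
        if any(abs(o - t) <= tol for o in observed)
    )


def rank_candidates(observed, database, tol=0.5):
    """Rank database proteins by the fraction of their peaks that were observed.

    database maps a protein name to its list of theoretical m/z peaks.
    """
    scores = {}
    for name, peaks in database.items():
        if peaks:
            scores[name] = count_matches(observed, peaks, tol) / len(peaks)
        else:
            scores[name] = 0.0
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Production search engines add probabilistic scoring, peak intensities, modifications, and false discovery control, none of which this sketch attempts.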

7 Identifying proteins: MS/MS
What if you don’t have purified protein?
– LC/MS: Separate mixture by liquid chromatography first
– MS/MS: Separate fragments by MS, then re-fragment
– Either approach results in two-dimensional fragment spectra

8 Proteomics resources
www.ms-utils.org
www.expasy.org
www.hupo.org
proteomexchange.org

9 Proteomics databases
www.uniprot.org
www.ncbi.nlm.nih.gov/protein
http://www.ebi.ac.uk/interpro/

10 Outline (repeated from slide 2)

11 Protein Structure: The Basics
www.csb.yale.edu/edens_toc.html
www.ccp4.ac.uk
www.phenix-online.org
www.biop.ox.ac.uk/coot

12 Protein Structure: Alignment and Search
Alignment: find best pairwise 3D match
Search: find best global 3D match(es)
3D-BLAST
ekhidna.biocenter.helsinki.fi/dali_server
public.zbh.uni-hamburg.de/salami
structure.ncbi.nlm.nih.gov/Structure/VAST
Sam 2008

13 Protein Structure: Prediction
Why let the experimentalists have all the fun?
– Primary structure prediction
  Also known as “DNA sequencing”
– Secondary structure prediction
  Typically recognize α-helix, β-sheet, coiled-coil, TM helices, signal peptides, and/or fold classes: search (homology) + seq. features
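As one example of a "seq. feature" used for this kind of prediction, the sketch below flags candidate transmembrane helices with a sliding-window average of Kyte-Doolittle hydropathy. The slide does not name this method. The window of 19 residues and the threshold of 1.6 follow the classic Kyte-Doolittle heuristic, the function names are hypothetical, and modern predictors are considerably more sophisticated.

```python
# Kyte-Doolittle hydropathy scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full-length window along the sequence."""
    if len(seq) < window:
        return []
    values = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def candidate_tm_windows(seq, window=19, threshold=1.6):
    """Start positions of windows hydrophobic enough to suggest a TM helix."""
    profile = hydropathy_profile(seq, window)
    return [i for i, score in enumerate(profile) if score > threshold]
```

A run of consecutive start positions from `candidate_tm_windows` points to one hydrophobic stretch; merging those runs into segments is left out to keep the sketch short.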

14

15 pbil.univ-lyon1.fr

16 Protein Structure: Prediction
Tertiary structure prediction
– Comparative/homology modeling
  Also known as “search”
  Primary (BLAST/PSI-BLAST) + secondary (as above) sequence search + alignment
  Local 3D sequences are individually tweaked
– Threading
  Fragmentary/domain-based homology modeling
  Used when only portions of a protein are recognizable
– Ab initio
  Physical modeling: super slow and hard to get right

17 Protein Structure: Prediction
CASP: Critical Assessment of Structure Prediction

18 [Figure: percent of residues (Cα) plotted against distance cutoff (Å), with arrows numbered 1 to 5]
– Most groups had the right prediction for this structure (but not those at arrow 2)
– Most groups had the wrong prediction for this structure (but arrow 3 did better)
– Half the groups got it wrong (arrow 4), half got it right (arrow 5); the key difference is the multiple sequence alignment
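The axes of this figure (percent of residues against distance cutoff) can be illustrated with a short sketch. It assumes per-residue Cα distances between a predicted and an experimental structure are already available from a single superposition. The full GDT procedure used in CASP searches over many superpositions for each cutoff, so this is a simplification, and the function names are hypothetical.

```python
def percent_within(distances, cutoff):
    """Percent of residues whose C-alpha deviation is at most cutoff (in Å)."""
    return 100.0 * sum(d <= cutoff for d in distances) / len(distances)


def cutoff_curve(distances, cutoffs):
    """(cutoff, percent) pairs, i.e. one curve in a plot like the one above."""
    return [(c, percent_within(distances, c)) for c in cutoffs]


def gdt_ts_like(distances):
    """Average of the percentages at 1, 2, 4, and 8 Å (GDT_TS-style summary)."""
    return sum(percent_within(distances, c) for c in (1.0, 2.0, 4.0, 8.0)) / 4
```

A good prediction rises toward 100% at small cutoffs, while a poor one stays low until the cutoff becomes large.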

19

20 Proteins: Structure and Sequence
www.pdb.org
scop.mrc-lmb.cam.ac.uk
www.cathdb.info
www.ebi.ac.uk/interpro

21 The CATH Hierarchy Fig. 11.18 Page 444

22 Viewing hemoglobin (accession 2H35) at PDB

23 Outline (repeated from slide 2)

24 Definitions (Page 390)
Family: a group of proteins with overall structural homology
– can share 1+ domain, typically more
Domain: a region of a protein that can adopt a 3D structure
– a fold
– examples: zinc finger domain, immunoglobulin domain
Motif: a short, conserved region of a protein
– typically 10 to 20 contiguous amino acid residues
– often a modification or active site

25 Definition of a domain (Page 390)
According to InterPro at EBI (http://www.ebi.ac.uk/interpro/): A domain is an independent structural unit, found alone or in conjunction with other domains or repeats. Domains are evolutionarily related.
According to SMART (http://smart.embl-heidelberg.de): A domain is a conserved structural entity with distinctive secondary structure content and a hydrophobic core. Homologous domains with common functions usually show sequence similarities.

26 Varieties of protein domains (Page 393)
– Extending along the length of a protein
– Occupying a subset of a protein sequence
– Occurring one or more times

27 Result of an MeCP2 blastp search: a methyl-binding domain shared by several proteins (Page 393)

28 Page 395 ProDom entry for HIV-1 pol shows many related proteins

29 Definition of a motif A motif (or fingerprint) is a short, conserved region of a protein. Its size is often 10 to 20 amino acids. Simple motifs include transmembrane domains and phosphorylation sites. These do not imply homology when found in a group of proteins. PROSITE (www.expasy.org/prosite) is a dictionary of motifs (there are currently 1600 entries). In PROSITE, a pattern is a qualitative motif description (a protein either matches a pattern, or not). In contrast, a profile is a quantitative motif description. We will encounter profiles in Pfam, ProDom, SMART, and other databases. Page 394
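The qualitative, match-or-not nature of a PROSITE pattern can be shown by translating one into a regular expression. This sketch covers the common syntax elements only (`x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, `(n)` or `(n,m)` for repeats, and `<` / `>` anchors). The function names are hypothetical, and `N-{P}-[ST]-{P}` is the PROSITE N-glycosylation site pattern.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern such as 'N-{P}-[ST]-{P}' into a regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.group(1), match.group(2), match.group(3)
        if core == "x":
            core = "."                   # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # any residue except these
        if low and high:
            core += "{" + low + "," + high + "}"
        elif low:
            core += "{" + low + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def find_motif(seq, pattern):
    """0-based start positions of all (possibly overlapping) pattern matches."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [m.start() for m in regex.finditer(seq)]
```

A profile, by contrast, would assign each position a score per residue and report a total, which is why it is described as quantitative.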

30 Proteins can have both domains and motifs (patterns)
– Domain (aspartyl protease)
– Domain (reverse transcriptase)
– Motif (several residues)
– Motif (several residues)

31 Proteins: Structure and Sequence
smart.embl-heidelberg.de
prosite.expasy.org
pfam.xfam.org
www.jcvi.org/cgi-bin/tigrfams/index.cgi

32 Proteins: Families, Domains, and Motifs

33 Protein Structure: Charge, Accessibility, and Conservation
– Surface AA charge
– Surface AA solvent/substrate accessibility
– Sequence AA conservation
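Per-position sequence conservation is often summarized with Shannon entropy over the columns of a multiple sequence alignment. The slide does not say which measure was used, so this is one common choice shown as a sketch. The function names are hypothetical, and gaps (`-`) are simply ignored, which is a simplification.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy in bits of one alignment column, ignoring gaps."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    entropy = 0.0
    for count in Counter(residues).values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


def conservation_profile(alignment):
    """Entropy per column for equal-length aligned sequences.

    Low entropy means a highly conserved position.
    """
    return [column_entropy(column) for column in zip(*alignment)]
```

A column containing a single residue type scores 0 bits, and a column split evenly between two residue types scores 1 bit.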

34 Outline (repeated from slide 2)

35

