Presentation on theme: "Protein Structure Analysis - I"— Presentation transcript:
1 Protein Structure Analysis - I: PLPTH 890 Introduction to Genomic Bioinformatics. Lecture 20: Protein Structure Analysis - I. Liangjiang (LJ) Wang. April 8, 2005.
2 Outline: Basic concepts. How are protein structures determined? X-ray crystallography. NMR spectroscopy. Protein structure databases (PDB, MMDB). Protein structure visualization (RasMol, Cn3D, etc.). Protein structure classification (SCOP and CATH).
3 Structural Bioinformatics: A subdiscipline of bioinformatics that focuses on the representation, storage, visualization, prediction and evaluation of structural information. References: Baxevanis and Ouellette, Bioinformatics - A practical guide to the analysis of genes and proteins, 3rd edition, Chapter 9 and part of Chapter 8. Pevsner, Bioinformatics and functional genomics, Chapter 9. Bourne and Weissig, Structural bioinformatics.
4 Protein Primary Structures: Amino acid sequence of a polypeptide chain. 20 amino acids, each with a different side chain (R). Peptide units are the building blocks of protein structures. The angle of rotation around the N−Cα bond is called phi (φ), and the angle around the Cα−C′ bond from the same Cα atom is called psi (ψ). (Branden and Tooze, 1998)
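The phi and psi angles on this slide are dihedral (torsion) angles, each defined by four consecutive backbone atoms: C′(i−1), N, Cα, C′ for phi, and N, Cα, C′, N(i+1) for psi. The following is a minimal sketch in plain Python of how such an angle could be computed from four 3-D points. The function name `dihedral` and the helper functions are illustrative assumptions and are not taken from the lecture.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond.

    For phi, pass the coordinates of C'(i-1), N, CA, C'.
    For psi, pass the coordinates of N, CA, C', N(i+1).
    """
    b0 = _sub(p0, p1)           # from the central bond back to the first atom
    b1 = _sub(p2, p1)           # the central bond (rotation axis)
    b2 = _sub(p3, p2)           # from the central bond on to the last atom
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Remove the components along the axis, leaving the two directions
    # as seen when looking down the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

With made-up test points, a planar cis arrangement such as `dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))` gives an angle of 0 degrees, and moving the last point to `(-1, 0, 1)` gives the trans arrangement with a magnitude of 180 degrees.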
5 Protein Secondary Structures: Local substructures that result from hydrogen bond formation between neighboring amino acids (backbone interactions). The amino acid side chains affect secondary structure formation. Types of secondary structures: α helix, β sheet, and loop or random coil.
6 α Helix: Most abundant secondary structure. 3.6 amino acids per turn, with a hydrogen bond formed between every fourth residue (the backbone C=O of residue i and the N−H of residue i+4). Often found on the surface of proteins.
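The numbers on this slide imply some simple arithmetic: 3.6 residues per turn corresponds to a rotation of 360/3.6, or about 100 degrees, per residue, and the hydrogen-bonding pattern pairs each residue i with residue i+4. A small sketch that makes this explicit is shown here; the function names are hypothetical and the residue numbering is assumed to be 1-based.

```python
RESIDUES_PER_TURN = 3.6  # value quoted on the slide

def rotation_per_residue(residues_per_turn=RESIDUES_PER_TURN):
    """Degrees of rotation about the helix axis contributed by one residue."""
    return 360.0 / residues_per_turn

def helix_hbond_pairs(n_residues, offset=4):
    """List the (i, i + offset) residue pairs in an ideal helix.

    Residues are numbered from 1. The default offset of 4 reflects the
    'every fourth residue' pattern of the alpha helix.
    """
    return [(i, i + offset) for i in range(1, n_residues - offset + 1)]
```

For an idealized 10-residue helix, `helix_hbond_pairs(10)` lists six pairs, from `(1, 5)` to `(6, 10)`; the first and last few residues of a real helix have fewer backbone partners, which this sketch ignores.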
7 β Sheet: Hydrogen bonds are formed between adjacent polypeptide chains (strands). The chain directions can be the same (parallel sheet), opposite (anti-parallel sheet), or mixed.
8 Loop or Coil: Regions between helices and sheets. Various lengths and 3-D configurations. Often functionally significant (e.g., part of an active site). Example: the active site of open α/β-barrel structures is in a crevice outside the carboxy ends of the β strands. (Branden and Tooze, 1998)
9 Protein Tertiary Structure: The 3-D structure of a protein is assembled from different secondary structure components. Tertiary structure is determined primarily by hydrophobic interactions between side chains. Different classes of protein structures: all α, e.g. hemoglobin (3HHB); all β, e.g. T cell CD8 (1CD8); mixed, e.g. thermolysin (7TLN).
10 Protein Tertiary Structure (Cont’d): Fold: a certain type of 3-D arrangement of secondary structures. Protein structures evolve more slowly than primary amino acid sequences. Four-helix bundles: E. coli cytochrome b562 (256B) and human growth hormone (1HUW). Three-helix bundle: Drosophila engrailed homeodomain (1ENH).
11 Protein Quaternary Structure: Two or more independent tertiary structures are assembled into a larger protein complex. Important for understanding protein-protein interactions. Examples: E. coli ribosome (1ML5); horse spleen ferritin (1IES).
12 Biological Knowledge from Structures (Bourne, 2004)
13 X-Ray Crystallography: Basic steps: gene targets → expression and purification → proteins → crystallization → X-ray diffraction → structure solution. Advantages: High-resolution structures. Large protein complexes or membrane proteins. Disadvantages: Molecules are in a solid-state (crystal) environment. Requirement for crystals.
14 Nuclear Magnetic Resonance (NMR): NMR reveals the neighborhood information of atoms in a molecule, and the information can be used to construct a 3-D model of the molecule. Advantages: No requirement for crystals. Proteins are in a liquid state (near physiological state). Disadvantages: Limited by molecule size (up to 30 kD). Membrane proteins may not be studied. Inherently less precise than X-ray crystallography.
15 Protein Data Bank (PDB): The primary repository for protein structures. Established in 1971 (the first bioinformatics database, set up with 7 protein structures). Contained 30,179 structures as of March 22, 2005. Supports services for structure submission, search, retrieval, and visualization. Search options: SearchLite (PDB ID and keyword search) and SearchFields (advanced search). (PDB can be accessed at
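A search box that accepts both PDB IDs and keywords has to tell the two apart. The IDs quoted in this lecture (3HHB, 1CD8, 256B, and so on) are four characters long and start with a digit. The sketch below assumes that pattern (one digit followed by three letters or digits) as the test for a PDB ID; the function names are hypothetical and this is not a description of how SearchLite itself works.

```python
import re

# Assumed pattern: one digit followed by three alphanumeric characters.
_PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")

def looks_like_pdb_id(query):
    """Return True if the query has the shape of a four-character PDB ID."""
    return _PDB_ID.fullmatch(query.strip()) is not None

def classify_query(query):
    """Route a search string to an ID lookup or a keyword search."""
    return "pdb_id" if looks_like_pdb_id(query) else "keyword"
```

Under this assumption, `classify_query("3HHB")` is routed to an ID lookup, while `classify_query("hemoglobin")` falls through to a keyword search.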
16 PDB Content Growth: [Figure: number of structures by year, 1972 to 2005, with axis marks at 5,000 and 30,000 structures. Last updated: 06-Mar-2005.]
17 Access to Structures through NCBI: MMDB (Molecular Modeling Database): structures obtained from PDB; data in NCBI’s ASN.1 format; integrated into NCBI’s Entrez system. Cn3D (“see in 3D”): NCBI’s 3-D protein structure viewer. VAST (Vector Alignment Search Tool): for direct comparison of 3-D protein structures. (NCBI at
18 Ramachandran Plot: Used to assess the quality of structures. Good structures show tight clustering patterns. [Figure: Ramachandran plot of thioredoxin (2TRX) with axes PHI and PSI; the β sheet and α helix regions are labeled.] (Baxevanis and Ouellette, 2005)
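The quality check behind a Ramachandran plot can be sketched as counting how many residues have (phi, psi) pairs that fall in the helix or sheet regions. The rectangular boundaries below are crude illustrative assumptions, not the contoured regions used by real validation tools, and the function names are hypothetical.

```python
def ramachandran_region(phi, psi):
    """Assign a (phi, psi) pair in degrees to a rough region.

    The box limits are illustrative assumptions chosen to contain the
    textbook alpha-helix and beta-sheet angle values.
    """
    if -100 <= phi <= -30 and -80 <= psi <= -10:
        return "alpha helix"
    if -180 <= phi <= -45 and 90 <= psi <= 180:
        return "beta sheet"
    return "other"

def fraction_in_core(angles):
    """Fraction of (phi, psi) pairs that land in the helix or sheet boxes."""
    if not angles:
        return 0.0
    hits = sum(1 for phi, psi in angles
               if ramachandran_region(phi, psi) != "other")
    return hits / len(angles)
```

A structure whose residues cluster tightly in these regions gives a fraction close to 1; a low fraction would be a reason to look more closely at the model, subject to how generously the regions are drawn.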
19 3-D Visualization Tool - RasMol: An open source software package, and the most popular tool for viewing 3-D structures. RasMol represented a major breakthrough in software-driven 3-D structure visualization. Structure file formats supported by RasMol: PDB file format (outdated but human-readable) and mmCIF (a new and robust data representation, but supported by few software tools). RasTop: provides a user-friendly graphical interface to RasMol. RasTop is available at
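The PDB file format is human-readable because each atom is one fixed-width text line, with every field at a set column position. The following sketch reads the main fields of an ATOM or HETATM record using the column positions of the PDB format. The function name `parse_atom_line` and the sample values are illustrative assumptions; the coordinates are made up rather than taken from a real entry.

```python
def parse_atom_line(line):
    """Parse one ATOM/HETATM record of a PDB file into a dict.

    Returns None for any other record type. Column slices are 0-based
    Python slices of the 1-based columns in the PDB format description.
    """
    if not line.startswith(("ATOM", "HETATM")):
        return None
    return {
        "serial": int(line[6:11]),        # columns 7-11: atom serial number
        "name": line[12:16].strip(),      # columns 13-16: atom name
        "res_name": line[17:20].strip(),  # columns 18-20: residue name
        "chain": line[21:22],             # column 22: chain identifier
        "res_seq": int(line[22:26]),      # columns 23-26: residue number
        "x": float(line[30:38]),          # columns 31-38: x in angstroms
        "y": float(line[38:46]),          # columns 39-46: y in angstroms
        "z": float(line[46:54]),          # columns 47-54: z in angstroms
    }

# A made-up sample record, built with explicit widths so the columns line up.
SAMPLE = "ATOM  {:5d} {:<4s} {:3s} {:1s}{:4d}    {:8.3f}{:8.3f}{:8.3f}".format(
    1, " CA", "ALA", "A", 1, 11.104, 6.134, -6.504)
```

Calling `parse_atom_line(SAMPLE)` returns a dict with the atom name `"CA"`, residue `"ALA"`, chain `"A"`, and the three coordinates as floats. Slicing by column rather than splitting on whitespace matters because adjacent fields in real files can run together without a separating space.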
20 Cn3D: NCBI’s Structure Viewer: Cn3D (“see in 3D”) allows interactive exploration of 3-D structures, sequences and alignments. Can be used to produce high-quality molecular images. Limitation: only accepts structure files in NCBI’s ASN.1 format (from MMDB). Cn3D is available at
21 Other 3-D Visualization Tools: Chime: a Netscape plug-in for 3-D structure visualization, based on RasMol source code. Protein Explorer (http://www.proteinexplorer.org/): a Chime-based software package; particularly user friendly and feature-rich. Swiss-Pdb Viewer (Deep View, available at ...): probably the most powerful freely available molecular modeling and visualization package; supports homology modeling, site-directed mutagenesis, structure superposition, etc.
22 Protein Structure Comparison: Why is structure comparison important? To understand structure-function relationships. To study the evolution of many key proteins (structure is more conserved than sequence). Comparing 3-D structures is much more difficult than comparing sequences. Protein structure classification: SCOP (Structural Classification Of Proteins) and CATH (Class, Architecture, Topology and Homology). Protein structure alignment: DALI and VAST.
23 SCOP: SCOP at http://scop.mrc-lmb.cam.ac.uk/scop/. SCOP is based on expert definition of protein structural similarities, and is manually curated. Classification hierarchy: Class → Fold → Superfamily → Family. SCOP has 7 major classes: all α, all β, α/β, α+β, multi-domain proteins (α and β), membrane and cell surface proteins, and small proteins. The domain is the base unit of the SCOP hierarchy, and proteins with multiple domains may appear at different places in the hierarchy.
24 An Example of the SCOP Hierarchy: SCOP fold definition: same major secondary structures, same arrangement, same topology. (Bourne, 2004)
25 CATH: CATH at http://www.biochem.ucl.ac.uk/bsm/cath/. Classification hierarchy: Class (C) → Architecture (A) → Topology (T) → Homologous superfamily (H). Based on secondary structure content (for C), literature (for A), structure connectivity and general shape (for T, using the SSAP algorithm), and sequence similarity (for H). Multi-domain proteins are partitioned into their constituent domains before classification.
26 An Example of the CATH Hierarchy: CATH classes: mainly α; mainly β; mixed α and β; few secondary structures. (Pevsner, 2003)
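A position in the CATH hierarchy is commonly written as four dot-separated numbers, one per level (C.A.T.H), with class numbers 1 to 4 corresponding to the four classes listed on this slide. The sketch below splits such a code into its levels. The type and function names are hypothetical, and the code `3.40.50.300` in the usage note serves only as a sample input.

```python
from typing import NamedTuple

# Class numbers mapped to the four CATH classes named on the slide.
CATH_CLASSES = {
    1: "mainly alpha",
    2: "mainly beta",
    3: "mixed alpha and beta",
    4: "few secondary structures",
}

class CathNode(NamedTuple):
    cath_class: int
    architecture: int
    topology: int
    homologous_superfamily: int

def parse_cath_code(code):
    """Split a dotted C.A.T.H code into its four hierarchy levels."""
    parts = code.strip().split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated numbers: " + repr(code))
    return CathNode(*(int(p) for p in parts))

def class_name(node):
    """Human-readable name of the class (C) level of a node."""
    return CATH_CLASSES.get(node.cath_class, "unknown")
```

For example, `parse_cath_code("3.40.50.300")` yields a node in class 3, architecture 40, topology 50, homologous superfamily 300, and `class_name` reports it as a mixed α and β structure. Two domains that share the first three numbers have the same topology (fold) but belong to different homologous superfamilies.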
27 Summary: Protein structures are important for addressing many biological questions. Protein Data Bank (PDB) is the primary repository for protein structures. Powerful software tools (e.g., RasMol) are available for viewing 3-D protein structures. SCOP and CATH are two curated databases for structure classification (SCOP is curated manually; CATH combines automated methods with manual curation). Next: structure alignment and prediction.