Protein Structure Analysis - I

1 Protein Structure Analysis - I
PLPTH 890 Introduction to Genomic Bioinformatics. Lecture 20: Protein Structure Analysis - I. Liangjiang (LJ) Wang, April 8, 2005.

2 Outline
Basic concepts. How are protein structures determined? X-ray crystallography. NMR spectroscopy. Protein structure databases (PDB, MMDB). Protein structure visualization (RasMol, Cn3D, etc.). Protein structure classification (SCOP and CATH).

3 Structural Bioinformatics
A subdiscipline of bioinformatics that focuses on the representation, storage, visualization, prediction and evaluation of structural information. References: Baxevanis and Ouellette, Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 3rd edition, Chapter 9 and part of Chapter 8. Pevsner, Bioinformatics and Functional Genomics, Chapter 9. Bourne and Weissig, Structural Bioinformatics.

4 Protein Primary Structures
Amino acid sequence of a polypeptide chain. 20 amino acids, each with a different side chain (R). Peptide units are building blocks of protein structures. The angle of rotation around the N−Cα bond is called phi (φ), and the angle around the Cα−C′ bond from the same Cα atom is called psi (ψ). (Brandon and Tooze, 1998)
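The phi and psi angles described above are dihedral (torsion) angles over four consecutive backbone atoms: C′(i−1), N(i), Cα(i), C′(i) for phi, and N(i), Cα(i), C′(i), N(i+1) for psi. The following is a minimal sketch of that calculation, assuming coordinates are plain (x, y, z) tuples. The function name `dihedral` and the test points are illustrative and do not come from the lecture.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four 3-D points.

    The angle is measured around the p1 -> p2 bond; 0 is cis and
    +/-180 is trans. Raises ZeroDivisionError if p1 and p2 coincide.
    """
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b1 = sub(p1, p0)          # first bond vector
    b2 = sub(p2, p1)          # central bond (rotation axis)
    b3 = sub(p3, p2)          # last bond vector
    n1 = cross(b1, b2)        # normal of the plane p0, p1, p2
    n2 = cross(b2, b3)        # normal of the plane p1, p2, p3
    x = dot(n1, n2)
    y = dot(b2, cross(n1, n2)) / math.sqrt(dot(b2, b2))
    return math.degrees(math.atan2(y, x))

# Illustrative points only (not real atom coordinates): a quarter turn
# around the z axis, which should give an angle of about +90 degrees.
quarter_turn = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(round(quarter_turn, 1))
```

For a real residue, phi would be `dihedral(C_prev, N, CA, C)` and psi would be `dihedral(N, CA, C, N_next)`, with the atom coordinates read from a structure file.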

5 Protein Secondary Structures
Local substructures as a result of hydrogen bond formation between neighboring amino acids (backbone interactions). The amino acid side chains affect secondary structure formation. Types of secondary structures: α helix, β sheet, loop or random coil.

6 α Helix
Most abundant secondary structure. 3.6 amino acids per turn, with a backbone hydrogen bond formed between every fourth residue (residue i and residue i+4). Often found on the surface of proteins.
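The helix geometry above can be turned into simple arithmetic: 3.6 residues per turn means each residue is rotated by 360 / 3.6 = 100 degrees relative to the previous one, and the backbone hydrogen bonds pair residue i with residue i+4. The sketch below works under those two assumptions only; the function names are made up for illustration.

```python
RESIDUES_PER_TURN = 3.6  # value quoted on the slide

def helical_wheel_angles(n_residues):
    """Angular position (degrees, 0-360) of each residue around the helix axis."""
    step = 360.0 / RESIDUES_PER_TURN  # about 100 degrees per residue
    return [round((i * step) % 360.0, 1) for i in range(n_residues)]

def backbone_hbond_pairs(n_residues):
    """Pairs (i, i+4) of 1-based residue numbers joined by a backbone hydrogen bond."""
    return [(i, i + 4) for i in range(1, n_residues - 3)]

print(helical_wheel_angles(5))
print(backbone_hbond_pairs(8))
```

Residues whose angles fall close together on the wheel lie on the same face of the helix, which is one way to see why a helix on the protein surface can have one hydrophobic and one hydrophilic side.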

7 β Sheet
Hydrogen bonds formed between adjacent polypeptide strands. The strand directions can be the same (parallel sheet), opposite (anti-parallel), or mixed.

8 Loop or Coil
Regions between α helices and β sheets. Various lengths and 3-D configurations. Often functionally significant (e.g., part of an active site). The active site of open α/β-barrel structures is in a crevice outside the carboxy ends of the β strands. (Brandon and Tooze, 1998)

9 Protein Tertiary Structure
The 3-D structure of a protein is assembled from different secondary structure components. Tertiary structure is determined primarily by hydrophobic interactions between side chains. Different classes of protein structures: all α, e.g., hemoglobin (3HHB); all β, e.g., T cell CD8 (1CD8); mixed, e.g., thermolysin (7TLN).

10 Protein Tertiary Structure (Cont’d)
Fold: a certain type of 3-D arrangement of secondary structures. Protein structures evolve more slowly than primary amino acid sequences. Examples: four-helix bundles, E. coli cytochrome b562 (256B) and human growth hormone (1HUW); three-helix bundle, Drosophila engrailed homeodomain (1ENH).

11 Protein Quaternary Structure
Two or more independent tertiary structures are assembled into a larger protein complex. Important for understanding protein-protein interactions. Examples: E. coli ribosome (1ML5), horse spleen ferritin (1IES).

12 Biological Knowledge from Structures
(Bourne, 2004)

13 X-Ray Crystallography
Basic steps: gene targets → expression and purification → proteins → crystallization → X-ray diffraction → structure solution. Advantages: High-resolution structures. Applicable to large protein complexes or membrane proteins. Disadvantages: Molecules are in a solid-state (crystal) environment. Crystals are required.

14 Nuclear Magnetic Resonance (NMR)
NMR reveals the neighborhood information of atoms in a molecule, and the information can be used to construct a 3-D model of the molecule. Advantages: No requirement for crystals. Proteins in a liquid state (near physiological state). Disadvantages: Limited by molecule size (up to 30 kD). Membrane proteins may not be studied. Inherently less precise than X-ray crystallography.

15 Protein Data Bank (PDB)
The primary repository for protein structures. Established in 1971 (the first bioinformatics database, set up with 7 protein structures). Contains 30,179 structures by March 22, 2005. Supports services for structure submission, search, retrieval, and visualization. Search options: SearchLite: PDB ID and key word search. SearchFields: advanced search. (PDB can be accessed at
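A PDB ID search works on the short identifiers used throughout this lecture (3HHB, 1CD8, 2TRX, and so on). As a small sketch, the check below assumes the classic four-character form: one digit followed by three letters or digits. The pattern and the function name `is_pdb_id` are illustrative assumptions, not part of any PDB service.

```python
import re

# Assumed classic PDB ID shape: a digit, then three alphanumeric characters.
PDB_ID_PATTERN = re.compile(r"^[0-9][A-Za-z0-9]{3}$")

def is_pdb_id(text):
    """Return True if text looks like a four-character PDB ID."""
    return bool(PDB_ID_PATTERN.match(text.strip()))

# IDs quoted on the slides of this lecture.
lecture_ids = ["3HHB", "1CD8", "7TLN", "256B", "1HUW", "1ENH", "1ML5", "1IES", "2TRX"]
print(all(is_pdb_id(pdb_id) for pdb_id in lecture_ids))
print(is_pdb_id("hemoglobin"))
```

A check like this only says whether a string has the right shape; whether an entry with that ID exists still has to be confirmed against the database.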

16 PDB Content Growth
[Figure: number of structures in the PDB by year, 1972 to 2005; y-axis ticks at 5,000 and 30,000. Last updated: 06-Mar-2005.]

17 Access to Structures through NCBI
MMDB (Molecular Modeling Database): Structures obtained from PDB. Data in NCBI’s ASN.1 format. Integrated into NCBI’s Entrez system. Cn3D (“see in 3D”): NCBI’s 3-D protein structure viewer. VAST (Vector Alignment Search Tool): for direct comparison of 3-D protein structures. (NCBI at

18 Ramachandran Plot
Used to assess the quality of structures. Good structures show tight clustering patterns. [Figure: Ramachandran plot of thioredoxin (2TRX), PHI on the horizontal axis and PSI on the vertical axis, with the β sheet and α helix regions labeled.] (Baxevanis and Ouellette, 2005)
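To make the idea of clustering on a Ramachandran plot concrete, the sketch below assigns a (phi, psi) pair to a coarse region. The rectangular boundaries are rough assumptions chosen for illustration, centered near the commonly quoted ideal values (about −57, −47 for the α helix; about −120, +130 for the β sheet). Validation software uses much more detailed, empirically derived contours.

```python
# Coarse, assumed region boundaries in degrees: (phi_min, phi_max, psi_min, psi_max).
REGIONS = {
    "alpha helix": (-100.0, -30.0, -80.0, -10.0),
    "beta sheet": (-180.0, -45.0, 90.0, 180.0),
}

def ramachandran_region(phi, psi):
    """Return the name of the coarse region containing (phi, psi), or 'other'."""
    for name, (phi_min, phi_max, psi_min, psi_max) in REGIONS.items():
        if phi_min <= phi <= phi_max and psi_min <= psi <= psi_max:
            return name
    return "other"

def fraction_in_named_regions(angle_pairs):
    """Fraction of (phi, psi) pairs that fall inside one of the named regions."""
    if not angle_pairs:
        raise ValueError("need at least one (phi, psi) pair")
    hits = sum(1 for phi, psi in angle_pairs if ramachandran_region(phi, psi) != "other")
    return hits / len(angle_pairs)

print(ramachandran_region(-57.0, -47.0))
print(ramachandran_region(-120.0, 130.0))
```

Under these assumptions, a structure with most residues in the named regions would correspond to the tight clustering described on the slide, while many residues classified as "other" would be a reason to look more closely.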

19 3-D Visualization Tool - RasMol
An open source software package, and the most popular tool for viewing 3-D structures. RasMol represented a major break-through in software-driven 3-D structure visualization. Structure file formats supported by RasMol: PDB file format: outdated but human-readable. mmCIF: a new and robust data representation, but supported by few software tools. RasTop: provides a user-friendly graphical interface to RasMol. RasTop is available at
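The PDB file format is human-readable because each ATOM record is a fixed-width text line: atom serial number in columns 7-11, atom name in 13-16, residue name in 18-20, chain in 22, residue number in 23-26, and x, y, z coordinates in columns 31-38, 39-46 and 47-54. The sketch below reads one such line with plain string slicing. The sample record is invented for illustration and is not taken from any real entry; the function name is likewise an assumption.

```python
# Made-up ATOM record, split into chunks so the fixed columns are easy to check.
SAMPLE_LINE = (
    "ATOM      1  N   MET A   1    "  # columns 1-30: record, serial, atom, residue, chain, number
    "  27.340  24.430   2.614"        # columns 31-54: x, y, z in angstroms
    "  1.00  9.67"                    # columns 55-66: occupancy, temperature factor
)

def parse_atom_line(line):
    """Parse one ATOM/HETATM line into a dict, or return None for other records."""
    if not line.startswith(("ATOM", "HETATM")):
        return None
    return {
        "serial": int(line[6:11]),
        "atom_name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain_id": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }

atom = parse_atom_line(SAMPLE_LINE)
print(atom["atom_name"], atom["res_name"], atom["chain_id"], atom["res_seq"])
```

Slicing by column rather than splitting on whitespace matters here, because adjacent fields in a PDB line can touch each other without a separating space.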

20 Cn3D: NCBI’s Structure Viewer
Cn3D (“see in 3D”): allows interactive exploration of 3-D structures, sequences and alignments. Can be used to produce high-quality molecular images. Limitation: only accepts structure files in NCBI’s ASN.1 format (from MMDB). Cn3D is available at

21 Other 3-D Visualization Tools
Chime: a Netscape plug-in for 3-D structure visualization; based on RasMol source code. Protein Explorer ( A Chime-based software package. Particularly user friendly and feature-rich. Swiss-Pdb Viewer (Deep View, available at Probably the most powerful, freely available molecular modeling and visualization package. Supports homology modeling, site-directed mutagenesis, structure superposition, etc.

22 Protein Structure Comparison
Why is structure comparison important? To understand structure-function relationship. To study the evolution of many key proteins (structure is more conserved than sequence). Comparing 3-D structures is much more difficult than sequence comparison. Protein structure classification: SCOP: Structure Classification Of Proteins. CATH: Class, Architecture, Topology and Homology. Protein structure alignment: DALI and VAST.
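One simple measure of how different two 3-D structures are is the root-mean-square deviation (RMSD) between paired atom positions. The sketch below assumes the two coordinate lists are already superimposed and already matched atom by atom, which is exactly the hard part that makes structure comparison more difficult than sequence comparison. It is not a description of how DALI or VAST work; the function name and coordinates are illustrative.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples.

    Assumes the structures are already superimposed and that
    coords_a[i] corresponds to coords_b[i].
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Illustrative coordinates: the second set is the first shifted by 1 along z.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
set_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(set_a, set_b))
```

In practice the optimal superposition and the atom pairing have to be found first, and that search is what dedicated structure alignment programs provide.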

23 SCOP
SCOP is based on expert definition of protein structural similarities, and is manually curated. Classification hierarchy: Class → Fold → Superfamily → Family. SCOP has 7 major classes: all α, all β, α/β, α+β, multi-domain proteins (α and β), membrane and cell surface proteins, and small proteins. Domain is the base unit of the SCOP hierarchy, and proteins with multiple domains may appear at different places in the hierarchy. SCOP at

24 An Example of the SCOP Hierarchy
SCOP fold definition: Same major secondary structures. Same arrangement. Same topology. (Bourne, 2004)

25 CATH
Classification hierarchy: Class (C) → Architecture (A) → Topology (T) → Homologous superfamily (H). Based on secondary structure content (for C), literature (for A), structure connectivity and general shape (for T, using the SSAP algorithm), and sequence similarity (for H). Multi-domain proteins are partitioned into their constituent domains before classification. CATH at

26 An Example of the CATH Hierarchy
CATH classes: mainly α, mainly β, mixed α and β, few secondary structures. (Pevsner, 2003)
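CATH classifications are usually written as four dot-separated numbers, one per level of the C, A, T, H hierarchy. The sketch below splits such a code into its levels and names the class, assuming the class numbering 1 to 4 follows the order of the four classes listed above. The code string used here is only an example of the notation, and the function name is made up.

```python
# Assumed mapping from the first number of a CATH code to the class name.
CATH_CLASSES = {
    1: "mainly alpha",
    2: "mainly beta",
    3: "mixed alpha and beta",
    4: "few secondary structures",
}

def parse_cath_code(code):
    """Split a dotted C.A.T.H code into its four hierarchy levels."""
    parts = code.strip().split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated numbers, got: " + code)
    c, a, t, h = (int(part) for part in parts)
    return {
        "class": c,
        "class_name": CATH_CLASSES.get(c, "unknown"),
        "architecture": a,
        "topology": t,
        "homologous_superfamily": h,
    }

example = parse_cath_code("3.40.50.720")  # example code string for illustration
print(example["class_name"], example["architecture"], example["topology"])
```

Two domains that share the first three numbers have the same class, architecture and topology, and differ only at the homologous superfamily level.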

27 Summary
Protein structures are important for addressing many biological questions. Protein Data Bank (PDB) is the primary repository for protein structures. Powerful software tools (e.g., RasMol) are available for viewing 3-D protein structures. SCOP and CATH are two manually curated databases for structure classification. Next: structure alignment and prediction.
