Web Resources for Bioinformatics Vadim Alexandrov and Mark Gerstein
What is Bioinformatics? (Molecular) Bio-informatics. One idea for a definition: bioinformatics is conceptualizing biology in terms of molecules (in the sense of physical chemistry) and then applying “informatics” techniques (derived from disciplines such as applied math, CS, and statistics) to understand and organize the information associated with these molecules on a large scale. Bioinformatics is “MIS” (management information systems) for molecular biology information. It is a practical discipline with many applications.
Web Resources:
Molecules
- Sequence, Structure, Function
Algorithms
- HMMs
- Alignments
- Simulations
Databases
0. Good Starting Points
- http://www.ncbi.nlm.nih.gov/
- http://www.rcsb.org/pdb/
1. PDBsum capabilities
PDBsum: www.biochem.ucl.ac.uk/bsm/pdbsum
Starting point for looking at a PDB structure. Each entry contains:
a. View
- Schematic pictures of the entry
- Interactive views (RasMol/VRML)
b. Details
- Name, date and description of macromolecules in the PDB entry
- Authors, resolution and R-factor
c. Links
- PDB header information
- PDB, NDB, SWISSPROT
- PQS (protein quaternary structure), MMDB
- CATH, SCOP, FSSP
- Structure check reports: PROCHECK, WHATIF
- Many others: enzyme, PRINTS etc.
PDBsum capabilities, continued
d. Each chain
- CATH classification
- Plot of sequence, secondary structure and domain assignments
- PROMOTIF analysis
- TOPS topology diagram
- SAS: annotated FASTA alignment of related sequences in PDB
- PROSITE pattern
e. Nucleic acid ligands
- Base sequence
- NUCPLOT diagram of interactions
f. Small molecule ligands
- Schematic diagram of ligand
- LIGPLOT diagram of interactions
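The PROSITE patterns listed per chain are short signatures written in PROSITE's own syntax (elements joined by `-`, `x` for any residue, `[..]` for allowed and `{..}` for excluded residues, `(n)` or `(n,m)` for repeats, `<` and `>` for the sequence ends). As a rough illustration of how such a pattern can be scanned against a chain sequence, here is a minimal sketch that translates a pattern into a Python regular expression. The function names `prosite_to_regex` and `find_sites` and the test sequence are made up for this example; this is not how PDBsum itself does the matching, and rarer syntax (such as `>` inside brackets) is not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regex (common syntax only)."""
    pattern = pattern.strip().rstrip(".")  # a trailing period ends a PROSITE pattern
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):        # '<' anchors to the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):          # '>' anchors to the C-terminus
            suffix, element = "$", element[:-1]
        # Split "core" from an optional "(n)" or "(n,m)" repeat count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"empty element in pattern: {pattern!r}")
        core, lo, hi = m.groups()
        if core == "x":                    # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # excluded residues
        # "[ABC]" and single letters are already valid regex fragments.
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def find_sites(pattern: str, sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for every site, overlaps included."""
    regex = "(?=(" + prosite_to_regex(pattern) + "))"  # lookahead keeps overlaps
    return [(m.start() + 1, m.group(1)) for m in re.finditer(regex, sequence)]


if __name__ == "__main__":
    # N-glycosylation signature; the sequence is an invented test string.
    print(find_sites("N-{P}-[ST]-{P}", "MNGSANPSNATK"))
```

The lookahead wrapper in `find_sites` is a design choice: short signatures can overlap, and a plain `finditer` would skip any site that starts inside a previous match.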
2. SAS (Sequence Annotated by Structure): www.biochem.ucl.ac.uk/bsm/sas
Annotation of protein sequences by structural information.
a. Input for FASTA search of rest of PDB
- PDB code
- SWISS-PROT code
- Paste sequence
- Upload own alignment
b. Annotation
- Residue type
- Ligand contacts
- Active site residues
- CATH domains
- Residue similarity
c. Options
- Select inclusion in alignment
- Colour/b&w, secondary structure
d. View 3D structural superposition
- Coloured by SAS annotation
3. CATH: www.biochem.ucl.ac.uk/bsm/cath
Hierarchical domain classification of protein structures in the PDB. Four basic levels (a-d), followed by a sequence-family level (e):
a. Class (automated): secondary structure composition and packing within the structure
- mainly-α, mainly-β, mixed α-β, low secondary structure
b. Architecture (manual): overall shape of the domain structure as determined by the orientations of the secondary structures. Connectivity is ignored
- e.g. barrel, sandwich etc.
c. Topology (semi-automated): fold families determined by shape and connectivity of secondary structures
- e.g. mainly-β two-layer sandwich
d. Homologous superfamily (semi-automated): domains with a common ancestor, determined by sequence and structural similarity
e. Sequence family (automated): highly similar structures and functions, as determined by sequence identity
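CATH labels each node of this hierarchy with a dotted number, one field per level (Class.Architecture.Topology.Homologous superfamily), so two domains share a level exactly when their numbers share that prefix. The sketch below shows one way to work with such codes. The names `parse_cath`, `class_name` and `group_by_level` are invented for this example, and the domain labels and codes in the usage section are illustrative placeholders, not a statement about any real PDB entry.

```python
from collections import defaultdict
from typing import NamedTuple

# The four basic levels, in order from most general to most specific.
LEVELS = ("class", "architecture", "topology", "homologous_superfamily")

# The four classes named on the slide, keyed by CATH's class number.
CLASS_NAMES = {
    1: "mainly alpha",
    2: "mainly beta",
    3: "mixed alpha-beta",
    4: "low secondary structure",
}


class CathCode(NamedTuple):
    cls: int
    architecture: int
    topology: int
    superfamily: int


def parse_cath(code: str) -> CathCode:
    """Parse a dotted C.A.T.H number such as '3.40.50.300'."""
    parts = code.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dotted fields, got {code!r}")
    return CathCode(*(int(p) for p in parts))


def class_name(code: str) -> str:
    """Name of the top-level class for a C.A.T.H number."""
    return CLASS_NAMES[parse_cath(code).cls]


def group_by_level(domains: dict[str, str], level: str) -> dict[str, list[str]]:
    """Group domain labels by their shared code prefix at the given level."""
    depth = LEVELS.index(level) + 1
    groups: dict[str, list[str]] = defaultdict(list)
    for name, code in domains.items():
        prefix = ".".join(str(n) for n in parse_cath(code)[:depth])
        groups[prefix].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical domain labels with illustrative codes.
    domains = {
        "domA": "3.40.50.300",
        "domB": "3.40.50.720",
        "domC": "3.30.70.100",
        "domD": "1.10.10.10",
    }
    print(group_by_level(domains, "architecture"))
    print(group_by_level(domains, "topology"))
```

Grouping by prefix mirrors the hierarchy directly: moving one level down only ever splits an existing group, never merges two.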
4. Other classification databases
a. Enzyme structures database: www.biochem.ucl.ac.uk/bsm/enzymes
- PDB enzyme structures classified by E.C. number
b. Protein-DNA database: www.biochem.ucl.ac.uk/bsm/prot_dna/prot_dna.html
- PDB complex structures classified by binding motif
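An E.C. number has four dotted fields, and the first field gives the broad reaction class, so classifying entries by E.C. number starts with a lookup on that field. The following is a small sketch of that idea only. The function names `ec_class` and `classify` and the entry labels are hypothetical, and the class table includes class 7 (translocases), which was added to the Enzyme Commission scheme after this kind of database was first built.

```python
from collections import defaultdict

# Top-level Enzyme Commission classes, keyed by the first field of the number.
EC_CLASSES = {
    1: "oxidoreductases",
    2: "transferases",
    3: "hydrolases",
    4: "lyases",
    5: "isomerases",
    6: "ligases",
    7: "translocases",
}


def ec_class(ec_number: str) -> str:
    """Return the top-level class name for an E.C. number such as '3.2.1.17'.

    Lower fields may be '-' when an enzyme is only partially classified.
    """
    fields = ec_number.strip().split(".")
    if len(fields) != 4:
        raise ValueError(f"expected four dotted fields, got {ec_number!r}")
    return EC_CLASSES[int(fields[0])]


def classify(entries: dict[str, str]) -> dict[str, list[str]]:
    """Group entry labels by the top-level class of their E.C. number."""
    groups: dict[str, list[str]] = defaultdict(list)
    for label, ec_number in entries.items():
        groups[ec_class(ec_number)].append(label)
    return dict(groups)


if __name__ == "__main__":
    # Entry labels are placeholders; the E.C. numbers are those of lysozyme,
    # alcohol dehydrogenase and trypsin.
    entries = {"entry_a": "3.2.1.17", "entry_b": "1.1.1.1", "entry_c": "3.4.21.4"}
    print(classify(entries))
```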
5. Protein sequence analysis: www.biochem.ucl.ac.uk/bsm/dbbrowser
Protein sequence search using protein fingerprints: groups of conserved sequence motifs used to characterize a protein family.
7. Atomic-level protein properties
a. PROCAT: www.biochem.ucl.ac.uk/bsm/PROCAT/PROCAT.html
- Database of 3D enzyme active sites
b. Hydrogen bond atlas: www.biochem.ucl.ac.uk/~mcdonald/atlas
- Graphical summary of hydrogen-bonding properties of amino acids
c. Atlas of side chain-side chain/side chain-base interactions: www.biochem.ucl.ac.uk/bsm/sidechains
- Interaction geometries of side chain-side chain and side chain-base pairs
8. Publicly available software (protein structure/interaction)
a. HBPLUS - calculation of interactions in PDB structures
b. LIGPLOT - schematic diagrams of protein-ligand interactions
c. NUCPLOT - schematic diagrams of protein-DNA interactions
d. PROMOTIF - analysis of protein secondary structural motifs
e. NACCESS - calculation of atomic accessibilities of protein surfaces
f. SURFNET - visualization of molecular surfaces, cavities etc.
g. PROCHECK - check of stereochemical quality of protein structures
h. THREADER - prediction of protein tertiary structure
i. MEMSAT - prediction of transmembrane protein structure
j-z. BROWSE THE WEB IN YOUR SPARE TIME AND BOOKMARK 'EM!