©CMBI 2008 Databases: Data must be in a certain format for software to recognize it. Every database can have its own format, but some data elements are essential.

Presentation transcript:

©CMBI 2008 Databases
Data must be in a certain format for software to recognize it. Every database can have its own format, but some data elements are essential for every database:
1. Unique identifier, or accession code
2. Name of depositor
3. Literature references
4. Deposition date
5. The real data
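The five essential elements can be pictured as the fields of one record. The sketch below is only an illustration: the class name `DatabaseEntry`, its field names, and the placeholder depositor are assumptions, not part of any real database schema. The accession code and date are borrowed from the 1CRN example used later in this transcript.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatabaseEntry:
    """Hypothetical container for the five essential data elements."""
    accession: str          # 1. unique identifier, or accession code
    depositor: str          # 2. name of depositor
    deposition_date: str    # 4. deposition date, kept as text here
    data: str               # 5. the real data (sequence, coordinates, ...)
    references: List[str] = field(default_factory=list)  # 3. literature references


# Placeholder values for depositor and data; only the code and date come from the slides.
entry = DatabaseEntry(
    accession="1CRN",
    depositor="(depositor name)",
    deposition_date="30-APR-81",
    data="(coordinates would go here)",
)
print(entry.accession, entry.deposition_date)
```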

©CMBI 2008 Quality of Data
SwissProt: data is only entered by annotation experts.
EMBL, PDB: “everybody” can submit data. No human intervention when submitted; some automatic checks.

©CMBI 2008 SwissProt database
Database of protein sequences
entries (Oct 2008)
Ca. 200 annotation experts worldwide
Keyword-organised flatfile
Obligatory deposit of sequences in SwissProt before publication
Presently, databases are being merged into UniProt.

©CMBI 2008 Important records in SwissProt (1)
ID   HBA_HUMAN   Reviewed;   142 AA.
AC   P69905; P01922; Q3MIF5; Q96KF1; Q9NYR7;
DT   21-JUL-1986, integrated into UniProtKB/Swiss-Prot.
DT   23-JAN-2007, sequence version 2.
DT   23-SEP-2008, entry version 63.
DE   RecName: Full=Hemoglobin subunit alpha;
DE   AltName: Full=Hemoglobin alpha chain;
DE   AltName: Full=Alpha-globin;
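Because every line of a SwissProt entry starts with a two-letter line code, a few lines of Python are enough to group the records. This is a minimal sketch, not an official parser: the helper name `group_by_line_code` is hypothetical, and the exact spacing in the sample text is an assumption, since the transcript does not preserve the original column layout.

```python
from collections import defaultdict

# Sample lines taken from the HBA_HUMAN entry shown on the slide.
ENTRY = """\
ID   HBA_HUMAN               Reviewed;         142 AA.
AC   P69905; P01922; Q3MIF5; Q96KF1; Q9NYR7;
DT   21-JUL-1986, integrated into UniProtKB/Swiss-Prot.
DT   23-JAN-2007, sequence version 2.
DT   23-SEP-2008, entry version 63.
DE   RecName: Full=Hemoglobin subunit alpha;
DE   AltName: Full=Hemoglobin alpha chain;
DE   AltName: Full=Alpha-globin;
"""


def group_by_line_code(text):
    """Group flat-file lines by their leading two-letter line code."""
    records = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        code, _, value = line.partition(" ")
        records[code].append(value.strip())
    return dict(records)


records = group_by_line_code(ENTRY)

# The first word of the ID line is the entry name.
entry_name = records["ID"][0].split()[0]

# AC lines hold a semicolon-separated list; the first code is the primary accession.
accessions = [a.strip() for a in " ".join(records["AC"]).split(";") if a.strip()]
primary_accession = accessions[0]

print(entry_name, primary_accession, len(records["DT"]))
```

Grouping by line code first keeps the parser independent of the order in which the records appear.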

©CMBI 2008 Important records in SwissProt (2) Cross references section: Hyperlinks to all entries in other databases which are relevant for the protein sequence HBA_HUMAN

©CMBI 2008 Important records in SwissProt (3) Features section: post-translational modifications, signal peptides, binding sites, enzyme active sites, domains, disulfide bridges, local secondary structure, sequence conflicts between references etc. etc.

©CMBI 2008 And finally, the amino acid sequence!

©CMBI 2008 Protein Data Bank (PDB)
Databank for macromolecular structure data (3-dimensional coordinates).
Started ca. 30 years ago (on punched cards!)
Obligatory deposit of coordinates in the PDB before publication
~ entries (April 2008) (~2500 “unique” structures)
PDB file is a keyword-organised flat-file (80 column):
1) human readable
2) every line starts with a keyword (3-6 letters)
3) platform independent

©CMBI 2008 PDB important records (1)
PDB nomenclature: Filename = accession number = PDB code
Filename is 4 positions (often 1 digit & 3 letters, e.g. 1CRN)
HEADER describes molecule & gives deposition date
HEADER PLANT SEED PROTEIN 30-APR-81 1CRN
COMPND name of molecule
COMPND CRAMBIN
SOURCE organism
SOURCE ABYSSINIAN CABBAGE (CRAMBE ABYSSINICA) SEED
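Since every PDB line starts with a keyword, the records above can be grouped by that keyword and the HEADER line split into its parts. The sketch below works on the whitespace-collapsed lines as they appear in this transcript; real PDB files use fixed columns, so splitting on whitespace is an assumption made for illustration. The helper names `parse_records`, `parse_header` and `looks_like_pdb_code` are hypothetical.

```python
# Lines as they appear in the transcript (column spacing is not preserved).
LINES = [
    "HEADER PLANT SEED PROTEIN 30-APR-81 1CRN",
    "COMPND CRAMBIN",
    "SOURCE ABYSSINIAN CABBAGE (CRAMBE ABYSSINICA) SEED",
]


def parse_records(lines):
    """Map each leading keyword to the list of texts that follow it."""
    records = {}
    for line in lines:
        keyword, _, rest = line.partition(" ")
        records.setdefault(keyword, []).append(rest.strip())
    return records


def parse_header(text):
    """Split a HEADER text into classification, deposition date and PDB code.

    Assumes the last two whitespace-separated tokens are the date and the code.
    """
    *classification, date_text, pdb_code = text.split()
    return {
        "classification": " ".join(classification),
        "deposition_date": date_text,  # kept as text; the year has only two digits
        "pdb_code": pdb_code,
    }


def looks_like_pdb_code(code):
    """Check the '4 positions, starting with a digit' pattern from the slide."""
    return len(code) == 4 and code[0].isdigit() and code.isalnum()


records = parse_records(LINES)
header = parse_header(records["HEADER"][0])
print(header["pdb_code"], header["deposition_date"], records["COMPND"][0])
```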

©CMBI 2008 PDB important records (2)
SEQRES: sequence of the protein. Be aware: 3D coordinates are not always present for all the amino acids in SEQRES!
SEQRES 1 46 THR THR CYS CYS PRO SER ILE VAL ALA ARG SER ASN PHE 1CRN 51
SEQRES 2 46 ASN VAL CYS ARG LEU PRO GLY THR PRO GLU ALA ILE CYS 1CRN 52
SEQRES 3 46 ALA THR TYR THR GLY CYS ILE ILE ILE PRO GLY ALA THR 1CRN 53
SEQRES 4 46 CYS PRO GLY ASP TYR ALA ASN 1CRN 54
SSBOND: disulfide bridges
SSBOND 1 CYS 3 CYS 40
SSBOND 2 CYS 4 CYS 32
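The SEQRES and SSBOND lines above can be turned into a one-letter sequence and a list of bridged positions. This is a sketch under stated assumptions: it splits on whitespace because the transcript does not preserve the fixed columns, and it simply ignores tokens that are not three-letter amino acid codes (such as the trailing `1CRN 51`). The function names are hypothetical.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

LINES = [
    "SEQRES 1 46 THR THR CYS CYS PRO SER ILE VAL ALA ARG SER ASN PHE 1CRN 51",
    "SEQRES 2 46 ASN VAL CYS ARG LEU PRO GLY THR PRO GLU ALA ILE CYS 1CRN 52",
    "SEQRES 3 46 ALA THR TYR THR GLY CYS ILE ILE ILE PRO GLY ALA THR 1CRN 53",
    "SEQRES 4 46 CYS PRO GLY ASP TYR ALA ASN 1CRN 54",
    "SSBOND 1 CYS 3 CYS 40",
    "SSBOND 2 CYS 4 CYS 32",
]


def seqres_to_sequence(lines):
    """Concatenate the residues of all SEQRES lines into a one-letter string."""
    letters = []
    declared_length = None
    for line in lines:
        tokens = line.split()
        if tokens[0] != "SEQRES":
            continue
        declared_length = int(tokens[2])  # total number of residues in the chain
        letters.extend(THREE_TO_ONE[t] for t in tokens[3:] if t in THREE_TO_ONE)
    return "".join(letters), declared_length


def ssbond_pairs(lines):
    """Return (residue number, residue number) for each SSBOND line."""
    pairs = []
    for line in lines:
        tokens = line.split()
        if tokens[0] == "SSBOND":
            pairs.append((int(tokens[3]), int(tokens[5])))
    return pairs


sequence, declared_length = seqres_to_sequence(LINES)
bonds = ssbond_pairs(LINES)
print(sequence)
print(bonds)
```

Comparing `len(sequence)` with the declared length is a cheap sanity check. It says nothing about which residues actually have ATOM coordinates, which is the caveat the slide warns about.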

©CMBI 2008 PDB important records (3)
And at the end of the PDB file, the “real” data:
ATOM: one line for each atom with its unique name and its x,y,z coordinates
ATOM 1 N THR CRN 70
ATOM 2 CA THR CRN 71
ATOM 3 C THR CRN 72
ATOM 4 O THR CRN 73
ATOM 5 CB THR CRN 74
ATOM 6 OG1 THR CRN 75
ATOM 7 CG2 THR CRN 76
ATOM 8 N THR CRN 77
ATOM 9 CA THR CRN 78
ATOM 10 C THR CRN 79
ATOM 11 O THR CRN 80
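In an actual PDB file the ATOM fields sit in fixed columns (serial in 7-11, atom name in 13-16, residue name in 18-20, chain in 22, residue number in 23-26, x, y, z in 31-38, 39-46 and 47-54), so they are read by position rather than by splitting on spaces. The sketch below builds a sample line with a format string and parses it back. The coordinate values and the chain letter are made-up placeholders, because the transcript above does not preserve the real numbers, and both function names are hypothetical.

```python
def format_atom_line(serial, name, res_name, chain, res_seq, x, y, z):
    """Build the first 54 columns of an ATOM line (illustrative values only)."""
    return (
        f"ATOM  {serial:5d} {name:<4s} {res_name:>3s} {chain:1s}{res_seq:4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}"
    )


def parse_atom_line(line):
    """Read ATOM fields by column position (0-based slices of 1-based columns)."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21].strip(),
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


# Placeholder coordinates, NOT the real crambin values.
line = format_atom_line(1, " N  ", "THR", "A", 1, 1.0, 2.0, 3.0)
atom = parse_atom_line(line)
print(atom)

# Large negative coordinates fill their columns completely, so the numbers touch
# and a whitespace split would fail; column slicing still works.
crowded = format_atom_line(2, " CA ", "THR", "A", 1, -123.456, -234.567, -345.678)
crowded_atom = parse_atom_line(crowded)
print(crowded_atom["x"], crowded_atom["y"], crowded_atom["z"])
```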

©CMBI 2008 MRS home page

©CMBI 2008 MRS Search Steps
Select database(s) of choice
Formulate your query
Hit “Search”
The result is a “query set” or “hitlist”
Analyze the results

©CMBI 2008 MRS Search options
Simply type your keywords in the keyword field and choose SEARCH. If you know the fields of the database you are searching in, you can specify your query further. But think about your query first!