Secondary Databases: Ansuman Sahoo (Roll: Y1011009), Bioinformatics Class Presentation, 30 Jan 2013.

Why databases?
Biology has turned into a data-rich science. High-throughput genomics, proteomics, metabolomics and related experiments generate vast amounts of data (for example, mass-spectrometry peptide fragments), and the need to store and communicate large datasets has grown tremendously. Archiving, curating, analysing and interpreting all of these datasets is a challenge, so convenient methods for storing, searching and retrieving them are necessary. Databases are the means to handle this data overload.

What can databases do?
Databases make biological data available to scientists and in computer-readable form. They support computer-based analysis, handle and share large volumes of data, and provide an interface for computer-based systems (algorithms, web interfaces). They store data in defined formats, automate the storage and retrieval of experimental data, and link knowledge with external resources.

Database classification I
By type of data: nucleotide or protein sequences; protein sequence patterns and motifs; macromolecular 3D structures; gene expression data; metabolic pathways; and so on.
By data entry and quality control: whether scientists deposit data directly or appointed curators add and update entries; the type and degree of error checking; and how consistency, redundancy, conflicts and updates are handled.

Database classification II
By primary or derived data: primary databases hold experimental results entered directly; secondary databases hold the results of analysing primary databases.
By technical design: flat files, relational databases (SQL), object-oriented databases, and exchange/publication technologies (FTP, HTML, CORBA, XML, SOAP).
By maintainer status: large public institutions funded by government (EMBL, NCBI), academic groups or individual scientists, or commercial companies.
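The primary/secondary distinction and the relational (SQL) design can be illustrated together. The sketch below uses Python's built-in `sqlite3` module with a hypothetical two-table schema: `sequence` stands in for primary data and `motif_hit` for secondary data derived from it by analysis. The table names, the toy motif `GC` and the accessions are illustrative assumptions, not the schema of any real database.

```python
import sqlite3

# Hypothetical schema: "sequence" holds primary (raw) data, "motif_hit" holds
# secondary data derived from it. Names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sequence (
        accession TEXT PRIMARY KEY,
        residues  TEXT NOT NULL
    );
    CREATE TABLE motif_hit (
        accession TEXT NOT NULL REFERENCES sequence(accession),
        motif     TEXT NOT NULL,
        start_pos INTEGER NOT NULL
    );
    """
)

# Primary data: deposited sequences (toy values).
conn.executemany(
    "INSERT INTO sequence VALUES (?, ?)",
    [("SEQ1", "MKNGCAAGH"), ("SEQ2", "MAAAAAAA")],
)

# Secondary data: the result of analysing the primary table, here a plain
# substring search for the toy motif "GC" (1-based start positions).
for acc, residues in conn.execute("SELECT accession, residues FROM sequence").fetchall():
    pos = residues.find("GC")
    while pos != -1:
        conn.execute("INSERT INTO motif_hit VALUES (?, ?, ?)", (acc, "GC", pos + 1))
        pos = residues.find("GC", pos + 1)

# A join links each derived annotation back to its primary record.
rows = conn.execute(
    """
    SELECT s.accession, h.motif, h.start_pos
    FROM motif_hit AS h JOIN sequence AS s ON s.accession = h.accession
    """
).fetchall()
print(rows)  # [('SEQ1', 'GC', 4)]
```

Keeping derived results in their own table, keyed by the primary accession, mirrors how a secondary resource points back to the primary records it was computed from.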

Nucleotide sequence databases: EMBL, DDBJ, GenBank
These hold sequences submitted directly by scientists and genome sequencing groups, as well as sequences taken from the literature and patents. Entries in the EMBL, GenBank and DDBJ databases are synchronized daily, and accession numbers are managed consistently across the three. There is comparatively little error checking and a fair amount of redundancy.
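Flat files and redundancy can be made concrete with a small parser. The sketch below assumes a heavily simplified EMBL-style layout (two-letter line codes, `AC` for the accession, `SQ` before the sequence, `//` ending each record) and groups accessions whose sequences are identical, one simple form of redundancy. The records and accessions are made up, and real EMBL entries have many more line types than this parser understands.

```python
from collections import defaultdict

# Toy records in a simplified EMBL-style flat-file layout (made-up data).
FLAT_FILE = """\
ID   TOY1; SV 1; linear; DNA; STD; SYN; 12 BP.
AC   X00001;
SQ   Sequence 12 BP;
     atggcttgat aa                                                      12
//
ID   TOY2; SV 1; linear; DNA; STD; SYN; 12 BP.
AC   X00002;
SQ   Sequence 12 BP;
     ATGGCTTGAT AA                                                      12
//
"""


def parse_flat_file(text):
    """Yield (accession, sequence) pairs from simplified EMBL-style records."""
    accession, seq_parts, in_sequence = None, [], False
    for line in text.splitlines():
        if line.startswith("AC"):
            # Keep the first accession on the AC line, without the trailing ';'.
            accession = line[2:].split(";")[0].strip()
        elif line.startswith("SQ"):
            in_sequence = True
        elif line.startswith("//"):
            yield accession, "".join(seq_parts).upper()
            accession, seq_parts, in_sequence = None, [], False
        elif in_sequence:
            # Sequence lines: blocks of bases followed by a running base count.
            seq_parts.extend(tok for tok in line.split() if tok.isalpha())


def find_redundant(records):
    """Group accessions that share an identical sequence."""
    by_sequence = defaultdict(list)
    for accession, sequence in records:
        by_sequence[sequence].append(accession)
    return [accs for accs in by_sequence.values() if len(accs) > 1]


print(find_redundant(parse_flat_file(FLAT_FILE)))  # [['X00001', 'X00002']]
```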

Protein sequence databases
UniProtKB: its mission is to provide a comprehensive, high-quality and freely accessible resource of protein sequence and functional information.
SWISS-PROT is a protein sequence database that strives to provide a high level of annotation (such as a description of the protein's function, its domain structure, post-translational modifications and variants), a minimal level of redundancy and a high level of integration with other databases.
TrEMBL is a computer-annotated supplement to SWISS-PROT that contains all translations of EMBL nucleotide sequence entries not yet integrated into SWISS-PROT.
PIR: SWISS-PROT and PIR differ from the nucleotide databases in that both are curated.
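TrEMBL's relationship to the nucleotide databases can be sketched as: translate coding sequences, then keep those whose products are not already in the curated set. This is a toy, assuming clean coding sequences that start in frame and the standard genetic code (NCBI translation table 1); `trembl_like` and the example data are hypothetical, and the real TrEMBL pipeline involves far more (automatic annotation, redundancy handling, cross-references).

```python
# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate an in-frame coding sequence up to the first stop codon."""
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[pos:pos + 3].upper(), "X")  # 'X' for unknown codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def trembl_like(nucleotide_entries, curated_proteins):
    """Hypothetical: keep translations not already present in a curated set."""
    translations = {acc: translate(seq) for acc, seq in nucleotide_entries.items()}
    return {acc: p for acc, p in translations.items() if p not in curated_proteins}


entries = {"X00001": "ATGGCTTGATAA", "X00003": "ATGAAACGTTAG"}
curated = {"MA"}  # toy stand-in for sequences already in SWISS-PROT
print(trembl_like(entries, curated))  # {'X00003': 'MKR'}
```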

Examples of Secondary Databases

Motif:  Super secondary structure level  Simple combination of a few secondary structure elements with specific geometric arrangements Helix-turn-helix is a motif Helix-loop-helix is a motif  Several DNA major groove binding proteins (eg. transcription factors) have helix-turn-helix (NOT helix- loop-helix) motif Source: Tirumala kumar choudhry

Motif: bin/pdbsum/GetPage.pl?pdbcode=n/a&template=doc_promotif.html
Motif descriptions: 1. Beta barrels 2. Beta sheets 3. Beta-alpha-beta units 4. Beta hairpins 5. Psi loops 6. Beta bulges 7. Beta strands 8. Helices 9. Helix-helix interactions 10. Beta turns 11. Gamma turns 12. Disulphides
Resources: PROMOTIF, ScanProsite. If you are interested in finding a motif in a novel protein, go to: … or …
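Sequence motif searches such as ScanProsite match protein sequences against PROSITE patterns. A minimal local sketch of the idea: convert a PROSITE-style pattern into a Python regular expression and report match positions. It covers only the common pattern syntax (`x`, `[...]`, `{...}`, `(n)`/`(n,m)` repeats, `<`/`>` terminal anchors) and is not ScanProsite's implementation; the example uses the N-glycosylation site pattern `N-{P}-[ST]-{P}` on a made-up sequence.

```python
import re


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern such as 'C-x(2,4)-C' to a regex string.

    Supported: '-' separators, 'x' (any residue), [..] (allowed residues),
    {..} (forbidden residues), (n) / (n,m) repeats, '<' / '>' terminal anchors.
    """
    regex = ""
    for element in pattern.rstrip(".").split("-"):
        if element.startswith("<"):  # anchored at the N-terminus
            regex += "^"
            element = element[1:]
        at_c_term = element.endswith(">")  # anchored at the C-terminus
        if at_c_term:
            element = element[:-1]
        # Split the residue specification from an optional '(n)' or '(n,m)'.
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":
            piece = "."
        elif core.startswith("{"):
            piece = "[^" + core[1:-1] + "]"
        else:  # single residue or [..] class, already valid regex syntax
            piece = core
        if low is not None:
            piece += "{" + low + ("," + high if high else "") + "}"
        regex += piece + ("$" if at_c_term else "")
    return regex


def scan(sequence, pattern):
    """1-based start positions of all (possibly overlapping) pattern matches."""
    finder = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [m.start() + 1 for m in finder.finditer(sequence)]


print(prosite_to_regex("N-{P}-[ST]-{P}"))  # N[^P][ST][^P]
print(scan("MKNGSAPNPTL", "N-{P}-[ST]-{P}"))  # [3]
```

The lookahead in `scan` lets overlapping matches be reported, which matters for short, degenerate patterns.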

Domain:  More of a tertiary structure level  Several motifs can arrange three-dimensionally into a domain  In simple terms, a domain is a fundamental unit of tertiary structure  A polypeptide chain that is folded independently into stable structure is a domain  Domains conveniently divide protein structures into discrete subunits, which are frequently classified separately  Knowing the Domain, protein function prediction is possible.  Four major classes: all alpha, beta, alpha/beta, alpha+beta, However, new classes are being added: double barrels, beta rolls are two classes added in 2010 Structural Classification of Proteins (SCOP) is where you have to look for definitions and examples CATH is another database

Source: Tirumala Kumar Choudhry
The four-helix bundle is commonly seen in all-alpha proteins; several alpha-helical proteins show this domain, for example ferritin and cytochrome c′.

Pfam
A collection of protein domain families. Each entry is a multiple sequence alignment of a protein domain or a conserved region of interest, and the families are modelled with hidden Markov models (HMMs).
Pfam-A: the initial (seed) alignment is built by hand.
Pfam-B: generated automatically by clustering the ProDom database.
More than 80% of entries are associated with Swiss-Prot and TrEMBL entries.
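Pfam scores sequences against profile HMMs built from its alignments (using the HMMER software). A full profile HMM, with insert and delete states, is too long for a short example, so the sketch below swaps in a simpler relative: an ungapped position-specific scoring matrix (PSSM) built from a toy seed alignment, with a uniform background and a pseudocount of 1 (both assumptions). It shows the shared idea of scoring a sequence window column by column against an alignment-derived profile, but it is not an HMM.

```python
import math
from collections import Counter

AMINO = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Log-odds PSSM (bits) from an ungapped alignment of equal-length strings.

    Assumes a uniform background of 1/20 per residue; the pseudocount keeps
    scores finite for residues never seen in a column.
    """
    background = 1.0 / len(AMINO)
    pssm = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO
        })
    return pssm


def best_hit(pssm, sequence):
    """Slide the profile along a sequence of standard residues.

    Returns (best score, 1-based start of the best window).
    """
    width = len(pssm)
    return max(
        (sum(pssm[i][sequence[start + i]] for i in range(width)), start + 1)
        for start in range(len(sequence) - width + 1)
    )


seed = ["GKST", "GKSS", "GKTT", "AKST"]  # toy ungapped "seed" alignment
profile = build_pssm(seed)
print(best_hit(profile, "MMGKSTLL"))  # (score, start position)
```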

Strategies for secondary structure prediction
If an experimentally determined three-dimensional structure of a closely related protein is known, copy the secondary structure assignment from the known structure rather than attempting to predict it de novo. If no related structures are known, use information from multiple sequences (a multiple sequence alignment, MSA).

If the algorithm does not accept an MSA as input, predict the secondary structure of the target and of a few of its distant homologues, and use the consensus pattern of secondary structures as an additional indicator of how reliable the prediction is. Run as many good methods as possible and infer a consensus prediction from the agreement between their results.
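The consensus strategy can be sketched as a per-residue majority vote. The function assumes each method's output has already been reduced to an equal-length H/E/C string; the three predictions are made-up stand-ins for real methods, and the fraction of agreeing methods serves as a simple per-residue reliability score.

```python
from collections import Counter


def consensus_prediction(predictions):
    """Per-residue majority vote over equal-length H/E/C prediction strings.

    Returns the consensus string and, for each residue, the fraction of
    methods agreeing with the consensus state. Ties go to the state that
    appears first (i.e. from the earliest method in the list).
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must cover the same residues")
    consensus, agreement = [], []
    for states in zip(*predictions):
        state, votes = Counter(states).most_common(1)[0]
        consensus.append(state)
        agreement.append(votes / len(states))
    return "".join(consensus), agreement


# Toy outputs from three hypothetical prediction methods for one target.
preds = ["CHHHHCCEEC", "CHHHCCCEEC", "CHHHHCEEEC"]
seq_consensus, reliability = consensus_prediction(preds)
print(seq_consensus)  # CHHHHCCEEC
```

Residues where the methods disagree (here positions 5 and 7) get lower reliability, which is exactly where a consensus prediction should be trusted least.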

Protein fold recognition
Fold recognition methods differ in four components: (1) how the template structures are represented (usually proteins from the Protein Data Bank); (2) how the compatibility between the target sequence and a template fold is evaluated; (3) the algorithm that computes the optimal alignment between the target sequence and the template structure; and (4) how the ranking is computed and its statistical significance estimated.
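Two of these components, evaluating target-template compatibility and estimating significance, can be illustrated with a toy. Each template is represented (hypothetically) as a string of buried (`B`) and exposed (`E`) positions; compatibility rewards hydrophobic residues at buried positions and polar residues at exposed ones; and significance is a z-score against shuffled versions of the target, one common way to estimate it. There is no alignment step: the target is assumed to be already aligned, without gaps, to each template.

```python
import random
import statistics

HYDROPHOBIC = set("AVILMFWC")


def compatibility(sequence, environments):
    """Toy score: +1 for a hydrophobic residue at a buried ('B') position or a
    polar residue at an exposed ('E') position, assuming an ungapped alignment."""
    return sum(
        (aa in HYDROPHOBIC) == (env == "B")
        for aa, env in zip(sequence, environments)
    )


def z_score(target, environments, n_shuffles=200, seed=0):
    """Z-score of the real score against scores of shuffled targets.

    Shuffling keeps amino acid composition but destroys order, giving a rough
    null distribution for an unrelated sequence of the same composition.
    """
    rng = random.Random(seed)
    residues = list(target)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        null.append(compatibility("".join(residues), environments))
    sd = statistics.stdev(null)
    real = compatibility(target, environments)
    return (real - statistics.mean(null)) / sd if sd > 0 else 0.0


# Hypothetical template library: buried/exposed pattern per template position.
templates = {"fold_1": "BEBEBEBEBE", "fold_2": "BBBBBEEEEE"}
target = "VKLEISFDAR"
ranking = sorted(
    ((z_score(target, env), name) for name, env in templates.items()),
    reverse=True,
)
print([name for _, name in ranking])  # ['fold_1', 'fold_2']
```

Ranking by z-score rather than raw score makes templates of different composition and length more comparable, which is the point of component (4).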

Epitope prediction
An epitope is defined as “the chemical structure recognized by specific receptors of the immune system (antibodies, MHC molecules, and/or T cell receptors)”.
Database: Immune Epitope Database and Analysis Resource (IEDB)
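A common first step in predicting linear T cell epitopes is to enumerate overlapping peptides from the protein; MHC class I ligands are typically around 9 residues long, which is the assumed default below. The sketch only generates candidates and checks them against a local set of known epitopes (hypothetical data, for example exported from IEDB beforehand); it does not call any IEDB service and does not predict binding.

```python
def candidate_peptides(protein, length=9):
    """All overlapping peptides of a given length, with 1-based start positions."""
    return [
        (start + 1, protein[start:start + length])
        for start in range(len(protein) - length + 1)
    ]


def known_hits(protein, known_epitopes, length=9):
    """Candidate peptides that exactly match a local set of known epitopes.

    `known_epitopes` is a hypothetical set of peptide strings; no database
    API is used here.
    """
    return [
        (pos, pep)
        for pos, pep in candidate_peptides(protein, length)
        if pep in known_epitopes
    ]


protein = "MAGLIVKTSPQWERNDLLAY"  # toy sequence
peptides = candidate_peptides(protein)
print(len(peptides))  # 12
print(known_hits(protein, {"GLIVKTSPQ"}))  # [(3, 'GLIVKTSPQ')]
```

Real epitope predictors then score each candidate (for example for predicted MHC binding), which is the step this sketch deliberately leaves out.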
