Biological Databases
Biology outside the lab

Why do we need Bioinformatics?
Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological information generated by the scientific community. This deluge of genomic information has, in turn, led to an absolute requirement for computerized databases to store, organize, and index the data and for specialized tools to view and analyze the data.

Information flux from data to decision
Biological, chemical and pharmaceutical research generates a huge amount of data. Information is analysed more slowly than data is produced. Human Genome Project: 22.1 billion bases sequenced, but … what do we really know about it?

Bioinformatics
- Building and managing biological databases (nucleotides, proteins, structures, small molecules, pathways, literature, …)
- Data mining and data analysis (Computational Biology)
- Ab initio protein modelling, homology modelling, simulations (Molecular Modeling)

Literature databases

Nucleotide databases

Protein databases
UniProt databases:
- Swiss-Prot: provides a high level of annotation, a minimal level of redundancy and a high level of integration with other databases
- TrEMBL: a computer-annotated supplement of Swiss-Prot that contains all the translations of EMBL nucleotide sequence entries not yet integrated in Swiss-Prot
NCBI protein database: a meta-database containing sequences from UniProt entries, PDB-derived sequences and translations of predicted ORFs in GenBank
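As a small illustration of the Swiss-Prot / TrEMBL split: in UniProtKB FASTA downloads, the header line starts with `sp` for Swiss-Prot entries and `tr` for TrEMBL entries. The sketch below reads that prefix; the function name `parse_uniprot_header` is made up for this example and is not part of any UniProt tool.

```python
def parse_uniprot_header(header):
    """Split a UniProtKB FASTA header into source database, accession and entry name."""
    if not header.startswith(">"):
        raise ValueError("not a FASTA header line")
    # Header layout: >db|accession|entry_name free-text description
    db, accession, rest = header[1:].split("|", 2)
    entry_name = rest.split(" ", 1)[0]
    sources = {"sp": "Swiss-Prot (reviewed)", "tr": "TrEMBL (unreviewed)"}
    return {
        "source": sources.get(db, "unknown"),
        "accession": accession,
        "entry_name": entry_name,
    }


# Header of the human hemoglobin alpha entry as distributed by UniProt.
example = ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2"
record = parse_uniprot_header(example)
print(record["source"], record["accession"], record["entry_name"])
```

Filtering a downloaded file on this prefix is one simple way to keep only the manually annotated entries.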

Structural Database
Protein structures obtained by crystallography or NMR are stored in the PDB.

Microarray Databases
- GEO (Gene Expression Omnibus)
- SMD (Stanford Microarray Database)
Gene expression databases provide raw microarray expression data. Data from different experiments can be merged to obtain previously unidentified results.

EST Databases
EST: Expressed Sequence Tag
- 5' EST: these regions tend to be conserved across species and do not change much within a gene family
- 3' EST: because these ESTs are generated from the 3' end of a transcript, they are likely to fall within non-coding, or untranslated, regions (UTRs), and therefore tend to exhibit less cross-species conservation than coding sequences do
- Sequence Tagged Site (STS): helps to locate a gene in the genome. 3' ESTs are a good source of STSs
Available DBs: GenBank, dbEST, UniGene

Tools
- ORF finder
- BLAST
- Multiple alignment
- Conserved domain identification
- Secondary structure and folding prediction

Example 1
A clone containing a recombinant plasmid shows an interesting phenotype -> sequencing -> raw sequence -> ORF identification -> in-frame sequence -> BLAST
BLAST results:
- Phylogenetically similar sequences
- Conserved domain
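The ORF identification step in this example can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm of any particular ORF finder tool: it assumes the standard genetic code, scans the three reading frames of each strand, and reports every stretch that starts at ATG and ends at a stop codon. The names `find_orfs` and `min_codons` are chosen for this example only.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=3):
    """Return (strand, frame, protein) for every ATG...stop ORF in six frames."""
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            protein = None  # None means no ORF is currently open
            for i in range(frame, len(s) - 2, 3):
                aa = CODON_TABLE.get(s[i:i + 3], "X")  # X for ambiguous codons
                if protein is None:
                    if aa == "M":
                        protein = "M"  # start codon opens an ORF
                elif aa == "*":
                    if len(protein) >= min_codons:
                        orfs.append((strand, frame, protein))
                    protein = None  # stop codon closes the ORF
                else:
                    protein += aa
    return orfs


# Toy sequence, invented for illustration.
for strand, frame, protein in find_orfs("CCATGGCTAAATGACC"):
    print(strand, frame, protein)
```

In the workflow above, the longest translated ORF would typically be the in-frame sequence submitted to a protein BLAST search.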

CDS

Example 2

Example 2

Example 2
Tune the method
a) Increase the window size used in evaluating the score: this increases local information by integrating “environmental” data
   - 2-residue window -> 2 frames
   - 3-residue window -> 3 frames
   - ….
b) Use degenerate matching methods (based on size, polarity, H-bond behavior, …)
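The transcript does not name the method being tuned, so the sketch below assumes a simple dot-plot-style comparison of two protein sequences. It shows both adjustments: scoring over a window of residues rather than a single position, and degenerate matching in which residues of the same broad class count as a match. The residue grouping in `GROUPS` is an illustrative assumption, not a standard classification.

```python
# Hypothetical grouping of the 20 amino acids by broad physicochemical class.
GROUPS = {
    "hydrophobic": "AVLIMFWC",
    "polar": "STNQYGP",
    "positive": "KRH",
    "negative": "DE",
}
CLASS_OF = {aa: name for name, members in GROUPS.items() for aa in members}


def residues_match(a, b, degenerate=False):
    """Exact identity, or same residue class when degenerate matching is on."""
    if degenerate:
        return a in CLASS_OF and CLASS_OF[a] == CLASS_OF.get(b)
    return a == b


def window_scores(seq1, seq2, window=1, degenerate=False):
    """Score each pair of window start positions by its number of matching residues."""
    scores = {}
    for i in range(len(seq1) - window + 1):
        for j in range(len(seq2) - window + 1):
            scores[(i, j)] = sum(
                residues_match(seq1[i + k], seq2[j + k], degenerate)
                for k in range(window)
            )
    return scores


# Toy peptides: no identical residues, but the same class at every position.
exact = window_scores("KLD", "RIE", window=3)
relaxed = window_scores("KLD", "RIE", window=3, degenerate=True)
print(exact[(0, 0)], relaxed[(0, 0)])
```

A larger window suppresses isolated chance matches, while degenerate matching recovers similarity between sequences that have diverged but kept residues of the same character.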