Introduction to Bioinformatics

Introduction to Bioinformatics
Dr. Lokesh Gambhir
Department of Life Sciences
Shri Guru Ram Rai Institute of Technology & Sciences (SGRRITS)

What is bioinformatics?
There are many different answers to this. One basic definition is that it is the use of computational methods to analyse biological data.
By the end of this course, you will:
• have knowledge of the many data resources available at the NCBI and EBI,
• understand some of the basic principles behind aligning sequences,
• understand some key points about different sequence alignment programs,
• have experience running some web-based bioinformatics programs,
• understand the information returned by some sequence database searching programs,
• appreciate some of the practical approaches available for automating bioinformatics.
Dr. Lokesh Gambhir, Assistant Professor, Department of Life Sciences, SGRRITS, Dehradun

Databases
There are many freely available data resources. A large number are hosted by large national and international institutions, such as the National Center for Biotechnology Information (NCBI) in the United States and the European Bioinformatics Institute (EMBL-EBI).

Few concepts to remember
• DNA
• Protein/Structure
• Pattern recognition
• Folding problem and structure prediction
• The Twilight Zone
• Orthologs
• Paralogs
• DNA sequencing

What can be discovered about a gene by a database search?
A little or a lot, depending on the gene.
• Evolutionary information: homologous genes, taxonomic distributions, allele frequencies, synteny, etc.
• Genomic information: chromosomal location, introns, UTRs, regulatory regions, shared domains, etc.
• Structural information: associated protein structures, fold types, structural domains
• Expression information: expression specific to particular tissues, developmental stages, phenotypes, diseases, etc.
• Functional information: enzymatic/molecular function, pathway/cellular role, localization, role in diseases

Searching sequence databases
Start from a sequence and find information about it.
• Many kinds of input sequences are possible:
  • amino acid or nucleotide sequence
  • genomic, mRNA/cDNA or protein sequence
  • complete or fragmentary sequences
• Exact matches are rare (and even uninteresting in many cases), so the goal is often to retrieve a set of similar sequences.
• Both small (mutations) and large (required for function) differences among “similar” sequences can be interesting.
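The idea of retrieving a set of similar sequences, rather than exact matches, can be sketched with a very simple similarity score. The example below ranks database entries by the fraction of short subsequences (k-mers) they share with the query. This is only an illustration: real search programs such as BLAST use alignment-based methods, and the function names and toy sequences here are made up for the example.

```python
def kmers(seq, k=3):
    """Return the set of all overlapping k-length substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b, k=3):
    """Jaccard similarity of the k-mer sets of two sequences (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka and not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def rank_hits(query, database, k=3):
    """Return (name, score) pairs sorted from most to least similar."""
    scores = [(name, jaccard(query, seq, k)) for name, seq in database.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Made-up toy sequences, for illustration only.
toy_db = {
    "exact": "ATGGCGTACGTTAGC",
    "one_mismatch": "ATGGCGTACGTAAGC",
    "unrelated": "CCCCCCCCCCCCCCC",
}
hits = rank_hits("ATGGCGTACGTTAGC", toy_db)
for name, score in hits:
    print(f"{name}: {score:.2f}")
```

The exact copy scores highest, the sequence with a single substitution still scores well, and the unrelated sequence shares no k-mers with the query.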

What might we want to know about a sequence?
• Is this sequence similar to any known genes? How close is the best match? Is it significant?
• What do we know about that gene?
  • Genomic (chromosomal location, allelic information, regulatory regions, etc.)
  • Structural (known structure? structural domains? etc.)
  • Functional (molecular, cellular and disease)
• Evolutionary information: Is this gene found in other organisms? What is its taxonomic tree?

NCBI

To carry out its diverse responsibilities, NCBI:
• Conducts research on fundamental biomedical problems at the molecular level using mathematical and computational methods
• Maintains collaborations with several NIH institutes, academia, industry, and other governmental agencies
• Fosters scientific communication by sponsoring meetings, workshops, and lecture series
• Supports training on basic and applied research in computational biology for postdoctoral fellows through the NIH Intramural Research Program
• Engages members of the international scientific community in informatics research and training through the Scientific Visitors Program
• Develops, distributes, supports, and coordinates access to a variety of databases and software for the scientific and medical communities
• Develops and promotes standards for databases, data deposition and exchange, and biological nomenclature

Entrez
The Entrez Global Query Cross-Database Search System is a federated search engine, or web portal, that allows users to search many discrete health sciences databases at the National Center for Biotechnology Information (NCBI) website.
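Entrez can also be queried from a program through NCBI's E-utilities web interface. The sketch below only builds the search URLs for the same query against several Entrez databases; it does not send any request. The query string, the choice of databases and the function names are assumptions made for this example.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=5):
    """Build an ESearch URL for one Entrez database (no request is sent)."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def search_many(term, dbs=("pubmed", "nuccore", "protein")):
    """Build one ESearch URL per database for the same query term."""
    return {db: esearch_url(db, term) for db in dbs}


# Illustrative query; any Entrez search term could be used here.
urls = search_many("BRCA1[Gene Name] AND human[Organism]")
for db, url in urls.items():
    print(db, url)
```

Opening one of the printed URLs in a browser (or fetching it with an HTTP client) should return the matching record identifiers for that database.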

Classification of biological databases

Characteristics of entries in the primary nucleotide repositories
• The large nucleotide databases are not hand-curated: the quality of the information is largely dependent on the people submitting the sequence.
• Records can be updated by the original submitter, or by a third party if the submitter granted them permission and notified the relevant institute (not common).
• There are redundant entries in these databases.
• Entries can contradict one another.
• Predicted or known proteins coded for by the sequence are linked to via their accession number in the UniProt Knowledgebase.
• Information from any species, including sequences of unknown origin, can be deposited in the database.
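Redundancy in an uncurated repository can be illustrated with a small sketch that groups entries sharing an identical sequence. This only detects exact duplicates and is not how any particular database handles redundancy; the accession numbers and sequences are made up for the example.

```python
from collections import defaultdict


def find_redundant(records):
    """Group accessions that share an identical (case-insensitive) sequence."""
    by_seq = defaultdict(list)
    for accession, seq in records:
        by_seq[seq.upper()].append(accession)
    # Keep only the groups that contain more than one accession.
    return [ids for ids in by_seq.values() if len(ids) > 1]


# Hypothetical accessions and sequences, for illustration only.
records = [
    ("FAKE0001", "ATGGCC"),
    ("FAKE0002", "atggcc"),  # same sequence, submitted separately
    ("FAKE0003", "ATGGCA"),
]
duplicates = find_redundant(records)
print(duplicates)
```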

GenBank
GenBank® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences.
• It is part of a collaboration between NCBI (National Center for Biotechnology Information), EMBL (European Molecular Biology Laboratory), EBI (European Bioinformatics Institute) and DDBJ (DNA Data Bank of Japan).
• Each record in GenBank is in the “GenBank flat file format”.
• Each record contains information about:
  • the sequence type (DNA/protein/RNA ...)
  • the source/organism, references, ...
  • features: the functions of a region on the sequence
  • the sequence itself

GenBank
http://www.ncbi.nlm.nih.gov/genbank/

Flat File Format of GenBank
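To give a feel for the flat file layout, the sketch below reads a heavily shortened, made-up record and pulls out the main parts: the header fields, the feature table and the sequence. The record content and the parser are assumptions for illustration; real records carry many more fields, and in practice an established library such as Biopython is commonly used for parsing.

```python
# A made-up, heavily simplified record in GenBank flat file layout.
SAMPLE_RECORD = "\n".join([
    "LOCUS       TOY000001                 30 bp    DNA     linear   SYN 01-JAN-2000",
    "DEFINITION  Made-up toy record for illustration.",
    "ACCESSION   TOY000001",
    "SOURCE      synthetic construct",
    "  ORGANISM  synthetic construct",
    "FEATURES             Location/Qualifiers",
    "     source          1..30",
    "     CDS             1..30",
    "ORIGIN",
    "        1 atggcgtacg ttagcatgca tgcatgctaa",
    "//",
])

HEADER_KEYS = ("LOCUS", "DEFINITION", "ACCESSION", "ORGANISM")


def parse_genbank(text):
    """Extract header fields, feature keys/locations and the sequence."""
    record = {"features": [], "sequence": ""}
    section = None
    for line in text.splitlines():
        if line.startswith("//"):           # end-of-record marker
            break
        if line.startswith("ORIGIN"):        # sequence block starts here
            section = "origin"
        elif section == "origin":
            # Drop the leading position number, join the 10-base chunks.
            record["sequence"] += "".join(line.split()[1:])
        elif line.startswith("FEATURES"):
            section = "features"
        elif section == "features" and line[:5] == "     " and line[5:6].strip():
            # Feature keys are indented by exactly five spaces.
            key, location = line.split()[:2]
            record["features"].append((key, location))
        else:
            parts = line.split(None, 1)
            if len(parts) == 2 and parts[0] in HEADER_KEYS:
                record[parts[0].lower()] = parts[1].strip()
    return record


record = parse_genbank(SAMPLE_RECORD)
print(record["accession"], record["organism"], len(record["sequence"]))
```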

Abraxane is a chemotherapeutic drug. How would you determine the molecular target of the drug?

EMBL-EBI
• The roots of EMBL-EBI lie in the world's first nucleotide sequence database, the EMBL Nucleotide Sequence Data Library (now EMBL-Bank, part of the European Nucleotide Archive), which was established in 1980 at the European Molecular Biology Laboratory in Heidelberg, Germany.
• The original goal was to establish a central database of DNA sequences, rather than have scientists submit sequences to journals.
• Data retrieval is done with SRS (Sequence Retrieval System), which connects the primary DNA and protein databases with secondary and specialised databases.
• MEDLINE is used for literature references.

UniProt/Swiss-Prot
The mission of UniProt is to provide the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information. UniProt comprises four components, each optimised for different uses; three are described here:
• The UniProt Knowledgebase (UniProtKB) is the central access point for extensive curated protein information, including function, classification, and cross-references. UniProtKB comprises two sections:
  • UniProtKB/Swiss-Prot, which is manually annotated and reviewed
  • UniProtKB/TrEMBL, which is automatically annotated and not reviewed
• The UniProt Reference Clusters (UniRef) databases provide clustered sets of sequences from UniProtKB and selected UniProt Archive records, to obtain complete coverage of sequence space at several resolutions while hiding redundant sequences.
• The UniProt Archive (UniParc) is a comprehensive repository, used to keep track of sequences and their identifiers.

UniProt/Swiss-Prot

Flat File Format of UniProt/Swiss-Prot
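In the UniProtKB flat file format, each line starts with a two-letter line code (for example ID, AC, OS, SQ) that says what the line contains. The sketch below reads a made-up and much simplified entry; the entry name, accession and sequence are invented for illustration, and a real SQ line carries additional values that are left out here.

```python
# A made-up, simplified entry in UniProtKB flat file layout.
SAMPLE_ENTRY = "\n".join([
    "ID   TOY_HUMAN               Reviewed;          20 AA.",
    "AC   TOY001;",
    "DE   RecName: Full=Toy protein;",
    "OS   Homo sapiens (Human).",
    "SQ   SEQUENCE   20 AA;",
    "     MKTAYIAKQR QISFVKSHFS",
    "//",
])


def parse_uniprot(text):
    """Extract name, review status, accessions, organism and sequence."""
    entry = {"sequence": ""}
    in_sequence = False
    for line in text.splitlines():
        code, data = line[:2], line[5:].strip()
        if code == "//":                     # end-of-entry marker
            break
        if code == "SQ":                     # sequence lines follow
            in_sequence = True
        elif in_sequence:
            entry["sequence"] += data.replace(" ", "")
        elif code == "ID":
            fields = data.split()
            entry["name"] = fields[0]
            # Swiss-Prot entries are marked Reviewed, TrEMBL entries Unreviewed.
            entry["reviewed"] = "Reviewed;" in fields
        elif code == "AC":
            entry["accessions"] = [a.strip() for a in data.split(";") if a.strip()]
        elif code == "OS":
            entry["organism"] = data.rstrip(".")
    return entry


entry = parse_uniprot(SAMPLE_ENTRY)
print(entry["name"], entry["accessions"], entry["organism"], entry["sequence"])
```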