Sequence Databases: What are they and why do we need them?

What is sequence data? DNA, RNA and protein (amino acid) sequences. Why do I need it? Evolution, mutation, natural selection, intra- and inter-species relationships, niche exploitation, ecosystems. REALLY?

YES! Phenotypes come from the proteins. Proteins come from the DNA via RNA. Changes in DNA cause changes in proteins. Changes in proteins cause changes in phenotypes. Evolution, mutation, natural selection, intra- and inter-species relationships, niche exploitation and ecosystems all come back to phenotypes. How do we find those changes? Sequencing.

What do Databases let you do? Explore and investigate sequence data:
- Classify organisms
- Assign a possible function to a gene
- Verify a sequence's identity
- Annotate a genome
- Design primers for PCR and probe experiments
Is the Sequence everything? The sequence itself is not informative; it must be analyzed by comparative methods against existing databases to develop hypotheses concerning relatives and function.
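To make "comparative methods against existing databases" concrete, here is a minimal Python sketch that ranks entries of a tiny in-memory database by how many short words (k-mers) they share with a query. It is a toy stand-in for a real search tool such as BLAST, not the method used in the slides; the two database entries, their names, and the query are invented for illustration.

```python
def kmers(seq, k=4):
    """Return the set of all overlapping k-letter words in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b, k=4):
    """Shared k-mers divided by all k-mers seen in either sequence (0.0 to 1.0)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def best_hit(query, database, k=4):
    """Return (name, score) of the database entry most similar to the query."""
    scored = [(jaccard(query, seq, k), name) for name, seq in database.items()]
    score, name = max(scored)
    return name, score


# Hypothetical entries: made-up fragments, not real GenBank records.
toy_database = {
    "hypothetical_tubulin_fragment": "ATGCGTGAAATCGTTCACATCCAGGCT",
    "hypothetical_actin_fragment": "ATGGATGACGATATCGCTGCGCTGGTC",
}
query = "CGTGAAATCGTTCACATCCAG"

name, score = best_hit(query, toy_database)
print(name, round(score, 2))
```

The best-scoring entry suggests a hypothesis about what the query might be; it does not prove function or relationship.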

What is a Database? Databases allow us to more easily find what we need

What Databases are there? Ten Important Bioinformatics Databases

Name               Address               Description
GenBank/DDBJ/EMBL  www.ncbi.nlm.nih.gov  Nucleotide sequences
Ensembl            www.ensembl.org       Human/Mouse genome
PubMed             www.ncbi.nlm.nih.gov  Literature references
NR                 www.ncbi.nlm.nih.gov  Protein sequences
SWISS-PROT         www.expasy.ch         Protein sequences
InterPro           www.ebi.ac.uk         Protein domains
OMIM               www.ncbi.nlm.nih.gov  Genetic diseases
Enzymes            www.chem.qmul.ac.uk   Enzymes
PDB                www.rcsb.org/pdb      Protein structures
KEGG               www.genome.ad.jp      Metabolic pathways

Many other specialized Databases are available. Source: Bioinformatics for Dummies, 2003

What Database should I use? A.K.A. GenBank

How big is GenBank? Technology milestones: 1977, DNA sequencing; 1985, PCR; 1987, automated sequencing; 1997, capillary sequencing.

Who can put data into GenBank? Sequence data are submitted to GenBank by scientists from around the world. Warning: GenBank does not check the validity or accuracy of submitted sequences. As with all published scientific data, verification is left to the scientific community.

How do I use GenBank? Problem 1. You are constructing a phylogeny of Euglenoids and you have determined from the literature that the Beta-tubulin gene is a good gene for this purpose. How do I start???

How do I use GenBank? Search terms: Euglenozoa AND tubulin NOT kinetoplastida. Accession: AF182759
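The Boolean search above can also be written as a URL for NCBI's E-utilities `esearch` service. The sketch below only builds the URL and sends nothing; the helper names `build_query` and `esearch_url`, the choice of the `nucleotide` database, and the `retmax` value are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(include, exclude=()):
    """Join required terms with AND and append each excluded term with NOT."""
    term = " AND ".join(include)
    for word in exclude:
        term += f" NOT {word}"
    return term


def esearch_url(term, db="nucleotide", retmax=20):
    """Return a search URL for the given term (db and retmax are assumed defaults)."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


term = build_query(["Euglenozoa", "tubulin"], exclude=["kinetoplastida"])
url = esearch_url(term)
print(term)
print(url)
```

Pasting the printed URL into a browser should return a list of matching record identifiers, which can then be inspected one by one.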

How do I use GenBank? Problem 2. You are studying domestication of Sorghum vulgare. From reading about sorghum you find out that it is closely related to Zea mays. You also find out that maize has a wild relative, teosinte, that forms multiple stalks, whereas domesticated maize forms a single stalk. Likewise, domesticated sorghum has a single stalk while wild sorghum (Johnsongrass) has multiple stalks.

Wild: Sorghum halepense (Johnsongrass). Domesticated: Sorghum vulgare (broomcorn sorghum).

How do I use GenBank? Problem 2, continued. Moreover, the paper states that this trait is controlled by a single gene, teosinte branched 1 (tb1). You wonder, "Does sorghum have this gene?" The paper does provide a set of PCR primers (forward and reverse) that were used to isolate and sequence the tb1 gene. Will they work for Sorghum?
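One quick way to ask "will these primers work?" is to check whether both primers have binding sites in a candidate sequence. The sketch below does this for exact matches only. The slides do not give the published tb1 primers, so the template and both primers here are invented placeholders, and real primers can tolerate some mismatches that this check would reject.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplicon(template, forward, reverse):
    """Return the region the primer pair would amplify, or None.

    The forward primer must match the template strand exactly; the reverse
    primer must match the opposite strand, so its reverse complement is
    searched for downstream of the forward site.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Hypothetical template and primers, for illustration only.
template = "GGATGGACTTACCGCTTTACCAACAACTGCAGCTCAGCCCGCC"
forward_primer = "ATGGACTTAC"
reverse_primer = "GGCGGGCTG"

product = amplicon(template, forward_primer, reverse_primer)
print(product)
print(None if product is None else len(product))
```

A `None` result means at least one primer has no exact site, which is a reason to redesign the primers or to check for near matches with an alignment tool.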

Sequencing Sorghum

>Sorghum_vulgare_sequence
ATGGACTTACCGCTTTACCAACAACTGCAGCTCAGCCCGCCTTCCCCAAAGCCGGACCAATCAAGCAGCT
TCTACTGCTGCTACCCATGCTCCCCTCCCTTCGCCGCCGCCGCCGCCGACGCCAGCTTTCACCTGAGCTA
CCAGATCGGTAGTGCCGCCGCCGCCATCCCTCCACAAGCCGTGATCAACTCGCCGGAGGACCTGCCGGTG
CAGCCGCTGATGGAGCAGGCGCCGGCGCCGCCTACAGAGCTTGTCGCCTGCGCCAGTGGTGGTGCACAAG
GCGCCGGCGTCAGCGTCAGCCTGGACAGGGCGGCGGCCGCGGCCGCCGCGAGGAAAGACCGGCACAGCAA
GATATGCACCGCCGGCGGGATGAGGGACCGCCGGATGCGGCTGTCCCTTGACGTCGCCCGCAAGTTCTTC
GCGCTCCAGGACATGCTTGGCTTCGACAAGGCCAGCAAGACGGTACAATGGCTCCTCAACACGTCCAAGG
CCGCCATCCAGGAGATCATGGCCGACGACGTCGACGCGTCGTCGGAGTGCGTGGAGGATGGCTCCAGCAG
CCTCTCCGTCGACGGCAAGCACAACCCGGCGGAGCAGCTGGGAGATCAGAAGCCCAAGGGTAATGGCCGC
AGCGAGGGGAAGAAGCCGGCCAAGTCAAGGAAGGCGGCGACCACCCCAAAGCCGCCAAGAAAATCGGGGA
ATAATGCGCACCCGGTCCCCGACAAGGAGACGAGGGCGAAGGCGAGGGAGAGGGCGAGGGAGCGAACCAA
GGAGAAGCACCGGATGCGTTGGGTAAAGCTTGCATCAGCAATTGACGTGGAGGCGGCGGCTGCCTCGGTG
GCTAGCGACAGGCCGAGCTCGAACCATTTGAACCACCACCACCACTCATCGTCGTCCATGAACATGCCGC
GTGCTGCGGAGGCTGAATTGGAGGAGAGGGAGAGGTGCTCATCAACTCTCAACAATAGAGGAAGGATGCA
AGAAATCACAGGGGCGAGCGAGGTGGTCCTAGGCTTTGGCAACGGAGGAGGATACGGCGGCGGCAACTAC
TACTGCCAAGAACAATGGGAACTCGGTGGAGTCGTCTTTCAGCAGAACTCACGCTTCTACTGA

Does sorghum have the tb1 gene?
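Before searching a database with a record like this one, it helps to load it and run a few sanity checks. The sketch below parses FASTA text and reports length, GC content, and whether the sequence has the outward shape of a coding region (starts with ATG, ends with a stop codon, length divisible by three). The short `toy_record` is invented for illustration; paste in the real record to apply the same checks to it.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def read_fasta(text):
    """Parse FASTA text into a dict mapping record name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def gc_content(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def looks_like_orf(seq):
    """True if seq starts with ATG, ends with a stop codon, and is a multiple of 3 long."""
    return len(seq) % 3 == 0 and seq.startswith("ATG") and seq[-3:] in STOP_CODONS


# Hypothetical two-line record, for illustration only.
toy_fasta = """>toy_record
ATGGACTTACCG
CTTTACTGA
"""

records = read_fasta(toy_fasta)
seq = records["toy_record"]
print(len(seq), round(gc_content(seq), 3), looks_like_orf(seq))
```

These checks only describe the sequence itself. Answering "does sorghum have the tb1 gene?" still requires comparing it against known tb1 sequences in a database.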

Resources at NCBI
- GenBank – Molecular Databases: Nucleotides, Proteins, Structures, Expression (ESTs) and Taxonomy.
- Literature Databases: PubMed, Journals, OMIM, Books, and Citation Matcher.
- Genomes and Maps: Entrez Map Viewer, UniGene, COGs, Organism-specific, Organelle, Virus, and Plasmids.
- Tools – Software Engineering: BLAST, Sequence Analysis, 3-D Structures, Gene Expression, Literature and Genome Analysis.
- Education: Books, Courses, Public Information.
- Research: Biology, Computers.

Objectives
1. Explain what you can do with sequence data.
2. Explain what a database is.
3. Describe what kinds of data and resources are available.
4. Describe some of the uses of databases.

Other Specialty Databases