NCBI data, sliding window programs and dot plots
Sept. 25, 2012
Learning objectives: Become familiar with OMIM and PubMed. Understand the difference between a research article and a review article. Understand the concept of sliding window programs. Understand the difference between identity, similarity and homology. Appreciate that proteins can be modular.
Workshop: Learn how to use OMIM and obtain DNA and protein sequences associated with diseases. Perform a sliding window analysis to compute %(G+C) as a function of position in a sequence.
Homework due Tuesday, Oct. 2nd.

Primary public domain bioinformatics servers
Public domain bioinformatics facilities, each providing databases and analysis tools:
- European Bioinformatics Institute (EBI), United Kingdom
- National Center for Biotechnology Information (NCBI), United States
- GenomeNet (KEGG & DDBJ), Japan

NCBI ENTREZ
A platform that provides access to, and links between, databases with biological information.
ENTREZ databases: PubMed, GenBank, Protein databases, Genomes, PopSet, Taxonomy, OMIM, MedLine

NCBI ENTREZ
- GenBank: database of all publicly available DNA sequences.
- Protein databases: database of amino acid sequences from UniProt, Protein Research Foundation, PDB.
- Genomes: database of genomes from organisms and viruses.
- PopSet: database of DNA sequences that have been collected to analyze the evolutionary relatedness of a population.
- Taxonomy: database of names of organisms with sequences in GenBank.
- OMIM: database of human genes and genetic disorders.
- MedLine: literature database.

Literature Databases
- Medline/PubMed
- OMIM
- CSULA Library
- Bookshelf (from NCBI)
- Melvyl (books at UC Libraries)
- Other molecular life science databases
- Science Direct
- PubMed Central
- Free Medical Journals
- LinkOut Journals
- Wiley InterScience

OMIM: Online Mendelian Inheritance in Man
- A catalog of human genes linked to diseases.
- Victor A. McKusick at Johns Hopkins University.
- A good place to start when you want to research a certain disease or biological molecule.
- This database is cross-referenced to PubMed and other NCBI-based databases.

Sliding window
A sliding window gathers information about the properties of the nucleotides or amino acids that fall inside it.
Example sequence: GCATATGCGCATATCCCGTCAATACCA
A simple example is to calculate the %(G+C) content within a window, then move the window by one nucleotide and repeat the calculation until the end of the sequence is reached.
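The %(G+C) calculation described above can be sketched in Python as follows. This is a minimal illustration, not the program used in the workshop; the function name `gc_percent_windows` and the choice to report each window by its 1-based start position are assumptions made for this example.

```python
def gc_percent_windows(sequence, window_size):
    """Return (start_position, %(G+C)) for every window, sliding one nucleotide at a time.

    start_position is 1-based (an assumption chosen for readability).
    """
    sequence = sequence.upper()
    if window_size < 1 or window_size > len(sequence):
        raise ValueError("window_size must be between 1 and the sequence length")

    results = []
    for start in range(len(sequence) - window_size + 1):
        window = sequence[start:start + window_size]
        gc_count = window.count("G") + window.count("C")
        results.append((start + 1, 100.0 * gc_count / window_size))
    return results


if __name__ == "__main__":
    # Sequence taken from the slide; the window size of 9 is an arbitrary choice.
    seq = "GCATATGCGCATATCCCGTCAATACCA"
    for position, gc in gc_percent_windows(seq, 9):
        print(position, round(gc, 1))
```

Rerunning the loop with a different `window_size` is a quick way to see how the window width changes the smoothness of the resulting profile.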

Sliding window
If the window is too small, it is difficult to detect the trend of the measurement. If it is too large, you could miss meaningful local features.
[Figure: %(G+C) plotted against sequence number for a large window size and a small window size]

Sliding window
Adapted from Zhao et al., BMC Genomics Nov 7;8:403.

Amino acid characteristics

Four levels of protein structure
1) Primary: linear sequence (AGHIPLLQ)
2) Secondary: initial folding patterns (AGHIPLLQ)
3) Tertiary: complex folding patterns
4) Quaternary: interactions between polypeptides

Kyte-Doolittle Hydropathy
A sliding window software program [J. Mol. Biol. 157: (1982)]. The seven known membrane-spanning regions are numbered 1-7 in red on the plot. Note that this particular software program averaged the hydropathy values in the window. The original program by Kyte and Doolittle summed the hydropathy values.
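A hydropathy profile of this kind can be sketched as a sliding window over the Kyte-Doolittle scale. The code below is an illustrative sketch only, not the program that produced the plot; the function name `hydropathy_profile` and the `average` switch (which selects averaging or summing, the two variants mentioned above) are assumptions introduced for this example.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window_size, average=True):
    """Return one hydropathy value per window, sliding one residue at a time.

    average=True  -> mean of the values in the window
    average=False -> sum of the values in the window
    Residues outside the 20 standard amino acids raise a KeyError.
    """
    protein = protein.upper()
    profile = []
    for start in range(len(protein) - window_size + 1):
        window = protein[start:start + window_size]
        total = sum(KYTE_DOOLITTLE[residue] for residue in window)
        profile.append(total / window_size if average else total)
    return profile


if __name__ == "__main__":
    # Short peptide from the protein structure slide; window size 3 is arbitrary.
    print(hydropathy_profile("AGHIPLLQ", 3))
```

Averaging and summing give profiles with the same shape; only the vertical scale differs by a factor equal to the window size.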

Dot plot with window = 1
Note that about 25% of the table will be filled due to random chance: with four nucleotides, there is a 1 in 4 chance of a match at each position.

Dot plot with window = 3
The larger the window, the more noise can be filtered out. What is the percent chance that you will get a match at random? One in $4^3$: $(1/4)^3 \times 100 \approx 1.56\%$.
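The effect of the window size on a dot plot can be sketched in Python as shown below. The sketch assumes that a dot is placed only when every position in the window matches exactly, and that the four nucleotides are equally likely; the function names `dot_plot`, `random_match_percent` and `render` are hypothetical names chosen for this example.

```python
def dot_plot(seq_x, seq_y, window=1):
    """Return a grid of booleans: grid[i][j] is True when the window starting
    at seq_y[i] exactly matches the window starting at seq_x[j]."""
    rows = len(seq_y) - window + 1
    cols = len(seq_x) - window + 1
    return [
        [seq_y[i:i + window] == seq_x[j:j + window] for j in range(cols)]
        for i in range(rows)
    ]


def random_match_percent(window, alphabet_size=4):
    """Percent chance that a window matches by chance alone,
    assuming equally likely, independent letters."""
    return 100.0 * (1.0 / alphabet_size) ** window


def render(grid):
    """Draw the grid as text: '*' for a match, '.' for no match."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in grid)


if __name__ == "__main__":
    seq = "GCATATGCGCATATCCCGTCAATACCA"  # sequence from the sliding window slide
    for w in (1, 3):
        print("window =", w, "random match chance (%):", random_match_percent(w))
        print(render(dot_plot(seq, seq, w)))
        print()
```

Comparing a sequence with itself always produces the main diagonal; the off-diagonal dots that remain at the larger window size are the ones less likely to be due to chance.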

Do workshop #2. Answer questions 1-3.

Evolutionary Basis of Sequence Alignment
1. Identity: Quantity that describes how much two sequences are alike in the strictest terms.
2. Similarity: Quantity that relates how much two amino acid sequences are alike.
3. Homology: A conclusion drawn from data suggesting that two genes share a common evolutionary history.
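The difference between the first two quantities can be illustrated with a short Python sketch that scores two already-aligned protein sequences. Everything here is an assumption made for illustration: the grouping of amino acids in `SIMILAR_GROUPS` is a simplified, hypothetical classification (real programs use a substitution matrix), and gap columns are excluded from the denominator, which is only one of several conventions. Homology is not computed, because it is a conclusion drawn from such data rather than a number.

```python
# Hypothetical, simplified groups of amino acids with similar side chains.
SIMILAR_GROUPS = [
    set("ILVM"),  # aliphatic / hydrophobic
    set("FYW"),   # aromatic
    set("KRH"),   # basic
    set("DE"),    # acidic
    set("ST"),    # small hydroxyl
    set("NQ"),    # amide
]


def percent_identity_similarity(aligned_a, aligned_b):
    """Return (% identity, % similarity) for two aligned sequences of equal length.

    Columns containing a gap ('-') are skipped (an assumption of this sketch).
    Identical residues also count as similar.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    identical = similar = compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            identical += 1
            similar += 1
        elif any(a in group and b in group for group in SIMILAR_GROUPS):
            similar += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared


if __name__ == "__main__":
    # Second sequence is a made-up variant of the peptide AGHIPLLQ.
    print(percent_identity_similarity("AGHIPLLQ", "AGHVPLIQ"))
```

Similarity is never lower than identity under this definition, since every identical pair is also counted as similar.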

Purpose of finding differences and similarities of amino acids in two proteins
- Infer structural information
- Infer functional information
- Infer evolutionary relationships

Modular nature of proteins
Proteins possess local regions of similarity. Proteins can be thought of as assemblies of modular domains.

Two proteins that are similar in certain regions
- Tissue plasminogen activator (PLAT)
- Coagulation factor 12 (F12)
Baxevanis and Ouellette, Bioinformatics, Wiley-Interscience, New York, 2001.

The Dotter Program
The program consists of three components:
- A sliding window
- A table that gives a score for each amino acid match
- A graph that converts the score to a dot of a certain density (the higher the score, the higher the dot density)
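The three components listed above can be sketched together in a few lines of Python. This is not Dotter's implementation: the scoring function `pair_score` is a toy, hypothetical table (2 for identical residues, 1 for residues in the same simplified group, 0 otherwise), and the text characters in `DENSITY` stand in for dots of increasing density.

```python
# Hypothetical, simplified amino acid groups used by the toy score table.
GROUPS = [set("ILVM"), set("FYW"), set("KRH"), set("DE"), set("ST"), set("NQ")]

# Characters ordered from lowest to highest "dot density".
DENSITY = " .:*#"


def pair_score(a, b):
    """Toy score table: 2 = identical, 1 = same group, 0 = otherwise."""
    if a == b:
        return 2
    if any(a in group and b in group for group in GROUPS):
        return 1
    return 0


def window_scores(seq_x, seq_y, window):
    """Sliding window: scores[i][j] is the summed pair score of the windows
    starting at seq_y[i] and seq_x[j]."""
    rows = len(seq_y) - window + 1
    cols = len(seq_x) - window + 1
    return [
        [sum(pair_score(seq_y[i + k], seq_x[j + k]) for k in range(window))
         for j in range(cols)]
        for i in range(rows)
    ]


def render_density(scores, window, max_pair_score=2):
    """Graph: convert each score to a character; higher score, denser character."""
    max_score = window * max_pair_score
    levels = len(DENSITY) - 1
    return "\n".join(
        "".join(DENSITY[score * levels // max_score] for score in row)
        for row in scores
    )


if __name__ == "__main__":
    # Made-up peptides, used only to exercise the three components.
    x, y = "AGHIPLLQKR", "AGHVPLIQRK"
    print(render_density(window_scores(x, y, 3), 3))
```

Replacing `pair_score` with a lookup in a published substitution matrix would bring the sketch closer to how such programs typically score amino acid matches.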

Dot plot of sequence alignment highlighting Kringle domain alignments. Adapted from Baxevanis, Ouellette: Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 2nd Edition.