Bioinformatics Resources and Tools on the Web: A Primer.


Outline
Introduction: What is bioinformatics?
The basics
  – The five sites that all biologists should know
Some examples
  – Using the tools in a somewhat less-than-naïve manner
Questions and comments are welcome at all points
Much of this material comes from the Boston University course BF527, Bioinformatic Applications

What is bioinformatics?

Examples of Bioinformatics
Database interfaces
  – GenBank/EMBL/DDBJ, Medline, SwissProt, PDB, …
Sequence alignment
  – BLAST, FASTA
Multiple sequence alignment
  – Clustal, MultAlin, DiAlign
Gene finding
  – Genscan, GenomeScan, GeneMark, GRAIL
Protein domain analysis and identification
  – Pfam, BLOCKS, ProDom
Pattern identification/characterization
  – Gibbs Sampler, AlignACE, MEME
Protein folding prediction
  – PredictProtein, SWISS-MODEL

Things to know and remember about using web server-based tools
You are using someone else’s computer
You are (probably) getting a reduced set of options or capacity
Servers are great for sporadic or proof-of-principle work, but for intensive work the software should be obtained and run locally

Five websites that all biologists should know
NCBI (The National Center for Biotechnology Information)
EBI (The European Bioinformatics Institute)
The Canadian Bioinformatics Resource
SwissProt/ExPASy (Swiss Bioinformatics Resource)
PDB (The Protein Data Bank)

NCBI
Entrez interface to databases
  – Medline/OMIM
  – GenBank/GenPept/Structures
BLAST server(s)
  – Five-plus flavors of BLAST
Draft Human Genome
Much, much more…

EBI
SRS database interface
  – EMBL, SwissProt, and many more
Many server-based tools
  – ClustalW, DALI, …

SwissProt
Curation!!!
  – The error rate in the information is greatly reduced in comparison to most other databases
Extensive cross-linking to other data sources
SwissProt is the ‘gold standard’ by which other databases can be measured, and is the best place to start if you have a specific protein to investigate

A few more resources to be aware of
Human Genome Working Draft
TIGR (The Institute for Genomic Research)
Celera
(Model) organism-specific information
  – Yeast
  – Arabidopsis
  – Mouse
  – Fruit fly
  – Nematode
Nucleic Acids Research Database Issue (first issue every year)

Example 1: Searching a new genome for a specific protein
Specific problem: we want to find the closest match in C. elegans to the D. melanogaster protein NTF1, a transcription factor
First: understanding the different forms of BLAST

The different versions of BLAST
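
The five classic BLAST programs differ only in the molecule type of the query and of the database, and in which side gets translated. A minimal sketch of that lookup follows; the function name `blast_programs` and the string labels are illustrative choices, not part of any BLAST distribution.

```python
def blast_programs(query, database):
    """Return the classic BLAST programs for a query/database pair.

    query, database: "protein" or "nucleotide", i.e. the kind of
    sequence you actually hold, before any translation.
    """
    table = {
        # nucleotide vs. nucleotide: compared directly (blastn) or with
        # both sides translated in six frames (tblastx)
        ("nucleotide", "nucleotide"): ["blastn", "tblastx"],
        # protein vs. protein: no translation
        ("protein", "protein"): ["blastp"],
        # nucleotide query is translated, database is protein
        ("nucleotide", "protein"): ["blastx"],
        # protein query, nucleotide database is translated
        ("protein", "nucleotide"): ["tblastn"],
    }
    return table[(query, database)]


# Example 1 uses a protein query (NTF1) against predicted proteins,
# then against the genomic nucleotide sequence.
print(blast_programs("protein", "protein"))
print(blast_programs("protein", "nucleotide"))
```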

1st step: Search the proteins
blastp is used to search for C. elegans proteins that are similar to NTF1
Two reasonable hits are found, but the hits have suspicious characteristics
  – besides the fact that they weren’t included in the complete genome!

2nd step: Search the nucleotides
tblastn is used to search for translations of C. elegans nucleotide sequences that are similar to NTF1
Now we have only one hit
  – How is it related to the two protein hits?
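
tblastn compares a protein query against the nucleotide database translated in all six reading frames. The translation step alone can be sketched as below, assuming the standard genetic code; the helper names are illustrative, and the real program adds seeding, scoring and statistics on top of this.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become X."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return the six conceptual translations, keyed +1..+3 and -1..-3."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


# Toy sequence, not real C. elegans data
for name, protein in six_frames("ATGGCCACTTGGCTG").items():
    print(name, protein)
```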

Conclusion: Incorrect gene prediction/annotation
The two predicted proteins have essentially identical annotation
The protein-protein alignments are disjoint and consecutive on the query protein
The protein-nucleotide alignment includes both protein-protein alignments in the proper order
Why/how does this happen?
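
The reasoning on this slide can be written down as a check on alignment coordinates. The sketch below is only an illustration: it looks at query-protein coordinates (1-based, inclusive), the `max_gap` tolerance is a hypothetical parameter, and the coordinates in the usage lines are invented, not taken from the NTF1 search.

```python
def disjoint_and_consecutive(hit_a, hit_b, max_gap=10):
    """True if two (query_start, query_end) alignments do not overlap
    and sit next to each other on the query protein."""
    first, second = sorted([hit_a, hit_b])
    gap = second[0] - first[1] - 1  # residues between the two alignments
    return 0 <= gap <= max_gap


def covered_by(hit, span):
    """True if the alignment lies entirely inside the given span."""
    return span[0] <= hit[0] and hit[1] <= span[1]


def looks_like_split_gene(hit_a, hit_b, nucleotide_hit, max_gap=10):
    """Two adjacent blastp hits that one tblastn hit covers suggest a
    single gene that was predicted as two proteins."""
    return (
        disjoint_and_consecutive(hit_a, hit_b, max_gap)
        and covered_by(hit_a, nucleotide_hit)
        and covered_by(hit_b, nucleotide_hit)
    )


# Invented coordinates for illustration only
print(looks_like_split_gene((1, 250), (255, 600), (1, 600)))
print(looks_like_split_gene((1, 250), (200, 600), (1, 600)))
```

A fuller check would also confirm that the two pieces appear in the same order and on the same strand in the nucleotide subject coordinates.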

Final(?) check: Gene prediction
Genscan is the best available ab initio gene predictor
Genscan’s prediction spans both protein-protein alignments, reinforcing our conclusion of a bad prediction

Ab initio vs. similarity vs. hybrid models for gene finding
Ab initio: the gene looks like the average of many genes
  – Genscan, GeneMark, GRAIL, …
Similarity: the gene looks like a specific known gene
  – Procrustes, …
Hybrid: a combination of both
  – GenomeScan

A similar example: Fruit fly homolog of mRNA localization protein VERA
Similar procedure to the one just described
  – tblastn search with BLOSUM45 produces an unexpected exon
Conclusion: incomplete (as opposed to incorrect) annotation
  – We have verified the existence of the rare isoform through RT-PCR

Another example: Find all genes with PDZ domains
Multiple methods are possible
The ‘best’ method will depend on many things
  – How much do you know about the domain?
  – Do you know the exact extent of the domain?
  – How many examples do you expect to find?

Some possible methods if the domain is a known domain
SwissProt
  – text search capabilities
  – good annotation of known domains
  – crosslinks to other databases (domains)
Databases of known domains
  – BLOCKS
  – Pfam
  – Others (ProDom, PROSITE, DOMO, …)

Determination of the nature of conservation in a domain
For new domains, multiple alignment is your best option
  – Global: ClustalW
  – Local: DiAlign
  – Hidden Markov model: HMMER
For known domains, this work has largely been done for you
  – BLOCKS
  – Pfam

If you have a protein and want to search it against known domains
Search/analysis tools
  – Pfam
  – BLOCKS
  – PredictProtein

Different representations of conserved domains
BLOCKS
  – Gapless regions
  – Often multiple blocks for one domain
Pfam
  – Statistical model, based on HMMs
  – Since gaps are allowed, most domains have only one Pfam model
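
To make the "gapless region" idea concrete, here is a minimal sketch that turns an ungapped block into a per-column residue frequency profile. It is a simplification for illustration: the function names are made up, the four segments are toy data, and the real BLOCKS and Pfam resources use sequence weighting, pseudocounts and (for Pfam) a full profile HMM with insert and delete states.

```python
from collections import Counter


def block_profile(block):
    """Per-column residue frequencies for equal-length, ungapped segments."""
    width = len(block[0])
    if any(len(segment) != width for segment in block):
        raise ValueError("segments in an ungapped block must have equal length")
    profile = []
    for column in zip(*block):
        counts = Counter(column)
        profile.append({res: n / len(block) for res, n in counts.items()})
    return profile


def consensus(profile):
    """Most frequent residue in each column."""
    return "".join(max(column, key=column.get) for column in profile)


# Toy block of four aligned segments (invented for illustration)
toy_block = ["GLGF", "GLGF", "GYGF", "GIGF"]
profile = block_profile(toy_block)
print(profile[1])
print(consensus(profile))
```

Because every segment has the same length, each column is a simple tally. Once gaps are allowed, that tally is no longer enough, which is why the gapped case is modeled with an HMM instead.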

Conclusions
We have only touched small parts of the elephant
(Intelligent) trial and error is often your best tool
Keep up with the main five sites, and you’ll have a pretty good idea of what is happening and what is available