Vertebrate natural history in the 21st century: genetics, ecology, and evolution. Andrew DeWoody, Purdue University.

BLASTed computers! Some research from the DeWoody lab

Scale
we are looking for genes which underlie traits of evolutionary interest in non-models
– e.g., osmoregulatory genes in kangaroo rats
– genes involved in salamander metamorphosis
– MHC genes
RNA-seq/transcriptomics
– so far, mostly 454 data…
– small in terms of genome projects, but large enough to be computationally problematic (for us)
this is NOT a how-to talk!

Nick: BLAST annotation of a de novo transcriptome assembly
kangaroo rat transcriptome sequences were assembled from 454 runs (i.e., RNA-seq)
yielded 20,484 contigs for kidney tissue and 23,376 contigs for spleen tissue
conducted BLASTx search to compare the sequences against the nr database on NCBI in the program Blast2GO® (Götz et al. 2008)
– to find known proteins that match our cDNA reads

Time required
Blast2GO® sends out queries in batches of 5 sequences
– search settings included an e-value cutoff of <1e-6 for a match and returning only the top 5 hits
– i.e., we get up to 25 hits from each batch query
– used Genomics server
we set up the search 1 week ago (2 7pm) for kidney contigs AND for spleen contigs
– they were each 85% finished as of 11am today
a separate search is necessary for each additional database (e.g., Swiss-Prot)
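To put the batch size in perspective, here is a rough back-of-the-envelope sketch of how many batch queries each tissue implies. It uses only the numbers on the slides (5 sequences per batch, top 5 hits per sequence, and the two contig counts); the function name `batch_stats` is made up for illustration, and the hit total is an upper bound since sequences with no match below the cutoff return fewer hits.

```python
import math

BATCH_SIZE = 5       # sequences per Blast2GO batch (from the slide)
HITS_PER_QUERY = 5   # top hits kept per sequence (from the slide)


def batch_stats(n_contigs, batch_size=BATCH_SIZE, hits_per_query=HITS_PER_QUERY):
    """Return (number of batch queries, maximum number of hits returned)."""
    n_batches = math.ceil(n_contigs / batch_size)  # last batch may be partial
    max_hits = n_contigs * hits_per_query          # upper bound only
    return n_batches, max_hits


kidney = batch_stats(20484)  # kidney contigs
spleen = batch_stats(23376)  # spleen contigs
print("kidney batches, max hits:", kidney)
print("spleen batches, max hits:", spleen)
```

So each tissue needs several thousand sequential batch requests, which is consistent with the searches still running after a week.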

Nick’s wish list
increase the number of queries possible at any given time (i.e., >>5) while retaining flexibility
allow the user to specify more options
– e.g., limiting the BLAST database to specific taxonomic groups
allow the user to specify multiple databases in the same query (e.g., return the top BLAST hit from a gene for both the Swiss-Prot and NCBI’s nr databases during the same BLAST search)
– i.e., search in parallel
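One way the "search in parallel" item might look, sketched with a local BLAST+ install rather than Blast2GO (a swapped-in approach, not what the lab actually ran). The function names, the output file naming, and the database names `nr` and `swissprot` are assumptions for illustration; the `blastx` flags are the standard BLAST+ ones. The command runner is passed in so the sketch can be exercised without BLAST installed.

```python
from concurrent.futures import ThreadPoolExecutor


def build_blastx_command(query_fasta, db, evalue=1e-6, max_hits=5):
    """Build a BLAST+ blastx command line (as a list) for one database."""
    return [
        "blastx",
        "-query", query_fasta,
        "-db", db,
        "-evalue", str(evalue),            # same cutoff as on the slide
        "-max_target_seqs", str(max_hits), # keep only the top hits
        "-outfmt", "6",                    # tabular output
        "-out", f"{query_fasta}.{db}.tsv", # hypothetical naming scheme
    ]


def search_databases(query_fasta, dbs, runner):
    """Run one search per database concurrently; `runner` executes a command."""
    with ThreadPoolExecutor(max_workers=len(dbs)) as pool:
        futures = {
            db: pool.submit(runner, build_blastx_command(query_fasta, db))
            for db in dbs
        }
        return {db: future.result() for db, future in futures.items()}


# Dry run: the "runner" just echoes the command instead of executing it.
planned = search_databases("kidney_contigs.fasta", ["nr", "swissprot"], lambda cmd: cmd)
for db, cmd in planned.items():
    print(db, "->", " ".join(cmd))
```

To actually launch the searches, the runner could be something like `lambda cmd: subprocess.run(cmd, check=True)`, assuming `blastx` and the formatted databases are available on the machine.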

Kendra: Kangaroo rat singletons
Goal: To isolate MHC genes in kangaroo rats
we considered ~80,000 sequences that were not assembled into contigs
– BLASTn (cutoff e-15)
– analysis took 10 days on PC using Java Applet over the internet
– we know, not the best approach!
– yielded 300 hits

Kendra’s wish list
would like to harvest hits for best e-value and max length
– rank correlation?
would also like to know the top hit categories/descriptions and number of times they occur
– i.e., would like a more precise tool than the blunt offerings of Blast2GO…
probably beyond Carol’s scope
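A minimal sketch of what such a harvesting tool could look like, assuming the hits are available as BLAST+ tabular output (`-outfmt 6` with its default twelve columns). The example rows are made up, not real results, and all function names are hypothetical. Default tabular output carries subject IDs rather than descriptions, so the tally below counts subject IDs as a stand-in for "top hit descriptions".

```python
import csv
import io
from collections import Counter

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def read_hits(handle):
    """Parse tab-separated hits into dicts with numeric e-value and length."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(FIELDS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["length"] = int(hit["length"])
        hits.append(hit)
    return hits


def best_hits(hits):
    """One hit per query: lowest e-value, ties broken by longest alignment."""
    best = {}
    for hit in hits:
        cur = best.get(hit["qseqid"])
        if cur is None or (hit["evalue"], -hit["length"]) < (cur["evalue"], -cur["length"]):
            best[hit["qseqid"]] = hit
    return best


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation (Pearson correlation of the ranks).

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Made-up example rows for illustration only.
EXAMPLE = (
    "read1\tsubjA\t95.0\t200\t10\t0\t1\t200\t1\t200\t1e-50\t300\n"
    "read1\tsubjB\t90.0\t150\t15\t0\t1\t150\t1\t150\t1e-30\t200\n"
    "read2\tsubjA\t88.0\t120\t14\t0\t1\t120\t5\t124\t1e-20\t150\n"
    "read3\tsubjC\t85.0\t80\t12\t0\t1\t80\t9\t88\t1e-16\t100\n"
)

top = best_hits(read_hits(io.StringIO(EXAMPLE)))
subject_counts = Counter(hit["sseqid"] for hit in top.values())
rho = spearman([hit["evalue"] for hit in top.values()],
               [hit["length"] for hit in top.values()])
print(subject_counts.most_common())
print("Spearman rho (e-value vs. alignment length):", rho)
```

In the toy data the longest alignments have the smallest e-values, so the rank correlation is strongly negative; real singleton hits would presumably be noisier.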

Bamboo’s BLAST Search Summary
Columns: Server | Program | Average length of query sequences | Number of query sequences | Database/Letters in Database | Time
Genomics Server (cys.genomics.purdue.edu/gln.genomics.purdue.edu) | BLASTn | 1491 | nucleotide (nt): GenBank+EMBL+DDBJ+PDB/ 39,038,493,869 | ~3 min ~17 hr
13921 | Contigs of E51K/ 2,379,034 | < 1 min
Kangaroo rat kidney; query sequences in the 1st and 3rd row are randomly chosen transposable elements from Repbase.