Vertebrate natural history in the 21st century: genetics, ecology, and evolution
Andrew DeWoody, Purdue University
BLASTed computers! !@#$%&* Some research from the DeWoody lab
Scale
we are looking for genes which underlie traits of evolutionary interest in non-models
  – e.g., osmoregulatory genes in kangaroo rats
  – genes involved in salamander metamorphosis
  – MHC genes
RNA-seq/transcriptomics
  – so far, mostly 454 data…
  – small in terms of genome projects, but large enough to be computationally problematic (for us)
this is NOT a how-to talk!
Nick: BLAST annotation of a de novo transcriptome assembly
kangaroo rat transcriptome sequences were assembled from 454 runs (i.e., RNA-seq)
yielded 20,484 contigs for kidney tissue and 23,376 contigs for spleen tissue
conducted BLASTx search to compare the sequences against the nr database on NCBI in the program Blast2GO® (Götz et al. 2008)
  – to find known proteins that match our cDNA reads
Time required
Blast2GO® sends out queries in batches of 5 sequences
  – search settings included an e-value cutoff of <1e-6 for a match and returning only the top 5 hits
    i.e., we get up to 25 hits from each batch query
  – used Genomics server
we set up the search 1 week ago (2 Dec @ 7pm) for kidney contigs AND for spleen contigs
  – they were each 85% finished as of 11am today
a separate search is necessary for each additional database (e.g., Swiss-Prot)
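The progress figures on this slide imply a rough throughput, which can be worked out as in the sketch below. It is back-of-the-envelope arithmetic, not a measurement: it assumes that "today" is 9 Dec (one week after the 2 Dec start), that progress is linear in the number of contigs, and the helper name `project_search` is hypothetical.

```python
from datetime import datetime


def project_search(n_contigs, fraction_done, start, now):
    """Estimate throughput and total run time, assuming linear progress."""
    elapsed_hr = (now - start).total_seconds() / 3600
    contigs_done = n_contigs * fraction_done
    return {
        "elapsed_hr": elapsed_hr,
        "sec_per_contig": elapsed_hr * 3600 / contigs_done,
        "projected_total_hr": elapsed_hr / fraction_done,
    }


# The year is arbitrary (only the elapsed time matters);
# 9 Dec as "today" is an assumption based on "1 week ago".
start = datetime(2000, 12, 2, 19, 0)
now = datetime(2000, 12, 9, 11, 0)

kidney = project_search(20484, 0.85, start, now)
spleen = project_search(23376, 0.85, start, now)

for tissue, est in (("kidney", kidney), ("spleen", spleen)):
    print(tissue, {k: round(v, 1) for k, v in est.items()})
```

Under those assumptions each search had been running for about 160 hours and would need roughly 188 hours in total, i.e. on the order of 30 seconds per contig.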
Nick’s wish list
increase the number of queries possible at any given time (i.e., >>5) while retaining flexibility
allow the user to specify more options
  – e.g., limiting the BLAST database to specific taxonomic groups
allow the user to specify multiple databases in the same query (e.g., return the top BLAST hit for a gene from both the Swiss-Prot and NCBI’s nr databases during the same BLAST search)
  – i.e., search in parallel
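One way the "search in parallel" wish could look is sketched below. This is not how Blast2GO works and not what the lab ran: it assumes a local installation of the standalone NCBI BLAST+ programs and locally formatted databases, here called `nr` and `swissprot`. The function names, file names, and output naming scheme are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess


def build_blast_command(program, query_fasta, db, out_path,
                        evalue=1e-6, max_hits=5, threads=1):
    """Build a BLAST+ command line as an argument list (nothing is run here)."""
    return [
        program,
        "-query", query_fasta,
        "-db", db,
        "-evalue", str(evalue),             # e-value cutoff from the slides
        "-max_target_seqs", str(max_hits),  # keep only the top hits
        "-outfmt", "6",                     # tab-separated output
        "-num_threads", str(threads),
        "-out", out_path,
    ]


def run_command(cmd):
    """Default runner: execute the command and raise if it fails."""
    subprocess.run(cmd, check=True)


def search_databases(program, query_fasta, databases, runner=run_command, **options):
    """Launch one search per database concurrently; return {database: output file}."""
    jobs = {
        db: build_blast_command(program, query_fasta, db, f"{db}.tsv", **options)
        for db in databases
    }
    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        futures = [pool.submit(runner, cmd) for cmd in jobs.values()]
        for future in futures:
            future.result()  # re-raise any error from a worker
    return {db: cmd[-1] for db, cmd in jobs.items()}


# Show the command that would be launched for one database (hypothetical file name).
print(" ".join(build_blast_command("blastx", "kidney_contigs.fasta", "nr", "nr.tsv")))
```

Calling `search_databases("blastx", "kidney_contigs.fasta", ["nr", "swissprot"])` would then start both searches at once. Threads are enough here because each worker only waits on an external process.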
Kendra: Kangaroo rat singletons
Goal: To isolate MHC genes in kangaroo rats
we considered ~80,000 sequences that were not assembled into contigs
  – BLASTn (cutoff 1e-15)
  – analysis took 10 days on PC using Java Applet over the internet
    we know, not the best approach!
  – yielded 300 hits
Kendra’s wish list
would like to harvest hits for best e-value and max length
  – rank correlation?
would also like to know the top hit categories/descriptions and number of times they occur
  – i.e., would like a more precise tool than the blunt offerings of Blast2GO…
probably beyond Carol’s scope
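The items on this wish list can be sketched in a few lines of plain Python. The sketch assumes the hits have been exported in the default 12-column tabular format of BLAST+ (`-outfmt 6`), which is not Blast2GO's own export. Subject IDs stand in for hit descriptions, the function names are hypothetical, and the four example rows are invented toy data.

```python
from collections import Counter


def parse_tabular(lines):
    """Parse default 12-column BLAST tabular rows into dicts."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        hits.append({"query": f[0], "subject": f[1],
                     "length": int(f[3]), "evalue": float(f[10])})
    return hits


def best_hit_per_query(hits):
    """Per query, keep the lowest e-value; ties go to the longer alignment."""
    best = {}
    for h in hits:
        cur = best.get(h["query"])
        if cur is None or (h["evalue"], -h["length"]) < (cur["evalue"], -cur["length"]):
            best[h["query"]] = h
    return best


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation (Pearson correlation of the average ranks).

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Invented toy rows, for illustration only.
rows = [
    "read1\tsubjA\t95.0\t300\t15\t0\t1\t300\t1\t300\t1e-50\t400",
    "read1\tsubjB\t90.0\t150\t15\t0\t1\t150\t1\t150\t1e-20\t200",
    "read2\tsubjA\t92.0\t250\t20\t0\t1\t250\t1\t250\t1e-40\t350",
    "read3\tsubjC\t88.0\t100\t12\t0\t1\t100\t1\t100\t1e-16\t120",
]
best = best_hit_per_query(parse_tabular(rows))
subject_counts = Counter(h["subject"] for h in best.values())
rho = spearman([h["evalue"] for h in best.values()],
               [h["length"] for h in best.values()])
print(subject_counts.most_common(), rho)
```

A strongly negative correlation would mean that the best e-values tend to come from the longest alignments, in which case ranking by either criterion picks much the same hits.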
Bamboo’s BLAST Search Summary
Columns: Server | Program | Average length of query sequences | Number of query sequences | Database/Letters in Database | Time
Genomics Server (cys.genomics.purdue.edu/gln.genomics.purdue.edu) | BLASTn
1491 | nucleotide (nt): GenBank+EMBL+DDBJ+PDB/39,038,493,869 | ~3 min
4675347 | ~17 hr
13921 | Contigs of E51K/2,379,034 | < 1 min
Kangaroo rat kidney; query sequences in the 1st and 3rd rows are randomly chosen transposable elements from Repbase.