Today
  What can you do with the sequence?
  What can you do with the ESTs?
  The case of SNPs and indels
What can you do with the sequence?
  Gene prediction
  Motif identification
  Promoter identification
  Survey gene expression across tissues
  Full-length gene isolation
NCBI Tools
  National Center for Biotechnology Information (NCBI), National Library of Medicine, NIH
  Created in 1988 to develop information systems for molecular biology.
  Provides data retrieval systems and computational resources.
Database Resources
  Database retrieval tools
  BLAST family of sequence-similarity search programs
  Resources for gene-level sequences
  Resources for genome-scale analysis
Database Retrieval Tools
  Entrez – for DNA and protein sequences
  PubMed Central – for literature
  Taxonomy – organisms and associated sequences
  LocusLink – provides links from sequence information to map and other information.
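As an illustration of sequence retrieval through Entrez, the sketch below builds a request URL for NCBI's E-utilities `efetch` endpoint, which is one programmatic route into Entrez and is not covered in these slides. The helper name `efetch_url` and the placeholder accession are assumptions made for the example; the code only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of the E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str, db: str = "nucleotide", rettype: str = "fasta") -> str:
    """Build an efetch URL that requests one record as plain text.

    `efetch_url` is a hypothetical helper written for this example.
    """
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


# "ACCESSION_HERE" is a placeholder, not a real accession number.
url = efetch_url("ACCESSION_HERE")
print(url)
```

Substituting a real accession and opening the URL in a browser, or with `urllib.request.urlopen`, should return the record in FASTA format, subject to NCBI's usage guidelines.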
BLAST family
  Basic Local Alignment Search Tool
  Sequence similarity search against various databases in GenBank
  Gapped alignments with links to various other databases such as UniGene or LocusLink.
BLAST
  Produces pairwise alignments, but can display multiple alignments with the “query-anchored” feature.
  Each alignment has a statistical significance (E-value).
  Can compare sequences at the amino acid level, not only the nucleotide level.
  Outputs a list of matches including start, stop, score, and E-value.
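The list of matches can be processed with a short script. The sketch below assumes the hits were saved in BLAST+ tabular format (`-outfmt 6`), whose twelve default columns are query id, subject id, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, and bit score. The example rows are made-up values for illustration, and the names `Hit`, `parse_tabular`, and `significant` are hypothetical.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    percent_identity: float
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    evalue: float
    bit_score: float


def parse_tabular(text: str) -> List[Hit]:
    """Parse 12-column tab-separated BLAST output into Hit records."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split("\t")
        if len(f) < 12:
            continue  # skip rows that lack the 12 default columns
        hits.append(Hit(f[0], f[1], float(f[2]),
                        int(f[6]), int(f[7]), int(f[8]), int(f[9]),
                        float(f[10]), float(f[11])))
    return hits


def significant(hits: List[Hit], max_evalue: float = 1e-5) -> List[Hit]:
    """Keep hits at or below the E-value cutoff, best (lowest) first."""
    return sorted((h for h in hits if h.evalue <= max_evalue),
                  key=lambda h: h.evalue)


# Made-up rows for illustration only.
EXAMPLE = (
    "query1\tsubjB\t91.00\t150\t13\t1\t10\t159\t300\t449\t2e-40\t160\n"
    "query1\tsubjA\t98.50\t200\t3\t0\t1\t200\t51\t250\t1e-100\t360\n"
    "query1\tsubjC\t78.00\t40\t9\t0\t5\t44\t12\t51\t0.5\t30.1\n"
)

for hit in significant(parse_tabular(EXAMPLE)):
    print(hit.subject, hit.q_start, hit.q_end, hit.bit_score, hit.evalue)
```

The cutoff of 1e-5 is an arbitrary choice for the example; an appropriate threshold depends on the database size and the purpose of the search.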
5 BLAST Programs (query vs. database)
  BLASTN – Nucleotide vs. nucleotide
  BLASTP – Protein vs. protein
  BLASTX – Nucleotide translation vs. protein
  TBLASTN – Protein vs. nucleotide translation
  TBLASTX – Nucleotide translation vs. nucleotide translation
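The choice among the five programs follows from the type of the query and the type of the database. A minimal sketch of that decision, using a hypothetical helper `choose_program` (not part of any BLAST distribution):

```python
# Maps (query type, database type) to the BLAST program that compares them.
PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",   # query is translated in six frames
    ("protein", "nucleotide"): "tblastn",  # database is translated in six frames
}


def choose_program(query_type: str, db_type: str, translate_both: bool = False) -> str:
    """Return the BLAST program name for a query/database combination.

    translate_both=True selects tblastx, which translates both a nucleotide
    query and a nucleotide database before comparing them as proteins.
    """
    if translate_both:
        if (query_type, db_type) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and database")
        return "tblastx"
    try:
        return PROGRAMS[(query_type, db_type)]
    except KeyError:
        raise ValueError(f"unsupported combination: {query_type} vs. {db_type}")


print(choose_program("nucleotide", "protein"))  # an EST against a protein database
```

For example, an EST compared against a protein database calls for BLASTX, while a known protein compared against an EST collection calls for TBLASTN.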
BLAST family
  BLAST 2 Sequences – Aligns two sequences and shows a dot plot of the alignment.
  MegaBLAST – Optimized for nearly exact matches.
  PSI-BLAST – Position-Specific Iterated BLAST; an iterative protein search that builds a position-specific profile from the hits of each round.
  BLink – Allows display of alignments by taxonomic criteria, database origin, relation to a complete genome, relation to a 3D protein structure or conserved domain.
Gene-Level Sequences
  UniGene – Groups GenBank sequences, including ESTs, into a non-redundant set of gene-oriented clusters.
  ProtEST – Displays pre-computed BLAST alignments between protein sequences from model organisms and the six-frame translation of the UniGene nucleotide sequences.
Gene-Level Sequences
  HomoloGene – Curated and calculated gene orthologs and homologs for 14 organisms.
  RefSeq – Curated reference sequences for mRNAs, genomic sequences, etc.
  ORF Finder – Six-frame translation with a graph of ORF positions.
  e-PCR – Locates sequence-tagged sites (STSs) within a sequence.
  dbSNP – Contains SNPs and indels.
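The six-frame idea behind ORF Finder can be sketched in a few lines. The code below is an illustration written for these notes, not NCBI's implementation: it scans the three forward frames and the three frames of the reverse complement for stretches that begin with ATG and end at the first in-frame stop codon. The function names, the `min_codons` parameter, and the toy sequence are assumptions made for the example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 3):
    """Return ORFs as (strand, frame, start, end, dna) tuples.

    start and end are 0-based, end-exclusive positions on the strand being
    read (the reverse complement for "-"). An ORF runs from an ATG to the
    first in-frame stop codon; min_codons counts the stop codon as well.
    """
    orfs = []
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # open an ORF at the first ATG seen
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame + 1, start, i + 3, s[start:i + 3]))
                    start = None  # close the ORF and look for the next ATG
    return orfs


# Toy sequence for illustration only.
for orf in find_orfs("CCATGAAATAGCC"):
    print(orf)
```

An ORF that is still open when the sequence ends is not reported in this sketch; a real tool would typically let the user decide how to treat such partial ORFs and which genetic code to apply.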
Genome-Scale Analysis
  Entrez Genomes – Taxonomic, genome, or chromosome view of the current sequence data for an organism.
  COGs – List of orthologous protein groups from completely sequenced organisms.
  Retroviral genotyping tools – Important in viral genetic diversity, tracking outbreaks, and vaccine development.
Genome-Scale Analysis
  Eukaryotic Genomic Resources – Includes Plant Genomes Central, with information from various plant genome projects.
  Map Viewer – Displays genome assemblies using chromosome map views.
  Model Maker (MM) – Generates transcript models using exon data from prediction or from GenBank alignments.
Genome-Scale Analysis
  Evidence Viewer – Graphical summary of alignments relative to contigs, including insertions/deletions or mismatches.
  Human-Mouse Homology Maps – List of genes in homologous segments.
  Cancer Chromosome Aberration Project – List of recurrent chromosome aberrations associated with cancer.
Gene Expression/Phenotype
  SAGEmap – A way to look at SAGE data, including two-way mapping between SAGE tags and UniGene.
  Gene Expression Omnibus (GEO) – Data repository and retrieval system for expression data from all sources.
  OMIM – Catalog of human genes and genetic disorders, including phenotypes and polymorphism information.
MMDB, CDD, CDART
  Molecular Modeling Database (MMDB) – Based on the Protein Data Bank.
  Conserved Domain Database (CDD) – PSI-BLAST-derived score matrices that identify conserved domains in proteins.
  Conserved Domain Architecture Retrieval Tool (CDART) – Identifies conserved domains in a protein and displays its domain architecture.
Sequence Analysis References
  Korf, Yandell, and Bedell. 2003. An Essential Guide to the Basic Local Alignment Search Tool: BLAST. O’Reilly & Associates, Sebastopol, CA.
  Markel and Leon. 2003. Sequence Analysis in a Nutshell: A Guide to Common Tools and Databases. O’Reilly & Associates, Sebastopol, CA.
  Baxevanis and Ouellette. 2001. Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins. Wiley Interscience, New York.
  Mount. 2000. Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor Laboratory, New York.