An Analysis of “Gene Finding in Novel Genomes” Michael Sneddon.


Basic Reference Information
“Gene Finding in Novel Genomes”, written by Ian Korf
BMC Bioinformatics, published in May 2004
http://www.biomedcentral.com/1471-2105/5/59

Purpose of Gene Finding
Given a genome, we would like to predict which regions code for proteins and which do not.
This matters because we can then focus further study on the regions that actually code for something.
Predictions can also point us to places in the genome where unknown genes may be found.

Gene Finding Techniques
Gene finding is very difficult to do accurately.
Current methods use Hidden Markov Models (HMMs) to discover genes.
The HMM learns to recognize patterns by being trained on annotated data, where we already know which regions are genes and which are not.
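The training idea on this slide can be sketched as simple counting over an annotated sequence. This is a minimal illustration, not SNAP's training procedure: the function name `estimate_hmm`, the two-state labeling (G for gene, N for non-gene), the pseudocount, and the toy sequence are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def estimate_hmm(sequence, labels, pseudocount=1.0):
    """Estimate HMM transition and emission probabilities by counting.

    `sequence` is a DNA string and `labels` gives the known state of
    each base (hypothetical alphabet: G = gene, N = non-gene).
    """
    states = sorted(set(labels))
    symbols = sorted(set(sequence))
    trans = defaultdict(Counter)  # trans[a][b]: times state a is followed by b
    emit = defaultdict(Counter)   # emit[s][x]: times state s emits base x

    for i, (base, state) in enumerate(zip(sequence, labels)):
        emit[state][base] += 1
        if i + 1 < len(labels):
            trans[state][labels[i + 1]] += 1

    def normalize(counts, keys):
        # The pseudocount keeps unseen events from getting probability zero.
        total = sum(counts[k] + pseudocount for k in keys)
        return {k: (counts[k] + pseudocount) / total for k in keys}

    transition = {s: normalize(trans[s], states) for s in states}
    emission = {s: normalize(emit[s], symbols) for s in states}
    return transition, emission

# Toy annotated data (made up for illustration).
seq    = "ATGCGCGCTAATATATAT"
labels = "GGGGGGGGGNNNNNNNNN"
transition, emission = estimate_hmm(seq, labels)
```

With real data the counts would come from many annotated genes, which is exactly what a newly sequenced genome lacks.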

Gene Finding in New Genomes
The problem is that genomes are being sequenced faster than they can be studied, so we lack the training sets needed to build good HMMs.
Currently, the best way to find genes in a new genome is to use a program designed for a different genome and hope it gives a good approximation.

SNAP: Korf’s Approach
Korf believes that the current approach does not give a good approximation when finding genes in new genomes.
He designed SNAP, which runs several other gene finding programs and estimates its parameters based on their results.
SNAP also uses a Hidden Markov Model.

SNAP HMM State Diagram
E: exon state
I: intron state
N: intergenic state
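To show how a model with these three states labels a sequence, here is a minimal Viterbi decoding sketch. It is not SNAP's actual model: the transition and emission probabilities are invented for illustration, and the topology (intergenic and intron states connect only through exons) is an assumption based on the state names listed above.

```python
import math

STATES = ("N", "E", "I")  # intergenic, exon, intron
NEG_INF = float("-inf")

def log(p):
    """Log of a probability, treating zero as minus infinity."""
    return math.log(p) if p > 0 else NEG_INF

# Illustrative numbers only; a real gene finder estimates these from data.
TRANS = {
    "N": {"N": 0.90, "E": 0.10, "I": 0.00},
    "E": {"N": 0.05, "E": 0.90, "I": 0.05},
    "I": {"N": 0.00, "E": 0.10, "I": 0.90},
}
EMIT = {
    "N": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
    "E": {"A": 0.20, "C": 0.30, "G": 0.30, "T": 0.20},
    "I": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
}
START = {"N": 1.0, "E": 0.0, "I": 0.0}  # assume sequences begin intergenic

def viterbi(seq):
    """Return the most probable state path for `seq` as a string."""
    if not seq:
        return ""
    scores = [{s: log(START[s]) + log(EMIT[s][seq[0]]) for s in STATES}]
    back = []  # back[i][s]: best predecessor of state s at position i + 1
    for base in seq[1:]:
        prev = scores[-1]
        col, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev[r] + log(TRANS[r][s]))
            col[s] = prev[best] + log(TRANS[best][s]) + log(EMIT[s][base])
            ptr[s] = best
        scores.append(col)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))

path = viterbi("ATATATGCGCGCGCGCGCATATAT")
```

Working in log space avoids numerical underflow on long sequences, which matters for genome-scale input.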

Methods of Testing
Korf used genomes from A. thaliana, O. sativa, C. elegans, and D. melanogaster, which are relatively simple genomes.
He compared his software to other leading gene finding software, including Genscan, Genefinder, HMMGene, and Augustus.
The comparison measured how well each program performed.
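One common way to compare gene finders is nucleotide-level sensitivity and specificity against a known annotation. The sketch below assumes that measure; the slide does not say which metrics were used. Note that in the gene-finding literature "specificity" is usually computed as TP / (TP + FP). The function name and the label strings are hypothetical.

```python
def nucleotide_accuracy(truth, predicted, coding="E"):
    """Return (sensitivity, specificity) at the nucleotide level.

    `truth` and `predicted` are equal-length label strings in which
    `coding` marks a coding base. Specificity follows the gene-finding
    convention TP / (TP + FP).
    """
    if len(truth) != len(predicted):
        raise ValueError("label strings must have the same length")
    pairs = list(zip(truth, predicted))
    tp = sum(t == coding and p == coding for t, p in pairs)
    fp = sum(t != coding and p == coding for t, p in pairs)
    fn = sum(t == coding and p != coding for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, specificity

# Made-up example: the prediction is shifted one base to the left.
sn, sp = nucleotide_accuracy("NNEEEENN", "NEEEENNN")
```

Running the same function on each program's predictions for the same test genes gives directly comparable numbers.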

Data Used in Testing
Table 1. Data set characteristics. At: Arabidopsis thaliana, Ce: Caenorhabditis elegans, Dm: Drosophila melanogaster, Os: Oryza sativa.
Genome  Sequence  Genes  GC     Single-exon genes  Mean exon  Mean intron
At      1.89 Mb   631    37.3%  19.8%              230 bp     157 bp
Ce      3.02 Mb   626    36.1%  2.2%               220 bp     334 bp
Dm      3.66 Mb   602    43.6%  24.9%              394 bp     948 bp
Os      1.55 Mb   424    44.5%  22.9%              237 bp     350 bp

Performance of SNAP

Parameters taken from other species

Analysis of parameters that his program used and demonstration of how they would be better suited for new genomes

Next Steps
Since he used relatively simple genomes, the next step is to analyze larger genomes to see whether the results are similar.
Gene finding is still very difficult, and more research is needed on how to better estimate HMM parameters.

My Opinions
The results were very clear and organized.
The program is available free online.
The paper needed a better explanation of how the program took results from other programs and used that information.
The program needs better documentation so that more people can use it and specialize it for specific genomes.
