An Analysis of “Gene Finding in Novel Genomes” Michael Sneddon.


1 An Analysis of “Gene Finding in Novel Genomes” Michael Sneddon

2 Basic Reference Information: “Gene Finding in Novel Genomes,” written by Ian Korf. Published in BMC Bioinformatics in May of 2004. http://www.biomedcentral.com/1471-2105/5/59

3 Purpose of Gene Finding: Given a genome, we would like to predict which regions code for proteins and which do not. This is important because we can then focus research on the regions that actually code for something. It can also point us to places in the genome to look for unknown genes.

4 Gene Finding Techniques: Gene finding is very difficult to do accurately. Current methods employ Hidden Markov Models (HMMs) to discover genes. The HMM learns to recognize patterns when it is trained on data where we already know which regions are genes and which are not.
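To make the training idea on this slide concrete, here is a minimal sketch of estimating HMM parameters by counting over labeled sequences. It is an illustration only, not the procedure from Korf's paper: the function name `train_hmm`, the one-letter state labels (E, I, N), and the add-one pseudocount are all assumptions made for the example.

```python
from collections import Counter, defaultdict

STATES = "EIN"      # assumed labels: exon, intron, intergenic
ALPHABET = "ACGT"

def train_hmm(sequences, labels, pseudocount=1.0):
    """Estimate transition and emission probabilities by counting.

    sequences: DNA strings; labels: equal-length strings over STATES that
    mark the known state of every base (the annotated training data).
    """
    trans = defaultdict(Counter)   # trans[state][next_state] -> count
    emit = defaultdict(Counter)    # emit[state][base] -> count
    for seq, lab in zip(sequences, labels):
        if len(seq) != len(lab):
            raise ValueError("sequence and labels must have equal length")
        for i, (base, state) in enumerate(zip(seq, lab)):
            emit[state][base] += 1
            if i + 1 < len(lab):
                trans[state][lab[i + 1]] += 1

    def normalize(counts, keys):
        # The pseudocount keeps unseen events from getting probability zero.
        total = sum(counts[k] for k in keys) + pseudocount * len(keys)
        return {k: (counts[k] + pseudocount) / total for k in keys}

    transition = {s: normalize(trans[s], STATES) for s in STATES}
    emission = {s: normalize(emit[s], ALPHABET) for s in STATES}
    return transition, emission

# Toy usage with one made-up annotated sequence.
transition, emission = train_hmm(["ACGTAC"], ["NNEEIN"])
```

With so little data the estimates are dominated by the pseudocount, which is the small-scale version of the training-set shortage that a newly sequenced genome presents.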

5 Gene Finding in New Genomes: The problem is that we are sequencing genomes faster than we can research them, so we lack the training sets needed to build good HMMs. Currently, the best way to find genes in a new genome is to use a program designed for a different genome and hope it gives a good approximation.

6 SNAP – Korf’s Approach: Korf argues that the current approach does not provide a good approximation for finding genes in new genomes. He designed SNAP, which runs several other gene finding programs and estimates parameters based on their results. SNAP also uses a Hidden Markov Model.

7 SNAP HMM State Diagram. E: exon state; I: intron state; N: intergenic state.
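The three states named on this slide can be used in a toy decoder. The sketch below runs the standard Viterbi algorithm over a plain three-state HMM with states E, I, and N. It is not SNAP's actual model, and every probability in it is invented for illustration. It also assumes all probabilities are strictly positive so the logarithms are defined.

```python
import math

STATES = ("E", "I", "N")

def viterbi(seq, start, trans, emit):
    """Return the most probable state path for seq as a string of E/I/N."""
    if not seq:
        return ""
    # score[s] is the best log-probability of any path ending in state s.
    score = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in STATES}
    back = []  # back[i][s] is the best predecessor of s at position i + 1
    for base in seq[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(trans[p][s]))
            score[s] = (prev[best] + math.log(trans[best][s])
                        + math.log(emit[s][base]))
            pointers[s] = best
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))

# Hypothetical parameters: sticky states, GC-rich exons, AT-rich intergenic.
start = {s: 1.0 / 3.0 for s in STATES}
trans = {a: {b: (0.8 if a == b else 0.1) for b in STATES} for a in STATES}
emit = {
    "E": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "I": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "N": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}
path = viterbi("AAAAGGGGG", start, trans, emit)
```

A real gene finder also has to model splice sites, reading frame, and feature lengths, which a plain three-state HMM like this one ignores.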

8 Methods of Testing: Korf used genomes from A. thaliana, O. sativa, C. elegans, and D. melanogaster, which are relatively simple genomes. He compared SNAP to other leading gene finding software, including Genscan, Genefinder, HMMGene, and Augustus, and measured how well each program performed.
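The slide does not say how performance was scored. One common convention in gene prediction is nucleotide-level sensitivity and specificity, where specificity is defined as the fraction of predicted coding bases that are truly coding. The sketch below assumes that convention; the function name and the E/I/N label strings are hypothetical.

```python
def nucleotide_accuracy(true_labels, predicted_labels, coding="E"):
    """Return (sensitivity, specificity) for per-base coding predictions.

    sensitivity = TP / (TP + FN): share of true coding bases that were found.
    specificity = TP / (TP + FP): share of predicted coding bases that are
    correct (the gene-finding usage of the term, assumed here).
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label strings must have equal length")
    tp = fp = fn = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if guess == coding and truth == coding:
            tp += 1
        elif guess == coding:
            fp += 1
        elif truth == coding:
            fn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, specificity

# Toy usage: a made-up annotation and a made-up prediction of equal length.
sn, sp = nucleotide_accuracy("NNEEEEIN", "NEEEENNN")
```

Scoring two programs on the same annotated sequence with a function like this gives directly comparable numbers.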

9 Data Used in Testing
Table 1. Data set characteristics. At: Arabidopsis thaliana; Ce: Caenorhabditis elegans; Dm: Drosophila melanogaster; Os: Oryza sativa.

Genome  Sequence  Genes  GC     Single-exon genes  Mean exon  Mean intron
At      1.89 Mb   631    37.3%  19.8%              230 bp     157 bp
Ce      3.02 Mb   626    36.1%  2.2%               220 bp     334 bp
Dm      3.66 Mb   602    43.6%  24.9%              394 bp     948 bp
Os      1.55 Mb   424    44.5%  22.9%              237 bp     350 bp

10 Performance of SNAP

11 Parameters taken from other species

12 Analysis of parameters that his program used and demonstration of how they would be better suited for new genomes

13 Next Steps: Since he used relatively simple genomes, the next step is to analyze larger genomes to see if he gets similar results. Gene finding is still very difficult, and additional research is needed on how to better estimate HMM parameters.

14 My Opinions: The results were very clear and organized. The program is available free online. The paper needed a better explanation of how the program took results from other programs and used that information. The program needs better documentation so that more people are able to use it and specialize it for specific genomes.

