Eukaryotic Gene Prediction Rui Alves
How are eukaryotic genes different? [Diagram: DNA -> RNA Pol -> mRNA -> Ribosome -> Protein]
How are eukaryotic genes different? [Diagram: DNA -> RNA Pol -> mRNA -> Spliceosome -> mRNA -> Ribosome -> Protein] Correctly identifying splice sites is not a trivial task.
How do we predict splicing sites?
By homology
Ab initio
– SS motifs
– Codon usage
– Exonic Splicing Enhancers
– Intronic Splicing Enhancers
– Exonic Splicing Silencers
– Intronic Splicing Silencers
Homology Splice Site Prediction [Diagram labels: Known spliced gene; Predicted spliced gene]
Splice Site Motifs
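A minimal sketch of motif-based candidate detection, assuming only the canonical GT-AG rule (most introns begin with GT and end with AG). The function name `candidate_introns` and the `min_len` cutoff are illustrative assumptions, not part of the slides; real predictors score longer position-specific motifs around each site rather than bare dinucleotides.

```python
def candidate_introns(seq, min_len=6):
    """Return (start, end) pairs such that seq[start:end] starts with GT and ends with AG."""
    seq = seq.upper()
    # Donor candidates: index of the G in every GT dinucleotide.
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    # Acceptor candidates: index just past the G in every AG dinucleotide.
    acceptors = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    # Keep every donor/acceptor pair that spans at least min_len bases.
    return [(d, a) for d in donors for a in acceptors if a - d >= min_len]
```

Even a short sequence yields several overlapping candidates, which is one reason the dinucleotide rule alone is not enough to call splice sites.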
Exonic Splicing Enhancers
Exonic Splicing Silencers Genes & Development 18:
Interaction between SE and SI
Rules for Splicing
– 3’ end is a likely target for repression
– Distance between SE and 3’ end < 100 bp
– Splicing efficiency p(interaction SEC-3’ end)
Methods for splicing detection
[Diagram: Set of known spliced genes -> Training set of known spliced genes + Test set of known spliced genes; Training set -> Algorithm (GA, NN, HMM, Bayesian, ME); Test set -> Predictions]
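The training/test workflow on this slide can be sketched as follows. This is an illustration under stated assumptions: each example is a hypothetical `(features, label)` pair, the 30% test fraction is arbitrary, and `predict` stands in for whichever trained method (GA, NN, HMM, and so on) is being evaluated.

```python
import random

def split_train_test(examples, test_fraction=0.3, seed=0):
    """Shuffle a copy of the examples and split it into (training, test) lists."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(predict, test_set):
    """Fraction of held-out (features, label) pairs that predict() gets right."""
    correct = sum(1 for features, label in test_set if predict(features) == label)
    return correct / len(test_set)
```

Keeping the test genes out of training is what makes the reported accuracy a fair estimate of performance on unseen sequences.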
A Genetic Algorithm Method
[Table residue: Motif | DM1 … AM i … EM; rows DM1, AM, EM, IM; entries p(i)]
– Shuffle lines and columns k times, each time calculating the probability that a given combination of motifs gets spliced
– Select the m best combinations and continue to evolve the algorithm until it predicts the training set
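A toy sketch of the evolve-and-select loop, not the method from the slide. Assumptions: the motif list `MOTIFS` is hypothetical, a candidate is an on/off mask over those motifs, a sequence is predicted spliced when it contains every switched-on motif, and a single random flip stands in for the shuffle step. Here `k` is the number of variants generated per round and `m` the number of best candidates kept.

```python
import random

MOTIFS = ["GTAAGT", "GAAGAA", "TTTCAG"]  # hypothetical motifs, for illustration only

def predicts_spliced(mask, seq):
    """Predict splicing when seq contains every motif switched on in mask."""
    chosen = [motif for motif, on in zip(MOTIFS, mask) if on]
    return bool(chosen) and all(motif in seq for motif in chosen)

def fitness(mask, training):
    """Fraction of (sequence, is_spliced) pairs that the mask classifies correctly."""
    return sum(predicts_spliced(mask, s) == y for s, y in training) / len(training)

def evolve(training, k=20, m=4, generations=10, seed=0):
    rng = random.Random(seed)
    population = [tuple(rng.random() < 0.5 for _ in MOTIFS) for _ in range(m)]
    for _ in range(generations):
        children = []
        for _ in range(k):
            # Variation step: copy a random parent and flip one motif on or off.
            child = list(rng.choice(population))
            i = rng.randrange(len(MOTIFS))
            child[i] = not child[i]
            children.append(tuple(child))
        # Selection step: keep the m best of parents plus children.
        pool = set(population) | set(children)
        population = sorted(pool, key=lambda c: fitness(c, training), reverse=True)[:m]
        if fitness(population[0], training) == 1.0:
            break  # the best candidate reproduces the whole training set
    return population[0]
```

Because parents compete with their children, the best fitness never decreases from one round to the next.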
A Neural Net Method
[Diagram: Sequences + Weight table for splice elements -> Hidden nodes -> Predicted splicing -> Corrected weight table for splice elements]
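The predict-then-correct cycle in the diagram can be illustrated with a single-layer perceptron. This is a deliberate simplification: it has no hidden nodes, and the feature encoding (1 if a splice element is present in the sequence, 0 otherwise) is an assumption made for the example, not something taken from the slide.

```python
def predict(weights, bias, x):
    """Return 1 (spliced) if the weighted sum of element indicators is positive, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def train_perceptron(samples, n_features, epochs=20, lr=1.0):
    """Learn one weight per splice element with the perceptron update rule."""
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for x, y in samples:
            err = y - predict(weights, bias, x)  # 0 when the prediction is right
            # Correct the weight table only when the prediction was wrong.
            weights = [w + lr * err * xi for w, xi in zip(weights, x)]
            bias += lr * err
    return weights, bias
```

For example, with hypothetical features ordered as (donor motif, enhancer, silencer), a positive learned weight marks an element that favours splicing and a negative weight one that represses it.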
Summary
– Eukaryotic genes have exons
– Biological rules combined with mathematical and statistical approaches can be used to predict exon boundaries and splice variants
How to find what genes a string of DNA contains Rui Alves
Simple steps
– Go to a known gene prediction server (or google for one)
– Input the sequence and wait for the prediction
– Get the prediction(s), either as cDNA or as a translated protein sequence, and do homology searches to identify them in a known database (e.g. NCBI or SWISSPROT)
Simple steps a) Go to a known gene prediction server (or google for one) Input sequence and wait for prediction Get prediction(s), either as cDNA or as a translated protein sequence and do homology searches to identify them
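If the server returns only cDNA, the translated protein sequence for the homology search can be produced locally. A minimal sketch, assuming the standard genetic code, translation in frame 1 from the first base, and a hypothetical helper name `translate`; unknown codons are reported as X.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position (* = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(cdna):
    """Translate a cDNA string in frame 1, stopping at the first stop codon."""
    cdna = cdna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cdna) - 2, 3):
        aa = CODON_TABLE.get(cdna[i:i + 3], "X")  # X marks an ambiguous codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

The resulting protein string can then be pasted into a protein homology search against the chosen database.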
Paper Presentation
The human genome (Science) vs. The human genome (Nature)
Nature: Pages 875 to 901
Science: Pages
– Compare the differences in methods and results for the annotation
– DO NOT SPEND TIME TALKING ABOUT THE SEQUENCING OR ASSEMBLY ITSELF
– Do not go into the comparative genome analysis