Presentation is loading. Please wait.

Presentation is loading. Please wait.

Annotation of eukaryotic genomes

Similar presentations


Presentation on theme: "Annotation of eukaryotic genomes"— Presentation transcript:

1 Annotation of eukaryotic genomes
Genomic DNA ab initio gene prediction transcription Unprocessed RNA RNA processing Mature mRNA Gm3 AAAAAAA Comparative gene prediction translation Nascent polypeptide folding Active enzyme Functional identification Function Reactant A Product B

2 Genome analysis overview: C.elegans

3 Gene finding: ab initio
What features of a ORF can we use? Size - large open reading frames DNA composition - codon usage / 3rd position codon bias Other features: Kozak sequence CCGCCAUGG Ribosome binding sites Termination signal (stops) Splice junction boundaries

4 Gene finding: comparative
Use knowledge of known coding sequences to identify region of genomic DNA by similarity transcribed DNA sequence peptide sequence related genomic sequence

5 Annotation of eukaryotic genomes
Genomic DNA ab initio gene prediction transcription Unprocessed RNA RNA processing Mature mRNA Gm3 AAAAAAA Comparative gene prediction translation Nascent polypeptide folding Active enzyme Functional identification Function Reactant A Product B

6

7 Artemis display for S.pombe cosmid

8

9

10 Methods for searching Needleman & Wunsch - global alignment
Pairwise alignments: matching a query sequence against a database of subject sequences Needleman & Wunsch - global alignment Smith-Waterman - local alignment FastA BLAST Others: SSAHA, WABA see Chapter 7 Developing Bionformatics Computer Skills

11 BLAST - local similarity searches
BLAST (Basic Local Alignment Search Tool) is the workhorse of genome annotation due to it’s early optimisation for the UNIX platform Underlies most of the web-based servers world-wide Comes in many flavours: BLASTN - DNA against DNA BLASTX - DNA against Protein BLASTP - Protein against Protein TBLASTN - Protein against DNA TBLASTX - DNA against DNA at the peptide level

12 BLAST - results BLAST returns high-scoring pairs (HSPs) with a score and p-value. Blast output files can be large and difficult to interpret. Hence we need tools to make sense of the data - both to filter/process the file and to visualise the resulting multiple sequence alignments. MSPcrunch - a post-processor for BLAST with a number of different output types. BioPerl - modules for handling sequences and BLAST output

13 Standard similarity searches for first-pass annotation
genomic DNA v transcript data BLASTN / EST_GENOME TBLASTX genomic DNA v genomic DNA BLASTN genomic DNA v non-redundant protein data BLASTX

14 Data for gene prediction
EST/mRNA - intra-species matches TBLASTX - inter-species matches BLASTX - intra-species matches BLASTX - inter-species matches Coding measures - genefinder, hexamer Splice sites - consensus sequences

15

16 Multiple Sequence alignments in ACEDB

17 Manual review of gene predictions
Check concordance with transcript data Check concordance with peptide similarity data Check splice site usage (intron / exon boundaries) Set of human appraised gene predictions. The translations of the CDS sequences are used for protein feature analysis and initial assignment (ID, function)

18


Download ppt "Annotation of eukaryotic genomes"

Similar presentations


Ads by Google