Analysis of the RNAseq Genome Annotation Assessment Project by Subhajyoti De.


1 Analysis of the RNAseq Genome Annotation Assessment Project by Subhajyoti De

2 The RNAseq Genome Annotation Assessment Project
Introduction and a summary of submissions; Analysis methodology; Nucleotide level analysis; Exon level analysis; Transcript level analysis; Missing and wrong genes.
The RGASP aims to assess the current progress of automatic gene building using RNAseq as its primary dataset. More specifically, we aim to evaluate the status of computational methods to map human RNAseq data, assemble the reads into transcripts and quantify the abundance of those transcripts in particular datasets. Promising transcript predictions not covered by the Gencode annotation will be validated by experimental methods.

3 Introduction and a summary of submissions
Three species: human, worm and fly. Multiple RNA-seq datasets for each organism. 15 submitters. 304 submissions.

4 Analysis methodology
1. We carried out independent evaluations for the coding portions of the mRNA transcripts (CDS-focused) and for the mRNA transcripts as a whole (mRNA-focused).
2. Analysis was carried out at multiple levels: nucleotide level, exon level and transcript level.
3. For each of the levels, we calculated the sensitivity and specificity of the predictions (as discussed later). As a summary measure we also reported the average of the two statistics.

5 Nucleotide level analysis
Diagram labels: Annotation set, Prediction set; True positives, False positives, False negatives.
Sensitivity = (number of annotated nucleotides correctly predicted) / (number of annotated nucleotides in the annotation set)
Specificity = (number of predicted nucleotides that are also annotated) / (number of predicted nucleotides in the prediction set)

6 Nucleotide level analysis
Points to note:
1. Nucleotide predictions had to be on the same strand as the annotations to be considered correct.
2. Individual nucleotides present in multiple transcripts in either the annotation or the predictions are counted only once.
3. As a summary measure, we also calculated the arithmetic average of specificity and sensitivity.
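The nucleotide-level definitions can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the evaluation code used in the project: the interval representation `(chromosome, strand, start, end)` with 1-based inclusive coordinates and the function names are hypothetical.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: (chromosome, strand, start, end), 1-based inclusive.
Interval = Tuple[str, str, int, int]
Position = Tuple[str, str, int]


def covered_positions(intervals: Iterable[Interval]) -> Set[Position]:
    """Collapse intervals into a set of stranded positions, so a nucleotide
    shared by several transcripts is counted only once."""
    positions: Set[Position] = set()
    for chrom, strand, start, end in intervals:
        for pos in range(start, end + 1):
            positions.add((chrom, strand, pos))
    return positions


def nucleotide_stats(annotation: Iterable[Interval], prediction: Iterable[Interval]):
    """Return (sensitivity, specificity, arithmetic average of the two)."""
    annotated = covered_positions(annotation)
    predicted = covered_positions(prediction)
    # Strand is part of each position, so opposite-strand predictions never match.
    true_positives = len(annotated & predicted)
    sensitivity = true_positives / len(annotated) if annotated else 0.0
    specificity = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, specificity, (sensitivity + specificity) / 2


# Toy data: two overlapping annotated exons, one partly correct prediction
# and one prediction on the wrong strand.
annotation = [("chr1", "+", 1, 100), ("chr1", "+", 51, 100)]
prediction = [("chr1", "+", 51, 150), ("chr1", "-", 1, 50)]
sn, sp, avg = nucleotide_stats(annotation, prediction)
```

Enumerating every position is wasteful for whole genomes; an interval-merging approach would scale better, but the set-based form keeps the "count each nucleotide once" rule easy to see.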

7 Nucleotide level analysis (H. sapiens) [results figure; surviving label: 93.308]

8 Nucleotide level analysis (D. melanogaster)

9 Nucleotide level analysis (C. elegans)

10 Exon level analysis
Diagram labels: Annotation set, Prediction set; True positives, False positives, False negatives.
Sensitivity = (number of annotated exons correctly predicted) / (number of annotated exons in the annotation set)
Specificity = (number of predicted exons that are also annotated) / (number of predicted exons in the prediction set)

11 Exon level analysis
Points to note:
1. An exon in the prediction must have identical start and end coordinates, and the same strand, as an exon in the annotation to be counted correct.
2. If an exon is present in multiple transcripts in either the annotation or the predictions, it is counted only once.
3. As a summary measure, we also calculated the arithmetic average of specificity and sensitivity.
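A minimal sketch of the exon-level comparison follows, assuming each transcript is given as a list of `(chromosome, strand, start, end)` exon tuples. The representation and the name `exon_stats` are illustrative assumptions, not part of the original evaluation pipeline.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: (chromosome, strand, start, end).
Exon = Tuple[str, str, int, int]


def exon_stats(annotation_transcripts: Iterable[List[Exon]],
               prediction_transcripts: Iterable[List[Exon]]):
    """Return (sensitivity, specificity, average) at the exon level."""
    # Sets make an exon shared by several transcripts count only once.
    annotated = {exon for tx in annotation_transcripts for exon in tx}
    predicted = {exon for tx in prediction_transcripts for exon in tx}
    # Tuple equality enforces identical start, end and strand.
    correct = annotated & predicted
    sensitivity = len(correct) / len(annotated) if annotated else 0.0
    specificity = len(correct) / len(predicted) if predicted else 0.0
    return sensitivity, specificity, (sensitivity + specificity) / 2


# Toy data: three distinct annotated exons, four distinct predicted exons,
# two of which match an annotated exon exactly.
annotation_transcripts = [
    [("chr1", "+", 100, 200), ("chr1", "+", 300, 400)],
    [("chr1", "+", 100, 200), ("chr1", "+", 500, 600)],
]
prediction_transcripts = [
    [("chr1", "+", 100, 200), ("chr1", "+", 300, 410)],
    [("chr1", "+", 500, 600), ("chr1", "+", 700, 800)],
]
exon_sn, exon_sp, exon_avg = exon_stats(annotation_transcripts, prediction_transcripts)
```

In this toy case the exon ending at 410 is rejected because its end coordinate differs from the annotated 400, which is the strictness the first point above describes.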

12 Exon level analysis (H. sapiens)

13 Exon level analysis (D. melanogaster)

14 Exon level analysis (C. elegans)

15 Transcript level analysis
Diagram labels: Annotation set, Prediction set; True positives, False positives, False negatives.
Sensitivity = (number of annotated transcripts correctly predicted) / (number of annotated transcripts in the annotation set)
Specificity = (number of predicted transcripts that are also annotated) / (number of predicted transcripts in the prediction set)

16 Transcript level analysis
Points to note:
1. We consider a transcript accurately predicted if the number of exons in the transcript and their boundaries match exactly between the annotation and the prediction.
2. For the CDS-focused evaluation, we consider a transcript correctly predicted if the beginning and end of translation are correctly annotated and each of the 5' and 3' splice sites of the coding exons is correct.
3. For the mRNA-focused evaluation, a transcript is counted correct if all of the exons from the start of transcription to the end of transcription match perfectly between the annotation and prediction sets.
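The strict transcript-level rule can be illustrated as follows. This is a sketch under assumptions: a transcript is represented as `(chromosome, strand, list of (start, end) exons)`, identical transcripts are collapsed into one (the slides state this only for nucleotides and exons), and all names are hypothetical. For a CDS-focused run the same function would be fed only the coding portion of each exon.

```python
from typing import Iterable, List, Tuple

ExonCoords = Tuple[int, int]
# Hypothetical representation: (chromosome, strand, exons).
Transcript = Tuple[str, str, List[ExonCoords]]


def transcript_key(chrom: str, strand: str, exons: List[ExonCoords]):
    """Canonical, hashable form: exon order in the input does not matter."""
    return (chrom, strand, tuple(sorted(exons)))


def transcript_stats(annotation: Iterable[Transcript], prediction: Iterable[Transcript]):
    """Return (sensitivity, specificity) under exact exon-chain matching."""
    # Assumption: duplicate transcripts are counted once.
    annotated = {transcript_key(*tx) for tx in annotation}
    predicted = {transcript_key(*tx) for tx in prediction}
    correct = annotated & predicted  # same exon count and identical boundaries
    sensitivity = len(correct) / len(annotated) if annotated else 0.0
    specificity = len(correct) / len(predicted) if predicted else 0.0
    return sensitivity, specificity


annotated_txs = [
    ("chr1", "+", [(100, 200), (300, 400)]),
    ("chr1", "+", [(100, 200), (500, 600)]),
]
predicted_txs = [
    ("chr1", "+", [(300, 400), (100, 200)]),  # exact match, exons listed out of order
    ("chr1", "+", [(100, 200), (500, 610)]),  # last boundary off by 10 bp
    ("chr1", "-", [(100, 200), (300, 400)]),  # wrong strand
]
tx_sn, tx_sp = transcript_stats(annotated_txs, predicted_txs)
```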

17 Transcript level analysis: Human (CDS-focused)

18 Relaxed transcript level analysis
Diagram labels: Annotation set, Prediction set; True positives, False positives, False negatives.
Sensitivity = (number of annotated transcripts correctly predicted) / (number of annotated transcripts in the annotation set)
Specificity = (number of predicted transcripts that are also annotated) / (number of predicted transcripts in the prediction set)

19 Relaxed transcript level analysis
Points to note:
1. We consider a transcript 'accurately' predicted if the number of exons in the transcript matches exactly between the annotation and the prediction, and their boundaries differ by no more than 5 bp.
2. All other criteria remain the same as for the transcript level analysis.
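With a tolerance on exon boundaries, set intersection no longer works and transcripts have to be compared pairwise. The sketch below assumes the same hypothetical `(chromosome, strand, exons)` representation and assumes that a transcript counts as correct when at least one transcript in the other set matches it; the slides do not spell out that pairing rule.

```python
from typing import List, Sequence, Tuple

ExonCoords = Tuple[int, int]
Transcript = Tuple[str, str, Sequence[ExonCoords]]

TOLERANCE_BP = 5  # maximum allowed boundary difference, from the relaxed rule


def relaxed_match(a: Transcript, b: Transcript, tol: int = TOLERANCE_BP) -> bool:
    """Same chromosome, strand and exon count; every boundary within tol bp."""
    chrom_a, strand_a, exons_a = a
    chrom_b, strand_b, exons_b = b
    if (chrom_a, strand_a) != (chrom_b, strand_b) or len(exons_a) != len(exons_b):
        return False
    return all(
        abs(start_a - start_b) <= tol and abs(end_a - end_b) <= tol
        for (start_a, end_a), (start_b, end_b) in zip(sorted(exons_a), sorted(exons_b))
    )


def relaxed_stats(annotation: List[Transcript], prediction: List[Transcript],
                  tol: int = TOLERANCE_BP):
    """Return (sensitivity, specificity) under relaxed matching."""
    found = sum(any(relaxed_match(a, p, tol) for p in prediction) for a in annotation)
    confirmed = sum(any(relaxed_match(a, p, tol) for a in annotation) for p in prediction)
    sensitivity = found / len(annotation) if annotation else 0.0
    specificity = confirmed / len(prediction) if prediction else 0.0
    return sensitivity, specificity


annotated_relaxed = [("chr1", "+", [(100, 200), (300, 400)])]
predicted_relaxed = [
    ("chr1", "+", [(103, 200), (300, 405)]),  # all boundaries within 5 bp
    ("chr1", "+", [(100, 200), (300, 406)]),  # one boundary off by 6 bp
]
relaxed_sn, relaxed_sp = relaxed_stats(annotated_relaxed, predicted_relaxed)
```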

20 Very relaxed transcript level analysis
Diagram labels: Annotation set, Prediction set; True positives, False positives, False negatives.
Sensitivity = (number of annotated transcripts correctly predicted) / (number of annotated transcripts in the annotation set)
Specificity = (number of predicted transcripts that are also annotated) / (number of predicted transcripts in the prediction set)

21 Very relaxed transcript level analysis: Worm (exon-focused)
Points to note:
1. We consider a transcript 'accurately' predicted if (a) the number of exons in the transcript differs by no more than two (terminal exons only) between the annotation and the prediction, and (b) the boundaries of all equivalent exons differ by no more than 5 bp between the annotation and the prediction.
2. All other criteria remain the same as for the transcript level analysis.
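One possible reading of the very relaxed rule is sketched below. It assumes that "differ by no more than two (terminal exons only)" means the shorter exon chain must align with a contiguous stretch of the longer one, with at most two exons in total left over at the ends. That interpretation, the `(chromosome, strand, exons)` representation and the function name are all assumptions made for illustration.

```python
from typing import Sequence, Tuple

ExonCoords = Tuple[int, int]
Transcript = Tuple[str, str, Sequence[ExonCoords]]


def very_relaxed_match(a: Transcript, b: Transcript,
                       tol: int = 5, max_terminal_diff: int = 2) -> bool:
    """True if the exon chains agree within tol bp, allowing up to
    max_terminal_diff exons to be missing from the ends of one chain."""
    chrom_a, strand_a, exons_a = a
    chrom_b, strand_b, exons_b = b
    if (chrom_a, strand_a) != (chrom_b, strand_b):
        return False
    shorter, longer = sorted((sorted(exons_a), sorted(exons_b)), key=len)
    extra = len(longer) - len(shorter)
    if not shorter or extra > max_terminal_diff:
        return False
    # Slide the shorter chain along the longer one; only terminal exons of the
    # longer chain may be left unmatched, internal exons must all pair up.
    for offset in range(extra + 1):
        window = longer[offset:offset + len(shorter)]
        if all(abs(s1 - s2) <= tol and abs(e1 - e2) <= tol
               for (s1, e1), (s2, e2) in zip(shorter, window)):
            return True
    return False


annotated_tx = ("chrI", "+", [(100, 200), (300, 400), (500, 600), (700, 800)])
missing_both_ends = ("chrI", "+", [(302, 400), (500, 598)])
missing_internal = ("chrI", "+", [(100, 200), (500, 600), (700, 800)])

ends_ok = very_relaxed_match(annotated_tx, missing_both_ends)
internal_ok = very_relaxed_match(annotated_tx, missing_internal)
```

Here `ends_ok` is true because only the first and last annotated exons are unmatched, while `internal_ok` is false because the missing exon is internal.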

22 Missing and wrong genes
'Missing exons' (MEs): annotated exons that do not overlap any predicted exon by at least 1 bp.
'Wrong exons' (WEs): predicted exons that do not overlap any annotated exon by at least 1 bp.
Diagram labels: Annotation set, Prediction set; Missed exons, Wrong exons.
Wrong exons that are predicted independently by more than two predictors are recorded, and some of them will be tested experimentally.
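The missing and wrong exon definitions reduce to an overlap test. The sketch below assumes exons are `(chromosome, strand, start, end)` tuples with 1-based inclusive coordinates and that overlap is only checked on the same strand; the slides do not state the strand rule for this analysis, and the function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: (chromosome, strand, start, end), 1-based inclusive.
Exon = Tuple[str, str, int, int]


def overlaps(a: Exon, b: Exon) -> bool:
    """True if the two exons share at least 1 bp on the same chromosome and strand."""
    return a[0] == b[0] and a[1] == b[1] and a[2] <= b[3] and b[2] <= a[3]


def missing_and_wrong(annotated: List[Exon], predicted: List[Exon]):
    """Return (missing exons, wrong exons) as two lists."""
    missing = [a for a in annotated if not any(overlaps(a, p) for p in predicted)]
    wrong = [p for p in predicted if not any(overlaps(p, a) for a in annotated)]
    return missing, wrong


annotated_exons = [("chr1", "+", 100, 200), ("chr1", "+", 300, 400)]
predicted_exons = [("chr1", "+", 200, 250), ("chr1", "+", 500, 600)]
missing, wrong = missing_and_wrong(annotated_exons, predicted_exons)
```

The all-against-all comparison is quadratic; for genome-scale inputs a sorted sweep or an interval tree would be the usual replacement.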

23 Missing and wrong genes
Diagram labels: Annotation set, Prediction set; Dubious wrong exons.
'Dubious wrong exons' (WEs) that are predicted independently by more than two predictors are reported.
[Screenshot of the list of dubious wrong exons.]
15704 dubious wrong exons in the whole human genome. 17678 dubious wrong exons in the whole worm genome.
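Selecting the wrong exons supported by more than two predictors could look like the sketch below. It assumes that two predictors agree on a wrong exon only when chromosome, strand and both coordinates are identical, which the slides do not specify; the predictor labels and function name are invented for the example.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Exon = Tuple[str, str, int, int]


def shared_wrong_exons(wrong_by_predictor: Dict[str, Iterable[Exon]],
                       min_predictors: int = 3) -> List[Exon]:
    """Wrong exons reported by at least min_predictors predictors.
    'More than two predictors' corresponds to min_predictors=3."""
    counts: Counter = Counter()
    for exons in wrong_by_predictor.values():
        counts.update(set(exons))  # each predictor votes once per exon
    return sorted(exon for exon, n in counts.items() if n >= min_predictors)


# Hypothetical predictor labels and wrong-exon calls.
wrong_by_predictor = {
    "predictor_A": [("chr1", "+", 500, 600), ("chr2", "-", 10, 90)],
    "predictor_B": [("chr1", "+", 500, 600), ("chr2", "-", 10, 90)],
    "predictor_C": [("chr1", "+", 500, 600), ("chr1", "+", 500, 600)],
}
dubious = shared_wrong_exons(wrong_by_predictor)
```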

24 Acknowledgement
Jen Harrow, Felix Kokocinski, Tim Hubbard, the RGASP community.

