closing in on the set of human genes. The ENCODE project.

1 closing in on the set of human genes. The ENCODE project.
Roderic Guigó i Serra. Bioinformatics, UPF, course 2004/2005. 2/23/2019. Bioinformatics UPF, March 2005

3 gene number estimates (ii) the genome era
from Harrison et al. (2002)

4 gene classes underrepresented in the current gene sets
- intronless genes
- fast evolving genes
- genes with atypical coding content
- low or rare transcripts
- transcripts of unknown function (TUFs)
- genes undergoing non-canonical splicing
- selenoproteins

5 fast evolving genes
4-helical cytokine family: conservation of exonic structure in the absence of sequence conservation

6 non-canonical splicing
Two major exceptions to the almost universal U2 GT-AG rule:
- U2 GC-AG introns
- U12 AT-AC introns
But there is a plethora of other minor exceptions.
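The boundary rule above can be sketched as a small check on the first and last two bases of an intron. This is only an illustrative sketch: the function name `classify_intron` and the returned labels are hypothetical, and terminal dinucleotides alone do not establish which spliceosome (U2 or U12) processes an intron.

```python
def classify_intron(intron_seq: str) -> str:
    """Label an intron by its terminal dinucleotides (DNA alphabet, 5'->3').

    Hypothetical helper for illustration; it looks only at the boundaries
    and says nothing about the U2 versus U12 spliceosome.
    """
    seq = intron_seq.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to have both boundaries")
    donor, acceptor = seq[:2], seq[-2:]
    known_boundaries = {
        ("GT", "AG"): "GT-AG (canonical)",
        ("GC", "AG"): "GC-AG",
        ("AT", "AC"): "AT-AC",
    }
    # Anything else falls into the many minor exceptions.
    return known_boundaries.get((donor, acceptor), "other")


if __name__ == "__main__":
    for example in ("GTAAGTTTTCAG", "GCAAGTTTTCAG", "ATATCCTTTTAC"):
        print(example, "->", classify_intron(example))
```

A gene finder that only accepts GT-AG boundaries would miss every intron this function labels otherwise, which is one reason such genes are underrepresented.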

7 SelU: a novel selenoprotein family, Castellano et al., EMBO reports 2004
Selenoproteins are proteins that incorporate the amino acid selenocysteine (Sec), the 21st amino acid. Sec is encoded by UGA. Recoding of UGA is mediated by the SECIS element.
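A minimal sketch of why selenoproteins are mispredicted: with the standard genetic code UGA ends translation, so the protein is truncated unless that codon is recoded as Sec (one-letter code U). The function `translate` and its `recoded_codons` parameter are hypothetical; in this sketch the caller decides which UGA codons are recoded, standing in for the detection of a SECIS element, which is not modelled here.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(mrna: str, recoded_codons=frozenset()) -> str:
    """Translate an mRNA from its first base up to the first stop codon.

    recoded_codons: 0-based codon indices whose UGA is read as Sec ('U').
    This set is an assumption supplied by the caller, not a SECIS search.
    """
    seq = mrna.upper().replace("U", "T")
    protein = []
    for index in range(len(seq) // 3):
        codon = seq[3 * index: 3 * index + 3]
        if codon == "TGA" and index in recoded_codons:
            protein.append("U")  # selenocysteine
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":
            break  # ordinary stop codon
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    message = "AUGUGAGGCUAA"
    print(translate(message))                      # UGA read as stop
    print(translate(message, recoded_codons={1}))  # UGA read as Sec
```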

8 alternative splicing

9 transcription not associated with known genes

12 ENCODE pilot phase: 1% of the genome, 44 regions
Target selection: a committee selected the sequence targets.
- manual targets: regions with a lot of existing information
- random targets: stratified by non-exonic conservation with mouse and by gene density

14 gene prediction in ENCODE: a collaboration between HAVANA and ENCODE
GOAL: identify all protein coding genes in the ENCODE regions
- Roderic Guigó, IMIM
- Stylianos Antonarakis, Alexandre Reymond, Geneva
- Ewan Birney, EBI
- Michael Brent, WashU
- Lior Pachter, Berkeley
- Manolis Dermitzakis, Jennifer Ashurst, Tim Hubbard, Sanger

15 experimental validation of genes annotated in VEGA
13 first regions. [Figure: validation workflow; counts shown: 138, 49, 6]
- experimental validation of the annotated single-exon objects
- 5' RACEs to obtain full-length mRNA(s)
- RT-PCRs to check the 99 junctions: 59 done => 9 positive; 40 in process
- bidirectional RACEs to obtain full-length mRNAs
The remaining steps are marked as in process.

16 13 first regions annotated in VEGA
- 1 to 34 transcripts per locus (34: RP11-353C18.2, RNPC2, ENr333); values shown: 6.86 and 1.67; whole genome (known): 1.68
- 1 to 44 exons per transcript (44: RP11-167N, NUP188, ENr232); values shown: 2.51 and 7.59; whole genome (known): 9.65

17 experimental validation of genes annotated in VEGA
99 RT-PCRs performed to check introns from 49 novel/putative transcripts: results are in for 59 RT-PCRs (9 positive); the other 40 RT-PCRs are in process.

18 gene predictions outside of VEGA
Gene predictions from 6 computational gene prediction programs and 3 EST-based methods.

19 Gene predictions outside of VEGA annotations
In 13 ENCODE regions, 1255 unique predicted introns (by one or more of the 9 methods) are not annotated in VEGA:
- 380 (30%) extend VEGA objects (1)
- 530 (42%) are in introns of VEGA objects (2)
- 11 (1%) link exons from distinct VEGA objects (3)
- 334 (27%) are completely outside of VEGA annotations (4)
[Figure: VEGA annotations versus predictions, illustrating cases (1) to (4)]
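The four cases can be approximated by asking where the two ends of a predicted intron fall relative to annotated objects. The sketch below is an assumption about how such a classification might be done, not the procedure used for the slide: objects are reduced to non-overlapping inclusive (start, end) spans, exon structure is ignored, and the names `locate` and `classify_predicted_intron` are hypothetical.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # inclusive (start, end) coordinates


def locate(position: int, objects: List[Span]) -> Optional[int]:
    """Return the index of the annotated object containing position, if any."""
    for index, (start, end) in enumerate(objects):
        if start <= position <= end:
            return index
    return None


def classify_predicted_intron(intron: Span, objects: List[Span]) -> str:
    """Assign a predicted intron to one of the four cases on the slide.

    Simplification: only the two intron ends are inspected, so an intron
    that jumps over a whole object is still reported as "outside".
    """
    left = locate(intron[0], objects)
    right = locate(intron[1], objects)
    if left is None and right is None:
        return "outside"   # case (4)
    if left is None or right is None:
        return "extends"   # case (1)
    if left == right:
        return "within"    # case (2)
    return "links"         # case (3)


if __name__ == "__main__":
    annotated = [(100, 500), (1000, 1500)]
    for candidate in [(50, 150), (200, 300), (400, 1100), (600, 800)]:
        print(candidate, classify_predicted_intron(candidate, annotated))
```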

20 Gene predictions outside of VEGA annotations
RT-PCR on intron junctions (exon pairs):
- 1255 predicted intron junctions tested
- 44 successfully amplified (but 20 gave intron lengths different from those expected)
- only 15 of the 44 are in new loci, and only 5 do not overlap pseudogenes
Overall, only about 3.5% tested positive, and as little as 0.5% may correspond to novel genes.
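The percentages follow directly from the counts on the slide; a short check, with a hypothetical helper `percent`, makes the arithmetic explicit. Note that 5 out of 1255 comes to about 0.4%, which the slide quotes as roughly 0.5%.

```python
def percent(part: int, whole: int) -> float:
    """Percentage of whole represented by part, rounded to one decimal."""
    return round(100 * part / whole, 1)


TESTED = 1255         # predicted intron junctions tested by RT-PCR
AMPLIFIED = 44        # successfully amplified
IN_NEW_LOCI = 15      # amplified junctions located in new loci
NOT_PSEUDOGENE = 5    # of those, not overlapping pseudogenes

if __name__ == "__main__":
    print("positive:", percent(AMPLIFIED, TESTED), "%")
    print("in new loci:", percent(IN_NEW_LOCI, TESTED), "%")
    print("candidate novel genes:", percent(NOT_PSEUDOGENE, TESTED), "%")
```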

21 chimeras

22 KUA and UEV, Thomson et al., Genome Research 2000

23 EST based prediction of chimeras
                                          human    mouse
total non-overlapping                    14,959   15,106
adjacent in the same orientation          7,679    7,865
linked by ESTs maintaining the ORF           56       37
including no new intervening exons           42       26
RT-PCR positive: 11

24 systematic search for functional chimeras in ENCODE
- 321 non-overlapping transcripts.
- 165 adjacent pairs in the same orientation.
- force GENEID to predict single complete transcripts spanning the two genes.
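The first two steps, going from non-overlapping transcripts to adjacent pairs in the same orientation, can be sketched as a sort followed by a neighbour comparison. The `Transcript` record and the function name are hypothetical, and the example coordinates are invented for illustration; the GENEID step is not modelled.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Tuple


@dataclass(frozen=True)
class Transcript:
    name: str
    region: str   # e.g. an ENCODE region identifier
    start: int
    end: int
    strand: str   # "+" or "-"


def adjacent_same_strand_pairs(
    transcripts: List[Transcript],
) -> List[Tuple[Transcript, Transcript]]:
    """Return neighbouring, non-overlapping transcripts on the same strand."""
    pairs = []
    ordered = sorted(transcripts, key=lambda t: (t.region, t.start))
    for _, group in groupby(ordered, key=lambda t: t.region):
        members = list(group)
        # Compare each transcript with its immediate downstream neighbour.
        for left, right in zip(members, members[1:]):
            if left.end < right.start and left.strand == right.strand:
                pairs.append((left, right))
    return pairs


if __name__ == "__main__":
    toy = [
        Transcript("A", "region1", 100, 200, "+"),
        Transcript("B", "region1", 300, 400, "+"),
        Transcript("C", "region1", 500, 600, "-"),
        Transcript("D", "region2", 100, 200, "-"),
        Transcript("E", "region2", 250, 300, "-"),
    ]
    for left, right in adjacent_same_strand_pairs(toy):
        print(left.name, right.name)
```

Each resulting pair is a candidate for the third step, where a gene finder is asked for a single transcript covering both genes.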

25 126 predictions obtained, 98 tested, 4 positives

26 one example

27 one novel transcript appears to produce a chimeric form with a known gene
[Figure: novel transcript (RP4-614O4.5) and known gene (ITGB4BP) in ENr333, drawn 5' to 3'; junction validated by RT-PCR]

28 chimeric genes
- results in the ENCODE regions indicate that chimerism could affect at least 5% of tandem human genes.
- chimerism could be a means to create additional gene diversity.
- it challenges the concept of the gene (a more dynamic view of the genome).
- we need to validate their functional meaning: proteomics data, comparative genomics analysis.
- after learning from the ENCODE regions, extrapolate to the whole genome.

29 http://genome.imim.es/gencode
- IMIM (Barcelona): Roderic Guigo, France Denoeud, Julien Lagarde, Eduardo Eyras, Jan-Jaap Wesselink, Robert Castelo, Genis Parra, Noura Dabouseh
- University of Geneva: Stylianos Antonarakis, Alexandre Reymond, Catherine Ucla
- EBI: Ewan Birney, Damian Keefe
- Washington University: Michael Brent, Michael Stevens
- Berkeley: Lior Pachter, Bernd Sturmfels, Nicolas Bray, Marta Casanellas, Sourav Chatterji, Colin Dewey, Mathias Drton, Nicholas Eriksson, Sagi Snir
- The Wellcome Trust Sanger Institute, Population and comparative genomics: Manolis Dermitzakis
- The Wellcome Trust Sanger Institute, Informatics (HAVANA annotation group): Jennifer Ashurst, Tim Hubbard, Adam Frankish, David Swarbreck, James Gilbert

