
1 SIGENAE probe annotation pipeline
EADGENE and SABRE Post-Analyses Workshop, 12-14 November 2008, Animal Sciences Group, Wageningen UR, Lelystad
This publication represents the views of the Authors, not the EC. The EC is not liable for any use that may be made of the information.

2 Probe annotation
● Who are we?
● What is re-annotation about?
● Chicken oligos: initial state, new annotation, differences between both annotations
● Pig cDNAs: initial state, new annotation, differences between both annotations

3 Who are we?
Engineers in bioinformatics

4 What is our job?
● EST clustering
● Clone annotation
● Clone selection
● Microarray data storage

5 What is re-annotation about?
● Knowledge about transcripts is evolving.
● Gene-related annotation is evolving (homologous genes, GO, pathways, ...).
=> The initial probe annotation has to evolve.

6 Chicken oligos: initial state
● More than 100 columns in the Chicken_20K_annot.csv file, which is available on the ark-genomics FTP site.
● Focus on the Ensembl gene ID: the Ensembl gene used for the oligo design (Ensembl 30), and a BLAST search of the design sequence against the Ensembl 42 genebuild.
● The oligo subset: 791 oligos.

7 Chicken oligos: initial state
● Ensembl gene used for the oligo design: 571
● BLAST search of the design sequence against the Ensembl 42 genebuild: 552 hits related to a gene
● New file: the Ensembl 42 BLAST result by default, and the Ensembl 30 design gene when no other information is available. 670 Ensembl genes are linked to an oligo.
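The merge rule on this slide (Ensembl 42 BLAST gene by default, Ensembl 30 design gene otherwise) can be sketched as a dictionary merge. This is a minimal Python illustration, not the SIGENAE code; it assumes both sources are available as hypothetical mappings from oligo ID to Ensembl gene ID, with None when a source gives no gene.

```python
from typing import Dict, Optional


def merge_gene_links(
    blast_ens42: Dict[str, Optional[str]],
    design_ens30: Dict[str, Optional[str]],
) -> Dict[str, str]:
    """Link each oligo to one Ensembl gene.

    The Ensembl 42 BLAST gene is preferred; the Ensembl 30 design gene is
    used only when the BLAST search gave no gene for that oligo.
    """
    merged: Dict[str, str] = {}
    for oligo in set(blast_ens42) | set(design_ens30):
        gene = blast_ens42.get(oligo) or design_ens30.get(oligo)
        if gene:  # oligos with no gene in either source are left out
            merged[oligo] = gene
    return merged


# Hypothetical example data (placeholder IDs, not real Ensembl genes).
links = merge_gene_links(
    {"oligo1": "GENE_A", "oligo2": None},
    {"oligo1": "GENE_X", "oligo2": "GENE_B", "oligo3": "GENE_C"},
)
print(sorted(links.items()))
print(len(set(links.values())), "distinct genes linked to an oligo")
```

Counting the distinct values of the merged mapping is one way to obtain a "genes linked to an oligo" figure such as the one quoted above.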

8 What is our strategy to assign a gene name?
● We try to associate a gene name with each oligo,
● and we assign a quality to this association (oligo-gene).
● Databases chosen: Ensembl 50 genebuild (Jul 2008), Gga UniGene build 41 (Sep 2008), Swiss-Prot (Oct 2008)

9 Gene name annotation pipeline
Oligos -> BLAST against the Ensembl genebuild -> function (gene) & location
Oligos -> BLAST against Swiss-Prot -> function (protein)
Oligos -> BLAST against UniGene -> function (gene) & location

10 Gene name annotation pipeline
[Figure labels: exon, extended UTR, intron]

11 Alignment quality

12 Kane criteria
[Figure: example alignments labelled 80% similarity, contiguous block of 15, contiguous block of 20, 74% similarity]

13 Alignment result examples

14 Classes based on transcript alignment
● Hit => > 75% similarity
● Noise => contiguous block of identical bases > 15
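The two classes can be computed from a pairwise alignment. This sketch assumes the oligo and the transcript segment are already aligned to the same length (gaps written as '-') and measures similarity as the percentage of identical positions over that aligned length; both choices are assumptions, since the slide does not say how similarity was computed. The thresholds are the ones on the slide.

```python
from typing import Tuple

HIT_IDENTITY_THRESHOLD = 75.0  # a hit needs more than 75% similarity
NOISE_BLOCK_THRESHOLD = 15     # noise: an identical block longer than 15 bases


def identity_and_longest_block(query: str, subject: str) -> Tuple[float, int]:
    """Return (percent identity, longest run of identical bases).

    Both strings must already be aligned to the same length; '-' marks a gap
    and never counts as a match.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    if not query:
        return 0.0, 0
    matches = run = longest = 0
    for a, b in zip(query.upper(), subject.upper()):
        if a == b and a != "-":
            matches += 1
            run += 1
            longest = max(longest, run)
        else:
            run = 0  # a mismatch or a gap breaks the contiguous block
    return 100.0 * matches / len(query), longest


def classify_alignment(query: str, subject: str) -> str:
    """Classify one oligo/transcript alignment as 'hit', 'noise' or 'none'."""
    identity, block = identity_and_longest_block(query, subject)
    if identity > HIT_IDENTITY_THRESHOLD:
        return "hit"
    if block > NOISE_BLOCK_THRESHOLD:
        return "noise"
    return "none"


print(classify_alignment("ACGTACGTAC", "ACGTACGTAC"))                # hit
print(classify_alignment("A" * 20 + "C" * 20, "A" * 20 + "G" * 20))  # noise
print(classify_alignment("ACGT", "ACGA"))                            # none
```

The hit test is applied first, so an alignment above 75% is never also counted as noise.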

15 First step: annotation with Ensembl transcripts
Alignment against Ensembl 50 transcripts:
● No hit > 75%
  - no noise: 187 (-)
  - several noises: 65 (-)
  - 1 noise < 30 bp: 108 (*)
  - 1 noise > 30 bp: 5 (*)
● 1 hit > 75%
  - no noise: 194 (*)
  - with noise: 210 (*)
● More than 1 hit (with and without noise): 22 (*)
(*) => can be annotated. Total: 517
(-) => 5 & 6, grey. Total: 252
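The category table can be read as a decision procedure over all the alignments of one oligo. In this sketch each alignment is a hypothetical (percent identity, longest identical block) pair, the thresholds come from slide 14, and the "< 30 bp" split is assumed to apply to the length of the single noise's identical block; a block of exactly 30 bp, which the slide does not cover, is put in the ">= 30 bp" class.

```python
from collections import Counter
from typing import List, Tuple

Alignment = Tuple[float, int]  # (percent identity, longest identical block)


def categorise_oligo(alignments: List[Alignment]) -> Tuple[str, bool]:
    """Return (category, can_be_annotated) following the slide 15 table."""
    hits = [a for a in alignments if a[0] > 75]
    noises = [a for a in alignments if a[0] <= 75 and a[1] > 15]
    if len(hits) > 1:
        return "more than 1 hit", True
    if len(hits) == 1:
        return ("1 hit, with noise" if noises else "1 hit, no noise"), True
    if not noises:
        return "no hit, no noise", False
    if len(noises) > 1:
        return "no hit, several noises", False
    # Exactly one noise: split on its identical block length (assumption).
    if noises[0][1] < 30:
        return "no hit, 1 noise < 30 bp", True
    return "no hit, 1 noise >= 30 bp", True


# Hypothetical oligos and their transcript alignments.
oligos = {"o1": [(98.0, 60)], "o2": [], "o3": [(60.0, 20)]}
print(Counter(categorise_oligo(a)[0] for a in oligos.values()))
```

Tallying the categories over every oligo gives a table of the same shape as the one on this slide.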

16 How to get more genes in this part?

17 How to find new annotations thanks to UniGene?
[Figure: UniGene cluster Gga.2894, Ensembl transcript ENSGALT00000014678 and the oligo]

18 Extending the UTR using UniGene

19 What is happening in the introns?

20 Extreme case!

21 Using the number of ESTs to select the probes

22 UniGene pipeline for introns
UniGene alignment -> Ensembl API (GetAllGene) -> gene sequence -> alignment -> oligo & gene

23 UniGene strategy for introns
● UniGene hit: percent_id > 75.
● Ensembl genes are fetched using the UniGene names.
● The gene sequence is aligned against the oligo.
● The new oligo category is set (if required).
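The four steps can be sketched as follows. The lookups from a UniGene cluster to Ensembl genes and from a gene to its genomic sequence (introns included) are passed in as hypothetical callables standing in for the Ensembl API calls, and an ungapped sliding-window identity replaces the real alignment step purely to keep the example self-contained.

```python
from typing import Callable, List, Tuple


def best_ungapped_identity(oligo: str, target: str) -> float:
    """Best percent identity of the oligo over any same-length window of target.

    A crude, gap-free stand-in for a real aligner such as BLAST.
    """
    oligo, target = oligo.upper(), target.upper()
    n = len(oligo)
    if n == 0 or n > len(target):
        return 0.0
    best = 0
    for start in range(len(target) - n + 1):
        window = target[start:start + n]
        best = max(best, sum(a == b for a, b in zip(oligo, window)))
    return 100.0 * best / n


def rescue_by_gene_sequence(
    oligo: str,
    unigene_hits: List[Tuple[str, float]],
    genes_for_cluster: Callable[[str], List[str]],
    gene_sequence: Callable[[str], str],
    min_percent_id: float = 75.0,
) -> List[str]:
    """Return Ensembl genes whose full sequence (introns included) holds the oligo."""
    rescued: List[str] = []
    for cluster, percent_id in unigene_hits:
        if percent_id <= min_percent_id:  # step 1: keep UniGene hits > 75%
            continue
        for gene in genes_for_cluster(cluster):  # step 2: hypothetical lookup
            if gene in rescued:
                continue
            # step 3: align the oligo against the whole gene sequence
            if best_ungapped_identity(oligo, gene_sequence(gene)) > min_percent_id:
                rescued.append(gene)
    return rescued  # step 4: a non-empty result would justify a new category
```

Passing the lookups as callables keeps the decision logic separate from whichever Ensembl access layer is actually used.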

24 UniGene pipeline for extended UTR
UniGene alignment -> Ensembl API (GetAllGene, GetAllTranscript) -> extended transcript sequence (with UTR) -> alignment -> oligo & gene

25 UniGene strategy for extended UTR
● UniGene hit: percent_id > 75.
● Ensembl genes are fetched using the UniGene names.
● Ensembl transcripts are fetched from the Ensembl gene.
● The transcript sequence is extracted and extended by 1000 base pairs.
● The extended transcript sequence is aligned against the oligo.
● The new oligo category is set (if required).
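The extension step can be illustrated on plain strings. This sketch assumes the 1000 bp are added on both sides of the transcript, taken from the genomic sequence flanking the first and last exon, with exon coordinates 1-based and inclusive as in Ensembl; the function names and inputs are hypothetical.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def extended_transcript_sequence(
    genome: str,
    exons: List[Tuple[int, int]],
    strand: int,
    flank: int = 1000,
) -> str:
    """Spliced transcript plus `flank` genomic bases on each side.

    genome: forward-strand sequence of the chromosome
    exons:  1-based inclusive (start, end) genomic coordinates
    strand: 1 for forward, -1 for reverse; the result is in transcript orientation
    """
    exons = sorted(exons)
    first_start, last_end = exons[0][0], exons[-1][1]
    upstream = genome[max(0, first_start - 1 - flank):first_start - 1]
    downstream = genome[last_end:last_end + flank]  # slicing stops at the chromosome end
    spliced = "".join(genome[start - 1:end] for start, end in exons)  # introns dropped
    forward = upstream + spliced + downstream
    return forward if strand == 1 else reverse_complement(forward)


toy_genome = "AAAAACCCCCGGGGGTTTTT"  # exon 6-10, intron 11-15, exon 16-20
print(extended_transcript_sequence(toy_genome, [(6, 10), (16, 20)], 1, flank=3))
```

The extended sequence is then what the oligo is aligned against, so an oligo sitting just beyond the annotated UTR can still be linked to the gene.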

26 UniGene annotation impact
[Figure]

27 Annotation pipeline with Swiss-Prot
Oligo -> Swiss-Prot alignment -> BlastX filtering -> BioMart query (getEnsemblGene) -> oligo, Swiss-Prot name & Ensembl gene

28 BlastX filtering
● Filtering on chicken proteins.
● For each protein hit of each oligo, the protein name and the first e-value are kept.
● What are the hybridisation criteria for BLASTX?
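A minimal sketch of this filter over BLASTX tabular output (the 12-column layout of BLAST+ `-outfmt 6` or legacy `-m 8`, where the e-value is the 11th column). It assumes chicken Swiss-Prot entries are recognised by the `_CHICK` species mnemonic at the end of the entry name, and relies on BLAST normally listing the best HSP of a query/subject pair first, so the first e-value seen is the one kept.

```python
import csv
from typing import Dict, Iterable, Tuple


def filter_blastx(
    lines: Iterable[str], species_suffix: str = "_CHICK"
) -> Dict[Tuple[str, str], float]:
    """Map (oligo, Swiss-Prot entry name) to the first e-value reported."""
    kept: Dict[Tuple[str, str], float] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):  # skip blank and comment lines
            continue
        query, subject, evalue = row[0], row[1], float(row[10])
        name = subject.split("|")[-1]  # "sp|ACC|NAME_CHICK" -> "NAME_CHICK"
        if not name.endswith(species_suffix):
            continue  # keep chicken proteins only
        key = (query, name)
        if key not in kept:  # first line for this pair = first e-value
            kept[key] = evalue
    return kept
```

The resulting (oligo, protein name) pairs are what the BioMart step would then map to Ensembl genes.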

29 Impact of the new annotation

30 Extra annotation thanks to the Ensembl API
● GO
  - Molecular Function: 230 (327 Gene Index Tentative Consensus)
  - Biological Process: 184 (260 Gene Index Tentative Consensus)
  - Cellular Component: 175 (211 Gene Index Tentative Consensus)
● Orthologs
  - Human: 394
  - Mouse: 399
  - Rat: 370
● Xref
  - Uniprot/SWISSPROT: 64 (54 design ens)
  - UniGene: 342 (216 design ens) (491 blast unigene)
  - RefSeq_peptide: 143 (204 design ens)
  - HGNC: 228

31 Extra annotation thanks to the KEGG API
● Use of the HGNC name of the human ortholog.
● KO: 183 probes
● Pathway: 63 probes
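The chain used here (probe -> human ortholog HGNC name -> KEGG orthology (KO) -> pathway) can be sketched with plain dictionaries. The three mappings are hypothetical inputs standing in for the Ensembl ortholog query and the KEGG API calls; the function only collects the probes that reach a KO and those that reach at least one pathway.

```python
from typing import Dict, List, Set, Tuple


def kegg_coverage(
    probe_to_human_hgnc: Dict[str, str],
    hgnc_to_ko: Dict[str, List[str]],
    ko_to_pathways: Dict[str, List[str]],
) -> Tuple[Set[str], Set[str]]:
    """Return (probes with a KO, probes with at least one pathway)."""
    with_ko: Set[str] = set()
    with_pathway: Set[str] = set()
    for probe, symbol in probe_to_human_hgnc.items():
        kos = hgnc_to_ko.get(symbol, [])
        if kos:
            with_ko.add(probe)
        if any(ko_to_pathways.get(ko) for ko in kos):
            with_pathway.add(probe)
    return with_ko, with_pathway


# Hypothetical identifiers, not real HGNC or KEGG entries.
ko_probes, pathway_probes = kegg_coverage(
    {"p1": "GENE1", "p2": "GENE2", "p3": "GENE3"},
    {"GENE1": ["KO_1"], "GENE2": ["KO_2"]},
    {"KO_1": ["PATHWAY_1"]},
)
print(len(ko_probes), len(pathway_probes))
```

A probe can reach a KO without reaching any pathway, which is why the pathway count on this slide is smaller than the KO count.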

32 Data formatting and access

33 Pig clones: initial state
● The clone set: 241 probes
● 4 columns: clone name, GenBank accession, HUGO name and definition
● 237 GenBank accessions
● 216 HGNC names

34 Pig clones: HGNC annotation pipeline
Clone names, GenBank acc -> get contig -> contig name -> alignment against RefSeq RNA -> hit name -> HUGO name
Databases: Sigenae EST assembly database, Sigenae Sus scrofa contigs database
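The steps of this pipeline can be sketched as a chain of lookups, followed by a comparison with the original HGNC column of the clone set. The three mappings (clone to contig, contig to best RefSeq RNA hit, RefSeq RNA to HUGO name) are hypothetical inputs standing in for the Sigenae databases and the alignment step.

```python
from typing import Dict, Iterable, Optional


def hugo_for_clone(
    clone: str,
    clone_to_contig: Dict[str, str],
    contig_to_refseq_hit: Dict[str, str],
    refseq_to_hugo: Dict[str, str],
) -> Optional[str]:
    """Clone -> contig -> RefSeq RNA hit -> HUGO name (None if any link is missing)."""
    contig = clone_to_contig.get(clone)
    hit = contig_to_refseq_hit.get(contig) if contig else None
    return refseq_to_hugo.get(hit) if hit else None


def compare_annotations(
    clones: Iterable[str],
    original: Dict[str, str],
    new: Dict[str, Optional[str]],
) -> Dict[str, int]:
    """Count clones whose HUGO name is the same, different, gained or lost."""
    summary = {"same": 0, "different": 0, "gained": 0, "lost": 0}
    for clone in clones:
        old, now = original.get(clone), new.get(clone)
        if old and now:
            summary["same" if old == now else "different"] += 1
        elif now:
            summary["gained"] += 1
        elif old:
            summary["lost"] += 1
    return summary


# Hypothetical clone set and lookups (placeholder identifiers).
clones = ["c1", "c2", "c3"]
contigs = {"c1": "ctg1", "c2": "ctg2", "c3": "ctg3"}
hits = {"ctg1": "NM_A", "ctg2": "NM_B"}
hugo = {"NM_A": "GENE1", "NM_B": "GENE2"}
new = {c: hugo_for_clone(c, contigs, hits, hugo) for c in clones}
print(compare_annotations(clones, {"c1": "GENE1", "c2": "GENEX", "c3": "GENE3"}, new))
```

A summary of this kind is one way to quantify the impact of the new HUGO name annotation against the original file.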

35 Impact of the new HUGO name annotation

36 Conclusions
● Re-annotation is an important task to keep the available information up to date.
● The structure of the re-annotation pipeline has to evolve with the available databases.
● Users have to take the quality of the annotations (categories) into account.
● User feedback is very important to produce relevant information.

37 Acknowledgements
● WU (Pieter Neerincx, Haisheng Nie)
● Users (LI Jiang, Frédéric Lecerf, Gwenola Tosser, Sandrine Lagarrigue, Yannick Faulconnier)
● EADGENE (Caroline Channing, Sandrine Ayuso)

38 Thank you for your attention.

