Data Mining in Ensembl with EnsMart

1 Data Mining in Ensembl with EnsMart

2 Possible queries…
All genes from a candidate region
Genes with a particular protein domain
Members of a protein family
Genes associated with SNPs

3 Specific queries
Disease-related genes between markers D10S255 and D10S259
Transmembrane proteins with an Ig-MHC domain (IPR003006) on chromosome 2
Genes with associated coding SNPs on chromosomal band 5q35.3
Mouse homologues for human disease genes

4 More specific queries
Human genes with upstream regions conserved with respect to mouse
Upstream sequence for all Ensembl genes mapped to the U95A chip (similarly, complete genomic annotation of MG_U74)
Genomic location and description of all mouse, rat and fugu homologues of all human genes that have transmembrane domains, are expressed in the cardiovascular system and have non-synonymous SNPs

5 EnsMart – vertical and horizontal data integration
Ensembl Genes, EST Genes, Vega Genes, SNPs
Zebrafish, Human, Mouse, Anopheles, Fugu, Rat

6 Ensembl data sets
Genes
EST
Markers
Diseases
Protein Annotation
SNPs
Homology
Expression

7 EnsMart
Data retrieval tool
Query builder interface
Gene or SNP lists
Associated features or sequences
Various output formats

8 Information flow
Stages: start, filter, output
Diagram labels, in slide order: SPECIES, FOCUS; REGION, SNP, PROTEIN, HOMOLOGY, GENE, EXPRESSION; REFSEQ, INTERPRO, GO, SWISSPROT, EMBL, AFFY; REGION, SNP, PROTEIN, HOMOLOGY, GENE, EXPRESSION; FASTA, FILE, EXCEL, TEXT, GTF, HTML
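Note: the start, filter, output flow named on this slide can be illustrated with a small sketch. This is not EnsMart code and does not use its API; the records, the field names (`species`, `gene`, `chromosome`, `band`) and the helpers `run_query` and `to_text` are all hypothetical, chosen only to show a dataset being selected, restricted, and then written out as tab-separated text.

```python
import csv
import io

# Hypothetical in-memory records standing in for a gene dataset.
GENES = [
    {"species": "human", "gene": "GENE_A", "chromosome": "2", "band": "2q11.1"},
    {"species": "human", "gene": "GENE_C", "chromosome": "5", "band": "5q35.3"},
    {"species": "mouse", "gene": "Gene_m", "chromosome": "5", "band": "5qB1"},
]


def run_query(records, species, filters, attributes):
    """Apply the three stages: start (species), filter, output (attributes)."""
    # start: choose the dataset by species
    selected = [r for r in records if r["species"] == species]
    # filter: keep only records matching every requested field value
    for key, wanted in filters.items():
        selected = [r for r in selected if r[key] == wanted]
    # output: keep only the requested attributes, in the requested order
    return [{a: r[a] for a in attributes} for r in selected]


def to_text(rows, attributes):
    """Render result rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(attributes)
    for row in rows:
        writer.writerow([row[a] for a in attributes])
    return buf.getvalue()


if __name__ == "__main__":
    attrs = ["gene", "band"]
    result = run_query(GENES, "human", {"chromosome": "5"}, attrs)
    print(to_text(result, attrs), end="")
```

Keeping the output stage separate from the filter stage means the same filtered gene list could be rendered in another format without re-running the query.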

9 Species and focus

10 Restrict your query

11 Restrict your query

12 Select output options

13 Select output options

14 Output formats: HTML

15 Obtaining sequences

16 Ensembl core database
Normalised
Each data point stored only once
Quick updates
Minimal storage requirements
But:
Many tables
Many joins for complicated queries
Slow for data mining questions

17 Mart database
De-normalised
Tables with 'redundant' information
Query-optimised
Fast and flexible
Ideal for data mining
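Note: the contrast between a normalised core database and a de-normalised mart can be sketched with Python's built-in `sqlite3` module. The schema below is a toy assumption, not the Ensembl or EnsMart schema: the tables `gene`, `protein_domain`, `snp` and `gene_mart`, their columns, and the sample rows are all hypothetical. The same question (genes on chromosome 2 with domain IPR003006 and a coding SNP) is answered once with joins and once from a single pre-computed wide table.

```python
import sqlite3


def build_demo_db():
    """Create a toy normalised schema plus a de-normalised 'mart' table."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Normalised: each fact is stored once, spread over several tables.
    cur.executescript("""
        CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, name TEXT, chromosome TEXT);
        CREATE TABLE protein_domain (gene_id INTEGER, interpro_id TEXT);
        CREATE TABLE snp (gene_id INTEGER, snp_name TEXT, is_coding INTEGER);
    """)
    cur.executemany("INSERT INTO gene VALUES (?, ?, ?)",
                    [(1, "GENE_A", "2"), (2, "GENE_B", "2"), (3, "GENE_C", "5")])
    cur.executemany("INSERT INTO protein_domain VALUES (?, ?)",
                    [(1, "IPR003006"), (3, "IPR003006")])
    cur.executemany("INSERT INTO snp VALUES (?, ?, ?)",
                    [(1, "snp1", 1), (2, "snp2", 1), (3, "snp3", 0)])
    # De-normalised: one wide row per gene with pre-computed, redundant flags.
    cur.execute("""
        CREATE TABLE gene_mart AS
        SELECT g.gene_id, g.name, g.chromosome,
               EXISTS(SELECT 1 FROM protein_domain d
                      WHERE d.gene_id = g.gene_id
                        AND d.interpro_id = 'IPR003006') AS has_ig_mhc,
               EXISTS(SELECT 1 FROM snp s
                      WHERE s.gene_id = g.gene_id
                        AND s.is_coding = 1) AS has_coding_snp
        FROM gene g
    """)
    conn.commit()
    return conn


def query_normalised(conn):
    """Answer the question with joins across the normalised tables."""
    rows = conn.execute("""
        SELECT DISTINCT g.name
        FROM gene g
        JOIN protein_domain d ON d.gene_id = g.gene_id
        JOIN snp s ON s.gene_id = g.gene_id
        WHERE g.chromosome = '2'
          AND d.interpro_id = 'IPR003006'
          AND s.is_coding = 1
        ORDER BY g.name
    """).fetchall()
    return [r[0] for r in rows]


def query_mart(conn):
    """Answer the same question from the single wide table, with no joins."""
    rows = conn.execute("""
        SELECT name FROM gene_mart
        WHERE chromosome = '2' AND has_ig_mhc = 1 AND has_coding_snp = 1
        ORDER BY name
    """).fetchall()
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(query_normalised(conn))
    print(query_mart(conn))
```

The trade-off shown here is the one the two slides list: the wide table repeats information and must be rebuilt when the source tables change, but each added filter is another column test rather than another join.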

18 Acknowledgements
Mart database: Arek Kasprzyk, Damian Keefe, Damian Smedley, Darin London, Craig Melsopp
User interface (MartView): Will Spooner
Data and general support: The entire Ensembl team

