Data Mining in Ensembl with BioMart (presentation transcript)


Slide 1. Data Mining in Ensembl with BioMart

Slide 2. Simple Text-Based Search Engine

Slide 3. ‘Mouse Gene’ Gives Us Results

Slide 4. A More Complex Query Is Not as Useful

Slide 5. BioMart: Data Mining
BioMart is a search engine that retrieves several terms at once and returns them as a table, for example human gene IDs together with their chromosome and base-pair position. No programming is required.

Slide 6. General or Specific Data Tables
A table can list all the genes for one species, or only the genes in one specific region of a chromosome, or only the genes in one region of a chromosome that are associated with a disease.
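Restricting a table to "one specific region of a chromosome" amounts to an interval test on each gene's coordinates. The sketch below assumes a hypothetical `Gene` record with 1-based, inclusive start and end positions (the convention Ensembl uses for coordinates); the gene names and positions are made up. Whether BioMart's own region filter keeps genes that only partly overlap the region is not stated on the slide, so this sketch simply assumes that any overlap counts.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    # Hypothetical record: ID, chromosome name, 1-based inclusive coordinates
    gene_id: str
    chromosome: str
    start: int
    end: int


def genes_in_region(genes: List[Gene], chromosome: str, start: int, end: int) -> List[Gene]:
    """Return genes on `chromosome` whose span overlaps [start, end] (inclusive)."""
    return [
        g for g in genes
        if g.chromosome == chromosome and g.start <= end and g.end >= start
    ]


# Made-up toy data, not real Ensembl genes
genes = [
    Gene("GENE_A", "10", 1_000, 5_000),
    Gene("GENE_B", "10", 4_500, 9_000),
    Gene("GENE_C", "10", 20_000, 25_000),
    Gene("GENE_D", "11", 1_000, 5_000),
]

hits = genes_in_region(genes, "10", 4_000, 10_000)
print([g.gene_id for g in hits])  # ['GENE_A', 'GENE_B']
```

To require full containment instead, replace the condition with `g.start >= start and g.end <= end`.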

Slide 7. BioMart Data Sets
Ensembl genes, Vega genes, SNPs, markers, phenotypes, gene expression information, Gene Ontology, homology predictions and protein annotation.

Slide 8. Web Interface
With BioMart, you can quickly extract gene-associated information from the Ensembl databases.

Slide 9. Information Flow
1. Choose the species of interest (Dataset).
2. Decide what you would like to know about the genes (Attributes), such as sequences, IDs or descriptions.
3. Narrow down to a smaller gene set using Filters, for example by entering IDs or choosing a region.

Slide 10. Web Interface
There are three main stages: Dataset, Attributes and Filters. Dataset: choose the species of interest. Attributes: choose what information to view. Filters: choose the gene set using what we already know.

Slide 11. The First Step: Choose the Dataset
Homo sapiens genes are the default dataset.

Slide 12. The Second Step: Attributes
Attributes are what we want to know about the genes. They are spread over four output pages.

Slide 13. The SNP Attribute Page
This page outputs variation information such as the SNP reference ID and alleles.

Slide 14. Filters Allow Gene Selection
Choose the gene set by region, gene ID(s) or protein/domain type.

Slide 15. Export Sequences or Tables
Genes and their attributes are exported either as sequences (FASTA format) or as tables.
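A FASTA record is a header line starting with `>`, followed by the sequence, which is usually wrapped at a fixed line width. The minimal writer below is a sketch and assumes a dictionary that maps made-up identifiers to sequences. It also assumes a width of 60 characters, which is a common choice but not necessarily the width BioMart uses.

```python
from typing import Dict


def to_fasta(records: Dict[str, str], width: int = 60) -> str:
    """Format {identifier: sequence} pairs as FASTA text, wrapping sequences at `width`."""
    lines = []
    for identifier, seq in records.items():
        lines.append(f">{identifier}")  # header line
        # Split the sequence into chunks of at most `width` characters
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Hypothetical records for illustration only
fasta = to_fasta({"GENE_A": "ACGT" * 20, "GENE_B": "TTGACA"})
print(fasta, end="")
```

The 80-base `GENE_A` sequence is split into one line of 60 bases and one line of 20.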

Slides 16 to 18. The Query
For all mouse genes on chromosome 10 that are protein coding, I would like to know the IDs in both Ensembl and MGI. In the query, Attributes are what we want to know and Filters are what we already know.
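The same question can also be sent to BioMart's `martservice` web service as an XML document. In that document, `<Filter>` elements carry what we know and `<Attribute>` elements name the columns we want. The sketch below only builds the XML and the request URL and does not contact a server. Several names are assumptions taken from recent Ensembl BioMart releases rather than from these slides: the endpoint URL, the dataset name `mmusculus_gene_ensembl`, and the field names `chromosome_name`, `biotype`, `ensembl_gene_id`, `ensembl_transcript_id`, `mgi_symbol` and `mgi_id`. The interface shown in these slides is older, so check the names against the mart you actually use.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed endpoint (recent Ensembl); the slides point to www.biomart.org instead
MARTSERVICE_URL = "https://www.ensembl.org/biomart/martservice"


def build_query(dataset: str, filters: dict, attributes: list) -> str:
    """Build a BioMart XML query: filters restrict rows, attributes choose columns."""
    query = ET.Element("Query", virtualSchemaName="default", formatter="TSV",
                       header="1", uniqueRows="1")
    ds = ET.SubElement(query, "Dataset", name=dataset, interface="default")
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", name=name, value=value)
    for name in attributes:
        ET.SubElement(ds, "Attribute", name=name)
    prolog = '<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE Query>\n'
    return prolog + ET.tostring(query, encoding="unicode")


xml_query = build_query(
    "mmusculus_gene_ensembl",  # assumed dataset name for mouse genes
    # Filters: what we know (assumed filter names)
    {"chromosome_name": "10", "biotype": "protein_coding"},
    # Attributes: what we want to know, i.e. the result-table columns (assumed names)
    ["ensembl_gene_id", "ensembl_transcript_id", "mgi_symbol", "mgi_id"],
)
request_url = MARTSERVICE_URL + "?" + urlencode({"query": xml_query})
print(xml_query)
```

Fetching `request_url`, for example with `urllib.request.urlopen`, should return tab-separated rows. That step is left out here because it needs network access.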

Slide 19. A Brief Example
Change the dataset to mouse (Mus musculus).

Slide 20. A Brief Example (continued)
The dataset has changed.

Slide 21. Attributes (Output Options)
Click Attributes. Attributes let us choose what we want to know. IDs are on the ‘Features’ page; click ‘GENE’.

Slide 22. Attributes (Output Options)
The default options, Ensembl Gene ID and Transcript ID, are already selected.

Slide 23. Attributes (Output Options)
Scroll down to select MGI symbol, and also select the accession number. ‘Markersymbol ID’ gives us the MGI ID.

Slide 24. The Results Table
Clicking ‘Results’ gives us the gene IDs for all mouse genes in the Ensembl database.

Slide 25. Select a Smaller Gene Set
Select ‘Filters’ and expand the REGION panel. Instead of all mouse genes, we select protein-coding genes on chromosome 10.

Slide 26. Select Genes on Chromosome 10
Select chromosome 10.

Slide 27. Select Protein-Coding Genes
The filters are now set to chromosome 10 and gene type ‘protein coding’. A gene must meet BOTH criteria to appear in the result table.
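Because a gene must meet every filter, the filters are combined with a logical AND. The chosen attributes then decide which columns remain in the result. The local sketch below runs on made-up rows; the field names and values are hypothetical and are not BioMart's internal names.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def run_query(rows: List[Row], filters: List[Callable[[Row], bool]],
              attributes: List[str]) -> List[Row]:
    """Keep rows that pass ALL filters, then keep only the requested attribute columns."""
    selected = [row for row in rows if all(f(row) for f in filters)]
    return [{name: row[name] for name in attributes} for row in selected]


# Hypothetical gene rows for illustration only
rows = [
    {"gene_id": "GENE_A", "chromosome": "10", "gene_type": "protein_coding", "symbol": "SymA"},
    {"gene_id": "GENE_B", "chromosome": "10", "gene_type": "lncRNA", "symbol": "SymB"},
    {"gene_id": "GENE_C", "chromosome": "11", "gene_type": "protein_coding", "symbol": "SymC"},
]

result = run_query(
    rows,
    filters=[lambda r: r["chromosome"] == "10",
             lambda r: r["gene_type"] == "protein_coding"],
    attributes=["gene_id", "symbol"],
)
print(result)  # [{'gene_id': 'GENE_A', 'symbol': 'SymA'}]
```

`GENE_B` passes only the chromosome filter and `GENE_C` passes only the gene-type filter, so neither appears in the result.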

Slide 28. Results (Preview)
This is a preview. If you are happy with the table, click ‘Go’ to get the full result table.

Slide 29. Full Result Table
Columns: Ensembl Gene ID, Transcript ID, MGI symbol, MGI Accession Number.
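If the full result table is exported as tab-separated text with a header row, each line can be read as one record keyed by the column names. The sketch below uses Python's `csv` module on an invented sample. The exact header wording that BioMart writes may differ from the column names assumed here.

```python
import csv
import io
from typing import Dict, List


def parse_result_table(tsv_text: str) -> List[Dict[str, str]]:
    """Parse tab-separated output (first line = column names) into one dict per row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return list(reader)


# Invented sample; real exports contain actual Ensembl and MGI identifiers
sample = (
    "Ensembl Gene ID\tTranscript ID\tMGI symbol\tMGI Accession Number\n"
    "GENE_A\tTRANSCRIPT_A1\tSymA\tMGI_ACC_A\n"
    "GENE_A\tTRANSCRIPT_A2\tSymA\tMGI_ACC_A\n"
)

records = parse_result_table(sample)
gene_ids = {r["Ensembl Gene ID"] for r in records}
print(len(records), len(gene_ids))  # 2 1
```

Transcript ID is one of the chosen attributes, so a gene with several transcripts typically appears on several rows. Counting distinct gene IDs therefore gives a different number from counting rows.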

Slide 30. Original Query
For all mouse genes on chromosome 10 that are protein coding, I would like to know the IDs in both Ensembl and MGI. In the query, Attributes are the columns in the result table and Filters are what we already know.

Slide 31. Other Export Options (Attributes)
Sequences (UTRs, flanking sequences, cDNA, peptides, etc.); gene IDs from Ensembl and external sources (MGI, Entrez, etc.); microarray data; protein functions and descriptions (InterPro, GO); orthologous gene sets; SNP and variation data.

Slide 32. Central Server
www.biomart.org

Slide 33. WormBase

Slide 34. HapMap
Population frequencies, inter-population comparisons and gene annotation.

Slide 35. DictyBase

Slide 36. UniProt, MSD

Slide 37. GRAMENE
Rice, maize and Arabidopsis genomes…

Slide 38. How to Get There
Either go to www.biomart.org/biomart/martview or click ‘BioMart’ in Ensembl.

Slide 39. Q &amp; A
Thanks to Arek Kasprzyk, Benoît Ballester, Syed Haider, Richard Holland and Damian Smedley.

