Data Mining in Ensembl with BioMart

Presentation transcript:

1 of 38 Data Mining in Ensembl with BioMart

2 of 38 Simple Text-based Search Engine

3 of 38 ‘Mouse Gene’ Gives Us Results

4 of 38 A More Complex Query is Not as Useful

5 of 38 BioMart: Data Mining. BioMart is a search engine that can find multiple terms and put them into a table format, such as human gene IDs, chromosome and base pair position. No programming required!

6 of 38 General or Specific Data Tables: All the genes for one species; or only genes in one specific region of a chromosome; or genes in one region of a chromosome that are associated with a disease.

7 of 38 BioMart Data Sets: Ensembl genes, Vega genes, SNPs, markers, phenotypes, gene expression information, gene ontology, homology predictions, protein annotation.

8 of 38 Web Interface: With BioMart, quickly extract gene-associated information from the Ensembl databases.

9 of 38 Information Flow: Choose the species of interest (Dataset). Decide what you would like to know about the genes (Attributes): sequences, IDs, description… Decide on a smaller gene set using Filters: enter IDs, choose a region…

10 of 38 Web Interface: Three main stages: Dataset, Attributes and Filters. Choose the species of interest. Choose what information to view. Choose the gene set using what we know.
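
The three stages can be pictured as a small data structure. The following is a minimal Python sketch, not BioMart's own code: the class name `MartQuery`, its fields and the `describe` method are hypothetical, and the strings are just the labels used on the slides.

```python
from dataclasses import dataclass, field


@dataclass
class MartQuery:
    """Hypothetical container mirroring the three stages of a BioMart query."""
    dataset: str                                     # species of interest
    attributes: list = field(default_factory=list)   # what we want to know (output columns)
    filters: dict = field(default_factory=dict)      # what we know (restricts the gene set)

    def describe(self) -> str:
        """Summarise the query in the order the web interface presents it."""
        attribute_text = ", ".join(self.attributes)
        filter_text = ", ".join(f"{name}={value}" for name, value in self.filters.items())
        return (f"Dataset: {self.dataset} | "
                f"Attributes: {attribute_text} | "
                f"Filters: {filter_text}")


query = MartQuery(
    dataset="Mus musculus genes",
    attributes=["Ensembl Gene ID", "MGI symbol"],
    filters={"Chromosome": "10", "Gene type": "protein coding"},
)
print(query.describe())
```

Keeping attributes as an ordered list and filters as a name-to-value mapping reflects how they are used: attributes become table columns in a chosen order, while each filter constrains one property of the genes.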

11 of 38 The First Step: Choose the Dataset. Homo sapiens genes are the default.

12 of 38 The Second Step: Attributes. Attributes are what we want to know about the genes. There are four output pages.

13 of 38 The SNP Attribute Page: Output variation information such as SNP reference ID and alleles.

14 of 38 Filters Allow Gene Selection: Choose the gene set by region, gene ID(s), protein/domain type.

15 of 38 Export Sequence or Tables: Genes and attributes are exported as sequence (FASTA format) or tables.
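
To make the two export shapes concrete, here is a small Python sketch of what a FASTA record and a tab-separated table look like. It only illustrates the formats and is not how BioMart produces them; the function names, the identifier `gene_1` and the sequence are placeholders.

```python
def to_fasta(identifier, sequence, width=60):
    """Format one sequence as FASTA: a '>' header line, then wrapped sequence lines."""
    lines = [f">{identifier}"]
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines)


def to_table(rows, columns):
    """Format a list of dict rows as a tab-separated table with a header line."""
    header = "\t".join(columns)
    body = ["\t".join(str(row[column]) for column in columns) for row in rows]
    return "\n".join([header] + body)


# Placeholder data, not real identifiers or sequence.
print(to_fasta("gene_1", "ACGT" * 20))
print(to_table([{"id": "gene_1", "chromosome": "10"}], ["id", "chromosome"]))
```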

16 of 38 Query: For all mouse genes on chromosome 10 that are protein coding, I would like to know the IDs in both Ensembl and MGI. In the query: Attributes: what we want to know. Filters: what we know.
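
The slides build this query through the web interface, but BioMart servers also accept the same kind of query as an XML document. The sketch below assembles such a document in Python without sending it anywhere. The dataset, filter and attribute names used here (`mmusculus_gene_ensembl`, `chromosome_name`, `biotype`, `ensembl_gene_id`, `ensembl_transcript_id`, `mgi_symbol`, `mgi_id`) are assumptions based on Ensembl's naming and may differ between releases, so check them against the server you use. `build_query` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def build_query(dataset, attributes, filters):
    """Assemble a BioMart-style XML query string (hypothetical helper)."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",      # tab-separated result table
        "header": "1",           # include column names
        "uniqueRows": "1",       # drop duplicate rows
        "datasetConfigVersion": "0.6",
    })
    dataset_node = ET.SubElement(query, "Dataset",
                                 {"name": dataset, "interface": "default"})
    # Filters: what we know.
    for name, value in filters.items():
        ET.SubElement(dataset_node, "Filter", {"name": name, "value": value})
    # Attributes: what we want to know.
    for name in attributes:
        ET.SubElement(dataset_node, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Assumed names; verify against the BioMart server before use.
xml_query = build_query(
    dataset="mmusculus_gene_ensembl",
    attributes=["ensembl_gene_id", "ensembl_transcript_id", "mgi_symbol", "mgi_id"],
    filters={"chromosome_name": "10", "biotype": "protein_coding"},
)
print(xml_query)
```

The mapping is direct: each filter restricts the gene set, and each attribute becomes one column of the result table.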

19 of 38 A Brief Example: Change the dataset to mouse (Mus musculus).

20 of 38 A Brief Example: Dataset has changed.

21 of 38 Attributes (Output Options): Click Attributes. Attributes allow us to choose what we wish to know. IDs are found in the ‘Features’ page. Click on ‘GENE’.

22 of 38 Attributes (Output Options): Default options selected: Ensembl Gene ID and Transcript ID.

23 of 38 Attributes (Output Options): Scroll down to select MGI symbol. Also select the accession number. ‘Markersymbol ID’ will give us the MGI ID.

24 of 38 The Results Table: ‘Results’ gives us gene IDs for all mouse genes in the Ensembl database.

25 of 38 Select a Smaller Gene Set: Select ‘Filters’. Expand the REGION panel. Instead of all mouse genes, select protein coding genes on chromosome 10.

26 of 38 Select Genes on Chromosome 10: Select chromosome 10.

27 of 38 Select Protein Coding Genes: Gene type: protein coding. Filters are set to chromosome 10 and protein-coding genes. Genes must meet BOTH criteria to be in the result table.
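
The "BOTH criteria" rule is a logical AND across filters. A minimal Python sketch of that behaviour on a few made-up records follows; the gene names, field names and `apply_filters` are placeholders for illustration and are not taken from Ensembl.

```python
# Made-up records for illustration only.
genes = [
    {"gene": "geneA", "chromosome": "10", "gene_type": "protein coding"},
    {"gene": "geneB", "chromosome": "10", "gene_type": "pseudogene"},
    {"gene": "geneC", "chromosome": "2", "gene_type": "protein coding"},
]


def apply_filters(records, filters):
    """Keep only records that satisfy every filter (logical AND)."""
    return [
        record for record in records
        if all(record.get(name) == value for name, value in filters.items())
    ]


selected = apply_filters(genes, {"chromosome": "10", "gene_type": "protein coding"})
print([record["gene"] for record in selected])
```

Only `geneA` passes: `geneB` is on chromosome 10 but is not protein coding, and `geneC` is protein coding but on a different chromosome.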

28 of 38 Results (Preview): This is a preview. If you are happy with the table, click ‘Go’ for the full result table.

29 of 38 Full Result Table: The columns are Ensembl Gene ID, Transcript ID, MGI symbol and MGI Accession Number.
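
Once a table like this has been exported as tab-separated text, it can be processed with a few lines of code. The sketch below assumes the four column headers named on the slide and uses placeholder values rather than real identifiers. Because the table lists a transcript ID per row, a gene with several transcripts can appear on several rows, so the hypothetical helper `transcripts_by_gene` groups them.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows, not real Ensembl or MGI identifiers.
rows = [
    ["Ensembl Gene ID", "Transcript ID", "MGI symbol", "MGI Accession Number"],
    ["GENE_1", "TRANSCRIPT_1A", "SymbolA", "ACCESSION_1"],
    ["GENE_1", "TRANSCRIPT_1B", "SymbolA", "ACCESSION_1"],
    ["GENE_2", "TRANSCRIPT_2A", "SymbolB", "ACCESSION_2"],
]
tsv_text = "\n".join("\t".join(row) for row in rows)


def transcripts_by_gene(text):
    """Group transcript IDs under their gene ID from a tab-separated result table."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    grouped = defaultdict(list)
    for row in reader:
        grouped[row["Ensembl Gene ID"]].append(row["Transcript ID"])
    return dict(grouped)


print(transcripts_by_gene(tsv_text))
```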

30 of 38 Original Query: For all mouse genes on chromosome 10 that are protein coding, I would like to know the IDs in both Ensembl and MGI. In the query: Attributes: columns in the Result Table. Filters: what we know.

31 of 38 Other Export Options (Attributes): Sequences (UTRs, flanking sequences, cDNA and peptides, etc.); gene IDs from Ensembl and external sources (MGI, Entrez, etc.); microarray data; protein functions/descriptions (InterPro, GO); orthologous gene sets; SNP/variation data.

32 of 38 Central Server

33 of 38 WormBase

34 of 38 HapMap: Population frequencies, inter-population comparisons, gene annotation.

35 of 38 DictyBase

36 of 38 UniProt, MSD

37 of 38 GRAMENE: Rice, maize, Arabidopsis genomes…

38 of 38 How to Get There Either Or click on ‘BioMart’ from Ensembl

Q & A. Thanks: Arek Kasprzyk, Benoît Ballester, Syed Haider, Richard Holland, Damian Smedley.