Http://www.biomart.org/ “BioMart is a query-oriented data management system developed jointly by the Ontario Institute for Cancer Research (OICR) and the.


1 “BioMart is a query-oriented data management system developed jointly by the Ontario Institute for Cancer Research (OICR) and the European Bioinformatics Institute (EBI).”
Open Source – LGPL
* Perl API → Web Interface, Web Services Interface, REST API
* Java API → Mart Explorer GUI, MartShell
* 3rd Party Software → Bioclipse, biomaRt (Bioconductor), Cytoscape, Galaxy, Taverna, WebLab
Databases in BioMart format: Ensembl, HapMap, HTGT, HGNC, Dictybase, Wormbase, Gramene, Europhenome, UniProt, Rat Genome Database, DroSpeGe, ArrayExpress DW, Eurexpress, GermOnLine, PRIDE, PepSeeker, VectorBase, Pancreatic Expression Database, Reactome, EU Rat Mart, Paramecium DB

2 A Mart is a collection of datasets (roughly equivalent to a database).
Marts are optimised for querying. A dataset has a main table, with an entry (and primary key) for each of the items of interest in that dataset (e.g. mouse transcripts). Related pieces of information about these items hang off the main table in dimension tables (e.g. the Affy IDs corresponding to a gene). More Info:
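
The main-table and dimension-table layout can be sketched with two small data frames in base R. This is only a toy illustration: the table names, column names and IDs below are invented and do not reflect the real BioMart schema.

```r
# Toy main table: one row per transcript, keyed by transcript_id.
main <- data.frame(
  transcript_id = c("T1", "T2", "T3"),
  gene_symbol   = c("GeneA", "GeneA", "GeneB"),
  stringsAsFactors = FALSE
)

# Toy dimension table: zero, one or many probe sets per transcript.
affy <- data.frame(
  transcript_id = c("T1", "T1", "T3"),
  affy_id       = c("probe_1", "probe_2", "probe_3"),
  stringsAsFactors = FALSE
)

# Left join: every main-table row is kept, even without a probe set.
joined <- merge(main, affy, by = "transcript_id", all.x = TRUE)
print(joined)
```

Transcript T2 has no probe set, so it appears once with NA in the affy_id column, while T1 appears twice (once per probe set).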

3 Ensembl annotates everything at the transcript level:
[Diagram: AffyID 1939_at maps to three Ensembl transcripts (Ensembl_transcript_1, Ensembl_transcript_2, Ensembl_transcript_3, each an ENST ID), all with HUGO Symbol TP53.]
Affy IDs are mapped by Ensembl. If there is no clear match, the probe is not assigned to a gene.

4 Web Interface: http://www.biomart.org/biomart/martview/
Choose a database (mart) to query (e.g. Ensembl).
Choose a dataset from that mart to query (e.g. Mus musculus genes).

5 Filters
Use filters to select the members of the dataset you are interested in, e.g. limit to miRNA genes on Chr1.

6 Attributes
Use attributes to define which pieces of information you want to retrieve about the members of the dataset, e.g. Gene ID, Transcript ID, Start, End and Status.
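
One way to picture the split between filters and attributes is as row selection and column selection on a table. The sketch below uses a toy data frame in base R; the gene IDs, coordinates and column names are invented for illustration and no BioMart server is involved.

```r
# Toy dataset: one row per gene (all values invented for illustration).
genes <- data.frame(
  gene_id    = c("G1", "G2", "G3", "G4"),
  chromosome = c("1", "1", "2", "1"),
  biotype    = c("miRNA", "protein_coding", "miRNA", "miRNA"),
  start      = c(100L, 500L, 900L, 1500L),
  end        = c(180L, 2500L, 990L, 1590L),
  stringsAsFactors = FALSE
)

# Filters choose the rows: miRNA genes on chromosome 1.
keep <- genes$biotype == "miRNA" & genes$chromosome == "1"

# Attributes choose the columns to report for those rows.
attributes_wanted <- c("gene_id", "start", "end")

result <- genes[keep, attributes_wanted]
print(result)
```

Only G1 and G4 pass both filters, and only the three requested columns are returned.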

7 Results:


9 www.bioconductor.org
“Bioconductor is an open source and open development software project for the analysis and comprehension of genomic data.”
source("http://bioconductor.org/biocLite.R")
# Default package set
biocLite()
# OR
biocLite("someBiocPkg")
biocLite(groupName="pkgGroupName")

10 Core Packages: affy, affydata, affyPLM, annaffy, annotate, Biobase, Biostrings, DynDoc, gcrma, genefilter, geneplotter, hgu95av2.db, limma, marray, matchprobes, multtest, ROC, vsn, xtable, affyQCReport.
Alternative Package Groups: lite, affy, graph, all
Full Package Listing (software)
Full Package Listing (annotation)
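
The biocLite.R script used in these slides has since been retired; on current R releases, Bioconductor packages are installed through the BiocManager package from CRAN. A minimal sketch of the equivalent step follows. The helper name ensure_bioc_pkg() is hypothetical, introduced here only to keep the example safe to run without a network connection.

```r
# ensure_bioc_pkg() is a hypothetical helper, not part of Bioconductor.
# It returns TRUE if `pkg` can be loaded, optionally installing it first.
ensure_bioc_pkg <- function(pkg, install = TRUE) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(TRUE)
  }
  if (!install) {
    return(FALSE)
  }
  # BiocManager itself lives on CRAN.
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(pkg, update = FALSE, ask = FALSE)
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so this call never touches the network.
ensure_bioc_pkg("stats", install = FALSE)
```

To fetch biomaRt itself, the call would be ensure_bioc_pkg("biomaRt"), which needs network access the first time.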

11 Querying BioMart from R:
# Install library
source("http://www.bioconductor.org/biocLite.R")
biocLite("biomaRt")
# Load library
library(biomaRt)
listMarts()
# result is just a data.frame, so you can subset it:
listMarts()[1:5,]
# or search it:
grep('ensembl', listMarts()[,1], value=TRUE)

12 mart <- useMart('ensembl', dataset='mmusculus_gene_ensembl')
# Select a mart
mart <- useMart('ensembl')
# List the available datasets (returns data.frame)
listDatasets(mart)
# Select a dataset
mart <- useDataset('mmusculus_gene_ensembl', mart=mart)
# Both in one:
mart <- useMart('ensembl', dataset='mmusculus_gene_ensembl')

13 # Available Filters (returns data.frame)
listFilters(mart)
# Available Attributes (returns data.frame)
listAttributes(mart)
# A Simple Query
getBM(filters=c('ensembl_gene_id'),
      values=c('ENSMUSG ','ENSMUSG '),
      attributes=c('ensembl_gene_id', 'ensembl_transcript_id', 'transcript_start', 'transcript_end'),
      mart=mart)
# Result: a data.frame of 14 rows, one per transcript, with columns
# ensembl_gene_id, ensembl_transcript_id, transcript_start, transcript_end

14 # If using multiple filters, values should be a list
# If the chromosome_name, start and end filters are used, they are
# automatically interpreted as 'search within this region'
getBM(filters=c('chromosome_name', 'start', 'end'),
      values=list(10, , ),
      attributes=c('ensembl_gene_id', 'start_position', 'end_position'),
      mart=mart)
# Result: a data.frame of 4 rows with columns
# ensembl_gene_id, start_position, end_position
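
The pairing of filters with a list of values can be sketched without a server connection. The helper region_query_args() below is hypothetical (it is not part of biomaRt), and the coordinates are placeholders chosen only for illustration; the point is that the i-th value belongs to the i-th filter.

```r
# region_query_args() is a hypothetical helper, not part of biomaRt.
# It pairs each filter name with its value, in matching order.
region_query_args <- function(chromosome, start, end) {
  stopifnot(is.numeric(start), is.numeric(end), start <= end)
  list(
    filters = c("chromosome_name", "start", "end"),
    values  = list(chromosome, start, end)
  )
}

# Placeholder coordinates, chosen only for illustration.
query <- region_query_args(10, 1000000, 2000000)
str(query)

# With a live connection in `mart`, the pieces plug into getBM() like this:
# getBM(attributes = c("ensembl_gene_id", "start_position", "end_position"),
#       filters = query$filters, values = query$values, mart = mart)
```

Using a list rather than c() for the values keeps each filter's value separate, so a filter can carry a vector of several values without being flattened into its neighbours.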

15 # Attributes and filters are organised into categories
# To get a list of the categories:
attributeSummary(mart)
filterSummary(mart)
# You can then list attributes and filters limited to a
# specified category:
listAttributes(mart, category='Variations')
# Filters can be either numeric, string or boolean.
# Boolean filters need a TRUE or FALSE value
# Determine type of filter with:
filterType('with_unigene', mart)

16 # Older versions of Ensembl are archived, which is useful if you've
# got genome positions from a previous build
old.mart <- useMart('ensembl_mart_46', dataset='mmusculus_gene_ensembl', archive=TRUE)
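
Recent biomaRt releases document a different route to archived data: useEnsembl() with a version argument, plus listEnsemblArchives() to see which releases are still served. A hedged sketch follows, assuming a reasonably current biomaRt; connect_archive() is a hypothetical wrapper and the release number in the commented call is a placeholder, not a recommendation.

```r
# connect_archive() is a hypothetical wrapper around biomaRt::useEnsembl().
# Defining it does not contact any server; calling it does.
connect_archive <- function(version, dataset = "mmusculus_gene_ensembl") {
  biomaRt::useEnsembl(biomart = "ensembl", dataset = dataset, version = version)
}

# With network access and biomaRt installed:
# biomaRt::listEnsemblArchives()              # releases currently available
# old.mart <- connect_archive(version = 102)  # placeholder release number
```

Not every historical release remains online, so it is worth checking the archive listing before choosing a version.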

17 Retrieving Sequences:
# Can get complicated with getBM. Use the getSequence wrapper
# Genome sequences are always 5'-3' but...
# Web-Services mode (default): strand is context dependent
# MySQL mode: always top strand
# e.g.
# BRCA1 peptide sequence from gene symbol
getSequence(id="BRCA1", type="mgi_symbol", seqType="peptide", mart=mart)
# REST transcript 20 bases upstream
getSequence(id='ENSMUST ', type='ensembl_transcript_id', seqType='transcript_flank', upstream=20, mart=mart)
# Chromosome 4 100,000, ,000,010
getSequence(chromosome=4, start= , end= , mart=mart, seqType="gene_exon", type="ensembl_gene_id")

18 seqTypes: Note that any of the _flank types need an 'upstream' or 'downstream' argument to determine the size of the flanking region. At the moment, you can't specify both.

19 Exporting Sequences:
# The exportFASTA function provides a quick way of saving
# sequences in FASTA format:
res <- getSequence(id="BRCA1", type="mgi_symbol", seqType="peptide", mart=mart)
exportFASTA(res, file='sequence.fa')
Ensembl mart deals with everything in terms of gene IDs, so getSequence doesn't let you get arbitrary bits of sequence. It lets you get gene sequences, exon sequences and transcript sequences. If you're dealing in genome positions rather than gene IDs, have a look at the
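
For readers unfamiliar with the FASTA layout, the sketch below writes the same kind of file using only base R. It is a stand-in for illustration, not biomaRt's exportFASTA() implementation: write_fasta() is a hypothetical helper, and the identifiers and peptide fragments are invented.

```r
# write_fasta() is a hypothetical stand-in, not biomaRt's exportFASTA().
write_fasta <- function(ids, sequences, file) {
  stopifnot(length(ids) == length(sequences))
  # Interleave header lines (">id") with their sequence lines.
  lines <- as.vector(rbind(paste0(">", ids), sequences))
  writeLines(lines, file)
  invisible(file)
}

# Invented identifiers and peptide fragments, for illustration only.
out <- tempfile(fileext = ".fa")
write_fasta(c("seq1", "seq2"), c("MKTAYIAK", "MSLLTEVE"), out)
cat(readLines(out), sep = "\n")
```

Each record is a header line starting with ">" followed by its sequence, which is the structure exportFASTA() produces from the data.frame returned by getSequence().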

20 Linking Datasets...
# Make mart connections for each of the datasets:
mouse.mart <- useMart('ensembl', dataset="mmusculus_gene_ensembl")
people.mart <- useMart('ensembl', dataset='hsapiens_gene_ensembl')
# In Ensembl, datasets are made of transcripts
# from a single species.
# Linking datasets amounts to homology
# e.g. get the position of the mouse homolog of the human 'TP53' gene
getLDS(attributes = c("hgnc_symbol", "chromosome_name", "start_position"),
       filters = "hgnc_symbol", values = "TP53", mart = people.mart,
       attributesL = c("chromosome_name", "start_position"),
       martL = mouse.mart)
# Result: one row with columns V1 to V5 (the three human attributes
# followed by the two mouse attributes)

21 Pretty HTML Output:
library(annotate)
# Provides the htmlpage function. Salient args are:
# genelist – a list or data.frame of IDs to be made into links
# filename
# title – for the table
# othernames – a list of other things to add to the table as is
# table.head – a character vector of column headers for the table
# repository – a list of repositories to use for creating links
ids <- c('ENSMUSG ','ENSMUSG ')
genelist <- getBM(attributes=c('uniprot_swissprot_accession', 'entrezgene'),
                  filters='ensembl_gene_id', values=ids,
                  output='list', na.value=' ', mart=mart)
othernames <- getBM(attributes=c('ensembl_gene_id', 'mgi_symbol', 'description'),
                    filters='ensembl_gene_id', values=ids,
                    output='list', na.value='&nbsp;', mart=mart)
htmlpage(genelist=genelist, othernames=othernames, title='Some Genes',
         table.head=c('Uniprot', 'Entrezgene', 'Ensembl', 'Name', 'Description'),
         repository=list('sp', 'en'), filename='genes.html')
# Note that all the lists are expected to be in the right order


23 More Info...
Slides & examples:
Bioconductor Mailing List:
biomaRt Users' Guide: vignette('biomaRt')
BioMart Website

