Introduction to the BioMart API

BioMart APIs ● Biomart_plib - Object-oriented Perl interface

Biomart_plib Architecture ● Object-oriented, Perl-based API to BioMart datasets ● Uses the XML configuration shared by all BioMart software

Query logic

Configuration logic

Initializing API script
    use strict;
    use warnings;
    use BioMart::Initializer;
    my $confFile = "/home/user/martRegistryFile";
    my $initializer = BioMart::Initializer->new('registryFile' => $confFile);
    my $registry = $initializer->getRegistry;
Optional Initializer parameters:
● 'action' => 'clean' - replace the dataset configurations stored on the local file system with those from the database, and build a new, clean registry object
● 'action' => 'update' - replace any file-system dataset configurations modified since the last retrieval with the database copies, and build a new registry object
With no action specified, the default is to build the registry object from the cached file-system configurations if they exist, and otherwise to retrieve them from the database.
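
The three ways the Initializer can source a dataset configuration boil down to a small decision rule. The Java sketch below is hypothetical (ConfigSourcePolicy and ConfigSource are illustrative names, not biomart-perl classes), and it reads "modified since the last retrieval" as "the database copy is newer than the cached copy", which is an assumption.

```java
import java.time.Instant;

/** Hypothetical model of the Initializer 'action' options; not biomart-perl code. */
enum ConfigSource { FILE_CACHE, DATABASE }

final class ConfigSourcePolicy {
    private ConfigSourcePolicy() {}

    /**
     * Decides where one dataset configuration is loaded from.
     * action:     "clean", "update", or null when no action is given
     * cachedAt:   when the file-system copy was retrieved, or null if none exists
     * modifiedAt: last modification time of the database copy (non-null when cachedAt is set)
     */
    static ConfigSource choose(String action, Instant cachedAt, Instant modifiedAt) {
        if (cachedAt == null) {
            return ConfigSource.DATABASE;              // nothing cached yet: must fetch
        }
        if ("clean".equals(action)) {
            return ConfigSource.DATABASE;              // discard every cached copy
        }
        if ("update".equals(action)) {
            // refresh only the copies that changed since they were last retrieved
            return modifiedAt.isAfter(cachedAt) ? ConfigSource.DATABASE : ConfigSource.FILE_CACHE;
        }
        if (action != null) {
            throw new IllegalArgumentException("unknown action: " + action);
        }
        return ConfigSource.FILE_CACHE;                // default: reuse the cache
    }
}
```

Keeping the rule per configuration, rather than per registry, matches the slide's wording that 'update' replaces only the configurations that changed.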

Initializing API script
Optional Initializer parameters (cont.):
● 'mode' => 'lazyload' - keep only a limited number of dataset configurations in memory at once, for low-memory machines and future scalability
With no mode specified, the default is to keep all configurations in memory.
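
One common way to keep only a limited number of configurations in memory is a least-recently-used (LRU) cache. The slides do not say which eviction policy 'lazyload' uses, so LRU is an assumption here, and LazyConfigCache is a hypothetical Java class built on the access-order mode of java.util.LinkedHashMap.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Sketch of a bounded LRU cache of dataset configurations, illustrating the
 * 'lazyload' idea. Hypothetical: not the biomart-perl implementation.
 */
final class LazyConfigCache<V> {
    private final Function<String, V> loader;   // e.g. reads one configuration from disk
    private final LinkedHashMap<String, V> cache;

    LazyConfigCache(int maxEntries, Function<String, V> loader) {
        this.loader = loader;
        // accessOrder = true: iteration runs from least to most recently used
        this.cache = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > maxEntries;     // evict the LRU entry once over the limit
            }
        };
    }

    /** Returns the configuration, loading it (and possibly evicting another) if absent. */
    V get(String datasetName) {
        return cache.computeIfAbsent(datasetName, loader);
    }

    /** True if the configuration is currently held in memory (does not touch access order). */
    boolean isLoaded(String datasetName) {
        return cache.containsKey(datasetName);
    }
}
```

Eviction happens as part of the insertion itself, so the memory bound holds without a separate clean-up step; the trade-off is that an evicted configuration is reloaded through the loader the next time it is requested.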

Building Query
    use BioMart::Query;
    my $query = BioMart::Query->new('registry' => $registry,
                                    'virtualSchemaName' => 'default');
    $query->addAttribute('hsapiens_gene_ensembl', 'ensembl_gene_id');
or, with the optional virtualSchema and interface settings:
    $query->addAttribute('hsapiens_gene_ensembl', 'ensembl_gene_id', 'default', 'default');
Then add filters, each taking an array reference of values:
    $query->addFilter('hsapiens_gene_ensembl', 'chromosome_name', ['1']);
    $query->addFilter('hsapiens_gene_ensembl', 'hgnc_symbol', ['FGFR1', 'IL2', 'DERL3']);
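
The calls above describe a query declaratively. The toy Java model below shows how such a query could be evaluated, assuming the usual BioMart reading of filters: the values in one filter's list are alternatives, and separate filters must all match. ToyQuery and the sample rows used with it are made up for illustration and are not part of BioMart.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy, in-memory model of addAttribute/addFilter. Assumes values within one
 * filter are OR-ed and separate filters are AND-ed. Not BioMart::Query.
 */
final class ToyQuery {
    private final List<String> attributes = new ArrayList<>();
    private final Map<String, List<String>> filters = new LinkedHashMap<>();

    ToyQuery addAttribute(String name) {
        attributes.add(name);
        return this;
    }

    ToyQuery addFilter(String name, List<String> values) {
        filters.put(name, values);
        return this;
    }

    /** Keeps rows that satisfy every filter, then projects the requested attributes. */
    List<List<String>> run(List<Map<String, String>> rows) {
        List<List<String>> out = new ArrayList<>();
        for (Map<String, String> row : rows) {
            boolean keep = true;
            for (Map.Entry<String, List<String>> f : filters.entrySet()) {
                String value = row.get(f.getKey());
                if (value == null || !f.getValue().contains(value)) {
                    keep = false;               // one failing filter rejects the row
                    break;
                }
            }
            if (keep) {
                List<String> projected = new ArrayList<>();
                for (String a : attributes) {
                    projected.add(row.get(a));
                }
                out.add(projected);
            }
        }
        return out;
    }
}
```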

Executing query and printing results
    use BioMart::QueryRunner;
    my $query_runner = BioMart::QueryRunner->new();
    $query_runner->execute($query);
    $query_runner->printResults;

Executing query and printing results
Print a formatted header:
    $query_runner->printHeader;
Print just the first 20 results:
    $query_runner->printResults(20);
Change the formatter from the tab-separated default before executing the query:
    $query->formatter('FASTA');
The formatter must have a corresponding module in lib/BioMart/Formatter implementing the FormatterI.pm interface (e.g. CSV, TXT, GTF, XLS).
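
The FormatterI.pm requirement is a plug-in pattern: the query names a format, and a module implementing a shared interface renders the rows. The Java sketch below mirrors that idea only; Formatter, TabFormatter, CsvFormatter and Formatters are hypothetical names, and the CSV quoting rule (RFC 4180 style) is a choice made here, not taken from BioMart.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical Java analogue of the FormatterI.pm contract: one row in, one line out. */
interface Formatter {
    String formatRow(List<String> row);
}

final class TabFormatter implements Formatter {
    public String formatRow(List<String> row) {
        return String.join("\t", row);
    }
}

final class CsvFormatter implements Formatter {
    public String formatRow(List<String> row) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.size(); i++) {
            if (i > 0) sb.append(',');
            String v = row.get(i);
            // quote fields containing separators or quotes, doubling embedded quotes
            if (v.contains(",") || v.contains("\"") || v.contains("\n")) {
                v = "\"" + v.replace("\"", "\"\"") + "\"";
            }
            sb.append(v);
        }
        return sb.toString();
    }
}

final class Formatters {
    // name -> factory, playing the role of the lib/BioMart/Formatter directory
    private static final Map<String, Supplier<Formatter>> REGISTRY =
        Map.of("TSV", TabFormatter::new, "CSV", CsvFormatter::new);

    /** Looks a formatter up by name; unknown names have no "module" and are rejected. */
    static Formatter byName(String name) {
        Supplier<Formatter> factory = REGISTRY.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("no formatter module for " + name);
        }
        return factory.get();
    }
}
```

New formats are added by registering another implementation, without touching the code that runs the query.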

Multi dataset queries
    my $query = BioMart::Query->new('registry' => $registry,
                                    'virtualSchemaName' => 'default');
    $query->addAttribute('hsapiens_gene_ensembl', 'ensembl_gene_id');
    $query->addAttribute('hsapiens_gene_ensembl', 'ensembl_transcript_id');
    $query->addAttribute('mmusculus_gene_ensembl', 'ensembl_gene_id');
    $query->addAttribute('mmusculus_gene_ensembl', 'ensembl_transcript_id');
This is equivalent to picking human as the main dataset in the web interface and mouse as the optional second dataset: the human attributes appear first in the result table, followed by the mouse attributes. Note that BioMart queries are currently restricted to a maximum of two datasets, for performance reasons and because of technical difficulties in query planning.
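
The column-ordering rule and the two-dataset limit can be captured in a few lines. This Java sketch is hypothetical (MultiDatasetColumns is not a BioMart class), and it assumes the main dataset is simply the first one an attribute is added for, as in the Perl example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of the result-column order for a two-dataset query: all attributes of
 * the main (first) dataset, then those of the second. Hypothetical helper.
 */
final class MultiDatasetColumns {
    static final int MAX_DATASETS = 2;

    // LinkedHashMap keeps datasets in the order they were first referenced
    private final Map<String, List<String>> byDataset = new LinkedHashMap<>();

    void addAttribute(String dataset, String attribute) {
        if (!byDataset.containsKey(dataset) && byDataset.size() == MAX_DATASETS) {
            throw new IllegalStateException("at most " + MAX_DATASETS + " datasets per query");
        }
        byDataset.computeIfAbsent(dataset, d -> new ArrayList<>()).add(attribute);
    }

    /** Column names such as "hsapiens_gene_ensembl.ensembl_gene_id", grouped by dataset. */
    List<String> header() {
        List<String> cols = new ArrayList<>();
        byDataset.forEach((ds, attrs) -> attrs.forEach(a -> cols.add(ds + "." + a)));
        return cols;
    }
}
```

Even if human and mouse attributes are added in interleaved order, the columns come out grouped by dataset, main dataset first.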

Web services type access ● Provided to support Grid projects such as Taverna, and other third-party users who want to federate mart data, without leaving a port to the database server openly accessible.

Web services type access
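
A minimal sketch of web-services access, assuming the XML query layout and the martservice "?query=" convention of later BioMart releases (the slides' own examples did not survive). It expresses the query from the Building Query slide as XML and URL-encodes it; MartServiceQuery is a hypothetical helper and example.org stands in for a real server.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/**
 * Builds the Building Query example as BioMart query XML and wraps it in a
 * martservice URL. XML layout and URL convention are assumptions.
 */
final class MartServiceQuery {
    static String queryXml() {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<!DOCTYPE Query>"
            + "<Query virtualSchemaName=\"default\">"
            + "<Dataset name=\"hsapiens_gene_ensembl\" interface=\"default\">"
            + "<Filter name=\"chromosome_name\" value=\"1\"/>"
            // several values for one filter, assumed to be comma-separated
            + "<Filter name=\"hgnc_symbol\" value=\"FGFR1,IL2,DERL3\"/>"
            + "<Attribute name=\"ensembl_gene_id\"/>"
            + "</Dataset>"
            + "</Query>";
    }

    /** baseUrl is a placeholder such as "http://example.org/biomart/martservice". */
    static String url(String baseUrl, String xml) {
        return baseUrl + "?query=" + URLEncoder.encode(xml, StandardCharsets.UTF_8);
    }
}
```

The client only ever speaks HTTP to the web server, which is what lets third parties federate mart data without a database port being exposed.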

Web services type access Change format from default tab-separated format:
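
To change the output format, a sketch under the assumption (from later BioMart releases) that the format is chosen by a formatter attribute on the Query element, mirroring $query->formatter('FASTA') in the Perl API. FormattedQuery and its list of accepted names are hypothetical; the names are the ones mentioned in these slides plus TSV for the tab-separated default.

```java
import java.util.Set;

/** Sketch: the output format is assumed to be chosen by the formatter attribute of Query. */
final class FormattedQuery {
    // formatter names mentioned in these slides, plus TSV for the default (assumed list)
    private static final Set<String> KNOWN = Set.of("TSV", "CSV", "TXT", "FASTA", "GTF", "XLS");

    /** datasetBody is the already-built Dataset element(s) of the query. */
    static String queryXml(String formatter, String datasetBody) {
        if (!KNOWN.contains(formatter)) {
            throw new IllegalArgumentException("unknown formatter: " + formatter);
        }
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<!DOCTYPE Query>"
            + "<Query virtualSchemaName=\"default\" formatter=\"" + formatter + "\">"
            + datasetBody
            + "</Query>";
    }
}
```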

Web services type access Get count instead:
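
For a count instead of rows, a sketch assuming (as in later BioMart releases) that count="1" on the Query element makes the service reply with a single number. CountQuery is a hypothetical helper; parsing trims whitespace because the reply may end with a newline.

```java
/** Sketch of a count query; the count="1" convention is an assumption. */
final class CountQuery {
    /** datasetBody is the already-built Dataset element(s) of the query. */
    static String queryXml(String datasetBody) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<!DOCTYPE Query>"
            + "<Query virtualSchemaName=\"default\" count=\"1\">"
            + datasetBody
            + "</Query>";
    }

    /** Parses the reply body, tolerating surrounding whitespace and newlines. */
    static long parseCount(String responseBody) {
        return Long.parseLong(responseBody.trim());
    }
}
```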

Web services type access Multi-dataset query:
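
A multi-dataset query in XML form can be sketched as one Dataset element per dataset, main dataset first, matching the Perl example. The element names are assumptions based on later BioMart releases, and MultiDatasetXml is a hypothetical helper; the parameter is a LinkedHashMap on purpose, so the caller's dataset order is preserved.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: one Dataset element per dataset, in insertion order (main dataset first). */
final class MultiDatasetXml {
    static String queryXml(LinkedHashMap<String, List<String>> attributesByDataset) {
        if (attributesByDataset.isEmpty() || attributesByDataset.size() > 2) {
            throw new IllegalArgumentException("a query uses one or two datasets");
        }
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<!DOCTYPE Query><Query virtualSchemaName=\"default\">");
        for (Map.Entry<String, List<String>> e : attributesByDataset.entrySet()) {
            sb.append("<Dataset name=\"").append(e.getKey()).append("\" interface=\"default\">");
            for (String attr : e.getValue()) {
                sb.append("<Attribute name=\"").append(attr).append("\"/>");
            }
            sb.append("</Dataset>");
        }
        return sb.append("</Query>").toString();
    }
}
```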

Web services type access
(1) Recover the registry file:
(2) Recover the datasets available for a mart: virtualSchema=default&mart=ensembl
(3) Recover the filters available for a dataset: virtualSchema=default&dataset=hsapiens_gene_ensembl
(4) Recover the attributes available for a dataset: virtualSchema=default&dataset=hsapiens_gene_ensembl
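
The four metadata requests differ only in their query parameters. In this sketch the "type" values (registry, datasets, filters, attributes) and the base URL are assumptions taken from later BioMart martservice releases, since only the trailing parameters survive on the slide; MartMetadataUrls is a hypothetical helper.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of the metadata URLs; "type" values and base URL are assumptions. */
final class MartMetadataUrls {
    private final String base;   // placeholder, e.g. "http://example.org/biomart/martservice"

    MartMetadataUrls(String base) {
        this.base = base;
    }

    String registry()                  { return build(Map.of("type", "registry")); }
    String datasets(String mart)       { return build(params("datasets", "mart", mart)); }
    String filters(String dataset)     { return build(params("filters", "dataset", dataset)); }
    String attributes(String dataset)  { return build(params("attributes", "dataset", dataset)); }

    // type first, then the virtual schema and the mart or dataset, as on the slide
    private static Map<String, String> params(String type, String key, String value) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("type", type);
        m.put("virtualSchema", "default");
        m.put(key, value);
        return m;
    }

    private String build(Map<String, String> params) {
        StringBuilder sb = new StringBuilder(base);
        char sep = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            sb.append(sep).append(p.getKey()).append('=')
              .append(URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
            sep = '&';
        }
        return sb.toString();
    }
}
```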

MartJ ● Java interface to BioMart datasets ● Uses the XML configuration shared by all BioMart software

RegistryDSConfigAdaptor
    import java.net.MalformedURLException;
    import java.net.URL;
    import org.ensembl.mart.lib.config.RegistryDSConfigAdaptor;
    String registryPath = "data/defaultMartRegistry.xml";
    URL confURL = null;
    try {
        confURL = InputSourceUtil.getURLForString(registryPath);
    } catch (MalformedURLException e) {
        throw new ConfigurationException("Warning, could not load " + registryPath + " file\n");
    }
    RegistryDSConfigAdaptor adaptor = new RegistryDSConfigAdaptor(confURL, false, false, false);
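
The adaptor needs the registry location as a URL. As a JDK-only stand-in for the InputSourceUtil.getURLForString call, this sketch accepts a string that is already an absolute URL and otherwise treats it as a file path. That behaviour is an assumption about what such a helper does, not MartJ's code, and ConfigUrls is a hypothetical name.

```java
import java.io.File;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

/** JDK-only sketch: turn a URL string or a file path into a URL. */
final class ConfigUrls {
    static URL forString(String location) {
        try {
            URI uri = new URI(location);
            if (uri.isAbsolute()) {
                return uri.toURL();                      // e.g. http://... or file:/...
            }
        } catch (URISyntaxException | MalformedURLException | IllegalArgumentException notAUrl) {
            // not usable as a URL: fall through and treat it as a file path
        }
        try {
            return new File(location).toURI().toURL();   // relative paths resolve against the working directory
        } catch (MalformedURLException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```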

DatasetConfig
    import org.ensembl.mart.lib.config.DatasetConfig;
    DatasetConfig config = adaptor.getDatasetConfigByDatasetInternalName(
        "hsapiens_gene_ensembl", "default");

Query
    import org.ensembl.mart.lib.Query;
    Query query = new Query();
    // the query needs some information from the DatasetConfig
    query.setDataSource(config.getAdaptor().getDataSource());
    query.setMainTables(config.getStarBases());
    query.setPrimaryKeys(config.getPrimaryKeys());

FieldAttribute/AttributeDescription
    import org.ensembl.mart.lib.config.AttributeDescription;
    import org.ensembl.mart.lib.FieldAttribute;
    AttributeDescription adesc = config.getAttributeDescriptionByInternalName("gene_stable_id");
    query.addAttribute(new FieldAttribute(
        adesc.getField(), adesc.getTableConstraint(), adesc.getKey()));

Filter/FilterDescription
There are three types of Filter that can be added to the query, all created from the attributes of a FilterDescription:
A. BasicFilter
B. BooleanFilter (but watch for the two boolean 'flavors')
C. IDListFilter
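
A rough way to picture the three filter types is as three shapes of SQL condition: a comparison, a NULL test, and an IN list. The classes below (SqlFilter and its implementations) are hypothetical and are not MartJ's SQL generation; table and column names are supplied by the caller, and values are left as JDBC '?' placeholders rather than spliced into the SQL. The numeric 'boolean_num' flavour is not modelled, since the slides do not show which condition it produces.

```java
import java.util.Collections;

/** Hypothetical: the SQL condition a filter could contribute, with '?' placeholders. */
interface SqlFilter {
    String toSql();
}

/** Comparison such as "table.field = ?" (cf. BasicFilter). */
final class BasicSqlFilter implements SqlFilter {
    private final String column;
    private final String operator;

    BasicSqlFilter(String table, String field, String operator) {
        this.column = table + "." + field;
        this.operator = operator;
    }

    public String toSql() { return column + " " + operator + " ?"; }
}

/** NULL test (cf. the 'boolean' flavour of BooleanFilter). */
final class BooleanSqlFilter implements SqlFilter {
    private final String column;
    private final boolean mustBeNull;

    BooleanSqlFilter(String table, String field, boolean mustBeNull) {
        this.column = table + "." + field;
        this.mustBeNull = mustBeNull;
    }

    public String toSql() { return column + (mustBeNull ? " IS NULL" : " IS NOT NULL"); }
}

/** Membership test with one placeholder per ID (cf. IDListFilter). */
final class IdListSqlFilter implements SqlFilter {
    private final String column;
    private final int idCount;

    IdListSqlFilter(String table, String field, int idCount) {
        if (idCount < 1) throw new IllegalArgumentException("empty ID list");
        this.column = table + "." + field;
        this.idCount = idCount;
    }

    public String toSql() {
        return column + " IN (" + String.join(", ", Collections.nCopies(idCount, "?")) + ")";
    }
}
```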

FilterDescription
    import org.ensembl.mart.lib.config.FilterDescription;
    String name = "chr_name";
    FilterDescription fdesc = config.getFilterDescriptionByInternalName(name);

BasicFilter
    import org.ensembl.mart.lib.BasicFilter;
    // The config system actually masks a lot of complexity
    // with regard to filters by requiring the internalName
    // again when calling the getXXX methods
    query.addFilter(new BasicFilter(
        fdesc.getField(name), fdesc.getTableConstraint(name), fdesc.getKey(name),
        "=", "22"));

BooleanFilter
    import org.ensembl.mart.lib.BooleanFilter;
    // note there are different types of BooleanFilter:
    // "boolean" and "boolean_num"
    if (fdesc.getType(name).equals("boolean")) {
        query.addFilter(new BooleanFilter(
            fdesc.getField(name), fdesc.getTableConstraint(name), fdesc.getKey(name),
            BooleanFilter.isNULL));
    } else { // "boolean_num"
        query.addFilter(new BooleanFilter(
            fdesc.getField(name), fdesc.getTableConstraint(name), fdesc.getKey(name),
            BooleanFilter.isNotNULL_NUM));
    }

IDListFilter
    import org.ensembl.mart.lib.IDListFilter;
    String[] ids = new String[] { "ENSG ", "ENSG ", "ENSG ", "ENSG " };
    query.addFilter(new IDListFilter(
        fdesc.getField(name), fdesc.getTableConstraint(name), fdesc.getKey(name), ids));

Engine
    import org.ensembl.mart.lib.Engine;
    import org.ensembl.mart.lib.FormatSpec;
    Engine engine = new Engine();
    engine.execute(query, new FormatSpec(FormatSpec.TABULATED, "\t"), System.out);

Future of MartJ: MartJ will be refactored to use the more flexible architecture that we developed for the Perl-based software.