Cloud Computing for Chemical Property Prediction
Paul Watson, School of Computing Science, Newcastle University, UK

Presentation transcript:

Cloud Computing for Chemical Property Prediction
Paul Watson, School of Computing Science, Newcastle University, UK
Microsoft Cloud Futures Conference, Redmond, 8th April 2010
The team: David Leahy, Jacek Cala, Hugo Hiden, Dominic Searson, Vladimir Sykora, Martyn Taylor, Simon Woodman
With thanks to:
- Microsoft External Research for their financial support for Project Junior
- Christophe Poulain, Savas Parastatidis

Chemists want to know:
Q1. What are the properties of this molecule?
- Toxicity
- Solubility
- Biological activity
Q2. What molecule would have an aqueous solubility of 0.1 μg/mL?

Answering the question by performing experiments...
- time consuming
- expensive
- ethical issues

An alternative to experimentation: QSAR
Quantitative Structure Activity Relationship: predict properties based on similar molecules
Activity ≈ f(quantifiable structural attributes), e.g. #atoms, logP, shape, ...
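To make "predict properties based on similar molecules" concrete, here is a minimal sketch of one possible f: a nearest-neighbour predictor over descriptor vectors. The descriptor values, the activities and the function names are all hypothetical, and nearest-neighbour averaging is only an illustration, not one of the model types named in the slides.

```python
import math

# Hypothetical training set: (atom count, logP) descriptors -> measured activity.
TRAINING = [
    ((10.0, 1.2), 0.50),
    ((12.0, 1.5), 0.70),
    ((30.0, 4.0), 3.10),
    ((28.0, 3.8), 2.90),
]


def distance(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_activity(descriptors, training=TRAINING, k=2):
    """Average the activity of the k most similar training molecules."""
    nearest = sorted(training, key=lambda row: distance(row[0], descriptors))[:k]
    return sum(activity for _, activity in nearest) / len(nearest)


if __name__ == "__main__":
    # A small molecule lands next to the two small training molecules.
    print(predict_activity((11.0, 1.3)))
```

In practice the descriptors would be scaled before computing distances, since atom count and logP live on very different ranges.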

Generating the models: Discovery Bus (Leahy et al)
[Diagram: Data and Model-Builders feed Model Generation, which produces Models. New data or new model-builders lead to new or improved models.]

[Workflow diagram]
Chemical Structures & their Activities
-> Separate Training & Test Data (gives Training Data and Test Data)
-> Calculate Descriptors from Structures (gives Descriptors + Responses)
-> Combine Descriptors (gives Combined Descriptors + Responses)
-> Filter Descriptors (gives Selected Descriptors + Responses)
-> Build & Test Models Independently: Multiple Linear Regression, Neural Network, Partial Least Squares, Classification Trees, ...
-> Select Best Models
-> Add to Model Database
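The split, build, test and select steps of this workflow can be sketched in a few lines. This is an illustrative toy, not the Discovery Bus implementation: the two model builders (a mean baseline and a one-descriptor least-squares line), the data and all function names are assumptions.

```python
def split(rows, test_every=4):
    """Hold out every test_every-th row as test data; the rest is training data."""
    train = [r for i, r in enumerate(rows) if i % test_every != test_every - 1]
    test = [r for i, r in enumerate(rows) if i % test_every == test_every - 1]
    return train, test


def build_mean_model(train):
    """Baseline builder: always predict the mean training response."""
    mean = sum(y for _, y in train) / len(train)
    return lambda x: mean


def build_linear_model(train):
    """Least-squares line through a single descriptor."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)


def held_out_error(model, test):
    """Mean squared error on the held-out test data."""
    return sum((model(x) - y) ** 2 for x, y in test) / len(test)


def select_best(rows, builders):
    """Build every model on the training data, score each on test data, pick the best."""
    train, test = split(rows)
    scored = {name: held_out_error(build(train), test) for name, build in builders.items()}
    return min(scored, key=scored.get), scored


# Hypothetical (descriptor, response) pairs and model builders.
ROWS = [(float(x), 2.0 * x + 1.0) for x in range(8)]
BUILDERS = {"mean": build_mean_model, "linear": build_linear_model}

if __name__ == "__main__":
    print(select_best(ROWS, BUILDERS))
```

Each builder only sees the training data, so the test error is an honest basis for choosing which models go into the model database.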

Increasing amounts of data for model building...
- CHEMBL: data on 622,824 compounds, collected from 33,956 publications
- WOMBAT-PK: data on 1230 compounds, for over 13,000 clinical measurements
- WOMBAT: data on 251,560 structures, for over 1,966 targets
All contain structure information & numerical activity data
More models, better models
Computationally expensive: 5 years for new datasets on existing server

JUNIOR Project Aim
- use Azure to generate models in weeks, not years...
- ...using as much of the available data as possible...
- make the models available so that researchers can generate predictions for their own molecules

Potential for concurrency...
[Workflow diagram repeated from the earlier slide: Chemical Structures & their Activities -> Separate Training & Test Data -> Calculate Descriptors from Structures -> Combine Descriptors -> Filter Descriptors -> Build & Test Models Independently (Multiple Linear Regression, Neural Network, Partial Least Squares, Classification Trees, ...) -> Select Best Models -> Add to Model Database]
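Because the models are built and tested independently, the model-building jobs can run side by side. The sketch below shows the idea with a local thread pool. The toy builders, the data and the function names are hypothetical, and a thread pool is only a stand-in for distributing jobs across cloud nodes.

```python
from concurrent.futures import ThreadPoolExecutor

TRAIN = [1.0, 2.0, 3.0, 4.0]  # hypothetical training responses
TEST = [2.0, 3.0]             # hypothetical test responses


def mean_builder(train):
    m = sum(train) / len(train)
    return lambda: m


def median_builder(train):
    s = sorted(train)
    mid = len(s) // 2
    med = s[mid] if len(s) % 2 else (s[mid - 1] + s[mid]) / 2
    return lambda: med


def max_builder(train):
    top = max(train)
    return lambda: top


def build_and_score(item):
    """One independent job: build a model, then score it on the test data."""
    name, builder = item
    model = builder(TRAIN)
    error = sum((model() - y) ** 2 for y in TEST) / len(TEST)
    return name, error


def run_concurrently(builders, workers=4):
    """Submit every build-and-test job to the pool and collect the scores."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(build_and_score, builders.items()))


BUILDERS = {"mean": mean_builder, "median": median_builder, "max": max_builder}

if __name__ == "__main__":
    print(run_concurrently(BUILDERS))
```

No job reads another job's output, which is what makes the step easy to spread over many workers. For CPU-bound model building in CPython, separate processes or machines would be needed to get a real speed-up.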

Approach
- avoid rewriting all existing Discovery Bus software
- move existing Discovery Bus to Amazon Cloud, without parallelisation
- move critical tasks to run concurrently on Azure
- base solution around e-Science Central...

Clouds to the rescue?
Building scalable, dependable science applications is still hard...
e-Science Central: Science Cloud Platform

Science Cloud Options
[Diagram comparing two options. In one, users' science apps (Science App 1 ... Science App n) sit on a Science Cloud Platform, which runs on Cloud Infrastructure: Storage & Compute. In the other, each science app is built directly on Cloud Infrastructure: Storage & Compute.]

e-Science Central
[Diagram: Users -> Science App 1 ... Science App n -> Science Cloud Platform -> Cloud Infrastructure: Storage & Compute]
- Science Cloud Platform for developers
- Science as a Service for users

What should the Science Cloud Platform include?
Identify the common needs of our e-Science users:
- Store
- Analyse
- Automate
- Share
Data (instruments, experimental data, sensors...)

[Architecture diagram: e-Science Central as the Science Cloud Platform. Labels: App ... App, API, Workflow Enactment, Social Networking, Security, Processing, Storage, Analysis Services, Provenance, Metadata, Cloud Infrastructure]

[Architecture diagram for the project. Labels: Discovery Bus Planner, Amazon, API, e-Science Central (Workflow Enactment, Social Networking, Security, Processing, Storage, Analysis Services, Provenance, Metadata), Azure]

[Sequence diagram: running a workflow]
- Discovery Bus invokes e-Science Central workflow via API
- Workflow decomposed to Message Plan
- Temporary workflow storage assigned, Message Plan queued for execution
- Messages sent in sequence
- Results data stored in e-Science Central folder
- Workflow execution completes
- Discovery Bus notified with results
Diagram labels: Message Plan, Workflow temporary storage, Call Message, Response Message, Internal Service, Azure Service, NFS, HTTP, RMI / JMS, HTTP Post
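The decompose-then-send-in-sequence idea can be sketched as follows. This is a hedged, in-process illustration only: the service names, the message shape and the functions are hypothetical and do not reflect the actual e-Science Central API or its transports.

```python
from collections import deque

# Hypothetical services: each takes a call message's payload and returns a response payload.
SERVICES = {
    "calculate_descriptors": lambda data: [len(s) for s in data],
    "filter_descriptors": lambda data: [d for d in data if d > 2],
    "build_model": lambda data: {"mean": sum(data) / len(data)},
}


def decompose(workflow):
    """Decompose a workflow (an ordered list of service names) into a message plan."""
    return deque({"service": name} for name in workflow)


def execute(plan, input_data):
    """Send the messages in sequence; each response becomes the next call's input."""
    temp_storage = {"current": input_data}  # stands in for workflow temporary storage
    while plan:
        message = plan.popleft()
        response = SERVICES[message["service"]](temp_storage["current"])
        temp_storage["current"] = response
    return temp_storage["current"]  # results handed back to the caller


def invoke_workflow(workflow, input_data):
    """Entry point a caller such as the Discovery Bus would use in this sketch."""
    return execute(decompose(workflow), input_data)


if __name__ == "__main__":
    steps = ["calculate_descriptors", "filter_descriptors", "build_model"]
    print(invoke_workflow(steps, ["CCO", "C", "CCCC"]))
```

Keeping the plan as a queue of small messages means each step can be routed to a different kind of service without the caller knowing where it runs.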

[Diagram labels: e-Science Central, Task, Web Node, Queue, Worker Node, Azure Blob Storage, Results]
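The queue-and-worker arrangement in this diagram follows a common pattern: tasks are placed on a queue, worker nodes pull them off, and results are written to shared storage. Below is a minimal in-process sketch of that pattern. It uses Python's standard queue and threads with a dictionary standing in for blob storage; none of it is Azure API code, and the task format is an assumption.

```python
import queue
import threading


def worker(tasks, blob_store, lock):
    """Pull tasks from the queue until a None sentinel arrives; store each result."""
    while True:
        task = tasks.get()
        if task is None:  # sentinel: no more work for this worker
            break
        task_id, values = task
        result = sum(values) / len(values)  # stand-in for a real computation
        with lock:
            blob_store[task_id] = result


def run(task_list, n_workers=3):
    """Start workers, enqueue the tasks, then wait for every worker to finish."""
    tasks = queue.Queue()
    blob_store = {}  # stands in for blob storage
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(tasks, blob_store, lock))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for task in task_list:
        tasks.put(task)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return blob_store


if __name__ == "__main__":
    print(run([("a", [1, 2, 3]), ("b", [4, 6])]))
```

Because workers only communicate through the queue and the store, adding more workers needs no change to the code that submits tasks.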

Current Status: Work in Progress
- Running across up to 100 Azure nodes
- Azure utilisation increasing (average ~60% over runs)
- Moving more admin tasks to Azure / e-Science Central
  - model validation
  - co-ordination

[Chart: CPU Utilization]

Summary
- Discovery Bus exemplifies a good Cloud pattern
  - large, variable, bursty requirements
  - proposal to apply to software verification
- clouds do NOT make it easier to build complex, scalable, dependable distributed systems
  - we need higher-level "Science Cloud Platforms"
  - e-Science Central is our attempt at this
- using the Azure Cloud we have a scalable system that can handle large new datasets
- the models are being made freely available