Domain-Specific Languages for Composing Signature Discovery Workflows Ferosh Jacob*, Adam Wynne+, Yan Liu+, Nathan Baker+, and Jeff Gray* *Department of.

Presentation transcript:

Domain-Specific Languages for Composing Signature Discovery Workflows
Ferosh Jacob*, Adam Wynne+, Yan Liu+, Nathan Baker+, and Jeff Gray*
*Department of Computer Science, University of Alabama, AL
+Pacific Northwest National Laboratory, Richland, WA

Signature Discovery Initiative (SDI)
A signature is a unique or distinguishing measurement, pattern, or collection of data that identifies a phenomenon (object, action, or behavior) of interest.
The most widely understood signature is the human fingerprint.
Biomarkers can be used to indicate the presence of disease or identify a drug resistance.
Anomalous network traffic is often an indicator of a computer virus or malware.
Combinations of line overloads may be a precursor to a cascading power failure.

SDI high-level goals
Anticipate future events by detecting precursor signatures, such as combinations of line overloads that may lead to a cascading power failure.
Characterize current conditions by matching observations against known signatures, such as the characterization of chemical processes via comparisons against known emission spectra.
Analyze past events by examining signatures left behind, such as the identity of cyber hackers whose techniques conform to known strategies and patterns.

SDI Analytic Framework (AF)
Challenge: An approach that can be applied across a broad spectrum to efficiently and robustly construct candidate signatures, validate their reliability, measure their quality, and overcome the challenge of detection.
Solution: Analytic Framework (AF). Legacy code on a remote machine is wrapped and exposed as web services. The web services are orchestrated to create reusable tasks that users can retrieve and execute.
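
The wrapping idea can be illustrated with a minimal Python sketch. It is not the AF implementation (which, per the slides, generates Java and XML service wrappers); the helper `wrap_executable` and the stand-in "legacy script" are hypothetical names introduced only for illustration. The sketch shows a command-line program being turned into a callable that a workflow step could invoke.

```python
import subprocess
import sys
from typing import Callable, List


def wrap_executable(command: List[str]) -> Callable[..., str]:
    """Return a function that runs `command` with extra arguments and returns its stdout."""

    def service(*args: str) -> str:
        completed = subprocess.run(
            [*command, *args],
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
        return completed.stdout.strip()

    return service


# Hypothetical stand-in for a legacy script: a Python one-liner that echoes its arguments.
echo_string = wrap_executable(
    [sys.executable, "-c", "import sys; print(' '.join(sys.argv[1:]))"]
)
```

Calling `echo_string("hello", "world")` runs the underlying program and returns its output as a string. A real service wrapper would additionally handle remote execution, file transfer, and a web-service interface, none of which is modeled here.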

Challenges for scientists in using AF
Accidental complexity of creating service wrappers: In our system, manually wrapping a simple script that has a single input and output file requires 121 lines of Java code (in five Java classes) and 35 lines of XML code (in two files).
Lack of end-user environment support: Many scientists are not familiar with service-oriented software technologies, which forces them to seek the help of software developers to make Web services available in a workflow workbench.

A domain-specific modeling approach
We applied Domain-Specific Modeling (DSM) techniques to:
Model the process of wrapping remote executables. The executables are wrapped inside AF web services using a Domain-Specific Language (DSL) called the Service Description Language (SDL).
Model the SDL-created web services. The SDL-created web services can then be used to compose workflows using another DSL, called the Workflow Description Language (WDL).
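
The relationship between the two languages can be sketched as a small in-memory model. This is only an assumption-laden illustration in Python: the classes `ServiceDescription` and `WorkflowDescription` and their fields are hypothetical and do not reflect the actual SDL or WDL syntax. The service names and utilities are taken from the code generation overview table in this presentation. The sketch shows one check such a model makes easy: every workflow step must refer to a described service.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ServiceDescription:
    """Hypothetical model of one described service: a name and the utility it wraps."""
    name: str
    utility: str


@dataclass
class WorkflowDescription:
    """Hypothetical model of a workflow: an ordered list of service names."""
    name: str
    steps: List[str]

    def undefined_services(self, services: Dict[str, ServiceDescription]) -> List[str]:
        # Report steps that refer to a service nobody has described.
        return [step for step in self.steps if step not in services]


services = {
    s.name: s
    for s in [
        ServiceDescription("submitBlast", "SLURM, sh"),
        ServiceDescription("jobStatus", "SLURM, sh"),
        ServiceDescription("blastResult", "cp"),
    ]
}
workflow = WorkflowDescription("blast", ["submitBlast", "jobStatus", "blastResult"])
```

Here `workflow.undefined_services(services)` returns an empty list because all three steps are described; a workflow naming an unknown service would get that name back.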

Example application: BLAST execution
1. Submit a BLAST job to a cluster.
2. Check the status of the job.
3. Download the output files upon completion of the job.
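
The three steps above follow a common submit, poll, and download pattern, sketched below in Python. Everything here is an assumption made for illustration: `run_job`, `FakeCluster`, the status strings, and the job identifier are hypothetical and stand in for the real web services and cluster scheduler.

```python
import time
from typing import Callable


def run_job(
    submit: Callable[[], str],
    check_status: Callable[[str], str],
    download: Callable[[str], str],
    poll_seconds: float = 0.0,
    max_polls: int = 100,
) -> str:
    """Submit a job, poll until it reports completion, then download its output."""
    job_id = submit()
    for _ in range(max_polls):
        if check_status(job_id) == "COMPLETED":
            return download(job_id)
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not complete after {max_polls} polls")


class FakeCluster:
    """In-memory stand-in for a cluster; the job completes on the third status check."""

    def __init__(self) -> None:
        self.checks = 0

    def submit(self) -> str:
        return "job-1"

    def check_status(self, job_id: str) -> str:
        self.checks += 1
        return "COMPLETED" if self.checks >= 3 else "RUNNING"

    def download(self, job_id: str) -> str:
        return f"results for {job_id}"


cluster = FakeCluster()
result = run_job(cluster.submit, cluster.check_status, cluster.download)
```

Passing the three steps in as callables keeps the control flow independent of how each service is actually reached, which mirrors the idea of composing separately wrapped services into one workflow.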

Output generated as Taverna workflow executable
1. Submit job
2. Check status
3. Download files

Example application: BLAST execution
Figure: Service description (SDL) for BLAST submission
Figure: Workflow description (WDL) for BLAST

Implementation
Inputs: Script metadata (e.g., name, inputs), SDL (e.g., blast.sdl), WDL (e.g., blast.wdl)
Outputs: Web services (e.g., checkJob), Taverna workflow (e.g., blast.t2flow)
Diagram labels: Workflow, Web services (Taverna engine), Retrieve documents (AF framework), Apply templates (Template engine)
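
The "apply templates" stage can be pictured as filling a text template with script metadata. The sketch below uses Python's standard `string.Template` purely as a stand-in; the slides do not say which template engine is used, and the template text, the `generate_wrapper` helper, and the `jobId` input are hypothetical. Only the service name `checkJob` comes from the slide.

```python
from string import Template

# Hypothetical output format; the real templates produce Java and XML wrapper code.
WRAPPER_TEMPLATE = Template(
    "service $name\n"
    "  command: $command\n"
    "  inputs: $inputs\n"
)


def generate_wrapper(metadata: dict) -> str:
    """Fill the wrapper template from a dictionary of script metadata."""
    return WRAPPER_TEMPLATE.substitute(
        name=metadata["name"],
        command=metadata["command"],
        inputs=", ".join(metadata["inputs"]) or "none",
    )


generated = generate_wrapper(
    {"name": "checkJob", "command": "sh", "inputs": ["jobId"]}
)
```

The point of the design is that the repetitive wrapper code lives once in the template, so describing a new service only requires supplying its metadata.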

Related work
Compared to domain-independent workflow tools like jBPM and Taverna, our framework has the advantage that it is configured only for scientific signature discovery workflows.
Most of these tools assume that the web services are already available.
Our framework configures the workflow definition file that declares how to compose the service wrappers created by our framework.

Questions?

Xtext grammar for WDL

An overview of SDL code generation

No  Service              Utils/Script  [Inputs (type)] [Outputs (type)]      LOC / Total LOC (files)
1   echoString           echo          [0] [1 (doc)]                         (4)
2   echoFile             echo          [1 (String)] [1 (doc)]                (4)
3   aggregate            cat           [1 (List doc)] [1 (doc)]              (4)
4   classifier_Training  R             [2 (doc), 1 (String)] [1 (doc)]       (4)
5   classifier_Testing   R             [3 (doc), 1 (String)] [1 (doc)]       (4)
6   accuracy             R             [1 (doc)]                             (4)
7   submitBlast          SLURM, sh     [3 (doc)] [2 (String)]                (5)
8   jobStatus            SLURM, sh     [1 (String)]                          (4)
9   blastResult          cp            [1 (String)] [1 (doc)]                (4)
10  mafft                              [1 (doc)]                             (4)

Taverna classification workflow