Workshop on the Future of Scientific Workflows
Break Out #2: Workflow System Design
Moderators: Chris Carothers (RPI), Doug Thain (ND)

What are the main differences and commonalities between the in situ (IS) and distributed-area (DA) systems?
- How information is transferred between tasks: (i) in IS it is often achieved via shared memory; (ii) in DA a fundamental change in encoding is usually involved, e.g. by generating files.
- The main differences are in design tradeoffs driven by different constraints and requirements (latency, size, semantics, scheduling, storage access, ...).
- There are fine-scale events that may need to be taken into account at the workflow level (differently for DA and IS).
- Crossing system boundaries (physical/administrative) is a major challenge common to DA and IS.
- We need to distinguish complex systems that share resources in the same execution space, possibly with shared data spaces, from a single program linking a set of libraries.
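The two transfer styles in the first bullet can be sketched in a few lines. This is an illustration only, not code from any workflow system; the payload and function names are hypothetical.

```python
import json
import os
import queue
import tempfile

# In situ style: tasks share an address space, so a result can be
# handed over through an in-memory queue with no re-encoding.
def in_situ_transfer(payload):
    q = queue.Queue()
    q.put(payload)             # producer task
    return q.get()             # consumer task receives the same object

# Distributed-area style: tasks run in separate processes or sites,
# so the payload is re-encoded (here as JSON) and handed over as a file.
def distributed_transfer(payload):
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as f:
        json.dump(payload, f)  # producer writes, changing the encoding
    with open(path) as f:
        result = json.load(f)  # consumer parses it back
    os.remove(path)
    return result

data = {"step": 42, "field": [1.0, 2.5]}
assert in_situ_transfer(data) is data        # same object, zero copies
assert distributed_transfer(data) == data    # equal value, new object
```

The `is` versus `==` distinction in the last two lines is the point: IS hands over the object itself, DA hands over a re-encoded copy.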

Are there common needs/problems/interfaces that could serve as the basis (or as stepping stones) along a path to (some reasonable level of) convergence?
- Dealing with unreliable resources has a greater emphasis in DA, but it is becoming a common problem.
- There is a general problem in interfacing workflows, especially heterogeneous ones, with a common, complete description.
- Coordination between workflows and sub-workflows that may not work properly is difficult.

Examples of steering workflows that require a human in the loop
- Simulation (Titan) with an in situ workflow for analysis (Blue Waters); a database at USC is populated for analysis by scientists. This is currently done under one workflow.
- In situ workflow: combustion simulation -> analytics for feature detection (extinction/reignition), interactive choice of parameters -> localized UQ.
- Observational data (light source) -> local analysis -> processing/reconstruction -> access by scientists.
- LLNL example: a simulation where the additivity is changed based on the judgement call of a scientist during execution.
- Run an experiment in South Korea (KSTAR); the data is streamed to the US, where the rest of the team analyzes it and provides feedback, and the next experiment is based on that feedback. Initially, feedback is provided within 30 minutes, before the next "shot"; the target is to provide it within 10 seconds.

Are there applications that bridge the IS/DA worlds?
- Where does the workflow "run"? We do not always have a "master" workflow; we can have federated workflows.
- Where do we observe the crossing of the boundaries between IS and DA workflows?
- It is important to develop a common interface/language that allows workflows to communicate. This is particularly necessary at the distributed level, where different facilities may adopt different solutions.
- Communication may be particularly difficult among workflow systems that are specialized for different tasks.
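One form such a common interface could take is a minimal message envelope that any workflow system can produce and parse. The field names below are purely illustrative assumptions, not a proposed standard.

```python
import json

# Hypothetical envelope for messages exchanged between two workflow
# systems at a facility boundary; every field name here is illustrative.
def make_envelope(sender, task_id, kind, body):
    return {
        "sender": sender,   # which workflow system produced this
        "task": task_id,    # the task the message refers to
        "kind": kind,       # e.g. "state", "estimate", "data-ref"
        "body": body,       # payload, kept JSON-serializable
    }

msg = make_envelope("in-situ-wf", "analysis-03", "data-ref",
                    {"uri": "file:///scratch/out.h5"})
wire = json.dumps(msg)               # what actually crosses the boundary
received = json.loads(wire)
assert received["kind"] == "data-ref"
assert received["body"]["uri"].startswith("file://")
```

The point of the sketch is that only the envelope needs to be agreed on; each system keeps its own internal representation and serializes at the boundary.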

Workflow execution state
- Describe the degree of progress when queried.
- All of the provenance of the processes and data that have been affected by the workflow.
- Provide enough information to recover a workflow that fails.
- Communication of this information across layers of execution of the workflow.
- Components may have minimal requirements in terms of how they describe their state.
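The bullets above amount to a small state record: progress, provenance, and enough checkpoint information to recover. A minimal sketch, with hypothetical field names that are not drawn from any particular workflow system:

```python
from dataclasses import dataclass, field

TOTAL_TASKS = 4  # assumed size of the (hypothetical) workflow

# Illustrative execution-state record mirroring the bullets above:
# progress when queried, provenance of affected processes/data, and
# enough information to recover after a failure.
@dataclass
class WorkflowState:
    progress: float = 0.0                           # fraction of tasks done
    provenance: list = field(default_factory=list)  # (task, inputs, outputs)
    last_checkpoint: str = ""                       # restart point on failure

    def record(self, task, inputs, outputs):
        self.provenance.append((task, inputs, outputs))
        self.progress = len(self.provenance) / TOTAL_TASKS

state = WorkflowState()
state.record("mesh", ["cad.stl"], ["mesh.h5"])
state.record("solve", ["mesh.h5"], ["field.h5"])
state.last_checkpoint = "field.h5"   # recovery would resume from here
assert state.progress == 0.5
assert state.provenance[0][0] == "mesh"
```

A record like this is also what would need to be communicated across layers of execution, per the fourth bullet.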

Feedback, steering: possible events or humans in the loop
- Provenance information is more difficult to collect.
- Authentication issues.
- Planning/scheduling becomes more difficult.
- Usability and user interfaces become more important.
- Increased emphasis on interactivity instead of automation.
- There are at least two levels: (i) parameter tuning and (ii) changing the structure of the workflow.
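The two steering levels in the last bullet can be made concrete with a toy sketch; the tasks and parameter names are invented for illustration only.

```python
# Level (i): a human adjusts a parameter between iterations.
# Level (ii): the structure of the workflow itself is changed
# by inserting a new task. All names here are illustrative.
def run_step(params):
    return params["threshold"] * 2          # stand-in for a real task

workflow = [run_step]                       # current task list
params = {"threshold": 1.0}

# (i) parameter tuning: the human in the loop changes a value
params["threshold"] = 2.5

# (ii) structural steering: a new analysis task is appended
def extra_analysis(params):
    return params["threshold"] + 1

workflow.append(extra_analysis)

results = [task(params) for task in workflow]
assert results == [5.0, 3.5]
```

Level (i) leaves the task graph intact, which is why it is easier for provenance and scheduling; level (ii) changes the graph itself, which is where planning and provenance collection become genuinely harder.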

Reliability. The level of fault tolerance expected probably differs between IS and DA.
- Automatic resubmission of jobs on HPC systems is unusual.
- In DA there is already an assumption of unreliability, and therefore a greater emphasis on fault tolerance is embedded.
- For example, mem-to-mem copy is more efficient and probably more reliable short term, but does not allow for recovery if something goes wrong.
- Moving across the IS-DA interface entails moving not only data products but also control information about the state of the workflow. This state can include capturing the location(s) in the workflow graph where execution left off. Additionally, fault tolerance information may need to be included. Crossing the interface may also incur additional potential for faults.
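The embedded fault tolerance typical of DA systems often amounts to automatic resubmission, of the kind HPC batch systems usually leave to the user. A minimal sketch of such a wrapper, with an invented flaky task standing in for a failing node:

```python
# Illustrative automatic-resubmission wrapper of the kind DA systems
# embed; the task and failure mode below are simulated, not real APIs.
def run_with_retries(task, max_attempts=3):
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except RuntimeError:
            if attempt == max_attempts:
                raise          # give up only after the last attempt

calls = {"n": 0}

def flaky_task():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("simulated node failure")
    return "done"

assert run_with_retries(flaky_task) == "done"
assert calls["n"] == 3   # two failures absorbed, third attempt succeeded
```

Note the tradeoff from the mem-to-mem bullet: retrying only works if the task's inputs still exist somewhere, which is exactly what a pure in-memory handoff gives up.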

Performance: predicted and actual behavior, estimating resource needs, monitoring.
- DA and IS tend to work at different time scales or task granularities, so there is a need to make decisions at different levels.
- Planning and estimating before execution: this information should be communicated across workflows so that global planning is based on estimates performed by the local workflows that know their systems best.
- How can performance be managed when crossing the workflow interfaces, and how can the performance of the global workflow be predicted and measured? What if the workflow includes one or more feedback cycles, and the interface must be crossed in the opposite direction or multiple times?
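The planning bullet suggests a simple contract: each sub-workflow reports its own runtime estimate, and the global planner composes them. A sketch with hypothetical stage names and made-up numbers:

```python
# Illustrative global plan built from per-workflow estimates: each
# local workflow reports its own runtime estimate (it knows its system
# best), and the global planner composes them along the chain.
def global_estimate(local_estimates):
    # local_estimates: {stage: seconds} as reported by each sub-workflow
    return sum(local_estimates.values())

estimates = {
    "simulation (IS)": 3600.0,   # hypothetical numbers, for illustration
    "transfer":         600.0,
    "analysis (DA)":   1800.0,
}
assert global_estimate(estimates) == 6000.0
```

A real planner would also need per-stage uncertainty and would have to re-estimate on every feedback cycle that crosses the interface, which is where the open questions in the last bullet arise.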

Summary findings/recommendations
- We need to collect more use cases that combine IS/DA.
- Focus on abstractions that maximize similarity.
- There are problems in managing federated resources.
- The human in the loop will increase productivity but will also increase the unpredictability of the system.
- Crossing the boundaries between workflows is one of the major challenges: a common language is needed for describing provenance, performance requirements/estimates, resource access, and security.
- Heterogeneous scheduling of workflows.


Adaptation
- Climate use case: the data is in location A but execution is at B, so the data has to be fetched from A to B. Can we use multiple locations B and C for the same ensemble?
- Security policies (authentication) make it difficult to run at multiple locations. It is important to be able to "transfer" credentials to allow wide-area scheduling and planning.



Examples of steering workflows that require a human in the loop (continued)
- 3D printing example: start from an initial CAD model, add geometric constraints, add physics; steering is needed because the visual result is important.
- Two examples by Dan Laney: (i) a simulation where the additivity is changed based on the judgement call of a scientist; (ii) an in situ workflow (combustion simulation, with analytics for features and UQ).