What are the main differences and commonalities between the IS and DA systems? How information is transferred between tasks: (i) IS it may be often achieved.


1 What are the main differences and commonalities between the IS and DA systems?
- How information is transferred between tasks: (i) in IS it can often be achieved via shared memory; (ii) in DA it involves a fundamental change in encoding, e.g. by generating files
- The main differences are in design tradeoffs driven by different constraints and requirements (latency, size, semantics, scheduling, storage access, …)
- There are fine-scale events that may need to be taken into account at the workflow level (differently for DA and IS)
- Crossing system boundaries (physical/administrative) is a major challenge common to DA and IS
- We need to distinguish complex systems that share resources in the same execution space, possibly with shared data spaces, from a single program that simply links a set of libraries
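The first point on this slide (shared memory versus a change of encoding) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the toy `simulate` and `analyze` tasks, and the choice of JSON as the file encoding are not taken from the presentation.

```python
import json
import os
import tempfile


def in_memory_handoff(producer, consumer):
    """IS-style: the consumer receives the producer's object directly."""
    data = producer()
    return consumer(data)


def file_handoff(producer, consumer, directory):
    """DA-style: the result is re-encoded as a file, then decoded again."""
    path = os.path.join(directory, "handoff.json")
    with open(path, "w") as f:
        json.dump(producer(), f)       # change of encoding: object -> JSON text
    with open(path) as f:
        return consumer(json.load(f))  # JSON text -> a new object


def simulate():
    # Hypothetical task output.
    return {"step": 3, "temperature": [300.0, 301.5]}


def analyze(fields):
    # Hypothetical downstream task.
    return max(fields["temperature"])


with tempfile.TemporaryDirectory() as tmp:
    direct = in_memory_handoff(simulate, analyze)
    via_file = file_handoff(simulate, analyze, tmp)
```

Both paths give the same answer here. The file path pays for serialization and storage access, but it leaves an artifact that another system could pick up later.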

2 Are there common needs/problems/interfaces that could serve as the basis (or as stepping stones) along a path to (some reasonable level of) convergence?
- Dealing with unreliable resources has a greater emphasis in DA, but it is (becoming) a common problem
- Interfacing workflows, especially heterogeneous ones, through a common, complete description is a general problem
- Coordination is difficult between workflows and sub-workflows that may not work properly

3 Examples of steering workflows that require a human in the loop
- Simulation (Titan) -> in situ workflow for analysis (Blue Waters) -> database at USC is populated for analysis by scientists. This is currently done under one workflow.
- In situ workflow: combustion simulation -> analytics for feature detection (extinction/reignition), interactive choice of parameters -> localized UQ
- Observational data (light source) -> local analysis -> processing/reconstruction -> access by scientists
- LLNL example: simulation where the additivity is changed based on the judgement call of a scientist during execution
- An experiment is run in South Korea (KSTAR) and the data is streamed to the US, where the rest of the team analyzes it and provides feedback. The next experiment is based on the feedback. Initially, feedback is provided within 30 minutes, before the next “shot”. The target is to provide the feedback within 10 seconds.

4 Are there applications that bridge the IS/DA worlds?
- Where does the workflow “run”? We do not always have a “master” workflow. We can have federated workflows.
- Where do we observe the crossing of the boundaries between IS and DA workflows?
- It is important to develop a common interface/language that allows workflows to communicate. This is particularly necessary at the distributed level, where different facilities may adopt different solutions.
- Communication may be particularly difficult among workflow systems that are specialized for different tasks

5 Workflow execution state
- Describe the degree of progress when queried
- Record all of the provenance of the processes and data that have been affected by the workflow
- Provide enough information to recover a workflow that fails
- Communicate this information across the layers of execution of the workflow
- Components may have minimal requirements in terms of how they describe their state
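One possible shape for such an execution-state record is sketched below. It is a toy model under stated assumptions: the class names, the status strings, and the file name `fields.h5` are hypothetical and do not come from the presentation or from any particular workflow system.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class TaskState:
    name: str
    status: str = "pending"  # "pending", "running", "done", or "failed"
    inputs: list = field(default_factory=list)   # provenance: data consumed
    outputs: list = field(default_factory=list)  # provenance: data produced


@dataclass
class WorkflowState:
    tasks: dict = field(default_factory=dict)

    def add(self, task):
        self.tasks[task.name] = task

    def progress(self):
        """Fraction of tasks completed, for answering a progress query."""
        if not self.tasks:
            return 0.0
        done = sum(1 for t in self.tasks.values() if t.status == "done")
        return done / len(self.tasks)

    def to_restart(self):
        """Tasks that still have to run in order to recover after a failure."""
        return [t.name for t in self.tasks.values() if t.status != "done"]

    def to_json(self):
        """Serialized form that could be passed across execution layers."""
        return json.dumps(asdict(self), sort_keys=True)


state = WorkflowState()
state.add(TaskState("simulate", "done", outputs=["fields.h5"]))
state.add(TaskState("detect_features", "failed", inputs=["fields.h5"]))
state.add(TaskState("uq", "pending"))
```

Here `state.progress()` answers the progress query, `state.to_restart()` lists what recovery would have to rerun, and `state.to_json()` is the form a component could hand to another layer.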

6 Feedback, steering: possible events or humans in the loop
- Provenance information is more difficult to collect
- Authentication issues
- Planning/scheduling becomes more difficult
- Usability and user interfaces become more important
- Increased emphasis on interactivity instead of automation
- There are at least two levels: (i) parameter tuning and (ii) changing the structure of the workflow
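The two levels of steering named on this slide can be contrasted in a short sketch. The dictionary layout, the step names, and the `steering_log` used to keep provenance of human decisions are all illustrative assumptions, not a description of any real steering interface.

```python
workflow = {
    "params": {"threshold": 0.5},
    "steps": ["simulate", "detect_features"],
}


def tune_parameter(wf, name, value):
    """Level (i): change a parameter and leave the graph alone."""
    new = {"params": dict(wf["params"]), "steps": list(wf["steps"])}
    new["params"][name] = value
    return new


def add_step(wf, step, after):
    """Level (ii): change the workflow structure by inserting a step."""
    new = {"params": dict(wf["params"]), "steps": list(wf["steps"])}
    new["steps"].insert(new["steps"].index(after) + 1, step)
    return new


steering_log = []  # provenance of human decisions


def steer(wf, action, *args, who):
    """Apply a steering action and record who requested it."""
    steering_log.append({"who": who, "action": action.__name__, "args": args})
    return action(wf, *args)


tuned = steer(workflow, tune_parameter, "threshold", 0.8, who="scientist")
restructured = steer(tuned, add_step, "localized_uq", "detect_features",
                     who="scientist")
```

Logging each action alongside the person who requested it is one way to keep provenance collectable once a human is in the loop.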

7 Reliability
- Fault tolerance is probably expected at different levels in IS and DA
- Automatic resubmission of jobs on HPC systems is unusual
- In DA there is already an assumption of lack of reliability, and therefore a greater emphasis on fault tolerance is embedded
- For example, a mem-to-mem copy is more efficient and probably more reliable in the short term, but does not allow for recovery if something goes wrong
- Moving across the IS-DA interface entails moving not only data products but also control information about the state of the workflow. This state can include the location(s) in the workflow graph where execution left off. Fault tolerance information may also need to be included. Crossing the interface may itself introduce additional potential for faults.
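The idea of recording where execution left off, so that a failed workflow can be recovered, can be sketched as follows. The step names, the JSON checkpoint format, and the `fail_at` switch used to simulate a fault are assumptions made for this example only.

```python
import json
import os
import tempfile

STEPS = ["simulate", "reduce", "transfer", "analyze"]


def run(steps, checkpoint_path, fail_at=None):
    """Run steps in order, recording the resume position after each one."""
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_step"]  # where execution left off
    executed = []
    for i in range(start, len(steps)):
        if steps[i] == fail_at:
            raise RuntimeError(f"step {steps[i]} failed")  # simulated fault
        executed.append(steps[i])
        with open(checkpoint_path, "w") as f:
            json.dump({"next_step": i + 1}, f)
    return executed


with tempfile.TemporaryDirectory() as tmp:
    ckpt = os.path.join(tmp, "state.json")
    try:
        run(STEPS, ckpt, fail_at="transfer")  # first attempt fails midway
    except RuntimeError:
        pass
    resumed = run(STEPS, ckpt)  # resubmission resumes from the checkpoint
```

The second call skips the two completed steps. A pure memory-to-memory handoff would have nothing equivalent to read back after the fault.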

8 Performance: predicted and actual behavior, estimating resource needs, monitoring
- DA and IS tend to work at different time scales or task granularities, so decisions need to be made at different levels
- Planning and estimating happen before execution. This information should be communicated across workflows so that global planning is based on estimates produced by the local workflows that know the system best.
- How can performance be managed when crossing the workflow interfaces, and how can the performance of the global workflow be predicted and measured?
- What if the workflow includes one or more feedback cycles, and the interface must be crossed in the opposite direction or multiple times?
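As a rough illustration of global planning built from local estimates, the toy model below adds up per-workflow estimates and a fixed cost for each interface crossing. The model itself is an assumption: it treats the workflows as strictly sequential, charges the same cost in both directions, and counts one backward crossing between consecutive passes. All numbers are hypothetical.

```python
def global_estimate(local_estimates, crossing_cost, passes=1):
    """Estimate total time for `passes` runs of a sequential IS -> DA pipeline.

    Each pass crosses the interface once in the forward direction, and each
    feedback cycle between two passes crosses it once in the opposite direction.
    """
    if passes < 1:
        raise ValueError("passes must be at least 1")
    per_pass = sum(local_estimates.values())
    crossings = passes + (passes - 1)  # forward plus backward crossings
    return passes * per_pass + crossings * crossing_cost


# Hypothetical estimates (in seconds) reported by the local workflows.
local = {"in_situ": 120.0, "distributed": 300.0}

single_pass = global_estimate(local, crossing_cost=30.0)
with_feedback = global_estimate(local, crossing_cost=30.0, passes=3)
```

Even in this simple form, the crossing term grows with every feedback cycle, which is why the slide asks how crossing performance can be predicted and measured.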

9 Summary findings/recommendations
- We need to collect more use cases that combine IS/DA
- Focus on abstractions that maximize similarity
- Problems in managing federated resources
- The human in the loop will increase productivity but will also increase the unpredictability of the system
- Crossing the boundaries between workflows is one of the major challenges: a common language is needed for describing provenance, performance requirements/estimates, resource access, and security
- Heterogeneous scheduling of workflows

10 END

12 Adaptation
- Climate use case: the data is at location A but execution is at B, so the data has to be fetched from A to B. Can we use multiple locations B and C for the same ensemble?
- Security policies (authentication) make it difficult to run at multiple locations. It is important to be able to “transfer” credentials to allow wide-area scheduling and planning.
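The climate use case on this slide suggests a simple planning question: which sites can an ensemble be spread across, given where the user's credentials are accepted? The sketch below is one hypothetical way to express it. The site records, the `has_credentials` flag, and the round-robin assignment are illustrative assumptions only.

```python
def plan_ensemble(members, sites):
    """Assign ensemble members round-robin to sites that accept the credentials."""
    usable = [s["name"] for s in sites if s["has_credentials"]]
    if not usable:
        raise ValueError("no site accepts the user's credentials")
    plan = {name: [] for name in usable}
    for i, member in enumerate(members):
        plan[usable[i % len(usable)]].append(member)
    return plan


# Hypothetical execution sites; the data itself lives at a third location A.
sites = [
    {"name": "B", "has_credentials": True},
    {"name": "C", "has_credentials": False},
]
members = [f"run{i}" for i in range(4)]

single_site = plan_ensemble(members, sites)

# Once credentials can be "transferred" to C, the ensemble can be split.
sites[1]["has_credentials"] = True
two_sites = plan_ensemble(members, sites)
```

A fuller planner would also weigh the cost of fetching data from A to each site, which this sketch leaves out.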

17 Examples of steering workflows that require a human in the loop
- 3D printing example: initial CAD model; add geometric constraints; add physics; steer because the visual is important
- Two examples by Dan Laney: (i) a simulation where the additivity is changed based on the judgement call of a scientist; (ii) an in situ workflow (combustion simulation, analytics for features and UQ)
- An experiment is run in South Korea (KSTAR) and the data is streamed to the US, where the rest of the team analyzes it and provides feedback. The next experiment is based on the feedback. Initially, feedback is provided within 30 minutes, before the next “shot”. The target is to provide the feedback within 10 seconds.

