National Center for Supercomputing Applications: The Way Things Go

1 National Center for Supercomputing Applications: The Way Things Go
- e-Science is a complex activity
- Scientific knowledge is comprehensible only in the context of those activities
- Adopt the Rube Goldberg view
[Image: Rube Goldberg]

2 Grand challenge: systems-scale science
- Observation and modeling of multiple systems at multiple scales
- Linking data and tools from different disciplines to get a valid global result!
- “... modeling complex systems will be a major research challenge for the 21st century” - National Science Foundation

3 Building current practices up isn't working
- Heterogeneous tools, data formats
- Little global coordination of research
- Little funding for sustained stewardship of tools and data
[Image: M.C. Escher, “Tower of Babel” (1928)]

4 Proposed solutions aren't working
- e-Journals – not machine-interpretable
- Collaboration tools
  - scientists just use email like everyone else
- Portals and digital libraries – typically:
  - centralized
  - domain-specific
- The Grid – can orchestrate complex processing jobs, but that's not science

5 Only networks work at scale
- Single researcher
  - Ad hoc data management, single-user apps
- Community
  - Community tools, resources, control
- Global
  - No global practice, tools, control
[Diagram labels: Desktop, Workgroup, Network]

6 How do we get there?
- e-Science means managing
  - Process, and
  - Data
- Current approaches favor one or the other
- Information is getting lost
[Diagram labels: model, refine, observe, predict, data, critical interface]

7 Trends: process, data
[Diagram axis labels: Data, Semantics, Batch, Metadata, Interactive, Workflow]
[Diagram items: mainframes, digital libraries, portals, ontologies, provenance, desktop apps, formats, e-notebooks, the grid, rules]

8 Key technologies
- Semantic web: data/metadata
  - Provides means of merging descriptive information even if it only partially agrees (e.g., comes from two different communities)
- Workflow: process
  - Describes complex procedures independently of how they are executed
- Provenance: process + data/metadata
  - Links workflow, data, and any ancillary descriptive information (e.g., attribution)

9 Semantics: data to knowledge
- Data (concrete): streams, arrays, swaths, etc. (a.k.a. files)
  - Aggregation, annotation
- Information: collections, tags, attributes, etc. (a.k.a. metadata)
  - Learning, inference
- Knowledge (abstract): ontologies, rules, models, etc. (a.k.a. semantics)
(cf. Reagan Moore)

10 Semantic web: RDF triple
- Declarative: asserts a fact
- Subject and object URIs identify arbitrary entities (things, people, concepts, events)
- Predicate identifies the relationship between them
[Diagram: subject, predicate, object]
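A minimal sketch of the triple described on this slide, written in plain Python rather than with any RDF library. The `Triple` type and the `example.org` URIs are hypothetical and chosen only for illustration; they do not come from the presentation.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One declarative statement: subject --predicate--> object."""
    subject: str    # URI of the entity being described
    predicate: str  # URI naming the relationship
    obj: str        # URI of the entity the subject is related to


# Hypothetical URIs, used only to show the shape of a triple.
fact = Triple(
    subject="http://example.org/dogs/rex",
    predicate="http://example.org/terms/hasBreed",
    obj="http://example.org/breeds/collie",
)

print(fact.subject, fact.predicate, fact.obj)
```

The triple asserts a fact and nothing more: it says nothing about where the statement is stored or who made it.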

11 Triples form an open network
- Subject nodes aren't “owned” by any single agent or container
- Any actor can add arcs to the implicit, total, world graph
- Any two graphs can be joined
[Diagram label: hasBreed]
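The claim that any two graphs can be joined can be sketched by treating each graph as a set of triples, so that joining is a set union. This is an illustrative simplification, not an RDF implementation; the URIs, the two "communities", and the `describe` helper are all assumptions made for the example.

```python
EX = "http://example.org/"  # hypothetical namespace for the example

# Two graphs written independently by two different actors.
kennel_graph = {
    (EX + "rex", EX + "hasBreed", EX + "collie"),
}
vet_graph = {
    (EX + "rex", EX + "hasBreed", EX + "collie"),  # overlaps with the kennel graph
    (EX + "rex", EX + "hasOwner", EX + "alice"),
}

# Joining the graphs is a set union; a shared triple collapses to one copy.
merged = kennel_graph | vet_graph


def describe(graph, subject):
    """Return the sorted (predicate, object) pairs asserted about a subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


for predicate, obj in describe(merged, EX + "rex"):
    print(predicate, obj)
```

Because both graphs name the same subject URI, statements from either source attach to the same node without prior coordination.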

12 Non satis scire (to know is not enough)
- Semantic web “layer cake”
- Where do we manage process?
  - User interface?
  - Applications?
- “Semantic Grid” (D. DeRoure, C. Goble)
(source: World Wide Web Consortium)

13 Workflow: process description
- Describe complex operations as networks of simpler operations
- Abstract operation execution from description
- Can be shared (but may not be portable)
[Examples shown: Taverna, Kepler]

14 Anatomy of a workflow
- Declarative: says what to do
- Modules identify arbitrary procedures
- Arcs identify flow of control and/or data (data flow is usually implicit)
[Diagram labels: “Module”, Control flow, Execution model (usu. implicit)]
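The anatomy above can be sketched as a description (modules plus arcs) kept separate from a small engine that executes it. This is a toy under stated assumptions, not the design of Taverna, Kepler, or D2K: the module names, the `run` function, and the choice of a topological-order execution model are all hypothetical. It uses `graphlib` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

# Description: modules are arbitrary procedures, arcs say what feeds what.
modules = {
    "load": lambda inputs: [3, 1, 2],
    "sort": lambda inputs: sorted(inputs["load"]),
    "peak": lambda inputs: max(inputs["sort"]),
}
arcs = {            # module -> set of modules it depends on
    "load": set(),
    "sort": {"load"},
    "peak": {"sort"},
}


def run(modules, arcs):
    """Toy engine: execute each module once all of its predecessors have run."""
    results = {}
    for name in TopologicalSorter(arcs).static_order():
        inputs = {dep: results[dep] for dep in arcs[name]}
        results[name] = modules[name](inputs)
    return results


results = run(modules, arcs)
print(results["peak"])
```

The description never states an execution order; the engine derives one from the arcs, which is one reading of "execution model (usu. implicit)".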

15 Workflow systems
- Modules representing units of computation
- Language for specifying WF
  - modules
  - control flow
- Engine for executing WF
[Image: D2K (source: NCSA)]

16 Work vs. workflow systems
- Scientists are not WF modules
- Science work also involves
  - social organization incl. funding
  - field and “wet lab” manual work
  - discourse: review, validation
(source: CNRS/UCSD)

17 Provenance: what happened
- Answers critical questions
  - What led to this result?
  - When and how were observations made, conclusions reached?
- Is a causal network of events

18 Complementary incomplete notions of provenance
- Artifact-centric (e.g., digital libraries)
  - “lineage” = events in lifecycle of artifact, e.g., custody
  - IR's focus on curation events (not antecedent processes)
- Process-centric (e.g., workflow)
  - computational events (e.g., service invocations)
  - control flow
  - artifacts are either not mentioned or opaque (tool-specific)

19 Provenance Challenges 1 & 2
- IPAW 2006, HPDC 2007
- 20 teams, 1 workflow, 9 queries
  - major players
- Interoperability?
  - lots of manual work required
  - call for standards
(source: gridprovenance.org)

20 Artifact + process provenance = “open provenance”
- Can describe any process, not just WF execution (e.g., science!)
- Allows alternate accounts by different observers
- Rules for inferring transitive causal relationships
(source: Luc Moreau et al.)

21 Open Provenance Model
- 3 node types – artifact, process, agent
- 5 arc types – used, generated, triggered, derived, controlled – and inference rules
- Generic – extensibility via annotation
- Choice of granularity and focus (e.g., artifact or process-centric)
(source: Luc Moreau et al.)
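A minimal sketch of a provenance graph with the three node types and five arc types named on this slide, plus one illustrative inference: following "derived" arcs transitively to find every artifact a result ultimately depends on. The class, method names, and sample nodes are hypothetical, and the transitive rule here is a simplification for illustration, not the model's full set of inference rules.

```python
NODE_TYPES = {"artifact", "process", "agent"}
ARC_TYPES = {"used", "generated", "triggered", "derived", "controlled"}


class ProvenanceGraph:
    """Toy provenance graph; every arc points from an effect back to its cause."""

    def __init__(self):
        self.nodes = {}    # node id -> node type
        self.arcs = set()  # (effect, arc type, cause)

    def add_node(self, node_id, node_type):
        if node_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {node_type}")
        self.nodes[node_id] = node_type

    def add_arc(self, effect, arc_type, cause):
        if arc_type not in ARC_TYPES:
            raise ValueError(f"unknown arc type: {arc_type}")
        if effect not in self.nodes or cause not in self.nodes:
            raise KeyError("both ends of an arc must be known nodes")
        self.arcs.add((effect, arc_type, cause))

    def derived_from(self, artifact):
        """Return every node reachable by following 'derived' arcs."""
        found, stack = set(), [artifact]
        while stack:
            current = stack.pop()
            for effect, arc_type, cause in self.arcs:
                if effect == current and arc_type == "derived" and cause not in found:
                    found.add(cause)
                    stack.append(cause)
        return found


g = ProvenanceGraph()
for name in ("raw_data", "cleaned_data", "plot"):
    g.add_node(name, "artifact")
g.add_node("clean_step", "process")
g.add_node("alice", "agent")

g.add_arc("clean_step", "used", "raw_data")          # process used an artifact
g.add_arc("cleaned_data", "generated", "clean_step")  # artifact generated by a process
g.add_arc("clean_step", "controlled", "alice")        # process controlled by an agent
g.add_arc("cleaned_data", "derived", "raw_data")
g.add_arc("plot", "derived", "cleaned_data")

ancestors = g.derived_from("plot")
print(sorted(ancestors))
```

Nothing in the graph says a workflow engine produced these events, so the same structure could record manual or field work as well.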

22 NCSA Provenance Infrastructure
[Diagram labels: Open Provenance Model; Tupelo Semantic Content Repository; Context; OPM toolkit; Store; OPM toolkit; visualization, interaction; tracking, modeling, presentation; abstraction, inference, storage; desktop, portal, etc.]

23 Tupelo: semantic content
- Abstracts content from storage implementations (e.g., Sesame, Mulgara)
- Provides location-independent addressing of content and metadata
- Supports transparent mirroring, caching, failover, etc.
(tupeloproject.org)

24 CyberIntegrator: workflow by example
- Records what users do as provenance
  - source, intermediate, and final artifacts
  - steps and parameters
- Can re-enact interaction as a workflow
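The "workflow by example" idea can be sketched as a recorder that logs each interactive step with its parameters and artifacts, then replays the log on new input. This is an illustration of the concept only and does not reflect CyberIntegrator's actual API; `Recorder`, `scale`, and `clip` are hypothetical names.

```python
def scale(values, factor):
    """Multiply every value by a factor."""
    return [v * factor for v in values]


def clip(values, limit):
    """Cap every value at an upper limit."""
    return [min(v, limit) for v in values]


class Recorder:
    """Runs steps on the user's behalf and keeps a provenance log of them."""

    def __init__(self):
        self.log = []  # one record per step, in the order performed

    def apply(self, func, value, **params):
        result = func(value, **params)
        self.log.append({
            "step": func,       # the procedure that was run
            "params": params,   # the parameters the user chose
            "input": value,     # source or intermediate artifact
            "output": result,   # intermediate or final artifact
        })
        return result

    def replay(self, value):
        """Re-enact the recorded steps, in order, on a new input."""
        for record in self.log:
            value = record["step"](value, **record["params"])
        return value


session = Recorder()
scaled = session.apply(scale, [1, 2, 3], factor=10)
clipped = session.apply(clip, scaled, limit=25)

# The recorded interaction now behaves like a reusable workflow.
replayed = session.replay([4, 5])
print(clipped, replayed)
```

The log doubles as provenance (what was done, with which inputs and parameters) and as an executable description.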

25 MAEviz: analysis/viz app, workflow “behind the scenes”
- GIS app. platform
- Earthquake hazard analysis plug-in
- Data catalog
  - built environment
  - fragility/hazard models
- Driven by workflow -> provenance

26 CyberCollaboratory: collaboration + provenance
- User interaction with tools generates events
- Events are captured using the OPM and published to Tupelo
- Non-portal apps can browse / use provenance

27 Summary
- “The way things go” is critical to e-Science at scale
- Provenance is an open causal network
- New infrastructure supports provenance

28 Resources / acknowledgements
- Grid Provenance Challenge
  - http://twiki.gridprovenance.org/
- NCSA technologies
  - Tupelo: http://tupeloproject.org/
  - CyberIntegrator: http://isda.ncsa.uiuc.edu/
  - MAEviz: http://maeviz.cee.uiuc.edu/
  - CyberCollaboratory: http://ecid.ncsa.uiuc.edu/cybercollab/
- Acknowledgements:
  - Jim Myers, Luc Moreau, Juliana Freire, Patrick Paulson, Simon Miles, Bob McGrath, and more...

