Exploiting Reference Architecture to Guide the NASA Earth Science System Enterprise Chris A. Mattmann NASA Jet Propulsion Laboratory University of Southern California Apache Software Foundation

2 Objectives
- Concepts
  - What is domain-specific software engineering (DSSE)
  - The "Three Lampposts" of DSSE: Domain, Business, and Technology
  - Domain-Specific Software Architectures
- Product Lines
- Relationship between DSSAs, Product Lines, and Architectural Styles
- Examples of DSSE at work

3 Objectives
- Concepts
  - What is domain-specific software engineering (DSSE)
  - The Three Key Factors of DSSE: Domain, Business, and Technology
  - Domain-Specific Software Architectures
- Product Lines
- Relationship between DSSAs, Product Lines, and Architectural Styles
- Examples of DSSE at work

4 Domain-Specific Software Engineering
- The traditional view of software engineering shows us how to come up with solutions for problems de novo
- But starting from scratch every time is infeasible
  - This will involve re-inventing many wheels
- Once we have built a number of systems that do similar things, we gain critical knowledge that lets us exploit common solutions to common problems
  - In theory, we can simply build "the difference" between our new target system and systems that have come before

5 Examples of Domains
- Compilers for programming languages
- Consumer electronics
- Electronic commerce systems/Web stores
- Video games
- Business applications
  - Basic/Standard/"Pro"
- We can subdivide, too:
  - Avionics systems
    - Boeing jets
      - Boeing 747-400
Software Architecture: Foundations, Theory, and Practice; Richard N. Taylor, Nenad Medvidovic, and Eric M. Dashofy; (C) 2008 John Wiley & Sons, Inc. Reprinted with permission.

6 Traditional Software Engineering
- One particular problem can be solved in innumerable ways

7 Architecture-Based Software Engineering
- Given a single problem, we select from a handful of potential architectural styles or architectures, and go from these into specific implementations

8 Domain-Specific Software Engineering
- We map regions of the problem space (domains) into domain-specific software architectures (DSSAs)
- These are specialized into application-specific architectures
- These are implemented

9 Three Key Factors of DSSE
- Domain
  - Must have a domain to constrain the problem space and focus development
- Technology
  - Must have a variety of technological solutions (tools, patterns, architectures and styles, legacy systems) to bring to bear on a domain
- Business
  - Business goals motivate the use of DSSE
    - Minimizing costs: reuse assets when possible
    - Maximizing market: develop many related applications for different kinds of end users

10 Three Key Factors: Domain
[Figure: Venn diagram of Domain, Business, and Technology, repeated on slides 10-17]
- Must have a domain to constrain the problem space and focus development

11 Three Key Factors: Technology
- Must have a variety of technological solutions (tools, patterns, architectures and styles, legacy systems) to bring to bear on a domain

12 Three Key Factors: Business
- Business goals motivate the use of DSSE
  - Minimizing costs: reuse assets when possible
  - Maximizing market: develop many related applications for different kinds of end users

13 Three Key Factors: Domain + Business
- "Corporate Core Competencies"
  - Domain expertise augmented by business acumen and knowledge of the market

14 Three Key Factors: Domain + Technology
- "Application Family Architectures"
  - All possible technological solutions to problems in a domain
  - Uninformed and unconstrained by business goals and knowledge

15 Three Key Factors: Business + Technology
- "Domain-independent infrastructure"
  - Tools and techniques for constructing systems independent of any particular domain
  - E.g., most generic ADLs, UML, compilers, word processors, general-purpose PCs

16 Three Key Factors: Domain + Business + Technology
- "Domain-specific software engineering"
- Applies technology to domain-specific goals, tempered by business and market knowledge

17 Three Key Factors: Product-Line Architectures
- A specific, related set of solutions within a broader DSSE
- More focus on commonalities and variability between individual solutions

18 Becoming More Concrete
- Applying DSSE means developing a set of artifacts more specific than an ordinary software architecture
  - Focus on aspects of the domain
  - Focus on domain-specific solutions, techniques, and patterns
- These are
  - A domain model and
  - A domain-specific software architecture (DSSA)

19 Domain Model
- A domain model is a set of artifacts that capture information about a domain
  - Functions performed
  - Objects (also known as entities) that perform the functions, and on which the functions are performed
  - Data and information that flow through the system
- Standardizes terminology and semantics
- Provides the basis for standardizing (or at least normalizing) descriptions of problems to be solved in the domain

21 Domain Model
- Defines vocabulary for the domain

22 Domain Model
- Describes the entities and data in the domain

23 Domain Model
- Defines how entities and data combine to provide features

24 Domain Model
- Defines how data and control flow through entities

25 Info Model: Context Info Diagram
- Defines high-level entities
- Defines what is considered inside and outside the domain (or subdomains)
- Defines relationships and high-level flows

26 Info Model: Entity-Relationship Diagram
- Defines entities and cardinal relationships between them

27 Info Model: Object Diagram
- Defines attributes and operations on entities
- Closely resembles class diagram in UML but may be more abstract

28 Feature Model: Use Case Diagram
- Defines use cases within the domain
- Similar to use case models in UML

29 Operational Model: State Transition Diagram
- Focuses on states of systems and transitions between them
- Resembles UML state diagrams

30 Reference Requirements
- Mandatory
  - Must display the current status of the Lunar Lander (horizontal and vertical velocities, altitude, remaining fuel)
  - Must indicate points earned by player based on quality of landing
- Optional
  - May display time elapsed
- Variant
  - May have different levels of difficulty based on pilot experience (novice, expert, etc.)
  - May have different types of input depending on whether
    - Auto Navigation is enabled
    - Auto Throttle is enabled
  - May have to land on different celestial bodies
    - Moon
    - Mars
    - Jupiter's moons
    - Asteroid
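The mandatory/optional/variant split above can be made concrete with a small data structure. The following is only a sketch: the class names, the `specialize` helper, and the exact choice lists (for example, limiting difficulty to "novice" and "expert") are assumptions made for illustration and are not taken from the slides or from the Lunar Lander example in the textbook.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    MANDATORY = "mandatory"
    OPTIONAL = "optional"
    VARIANT = "variant"


@dataclass(frozen=True)
class Requirement:
    name: str
    kind: Kind
    choices: tuple = ()  # only meaningful for VARIANT requirements


# Hypothetical encoding of part of the reference requirements slide.
REFERENCE_REQUIREMENTS = [
    Requirement("display lander status", Kind.MANDATORY),
    Requirement("display points earned", Kind.MANDATORY),
    Requirement("display time elapsed", Kind.OPTIONAL),
    Requirement("difficulty", Kind.VARIANT, ("novice", "expert")),
    Requirement("celestial body", Kind.VARIANT,
                ("Moon", "Mars", "Jupiter's moons", "Asteroid")),
]


def specialize(reference, optional=(), variants=None):
    """Derive one application's requirements from the reference set."""
    variants = variants or {}
    result = {}
    for req in reference:
        if req.kind is Kind.MANDATORY:
            result[req.name] = True          # always included
        elif req.kind is Kind.OPTIONAL:
            if req.name in optional:
                result[req.name] = True      # included only when selected
        else:
            # VARIANT: the application must pick one enumerated choice.
            choice = variants[req.name]
            if choice not in req.choices:
                raise ValueError(f"{choice!r} is not a valid {req.name}")
            result[req.name] = choice
    return result


mars_expert = specialize(
    REFERENCE_REQUIREMENTS,
    optional={"display time elapsed"},
    variants={"difficulty": "expert", "celestial body": "Mars"},
)
```

Each application in the domain is then one call to `specialize`, which is roughly the "build only the difference" idea from the earlier slides.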

31 Domain-Specific Software Architecture
- Definition. A domain-specific software architecture (DSSA) comprises:
  - a reference architecture, which describes a general computational framework for a significant domain of applications;
  - a component library, which contains reusable chunks of domain expertise; and
  - an application configuration method for selecting and configuring components within the architecture to meet particular application requirements.
- (Hayes-Roth)

32 Reference Architecture
- Definition. Reference architecture is the set of principal design decisions that are simultaneously applicable to multiple related systems, typically within an application domain, with explicitly defined points of variation.
- Reference architectures are still architectures (since they are also sets of principal design decisions)
  - Distinguished by the presence of explicit points of variation (explicitly "unmade" decisions)

33 Different Kinds of Reference Architecture
- Complete single product architecture
  - A fully worked out exemplar of a system in a domain, with optional documentation as to how to diversify
    - Can be relatively weak due to lack of explicit guidance and the possibility that the example is a "toy"
- Incomplete invariant architecture
  - Points of commonality as in ordinary architecture; points of variation are indicated but omitted
- Invariant architecture with explicit variation
  - Points of commonality as in ordinary architecture; specific variations indicated and enumerated

34 Example Reference Architecture
- Structural view of Lunar Lander DSSA
- Invariant with explicit points of variation
  - Satellite relay
  - Sensors

35 More Specific Example
- OODT (Object Oriented Data Technology) framework
- Catalog and Archive Service component

36 File Management (ESIP-SUMMER10, 9/30/2014)
- Managing the locations of, and ancillary information about, files and collections of files
  - Ancillary information is metadata
- What's a product?
  - A collection of some set of files, and/or collections of files
    - So, you could have collections of other collections
  - Along with metadata about the product

37 File Management: Core Terminology

38 File Management: Features (1/4)
- Persisting archived files using dynamic metadata and flexible, adaptable policies based on product types
  - rather than the monolithic and inflexible old method of using ProductTypeRepository/ProductName/ProductVersion/ as the filesystem location to store products for all product types.
- Clearly separating the Workflow aspects of the File Manager from Product ingestion, and flexibly supporting association of Workflows and their subsequent Tasks with any event, not only ingestion.

39 File Management: Features (2/4)
- Leverage existing transactional models such as Java's Transaction API to support transactional management rather than building our own API.
- If we do use any database communication, make sure that all DB communication is handled by standard, available, existing DB pooling APIs such as commons-dbcp, available from Apache.

40 File Management: Features (3/4)
- Clearly separating the administrative portions of policy management from the existing webapp, and distinguishing which pieces of the webapp are user-centric and which are administrative-centric.
- Supporting hierarchical product structures, such as nested directories that contain many sub-directories, and sub-directories of those sub-directories, with files strewn about at all levels
  - rather than only supporting the existing method of flat product structures, where all files in a product are at the same tree level.

41 File Management: Features (4/4)
- Support metadata extraction based on product type or MIME type
- Support dynamic product types
  - The file management component should not need to know about every product type a priori

42 File Management: Architectural Implications (1/4)
- Managing files
  - Repository: follow the typical repository pattern
  - Manage information about Products, Product Types, and References to products
- Managing metadata
  - Catalog: follow the typical registry pattern
  - Manage product Metadata Key/Value pairs
- Separate out the repository and the catalog
  - This allows data and metadata to be managed independently

43 File Management: Architectural Implications (2/4)
- Need a way to transfer a product from the client to the File Management service
  - Client gives URIs of files, or collections of files, which identify References belonging to a Product
- How does the transfer actually occur?
  - Can have many different types of data transfer
    - Local: use native system calls, or cp
    - Remote: use whatever protocol you want (REST, XML-RPC, SOAP, WebDAV, etc.)
    - Don't use CORBA or RMI: they're sooooo last year!

44 File Management: Architectural Implications (3/4)
- Translating the URIs from the client to the File Manager presents an interesting challenge
  - For example, where should file:///home/chris/myfile.file be transferred to on the File Manager's system?
- Leverage and extend existing CAS method
  - Existing CAS would have answered the above question with ProductTypeRepositoryPath/ProductName/VersionId/
  - Why should that be the only answer?

45 File Management: Architectural Implications (4/4)
- Have the concept of a Versioner interface
- Versioner is called by the File Manager before the product is transferred from the client to the File Manager system (or by the system to itself)
  - Versioner uses the Product metadata and the original product references to generate data store URIs that tell the DataTransfer implementation where to physically transfer the files for a particular Product
- Where did I get the name Versioner?
  - Don't ask!
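A minimal sketch of the Versioner idea follows. It is not the OODT API: the class names, the method name `create_data_store_refs`, the metadata keys (`ProductName`, `VersionId`, `AcquisitionDate`), and the `file:///data/archive` repository path are all hypothetical, chosen only to show how two policies can answer the "where should this file go?" question differently.

```python
from abc import ABC, abstractmethod
from pathlib import PurePosixPath
from urllib.parse import urlparse


def _file_name(uri):
    """Last path segment of a URI, e.g. 'myfile.file'."""
    return PurePosixPath(urlparse(uri).path).name


class Versioner(ABC):
    """Maps original references to data store URIs (hypothetical interface)."""

    @abstractmethod
    def create_data_store_refs(self, repo_path, metadata, orig_refs):
        """Return one data store URI per original reference."""


class BasicVersioner(Versioner):
    """Mimics the older RepositoryPath/ProductName/VersionId/ layout."""

    def create_data_store_refs(self, repo_path, metadata, orig_refs):
        base = "/".join([repo_path.rstrip("/"),
                         metadata["ProductName"],
                         metadata["VersionId"]])
        return [f"{base}/{_file_name(ref)}" for ref in orig_refs]


class DateVersioner(Versioner):
    """Alternative policy: lay files out by acquisition year and month."""

    def create_data_store_refs(self, repo_path, metadata, orig_refs):
        # Assumes an ISO-style date string such as '2010-07-20'.
        year, month, _day = metadata["AcquisitionDate"].split("-")
        base = "/".join([repo_path.rstrip("/"), year, month])
        return [f"{base}/{_file_name(ref)}" for ref in orig_refs]


refs = ["file:///home/chris/myfile.file"]
meta = {"ProductName": "myprod", "VersionId": "1",
        "AcquisitionDate": "2010-07-20"}
print(BasicVersioner().create_data_store_refs("file:///data/archive", meta, refs))
print(DateVersioner().create_data_store_refs("file:///data/archive", meta, refs))
```

Because the File Manager only depends on the interface, swapping the layout policy for a product type does not touch the transfer or catalog code.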

46 File Management: Architectural Conclusion
- So, how do we put all these different generic interfaces together?
- Well, something like the following
  - A File Manager has…
    - A repository manager that controls how data is stored
    - A catalog in which to store metadata and product instance information
    - A set of Versioners that are associated with Product Types in order to figure out how to generate the reference data store URIs for a particular product
    - A Data Transferer that moves a Product's files from the client to the File Manager using the source URIs and the data store URIs
    - An external interface to it (e.g., XML-RPC, WebDAV, etc.)
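The composition described above can be sketched in a few dozen lines. Everything here is an illustrative assumption rather than the real File Manager: the repository manager is reduced to a dictionary from product type to root directory, plain filesystem paths stand in for URIs, the only transfer strategy is a local copy, the external interface is omitted, and the product type name "GenericFile" is invented.

```python
import shutil
import tempfile
from pathlib import Path


class InMemoryCatalog:
    """Stores product metadata and the final references (sketch only)."""

    def __init__(self):
        self.entries = {}

    def add(self, product_name, metadata, refs):
        self.entries[product_name] = {"metadata": dict(metadata),
                                      "refs": list(refs)}


class LocalDataTransferer:
    """Local transfer: copy the file, creating parent directories first."""

    def transfer(self, src, dest):
        Path(dest).parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(src, dest)


def flat_versioner(root, product_name, metadata, source_paths):
    """Hypothetical policy: root/product_name/file_name."""
    return [str(Path(root) / product_name / Path(src).name)
            for src in source_paths]


class FileManager:
    def __init__(self, repository, catalog, versioners, transferer):
        self.repository = repository   # product type -> repository root
        self.catalog = catalog         # metadata store
        self.versioners = versioners   # product type -> versioner callable
        self.transferer = transferer   # moves bytes from source to store

    def ingest(self, product_type, product_name, metadata, source_paths):
        root = self.repository[product_type]
        versioner = self.versioners[product_type]
        # The versioner decides destinations before any data is moved.
        dest_paths = versioner(root, product_name, metadata, source_paths)
        for src, dest in zip(source_paths, dest_paths):
            self.transferer.transfer(src, dest)
        self.catalog.add(product_name, metadata, dest_paths)
        return dest_paths


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "incoming" / "granule.dat"
    source.parent.mkdir()
    source.write_text("data")
    manager = FileManager(
        repository={"GenericFile": str(Path(tmp) / "archive")},
        catalog=InMemoryCatalog(),
        versioners={"GenericFile": flat_versioner},
        transferer=LocalDataTransferer(),
    )
    stored = manager.ingest("GenericFile", "granule",
                            {"Owner": "chris"}, [str(source)])
    stored_text = Path(stored[0]).read_text()
```

Keeping the catalog and the transferer behind separate objects mirrors the earlier point that data and metadata should be manageable independently.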

47 File Management Architecture

48 Workflow Management
- Modeling, executing and monitoring groups of one or more Workflow Tasks
- Tasks could be
  - A script file
  - A Java process
  - An external command
  - A call to a web service
  - Many more…

49 Workflow
- Workflow has many definitions
  - It's typically represented as a graph
  - In traditional science data pipeline systems, this graph is constrained to be a sequential set of process nodes
  - Taxonomy of Workflow Management Systems: http://www.gridbus.org/reports/GridWorkflowTaxonomy.pdf
  - Workflow Patterns: http://is.tm.tue.nl/research/patterns/

50 Workflow Management: Core Terminology

51 Workflow Management Features
- Workflow should be represented as a graph. This will allow for true parallelism.
- Workflow Management should support identified workflow patterns, especially control-flow.
  - The current level of support for control-flow has to a large extent been relegated to tasks. A collection of tasks is associated with a product ingestion and there is only a priority to sort out the order of execution.
- Data-flow should be captured. The workflow should, at a minimum, be able to hook together input and output streams between tasks.
- Workflow need not have any interaction with a database
  - What if I want to persist a workflow in XML?
  - Or as a flat file, or some other lightweight format

52 Workflow Management: Architectural Implications (1/3)
- Workflow Repositories
  - Places to go and fetch an "abstract" workflow description from
  - Places to store running "workflow instance" information
- Workflow Execution Engines
  - Give it an abstract workflow, and let it rip
    - Turns an abstract workflow into a "Workflow Instance"
  - Should allow monitoring of the workflow instance
- System interface
  - Associate abstract workflows with "events"
  - This way, workflows can be tied to things other than just product ingestion

53 Workflow Management: Architectural Implications (2/3)
- The Workflow Repository need not be a relational database
  - It could be a flat file
  - A (set of) XML file(s)
  - An object database
  - Factories create Workflow Repositories, which create Workflows
- Tasks are associated with "Workflows", not "Product Types"
  - This decouples workflow from the File Management aspects of the CAS
- Conditions can be pre or post

54 Workflow Management: Architectural Implications (3/3)
- Workflows are interfaces
  - They could be backed by a directed graph, by an iterator (i.e., a sequential pipeline), or by a HashMap
- Workflow Tasks have clearly separated dynamic and static metadata, and they can share metadata
  - Dynamic metadata is passed via the Workflow Engine between all the tasks in a workflow
    - They can all read/write to it
  - Static metadata is associated with each workflow task
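The "workflows are interfaces" point, together with the static/dynamic metadata split, can be illustrated as follows. This is a sketch under assumptions: the `Workflow`, `SequentialWorkflow`, `GraphWorkflow`, `Task`, and `run_workflow` names are invented, tasks run one at a time rather than in parallel, and the graph-backed variant relies on `graphlib.TopologicalSorter` from the Python standard library (3.9 or later).

```python
from abc import ABC, abstractmethod
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Task:
    def __init__(self, name, static_metadata, action):
        self.name = name
        self.static_metadata = static_metadata  # fixed per task
        self.action = action                    # callable(static, dynamic)

    def run(self, dynamic_metadata):
        self.action(self.static_metadata, dynamic_metadata)


class Workflow(ABC):
    @abstractmethod
    def ordered_tasks(self):
        """Return tasks in an order that respects their dependencies."""


class SequentialWorkflow(Workflow):
    """Backed by a list: the traditional sequential pipeline."""

    def __init__(self, tasks):
        self.tasks = list(tasks)

    def ordered_tasks(self):
        return iter(self.tasks)


class GraphWorkflow(Workflow):
    """Backed by a directed graph: task -> set of tasks it depends on."""

    def __init__(self, dependencies):
        self.dependencies = dependencies

    def ordered_tasks(self):
        return iter(TopologicalSorter(self.dependencies).static_order())


def run_workflow(workflow):
    # One dynamic metadata dictionary is shared by every task in the run.
    dynamic = {}
    for task in workflow.ordered_tasks():
        task.run(dynamic)
    return dynamic


first = Task("first", {"start": 1},
             lambda static, dyn: dyn.update(count=static["start"]))
second = Task("second", {"step": 2},
              lambda static, dyn: dyn.update(count=dyn["count"] + static["step"]))

sequential_result = run_workflow(SequentialWorkflow([first, second]))
graph_result = run_workflow(GraphWorkflow({second: {first}}))
```

Both backings produce the same result here because the executor only sees the `ordered_tasks` interface; a real engine could instead exploit the graph form to run independent tasks in parallel.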

55 Workflow Execution
- Once you've got a Workflow, how do you execute it and turn it into a Workflow Instance?
  - You hand it off to a Workflow Engine
- Workflow Engine manages:
  - A configurable, extensible thread pool
    - "Worker Threads" are used to process the Workflow Instance they are each handed
  - A queue of worker threads, used if there aren't any available workers in the thread pool to process a Workflow
  - Monitoring which Workers are handling which Workflow Instances, and the state and status of each Workflow Instance
- Workflow Engines execute instances of Workflows
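A thread-pool-based engine along these lines can be sketched with Python's `concurrent.futures.ThreadPoolExecutor`, which already queues submitted work internally when all workers are busy. The `WorkflowEngine` class, its method names, and the status strings are assumptions for illustration; a workflow is simplified to a list of zero-argument callables.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor


class WorkflowEngine:
    def __init__(self, max_workers=4):
        # The executor queues submissions when every worker is busy.
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = threading.Lock()
        self._status = {}  # workflow instance id -> status string

    def start(self, tasks):
        """Turn a workflow (list of callables) into a running instance."""
        instance_id = str(uuid.uuid4())
        self._set_status(instance_id, "QUEUED")
        self._pool.submit(self._run, instance_id, list(tasks))
        return instance_id

    def _run(self, instance_id, tasks):
        self._set_status(instance_id, "RUNNING")
        try:
            for task in tasks:
                task()
        except Exception:
            self._set_status(instance_id, "FAILED")
            return
        self._set_status(instance_id, "FINISHED")

    def _set_status(self, instance_id, status):
        with self._lock:
            self._status[instance_id] = status

    def status(self, instance_id):
        """Monitoring hook: current state of one workflow instance."""
        with self._lock:
            return self._status[instance_id]

    def shutdown(self):
        # Blocks until all queued and running instances have completed.
        self._pool.shutdown(wait=True)


def fail():
    raise RuntimeError("task failed")


engine = WorkflowEngine(max_workers=1)
log = []
ok_id = engine.start([lambda: log.append("extract"),
                      lambda: log.append("load")])
bad_id = engine.start([fail])
engine.shutdown()
```

With `max_workers=1` the second instance waits in the executor's queue until the first finishes, which is the queuing behavior the slide describes.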

56 What is the external interface to the system?
- Event-based
  - Event names come into the Workflow Manager
  - The Workflow Manager looks up any Workflows associated with the event name
  - The Workflow Manager then calls the Workflow Repository to obtain representations of the Workflow
  - The Workflow Manager then hands off Workflow representations to the Workflow Engine for execution
- Current implementation uses XML-RPC, but it's an interface, so it could use REST/HTTP/SOAP/etc.
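The event-to-workflow lookup described above reduces to a small dispatch routine. This sketch deliberately leaves out the transport (XML-RPC or otherwise); the `WorkflowManager` class, the event name "ingest", and the workflow identifiers are hypothetical, and the engine is stood in for by any callable that accepts a workflow and its metadata.

```python
class WorkflowManager:
    def __init__(self, event_map, repository, engine):
        self.event_map = event_map    # event name -> list of workflow ids
        self.repository = repository  # workflow id -> workflow description
        self.engine = engine          # callable(workflow, metadata)

    def handle_event(self, event_name, metadata=None):
        """Start every workflow associated with the named event."""
        started = []
        for workflow_id in self.event_map.get(event_name, []):
            workflow = self.repository[workflow_id]
            # Each workflow gets its own copy of the event metadata.
            started.append(self.engine(workflow, dict(metadata or {})))
        return started


# Stand-in engine: records what it was asked to run instead of executing it.
executed = []


def recording_engine(workflow, metadata):
    executed.append((workflow["name"], metadata))
    return workflow["name"]


manager = WorkflowManager(
    event_map={"ingest": ["wf-extract", "wf-notify"]},
    repository={"wf-extract": {"name": "extract", "tasks": ["untar", "parse"]},
                "wf-notify": {"name": "notify", "tasks": ["email"]}},
    engine=recording_engine,
)
started = manager.handle_event("ingest", {"ProductName": "granule"})
ignored = manager.handle_event("unknown-event")
```

Because workflows are keyed by event name rather than by product ingestion, any part of the system can trigger them by emitting an event.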

57 Workflow Management: Architectural Conclusion
- So, how do we put all of these things together?
- Well, something like:
  - A Workflow Manager has
    - A Workflow Repository to obtain abstract Workflow descriptions from
    - A Workflow Instance Repository to store/persist running Workflow Instance information
    - A Workflow Engine to execute Workflows on
    - An external interface

58 Workflow Management Architecture

59 Resource Management
- Resource management is a notion born from the high performance computing and grid communities, dealing with the execution of Jobs and Tasks across heterogeneous computing resources including grids, clouds, clusters and individual Resource Nodes
- Involves monitoring the underlying hardware resources
  - Disk space, CPU utilization, load, etc., across the nodes
- Should provide an easy-to-use façade for the workflow manager to pipeline jobs to

60 Resource Management: Core Terminology

61 Resource Management: Definitions
- Job - an abstract representation of an execution unit that stores information about an underlying program, or execution that must be run on some hardware node, including information about the Job Input that the Job requires, information about the job load, and the queue that the job should be submitted to.
- Job Input - an abstract representation of the input that a Job requires.
- Job Spec - a complete specification of a Job, including its Job Input, and the Job definition itself.
- Job Instance - the physical code that performs the underlying job execution.
- Resource Node - an available execution node that a Job is sent to by the Resource Manager.
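These definitions map naturally onto a handful of small types. The field names below (`load`, `queue`, `capacity`, and so on) and the `can_run` helper are assumptions made for illustration; the slides do not specify any attributes beyond what the definitions say in prose.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class JobInput:
    """Abstract representation of the input a Job requires."""
    params: dict = field(default_factory=dict)


@dataclass
class Job:
    """Execution unit: what to run, how heavy it is, and which queue."""
    job_id: str
    name: str
    load: int      # hypothetical unit of required capacity
    queue: str


@dataclass
class JobSpec:
    """Complete specification: the Job definition plus its Job Input."""
    job: Job
    job_input: JobInput


# A Job Instance is the code that actually performs the work.
JobInstance = Callable[[JobInput], object]


@dataclass
class ResourceNode:
    """An execution node that Jobs can be sent to."""
    node_id: str
    capacity: int
    current_load: int = 0

    def can_run(self, job):
        return self.current_load + job.load <= self.capacity


def word_count(job_input):
    """Example Job Instance: counts words in the 'text' parameter."""
    return len(job_input.params["text"].split())


spec = JobSpec(Job("job-1", "word count", load=2, queue="default"),
               JobInput({"text": "reference architectures guide reuse"}))
node = ResourceNode("node-a", capacity=4, current_load=1)
result = word_count(spec.job_input) if node.can_run(spec.job) else None
```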

62 Resource Management: Features
- Easy execution - of compute jobs to heterogeneous computing resources, with very different underlying specifications: large and small disks, network file storage, storage area networks, and exotic processor architectures.
- Cluster Management Pluggability - the ability to plug into existing batch submission systems (e.g., Torque, LSF), and resource monitoring (e.g., Ganglia).
- Scalability - the Resource Manager uses the popular client-server paradigm, allowing new Resource Manager servers to be instantiated, as needed, without affecting the Resource Manager clients, and vice versa.
- Communication over lightweight, standard protocols - the Resource Manager uses XML-RPC as its main external interface between Resource Manager client and server. XML-RPC, the little brother of SOAP, is fast, extensible, and uses the underlying HTTP protocol for data transfer.
- Wrapping - the use of wrappers to insulate the internal code of Job Instances from their external interfaces allows a variety of popular programming languages (e.g., shell scripting, Java, Python, Perl, Ruby) to be used to implement the actual job.
- Scheduler Pluggability - the ability to define the underlying job scheduling policy.
- XML-based job description - allows for existing XML-based editing tools to visualize the different job properties, and for standard job definitions and interchange.

63 Resource Management: Architectural Implications (1/2)
- Batch Manager
  - responsible for sending Jobs to the nodes that the Resource Manager determines are appropriate for them to execute on.
- Job Queue
  - responsible for queuing up Jobs when the Resource Manager determines that there are no Resource Nodes available to execute the Job on
  - persistence and queuing policy (e.g., LRU, FIFO) are both dealt with by this extension point.
- Job Repository
  - responsible for actual persistence of a Job throughout its lifecycle in the Resource Manager
  - retrieves Job and Job Spec information whether the Job is queued, executing, or finished.

64 Resource Management: Architectural Implications (2/2)
- Monitor
  - responsible for monitoring the execution of a Job once it has been sent to a Resource Node by the Batch Manager extension point.
- Job Scheduler
  - responsible for determining the availability of underlying Resource Nodes
  - determines the policy for pulling Jobs off of the Job Queue to schedule for execution, interacting with the Job Repository, the Batch Manager, and the Monitor
  - provides the external interface to the Resource Manager services.

65 Resource Management: Architectural Conclusion
- So, how do we put all of these things together?
- Well, something like:
  - A Resource Manager has
    - A Job Queue to hold Jobs that cannot be run for scheduling or resource constraint reasons and to provide Jobs to the Scheduler.
    - A Scheduler to implement a policy for determining which Jobs are ready to run.
    - A Monitor to determine the current state (disk space, availability) of Resource Nodes in the system to run Jobs on.
    - A Batch Submission system to physically execute jobs on remote Resource Nodes.
    - A Job Repository to record the current state of a running Job.
    - An external interface
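To show how these pieces interact, here is a deliberately small in-memory sketch. It collapses the Job Queue, Scheduler, Monitor, Batch Manager, and Job Repository into one class with one attribute or method each; the scheduling policy (try jobs in arrival order and place each on the first node with enough free capacity) is an assumption chosen for brevity, and nothing is actually executed on a remote node.

```python
from collections import deque


class ResourceManager:
    def __init__(self, node_capacity):
        self.free = dict(node_capacity)  # Monitor view: node id -> free units
        self.queue = deque()             # Job Queue, in arrival order
        self.repository = {}             # Job Repository: job id -> state
        self.placement = {}              # job id -> (node id, load)

    def submit(self, job_id, load):
        self.repository[job_id] = "QUEUED"
        self.queue.append((job_id, load))

    def schedule(self):
        """Scheduler pass: dispatch every queued job that fits somewhere."""
        still_waiting = deque()
        while self.queue:
            job_id, load = self.queue.popleft()
            node = self._first_fit(load)
            if node is None:
                still_waiting.append((job_id, load))
            else:
                self._dispatch(job_id, load, node)
        self.queue = still_waiting

    def _first_fit(self, load):
        for node_id, free in self.free.items():
            if free >= load:
                return node_id
        return None

    def _dispatch(self, job_id, load, node_id):
        # Stand-in for the Batch Manager: reserve capacity, record state.
        self.free[node_id] -= load
        self.placement[job_id] = (node_id, load)
        self.repository[job_id] = "RUNNING"

    def finish(self, job_id):
        node_id, load = self.placement.pop(job_id)
        self.free[node_id] += load
        self.repository[job_id] = "FINISHED"


manager = ResourceManager({"node-a": 2})
manager.submit("job-1", load=2)
manager.submit("job-2", load=1)
manager.schedule()
after_first_pass = dict(manager.repository)
manager.finish("job-1")
manager.schedule()
after_second_pass = dict(manager.repository)
```

In the first pass only `job-1` fits on the single node, so `job-2` stays queued until `job-1` finishes and releases its capacity.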

66 Resource Management Architecture

67 Conclusions
- Domain-specific software engineering, reference architectures, etc.
  - Keep it abstract; focus on terminology, vocabulary, components/functions, and styles
- Cull reusable requirements, information, etc., from prior experience
  - Need to have built it before to really get it
- Lots of examples to draw from!
