Engineering workflow elements Machiel Jansen E-science support SARA Amsterdam
Who am I Living in Baarn with Jacqueline No pets…but
Who am I Education Psychology: Artificial Intelligence Philosophy: mainly logic. Computer Science PhD on Formal explorations of knowledge intensive tasks.
And now… Work: IPS, Getronics, UvA, VU (VL-e), Collexis, SARA. Languages: Java, Prolog, and many more I try to forget. Fields of expertise: Knowledge representation, logic, information retrieval, software engineering, formal languages, Grid, cognition, and as hobbies: Dutch architecture, art history, literature, birds, plants and history in general.
E-science E-science is about sharing and collaboration. It aims for generic software components on a shared infrastructure. In BioAssist this should be realized by means of Grid, Web Services and workflow.
Sharing and collaboration Sharing data and sharing functionality. Sharing involves agreeing on common formats and the meaning of the information shared. It involves the setting up of contracts for the functionality of software artefacts. Sharing means that software should be generic and reusable.
Workflow and Web Services The vision: Workflow and Web Services offer you a way to share and collaborate. You publish and share your own Web Services and use those of others. (The same with workflows.) But what are Web Services? Let’s limit it to SOAP/WSDL…
SOAP Originally Simple Object Access Protocol, but now SOAP stands for SOAP! ▫(Called XP for XML Protocol for a while) Originates from doing Remote Procedure Calls over XML and HTTP. Initiated by Microsoft but later moved to the W3C. Originally a number of competitors ▫XML-RPC etc…
SOAP Web Services Independent of programming language. The objects that you send are serialized and “marshalled” in XML. XML data is received, validated and processed. SOAP is a way to do RPC. But not the only one.
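The marshalling round trip described on this slide can be sketched in a few lines of Java. This is a minimal illustration using only the JDK's built-in XML parser; it is plain XML without a SOAP envelope or WSDL contract, and the `Protein` class, its fields and the element names are assumptions made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

class MarshalDemo {

    // Hypothetical domain object; the fields are chosen only for illustration.
    static final class Protein {
        final String id;
        final int length;
        Protein(String id, int length) { this.id = id; this.length = length; }
    }

    // Escape the characters that are special in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Sender side: serialize ("marshal") the object into an XML string.
    static String marshal(Protein p) {
        return "<protein><id>" + escape(p.id) + "</id><length>" + p.length + "</length></protein>";
    }

    // Receiver side: parse the XML, check the expected shape, rebuild the object.
    static Protein unmarshal(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            if (!"protein".equals(root.getTagName())) {
                throw new IllegalArgumentException("unexpected root element: " + root.getTagName());
            }
            String id = root.getElementsByTagName("id").item(0).getTextContent();
            String length = root.getElementsByTagName("length").item(0).getTextContent();
            return new Protein(id, Integer.parseInt(length.trim()));
        } catch (Exception e) {
            // Malformed XML, missing elements and bad numbers all end up here.
            throw new IllegalArgumentException("not a valid protein message", e);
        }
    }

    public static void main(String[] args) {
        String xml = marshal(new Protein("P1", 120));
        Protein copy = unmarshal(xml);
        System.out.println(xml);
        System.out.println(copy.id + " " + copy.length);
    }
}
```

Because the message is just text, the receiver could equally be written in another language, which is the point of the "independent of programming language" claim.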
Web Services in workflow In a workflow, WS are just workflow elements. In Taverna, WS can be replaced by local programs or other technical implementations. Each workflow element should be generic. This means it can play a role in more than one application. Workflows can be seen as big abstract programs and the Web Services as generic software modules.
Important issues How do you identify workflow elements? They should play a role in more than one workflow. They should be independent of other workflow elements. They should be elegant and easy to use.
More important issues They should be resistant to change. They should have a clear interface to the outside world.
Identifying workflow elements Workflow elements are high-level objects. Use OO techniques and engineering. Start big, partition into smaller chunks, and apply the criteria of loose coupling and high cohesion.
(Loose) coupling This is the criterion for independence between elements (objects). Loose coupling means independence. Loosely coupled components do not rely on each other. How well are they separated? A workflow element is normally completely decoupled from any other until you create the workflow. Then you couple elements together.
Workflow coupling Workflow elements may be coupled in time, meaning the availability of one system affects the other. (Synchronous WS!) Workflow elements may be coupled in format, meaning that differences in data models have to be resolved to achieve integration. Workflow elements may be coupled in function, meaning that their separate use is not very useful (or likely).
Data coupling What comes out of one workflow element must be reformatted before it is sent as input to the next. In Taverna: shims and beanshells. In WS this is very hard to get around. WS are stateless. In OO you use decoupling techniques like factories and dependency injection?
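A shim of the kind mentioned here is usually only a few lines of glue. The sketch below is a hypothetical illustration in plain Java, not Taverna code: `upstream` and `downstream` stand in for two workflow elements whose formats do not match, and the identifiers `P1`, `P2`, `P3` are placeholders.

```java
import java.util.ArrayList;
import java.util.List;

class ShimDemo {

    // Hypothetical upstream element: emits identifiers as one tab-delimited string.
    static String upstream() {
        return "P1\tP2\tP3";
    }

    // Hypothetical downstream element: expects a list and reports how many items it got.
    static int downstream(List<String> ids) {
        return ids.size();
    }

    // The shim: it adds no functionality, it only converts one format into the other.
    static List<String> shim(String tabDelimited) {
        List<String> ids = new ArrayList<>();
        for (String field : tabDelimited.split("\t")) {
            String trimmed = field.trim();
            if (!trimmed.isEmpty()) {
                ids.add(trimmed);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // The workflow only works because the shim knows both formats.
        System.out.println(downstream(shim(upstream())));
    }
}
```

The shim has to know both the producer's output format and the consumer's input format, which is exactly the data coupling this slide is about.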
Factories and dependency injection Briefly… Suppose a class that lists movies. It should be independent of HOW the movies are stored. Create an interface which publishes an abstract method. Then give different implementations. Inject the needed implementation into the class that uses a movie lister. Want to know more? Read Fowler, use Spring, read Design Patterns.
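The movie lister idea can be sketched with plain constructor injection, without Spring. The class and method names below (`MovieFinder`, `MovieLister`, `titlesDirectedBy`) and the sample data are assumptions chosen for illustration, loosely following the shape of Fowler's example rather than reproducing it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MovieListerDemo {

    static final class Movie {
        final String title;
        final String director;
        Movie(String title, String director) { this.title = title; this.director = director; }
    }

    // The interface hides HOW the movies are stored.
    interface MovieFinder {
        List<Movie> findAll();
    }

    // Implementation 1: movies kept in memory.
    static final class InMemoryMovieFinder implements MovieFinder {
        private final List<Movie> movies;
        InMemoryMovieFinder(List<Movie> movies) { this.movies = new ArrayList<>(movies); }
        @Override public List<Movie> findAll() { return new ArrayList<>(movies); }
    }

    // Implementation 2: movies parsed from "title;director" lines (e.g. read from a file).
    static final class DelimitedTextMovieFinder implements MovieFinder {
        private final List<String> lines;
        DelimitedTextMovieFinder(List<String> lines) { this.lines = new ArrayList<>(lines); }
        @Override public List<Movie> findAll() {
            List<Movie> result = new ArrayList<>();
            for (String line : lines) {
                String[] parts = line.split(";", 2);
                if (parts.length == 2) {
                    result.add(new Movie(parts[0].trim(), parts[1].trim()));
                }
            }
            return result;
        }
    }

    // The lister depends only on the interface; the implementation is injected.
    static final class MovieLister {
        private final MovieFinder finder;
        MovieLister(MovieFinder finder) { this.finder = finder; }
        List<String> titlesDirectedBy(String director) {
            List<String> titles = new ArrayList<>();
            for (Movie m : finder.findAll()) {
                if (m.director.equals(director)) {
                    titles.add(m.title);
                }
            }
            return titles;
        }
    }

    public static void main(String[] args) {
        MovieFinder inMemory = new InMemoryMovieFinder(Arrays.asList(
                new Movie("Movie A", "Director X"), new Movie("Movie B", "Director Y")));
        MovieFinder fromText = new DelimitedTextMovieFinder(Arrays.asList("Movie C;Director X"));
        // Same lister code, two different storage mechanisms.
        System.out.println(new MovieLister(inMemory).titlesDirectedBy("Director X"));
        System.out.println(new MovieLister(fromText).titlesDirectedBy("Director X"));
    }
}
```

A container such as Spring would do the wiring in `main` from configuration instead of by hand; the lister itself would not change.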
Well… Suppose your WS returns a list of proteins. One WS wants it tab-delimited, another wants it in XML. How do you deal with that? Do you at all? If not, the workflow programmer has to write a shim. But it’s the job of the workflow engine to wire workflow elements. The shim means tight data coupling. You can provide different methods, but then the interface publishes non-functional details. For each new type the interface changes! What you want is for the workflow engine to do the wiring. It should first set the type in the service (injection) and then make a general call for a protein list. But WS are stateless. Still, this is food for thought.
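As food for thought, the "set the type first, then make a general call" idea might look like this inside a single Java process. Everything here is hypothetical (`ListFormat`, `ProteinListService`, the placeholder identifiers), and it deliberately ignores the statelessness problem: a real stateless Web Service could not remember the injected format between calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ProteinFormatDemo {

    // The output format is a pluggable, non-functional detail.
    interface ListFormat {
        String render(List<String> items);
    }

    static final class TabDelimitedFormat implements ListFormat {
        @Override public String render(List<String> items) {
            return String.join("\t", items);
        }
    }

    static final class XmlFormat implements ListFormat {
        @Override public String render(List<String> items) {
            StringBuilder sb = new StringBuilder("<proteins>");
            for (String item : items) {
                sb.append("<protein>").append(escape(item)).append("</protein>");
            }
            return sb.append("</proteins>").toString();
        }
        private static String escape(String s) {
            return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        }
    }

    // The service publishes ONE functional method; the format is injected.
    static final class ProteinListService {
        private final List<String> proteins;
        private ListFormat format = new TabDelimitedFormat(); // default, an assumption

        ProteinListService(List<String> proteins) { this.proteins = new ArrayList<>(proteins); }

        void setFormat(ListFormat format) { this.format = format; }

        String getProteinList() { return format.render(proteins); }
    }

    public static void main(String[] args) {
        ProteinListService service = new ProteinListService(Arrays.asList("P1", "P2"));
        // The "workflow engine" wires the format the next element needs...
        service.setFormat(new XmlFormat());
        // ...and then makes the general call.
        System.out.println(service.getProteinList());
    }
}
```

Adding a new output type means adding a new `ListFormat` implementation; the signature of `getProteinList` stays the same, so the interface no longer changes for each new type.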
Cohesion Cohesion is about how the activities within a single module are related to one another. Cohesion is the measure of the strength of functional relatedness of elements within a module.
Functional cohesion If you can sum up everything that the module accomplishes as one problem-related function, then that module is functionally cohesive. A functionally cohesive module contains elements that all contribute to the execution of one and only one problem-related task. Examples: COMPUTE COSINE OF ANGLE; VERIFY ALPHABETIC SYNTAX; READ TRANSACTION RECORD. Reuse is good.
Cohesion and workflows A workflow element should be a functionally cohesive unit at a high level of granularity. If a workflow element is highly functionally cohesive, it can be used in many different workflows.
Sequential cohesion A sequentially cohesive module is one whose elements are involved in activities such that output data from one activity serves as input data to the next. Example: CLEAN CAR BODY; FILL IN HOLES IN CAR; SAND CAR BODY; APPLY PRIMER. Reuse is more problematic. The activities do not form a functional entity.
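The difference between functional and sequential cohesion can be made concrete with the car example. In this hypothetical Java sketch a car body is modelled as a plain string so that the data flow is visible; the method names are invented for the illustration.

```java
class CohesionDemo {

    // Each step performs one problem-related task: functionally cohesive.
    static String cleanCarBody(String body) { return body + " > cleaned"; }
    static String fillHoles(String body)    { return body + " > holes filled"; }
    static String sandCarBody(String body)  { return body + " > sanded"; }
    static String applyPrimer(String body)  { return body + " > primed"; }

    // Sequentially cohesive: the output of each activity is the input of the next.
    // It can only be reused where exactly this chain of activities is wanted.
    static String prepareForPainting(String body) {
        return applyPrimer(sandCarBody(fillHoles(cleanCarBody(body))));
    }

    public static void main(String[] args) {
        System.out.println(prepareForPainting("car body"));
        // A single step can be reused on its own in a different context.
        System.out.println(sandCarBody("wooden table"));
    }
}
```

In workflow terms, the four small methods are candidates for workflow elements, while `prepareForPainting` resembles a workflow built from them.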
Procedural cohesion A procedurally cohesive module is one whose elements are involved in different and possibly unrelated activities in which control flows from each activity to the next. (Remember that in a sequentially cohesive module data, not control, flows from one activity to the next.) Example: CLEAN UTENSILS FROM PREVIOUS MEAL; PREPARE TURKEY FOR ROASTING; MAKE PHONE CALL; TAKE SHOWER; CHOP VEGETABLES; SET TABLE. Such modules are hard to reuse. They may fit in a specific scenario.
The problem with workflow elements Workflow elements are not Web Services by definition. They are loosely coupled, functionally cohesive, generic software modules. This means that a proper workflow element should be easy to use in different kinds of workflows. This is difficult!
Difficulties Organizational -- Developing reusable software requires a deep understanding of application developer needs and business requirements. As the number of developers and projects employing reusable assets increases, it becomes hard to structure an organization that can provide effective feedback. Economic -- Developing reusable systems takes more effort and time, and hence money. These investments should pay off later. Administrative -- Although it's common to scavenge small classes or functions opportunistically from existing programs, developers often find it hard to locate suitable reusable modules outside of their immediate workgroups. Political -- Groups that develop reusable middleware platforms are often viewed with suspicion by application developers, who resent the fact that they may no longer be empowered to make key architectural decisions. Likewise, rivalries among different groups may prevent reuse. Psychological -- Application developers may also perceive “top down” reuse efforts as an indication that management lacks confidence in their technical abilities. In addition, the “not invented here” syndrome (reinventing the wheel) is ubiquitous in many organizations, particularly among highly talented programmers.