Requirements for Specification of workflows in CLARIN Scenario design driven.

Slides:



Advertisements
Similar presentations
PHP I.
Advertisements

Automating Software Module Testing for FAA Certification Usha Santhanam The Boeing Company.
Overview and Demonstration of declarative workflows in SharePoint using Microsoft SharePoint Designer 2007 Kevin Hughes MCT, MCITP, MCSA, MCTS, MCP, Network+,
Pontus Boström and Marina Waldén Åbo Akademi University/ TUCS Development of Fault Tolerant Grid Applications Using Distributed B.
Chapter 7: User-Defined Functions II Instructor: Mohammad Mojaddam.
How Web Services are introduced in a CLARIN Workflow WSDL and WADL files for Web Service description.
Workshop Summary I think it was an excellent and inspiring meeting (do I have to say that?) I hoped to have a kind of kick-off effect like at the LREC.
The user entered the query “What is the historical relation between Greek and Roma”. Here are the query’s results. The user clicked the topic “Roman copies.
Chapter 4B: More Advanced PL/SQL Programming
Business Process Orchestration
The Programming Discipline Professor Stephen K. Kwan 2010 Things you need to know (learn) for developing large computer programs.
Microprocessors Introduction to ia64 Architecture Jan 31st, 2002 General Principles.
Guide To UNIX Using Linux Third Edition
Adding Automated Functionality to Office Applications.
Workflow API and workflow services A case study of biodiversity analysis using Windows Workflow Foundation Boris Milašinović Faculty of Electrical Engineering.
Software design and development Marcus Hunt. Application and limits of procedural programming Procedural programming is a powerful language, typically.
Python quick start guide
CLARIN tools for workflows Overview. Objective of this document  Determine which are the responsibilities of the different components of CLARIN workflows.
Requirements Elicitation. Requirement: a feature or constraint that the system must satisfy Requirements Elicitation: specification of the system that.
1 Web Based Programming Section 6 James King 12 August 2003.
ZEIT2301 Design of Information Systems
Developing Workflows with SharePoint Designer David Coe Application Development Consultant Microsoft Corporation.
Using Visual Basic 6.0 to Create Web-Based Database Applications
Operator Precedence First the contents of all parentheses are evaluated beginning with the innermost set of parenthesis. Second all multiplications, divisions,
Advanced PI Calculation Engine Makes Complex PI Calculations Easy! Use of EDICTvb for Multi-Plant Advanced PI Calculations Dane OverfieldEXELE Information.
Addressing design Goals  We decompose the system to address complexity  Assigning each subsystem to team to work on  But we also need to address system.
Agent Model for Interaction with Semantic Web Services Ivo Mihailovic.
Computing and the Web Operating Systems. Overview n What is an Operating System n Booting the Computer n User Interfaces n Files and File Management n.
How Java becomes agile riding Rhino Xavier Casellato VP of engineering, Liligo.com.
ASP.NET Programming with C# and SQL Server First Edition Chapter 3 Using Functions, Methods, and Control Structures.
1 Query Operations Relevance Feedback & Query Expansion.
Distributed Aircraft Maintenance Environment - DAME DAME Workflow Advisor Max Ong University of Sheffield.
Human-computer interfaces. Operating systems are software (i.e. programs) used to control the hardware directly used to run the applications software.
CpSc 462/662: Database Management Systems (DBMS) (TEXNH Approach) Stored Procedure James Wang.
95-843: Service Oriented Architecture 1 Master of Information System Management Service Oriented Architecture Lecture 7: BPEL Some notes selected from.
Code Clichés and Conventions Lecture 10: Supporting Material Dr Kathryn Merrick Thursday 2 nd April, 2009.
Handling Exceptions. 2 home back first prev next last What Will I Learn? Describe several advantages of including exception handling code in PL/SQL Describe.
CS Capstone OS Tools for OpenBSD Overview Presentation Team Fugu.
Grid Computing Environment Shell By Mehmet Nacar Las Vegas, June 2003.
SolarFlows Dr. Gabriele Pierantoni (TCD). Contents What is Heliophysics ? How could workflows help ? Some examples What we are doing...
How to Program? -- Part 1 Part 1: Problem Solving –Analyze a problem –Decide what steps need to be taken to solve it. –Take into consideration any special.
PROGRAMMING TESTING B MODULE 2: SOFTWARE SYSTEMS 22 NOVEMBER 2013.
ITERATION. Iteration Computers are often used to automate repetitive tasks. Repeating identical or similar tasks without making errors is something that.
3/12/2013Computer Engg, IIT(BHU)1 OpenMP-1. OpenMP is a portable, multiprocessing API for shared memory computers OpenMP is not a “language” Instead,
Understanding Desktop Applications Lesson 5. Understanding Windows Forms Applications Windows Forms applications are smart client applications consisting.
Separating Test Execution from Test Analysis StarEast 2011 Jacques Durand (Fujitsu America, Inc.) 1.
Introduction to Javascript. What is javascript?  The most popular web scripting language in the world  Used to produce rich thin client web applications.
Registry requirements Regarding to Web Services and Workflows.
Week 1 Lecture 1 Slide 1 CP2028 Visual Basic Programming 2 “The VB Team” Copyright © University of Wolverhampton CP2028 Visual Basic Programming 2 v Week.
Chapter 7 Modularity Using Functions: Part II. A First Book of ANSI C, Fourth Edition 2 Variable Scope If variables created inside a function are available.
Origami: Scientific Distributed Workflow in McIDAS-V Maciek Smuga-Otto, Bruce Flynn (also Bob Knuteson, Ray Garcia) SSEC.
SE 548 Process Modelling WEB SERVICE ORCHESTRATION AND COMPOSITION ÖZLEM BİLGİÇ.
Dynamic SQL Writing Efficient Queries on the Fly ED POLLACK AUTOTASK CORPORATION DATABASE OPTIMIZATION ENGINEER.
JavaScript Invented 1995 Steve, Tony & Sharon. A Scripting Language (A scripting language is a lightweight programming language that supports the writing.
Advanced Taverna Aleksandra Pawlik University of Manchester materials by Katy Wolstencroft, Aleksandra Pawlik, Alan Williams
1 Seminar on SOA Seminar on Service Oriented Architecture BPEL Some notes selected from “Business Process Execution Language for Web Services” by Matjaz.
WEB BASED DSS Aaron Atuhe. KEY CONCEPTS When software vendors propose implementing a Web-Based Decision Support System, they are referring to a computerized.
Dynamic SQL Writing Efficient Queries on the Fly
Data Virtualization Tutorial: Introduction to SQL Script
z/Ware 2.0 Technical Overview
Introduction to Triggers
Computer Engg, IIT(BHU)
Dynamic SQL Writing Efficient Queries on the Fly
Microsoft Access Illustrated
Designing and Debugging Batch and Interactive COBOL Programs
VISUAL BASIC.
Tiers vs. Layers.
Design Components are Code Components
Human–computer interfaces
Design Contracts and Errors A Software Development Strategy
Presentation transcript:

Requirements for Specification of workflows in CLARIN Scenario design driven

Objectives  The objective of this document is to discover and to understand what is needed to specify correctly a workflow inside CLARIN infrastructure.  It is out of the scope to deal with applications requirements like GUI, debugging, User interaction, workflow running, etc…  Also is not seen in this document the requirements for web services to be CLARIN compliant.

Scenario: Historical search using query expansion  In this scenario the user wants to query one word in historical documents in some corpora.  The word may have different spelling depending on the age of the documents.  Our Workflow will do a query expansion with the desired word, will get documents having the expanded words and will summarize the most relevant documents.

Components needed for this scenario Client expand query frequency sort_by_value Query Expander Corpus XVIII query General Corpus Calculations toolbox

Scenario: Step by step  Step 1: Query expansion.  Step 2: Query corpora for documents including the expanded words.  Step 3: Count word frequency in historical documents  Step 4: Summarize N documents with most word frequency

Step 1: Query expansion  CLARIN workflows must interact with WS. REST and SOAP without distinction. query_expander = ws(registry_PID1) corpus_XVIII = ws(registry_PID2) corpus_general = ws(registry_PID3) calculator = ws(registry_PID4) expanded_queries[] = query_expander.expand(initial_query) for each query in expanded_queries if query.century == "XVIII" then docs[] = corpus_XVIII.query(query.word) else docs[] = corpus_general.query(query.word) end for each doc in docs freq[doc.id] = calculator.frequency(query.word, doc) end sorted_freq = calculator.sort_by_value(freq) for i=0 to N output.append(docs[sorted_freq[i].key]) end

Step 2: Corpora query  Requirements: IF THEN ELSE clause Values coming from WS are used as conditions Be able to access to specified data inside complex objects coming from WS. Constants and Variables declaration for internal workflow use. Even complex types could be important. Scope of variables is an issue as well. (all global can cause problems overriding variables) query_expander = ws(registry_PID1) corpus_XVIII = ws(registry_PID2) corpus_general = ws(registry_PID3) calculator = ws(registry_PID4) expanded_queries[] = query_expander.expand(initial_query) for each query in expanded_queries if query.century == "XVIII" then docs[] = corpus_XVIII.query(query.word) else docs[] = corpus_general.query(query.word) end for each doc in docs freq[doc.id] = calculator.frequency(query.word, doc) end sorted_freq = calculator.sort_by_value(freq) for i=0 to N output.append(docs[sorted_freq[i].key]) end

Step 3: Count word frequency  Requirements: LOOP clauses  For each  While  For to Also Loops in parallel. WS are slow and it is not needed to do it sequentially. Parallelization features are required. query_expander = ws(registry_PID1) corpus_XVIII = ws(registry_PID2) corpus_general = ws(registry_PID3) calculator = ws(registry_PID4) expanded_queries[] = query_expander.expand(initial_query) for each parallel query in expanded_queries if query.century == "XVIII" then docs[] = corpus_XVIII.query(query.word) else docs[] = corpus_general.query(query.word) end for each parallel doc in docs freq[doc.id] = calculator.frequency(query.word, doc) end sorted_freq = calculator.sort_by_value(freq) for i=0 to N output.append(docs[sorted_freq[i].key]) end

Step 4: Summarize documents  Every operation, even if it is small should be a web service.  For optimum performance, very common tasks could be web services hosted in the CLARIN user computer (but still as a web service). query_expander = ws(registry_PID1) corpus_XVIII = ws(registry_PID2) corpus_general = ws(registry_PID3) calculator = ws(registry_PID4) expanded_queries[] = query_expander.expand(initial_query) for each parallel query in expanded_queries if query.century == "XVIII" then docs[] = corpus_XVIII.query(query.word) else docs[] = corpus_general.query(query.word) end for each parallel doc in docs freq[doc.id] = calculator.frequency(query.word, doc) end sorted_freq = calculator.sort_by_value(freq) for i=0 to N output.append(docs[sorted_freq[i].key]) end

What’s missing in this scenario?  Exception handling (special behaviour when an error is raised by a web service). Since registry will have a lot of information about alternative WS, mirrors, etc… CLARIN could handle it automatically in many cases. query_expander = ws(registry_PID1) corpus_XVIII = ws(registry_PID2) corpus_general = ws(registry_PID3) calculator = ws(registry_PID4) expanded_queries[] = query_expander.expand(initial_query) for each parallel query in expanded_queries if query.century == "XVIII" then docs[] = corpus_XVIII.query(query.word) else docs[] = corpus_general.query(query.word) end for each parallel doc in docs freq[doc.id] = calculator.frequency(query.word, doc) end sorted_freq = calculator.sort_by_value(freq) for i=0 to N output.append(docs[sorted_freq[i].key]) end on error run alternative webservice

We need more scenarios!  Backtracking, variable isolation, concurrent behaviour, delayed execution, etc…  There are many other features that we expect to be required by CLARIN workflows but we have not examples of that. We need more scenarios.

First workflows requirements list  If then else clause  Loops (while, for each, for to) clauses  Parallelization features. (also in loops)  Data manipulation of web services results.  Constant and Variable declaration. Basic operators.  Exception handling