WebLicht Application and “Workspaces”. Erhard Hinrichs & Thomas Zastrow, University of Tübingen. Munich, September 2010. www.d-spin.org

Presentation transcript:

WebLicht Application and “Workspaces”
Erhard Hinrichs & Thomas Zastrow, University of Tübingen

Outline
• Web-based Linguistic Chaining Tool (WebLicht) for incremental filtering of and access to language corpus data
• WebLicht – Motivation
• WebLicht – Architecture
• WebLicht – Future Requirements
• Test Case – Gutenberg Corpus

CLARIN Mission
• CLARIN (Common Language Resource and Technology Infrastructure Network) is committed to establishing an integrated and interoperable research infrastructure (RI) supporting easy access to and use of language resources and technology.
• It aims to overcome the current fragmentation and to offer a stable, persistent and extendable infrastructure.
• It will offer its services to researchers and scholars across a wide spectrum of domains, in particular in the humanities and social sciences.
• ESFRI roadmap project; the implementation phase starts in 2011.

Typical CLARIN user scenario
• Scenario: A PhD student investigates regional differences in vocabulary and in word collocations in different variants of German.
• Data: large text corpora available at BBAW in Berlin, at the Austrian Academy of Sciences in Vienna, at the Swiss Text Corpus Project in Basel, and at EURAC in Bolzano.
• Tools for targeted data access: WebLicht offers customizable chains of web services for filtering and analyzing the data.

WebLicht - Motivation
• Many linguistic resources (corpora, dictionaries, …) and tools (tokenizers, taggers, parsers, …) are available.
• Most of them are implemented to run on local machines, which can be inconvenient and error-prone.
• Requirement: go beyond “do-it-yourself” and “download-first” strategies.
• The CLARIN solution: make tools and resources available as web services.

WebLicht - Architecture
• WebLicht is a service-oriented architecture (SOA) for accessing and processing text corpora.
• Development started in October 2008.
• WebLicht consists of the following components:
  - Distributed services: offer functionality (resources & tools) over the (inter-)net. Implemented as web services (ca. 90 at the moment).
  - Repository: stores metadata and technical information about the services.
  - Web 2.0 based user interface: interacts with the user and combines services and information from the repository. Access via scripts or programming code is still possible.

WebLicht - Architecture
[Diagram. Labels: Web 2.0 Application for Tool Chaining and Execution; Repository; Standard-conformant Text Corpus Encoding. Sites shown: Stuttgart, Tübingen, Berlin, Leipzig, Finland, Romania, Iceland, UK.]

WebLicht – Architecture
• Services are implemented as REST-style web services.
• The HTTP POST method is used to send data from the UI to the services.
• Anything that can speak the HTTP protocol can be used as a client:
  - Browsers
  - Command-line tools (wget, curl)
  - Programming languages
• Anyone can implement their own interface to WebLicht.
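Since any HTTP client can act as a WebLicht client, a call from a programming language might look roughly like the following Python sketch. It is only an illustration: the endpoint URL, the plain-text content type and the UTF-8 response encoding are assumptions, as the slides do not specify them.

```python
import urllib.request

# Hypothetical endpoint; the slides do not list real service URLs.
SERVICE_URL = "https://example.org/weblicht/tokenizer"


def build_request(url, text, content_type="text/plain; charset=utf-8"):
    """Prepare an HTTP POST request that carries the input text as its body."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": content_type},
        method="POST",
    )


def call_service(url, text, timeout=60.0):
    """Send the text to a service and return the response body.

    Assumes the service answers with UTF-8 encoded text.
    """
    request = build_request(url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only builds the request; no network traffic happens here.
    req = build_request(SERVICE_URL, "Das ist ein Test.")
    print(req.get_method(), req.full_url)
```

Chaining two services then amounts to passing the return value of one `call_service` call as the input of the next.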

WebLicht - Processing Chains

WebLicht - Results

WebLicht - Features
• Because the services are REST-style, anyone can implement a web service for WebLicht (a 4-page tutorial is available).
• The SOA infrastructure is independent of programming languages and operating systems.
• The chaining algorithm is independent of the data format used.
• From a legal point of view, the web services remain located at the institute where they were created.
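One way to picture a format-independent chaining step is to decide which services may follow purely from repository metadata, without ever inspecting the data itself. The Python sketch below illustrates that idea under stated assumptions; it is not the actual WebLicht algorithm, and the `ServiceInfo` schema, service names and annotation layer labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceInfo:
    """Hypothetical repository entry: what a service needs and what it adds."""
    name: str
    requires: frozenset
    provides: frozenset


# Illustrative entries only; real repository metadata will differ.
REPOSITORY = [
    ServiceInfo("tokenizer", frozenset({"text"}), frozenset({"tokens"})),
    ServiceInfo("tagger", frozenset({"tokens"}), frozenset({"pos"})),
    ServiceInfo("parser", frozenset({"tokens", "pos"}), frozenset({"parse"})),
]


def next_services(available, repository=REPOSITORY):
    """Return services whose requirements are met and which add a new layer."""
    return [
        s for s in repository
        if s.requires <= available and not s.provides <= available
    ]


def is_valid_chain(chain, start=frozenset({"text"})):
    """Check that every service in the chain finds the layers it requires."""
    available = set(start)
    for service in chain:
        if not service.requires <= available:
            return False
        available |= service.provides
    return True
```

Here the user interface would call `next_services` after each step to offer only the services that can be appended to the current chain.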

WebLicht – Future Requirements
• Web services are synchronous, but some linguistic annotation processes are very time-consuming.
  - Asynchronous behavior of these services would be desirable.
• The processing power is limited by local computing resources.
  - Scalability is only possible with strong centers.
• The current architecture is not sufficiently parallelized and therefore does not scale up. It needs to:
  - Accommodate a large number of simultaneous users
  - Parallelize processes
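The asynchronous behavior asked for above is commonly realized as a submit-and-poll pattern: the client receives a job identifier immediately and fetches the result later, while a worker pool processes several jobs in parallel. The following Python sketch shows the pattern in a single process; `annotate` is a stand-in for a slow annotation service and `JobQueue` is a hypothetical name, neither is part of WebLicht.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor


def annotate(text):
    """Stand-in for a slow linguistic annotation service (hypothetical)."""
    time.sleep(0.01)
    return text.upper()


class JobQueue:
    """Accept jobs immediately and let clients poll for the results."""

    def __init__(self, max_workers=4):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._jobs = {}

    def submit(self, text):
        """Start a job in the background and return its identifier at once."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(annotate, text)
        return job_id

    def status(self, job_id):
        """Report whether the job has finished."""
        return "done" if self._jobs[job_id].done() else "running"

    def result(self, job_id, timeout=None):
        """Block until the job finishes (or the timeout expires)."""
        return self._jobs[job_id].result(timeout=timeout)

    def shutdown(self):
        """Wait for outstanding jobs and release the worker threads."""
        self._pool.shutdown(wait=True)
```

In a web setting, `submit`, `status` and `result` would map onto separate HTTP requests, so that no connection has to stay open for the duration of a long annotation run.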

WebLicht – Future Requirements
• Currently, users have to store their input data and results on their local machines.
  - Online storage in the form of personal workspaces with reliable backup solutions
• Linguistic tools are typically developed in a variety of heterogeneous software environments and programming languages (Java, Perl, Python, C/C++, Prolog, Lisp, …).
  - Encapsulation of individual services with common APIs for interoperability
• Currently, WebLicht services are limited to processing text corpora.
  - Extend web services to spoken-language and multimodal datasets as well (MPI is already working on this)
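Encapsulation behind a common API can be sketched as a small adapter layer: every tool, whatever language it is written in, is wrapped so that it exposes the same `process` call. The Python sketch below is one possible shape of such a layer, not a WebLicht interface; the class names `Tool`, `FunctionTool` and `CommandTool` are hypothetical.

```python
import subprocess
from abc import ABC, abstractmethod


class Tool(ABC):
    """Common interface: text in, text out."""

    @abstractmethod
    def process(self, text):
        """Return the processed version of the input text."""


class FunctionTool(Tool):
    """Wrap a tool that is available as a Python callable."""

    def __init__(self, func):
        self._func = func

    def process(self, text):
        return self._func(text)


class CommandTool(Tool):
    """Wrap an external program (Perl, C, Prolog, ...) that reads stdin
    and writes its result to stdout."""

    def __init__(self, argv):
        self._argv = list(argv)

    def process(self, text):
        completed = subprocess.run(
            self._argv, input=text, capture_output=True, text=True, check=True
        )
        return completed.stdout


def run_chain(tools, text):
    """Feed the output of each tool into the next one."""
    for tool in tools:
        text = tool.process(text)
    return text
```

For example, `run_chain([FunctionTool(str.strip), FunctionTool(str.lower)], "  Hallo Welt ")` passes the text through two wrapped Python functions, and a `CommandTool` could be inserted into the same list without the caller noticing the difference.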

Test Case: Gutenberg Corpus
• On the basis of this architecture, part of the freely available Project Gutenberg collection was annotated in Tübingen.
• Ca texts from 800 authors
• Runtime: ca. 3.5 weeks
• Result: 217 million tokens (words), 533 million constituents, 110 GB of data
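The figures above allow a rough back-of-the-envelope estimate of throughput. The short Python calculation below assumes that the 3.5 weeks were one continuous run and that GB means decimal gigabytes; neither is stated on the slide. Under those assumptions it comes out at on the order of 100 tokens per second and roughly half a kilobyte of annotated data per token.

```python
# Figures taken from the slide.
TOKENS = 217_000_000
CONSTITUENTS = 533_000_000
DATA_GB = 110
RUNTIME_WEEKS = 3.5

# Assumptions: decimal gigabytes and an uninterrupted run.
DATA_BYTES = DATA_GB * 10**9
RUNTIME_SECONDS = RUNTIME_WEEKS * 7 * 24 * 3600

tokens_per_second = TOKENS / RUNTIME_SECONDS
bytes_per_token = DATA_BYTES / TOKENS
constituents_per_token = CONSTITUENTS / TOKENS

print(f"tokens per second:      {tokens_per_second:.1f}")
print(f"bytes per token:        {bytes_per_token:.0f}")
print(f"constituents per token: {constituents_per_token:.2f}")
```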

Gutenberg Corpus – Analysis
• Full-text index (Lucene)
• Database for the linear part of the data
• Tree-like structures can be analyzed with XML-based techniques (XPath, XQuery)
• DOM-based techniques are slow and resource-hungry
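The contrast between an in-memory tree and a streaming pass can be sketched with Python's standard library, which supports a subset of XPath. The XML layout in `SAMPLE` is invented for illustration and is not the corpus encoding used in WebLicht. The first function loads the whole document before querying it; the second processes one sentence at a time and discards it afterwards, which keeps memory use flat on large files.

```python
import io
import xml.etree.ElementTree as ET

# Invented sample data; element and attribute names are assumptions.
SAMPLE = """<corpus>
  <sentence id="s1">
    <node cat="S">
      <node cat="NP"><token pos="ART">Der</token><token pos="NN">Hund</token></node>
      <node cat="VP"><token pos="VVFIN">bellt</token></node>
    </node>
  </sentence>
  <sentence id="s2">
    <node cat="S">
      <node cat="NP"><token pos="NE">Anna</token></node>
      <node cat="VP"><token pos="VVFIN">lacht</token></node>
    </node>
  </sentence>
</corpus>"""

NP_PATH = ".//node[@cat='NP']"


def phrase_text(node):
    """Join the words of all tokens below a constituent node."""
    return " ".join(token.text for token in node.iter("token"))


def noun_phrases_in_memory(xml_text):
    """Parse the whole document into a tree, then query it with XPath."""
    root = ET.fromstring(xml_text)
    return [phrase_text(np) for np in root.findall(NP_PATH)]


def noun_phrases_streaming(stream):
    """Handle one sentence at a time and free it after use."""
    results = []
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "sentence":
            results.extend(phrase_text(np) for np in elem.findall(NP_PATH))
            elem.clear()  # drop the finished sentence's subtree
    return results


if __name__ == "__main__":
    print(noun_phrases_in_memory(SAMPLE))
    print(noun_phrases_streaming(io.BytesIO(SAMPLE.encode("utf-8"))))
```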

Links etc.
• CLARIN homepage:
• The D-SPIN homepage:
• WebLicht (login via DFN AAI): tuebingen.de/
Erhard Hinrichs, Thomas Zastrow
Seminar für Sprachwissenschaft, Universität Tübingen
Wilhelmstr. 19, D Tübingen

WebLicht - Combinations