Facilitating access to the scientific data service with the use of the Data Management System
Cezary Mazurek

Agenda
– Introduction
– Data management issues
– Data Management System – functionality and architecture
– Accessing SRS resources
– Conclusions

R&D Center
PSNC was established in 1993 and is an R&D center in:
– New Generation Networks: POZMAN and PIONIER networks; 6-NET, ATRIUM, Muppet
– HPC and Grids: GRIDLAB, CROSSGRID, VLAB, PROGRESS projects; Clusterix, HPCEuropa
– Portals and Content Management Tools: Polish Educational Portal, Multimedia City Guide, Digital Library Framework, Interactive TV

SUN Center of Excellence
PSNC became a Sun Center of Excellence (CoE) in New Generation Networks, Grids and Portals in November 2002.

PROGRESS (1)
Project partners:
– SUN Microsystems Poland
– PSNC IBCh Poznań
– Cyfronet AMM, Kraków
– Technical University of Łódź
Co-funded by the State Committee for Scientific Research (KBN) and SUN Microsystems Poland.

PROGRESS (2)
Deployment phase (2004), addressed to:
– grid constructors
– computational application developers
– computing portal operators
Enabling access to the global grid through deployment of the PROGRESS open-source packages.

PROGRESS (3)
– Cluster of 80 processors
– Networked storage of 1.3 TB
– Software: Oracle, Sun HPC ClusterTools, Sun ONE, Sun Grid Engine, Globus
[Figure: map with the labels Wrocław and Gdańsk]

Polish Optical Internet PIONIER

Data Management Issues
– Hiding the data management complexity from the end user
– Ability to use new standards defined by grid organizations
– Cooperation with different kinds of applications
– Providing seamless access to data and information for grid computing
– Enabling an intuitive and efficient method of resource exploration
– Facilitating an interface to data management for administrators and scientists

PROGRESS

PROGRESS Communication
[Figure: calls exchanged between the HPC Portal, the Grid Service Provider, the Data Management System and the Grid Resource Broker: saveJob(), getApplications(), getTemplates(), saveTaskOfJob(), saveStdOfTask(), submitJob(), getUserJobs(), getJobStatus(), listUserDirectory(), addUserFile(), getUserFileLocation(), submitJob(), changeTaskStatus()]

Web Services and PROGRESS
[Figure: the portlets, the Grid Service Provider, Data Management and the Grid Resource Broker communicating via Web Services (WS)]

Data Management System
– A distributed system enabling the management of grid data files
– Stores files in distributed storage modules of various types: generic filesystems, archivers, relational databases
– Uses metadata to describe files
– Allows access to data banks, such as a mirror of the Sequence Retrieval System (SRS)
– Exposes its functionality through the Data Broker service

DMS Functionality
– A virtual file system keeping the data organized in a tree structure:
  – metadirectories organize other objects into a hierarchy,
  – metafiles represent a logical view of computational data regardless of the location of their physical storage.
– DMS provides its services to front-end applications in the form of a Web Services API.
– DMS is a middleware system belonging to the collective layer as well as to the resource layer (Data Container), according to the grid services view.
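The metadirectory/metafile split can be illustrated with a small in-memory tree. This is a minimal sketch under our own assumptions, not DMS code: the class names MetaDirectory and MetaFile, the helper resolve and the example URLs are hypothetical. A metafile records only its logical name and a list of physical locations, so one logical file can point at copies held in different storage modules.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MetaFile:
    """Logical view of a data file, independent of where its bytes live."""
    name: str
    locations: list[str] = field(default_factory=list)  # physical copies as URLs


@dataclass
class MetaDirectory:
    """Groups metadirectories and metafiles into a hierarchy."""
    name: str
    children: dict[str, MetaDirectory | MetaFile] = field(default_factory=dict)

    def mkdir(self, name: str) -> MetaDirectory:
        if name in self.children:
            raise FileExistsError(name)
        child = MetaDirectory(name)
        self.children[name] = child
        return child

    def add_file(self, name: str, location: str) -> MetaFile:
        if name in self.children:
            raise FileExistsError(name)
        meta = MetaFile(name, [location])
        self.children[name] = meta
        return meta


def resolve(root: MetaDirectory, path: str) -> MetaDirectory | MetaFile:
    """Walk an absolute logical path such as '/home/seq.fasta'."""
    node: MetaDirectory | MetaFile = root
    for part in filter(None, path.split("/")):
        if not isinstance(node, MetaDirectory) or part not in node.children:
            raise FileNotFoundError(path)
        node = node.children[part]
    return node
```

Because the tree stores only logical names, adding a second replica is just another entry in locations; the logical path seen by the user does not change.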

DMS Functionality
Web Services interface for storing, accessing, describing and delivering data:
– directory management: e.g. add, remove and rename directories, retrieve the root and current path, change the path
– file management: e.g. add, remove and rename files; add, remove and retrieve a physical file location
– metadata management: e.g. retrieve the list of schemes and attributes, assign schemes to files and edit values
– external data source management: e.g. retrieving databank content, resolving entries, exploring databanks
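The metadata operations listed above (listing schemes and attributes, assigning schemes to files, editing values) can be sketched as follows. The registry layout, the scheme name "dublin-core" and its attribute subset are illustrative assumptions; only the idea that schemes such as Dublin Core describe files comes from the slides.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical scheme registry: scheme name -> allowed attribute names.
# The "dublin-core" entry uses a few Dublin Core element names as an example.
SCHEMES: dict[str, frozenset[str]] = {
    "dublin-core": frozenset({"title", "creator", "date", "format", "description"}),
}


def list_schemes() -> list[str]:
    return sorted(SCHEMES)


def list_attributes(scheme: str) -> list[str]:
    return sorted(SCHEMES[scheme])


@dataclass
class FileMetadata:
    """Schemes assigned to one metafile and the attribute values set under them."""
    schemes: set[str] = field(default_factory=set)
    values: dict[tuple[str, str], str] = field(default_factory=dict)

    def assign_scheme(self, scheme: str) -> None:
        if scheme not in SCHEMES:
            raise KeyError(f"unknown scheme: {scheme}")
        self.schemes.add(scheme)

    def set_value(self, scheme: str, attribute: str, value: str) -> None:
        # Only attributes of a scheme already assigned to this file can be edited.
        if scheme not in self.schemes:
            raise ValueError(f"scheme {scheme!r} is not assigned to this file")
        if attribute not in SCHEMES[scheme]:
            raise ValueError(f"{attribute!r} is not an attribute of {scheme!r}")
        self.values[(scheme, attribute)] = value
```

Keying values by (scheme, attribute) lets two schemes define an attribute with the same name without clashing.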

DMS Architecture

Data Broker
– Serves as a Web Services interface for external clients, such as the HPC Portal and the Grid Resource Broker
– Mediates in the flow of all requests directed to the DMS
– Authorizes the client that submitted the request
– The Data Broker is distributed across the data management environment
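The broker's role as the single entry point that first authorizes a client and then forwards its request can be sketched like this. It is a hypothetical in-process model: the session table, the register/handle methods and the idea of passing the user name to the backend are our assumptions, while listUserDirectory is one of the operation names shown in the PROGRESS communication diagram.

```python
from __future__ import annotations

from typing import Any, Callable


class AuthorizationError(Exception):
    pass


class DataBroker:
    """Single entry point that authorizes a client and forwards its request."""

    def __init__(self, sessions: dict[str, str]) -> None:
        # Hypothetical: session token -> user name, established at login.
        self._sessions = sessions
        self._handlers: dict[str, Callable[..., Any]] = {}

    def register(self, operation: str, handler: Callable[..., Any]) -> None:
        """Bind an operation name (e.g. 'listUserDirectory') to a backend call."""
        self._handlers[operation] = handler

    def handle(self, token: str, operation: str, **params: Any) -> Any:
        user = self._sessions.get(token)
        if user is None:
            raise AuthorizationError("unknown or expired session")
        try:
            handler = self._handlers[operation]
        except KeyError:
            raise ValueError(f"unsupported operation: {operation}") from None
        # The backend receives the authenticated user name, never the raw token.
        return handler(user=user, **params)
```

Keeping authorization in the broker means the repository and containers behind it can trust the user identity they receive.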

Metadata Repository
– Central, single point of metadata management
– Responsible for all metadata operations and for storing and maintaining metadata
– Stores the following kinds of information:
  – metadata about resources: data files, their physical location and the possible ways to access them,
  – metadata about rights: all rights-related information (users, their groups, access rights),
  – metadata describing file description standards, e.g. Dublin Core (DC),
  – metadata about services: data brokers, data containers.
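The rights metadata (users, their groups and access rights) could be evaluated as in the sketch below. The layout is an assumption for illustration: per-path access lists whose principals are written "user:NAME" or "group:NAME". For brevity it does not inherit rights down the directory tree.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RightsStore:
    """Hypothetical layout of rights metadata: groups plus per-path access lists."""
    groups: dict[str, set[str]] = field(default_factory=dict)          # group -> members
    acl: dict[str, dict[str, set[str]]] = field(default_factory=dict)  # path -> principal -> rights

    def grant(self, path: str, principal: str, *rights: str) -> None:
        self.acl.setdefault(path, {}).setdefault(principal, set()).update(rights)

    def allowed(self, user: str, path: str, right: str) -> bool:
        entries = self.acl.get(path, {})
        # A user acts as itself and as every group it belongs to.
        principals = {f"user:{user}"}
        principals |= {f"group:{g}" for g, members in self.groups.items() if user in members}
        return any(right in entries.get(p, set()) for p in principals)
```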

Data Container
– Enables access to physical data
– Data is arranged in Data Containers and can be stored on various media types
– Data can be organized as files on generic filesystems, BLOBs in databases or files on data tapes
– Each container has a uniform interface regardless of the media types it manages
– A container does not perform file transfers itself; it uses external services and daemons such as FTP, HTTPS, GASS and GridFTP
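A uniform container interface that delegates transfers to external servers might look like this sketch. The method name locate, the URL layout and the HTTPS export of database BLOBs are hypothetical choices. The point is that every container, whatever its medium, answers the same question ("where can this object be fetched, and over which protocol?") and never moves the bytes itself.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class DataContainer(ABC):
    """Uniform interface regardless of the media type the container manages."""

    @abstractmethod
    def locate(self, physical_id: str, protocol: str) -> str:
        """Return a URL from which an external transfer service serves the object."""


class FilesystemContainer(DataContainer):
    def __init__(self, host: str, root: str, protocols: set[str]) -> None:
        self.host = host
        self.root = root.rstrip("/")
        self.protocols = protocols  # protocols served by external daemons on this host

    def locate(self, physical_id: str, protocol: str) -> str:
        if protocol not in self.protocols:
            raise ValueError(f"{protocol} is not served for {self.host}")
        return f"{protocol}://{self.host}{self.root}/{physical_id}"


class DatabaseBlobContainer(DataContainer):
    """BLOBs exported through a (hypothetical) HTTPS download endpoint."""

    def __init__(self, endpoint: str) -> None:
        self.endpoint = endpoint.rstrip("/")

    def locate(self, physical_id: str, protocol: str) -> str:
        if protocol != "https":
            raise ValueError("BLOB storage is only exported over https in this sketch")
        return f"{self.endpoint}/blob/{physical_id}"
```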

Proxy (SRS Container)
– Enables access to external scientific databases
– Combines Repository functionality (listing entries, retrieving attached metadata, building queries) with Data Container functionality (downloading files)
– DMS treats the Proxy as a separate, independent module that manages read-only data
– Within the PROGRESS grid-portal environment, the Proxy (named SRS Container) enables access to SRS resources
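The combination of repository-style browsing, container-style download and read-only behaviour can be sketched with a hypothetical in-memory stand-in for the external databanks. The class name SrsProxy, its methods and the download URL layout are assumptions. Write operations exist only so that they fail explicitly.

```python
from __future__ import annotations

from typing import NoReturn


class ReadOnlyError(PermissionError):
    pass


class SrsProxy:
    """Repository-like listing plus container-like download, for read-only data."""

    def __init__(self, databanks: dict[str, dict[str, str]], download_base: str) -> None:
        # Hypothetical stand-in: databank -> entry id -> entry text.
        self._databanks = databanks
        self._download_base = download_base.rstrip("/")

    # Repository part: browsing entries.
    def list_entries(self, databank: str) -> list[str]:
        return sorted(self._databanks[databank])

    # Container part: where the entry can be downloaded from.
    def locate(self, databank: str, entry_id: str) -> str:
        if entry_id not in self._databanks[databank]:
            raise KeyError(entry_id)
        return f"{self._download_base}/{databank}/{entry_id}"

    # Any modification is rejected: external databanks are read-only for DMS.
    def add_file(self, *args: object, **kwargs: object) -> NoReturn:
        raise ReadOnlyError("SRS resources are read-only")

    remove_file = rename_file = add_file
```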

Administrative Portal
– A web application letting users operate DMS through a web browser
– An intuitive interface for executing a superset of DMS services
– Basic and extended interfaces (depending on user privileges)
– An effective way to explore huge SRS resources
– Online, context-sensitive help

SRS
– Sequence Retrieval System: a platform for integrating biological databases
– Delivers a uniform data querying interface for resource retrieval
– Integration of applications performing computational tasks on data stored in SRS resources
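The uniform querying interface means that one term syntax works for every databank. The sketch below assumes SRS-style terms of the form [databank-field:value] combined with & (and) and | (or); treat that syntax, the field names such as "des" and "org", and the character restrictions as assumptions to check against the SRS documentation for the installed version. Values containing spaces are deliberately rejected here.

```python
from __future__ import annotations

import re

_NAME = re.compile(r"[A-Za-z0-9_]+")      # databank and field names
_VALUE = re.compile(r"[A-Za-z0-9_.*-]+")  # search values, '*' as a wildcard


def term(databank: str, field: str, value: str) -> str:
    """One search condition: the given field of the databank matches value."""
    if not (_NAME.fullmatch(databank) and _NAME.fullmatch(field)):
        raise ValueError("databank and field names must be alphanumeric")
    if not _VALUE.fullmatch(value):
        raise ValueError(f"unsupported characters in value {value!r}")
    return f"[{databank}-{field}:{value}]"


def all_of(*terms: str) -> str:
    return " & ".join(terms)


def any_of(*terms: str) -> str:
    return "(" + " | ".join(terms) + ")"
```

Validating every piece before it is inserted into the query string keeps user input from changing the structure of the query.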

SRS Resources in PSNC
– GenBank: release (about 32 million entries), updates (about 2 million entries)
– EMBL (European Molecular Biology Laboratory): release (about 42 million entries), updates (about 2 million entries)
– PDB (Protein Data Bank)
– Swissprot: Swissprot Release, Swissprot New, SPTREMBL, REMTREMBL

SRS Installation
– The installation uses different storage resources
– The data access interface is delivered via a common portal (srs.man.poznan.pl)
– Administrative tasks (retrieval and data preparation) are split across different machines
– Parallel data retrieval from remote resources
– Offline data indexing and packing on a computational machine (0.5 TB storage)
– Compressed online data (2 × 250 GB storage)

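SRS Installation – Schema
[Figure: installation schema with the SRS portal srs.man.poznan.pl, the hosts viola.man.poznan.pl and bellis-e.man.poznan.pl, two storage areas (storage 01, storage 02), offline indexing (offindex) and online flat files with their index (flatfiles, flatfilesindex)]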

SRS Container
– Shell-based access to SRS: operations to be executed by SRS mechanisms are sent as shell commands
– Access interface based on Web Services: internal functionality is delivered using SOAP communication
– Data access via the ftp, gsiftp and gass protocols: data are accessed through external file servers integrated with the SRS module
– Advanced caching system: databanks and entries are cached and reused in subsequent user requests
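Command-line access combined with a result cache can be sketched as a least-recently-used cache in front of a command runner. Assumptions: the runner invokes the SRS getz tool, assumed to be on PATH, with a -e flag assumed to return whole entries; the capacity and class name are illustrative. The runner is injectable, so the cache can be exercised without an SRS installation.

```python
from __future__ import annotations

import subprocess
from collections import OrderedDict
from typing import Callable


def run_getz(query: str) -> str:
    """Run one SRS query on the command line (assumes 'getz' is on PATH)."""
    # Arguments are passed as a list, so no shell parsing or quoting is involved.
    result = subprocess.run(["getz", "-e", query], capture_output=True, text=True, check=True)
    return result.stdout


class EntryCache:
    """Least-recently-used cache of query results shared by all user requests."""

    def __init__(self, fetch: Callable[[str], str] = run_getz, capacity: int = 1024) -> None:
        self._fetch = fetch
        self._capacity = capacity
        self._items: OrderedDict[str, str] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, query: str) -> str:
        if query in self._items:
            self._items.move_to_end(query)  # mark as most recently used
            self.hits += 1
            return self._items[query]
        self.misses += 1
        value = self._fetch(query)
        self._items[query] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used entry
        return value
```

Passing an argument list rather than a shell string is a deliberate choice in this sketch: a query containing quotes or semicolons cannot be interpreted as extra shell commands.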

Portal Interface – databanks list

Portal Interface – databank content

Portal Interface – searching

Portal Interface – search results

Portal Interface – copying entries

Portal Interface – file properties

DMS Installation Requirements
– Java virtual machine; Java(TM) 2 Runtime Environment, Standard Edition or higher is recommended
– Database server; DMS works with the Oracle and PostgreSQL engines:
  – Oracle: Oracle8i or higher recommended
  – PostgreSQL: version 7.3 or higher is required, with the additional extensions chkpass and tablefunc from the contrib package, and plpgsql support

Conclusions
– SRS resources have been integrated with the distributed file structure of DMS
– A web interface makes exploring SRS resources more efficient:
  – fast copying of interesting entries directly to the user's home directory
  – merging files
  – saving files in different formats (e.g. FASTA)
– The universal access layer to scientific databases may be successfully used to connect other data sources (e.g. digital libraries) to the Data Management System.
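Saving entries in FASTA format and merging them into one file are simple text transformations. The sketch below is illustrative, not the portal's implementation. It assumes the identifier, description and sequence have already been extracted from an SRS entry, and it wraps sequence lines at 60 characters, a common convention rather than a requirement of the format.

```python
from __future__ import annotations


def to_fasta(identifier: str, description: str, sequence: str, width: int = 60) -> str:
    """Format one sequence entry as a FASTA record: '>' header, then wrapped sequence."""
    seq = "".join(sequence.split())  # drop spaces and line breaks inherited from the source entry
    header = f">{identifier} {description}".rstrip()
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header, *lines]) + "\n"


def merge_fasta(records: list[str]) -> str:
    """Merge formatted records into a single multi-FASTA text."""
    return "".join(r if r.endswith("\n") else r + "\n" for r in records)
```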

In Closing
Check for more information about DMS:
Download it now:
Mail the DMS team: