Participation of the Lecce Computer Science Group in the EU-US GRID Project: Earth Observation Systems, High Energy Physics, ASI, ESA



Cross-cutting activity, INFN Lecce Section: High Energy Physics

ATLAS, ALICE, CMS, VIRGO. Cookbook. Requirements. GRID middleware development.

SARA/Digital Puglia: A grid-enabled remote sensing digital library
Giovanni Aloisio, Massimo Cafaro (Univ. of Lecce, Italy); Roy Williams (CACR/Caltech); Carl Kesselman (ISI/USC)

An NPACI International Collaboration Advancing Digital Library Technology
NPACI, Digital Puglia, ASI

Five Emerging Models of Networked Parallelism (from The Grid)
- Distributed Computing: synchronous processing
- High-Throughput Computing: asynchronous processing
- On-Demand Computing: dynamic resources
- Data-Intensive Computing: databases
- Collaborative Computing: scientists

EU/US Workshop on Large Scientific Databases
Annapolis, Maryland, 8-11 Sept
Organized by CACR-Caltech and CERN
Supported in part by the National Science Foundation (Grant IIS ) and the European Commission (EU Information Society Technology Programme)
Organizing committee
- US: Paul Messina (DOE/CACR-Caltech), Roy Williams, Maria Zemankova
- EU: Giovanni Aloisio (Univ. Lecce), John Darlington (IPC-UK), Fabrizio Gagliardi (CERN)

GRID ISSUES
- Scalability
- Information modeling
- Interoperability
- Information flow
- Preservation of databases
- Education and outreach

Database Scalability
Scalability issues must be considered with respect to:
- the quantity of bulk data in the database
- the geographical separation of the DB components
- the size of the user community
- the defining limits of applicability of the DB
- the duration of the DB project
- the complexity and heterogeneity of the DBs to be federated

Database Size
Research on:
- Hierarchical storage systems
- Distributed storage systems
- Parallel data delivery
- Interoperability of “big data” systems

For data spread around the system, research on:
- Clustering: which data objects should be stored “near” similar objects?
- Caching: which data objects should be on fast storage?
- Redundancy: which datasets should be stored redundantly in different organizational patterns?
- Indexing: how can efficient ways to search scientific data be created?
- Summarization: when should summary data be computed on demand, and when should it be pre-computed?

Networking
- A crucial requirement for effective EU-US GRID collaboration is trans-Atlantic data communication that provides:
  - high bandwidth
  - high availability
  - low latency
- Regional data centers communicate with each other differently from the way they communicate with users
The most important metric is throughput
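One way to see why throughput, rather than raw bandwidth, is the metric that matters is a simple latency-plus-bandwidth model. The sketch below is an illustrative assumption, not part of the slides: the function name, the parameters, and the model itself (a fixed number of round trips followed by a transfer at line rate) are hypothetical.

```python
def effective_throughput(size_bytes, bandwidth_bps, rtt_s, round_trips=1):
    """Bytes per second actually delivered under a simple model.

    Assumed model: the transfer pays `round_trips` round-trip delays
    (setup, acknowledgements) and then moves the payload at line rate.
    """
    transfer_s = round_trips * rtt_s + size_bytes * 8 / bandwidth_bps
    return size_bytes / transfer_s


# Same link, two payload sizes: latency dominates the small transfer.
large = effective_throughput(1_000_000, 8_000_000, rtt_s=0.5, round_trips=2)
small = effective_throughput(1_000, 8_000_000, rtt_s=0.5, round_trips=2)
```

Under this model the small transfer achieves only a fraction of the throughput of the large one on the same link, which is one reason bulk exchange between regional centers and interactive user access may need to be treated differently.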

Data Streaming
- Scheduled streaming as a new paradigm for the analysis of large amounts of data
- Data architectures oriented to data movement rather than data storage
- Shifting from file-oriented to stream-oriented processing
- Constructing new kinds of data management components
- Alternative structures for data
- New roles for metadata

Distributed Databases
The data movement generated by queries to the globally distributed database must be optimized:
- How can queries and processing requests be formulated to streamline this optimization process?
- How can such a query be split into separate, locally executed queries with machine-specific data access?
- How can the cost, in terms of computation, communication, and time, be estimated before and during execution?
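The questions above can be made concrete with a small sketch of splitting a global query into per-site sub-queries and estimating their cost before execution. Everything here is a hypothetical illustration: the `Site` fields, the linear cost model, and the assumption that sub-queries run in parallel are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """A hypothetical regional data center holding part of the database."""
    name: str
    rows: int           # rows the local sub-query must scan
    scan_rate: float    # rows scanned per second at this site
    selectivity: float  # fraction of scanned rows returned
    row_bytes: int      # size of one result row in bytes
    bandwidth: float    # bytes per second from this site to the client


def local_cost(site):
    """Estimate computation and communication cost of one local sub-query."""
    compute_s = site.rows / site.scan_rate
    result_bytes = site.rows * site.selectivity * site.row_bytes
    transfer_s = result_bytes / site.bandwidth
    return {"site": site.name, "compute_s": compute_s,
            "bytes": result_bytes, "transfer_s": transfer_s}


def plan(sites):
    """Assume sub-queries run in parallel: the slowest site sets elapsed time."""
    costs = [local_cost(s) for s in sites]
    return {
        "per_site": costs,
        "total_bytes": sum(c["bytes"] for c in costs),
        "elapsed_s": max(c["compute_s"] + c["transfer_s"] for c in costs),
    }


sites = [
    Site("lecce", rows=1000, scan_rate=100.0, selectivity=0.5,
         row_bytes=100, bandwidth=50000.0),
    Site("caltech", rows=2000, scan_rate=200.0, selectivity=0.25,
         row_bytes=100, bandwidth=2500.0),
]
estimate = plan(sites)
```

In this example both sites need the same compute time, but the second one dominates elapsed time because of its slower link, which is the kind of imbalance a cost estimate is meant to expose before the query runs.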

Distributed Databases
- Load balancing: how are computational work and data spread around the GRID?
- Replication: what should be replicated among the regional centers?
- Protocols for:
  - high-speed
  - parallel I/O
  - synchronous and asynchronous delivery
  - real-time steering and control of running jobs

Information modeling
- What is the nature of the contents of the database and its catalog?
- How can interoperability among DBs be achieved?
- Standardization of scientific data objects

Database Interoperability
How can information from multiple collections be fused to extract new knowledge?
A common infrastructure providing interoperability between European and US scientific databases:
- common interfaces
- common information model
- semantic interoperability

Database Interoperability
- Federation of collections
  - wrappers in front of existing collections transform the information content into a standard representation
  - wrappers or servers installed in front of the storage systems support access through a common API
  - wrappers tend to be limited to the manipulation of relatively small data sets
  - wrappers provide an interoperability capability
Large-scale data manipulation requires the tight integration of data and compute resources
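The wrapper idea can be sketched as follows. This is a minimal illustration under stated assumptions: the class names, the two native formats, and the shared record layout (a dict with `id` and `title`) are all hypothetical and do not describe any actual system mentioned in the slides.

```python
from abc import ABC, abstractmethod


class CollectionWrapper(ABC):
    """Hypothetical common API placed in front of an existing collection."""

    @abstractmethod
    def records(self):
        """Yield records in the shared form: dicts with 'id' and 'title'."""


class TupleRowCollection(CollectionWrapper):
    """Wraps a collection whose native rows are (id, title) tuples."""

    def __init__(self, rows):
        self._rows = rows

    def records(self):
        for ident, title in self._rows:
            yield {"id": str(ident), "title": title}


class KeyValueCollection(CollectionWrapper):
    """Wraps a collection whose native form is a {key: {'name': ...}} mapping."""

    def __init__(self, mapping):
        self._mapping = mapping

    def records(self):
        for key, value in self._mapping.items():
            yield {"id": str(key), "title": value["name"]}


def federated_search(wrappers, keyword):
    """Query every wrapped collection through the same API and merge the hits."""
    return [rec for w in wrappers for rec in w.records()
            if keyword.lower() in rec["title"].lower()]


eu = TupleRowCollection([(1, "SAR image of Puglia"), (2, "Rainfall map")])
us = KeyValueCollection({"a7": {"name": "Puglia coastline DEM"}})
hits = federated_search([eu, us], "puglia")
```

Note that `federated_search` pulls every record through the wrapper before filtering. That is acceptable for small collections but illustrates why wrappers alone tend not to scale to large data sets, where computation has to move closer to the data.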

Security and Authentication
- Log in once to access multiple, heterogeneous services
- Clear and unambiguous access and control policies

Information flow
- How does information move in a complex system?
- How do users discover the database and its capabilities?
- How do users initiate and control a complex processing pipeline?

Preservation of databases
- How can we ensure that digital scientific data is still available, when necessary, many years in the future?
- Preservation description information should be associated with digital objects so that:
  - the chain of custody and processing history is available
  - the quality of the data is specified
  - relationships to other digital objects are recognized
  - digital objects are unambiguously identified
  - the information content is not altered in an undocumented manner
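The five properties listed above map naturally onto a small record type. The sketch below is only one possible reading: the class and field names are hypothetical, and the use of a SHA-256 digest to detect undocumented changes is an assumption introduced here, not something the slides specify.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PreservedObject:
    """Hypothetical digital object carrying its preservation description."""
    identifier: str                               # unambiguous identification
    content: bytes
    quality: str                                  # stated quality of the data
    related: list = field(default_factory=list)   # identifiers of related objects
    history: list = field(default_factory=list)   # chain of custody and processing
    digest: str = ""                              # fixity value for the content

    def __post_init__(self):
        if not self.digest:
            self.digest = hashlib.sha256(self.content).hexdigest()

    def update(self, custodian, action, new_content):
        """Change the content and document who did what in the history."""
        self.content = new_content
        self.digest = hashlib.sha256(new_content).hexdigest()
        self.history.append((custodian, action))

    def is_intact(self):
        """False if the content changed without going through update()."""
        return hashlib.sha256(self.content).hexdigest() == self.digest


obj = PreservedObject("scene-0001", b"raw samples", quality="level-0")
obj.update("lecce", "calibrated", b"calibrated samples")
documented_ok = obj.is_intact()

obj.content = b"silently edited"   # bypasses update(): an undocumented change
undocumented_ok = obj.is_intact()
```

A documented change keeps the digest and the history in step, while a change that bypasses `update` is detected the next time the digest is checked.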

EGrid - The European Grid Forum
Redondo Beach, August 1999

...and many more
Now it is time to put things together