www.Grid.org.il: Distributed Data Management for Compute Grid. Presented by Michael Di Stefano. Meeting: Tuesday, September 13th, 2005.

Presentation transcript:

Distributed Data Management for Compute Grid Presented by Michael Di Stefano Founder of Author of Meeting: Tuesday, September 13th, 2005

Slide Agenda
• Data Management - The Next Grid Problem
• Evolution in Compute Topology
• Objectives of Data Management
• New Topology – New Data Management Techniques
• New Techniques, New Research, Emergence of Standards

Slide Two Components of The Grid
Compute GRID
• The Grid Operating System - provides the core services for grid computing
  – Physical Resource Accounting
  – Process Task Queues
  – Management of Task/Resource Execution
Data GRID
• Data Management System of Grid - Manages all aspects
  – Enterprise Data
  – Data Scheduling
  – Replication
  – Availability
  – Legacy Access
(Diagram labels: Compute Grid, Data Grid)

Slide Compute Grids
• Roll your own Compute Grid
• Free Versions of Compute Grids
• Product and Supported Compute Grids

Slide Data Grids
Data Grid Engine - Movement of Bits and Bytes
• FTP
• Sockets
• Middleware (messaging)
• Caches
Applications Perspective
• Multiple Data Characteristics
• Quality of Service
• Data Management not Bit/Byte Movement

Slide Evolution in Computing
(Diagram labels: Mainframe, Mini, Client/Server)

Slide Years of Distributed Computing Evolution
(Diagram labels: Sockets, CORBA, Messaging, Internet, Application Servers, Tight Bindings, Loose Coupling, Publish / Subscribe, Client/Server)
Grid Topology Emerging from the “Evolutionary Mist”
© Integrasoft, L.L.C. 2005

Slide Evolution Distributed Data Management for Grid Computing Copyright John Wiley and Sons 2005

Slide The Grid Topology
Client / Server
• Operational: Operating System
  – Physical CPU
  – Peripherals
  – Execution Threads
• Physical: Close Proximity (Mother Board)
Compute Grid
• Operational: Operating System
  – Physical Nodes
  – Resource/Node Management
  – Inventory of Work/Tasks
  – Resource Inventory
  – Matching of Task to Resource
• Physical: Diverse CPU Families, Diverse Geography, Diverse Network Bandwidth

Slide Application on the Grid
Multiple Data Sources and Destinations
• Client Information
• Portfolio Information
• Market Data
Quality of Service Levels
• Application in its entirety
• Application components
• Speed of Access
• Query
• Updates (Transactional, Optimistic)

Slide How QoS is Delivered Today
Relational Databases
• SQL Query
• Transactional Updates
• Stored Procedures
Middleware Queuing
• Various delivery modes
• Publish and Subscribe
• Easy Programmatic API
Other
• Object Databases
• Object Relational
Data flow and movement are optimized and designed to meet Application QoS for the Client/Server Topology.

Slide Application Today in Client/Server
(Diagram labels: Threads, RAM, Connection Pools, Tailored Middleware, Business Application, Server Machine)

Slide What Happens in a Grid
(Diagram labels: Business Application, Server Machine, Compute Grid)

Slide The Data Access Funnel Distributed Data Management for Grid Computing Copyright John Wiley and Sons 2005

Slide Data Grid Eliminates the Funnel Distributed Data Management for Grid Computing Copyright John Wiley and Sons 2005

Slide Goals of Data Management in Grid
The Big 3 Goals of Data Management in Grid
• Optimize Data Affinity
  – Minimize Data Movement
  – Optimize the resources of the Network
• Maintain Business Application QoS for Data Management
• Integrate Legacy Systems into the Grid

Slide How to Achieve the Goals of the Data Grid
What the Architect/Developer must Address
• How many copies or “Replicas” of data are needed in the Data Grid?
• How fine is the granularity of my “Data Atoms” to be replicated?
• How best to “Distribute” Data Atoms across the Data Grid?
• What level of “Synchronization” is required?
• How to “logically group” data along business lines?
• How to “Integrate” and “Operate” legacy data sources?
• How to manage “Events” in the Data Grid?
• How to synchronize data sources external to the Data Grid?
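
One way to read the questions on this slide is as the fields of a per-region policy record. The sketch below is only an illustration: the class name `RegionPolicy`, its fields, the `Sync` levels, and the sample values are hypothetical and are not taken from the presentation or from any data grid product.

```python
from dataclasses import dataclass
from enum import Enum


class Sync(Enum):
    """How tightly replicas are kept in step (hypothetical levels)."""
    TRANSACTIONAL = "transactional"
    OPTIMISTIC = "optimistic"
    NONE = "none"


@dataclass(frozen=True)
class RegionPolicy:
    """One set of answers to the architect's questions, per data region."""
    region: str              # logical grouping along business lines
    replicas: int            # how many copies of each data atom
    atom_bytes: int          # granularity of a data atom
    distribution: str        # how atoms are spread, e.g. "round_robin"
    sync: Sync               # synchronization level inside the region
    legacy_source: str = ""  # external system to load from / store to, if any

    def storage_bytes(self, atom_count: int) -> int:
        """Total bytes held in the grid for this region, replicas included."""
        return atom_count * self.atom_bytes * self.replicas


# Sample values are made up for illustration.
market_data = RegionPolicy(
    region="market_data", replicas=3, atom_bytes=512,
    distribution="round_robin", sync=Sync.OPTIMISTIC,
)
print(market_data.storage_bytes(atom_count=10_000))  # 15360000
```

Writing the policy down per region makes the cost of each answer visible: tripling the replica count triples the storage the region consumes.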

Slide Data Management in Grid
• Granularity of Data Atoms
• Replication
• Distribution
• Logical Data Groupings (Data Regions)
• Synchronization
  – InterRegion
  – IntraRegion
  – External Data Sources
• Events
• Integration with Legacy Systems
Nothing to do with mechanics of the bits and bytes. These are Data Management Issues.

Slide Data Management is NOT Caching
Moves the bits and bytes
  – Cache
  – Grid FTP
  – Others
Data Management to deliver Business Application’s QoS given the “compute topology”
Distributed Data Management for Grid Computing Copyright John Wiley and Sons 2005

Slide Engines of a Data Grid
Cache
• Java based engines such as JCache, Java Spaces, …
• Various C++ Caches
• Recycled Object Data Base Technology
FTP
• Grid FTP
Meta Data Services
File Systems
• NFS Distributed File Systems

Slide Right Tool for the Job
Business Applications have specific QoS levels from the Data Grid
• Complex Analysis of Large Data Sets
• Dependency of small fast moving data sets
• Large Static Data Sets
• …….

Slide Business Drivers Fueling Grid

Slide Limited Patience of Business

Slide No Data Management Tools
• Difficult Custom Code
• Long Time to Delivery
• No Reuse
Business Perspective: Increased Complexity, Improved Performance, Financial ROI
Grid fails to gain widespread acceptance

Slide Business Perspective
Financial ROI With Data Management for Grid
(Diagram labels: Easy to use/understand, Reuse, Effort on business, Increased Complexity, Improved Performance, Fast Time to Market, Ease of Migration to Grid, Changes Data Centers)

Slide Data Management in Grid
• Granularity of Data Atoms
• Replication
• Distribution
• Data Regions
• Synchronization
• Integration with Legacy Systems
If Distributed Data Management is not addressed, wide acceptance of Grid will fail.

Slide Measuring QoS to Determine Data Grid
Application QoS( Work(), Data(), Time(), Geography(), Query() )
Where:
  Work( batch/atomic, sync/async )
  Data( overall size, atomic size, transient, query )
  Time( RealTime, Non-RealTime, Near-RealTime )
  Geography( Topology, Bandwidth )
  Query( Basic, Complex )
Distributed Data Management for Grid Computing Copyright John Wiley and Sons 2005
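
The QoS expression on this slide can be written down as a small typed record, which makes each dimension explicit. This is a sketch under assumptions: the slide gives only the dimension names, so the Python field names, the units (bytes, Mbit/s), the example topology string, and the `atom_count` helper are all invented for illustration.

```python
from dataclasses import dataclass

TIME_CLASSES = ("RealTime", "Near-RealTime", "Non-RealTime")
QUERY_CLASSES = ("Basic", "Complex")


@dataclass(frozen=True)
class Work:
    batch: bool        # True = batch, False = atomic
    synchronous: bool  # True = sync, False = async


@dataclass(frozen=True)
class Data:
    overall_size_bytes: int
    atomic_size_bytes: int
    transient: bool
    queried: bool


@dataclass(frozen=True)
class Geography:
    topology: str          # free-form label, e.g. "multi_site" (assumed)
    bandwidth_mbps: float  # assumed unit: megabits per second


@dataclass(frozen=True)
class ApplicationQoS:
    work: Work
    data: Data
    time: str   # one of TIME_CLASSES
    geography: Geography
    query: str  # one of QUERY_CLASSES

    def __post_init__(self) -> None:
        # Reject values outside the classes listed on the slide.
        if self.time not in TIME_CLASSES:
            raise ValueError(f"unknown time class: {self.time}")
        if self.query not in QUERY_CLASSES:
            raise ValueError(f"unknown query class: {self.query}")

    def atom_count(self) -> int:
        """Number of data atoms needed to hold the data set (rounded up)."""
        return -(-self.data.overall_size_bytes // self.data.atomic_size_bytes)


qos = ApplicationQoS(
    work=Work(batch=True, synchronous=False),
    data=Data(overall_size_bytes=1_000_000, atomic_size_bytes=4096,
              transient=False, queried=True),
    time="Near-RealTime",
    geography=Geography(topology="multi_site", bandwidth_mbps=100.0),
    query="Basic",
)
print(qos.atom_count())  # 245
```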

Slide Objective of Data Grid - Data Affinity
• Low cost of CPU
• Data size is determined by application
• Network bandwidth is limited
• Data and Work need to be co-located
• Virtual Centrally Managed Data Base, Physically Distributed
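
A back-of-the-envelope calculation shows why limited bandwidth pushes data and work together. The function name and the sample figures (a 2 GB data set, a 100 Mbit/s link, 50 nodes) are assumptions chosen for illustration, and the model ignores latency, protocol overhead, and contention.

```python
def transfer_seconds(size_bytes: int, bandwidth_mbps: float) -> float:
    """Ideal time to move size_bytes over a link of bandwidth_mbps (megabits/s)."""
    bits = size_bytes * 8
    return bits / (bandwidth_mbps * 1_000_000)


data_set_bytes = 2 * 10**9  # 2 GB, an assumed figure
one_copy = transfer_seconds(data_set_bytes, 100)
print(one_copy)  # 160.0

# If 50 nodes each pull the full data set one after another over the same
# shared link, the network alone costs 50 times as much.
print(50 * one_copy)  # 8000.0
```

Adding CPUs does nothing to reduce this figure, so the time is recovered only by placing the work where the data already is.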

Slide How to Achieve Data Affinity
Locate data and work close together to minimize data movement across the network
• Reactive: Data Grid distributes data in anticipation of where work will be assigned. Distributed Data Management policies of
  – Regionalization
  – Replication
  – Distribution
  – Synchronization
• Proactive: Routing of Task to Data. Compute Grid Task Scheduler queries Data Locality Information from Data Grid
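
The routing of a task to its data, where the scheduler asks the data grid for locality information, might look like the sketch below. It assumes the locality information arrives as a plain mapping from node name to the data atoms that node holds. The function `route_task`, the node names, and the atom sizes are hypothetical and do not represent any particular scheduler's API.

```python
from typing import Dict, Set


def route_task(needed_atoms: Set[str],
               locality: Dict[str, Set[str]],
               atom_bytes: Dict[str, int]) -> str:
    """Pick the node that must fetch the fewest bytes to run the task."""
    def bytes_to_move(node: str) -> int:
        missing = needed_atoms - locality[node]
        return sum(atom_bytes[atom] for atom in missing)

    # Iterating nodes in sorted order makes ties resolve to the
    # alphabetically first node, so the result is deterministic.
    return min(sorted(locality), key=bytes_to_move)


# Locality information as the data grid might report it (made-up values).
locality = {
    "node_a": {"client", "portfolio"},
    "node_b": {"market"},
    "node_c": {"portfolio", "market"},
}
atom_bytes = {"client": 1_000, "portfolio": 50_000, "market": 200_000}

print(route_task({"portfolio", "market"}, locality, atom_bytes))  # node_c
```

Here `node_c` already holds both atoms, so nothing crosses the network. A real scheduler would also weigh node load and availability, which this sketch leaves out.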

Slide Distributed Data Management
• Data Regions
• Replication
• Distribution
• Synchronization
• Load and Store
• Event

Slide Distributed Data Management Policies Distributed Data Management for Grid Computing Copyright John Wiley and Sons 2005

Slide Advanced Topics in Distributed Data Management
White Paper: “Natural Attraction Forces of Data Bodies Within a Data Grid To Describe Efficient Data Distribution Patterns”, Michael Di Stefano, September 2004
Distributed Data Management for Grid Computing Copyright John Wiley and Sons 2005

Slide Purchasing Information Please Visit To Purchase your copy of “Distributed Data Management for Grid Computing” To receive a 15% discount.