Presenter: Dipesh Gautam.  Introduction  Why Data Grid?  High Level View  Design Considerations  Data Grid Services  Topology  Grids and Cloud.


1 Presenter: Dipesh Gautam

2   Introduction   Why Data Grid?   High Level View   Design Considerations   Data Grid Services   Topology   Grids and Cloud   Convergence of Grid and Cloud   Vertical RDBMS   Benefits of column-oriented layout

3   Data Grid: an architecture or set of services that enables individuals or groups of users to access and transact on large amounts of geographically distributed data.   The data may be replicated throughout the grid, outside the original administrative domain of the data.   The integration between users and the data is handled and controlled by the data grid middleware.

4   Large dataset sizes   Geographic distribution of users and resources   Computationally intensive analysis   No other existing architecture lets us apply these technologies to large-scale application domains

5 [Figure: High Level View; the diagram was not captured in this transcript]

6   Mechanism neutrality ◦ Designed to be as independent as possible of low-level mechanisms ◦ Defines interfaces that encapsulate the peculiarities of specific storage systems   Compatibility with Grid infrastructure ◦ Takes advantage of fundamental Grid infrastructure ◦ Compatible with lower-level Grid mechanisms   Uniformity of information infrastructure ◦ The same data model and interface are used to access the grid's metadata

7   The middleware provides the following services: ◦ Universal namespace ◦ Data transport service ◦ Data access service ◦ Data replication service ◦ Resource management system (RMS)

8   A number of systems and networks are connected within a grid   Separate systems within the grid use different file naming conventions   Physical file names alone do not solve the problem of locating the data   The universal namespace provides logical file names   The Storage Resource Broker provides a service that maps between logical and physical file names   When a logical file name is requested, all matching physical file names are returned and the end user chooses the appropriate replica
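The logical-to-physical mapping described above can be pictured as a small catalog. This is only a minimal sketch: the `ReplicaCatalog` class and the file names and URLs in it are hypothetical, and it is not the Storage Resource Broker's actual interface.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy universal namespace: one logical file name maps to many physical names."""

    def __init__(self):
        self._replicas = defaultdict(list)

    def register(self, logical_name, physical_name):
        # Record one more physical location (replica) for a logical name.
        if physical_name not in self._replicas[logical_name]:
            self._replicas[logical_name].append(physical_name)

    def lookup(self, logical_name):
        # Return every matching physical name; the caller chooses a replica.
        return list(self._replicas.get(logical_name, []))


catalog = ReplicaCatalog()
# Hypothetical names: each site keeps its own naming convention.
catalog.register("lfn://experiment/run42.dat",
                 "gsiftp://site-a.example.org/data/0042.raw")
catalog.register("lfn://experiment/run42.dat",
                 "file://site-b.example.org/archive/run_42")

replicas = catalog.lookup("lfn://experiment/run42.dat")
chosen = replicas[0]  # the end user chooses; here we simply take the first
```

The lookup deliberately returns all replicas rather than picking one, since the slide leaves the choice to the end user.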

9   Middleware service for data transfer   Atomicity of the requested data transfer makes the service fault tolerant ◦ Data transfer is resumed after each interruption until all requested data is received ◦ Many possible strategies:   Restarting the entire transmission from the beginning   Resuming from the point of interruption, e.g. GridFTP sends data from the last acknowledged byte instead of restarting the entire transfer   Provides a service for low-level access and connections between hosts for file transfer   Provides I/O functions that let users see remote files as if they were local to their system   Provides a high-level abstraction of data access and transfer between different systems, hiding the complexity and presenting the data to the user as a unified data source
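The "resume from the last acknowledged byte" strategy can be sketched as follows. This is an illustration of the idea only, not GridFTP's protocol or API; `transfer_with_resume`, `FlakyReceiver`, the chunk size and the retry limit are all assumptions made for the example.

```python
def transfer_with_resume(source, send_chunk, chunk_size=4, max_failures=10):
    """Deliver all of `source`, resuming from the last acknowledged byte."""
    acked = 0      # number of bytes the receiver has acknowledged so far
    failures = 0
    while acked < len(source):
        chunk = source[acked:acked + chunk_size]
        try:
            send_chunk(acked, chunk)
        except ConnectionError:
            failures += 1
            if failures >= max_failures:
                raise
            continue  # retry from `acked`, not from byte 0
        acked += len(chunk)  # the chunk was acknowledged
    return acked


class FlakyReceiver:
    """Hypothetical receiver that drops the connection on chosen calls."""

    def __init__(self, fail_on_calls):
        self.buffer = bytearray()
        self.calls = 0
        self._fail_on_calls = set(fail_on_calls)

    def receive(self, offset, chunk):
        self.calls += 1
        if self.calls in self._fail_on_calls:
            raise ConnectionError("simulated interruption")
        if offset != len(self.buffer):
            raise ValueError("chunk does not continue the received data")
        self.buffer.extend(chunk)


payload = b"geographically distributed data"
receiver = FlakyReceiver(fail_on_calls={2, 5})
delivered = transfer_with_resume(payload, receiver.receive)
```

Only the interrupted chunks are sent again, which is the difference from the other strategy on the slide, restarting the whole transmission.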

10   The data access service works with the data transport service to provide security, access control and management of data transfers within the grid   Provides a security service to authenticate users   Provides an authorization service to control access, ranging from simple file permissions to Access Control Lists (ACLs) and Role-Based Access Control (RBAC)   Provides an encryption service to protect the confidentiality of data in transit (e.g. SSL)
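A minimal sketch of how an authorization check might combine per-file ACLs with role-based access control. The users, roles, resource name and the rule that an ACL entry or a role grant is enough on its own are assumptions for illustration, not a description of any particular grid middleware.

```python
# Hypothetical policy data.
ROLE_PERMISSIONS = {"analyst": {"read"}, "curator": {"read", "write"}}
USER_ROLES = {"alice": {"curator"}, "bob": {"analyst"}}
ACLS = {"lfn://experiment/run42.dat": {"carol": {"read"}}}  # per-file ACL entries


def is_authorized(user, resource, action):
    """Allow the action if the file's ACL or one of the user's roles grants it."""
    # ACL: permissions attached directly to this resource for this user.
    acl = ACLS.get(resource, {})
    if action in acl.get(user, set()):
        return True
    # RBAC: permissions the user inherits through assigned roles.
    return any(action in ROLE_PERMISSIONS.get(role, set())
               for role in USER_ROLES.get(user, set()))


can_alice_write = is_authorized("alice", "lfn://experiment/run42.dat", "write")
can_bob_write = is_authorized("bob", "lfn://experiment/run42.dat", "write")
```

Authentication (establishing who the user is) and encryption of the transfer are separate services on the slide and are not modelled here.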

11   Why replication? ◦ Scalability ◦ Fast access ◦ User collaboration   Replicas are often placed close to the sites where users need them   Replication is controlled by a replica management system   The replica management system determines the need for replicas based on requests   Replicas are kept up to date by propagating changes made at one node to all the other nodes in the grid

12   Centralized model: a single master replica updates all others   Decentralized model: all peers update each other   The topology of node placement influences the update strategy
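The two update models can be contrasted in a short sketch. The `Replica` class, the ring of peers and the flooding scheme used for the decentralized case are assumptions chosen for illustration; real systems use more careful propagation and conflict handling.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.peers = []  # used only by the decentralized model

    def apply(self, key, value):
        self.data[key] = value


def centralized_update(master, others, key, value):
    """A single master applies the change and pushes it to every other replica."""
    master.apply(key, value)
    for replica in others:
        replica.apply(key, value)


def decentralized_update(origin, key, value):
    """Peers forward the change to their peers; `seen` stops endless loops."""
    seen = set()
    pending = [origin]
    while pending:
        node = pending.pop()
        if node.name in seen:
            continue
        seen.add(node.name)
        node.apply(key, value)
        pending.extend(node.peers)
    return seen


# Centralized: m is the master of a and b.
m, a, b = Replica("m"), Replica("a"), Replica("b")
centralized_update(m, [a, b], "run42", "v2")

# Decentralized: x, y, z form a ring of peers; the update starts at y.
x, y, z = Replica("x"), Replica("y"), Replica("z")
x.peers, y.peers, z.peers = [y], [z], [x]
reached = decentralized_update(y, "run42", "v2")
```

In the decentralized case the set of nodes reached depends on how peers are linked, which is one way the topology influences the update strategy.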

13   Static replication ◦ Uses a fixed replica set of nodes, with no dynamic changes to the files being replicated   Dynamic replication ◦ Based on the popularity of data ◦ If the number of requests exceeds the replication threshold, a replica is placed on the server that directly services the client, provided that storage is available ◦ Replicas that have a null access value are deleted dynamically   Adaptive replication ◦ The dynamic threshold is computed from request arrival rates from clients over a period of time ◦ Replicas with a lower threshold that were not created in the current replication interval can be removed   Fair-share replication ◦ Based on the access load and storage load of candidate servers ◦ The server with the lower access load is selected, because replicating to a server with a higher access load degrades performance for all clients ◦ Among candidate servers with the same access load, the server with the lower storage load is selected   Many more replica placement strategies exist
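Two of the rules above lend themselves to a small sketch: the request threshold that triggers a new replica, and fair-share server selection. The `Server` fields and load numbers are hypothetical, and the averaging formula in `adaptive_threshold` is an assumption, since the slide does not say how the dynamic threshold is computed.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    access_load: int   # e.g. requests currently being served
    storage_load: int  # e.g. percentage of storage in use


def adaptive_threshold(arrivals_per_interval):
    # ASSUMPTION: the mean arrival count stands in for the dynamic threshold.
    return sum(arrivals_per_interval) / len(arrivals_per_interval)


def needs_replica(request_count, threshold):
    # A new replica is triggered once requests exceed the threshold.
    return request_count > threshold


def pick_fair_share_server(candidates):
    # Lowest access load wins; ties are broken by lowest storage load.
    return min(candidates, key=lambda s: (s.access_load, s.storage_load))


threshold = adaptive_threshold([10, 20, 30])
servers = [Server("s1", 10, 50), Server("s2", 4, 80), Server("s3", 4, 30)]
target = pick_fair_share_server(servers) if needs_replica(25, threshold) else None
```

Sorting on the pair `(access_load, storage_load)` expresses the tie-break from the slide directly: storage load only matters when access loads are equal.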

14   The resource management system (RMS) is the core functionality of the data grid   Manages all actions related to storage resources   Fulfils user and application requests for data resources based on the type of request and policies   Schedules the creation of replicas   Enforces policy and security within the data grid resources, including authentication, authorization and access, so that systems with different administrative policies can inter-operate   Enforces system fault tolerance and stability requirements

15   Various topologies have been used to address the needs of the scientific community   Four major types of topology ◦ Federation topology ◦ Monadic topology ◦ Hierarchical topology ◦ Hybrid topology

16   Federation topology: allows each institution to keep control over its data   An institution that receives a request from an authorized institution decides whether to send the data to the requesting institution   The federation can be loosely or tightly integrated   Preferred by institutions that wish to share data from already existing systems

17   Monadic topology: all collected data is fed into a central repository   The central repository responds to all queries for data   There are no replicas in this topology   Well suited when all access to the data is local or within a single region with high-speed connectivity

18   Hierarchical topology: suited to collaborations in which data from a single source is distributed to multiple locations around the world

19   Hybrid topology: any combination of the other topologies   Suited to researchers working on projects who want to share their results to further research by making them readily available for collaboration

20   Grid ◦ Grid refers to distributed computing in science and engineering ◦ In grid computing, virtual organizations share computer resources over a network ◦ Scientific research, collaboration ◦ Shares local resources ◦ Heterogeneous, real resources ◦ Geographically distributed, locally owned and managed   Cloud ◦ Cloud refers to a computer network in the context of network management ◦ In cloud computing, anybody can access data and compute services over the internet ◦ Web services, business apps ◦ Makes huge data centers available ◦ Homogeneous, virtualized resources ◦ Geographically distributed, centrally owned and managed

21   Users should consider interoperability standards among the service providers of both grid and cloud   Interoperating clouds look like a grid

22   Column-oriented DBMS ◦ Stores data column-wise instead of row-wise ◦ In a row-oriented DBMS, the values in each row are serialized and stored in memory as: 1, Smith, Joe, 40000; 2, Jones, Mary, 50000; 3, Johnson, Cathy, 44000; ◦ In a column-oriented DBMS, the columns are serialized as: 1, 2, 3; Smith, Jones, Johnson; Joe, Mary, Cathy; 40000, 50000, 44000; ◦ Example table:
EmpId | Lastname | Firstname | Salary
1 | Smith | Joe | 40000
2 | Jones | Mary | 50000
3 | Johnson | Cathy | 44000
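The two serializations of the employee table can be reproduced with a few lines of Python. This is a sketch of the layout idea only; real database engines use binary storage formats, and the function names here are made up for the example.

```python
# The example table from the slide: (EmpId, Lastname, Firstname, Salary).
rows = [
    (1, "Smith", "Joe", 40000),
    (2, "Jones", "Mary", 50000),
    (3, "Johnson", "Cathy", 44000),
]


def serialize_row_oriented(table):
    # All values of one row are kept together, row after row.
    return " ".join(", ".join(str(v) for v in row) + ";" for row in table)


def serialize_column_oriented(table):
    # zip(*table) transposes the rows, so all values of one column sit together.
    return " ".join(", ".join(str(v) for v in col) + ";" for col in zip(*table))


row_layout = serialize_row_oriented(rows)
column_layout = serialize_column_oriented(rows)
print(row_layout)     # 1, Smith, Joe, 40000; 2, Jones, Mary, 50000; 3, Johnson, Cathy, 44000;
print(column_layout)  # 1, 2, 3; Smith, Jones, Johnson; Joe, Mary, Cathy; 40000, 50000, 44000;
```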

23   Efficient when an aggregate needs to be computed over many rows but only over a notably smaller subset of columns   Efficient at writing a column when new values of that column are supplied for all rows at once   Suited to Online Analytical Processing (OLAP)-like workloads, which involve a smaller number of highly complex queries over all the data, possibly terabytes in size
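Why aggregates favour the column layout can be seen by counting how many stored values each layout has to touch to total one column. The employee data and the "values read" counter are a simplified model assumed for illustration, not a measurement of any real DBMS.

```python
# Row layout: a list of complete rows (EmpId, Lastname, Firstname, Salary).
row_store = [
    (1, "Smith", "Joe", 40000),
    (2, "Jones", "Mary", 50000),
    (3, "Johnson", "Cathy", 44000),
]

# Column layout: one list per column.
column_store = {
    "EmpId": [1, 2, 3],
    "Lastname": ["Smith", "Jones", "Johnson"],
    "Firstname": ["Joe", "Mary", "Cathy"],
    "Salary": [40000, 50000, 44000],
}


def total_salary_rows(rows):
    total, values_read = 0, 0
    for row in rows:
        values_read += len(row)  # the whole row is fetched to reach one field
        total += row[3]
    return total, values_read


def total_salary_columns(store):
    salaries = store["Salary"]   # only the needed column is fetched
    return sum(salaries), len(salaries)


row_total, row_reads = total_salary_rows(row_store)
col_total, col_reads = total_salary_columns(column_store)

# Writing a whole column at once is a single replacement in the column layout.
column_store["Salary"] = [42000, 52000, 46000]
```

Both layouts give the same total, but in this model the row layout touches 12 values and the column layout only 3.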


