23 July 2002 Performance Comparison of Grid Information Services Beth Plale Computer Science Dept. Indiana University Unified Relational GIS Project Collaborative project with Peter Dinda, Northwestern University
Schemas in performance evaluation influenced by “Key Concepts and Services of a Grid Information Service”, Beth Plale, Peter Dinda, Gregor von Laszewski, IASTED Parallel and Distributed Computing Systems (PDCS), September 2002
Types of Resource Information
Grid Entity: Description
  – Organizations: Accountable bodies and owners of resources
  – People: Resource admins, resource providers, GIS admins
  – Physical resources: Compute resources, network interfaces, benchmark results, number of users, load
  – Services: Job manager, load leveler, other GISs
  – Comm resources: Link capacity, switch capacity, error rate, drop rate
  – Software packages: BLAS, LAPACK, etc.
  – Event producers: Generators of event streams
  – Event channels: Event stream propagation vehicle
  – Event dictionaries: List of commonly used event types
  – Instruments: Radar systems, telescopes, etc.
  – Network paths: Available bandwidth and expected latency
  – Network topologies: Hosts, switches, routers
  – Wireless devices: Wireless hosts, wavepoints, cells, etc.
  – Virtual organizations: Groups of collaborators
Criteria for Inclusion in GIS
Definition: an object in the repository represents an entity in the real-world grid.
A grid entity has a representation in the GIS repository if it:
  – can be described
  – has value to more than one application
  – has persistency needs beyond a single application run
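The three inclusion criteria can be read as a simple predicate. The sketch below is only an illustration: the `GridEntity` record, its field names, and the reading of the three bullets as a conjunction are assumptions, not something the slides specify.

```python
from dataclasses import dataclass


@dataclass
class GridEntity:
    """Hypothetical record; field names are illustrative, not from the slides."""
    name: str
    describable: bool          # can the entity be described?
    interested_apps: int       # how many applications find it valuable
    outlives_single_run: bool  # persistency needs beyond one application run


def belongs_in_gis(entity: GridEntity) -> bool:
    """Assumed reading: all three criteria must hold at once."""
    return (
        entity.describable
        and entity.interested_apps > 1
        and entity.outlives_single_run
    )


# Made-up examples: a shared compute host and a buffer private to one run.
host = GridEntity("compute host", True, 12, True)
scratch = GridEntity("per-run scratch buffer", True, 1, False)
print(belongs_in_gis(host), belongs_in_gis(scratch))
```

Under this reading, the scratch buffer is left out because only one application cares about it and it does not outlive that application's run.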
Services Provided by GIS
Query interface: request for information through a query language
  – e.g., SELECT … FROM … WHERE in SQL
Update interface: request to add or update information in the repository
  – e.g., UPDATE … in SQL
Management interface: activation and deactivation of the service
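A minimal sketch of the three interfaces, assuming a single-table repository held in an in-memory SQLite database. The class name `TinyGIS`, the `Host` table, and its columns are hypothetical; SQLite stands in for whichever back end a real GIS would use.

```python
import sqlite3


class TinyGIS:
    """Hypothetical GIS facade over one in-memory SQLite table."""

    def __init__(self):
        self.active = False
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE Host (name TEXT PRIMARY KEY, load REAL)")

    # Management interface: activation and deactivation of the service.
    def activate(self):
        self.active = True

    def deactivate(self):
        self.active = False

    def _require_active(self):
        if not self.active:
            raise RuntimeError("service is not active")

    # Update interface: add a record, or overwrite it if the name exists.
    def update(self, name, load):
        self._require_active()
        with self.db:  # commits on success
            self.db.execute(
                "INSERT OR REPLACE INTO Host (name, load) VALUES (?, ?)",
                (name, load),
            )

    # Query interface: a SELECT ... FROM ... WHERE request.
    def query(self, max_load):
        self._require_active()
        rows = self.db.execute(
            "SELECT name FROM Host WHERE load < ? ORDER BY name", (max_load,)
        )
        return [row[0] for row in rows]


gis = TinyGIS()
gis.activate()
gis.update("n1", 0.2)
gis.update("n2", 0.9)
gis.update("n2", 0.4)   # second call updates the existing record
print(gis.query(0.5))
```

Keeping the management state separate from the data path means a deactivated service rejects both queries and updates without touching the repository.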
Additional GIS Functionality
Replication
  – Provision of replica transparency
Distribution (a grid-driven necessity)
  – Partitioning of information across sites
Security interface
  – Object level or column level?
  – Access control
View of GIS Service Interoperability
[Figure: numbered steps 1 to 4 connecting a GCE testbed portal, a GCE testbed XML schema converter, and three back ends: XML db (Xindice), MySQL, and LDAP. Labels on the connections: Xpath query, XML doc, SQL query, LDAP query.]
Benchmark Evaluation of Alternate GIS Representations
Evaluation of three databases: relational (MySQL), LDAP (OpenLDAP), and XML (Xindice)
Database schemas: derived from a single E-R diagram and based partly on GLUE v8
Benchmark: set of query and update use cases derived from Grid job submission
Cost metric: minimized query response times, minimized update times, and minimized size of resulting query set
Benchmark Evaluation Assumptions
Grid entities have complex relationships.
The questions asked of GIS data are becoming more complex.
Some entities require extremely rapid update rates.
Thus a cost metric that considers multiple aspects:
  – Minimized query response times
  – Minimized update times
  – Minimized size of resulting query set
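The three components of the cost metric can each be measured directly. The sketch below records them for one query and one update; it is an illustration only. The slides do not say how the components are weighted or combined, so the function just returns them side by side, and SQLite is a stand-in rather than one of the three evaluated databases. The `measure` helper and the `Host` table are hypothetical.

```python
import sqlite3
import time


def measure(conn, query_sql, update_sql, update_params):
    """Return the three cost components for one query and one update."""
    start = time.perf_counter()
    rows = conn.execute(query_sql).fetchall()
    query_seconds = time.perf_counter() - start

    start = time.perf_counter()
    with conn:  # commit inside the timed region so the write is included
        conn.execute(update_sql, update_params)
    update_seconds = time.perf_counter() - start

    return {
        "query_seconds": query_seconds,
        "update_seconds": update_seconds,
        "result_rows": len(rows),  # size of the resulting query set
    }


# Made-up data: ten hosts with loads 0.0, 0.1, ..., 0.9.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Host (name TEXT, load REAL)")
conn.executemany(
    "INSERT INTO Host VALUES (?, ?)", [(f"n{i}", i / 10) for i in range(10)]
)
conn.commit()

cost = measure(
    conn,
    "SELECT name FROM Host WHERE load < 0.5",
    "UPDATE Host SET load = ? WHERE name = ?",
    (0.1, "n9"),
)
print(cost["result_rows"])
```

A real benchmark run would repeat each measurement many times and report a distribution, since a single timing is dominated by noise.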
Benchmark Evaluation
[Figure: flow diagram of the evaluation process. Labels: start; input schemas (GLUE v8, GCE XML); transform into E-R diagram; represent as schema for relational (MySQL), LDAP (OpenLDAP), XML (Xindice); populate by scripts and existing data; GCE job submission use cases; Grid GIS Benchmark Use Cases; evaluate against; Benchmark Evaluation.]
Data sets examined
Set I: 05-’02, large multi-site project
Set II: 01-’02, large academic HPC site
Set III: 11-’00, DOE site
Object classes / Classes w/ instances / Object instances:
  – 30 / 10 / 242
  – 31 / 19 / 17531
  – 19 / 5 / 106
Top 5 classes: MDSDevice 36.5 %, HostInfo 24.5, MDSDeviceGroup 13.5, top 8.5, MDSSoftware 7.0 (total 90.0 %)
Top 5 classes: GlobusQueue, GlobusServicesJobMgr, GlobusNetworkInterface, GlobusPhysicalResource, GlobusDaemon
Top 5 classes: GlobusFileInstance, GlobusQueueEntry, GlobusQueue, GlobusOrganization, GlobusServiceJobManager
Shares: 80.0 %, 6.5, 3.2, 1.8 (total 94.5 %)
Shares: 42.0 %, 26.0, 17.5, 8.0, 6.0 (total 100.0 %)
E-R Diagram (GLUE v8)
[Figure: entity-relationship diagram. Entities: network cards, end points, hosts (compute nodes), application sources, subclusters, clusters, computing elements, end-to-end connections, network paths, nodes, applications, network nodes, network benchmarks, user accounts, users. Relationships: has, use, run on, is-a, instan from traceroute. Attributes shown: packet loss, latency.roundtripDelay.ping, bandwidth.avail.TCP.singleStream; host, port, protocol.]
Hierarchical representation
[Figure: tree rooted at EDTtop. Node labels: clusters, subclusters, hosts (compute nodes), compute elements, network nodes, endpoints, connections, network path, application sources, user accounts.]
Benchmark: Set of Use Cases of GIS Query and Update
Use cases based on job submission.
  – examples drawn from HotPage (M. Thomas)
Query 1: Suppose a user is part of the NPACI organization and knows his/her binary runs better on a T3E.
  – “Of machines in NPACI organization, give me list of T3Es and their location for which availability is good, a binary is resident, and I have an account.”
-- Schema: GLUEv8
-- Return machines and locations
SELECT SC.CPUModel, C.ClusterName, C.Location
FROM Cluster AS C, SubCluster AS SC, Application AS A, UserAccount AS UA, User AS U
-- Cluster is NPACI and user has binary on machine
WHERE C.Organization = 'NPACI'
  AND SC.OwningCluster = C.ClusterName
  AND SC.CPUModel = 'T3E'
  AND A.OSName = SC.OSName
  AND A.Owner = 'Jane Lee'
  AND A.Location = C.Location
-- Availability is good: average load over all hosts H of the cluster
  AND (SELECT AVG(H.SMPLoad1minX100)
       FROM Host AS H
       WHERE H.OwningCluster = C.ClusterName) < 0.50
-- User has valid account on cluster
  AND C.ClusterUniqueID = UA.ID
  AND UA.ID = U.ID
  AND U.Name = 'Jane Lee'
  AND UA.ExpireDate > DATE '2002-07-21'
  AND UA.ActivateDate <= DATE '2002-07-21'
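To see Query 1 run end to end, the sketch below executes a version of it against a toy in-memory SQLite database. Several things here are assumptions made for the illustration and are not taken from the slides: the schema is simplified, the `User` table is renamed `GridUser`, accounts link to users and clusters through hypothetical `UserID` and `ClusterID` columns, load is stored as a plain fraction in a column named `Load1min`, and all rows (cluster names, sites, OS name) are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Cluster (ClusterUniqueID INTEGER PRIMARY KEY, ClusterName TEXT,
                      Organization TEXT, Location TEXT);
CREATE TABLE SubCluster (OwningCluster TEXT, CPUModel TEXT, OSName TEXT);
CREATE TABLE Host (OwningCluster TEXT, Load1min REAL);
CREATE TABLE Application (Owner TEXT, OSName TEXT, Location TEXT);
CREATE TABLE GridUser (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE UserAccount (UserID INTEGER, ClusterID INTEGER,
                          ActivateDate TEXT, ExpireDate TEXT);

-- Made-up rows: two NPACI T3E clusters, one lightly and one heavily loaded.
INSERT INTO Cluster VALUES (1, 'alpha', 'NPACI', 'site-A'),
                           (2, 'beta',  'NPACI', 'site-B');
INSERT INTO SubCluster VALUES ('alpha', 'T3E', 'os-1'), ('beta', 'T3E', 'os-1');
INSERT INTO Host VALUES ('alpha', 0.20), ('alpha', 0.40),
                        ('beta',  0.90), ('beta',  0.70);
INSERT INTO Application VALUES ('Jane Lee', 'os-1', 'site-A'),
                               ('Jane Lee', 'os-1', 'site-B');
INSERT INTO GridUser VALUES (7, 'Jane Lee');
INSERT INTO UserAccount VALUES (7, 1, '2002-01-01', '2003-01-01'),
                               (7, 2, '2002-01-01', '2003-01-01');
""")

QUERY_1 = """
SELECT SC.CPUModel, C.ClusterName, C.Location
FROM Cluster AS C
JOIN SubCluster  AS SC ON SC.OwningCluster = C.ClusterName
JOIN Application AS A  ON A.OSName = SC.OSName AND A.Location = C.Location
JOIN UserAccount AS UA ON UA.ClusterID = C.ClusterUniqueID
JOIN GridUser    AS U  ON U.ID = UA.UserID
WHERE C.Organization = 'NPACI'
  AND SC.CPUModel = 'T3E'
  AND A.Owner = :user          -- a binary is resident
  AND U.Name = :user           -- the user has an account ...
  AND UA.ActivateDate <= :today
  AND UA.ExpireDate > :today   -- ... that is currently valid
  AND (SELECT AVG(H.Load1min) FROM Host AS H
       WHERE H.OwningCluster = C.ClusterName) < 0.50  -- availability is good
ORDER BY C.ClusterName
"""

# ISO-8601 date strings compare correctly as plain text.
rows = conn.execute(QUERY_1, {"user": "Jane Lee", "today": "2002-07-21"}).fetchall()
print(rows)
```

The availability test sits in a correlated subquery so that the average is taken per cluster; listing `Host` in the outer join instead would produce one result row per host.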
“Of machines in NPACI organization, give me list of T3Es and their location for which availability is good, a binary is resident, and I have an account.”
“Availability is good” could be defined in different ways:
  – Defined here as ‘average load over all nodes in an SMP is less than 0.50’.
  – More difficult is ‘existence of 20 contiguous nodes’.
‘Binary is resident’ is fairly easy; ‘binary is nearby’ is a harder question to answer.
“Show histographic usage of my job or show historical usage of machine X for task Y where Y is job submission or transfer rate to HPSS”
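The two definitions of availability differ in what the GIS has to store: the first needs only per-node load values, while the second needs per-node state plus some notion of adjacency. A minimal sketch of both follows. The function names are hypothetical, and the second function assumes nodes sit in a simple linear order where neighbours are adjacent list positions, which a real machine topology may not satisfy.

```python
def average_load_ok(loads, threshold=0.50):
    """First definition: average load over all nodes is below the threshold."""
    return bool(loads) and sum(loads) / len(loads) < threshold


def has_contiguous_free(free_flags, needed=20):
    """Second definition: some run of `needed` adjacent nodes is free.

    Assumes a linear node ordering (an illustration-only simplification).
    """
    run = 0
    for free in free_flags:
        run = run + 1 if free else 0  # extend the current run or reset it
        if run >= needed:
            return True
    return False


# Made-up node data.
print(average_load_ok([0.2, 0.4, 0.6]))
print(has_contiguous_free([True] * 19 + [False] + [True] * 20))
print(has_contiguous_free([True] * 19 + [False] + [True] * 19))
```

The last two calls have almost the same number of free nodes overall, yet only the first contains a free run of 20, which is why an average alone cannot answer the contiguity question.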