Nondeterministic Queries in a Relational Grid Information Service
Peter A. Dinda, Dong Lu
Prescience Lab, Department of Computer Science, Northwestern University


1 Nondeterministic Queries in a Relational Grid Information Service
Peter A. Dinda, Dong Lu
Prescience Lab, Department of Computer Science, Northwestern University
http://plab.cs.northwestern.edu

2 Overview
RGIS: a Grid Information Service (GIS) based on the relational data model, queried with SQL
Complex compositional queries can be posed
–“Find me 16 hosts on the same LAN that together have 32 GB of RAM”
Such queries can be very expensive to answer
–Joins: worst case O(n^m) for m tables of size n
Introduce nondeterminism
–The user gets a random sample of the result set
–Automated query transformation
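The worst-case bound comes from the fact that a naive nested-loop evaluation of an m-way join visits every combination of rows. The minimal sketch below is an illustration, not RGIS code; the helper name `naive_join_candidates` and the 20-row host list are hypothetical. It counts those combinations for a self-join of one host table.

```python
from itertools import product

def naive_join_candidates(rows, m):
    """Count the m-tuples a naive nested-loop join over m copies of `rows` must examine."""
    return sum(1 for _ in product(rows, repeat=m))

hosts = list(range(20))  # 20 hypothetical host rows
for m in (1, 2, 3):
    # Each extra table in the join multiplies the work by n = len(hosts).
    print(m, naive_join_candidates(hosts, m))
```

The counts grow as 20, 400, 8000, i.e. n^m. The 16-host query above is a 16-way self-join, so its naive bound is n^16.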

3 Outline
Overview
Model
Implementation
Nondeterministic queries
Performance evaluation
Related work
Conclusions
D. Lu and P. Dinda, Synthesizing Realistic Computational Grids, SC 2003
D. Lu, J. Skicewicz, and P. Dinda, Scoped and Approximate Queries in a Relational Grid Information Service, Grid 2003

4 RGIS Model of a Grid
[Figure: object types module, endpoint, maclink, macswitch, iplink, router, host, connectorswitch, connectorlink; layer labels Network, Data link, Physical, Software]
Annotated network topology graph
Annotation examples
–Hosts: memory, disk, OS, NICs, etc.
–Router/Switch: backplane bandwidth, ports
–Link: latency and bandwidth
Highly dynamic data is kept in streams, not in the DB
Virtualization, Futures, Leases
–Virtual machines
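The model treats the Grid as a graph whose nodes and links carry annotations. A minimal in-memory sketch of that idea follows. The class names, the `kind` strings (borrowed from the object types in the figure), and the annotation keys are illustrative assumptions; RGIS itself stores this graph in relational tables.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    uid: int
    kind: str                                        # e.g. "host", "router", "macswitch"
    annotations: dict = field(default_factory=dict)  # e.g. {"mem_mb": 2048, "os": "LINUX"}

@dataclass
class Link:
    a: int
    b: int
    kind: str              # e.g. "iplink" or "maclink"
    latency_ms: float      # link annotations: latency and bandwidth
    bandwidth_mbps: float

@dataclass
class TopologyGraph:
    nodes: dict = field(default_factory=dict)   # uid -> Node
    links: list = field(default_factory=list)

    def add_node(self, node):
        self.nodes[node.uid] = node

    def add_link(self, link):
        self.links.append(link)

    def neighbors(self, uid):
        """Uids of nodes directly connected to `uid` (links are treated as undirected)."""
        return sorted(l.b if l.a == uid else l.a
                      for l in self.links if uid in (l.a, l.b))

g = TopologyGraph()
g.add_node(Node(1, "host", {"mem_mb": 2048, "os": "LINUX"}))
g.add_node(Node(2, "router", {"backplane_gbps": 32, "ports": 24}))
g.add_node(Node(3, "host", {"mem_mb": 1024, "os": "LINUX"}))
g.add_link(Link(1, 2, "iplink", latency_ms=0.2, bandwidth_mbps=1000))
g.add_link(Link(2, 3, "iplink", latency_ms=0.3, bandwidth_mbps=100))
print(g.neighbors(2))
```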


6 [Figure labels: Software, Network, Data Link, Physical, Metadata, Types, Security]


8 RGIS Design (Per Site)

9 RGIS Design (Intersite)
[Figure: RGIS servers at sites A, B, and C, each sending "Update Push To Friend Site"]
Site RGIS server pushes local updates to friend sites
Site RGIS server consolidates updates from its site and its friend sites
Site RGIS server answers all queries originating from its site
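The intersite scheme can be sketched as follows, assuming friend relationships are configured per site and that a site pushes only its own local updates (updates received from a friend are not re-pushed). The `Site` class and its methods are hypothetical; the slide does not specify the push mechanism or how conflicting updates are resolved.

```python
class Site:
    """Toy model of one site's RGIS server (hypothetical; not the RGIS code)."""

    def __init__(self, name):
        self.name = name
        self.friends = []        # sites this server pushes its local updates to
        self.consolidated = {}   # (origin site, resource id) -> record

    def add_friend(self, other):
        self.friends.append(other)

    def local_update(self, rid, record):
        # Record the update locally, then push it to every friend site.
        self.consolidated[(self.name, rid)] = record
        for friend in self.friends:
            friend.receive(self.name, rid, record)

    def receive(self, origin, rid, record):
        # Consolidate an update pushed by a friend; it is not forwarded further.
        self.consolidated[(origin, rid)] = record

    def query(self, predicate):
        # Queries from this site are answered from its consolidated view only.
        return sorted(key for key, rec in self.consolidated.items() if predicate(rec))

a, b, c = Site("A"), Site("B"), Site("C")
a.add_friend(b)
c.add_friend(b)
a.local_update("h1", {"mem_mb": 2048})
c.local_update("h9", {"mem_mb": 4096})
big = lambda rec: rec["mem_mb"] >= 1024
print(b.query(big))  # B holds updates pushed from A and C
print(a.query(big))  # A sees only its own host
```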

10 Insert/Update/Delete
Dual Xeon 1 GHz, 2 GB, 8x36 GB RAID5, Oracle 9i

11
2,700 lines of authored SQL
4,000 lines of generated PL/SQL
22,000 lines of authored Perl
Main dependencies
–DBI to Oracle 9i
–SOAP::Lite
–CGI
Not finished yet!

12 RGIS Design (Per Site)
[Figure callout: "This talk"]


14 Motivation
Queries for compositions of resources are easily expressed in SQL:
“Find 2 hosts with Linux that together have 3 GB of RAM”
select h1.insertid, h2.insertid from hosts h1, hosts h2 where h1.os='LINUX' and h2.os='LINUX' and h1.mem_mb+h2.mem_mb>=3072
But such queries can be very expensive to execute
However, we typically don’t need the entire result set, just some rows, and not always the same ones
And we need them in a bounded amount of time
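To see what the composition query returns, here is a minimal sketch that runs it unchanged on a four-row toy `hosts` table in SQLite. The table contents are made up; only the column names come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table hosts (insertid integer primary key, os text, mem_mb integer)")
conn.executemany(
    "insert into hosts values (?, ?, ?)",
    [(1, "LINUX", 1024), (2, "LINUX", 2048), (3, "LINUX", 512), (4, "WINDOWS", 4096)],
)

# The query from the slide: a self-join of hosts, one copy per host in the composition.
rows = conn.execute("""
    select h1.insertid, h2.insertid
    from hosts h1, hosts h2
    where h1.os='LINUX' and h2.os='LINUX' and h1.mem_mb+h2.mem_mb>=3072
""").fetchall()
print(sorted(rows))
```

As written, the query can pair a host with itself (row (2, 2)) and returns each pair in both orders; an extra predicate such as h1.insertid < h2.insertid would restrict it to distinct, unordered pairs.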

15 Why Not Just Limit?
Oracle rownum, MySQL limit clause: “Return the first k rows of the result set”
Problem: always get the SAME answer
Problem: may STILL take a long time
–Results may not be discovered until near the end
Problem: query time depends on the DATA as well as on k
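The first problem is easy to reproduce. The sketch below uses SQLite's `limit` clause as a stand-in for the Oracle and MySQL mechanisms named on the slide, on a synthetic 100-host table. SQL does not promise a row order without `order by`, but in practice the engine follows the same plan over the same data, so running the same limited query twice returns the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table hosts (insertid integer primary key, os text, mem_mb integer)")
# 100 synthetic hosts with memory sizes cycling through 512 MB .. 4096 MB.
conn.executemany("insert into hosts values (?, 'LINUX', ?)",
                 [(i, 512 * (i % 8 + 1)) for i in range(1, 101)])

query = """
    select h1.insertid, h2.insertid
    from hosts h1, hosts h2
    where h1.mem_mb + h2.mem_mb >= 3072
    limit 5
"""
first = conn.execute(query).fetchall()
second = conn.execute(query).fetchall()
print(first == second)  # same plan, same data: the same five rows again
```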

16 Query Approaches
All results
Scoped results (available in the Grid 2003 paper)
Nondeterministic results (this paper): return a random sample of the result set
Approximate results (available in the Grid 2003 paper)

17 Nondeterministic Version of Query
select nondeterministically h1.insertid, h2.insertid
from hosts h1, hosts h2
where h1.os='LINUX' and h2.os='LINUX' and h1.mem_mb+h2.mem_mb>=3072
within 2 seconds

18 Implementing nondeterministic queries (using Oracle-specific extensions)
User query:
select nondeterministically h1.insertid, h2.insertid from hosts h1, hosts h2 where h1.os='LINUX' and h2.os='LINUX' and h1.mem_mb+h2.mem_mb>=3072 within 2 seconds
Output of the Query Manager and Rewriter:
SELECT H1.INSERTID, H2.INSERTID FROM HOSTS H1 SAMPLE(P), HOSTS H2 SAMPLE(P) WHERE (H1.OS='LINUX' AND H2.OS='LINUX' AND H1.MEM_MB+H2.MEM_MB>=3072)
Random sample of the input tables, with selection probability P determined by the time constraint and server load
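The Oracle path of the Query Manager and Rewriter can be approximated by a small textual rewrite. In this sketch (the helper `rewrite_with_sample` is hypothetical, not the RGIS rewriter), a regular expression splits the nondeterministic query into its select list, tables, condition, and deadline, and each table gets Oracle's SAMPLE clause. Two assumptions: Oracle's SAMPLE argument is a percentage rather than a probability, and Oracle's table-reference grammar places the sample clause before the table alias (hosts SAMPLE(1) h1). The parser handles only the simple select-from-where shape shown on the slide.

```python
import re

PATTERN = re.compile(
    r"\s*select\s+nondeterministically\s+(?P<cols>.*?)\s+from\s+(?P<tables>.*?)"
    r"\s+where\s+(?P<cond>.*?)\s+within\s+(?P<secs>\d+(?:\.\d+)?)\s+seconds\s*$",
    re.IGNORECASE | re.DOTALL,
)

def rewrite_with_sample(query, percent):
    """Return (Oracle SQL with per-table SAMPLE clauses, deadline in seconds)."""
    m = PATTERN.match(query)
    if m is None:
        raise ValueError("not a 'select nondeterministically ... within N seconds' query")
    sampled = []
    for ref in m.group("tables").split(","):
        name, *alias = ref.split()          # "hosts h1" -> "hosts", ["h1"]
        sampled.append(" ".join([name, f"SAMPLE({percent})", *alias]))
    sql = f"SELECT {m.group('cols')} FROM {', '.join(sampled)} WHERE ({m.group('cond')})"
    return sql, float(m.group("secs"))

q = ("select nondeterministically h1.insertid, h2.insertid from hosts h1, hosts h2 "
     "where h1.os='LINUX' and h2.os='LINUX' and h1.mem_mb+h2.mem_mb>=3072 within 2 seconds")
sql, deadline = rewrite_with_sample(q, 1)
print(sql)
print(deadline)
```

Choosing the percentage from the deadline and server load, as the slide describes, is outside this sketch.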

19 Implementing nondeterministic queries (using our schema, not Oracle-specific; used in the rest of the talk)
User query:
select nondeterministically h1.insertid, h2.insertid from hosts h1, hosts h2 where h1.os='LINUX' and h2.os='LINUX' and h1.mem_mb+h2.mem_mb>=3072 within 2 seconds
Output of the Query Manager and Rewriter:
SELECT H1.INSERTID, H2.INSERTID FROM HOSTS H1, HOSTS H2, INSERTIDS TEMP_H1, INSERTIDS TEMP_H2 WHERE (H1.OS='LINUX' AND H2.OS='LINUX' AND H1.MEM_MB+H2.MEM_MB>=3072) AND (H1.INSERTID=TEMP_H1.INSERTID AND TEMP_H1.rand > 982663452.975047 AND TEMP_H1.rand <= … AND H2.INSERTID=TEMP_H2.INSERTID AND TEMP_H2.rand > 1877769069.94039 AND TEMP_H2.rand <= 1920718742.90039)
Random sample of the input tables, with selection probability P determined by the time constraint and server load

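20 Implementing nondeterministic queries
[Figure: table of host insertid and random_number over the range 0 to N; a random starting point x selects the rows whose random numbers fall between x and x+y, where y = P*N. Callout: reshuffling requirement]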
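The portable scheme on slides 19 and 20 can be sketched in SQLite as follows. Assumptions the slides do not fix: the random numbers are uniform floats in [0, N) with N = 2^32 (consistent in magnitude with the constants on slide 19, but not confirmed); the starting point x is drawn from [0, N - y] so that the window (x, x + y] never wraps; and "reshuffling" is read as periodically redrawing every row's random number so that the same rows are not always sampled together. The table layout and the helper names (`create_insertids`, `reshuffle`, `window_predicate`) are hypothetical.

```python
import random
import sqlite3

N = 2**32  # assumed range of the random_number column

def create_insertids(conn, ids, rng):
    """One random number per insertid, stored beside (not inside) the data tables."""
    conn.execute("create table insertids (insertid integer primary key, rand real)")
    conn.executemany("insert into insertids values (?, ?)",
                     [(i, rng.uniform(0, N)) for i in ids])

def reshuffle(conn, rng):
    """Redraw every random number so later windows cover different groups of rows."""
    ids = [row[0] for row in conn.execute("select insertid from insertids")]
    conn.executemany("update insertids set rand = ? where insertid = ?",
                     [(rng.uniform(0, N), i) for i in ids])

def window_predicate(alias, p, rng):
    """Predicate selecting about a fraction p of rows: x < rand <= x + y, with y = p*N."""
    y = p * N
    x = rng.uniform(0, N - y)  # random starting point; window kept inside [0, N]
    return f"{alias}.rand > {x!r} AND {alias}.rand <= {x + y!r}"

rng = random.Random(42)
conn = sqlite3.connect(":memory:")
conn.execute("create table hosts (insertid integer primary key, os text, mem_mb integer)")
conn.executemany("insert into hosts values (?, 'LINUX', ?)",
                 [(i, rng.choice([512, 1024, 2048, 4096])) for i in range(1, 1001)])
create_insertids(conn, range(1, 1001), rng)

p = 0.1  # selection probability; per the slides, RGIS derives it from deadline and load
sql = f"""
    select h1.insertid, h2.insertid
    from hosts h1, hosts h2, insertids temp_h1, insertids temp_h2
    where h1.os='LINUX' and h2.os='LINUX' and h1.mem_mb+h2.mem_mb>=3072
      and h1.insertid=temp_h1.insertid and {window_predicate('temp_h1', p, rng)}
      and h2.insertid=temp_h2.insertid and {window_predicate('temp_h2', p, rng)}
"""
sample = conn.execute(sql).fetchall()
reshuffle(conn, rng)  # before the next nondeterministic query
```

Because each table gets its own independent window, each side of the join keeps roughly a fraction p of its rows, so the join examines roughly p^2 of the original pairs.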

21 Deadlines
Hard-limiting
–A time-limited thread or process is forked
Climbing
–Start with a low probability p and issue the query; if there are no results, double p and try again; keep going until time runs out or results are found
Estimation
–Like climbing, but use polynomial estimation over previous runs to predict whether the next run will exceed the deadline
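Climbing and estimation can be sketched together as one loop. This is a minimal illustration under assumptions the slide does not pin down: `run_query(p)` is a hypothetical callback that runs the rewritten query with selection probability p, the starting probability is 0.001, and the "polynomial estimation" is modeled as a least-squares fit of runtime t ≈ c·p^k (a power law, fitted as a straight line in log-log space) over the previous runs. The clock is injectable so the example can use simulated time. Hard-limiting is not shown.

```python
import math
import time

def fit_power_law(history):
    """Least-squares fit of log t = log c + k log p over (p, runtime) pairs; returns (c, k)."""
    xs = [math.log(p) for p, _ in history]
    ys = [math.log(max(t, 1e-9)) for _, t in history]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    k = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return math.exp(my - k * mx), k

def climb(run_query, deadline_s, p0=0.001, estimate=False, clock=time.monotonic):
    """Double p until rows appear, p reaches 1, or the deadline is (or would be) missed."""
    start = clock()
    history = []  # (p, runtime) of completed runs
    p = p0
    while True:
        t0 = clock()
        rows = run_query(p)
        history.append((p, clock() - t0))
        elapsed = clock() - start
        if rows or p >= 1.0 or elapsed >= deadline_s:
            return rows, p
        next_p = min(1.0, 2 * p)
        if estimate and len(history) >= 2:
            c, k = fit_power_law(history)
            if elapsed + c * next_p ** k > deadline_s:
                return rows, p  # next run predicted to overshoot: stop now
        p = next_p

# Usage with a simulated clock: each query costs 100*p^2 "seconds" and finds nothing.
class FakeClock:
    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

def make_rare_query(clock):
    def run(p):
        clock.now += 100 * p * p  # cost grows quadratically, like a sampled 2-way join
        return []                 # the result is so rare that no sample contains it
    return run

plain = FakeClock()
climb(make_rare_query(plain), deadline_s=1.0, clock=plain)
est = FakeClock()
climb(make_rare_query(est), deadline_s=1.0, estimate=True, clock=est)
print(round(plain.now, 4), round(est.now, 4))
```

With this simulated cost, plain climbing keeps doubling until the budget is already spent (about 2.18 simulated seconds against a 1-second deadline), while the estimating variant predicts that the p = 0.128 run would overshoot and stops after about 0.55 seconds.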


23 GridG: Synthesizing Realistic Computational Grids
http://www.cs.northwestern.edu/~urgis/GridG
Generates a Grid as an annotated layer 3 topology
–Hosts, routers, links
Graph conforms to the power laws of Internet topology
Annotations include:
–memory, clock speed, CPU type, number of CPUs, operating system type, link bandwidths, router bandwidths, etc.
–Memory distribution according to the Smith study of MDS contents

24 Test Grids
Grid size (hosts)   Query
50,000              “Find n hosts with 3 GB of memory”
500,000             “Find n hosts with 3 GB of memory”
5,000,000           “Find n hosts with 3 GB of memory”
10,000              “Find 2 close hosts”
50,000              “Find 2 close hosts”
100,000             “Find 2 close hosts”

25 Nondeterministic query performance
A meaningful tradeoff between query processing time and result set size is possible
Query: select two hosts that together have >3 GB of RAM

26 Nondeterministic query performance
The tradeoff can be used to control query time independently of query complexity
Query: select n hosts that together have >3 GB of RAM, holding query time constant

27 Deadlines
Query: find 2 hosts with a collective 600 GB of RAM (VERY RARE) in a 50K-host grid
[Figure legend: Max, Min]

28 Extending RGIS to Support Grid Computing on Virtual Machines
Virtuals
–Each RGIS object has a unique id
–A virtualization table associates the unique id of each virtual resource with the unique ids of its constituent physical resources
–The virtual nature of a resource is hidden unless the query explicitly requests it
Futures
–A future is an RGIS object that does not exist yet
–Futures table of unique ids
–The future nature of a resource is hidden unless the query explicitly requests it
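A minimal SQLite sketch of the two tables, assuming a simple `hosts` table keyed by the unique id. The table and column names (`virtualization`, `virtual_uid`, `physical_uid`, `futures`) are hypothetical. An ordinary query sees virtual and future hosts as plain hosts; only a query that explicitly joins or filters on the extra tables reveals their nature.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table hosts (uid integer primary key, os text, mem_mb integer);
    create table virtualization (virtual_uid integer, physical_uid integer);
    create table futures (uid integer primary key);

    -- physical hosts 1 and 2; host 3 is a virtual machine built on host 1
    insert into hosts values (1, 'LINUX', 4096), (2, 'LINUX', 2048), (3, 'LINUX', 1024);
    insert into virtualization values (3, 1);
    -- host 4 is a future: a resource that does not exist yet
    insert into hosts values (4, 'LINUX', 8192);
    insert into futures values (4);
""")

# Ordinary query: virtual and future hosts look like any other host.
ordinary = conn.execute("select uid from hosts where mem_mb >= 1024 order by uid").fetchall()

# Explicit request: which physical hosts back each virtual host?
backing = conn.execute("""
    select v.virtual_uid, p.uid, p.mem_mb
    from virtualization v join hosts p on p.uid = v.physical_uid
    order by v.virtual_uid, p.uid
""").fetchall()

# Explicit request: only resources that exist now (exclude futures).
existing = conn.execute("""
    select uid from hosts where uid not in (select uid from futures) order by uid
""").fetchall()
print(ordinary, backing, existing)
```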

29 Related Work
SLP, X.500, LDAP
Condor ClassAds
MDS
R-GMA
Redline
Random sampling from databases
–Olken, others

30 Conclusions
GIS system based on the relational data model
Powerful queries, but expensive to execute
Nondeterminism to control query time
–Can be implemented without RDBMS support
–Automated query translation in RGIS
Several techniques to implement deadlines for queries

31 People and Acknowledgements
Students
–Jason Skicewicz, Andrew Weinrich (Web + SOAP), Jack Lange (CDN)
Collaborator
–Relational Grid Resources Project at Indiana: Beth Plale, http://www.cs.indiana.edu/~plale/projects/RGR
Funder
–NSF

32 For More Information
URGIS site
–http://www.cs.northwestern.edu/~urgis
Prescience Lab
–http://plab.cs.northwestern.edu
Special Advertising Section: Join The User Comfort Study! http://comfort.cs.northwestern.edu
