Research Issues in Cooperative Computing Douglas Thain


1 Research Issues in Cooperative Computing Douglas Thain http://www.cse.nd.edu/~ccl

2 Sharing is Hard!
Despite decades of research in distributed systems and operating systems, sharing computing resources is still very difficult.
Problems get worse as scale increases:
–Office
–Server Room
–Distributed System
–Computational Grid

3 Designers Go To Extremes: Peer to Peer Central Control Cooperative Computing

4 How Do We Share Data?
–Central Storage Archive (NFS, UDC, StorageTank)
–P2P File Sharing (WWW, Napster)

5 Things I Can’t Do Today
Let members of my project team store and retrieve documents from this disk in my office.
–(Where my boss defines “project team”.)
I must have 1 TB of space for one whole week, but it must be stored by someone I know.
–(Where I give a list of trusted people.)
Allow a visitor in my office to use my machine.
–(But I want her workspace isolated from mine.)
This bioinformatics repository can be written by my grad students, read by all ND faculty, and read by anyone approved by the NSF.
–(Where each list comes from a different source.)
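The last item on this slide (one repository, three lists, three different sources) can be sketched in a few lines. This is only an illustration under stated assumptions: the group names, user names, and the `make_checker` helper are hypothetical, and the hard-coded lambdas stand in for whatever authority actually maintains each list.

```python
from typing import Callable, Dict, Set

# A membership source returns the current members of one group.
# In practice each would query a different authority; here they are
# hard-coded stand-ins (hypothetical names and users).
GroupSource = Callable[[], Set[str]]


def make_checker(sources: Dict[str, GroupSource],
                 rules: Dict[str, Set[str]]) -> Callable[[str, str], bool]:
    """Build a check(user, right) function from per-group rights."""
    def check(user: str, right: str) -> bool:
        # Grant the right if any group that holds it lists the user.
        return any(
            right in rights and user in sources[group]()
            for group, rights in rules.items()
        )
    return check


sources: Dict[str, GroupSource] = {
    "my_grad_students": lambda: {"alice", "bob"},  # list kept by the owner
    "nd_faculty": lambda: {"carol"},               # list kept by the university
    "nsf_approved": lambda: {"dave"},              # list kept by the NSF
}
# Rights exactly as stated on the slide; "write" is not assumed to imply "read".
rules: Dict[str, Set[str]] = {
    "my_grad_students": {"write"},
    "nd_faculty": {"read"},
    "nsf_approved": {"read"},
}

check = make_checker(sources, rules)
print(check("alice", "write"), check("carol", "write"), check("dave", "read"))
```

Because each source is called at check time, a change to any one list takes effect without the owner editing the rules.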

6 What is Cooperative Computing?
CC means putting owners in charge.
–I control who uses my resources.
–Need tools for expressing trust.
CC means respect for social structures.
–Trust is rarely symmetric.
–Hierarchy and centralization can be important.
–Motivation is usually external to the system.
CC means ease of use.
–Resource owners need simple and effective tools.
–Resource users need to be insulated from failures.

7 Every User Should be a Super-User
[Diagram labels, repeated for each of three users: Consumption, Allocation, Accounting, Quality of Service, Security, Debugging. Super-User: Allocation, Accounting, Quality of Service, Security, Debugging.]

8 Vision of Cooperative Storage
Make it easy to deploy systems that:
–Allow sharing of storage space.
–Respect existing human structures.
–Provide reasonable space/perf promises.
–Work easily and transparently without root.
–Make the non-ideal properties manageable:
  –Limited allocation. (select, renew, migrate)
  –Unreliable networks. (useful fallback modes)
  –Changing configuration. (auto. discovery/config)

9 [Diagram labels: basic filesystem; storage server; access control server; Resource Policy; storage catalog; status updates. Messages: “Where can I find 100 GB for 24 hours?”; “Make reservation and access data”; “Is this a member of the CSE dept?”; “Members of the CSE dept can borrow 200 GB for one week.”; “Evict user!”; “Who is here?”]
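The exchange on slide 9 pairs a request (“100 GB for 24 hours”) with an owner's policy (“Members of the CSE dept can borrow 200 GB for one week”). A minimal sketch of that check follows. The `ResourcePolicy` class, the `may_reserve` function, and the membership data are all hypothetical names chosen for illustration, not part of any system described in the talk.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ResourcePolicy:
    group: str       # who the policy applies to
    max_gb: int      # largest allocation allowed
    max_hours: int   # longest reservation allowed


def may_reserve(policy: ResourcePolicy,
                is_member: Callable[[str, str], bool],
                user: str, gb: int, hours: int) -> bool:
    """Return True if the request fits the owner's policy."""
    # The membership question is delegated, as in
    # "Is this a member of the CSE dept?"
    if not is_member(user, policy.group):
        return False
    return 0 < gb <= policy.max_gb and 0 < hours <= policy.max_hours


# "Members of the CSE dept can borrow 200 GB for one week."
policy = ResourcePolicy(group="cse", max_gb=200, max_hours=7 * 24)

# Stand-in for the access control server (hypothetical membership data).
members = {"cse": {"alice"}}


def is_member(user: str, group: str) -> bool:
    return user in members.get(group, set())


# "Where can I find 100 GB for 24 hours?"
print(may_reserve(policy, is_member, "alice", 100, 24))
```

Keeping the membership test behind a callable mirrors the slide's separation between the storage server, which owns the policy, and the access control server, which answers who belongs to a group.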

10 Cooperative Storage Pool
[Diagram labels: disk; storage server (×6); dist. file system; backup system; dist. computation]

11 Cooperative Computing is useful in the office… but it is badly needed on the Grid!

12 On the Grid
[Diagram labels: job; Work Queue; gate keeper (×3); PBS batch system; Condor Batch System; Maui Scheduler; CPU (×3)]

13 Grid Computing Experience
Ian Foster, et al. (102 authors), “The Grid2003 Production Grid: Principles and Practice,” IEEE HPDC 2004.
The Grid2003 Project has deployed a multi-virtual organization, application-driven grid laboratory that has sustained for several months the production-level services required by… ATLAS, CMS, SDSS, LIGO…

14 Grid Computing Experience
The good news:
–27 sites with 2800 CPUs.
–40985 CPU-days provided over 6 months.
–10 applications with 1300 simultaneous jobs.
The bad news:
–40-70 percent utilization.
–30 percent of jobs would fail.
–90 percent of failures were local problems.
The lessons:
–Most site failures were due to disk space.
–Debugging most problems was impossible.

15 Coop Computing and the Grid
The Grid is a boundary case of CC.
–Large scale, high performance.
–Allocate resources to partially trusted visitors.
–Everyone wants to exhaust resources.
Can CC scale from the office to the grid?
–If it is easy for one person to deploy in an office… then it will be usable enough to work on the grid.

16 More Cooperative Computing
Nested Principals & Authentication
–Simple question: How to allow a visitor?
Distributed Access Control
–Can we find something more usable than PKI?
Storage Abstractions
–Can we do better than files/directories?
Data-Intensive Grid Computing
–How do I use storage and CPU together?
Distributed Debugging
–Consider it a distributed query problem.

17 Cooperative Computing Credo: Make computer structures model social structures... Not the other way around!

18 For more information…
The Cooperative Computing Lab
http://www.cse.nd.edu/~ccl
Prof. Douglas Thain
dthain@cse.nd.edu

20 Two Related Problems
Users don’t have direct control.
–I need 50 GB of storage for one week.
–Allow my collaborators to use my space.
(Usually considered administrative tasks.)
Users don’t have direct information.
–Why was I denied this allocation?
–What series of steps was used to run my job?
(Usually considered implementation details.)

21 The Current Situation
[Diagram labels: storage server (×5); libchirp (open, close, read, write); chirp tool with libchirp (GET, PUT); parrot with libchirp (% cp, % emacs, % vi); catalog server; status updates; simple ACL; hostname; kerberos; GSI; filesystem]

22 Distributed Debugging
[Diagram labels: debugger; storage server (×3); batch system; kerberos; license manager; auth gateway; cpu; archival host; log file (×8); workload manager; job]

23 Distributed Debugging
Big challenges!
–Language issues: storing and combining logs.
–Ordering: How to reassemble events?
–Completeness: Gaps, losses, detail.
–Systems: Distributed data collection.
But, could be a big win:
–“A crashes whenever X gets its creds from Y.”
–“Please try again: I have turned up the detail on host B.”
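To make the "combining logs" and "ordering" challenges concrete, here is a small sketch that merges per-host logs into a single timeline and runs a trivial query over it. It deliberately assumes the hard part away: timestamps are taken to be comparable across hosts, which the slide itself flags as an open problem. The `Event` type, the function names, and the log contents are hypothetical.

```python
import heapq
from typing import Iterable, List, NamedTuple


class Event(NamedTuple):
    timestamp: float  # seconds; ASSUMED comparable across hosts
    host: str
    message: str


def merge_logs(*logs: Iterable[Event]) -> List[Event]:
    """Merge per-host logs, each already sorted by time, into one timeline."""
    return list(heapq.merge(*logs, key=lambda e: e.timestamp))


def events_matching(timeline: Iterable[Event], keyword: str) -> List[Event]:
    """A trivial 'query': keep events whose message mentions the keyword."""
    return [e for e in timeline if keyword in e.message]


# Hypothetical log contents for two hosts.
log_a = [Event(1.0, "A", "job started"), Event(4.0, "A", "crash")]
log_b = [Event(2.0, "B", "creds issued to A"), Event(3.0, "B", "creds expired")]

timeline = merge_logs(log_a, log_b)
for event in timeline:
    print(event.timestamp, event.host, event.message)
```

With real clocks that drift, the merge step would need a different ordering rule (for example, one based on message causality), and gaps in any one log would leave holes in the timeline; both are the challenges the slide lists.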

24 Grid Computing
–The Vision: Make large-scale computing resources as reliable and as simple as the electric power grid or the water utility.
–The Reality: Tie together existing computing clusters and archival storage around the country into systems that are (almost) usable by experts.

25 Storage Allocation
–Give me 50 GB for 24 hours
–Technical Problem: Building Allocation
Distributed Debugging
–Correlation
–Hypothesis Proposal
–Reasoning
–System Building
–Adaptation

26 [Diagram labels and captions, in slide order]
disk: “If I can back up to you, you can back up to me.”
CPU: “CSE grads can compute here, but only when I’m not.”
CPU: “I need ten more CPUs in order to finish my paper by Friday!”
CPU: “May I use your CPUs?”
auth server: “Is this person a CSE grad?”
secure I/O: “My friends in Italy need to access this data.”
disk: “I’m not root!” “PBs of workstation storage!” “Can I use this as a cache?”

27 Cooperative Computing Credo
Put users in charge of their resources.
–Share resources as they see fit.
–Expose information for debugging.
Mode of operation:
–Make tools that are foolproof enough for casual use by one or two people in the office.
–If they really are foolproof, then they will also be suitable for deployment in large scale systems such as computational grids.

