
1 Public Distributed Computing with BOINC
David P. Anderson, Space Sciences Laboratory, University of California, Berkeley
davea@ssl.berkeley.edu

2 Public-resource computing
[Timeline figure, 1995-2004: GIMPS and distributed.net; SETI@home and folding@home; fight*@home; climateprediction.net]
● Names for this paradigm: public-resource computing; peer-to-peer computing (no!); public distributed computing; "@home" computing
[Diagram: "your computers" spanning academic, business, and home PCs]

3 The potential of public computing
● SETI@home: 500,000 CPUs, 65 TeraFLOPs
● 1 billion Internet-connected PCs in 2010, 50% privately owned
● If 100M participate: ~100 PetaFLOPs and ~1 Exabyte (10^18 bytes) of storage, i.e. roughly 1 GigaFLOPs and 10 GB contributed per PC
[Chart: public computing, Grid computing, cluster computing, and supercomputing compared on CPU power and storage capacity versus cost]

4 Public/Grid differences

5 Economics (0th order)
[Diagram: with cluster/Grid computing you pay for both the resources ($$) and the Internet ($$); with public-resource computing the resources and the network are free]
● $1 buys 1 computer-day or 20 GB of data transfer on the commercial Internet
● Suppose processing 1 GB of data takes X computer-days
● Cost of processing 1 GB: cluster/Grid: $X; PRC: $1/20
● So PRC is cheaper if X > 1/20 (SETI@home: X = 1,000); the sketch below works this out
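
This 0th-order arithmetic is simple enough to check directly. A minimal sketch, using the slide's constants; the function names are illustrative, not part of BOINC:

```cpp
#include <cstdio>

// Slide's 0th-order constants: $1 buys one computer-day,
// or 20 GB of transfer on the commercial Internet.
const double DOLLARS_PER_COMPUTER_DAY = 1.0;
const double GB_PER_DOLLAR = 20.0;

// Cluster/Grid cost to process 1 GB: pay for X computer-days.
double cluster_cost_per_gb(double x_days) {
    return x_days * DOLLARS_PER_COMPUTER_DAY;
}

// PRC cost to process 1 GB: computing is free, pay only to ship the data.
double prc_cost_per_gb() {
    return 1.0 / GB_PER_DOLLAR;  // $0.05
}

int main() {
    double x = 1000;  // SETI@home: ~1,000 computer-days per GB
    printf("cluster/Grid: $%.2f per GB, PRC: $%.2f per GB\n",
           cluster_cost_per_gb(x), prc_cost_per_gb());
    // PRC wins whenever X > 1/20 computer-days per GB.
    return 0;
}
```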

6 Economics revisited
[Diagram: you reach other institutions over an underutilized free Internet (e.g. Internet2), and participants over the commodity Internet]
● Participants have bursty, underutilized flat-rate ISP connections
● Traffic shapers can send at zero priority
● ==> bandwidth may be free also

7 Why isn't PRC more widely used?
● Lack of platform
– JXTA, Jabber: not a solution
– Java: apps are in C, FORTRAN
– commercial platforms: business issues
– Cosm, XtremWeb: not complete
● Need to make PRC technology easy for scientists to use

8 BOINC: Berkeley Open Infrastructure for Network Computing
● Goals for computing projects:
– easy/cheap to create and operate
– wide range of applications possible
– no central authority
● Goals for participants:
– easy to participate in multiple projects
– invisible use of disk, CPU, network
● NSF-funded; open source; in beta test: http://boinc.berkeley.edu

9 SETI@home requirements
[Two diagrams, current vs. ideal. Current: data arrives on tapes at Berkeley, which serves participants directly over the commercial Internet. Ideal: Berkeley feeds Stanford and USC over Internet2, and they serve participants over the commercial Internet. A 50 Mbps link is labeled.]
● 0.3 MB of data = 8 hrs of CPU work

10 Climateprediction.net
● Global climate study (Oxford Univ.)
● Input: ~10 MB executable, 1 MB data
● CPU time: 2-3 months (can't migrate)
● Output per workunit:
– 10 MB summary (always uploaded)
– 1 GB detail file (archived on client, may be uploaded)
● Chaotic model: results are not directly comparable

11 Einstein@home (planned)
● Gravity wave detection; LIGO; UW/Caltech
● 30,000 data sets of 40 MB each
● Each data set is analyzed with 40,000 different parameter sets; each takes ~6 hrs of CPU
● Data distribution: replicated 2 TB servers
● Scheduling problem is more complex than "bag of tasks" (see the sketch below)
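
A hedged illustration of why this is harder than a bag of tasks: with 30,000 large data sets, the scheduler should prefer parameter sets whose 40 MB data set a host already holds, to avoid re-downloading. All names here are illustrative; this is not Einstein@home's actual scheduler:

```cpp
#include <set>
#include <string>
#include <vector>

// A task is one (data set, parameter set) pair.
struct ParamTask {
    std::string data_set;  // name of the 40 MB data file
    int param_set;         // which of the 40,000 parameter sets
};

// Pick a pending task for a host, preferring data already resident there.
// Returns an index into `pending`, or -1 if nothing is pending.
int pick_task(const std::vector<ParamTask>& pending,
              const std::set<std::string>& files_on_host) {
    for (size_t i = 0; i < pending.size(); i++) {
        if (files_on_host.count(pending[i].data_set))
            return (int)i;  // reuse the data set the host already has
    }
    return pending.empty() ? -1 : 0;  // otherwise any task (new download)
}
```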

12 Intel/UCB Network Study (planned)
● Goal: map and measure the Internet
● Each workunit lasts for 1 day but is active only briefly (pings, UDP)
● Need to control the time of day when the app is active
● Need to turn off other apps
● Need to measure system load indices (network/CPU/VM)

13 General structure of BOINC
● Project: scheduling server (C++); BOINC DB (MySQL); data servers (HTTP); web interfaces (PHP); project back end: work generation, retry generation, result validation, result processing, garbage collection
● Participant: core client (C++); application

14 Project web site features
● Download core client
● Create account
● Edit preferences
– General: disk usage, work limits, buffering
– Project-specific: allocation, graphics
– Venues (home/school/work)
● Profiles
● Teams
● Message boards, adaptive FAQs

15 General preferences

16 Project-specific preferences

17 Data architecture
● Files: immutable, replicated; may originate on client or project; may remain resident on client
● Executables are digitally signed
● Upload certificates: prevent DOS
● Example descriptor from the slide (sketched in code below): file arecibo_3392474_jun_23_01, mirrored at http://ds.ssl.berkeley.edu/a3392474 and http://dt.ssl.berkeley.edu/a3392474, signature uwi7eyufiw8e972h8f9w7, max size 10000000 bytes
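
One way to picture the example descriptor above in code. This is a hedged sketch with illustrative field names, not BOINC's actual schema:

```cpp
#include <string>
#include <vector>

// Illustrative file descriptor matching the slide's example;
// not BOINC's actual data structures.
struct FileInfo {
    std::string name;               // e.g. "arecibo_3392474_jun_23_01"
    std::vector<std::string> urls;  // replicas on multiple data servers
    std::string signature;          // digital signature / upload certificate
    long max_nbytes;                // size bound, e.g. 10000000 (DOS protection)
    bool sticky;                    // may remain resident on the client
};
```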

18 Computation abstractions
● Applications
● Platforms
● Application versions – may involve many files
● Work units: the inputs to a computation – soft deadline; CPU/disk/memory estimates
● Results: the outputs of a computation (sketched below)
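
A hedged sketch of how these abstractions might look as records. All type and field names are illustrative, not BOINC's actual types:

```cpp
#include <string>
#include <vector>

// One application built for one platform; may involve many files.
struct AppVersion {
    std::string app_name;
    std::string platform;            // e.g. "windows_intelx86"
    int version_num;
    std::vector<std::string> files;
};

// A work unit: the inputs to a computation.
struct Workunit {
    std::string name;
    std::vector<std::string> input_files;
    double fpops_estimate;    // estimated floating-point operations
    double disk_bound;        // disk usage estimate (bytes)
    double memory_bound;      // memory usage estimate (bytes)
    double delay_bound;       // soft deadline (seconds after dispatch)
};

// A result: the outputs of one computation of a work unit.
struct Result {
    std::string workunit_name;
    std::vector<std::string> output_files;
};
```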

19 Scheduling: pull model
[Diagram: the core client sends the scheduling server a request for X seconds of work together with a host description; the server replies with result 1 ... result n; the client then downloads inputs from the data server, computes, and uploads outputs]
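
In the pull model the client drives every step. A minimal sketch of the client side of this exchange; every name here is a placeholder, not BOINC's real RPC interface:

```cpp
#include <string>
#include <vector>

struct Task {
    std::string name;
    std::vector<std::string> input_urls;
};

// Placeholder declarations for the three servers involved:
std::vector<Task> request_work(double seconds_wanted);  // scheduler RPC (sends host description)
void download_inputs(const Task&);                      // from the data server
void compute(const Task&);                              // run the application
void upload_outputs(const Task&);                       // back to the data server

void client_loop() {
    for (;;) {
        // Ask the scheduling server for enough work to stay busy.
        std::vector<Task> tasks = request_work(/*seconds_wanted=*/86400);
        for (const Task& t : tasks) {
            download_inputs(t);
            compute(t);
            upload_outputs(t);
        }
    }
}
```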

20 Redundant computing
[Diagram: the work generator creates workunits; the scheduler dispatches replicas to multiple clients; the validator compares returned results, selects a canonical result, and assigns credit; the assimilator processes the canonical result; the replicator issues retries as needed]
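
The heart of redundant computing is the validator's quorum check. A sketch assuming a strict-equality comparison; real projects (e.g. climateprediction.net, whose chaotic outputs are not directly comparable) need fuzzier, application-specific comparisons. Not BOINC's actual code:

```cpp
#include <string>
#include <vector>

// One replica's output for a given workunit.
struct ReplicaResult {
    int id;
    std::string output;  // stand-in for the real output files
};

// Return the index of the canonical result, or -1 if no quorum agrees.
int select_canonical(const std::vector<ReplicaResult>& results, int quorum) {
    for (size_t i = 0; i < results.size(); i++) {
        int agree = 0;
        for (size_t j = 0; j < results.size(); j++) {
            if (results[j].output == results[i].output) agree++;
        }
        // Matching replicas (and only those) would then be granted credit.
        if (agree >= quorum) return (int)i;
    }
    return -1;  // no quorum yet; the replicator issues another copy
}
```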

21 BOINC core client
[Diagram: the core client manages file transfers (restartable, concurrent, user-limited) and program execution (semi-sandboxed); it communicates with running apps through the app API over shared memory, exchanging graphics control, checkpoint control, % done, and CPU time]
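
The "app API" boxes correspond to calls an application makes into the core client's runtime. A sketch of a typical instrumented main loop, assuming the function names of the modern BOINC API (boinc_init, boinc_fraction_done, boinc_time_to_checkpoint, boinc_checkpoint_completed, boinc_finish), which may differ from the beta-era interface described in this talk:

```cpp
#include "boinc_api.h"  // BOINC application API (assumed header name)

// Hypothetical application-side helpers, not part of the BOINC API:
int  restore_checkpoint();     // loop index to resume from (0 if fresh start)
void write_checkpoint(int i);  // save enough state to resume at step i
void do_one_step(int i);       // one unit of the app's real work

int main() {
    boinc_init();  // attach to the core client (shared memory)
    const int n = 1000000;
    for (int i = restore_checkpoint(); i < n; i++) {
        do_one_step(i);
        boinc_fraction_done((double)i / n);  // feeds the "% done" display
        if (boinc_time_to_checkpoint()) {    // core client requests a checkpoint
            write_checkpoint(i);
            boinc_checkpoint_completed();
        }
    }
    boinc_finish(0);  // report success and exit
}
```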

22 User interface
[Diagram: the screensaver and control panel talk to the core client via control/state RPCs; the core client activates the screensaver, which displays the app's graphics]

23

24 Anonymous platform mechanism
● User compiles applications from source and registers them with the core client
● The client reports its platform as "anonymous" to the scheduler
● Purposes:
– obscure platforms
– security-conscious participants
– performance tuning of applications

25 Project management tools
● Python scripts for project creation/start/stop
● Remote debugging: collect and store crash info (stack traces); web-based browsing interface
● Strip charts: record and graph system performance metrics
● Watchdogs: detect system failures; dial a pager

26 Conclusion
● Public-resource computing is a paradigm distinct from Grid computing
● PRC has tremendous potential for many applications (computing and storage)
● BOINC: enabling technology for PRC: http://boinc.berkeley.edu

