
1 Volunteer Computing
David P. Anderson, Space Sciences Lab, U.C. Berkeley
May 7, 2008

2 Where's the computing power?
- Individuals (~1 billion PCs)
- Companies (~100M PCs)
- Government (~50M PCs)
Volunteer computing taps the largest pool: individuals.

3 A brief history of volunteer computing (timeline, 1995-2005)
Projects (roughly in order): distributed.net, GIMPS; SETI@home, Folding@home; Climateprediction.net; Einstein@home, Rosetta@home; IBM World Community Grid
Platforms (roughly in order): Popular Power; Entropia; United Devices, Parabon; BOINC

4 The BOINC project
- Based at the UC Berkeley Space Sciences Lab
- Funded by NSF since 2002
- Personnel: director David Anderson; other employees: 1.5 programmers; lots of volunteers
- What we do: develop open-source software; enable online communities
- What we don't do: branding, hosting, authorizing, endorsing, controlling

5 The BOINC community
- Projects
- Volunteer programmers
- Alpha testers
- Online Skype-based help
- Translators (web, client)
- Documentation (Wiki)
- Teams

6 The BOINC model
Your PC attaches to one or more BOINC-based projects, e.g.:
- Climateprediction.net (Oxford; climate study)
- Rosetta@home (U. of Washington; biology)
- MalariaControl.net (STI; malaria epidemiology)
- World Community Grid (IBM; several applications)
Attachments are simple (but configurable), secure, invisible, and independent: there is no central authority, and a project's unique ID is its URL.

7 The volunteer computing ecosystem
Projects and the public exchange value:
- do more science
- involve the public in science
- teach and motivate volunteers

8 Participation and computing power
BOINC:
- 330K active participants
- 580K computers
- ~40 projects
- 1.2 PetaFLOPS average throughput (about 3X an IBM Blue Gene L)
Folding@home (non-BOINC):
- 200K active participants
- 1.4 PetaFLOPS (mostly PS3)

9 Cost per TeraFLOPS-year
- Cluster: $124K
- Amazon EC2: $1.75M
- BOINC: $2K

10 The road to ExaFLOPS
- CPUs in PCs (desktop, laptop): 1 ExaFLOPS = 50M PCs x 80 GFLOPS x 0.25 availability
- GPUs: 1 ExaFLOPS = 4M x 1 TFLOPS x 0.25 availability
- Video-game consoles (PS3, Xbox): 0.25 ExaFLOPS = 10M x 100 GFLOPS x 0.25 availability
- Mobile devices (cell phone, PDA, iPod, Kindle): 0.05 ExaFLOPS = 1B x 100 MFLOPS x 0.5 availability
- Home media (cable box, Blu-ray player): 0.1 ExaFLOPS = 100M x 1 GFLOPS x 1.0 availability

11 But it's not about numbers
The real goals:
- enable new computational science
- change the way resources are allocated
- avoid a return to the Dark Ages
And that means we must:
- make volunteer computing feasible for all scientists
- involve the entire public, not just the geeks
- solve the "project discovery" problem
Progress towards these goals: nonzero but small

12 BOINC server software
Goals:
- high performance (10M jobs/day)
- scalability
Architecture (from the diagram): clients contact a scheduler (CGI); a feeder stages jobs from the MySQL DB (~1M jobs) into shared memory (~1K jobs); various daemons run alongside.
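A minimal sketch of the feeder idea in C++, with hypothetical names (Job, JobCache, fetch_unsent_jobs); the real feeder uses a shared-memory segment, but the shape is the same: keep a small window of ready jobs filled from the DB so scheduler processes never issue their own queries.

    // Hypothetical sketch of the feeder pattern (not the actual BOINC code).
    #include <chrono>
    #include <mutex>
    #include <string>
    #include <thread>
    #include <vector>

    struct Job { long id; std::string xml; };   // simplified job record

    struct JobCache {                           // stands in for shared memory
        std::mutex m;
        std::vector<Job> slots;                 // ~1K jobs, vs ~1M in the DB
    };

    // Stub for "SELECT ... WHERE state = UNSENT LIMIT n" against MySQL.
    static std::vector<Job> fetch_unsent_jobs(size_t n) { (void)n; return {}; }

    void feeder_loop(JobCache& cache, size_t capacity) {
        for (;;) {
            {
                std::lock_guard<std::mutex> lock(cache.m);
                if (cache.slots.size() < capacity) {
                    auto batch = fetch_unsent_jobs(capacity - cache.slots.size());
                    cache.slots.insert(cache.slots.end(), batch.begin(), batch.end());
                }
            }
            std::this_thread::sleep_for(std::chrono::seconds(1));  // poll interval
        }
    }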

13 Database tables
- Application
- Platform: Win32, Win64, Linux x86, Java, etc.
- App version
- Job: resource usage estimates and bounds; latency bound; input file descriptions
- Job instance: output file descriptions
- Account, team, etc.
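The job-related tables, sketched as C++ structs; the field names are illustrative, not the actual BOINC schema.

    // Illustrative mirror of the slide's Job and Job-instance tables.
    #include <string>
    #include <vector>

    struct JobRow {
        long id;
        double est_flops;                       // resource usage estimate
        double max_flops, max_mem, max_disk;    // resource usage bounds
        double latency_bound;                   // seconds from dispatch to deadline
        std::vector<std::string> input_files;   // input file descriptions
    };

    struct JobInstanceRow {                     // one replica of a job, sent to a host
        long id;
        long job_id;
        std::vector<std::string> output_files;  // output file descriptions
    };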

14 Data model
Files:
- have both logical and physical names
- are immutable (per physical name)
- may originate on client or server
- may be "sticky"
- may be compressed in various ways
- are transferred via HTTP or BitTorrent
- app files must be signed
Upload/download directory hierarchies
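On the client side, the logical/physical split is handled by boinc_resolve_filename() from the BOINC API; a minimal sketch (the logical name is whatever the job's XML declares):

    // Open an input file by its logical name inside a BOINC app.
    // boinc_resolve_filename() maps the logical name to the physical path,
    // typically a link into the project's download hierarchy.
    #include "boinc_api.h"
    #include <cstdio>

    FILE* open_input(const char* logical_name) {
        char physical[512];
        boinc_resolve_filename(logical_name, physical, sizeof(physical));
        return fopen(physical, "r");
    }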

15 Submitting jobs
- Create an XML description: input and output files; resource usage estimates and bounds; latency bound
- Put input files into the directory hierarchy
- Call create_work(), which creates the DB record (sketched below)
Mass production:
- bags of tasks
- flow-controlled stream of tasks
- self-propagating computations
- trickle messages
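A hedged sketch of the submission flow; JobSpec, stage_file(), and submit_job() are hypothetical wrappers around the real create_work() call, whose exact signature varies across BOINC versions.

    // Hypothetical wrappers (JobSpec, stage_file, submit_job) around create_work().
    #include <string>
    #include <vector>

    struct JobSpec {
        std::string name;
        std::vector<std::string> input_files;   // already staged in the dir hierarchy
        double est_flops, max_flops;            // resource estimate and bound
        double latency_bound;                   // seconds until deadline
    };

    void stage_file(const std::string& path) { /* copy into download hierarchy */ }
    long submit_job(const JobSpec& s) { /* build XML, call create_work() */ return 0; }

    // "Bag of tasks" mass production: one job per input file, one-week deadline.
    void submit_bag(const std::vector<std::string>& inputs) {
        for (const auto& in : inputs) {
            stage_file(in);
            submit_job({"job_" + in, {in}, 1e12, 1e13, 7 * 86400.0});
        }
    }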

16 Server scheduling policy
Request message:
- platform(s)
- description of hardware: CPU, memory, disk, coprocessors
- description of availability
- current jobs queued and in progress
- work request (CPU seconds)
Reply: send a set of jobs that
- are feasible (will fit in memory/disk)
- will probably get done by deadline
- satisfy the work request
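The feasibility and deadline tests, sketched with hypothetical HostDesc/JobDesc structs; the deadline estimate scales by the host's measured speed and availability.

    // Hypothetical sketch of the scheduler's per-job feasibility test.
    struct HostDesc {
        double ram_bytes, free_disk_bytes;   // from the request's hardware description
        double flops;                        // measured speed
        double availability;                 // fraction of time BOINC may compute
        double queued_seconds;               // jobs already queued and in progress
    };

    struct JobDesc {
        double min_ram, min_disk;            // resource bounds
        double est_flops;                    // estimated computation
        double latency_bound;                // seconds until deadline
    };

    bool feasible(const HostDesc& h, const JobDesc& j) {
        if (j.min_ram > h.ram_bytes || j.min_disk > h.free_disk_bytes)
            return false;                    // won't fit in memory/disk
        // Wall time: queued work first, then this job, scaled by availability.
        double wall = (h.queued_seconds + j.est_flops / h.flops) / h.availability;
        return wall <= j.latency_bound;      // will probably get done by deadline
    }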

17 Multithread and coprocessor support
From the diagram: the client reports its platforms, coprocessors, and #CPUs to the scheduler; for each (platform, app version) the scheduler consults an app planning function, which decides the avg/max #CPUs, coprocessor usage, and command line of the job it sends.
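A sketch of what such a planning function decides, using hypothetical types (the real hook is project-customizable C++ in the scheduler; none of these names are BOINC's):

    // Illustrative app planning function: given what the host offers,
    // decide how a given class of app version would use it.
    #include <string>

    struct HostInfo { int ncpus; int ngpus; };

    struct Usage {
        double avg_ncpus, max_ncpus;    // CPU usage of the chosen version
        int ngpus;                      // coprocessor usage
        std::string cmdline;            // passed to the app
    };

    bool plan(const HostInfo& host, const std::string& version_class, Usage& out) {
        if (version_class == "gpu") {
            if (host.ngpus < 1) return false;   // this version can't run here
            out = {0.1, 1.0, 1, "--use-gpu"};
            return true;
        }
        if (version_class == "mt") {            // multithreaded version
            double n = host.ncpus;
            out = {n, n, 0, "--nthreads " + std::to_string(host.ncpus)};
            return true;
        }
        out = {1.0, 1.0, 0, ""};                // plain single-core version
        return true;
    }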

18 Result validation
Problem: we can't trust volunteers, either for the computational result or for the claimed credit.
Approaches:
- application-specific checking
- job replication: do N copies, require that M of them agree (sketched below)
- adaptive replication
- spot-checking
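Replication in sketch form: given the returned results of one job and an application-supplied equivalence test, look for a group of at least M that agree (names hypothetical):

    // Hypothetical M-of-N agreement check for replicated jobs.
    #include <vector>

    struct Result { /* application-defined output */ };

    // App-supplied; see the fuzzy comparison sketch under the next slide.
    bool equivalent(const Result& a, const Result& b);

    // Returns the index of a canonical result, or -1 if no M results agree yet.
    int check_consensus(const std::vector<Result>& results, int M) {
        for (size_t i = 0; i < results.size(); i++) {
            int agree = 0;
            for (size_t j = 0; j < results.size(); j++)
                if (equivalent(results[i], results[j])) agree++;
            if (agree >= M) return (int)i;   // grant credit, mark job valid
        }
        return -1;                           // keep waiting, or issue more copies
    }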

19 How to compare results?
Problem: numerical discrepancies between hosts.
- Stable problems: fuzzy comparison (sketched below)
- Unstable problems:
  - eliminate discrepancies: same compiler/flags/libraries everywhere
  - homogeneous replication: send instances only to numerically equivalent hosts (equivalence may depend on the app)
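Fuzzy comparison for the stable case, as one possible equivalence test: values agree if they match within a relative tolerance (the tolerance and the flat-vector layout are application choices).

    // Fuzzy numeric comparison for stable problems.
    #include <algorithm>
    #include <cmath>
    #include <vector>

    static bool fuzzy_equal(double a, double b, double rel_tol = 1e-5) {
        double scale = std::max(std::fabs(a), std::fabs(b));
        return std::fabs(a - b) <= rel_tol * scale;   // also true when both are 0
    }

    bool equivalent(const std::vector<double>& a, const std::vector<double>& b) {
        if (a.size() != b.size()) return false;
        for (size_t i = 0; i < a.size(); i++)
            if (!fuzzy_equal(a[i], b[i])) return false;
        return true;
    }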

20 Server scheduling policy revisited
Goals (possibly conflicting):
- send retries to fast/reliable hosts
- send long jobs to fast hosts
- send demanding jobs (RAM, disk, etc.) to qualified hosts
- send jobs already committed to a homogeneous redundancy class
Mechanism: a project-defined "score" function; scan N jobs, send those with the highest scores (sketched below).
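Sketched here with hypothetical fields and weights: score every job in the shared-memory window against the requesting host, then send the top scorers.

    // Hypothetical project-defined score function over the job cache.
    #include <algorithm>
    #include <vector>

    struct HostStats { bool reliable; double flops; bool in_hr_class; };
    struct CachedJob { bool is_retry; double est_flops; bool committed_to_hr; };

    static double score(const HostStats& h, const CachedJob& j) {
        double s = 0;
        if (j.is_retry && h.reliable) s += 10;            // retries -> reliable hosts
        if (j.est_flops > 1e14 && h.flops > 5e9) s += 5;  // long jobs -> fast hosts
        if (j.committed_to_hr && h.in_hr_class) s += 20;  // homogeneous redundancy
        return s;
    }

    // Scan the window, send the n highest-scoring jobs to this host.
    std::vector<CachedJob> pick_jobs(const HostStats& h,
                                     std::vector<CachedJob> window, size_t n) {
        std::sort(window.begin(), window.end(),
                  [&](const CachedJob& a, const CachedJob& b) {
                      return score(h, a) > score(h, b);
                  });
        if (window.size() > n) window.resize(n);
        return window;
    }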

21 Server daemons
Per application:
- work generator
- validator
- assimilator
Shared:
- transitioner: manages replication, creates job instances, triggers the other daemons
- file deleter
- DB purger
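The transitioner's core decision, sketched with hypothetical states and helpers: keep N instances in flight, replace failures, and wake the validator once results come back.

    // Hypothetical sketch of the transitioner's per-job pass.
    #include <vector>

    enum class InstState { UNSENT, IN_PROGRESS, SUCCESS, ERROR, TIMED_OUT };
    struct Inst { InstState state; };

    void create_instance(long job_id) { /* INSERT a new job-instance row */ }
    void wake_validator(long job_id)  { /* trigger the per-app validator */ }

    // target_n is the replication level N; the validator checks M-of-N.
    void transition(long job_id, const std::vector<Inst>& insts, int target_n) {
        int live = 0, done = 0;
        for (const auto& i : insts) {
            if (i.state == InstState::SUCCESS) { done++; live++; }
            else if (i.state != InstState::ERROR && i.state != InstState::TIMED_OUT)
                live++;                          // unsent or still in progress
        }
        for (int k = live; k < target_n; k++)    // replace failed/timed-out copies
            create_instance(job_id);
        if (done > 0) wake_validator(job_id);    // let it look for consensus
    }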

22 Ways to create a BOINC server
- Install BOINC on a Linux box (lots of software dependencies)
- Run the BOINC server VM (VMware) (you still need to worry about hardware)
- Run the BOINC server VM on Amazon EC2

23 BOINC API
Typical application structure:
  boinc_init()
  loop
    ...
    boinc_fraction_done(x)
    if boinc_time_to_checkpoint()
      write checkpoint file
      boinc_checkpoint_completed()
  boinc_finish(0)
Also: graphics, multi-program apps, a wrapper for legacy apps
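The same structure fleshed out in C++. The boinc_* calls are the real API named on the slide; do_work_step() and the checkpoint format are stand-ins for application logic.

    // Minimal BOINC application skeleton.
    #include "boinc_api.h"
    #include <cstdio>

    static const int NSTEPS = 1000000;

    static void do_work_step(int i) { /* application-specific computation */ }

    int main(int argc, char** argv) {
        boinc_init();
        int start = 0;
        if (FILE* f = fopen("checkpoint.txt", "r")) {   // resume if checkpointed
            if (fscanf(f, "%d", &start) != 1) start = 0;
            fclose(f);
        }
        for (int i = start; i < NSTEPS; i++) {
            do_work_step(i);
            boinc_fraction_done((double)i / NSTEPS);    // progress for the GUI
            if (boinc_time_to_checkpoint()) {           // client says: checkpoint now
                FILE* f = fopen("checkpoint.txt", "w");
                fprintf(f, "%d\n", i);
                fclose(f);
                boinc_checkpoint_completed();
            }
        }
        boinc_finish(0);                                // reports success; never returns
    }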

24 Volunteer's view
- 1-click install
- All platforms
- Invisible, autonomic
- Highly configurable (optional)

25 BOINC client structure
From the diagram: the core client talks to schedulers and data servers; the GUI and screensaver connect to the core client over local TCP for user preferences and control; applications link against the BOINC library, which provides the runtime system.

26 Some BOINC projects
- Climateprediction.net (Oxford University): global climate modeling
- Einstein@home (LIGO scientific collaboration): gravitational wave detection
- SETI@home (U.C. Berkeley): radio search for E.T.I. and black hole evaporation
- Leiden Classical (Leiden University): surface chemistry using classical dynamics

27 More projects
- LHC@home (CERN): simulator of the LHC and its collisions
- QMC@home (Univ. of Muenster): quantum chemistry
- Spinhenge@home (Bielefeld Univ.): nanoscale magnetism
- ABC@home (Leiden Univ.): number theory

28 Biomed-related BOINC projects
- Rosetta@home (University of Washington): Rosetta; protein folding, docking, and design
- Tanpaku (Tokyo Univ. of Science): protein structure prediction using Brownian dynamics
- MalariaControl (Swiss Tropical Institute): epidemiological simulation

29 More projects
- Predictor@home (Univ. of Michigan): CHARMM; protein structure prediction
- SIMAP (Tech. Univ. of Munich): protein similarity matrix
- Superlink@Technion (Technion): genetic linkage analysis using Bayesian networks
- Quake Catcher Network (Stanford): distributed seismograph

30 More projects (IBM World Community Grid)
- Dengue fever drug discovery (U. of Texas, U. of Chicago): AutoDock
- Human Proteome Folding (New York University): Rosetta
- FightAIDS@home (Scripps Institute): AutoDock

31 Organizational models
- Single-scientist projects: a dead end?
- Campus-level meta-projects, e.g. UC Berkeley: 1,000 instructional PCs; 5,000 faculty/staff; 30,000 students; 400,000 alumni
- Lattice (U. Maryland Center for Bioinformatics)
- MindModeling.org (ACT-R community, ~20 universities)
- IBM World Community Grid (~8 applications from various institutions)
- Extremadura, Spain (consortium of 5-10 universities)
- SZTAKI (Hungary)

32 Conclusion
Where's the computing power?
- Individuals (~1 billion PCs)
- Companies (~100M PCs)
- Government (~50M PCs)
Volunteer computing taps the largest pool: individuals.
Contact me about:
- using BOINC
- research based on BOINC
davea@ssl.berkeley.edu

