David P. Anderson Space Sciences Laboratory University of California – Berkeley Volunteer Computing.


1 David P. Anderson Space Sciences Laboratory University of California – Berkeley davea@ssl.berkeley.edu Volunteer Computing

2 Outline ● Volunteer computing ● BOINC: an OS for volunteer computing ● Applications ● Challenges and research directions

3 Where's the power? ● 2010: 1 billion Internet-connected PCs, 55% privately owned ● If 100M people participate: – 100 PetaFLOPS, 1 Exabyte (10^18 bytes) storage ● Consumer products drive technology – GPUs (NVIDIA, Sony Cell) [Diagram: your computers compared with academic, business, and home PCs]
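
The aggregate figures on this slide imply a per-host contribution that is easy to check. The sketch below only divides the slide's totals by the 100M participants; the variable names are illustrative, and treating every host as identical is a simplifying assumption.

```python
# Totals taken from the slide; one host per participant is an assumption.
PARTICIPANTS = 100_000_000          # 100M people
TOTAL_FLOPS = 100 * 10**15          # 100 PetaFLOPS
TOTAL_STORAGE_BYTES = 10**18        # 1 Exabyte

# Implied average contribution per host.
flops_per_host = TOTAL_FLOPS // PARTICIPANTS
storage_per_host = TOTAL_STORAGE_BYTES // PARTICIPANTS

print(f"{flops_per_host / 1e9:.0f} GFLOPS per host")
print(f"{storage_per_host / 1e9:.0f} GB per host")
```

In other words, the slide's totals correspond to roughly 1 GFLOPS and 10 GB of storage from each participating PC.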

4 Volunteer computing history [Timeline chart, 1995–2005] ● Projects: GIMPS, distributed.net, SETI@home, folding@home, fight*@home, climateprediction.net, Einstein@home ● Infrastructure: BOINC ● Names used for the paradigm: volunteer computing, public [resource] computing, Internet computing, screensaver computing, global computing, @home computing, peer-to-peer computing, Grid computing

5 Volunteer/Grid differences

6 Save money! Suppose processing 1 GB of data takes X computer days
              cluster/Grid      volunteer
              ----------------  ------------
computing:    $1 per CPU/day    free
network:      free              $1 per 20 GB
cost per GB:  $X                $1/20
So volunteer computing is cheaper if X > 1/20 (SETI@home: X = 1,000)
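
The break-even argument on this slide can be written out as a short calculation. This is a minimal sketch using only the slide's prices ($1 per CPU-day on a cluster, $1 per 20 GB of network traffic for volunteer computing); the function and constant names are illustrative, not part of BOINC.

```python
# Prices from the slide; names are hypothetical.
CLUSTER_COST_PER_CPU_DAY = 1.0        # dollars, cluster/Grid computing
VOLUNTEER_NETWORK_COST_PER_GB = 1.0 / 20  # dollars, $1 per 20 GB

def cluster_cost_per_gb(x_days: float) -> float:
    """Cluster: computing is paid for, network is free."""
    return x_days * CLUSTER_COST_PER_CPU_DAY

def volunteer_cost_per_gb(x_days: float) -> float:
    """Volunteer: computing is free, so the cost does not depend on X."""
    return VOLUNTEER_NETWORK_COST_PER_GB

def volunteer_is_cheaper(x_days: float) -> bool:
    """True when X exceeds the break-even point of 1/20 computer day per GB."""
    return volunteer_cost_per_gb(x_days) < cluster_cost_per_gb(x_days)

if __name__ == "__main__":
    for x in (0.01, 0.05, 1.0, 1000.0):   # 1000 is the slide's SETI@home value
        print(x, cluster_cost_per_gb(x), volunteer_cost_per_gb(x),
              volunteer_is_cheaper(x))
```

With the slide's SETI@home value of X = 1,000, the cluster cost is $1,000 per GB against $0.05 per GB for volunteer computing.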

7 Educational discount [Diagram: UCB connects to partner institutions (UCLA, UIUC) over Internet2 (free, underutilized) and to the commodity Internet ($$)] ● Underutilized flat-rate ISP connections ● ... so bandwidth may be effectively free also

8 Infrastructure software ● Roll your own ● XtremWeb, Cosm – not complete/robust ● United Devices, Entropia – not free ● Grid (Globus/Condor), JXTA – solve a different problem ● BOINC (Berkeley Open Infrastructure for Network Computing) – http://boinc.berkeley.edu

9 Projects and participants [Diagram: projects (SETI, physics, climate, biomedical) linked to participants (Joe, Alice, Jens)] ● diversity, autonomy ● heterogeneity ● allocation, trust

10 Encourage participation in >1 project ● Better long-term resource utilization – project A works while project B thinks ● Better short-term resource utilization – communicate/compute in parallel – match applications to resources [Chart: a project's computing needs over time, alternating think, work, think, work]

11 Creating a BOINC project ● Install BOINC server software on Unix box ● Adapt or develop application – compile for various platforms ● Write scripts/programs to: – generate tasks – validate results – handle results ● Develop web site ● Get media coverage

12 Structure of a BOINC project ● Components: – Scheduling server (C++) – BOINC DB (MySQL) – Data server (HTTP) – Web interfaces (PHP) – Work generation – Retry generation – Result validation – Result processing – Garbage collection ● Ongoing tasks: – monitor server correctness – monitor server performance – develop and maintain applications

13 Redundant computing ● Addresses hardware errors, hackers ● Issue 2 or more copies of each task – don't send to same host or user – timed retry up to a limit ● Result comparison approaches – Application-specific “fuzzy comparison” – Homogeneous redundancy ● send copies only to numerically equivalent hosts – Develop platform-independent app
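
The redundancy scheme on this slide can be illustrated with a small validator. This is a sketch under stated assumptions, not BOINC's validator: results are modeled as lists of floats, the "fuzzy comparison" is a relative-tolerance check, and the names `fuzzy_equal`, `find_canonical`, `quorum`, and `rel_tol` are hypothetical.

```python
from math import isclose

def fuzzy_equal(a, b, rel_tol=1e-6):
    """Application-specific comparison: same length, values agree within rel_tol."""
    return len(a) == len(b) and all(
        isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )

def find_canonical(results, quorum=2, rel_tol=1e-6):
    """Return the first result that at least `quorum` distinct hosts agree on.

    `results` is a list of (host_id, values) pairs. Copies returned by the
    same host count only once, mirroring "don't send to same host".
    Returns None when no quorum exists yet (more copies must be issued).
    """
    for _, values in results:
        agreeing_hosts = {
            host for host, other in results if fuzzy_equal(values, other, rel_tol)
        }
        if len(agreeing_hosts) >= quorum:
            return values
    return None

if __name__ == "__main__":
    returned = [
        ("host-a", [1.0, 2.0]),
        ("host-b", [1.0000000001, 2.0]),  # differs only by rounding
        ("host-c", [9.0, 2.0]),           # hardware error or tampering
    ]
    print(find_canonical(returned))
```

Homogeneous redundancy would sidestep the tolerance entirely: if copies go only to numerically equivalent hosts, the comparison can be exact.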

14 What do participants want? ● Incentives – contribute to science – get acknowledgement – community – screensaver graphics ● Invisibility, control of resource usage ● Involvement – translation, porting etc.

15 Credit accounting ● Credit is granted for – computation (CPU time x benchmark) – storage – network communication ● Cheat-resistance ● Accounting – user, host, team ● Credit DB export for 3rd-party web sites – cross-project identification
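
As a rough illustration of the accounting described on this slide, the sketch below computes credit for computation as CPU time x benchmark and totals it per user, host, and team. The scale factor, class name, and method names are assumptions made for the example; the slide does not give BOINC's actual formula, and storage and network credit are left out.

```python
from collections import defaultdict

def claimed_credit(cpu_seconds, benchmark_flops, scale=1e-12):
    """Credit for computation: CPU time x benchmark.

    `scale` is an arbitrary illustrative constant that turns estimated
    floating-point operations into credit units.
    """
    return cpu_seconds * benchmark_flops * scale

class CreditLedger:
    """Keeps running totals at the three levels named on the slide."""

    def __init__(self):
        self.by_user = defaultdict(float)
        self.by_host = defaultdict(float)
        self.by_team = defaultdict(float)

    def grant(self, user, host, team, credit):
        self.by_user[user] += credit
        self.by_host[host] += credit
        self.by_team[team] += credit

if __name__ == "__main__":
    ledger = CreditLedger()
    # One hour of CPU time on a host benchmarked at 1 GFLOPS.
    ledger.grant("alice", "alice-pc", "team-x", claimed_credit(3600, 1e9))
    print(dict(ledger.by_user))
```

Because the claim depends on values the client reports, cheat-resistance has to come from the server side, for example by granting credit only for results that pass redundant validation.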

16 Participating ● Select project(s) ● Create account(s) ● Download/install BOINC client software ● Interact via web: – preferences – leaderboards – profile – teams – message boards, dynamic FAQ

18 Anonymous platform mechanism ● Participant compiles software from source ● Scheduler RPC: platform is “anonymous” ● Purposes: – support obscure platforms – security-conscious participants – performance tuning of applications

19 Client structure App Core client screensaver BOINC Manager servers

20 Applications ● Computation model – Workunits, results – Deadlines, resource estimates ● Data model – files, file references ● Mostly existing apps (FORTRAN, C) ● Categories – Physical simulation – Data processing – Distribution for its own sake
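
One way to picture the computation and data models on this slide is as a few small record types. This is a hypothetical sketch: the class and field names (`FileRef`, `Workunit`, `Result`, `est_flops`, `delay_bound_s`, and so on) are invented for illustration and are not BOINC's actual schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class FileRef:
    """Data model: a reference to a file by name and download/upload URL."""
    name: str
    url: str

@dataclass
class Workunit:
    """A unit of computation with its inputs and resource estimates."""
    name: str
    input_files: List[FileRef]
    est_flops: float        # estimated floating-point operations
    est_disk_bytes: int     # estimated disk usage
    delay_bound_s: int      # time allowed between sending and reporting

@dataclass
class Result:
    """One instance of a workunit, sent to one host."""
    workunit: Workunit
    host_id: str
    sent_at: float          # seconds since some epoch
    output_files: List[FileRef] = field(default_factory=list)

    @property
    def deadline(self) -> float:
        return self.sent_at + self.workunit.delay_bound_s

    def is_overdue(self, now: float) -> bool:
        return now > self.deadline

if __name__ == "__main__":
    wu = Workunit("wu_0001", [FileRef("in.dat", "http://example.org/in.dat")],
                  est_flops=1e13, est_disk_bytes=10_000_000, delay_bound_s=86_400)
    r = Result(wu, host_id="host-a", sent_at=0.0)
    print(r.deadline, r.is_overdue(now=100_000.0))
```

Separating workunits from results is what lets one workunit have several results, which redundant computing relies on.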

21 SETI@home ● Analysis of radio telescope data from Arecibo – SETI: search for narrowband signals – Astropulse: search for short broadband signals ● 0.3 MB in, ~4 CPU hours, 10 KB out ● Enhancements under BOINC: – data archival on clients – direct data distribution from observatory

23 Climateprediction.net ● Climate change study (Oxford University) – Met Office model (FORTRAN, 1M lines) ● Input: ~10MB executable, 1MB data ● Output per workunit: – 10 MB summary (always upload) – 1 GB detail file (archive on client, may upload) ● CPU time: 2-3 months (can't migrate) – trickle messages – preemptive scheduling

25 Biology projects ● Protein folding – Predictor@home (Scripps Research Institute) – Folding@home (Stanford) ● Virtual drug discovery – fightAIDS@home ● Gene sequence analysis – NTT projects – Lattice (U. Maryland)

26 Einstein@home ● Gravitational wave detection; LIGO ● UW Milwaukee/Caltech/Max Planck Inst. ● 30,000 data sets of 40 MB each ● Each data set is analyzed with 40,000 different parameter sets; each takes ~6 hrs CPU ● Locality scheduling – minimize data transfer, client disk usage – minimize credit-granting delay
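
The numbers on this slide show why locality scheduling matters: each 40 MB data set is reused by 40,000 tasks. The sketch below first totals the workload from the slide's figures, then shows one simple locality policy: prefer a task whose data set the host already holds. The function `pick_task` and its data layout are assumptions for illustration, not the Einstein@home scheduler.

```python
# Figures from the slide.
DATA_SETS = 30_000
PARAM_SETS_PER_DATA_SET = 40_000
CPU_HOURS_PER_TASK = 6
DATA_SET_MB = 40

total_tasks = DATA_SETS * PARAM_SETS_PER_DATA_SET
total_cpu_hours = total_tasks * CPU_HOURS_PER_TASK
total_data_mb = DATA_SETS * DATA_SET_MB

def pick_task(pending, files_on_host):
    """Choose the next task for a host.

    `pending` maps data-set name -> list of unassigned parameter-set ids.
    `files_on_host` lists data sets the host has already downloaded.
    Returns (data_set, param_id, needs_download), or None if no work is left.
    """
    # First choice: a task that reuses data already on the client.
    for data_set in files_on_host:
        if pending.get(data_set):
            return data_set, pending[data_set].pop(0), False
    # Fallback: any remaining task, at the cost of a new download.
    for data_set, params in pending.items():
        if params:
            return data_set, params.pop(0), True
    return None

if __name__ == "__main__":
    print(total_tasks, total_cpu_hours, total_data_mb)
    pending = {"A": [1, 2], "B": [3]}
    print(pick_task(pending, ["B"]))
    print(pick_task(pending, ["B"]))
```

Under this policy a host downloads a new 40 MB data set only after the tasks for the data sets it already holds are used up, which keeps both transfer volume and client disk usage low.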

28 CERN projects ● LHC@home – accelerator simulation (SixTrack) ● HEP@home – collision data analysis

29 Others ● UCB Internet measurement – Map/measure the Internet and home PCs ● BURP (Big Ugly Rendering Project) – ray-tracing ● PlanetQuest – image analysis for planetary transit detection

30 Challenges and questions ● Get 100 million participants – simplified account management ● Get more projects ● Distributed file system support ● Use peer-to-peer communication – BitTorrent integration ● Use GPUs and other resources ● Integrate with Grid (Lattice, CERN)

31 Volunteer computing ● A new high-performance computing paradigm ● Benefits to projects: – enables otherwise infeasible computational research – economic advantage even for small projects ● Benefits to participants: – increase public scientific knowledge/interest – catalyze virtual communities – democratize resource allocation
