
1 Reliability of Parallel Build Systems
Derrick Coetzee, George Necula
UC Berkeley
Creative Commons Zero Waiver: To the extent possible under law, the author, Derrick Coetzee, waives all copyright and related or neighboring rights to this work.

2 Why parallelize builds?
Developer cycle time
  – Faster builds = developers get more work done, higher morale
Continuous integration
  – Faster builds = tests run more often
Check-in verification systems
  – Faster builds = more throughput on check-in queue

3 Parallel build systems today
Job scheduling
Typical example: make -j
  – Find n build steps that have no unbuilt dependencies and run them
  – Whenever one exits, start the next one
Depends on the dependency graph being correct and complete
Coarse-grained task parallelism
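
The scheduling loop described on this slide can be sketched as a small simulation. This is an illustration only, not how make is implemented: the function name `simulate_make_j`, the `deps` and `durations` dictionaries, and the example step names are all hypothetical.

```python
import heapq

def simulate_make_j(deps, durations, n):
    """Simulate a make -j style scheduler with n job slots (n >= 1).

    deps maps each step to the set of steps it is declared to depend on;
    durations maps each step to its run time. Both are hypothetical inputs.
    Returns (start_times, total_build_time).
    """
    remaining = {step: set(d) for step, d in deps.items()}
    dependents = {step: [] for step in deps}
    for step, d in deps.items():
        for dep in d:
            dependents[dep].append(step)

    ready = sorted(s for s, d in remaining.items() if not d)
    running = []  # min-heap of (finish_time, step)
    start = {}
    now = 0.0
    while ready or running:
        # Fill free job slots with steps that have no unbuilt dependencies.
        while ready and len(running) < n:
            step = ready.pop(0)
            start[step] = now
            heapq.heappush(running, (now + durations[step], step))
        # Whenever one exits, its dependents may become ready.
        now, done = heapq.heappop(running)
        for child in dependents[done]:
            remaining[child].discard(done)
            if not remaining[child]:
                ready.append(child)
    return start, now

durations = {"a.o": 2, "b.o": 3, "app": 1}
complete = {"a.o": set(), "b.o": set(), "app": {"a.o", "b.o"}}
incomplete = {"a.o": set(), "b.o": set(), "app": {"a.o"}}  # b.o edge forgotten

print(simulate_make_j(complete, durations, 2))
print(simulate_make_j(incomplete, durations, 2))
```

With the complete graph, the link step `app` starts only after both objects are built. With the incomplete graph it starts while `b.o` is still being written, which is the failure mode the scheduler cannot detect on its own.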

4 What could go wrong?
Incomplete dependency information
  – Serial builds → incorrect incremental builds
  – Parallel builds → nondeterministic builds, build breaks, incorrect builds
  – Developer changes can introduce or remove dependencies at any time
Example: #include "yy.lex.h"

5 Example of missing dependencies
gcc test.c -o test
  – What files does it read/write/test existence of?

6 Example of missing dependencies
gcc test.c -o test
  – What files does it read/write/test existence of?
Actual: 5 processes, 119 files/directories
  /usr/bin/gcc         /etc/ld.so.hwcap   /tmp
  /usr/lib/gcc/…/cc1   /lib/libc.so.6     /tmp/ccdCCHK0.s
  /usr/bin/as          /proc/meminfo      /tmp/ccKs1ykU.c
  /usr/bin/ld          test.c.gch         /tmp/cc0YtTuE.o
  /usr/bin/nm          /usr/lib/crt1.o    /tmp/ccGGL3Eo.ld
  /usr/bin/strip       /usr/…/lib/specs   /tmp/ccG4c608.le
  …                    …                  …

7 Parallel builds are error-prone
Missing dependencies cause errors
Nondeterministic builds make errors difficult to reproduce
Unnecessary dependencies limit scalability
An alternative:
  – Developer specifies serial build (easier!)
  – Serial build is automatically parallelized
  – Nondeterminism is eliminated

8 Build transactions
Each build step’s file operations are monitored using system call interception
A transaction manager inserts locks before accessing each file (may suspend processes)
Ensure that the parallel build behaves in the same way as the serial build
  – Use concurrency control techniques from databases
  – Schedule is conflict-equivalent to the user’s serial schedule
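
To make "conflict-equivalent to the serial schedule" concrete, here is a minimal sketch using the textbook database definition: two operations conflict when they come from different transactions, touch the same path, and at least one is a write. The tuple layout `(tid, kind, path)` and both function names are assumptions made for illustration, and transaction ids are assumed to follow the serial build order.

```python
def conflicts(op_a, op_b):
    """True if two (tid, kind, path) operations conflict."""
    tid_a, kind_a, path_a = op_a
    tid_b, kind_b, path_b = op_b
    return (tid_a != tid_b
            and path_a == path_b
            and "write" in (kind_a, kind_b))

def equivalent_to_serial(schedule):
    """Check that every conflicting pair appears in serial (tid) order.

    schedule is the interleaved list of operations as they actually ran.
    """
    for i, earlier in enumerate(schedule):
        for later in schedule[i + 1:]:
            if conflicts(earlier, later) and earlier[0] > later[0]:
                return False
    return True

# Step 1 compiles test.c to test.o, step 2 links it.
good = [
    (1, "read", "/etc/ld.so.cache"),
    (2, "read", "/etc/ld.so.cache"),  # two reads never conflict
    (1, "write", "test.o"),
    (2, "read", "test.o"),
]
bad = [
    (1, "read", "/etc/ld.so.cache"),
    (2, "read", "test.o"),            # link looks at test.o too early
    (1, "write", "test.o"),
]
print(equivalent_to_serial(good), equivalent_to_serial(bad))
```

The check is quadratic in the number of operations, which is fine for a sketch; a real transaction manager would enforce the order with locks instead of verifying it after the fact.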

9 Build transactions example
(1) Compile test.c to test.o, then (2) link:
tid  Lock/unlock  Lock type  Path              Result
1    LOCK         READ       /etc/ld.so.cache  OK
2    LOCK         READ       /etc/ld.so.cache  OK
…    …            …          …                 …
1    LOCK         CREATE     test.o            OK
…    …            …          …                 …
2    LOCK         TEST       test.o            BLOCKED
…    …            …          …                 …
1    UNLOCK       CREATE     test.o            OK
2    LOCK         TEST       test.o            OK

10 Build transactions example
What if transaction 2 takes the lock first?
tid  Lock/unlock  Lock type  Path              Result
1    LOCK         READ       /etc/ld.so.cache  OK
2    LOCK         READ       /etc/ld.so.cache  OK
…    …            …          …                 …
2    LOCK         TEST       test.o            OK
…    …            …          …                 …
1    LOCK         CREATE     test.o            ROLLBACK 2
…    …            …          …                 …
2    LOCK         TEST       test.o            BLOCKED
…    …            …          …                 …
1    UNLOCK       CREATE     test.o            OK
2    LOCK         TEST       test.o            OK
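
The two traces above can be reproduced with a toy lock table. This is a sketch under stated assumptions, not the authors' transaction manager: the class `LockManager` and its methods are hypothetical, READ and TEST are assumed to be compatible with each other, CREATE is assumed to conflict with everything, and the tid doubles as the transaction's position in the serial order.

```python
SHARED = {"READ", "TEST"}  # assumption: these lock types do not conflict

class LockManager:
    def __init__(self):
        self.held = {}  # path -> list of (tid, lock_type)

    def lock(self, tid, lock_type, path):
        clashing = {t for t, kind in self.held.get(path, [])
                    if t != tid
                    and not (kind in SHARED and lock_type in SHARED)}
        # An earlier transaction holds a conflicting lock: wait for it.
        if any(t < tid for t in clashing):
            return "BLOCKED"
        # Only later transactions are in the way: undo them and proceed.
        for victim in clashing:
            self.release_all(victim)
        self.held.setdefault(path, []).append((tid, lock_type))
        if clashing:
            return "ROLLBACK " + " ".join(str(t) for t in sorted(clashing))
        return "OK"

    def unlock(self, tid, lock_type, path):
        self.held[path].remove((tid, lock_type))
        return "OK"

    def release_all(self, tid):
        for holders in self.held.values():
            holders[:] = [h for h in holders if h[0] != tid]

# Replay the trace where transaction 2 takes the lock first.
lm = LockManager()
trace = [
    lm.lock(1, "READ", "/etc/ld.so.cache"),
    lm.lock(2, "READ", "/etc/ld.so.cache"),
    lm.lock(2, "TEST", "test.o"),
    lm.lock(1, "CREATE", "test.o"),
    lm.lock(2, "TEST", "test.o"),
    lm.unlock(1, "CREATE", "test.o"),
    lm.lock(2, "TEST", "test.o"),
]
print(trace)
```

Here a rollback only drops the victim's locks. Undoing the victim's file writes and restarting its process, which an actual rollback would require, is left out.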

11 Avoiding cascading rollback
To ensure conflict-equivalence to the serial schedule, transactions must commit in order
  – Strict two-phase locking is too strict
Instead, take advantage of the fact that the dependency graph (and lock set) changes very little from build to build
Predicted locks
  – Derived from set of possible conflicts during previous run
  – Never block
  – Give no privilege to access data
  – Block conflicting lock attempts by transactions with larger timestamps

12 Build transactions example
Compile step followed by a link step:
tid  Lock/unlock     Lock type  Path              Result
1    PREDICTED LOCK  CREATE     test.o            OK
1    LOCK            READ       /etc/ld.so.cache  OK
2    LOCK            READ       /etc/ld.so.cache  OK
…    …               …          …                 …
2    LOCK            TEST       test.o            BLOCKED
…    …               …          …                 …
1    LOCK            CREATE     test.o            OK
…    …               …          …                 …
1    UNLOCK          CREATE     test.o            OK
2    LOCK            TEST       test.o            OK
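
A predicted lock can be modeled as an entry that is consulted when other transactions request locks but is never itself requested, so it cannot block and grants nothing. The sketch below replays the trace above under several assumptions: the class `PredictingLockManager` is hypothetical, READ and TEST are treated as mutually compatible, a prediction is dropped when the matching real lock is released, and rollback is not modeled (the sketch raises instead).

```python
SHARED = {"READ", "TEST"}  # assumption: these lock types do not conflict

def conflict(kind_a, kind_b):
    return not (kind_a in SHARED and kind_b in SHARED)

class PredictingLockManager:
    def __init__(self, predicted):
        # predicted: (tid, lock_type, path) entries taken from a previous
        # build. They are installed up front and never block anyone's setup.
        self.predicted = list(predicted)
        self.held = []  # real locks, as (tid, lock_type, path)

    def lock(self, tid, lock_type, path):
        # Real and predicted locks of earlier transactions both block.
        for owner, kind, p in self.predicted + self.held:
            if p == path and owner < tid and conflict(kind, lock_type):
                return "BLOCKED"
        # A later transaction already holding a conflicting real lock is
        # an unexpected dependency; handling it would need rollback.
        for owner, kind, p in self.held:
            if p == path and owner > tid and conflict(kind, lock_type):
                raise RuntimeError("unexpected dependency: rollback needed")
        self.held.append((tid, lock_type, path))
        return "OK"

    def unlock(self, tid, lock_type, path):
        self.held.remove((tid, lock_type, path))
        # Assumption: the prediction expires with the matching real lock.
        if (tid, lock_type, path) in self.predicted:
            self.predicted.remove((tid, lock_type, path))
        return "OK"

plm = PredictingLockManager([(1, "CREATE", "test.o")])
results = [
    plm.lock(1, "READ", "/etc/ld.so.cache"),
    plm.lock(2, "READ", "/etc/ld.so.cache"),
    plm.lock(2, "TEST", "test.o"),      # held back by the prediction
    plm.lock(1, "CREATE", "test.o"),    # own prediction grants nothing, blocks nothing
    plm.unlock(1, "CREATE", "test.o"),
    plm.lock(2, "TEST", "test.o"),
]
print(results)
```

Compared with the trace on the previous slide, transaction 2 is held back before it observes the missing test.o, so no rollback is triggered.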

13 Preliminary results - Linux kernel build
[Figure not included in transcript; axis label: Number of concurrent processes]

14 Preliminary results - Linux kernel build
Statistics:
  – Number of transactions/build steps: 2,949
  – Parallel build time: 3m9s
  – Total lock requests: 1,859,172
  – Lock requests blocked due to conflict: 1,697
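
Two derived figures help put these statistics in proportion. The short calculation below uses only the numbers on the slide; the variable names are illustrative.

```python
steps = 2949              # transactions / build steps
lock_requests = 1859172   # total lock requests
blocked = 1697            # lock requests blocked due to conflict

per_step = lock_requests / steps
blocked_fraction = blocked / lock_requests

print(f"{per_step:.0f} lock requests per build step on average")
print(f"{blocked_fraction:.4%} of lock requests blocked")
```

That works out to roughly 630 lock requests per build step, with about 0.09% of all requests blocking.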

15 Future work: Unimplemented stuff
Haven’t yet implemented rollback
  – Needed for “unexpected dependencies”
Fast cross-platform system call interception
  – ptrace, binary translation, custom filesystem?
Multiversion timestamping
  – Useful for builds that read/write the same file multiple times
Append-only files
  – Log files, standard out

16 Future work: Diagnosing make build bugs
If two build steps experience a conflict, but neither depends on the other directly or indirectly…
  – This proves the make build is nondeterministic
  – Isolates most important missing dependencies
Filter dependency graph by “files in my source repository”
  – Finds other interesting dependencies (e.g. headers)
Easy bug-finding tool for existing projects
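
The check proposed here amounts to a reachability test on the declared dependency graph. The following is a rough sketch of that idea, not a tool from the talk: the function names, the `deps` dictionary, and the step names in the example are hypothetical.

```python
def reachable(deps, start):
    """All steps that start depends on, directly or indirectly."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for dep in deps.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

def missing_dependencies(deps, observed_conflicts):
    """Return conflicting step pairs that the declared graph leaves unordered.

    deps maps each step to its declared dependencies; observed_conflicts
    lists (step_a, step_b) pairs whose file accesses conflicted at run time.
    """
    suspects = []
    for a, b in observed_conflicts:
        if b not in reachable(deps, a) and a not in reachable(deps, b):
            suspects.append((a, b))
    return suspects

# Hypothetical project: "lex" generates a header that parser.o includes,
# but the makefile only records the dependency for main.o.
deps = {
    "lex": set(),
    "parser.o": set(),
    "main.o": {"lex"},
    "app": {"parser.o", "main.o"},
}
observed = [("lex", "parser.o"), ("lex", "app")]
print(missing_dependencies(deps, observed))
```

The pair `("lex", "app")` is not reported because `app` already depends on `lex` through `main.o`; only the unordered pair is flagged as a likely missing dependency.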

17 Future work: Process hierarchies
Long-running process spawning many short-lived processes (e.g. make)
Rolling back make would be very bad
Solution is virtualization:
  – Lie to make (your children have completed)
  – Predict outputs of children based on previous build; block make if it tries to access these
  – Rolling back make (if necessary) isn’t so bad now

18 Future work: Intra-build step parallelism
Efficient parallel parsing for compilation
  – Ref Par Lab Browser’s work (Seth Fowler)
Efficient parallel optimization
  – Unexplored?
Efficient parallel linking
  – Ref Google’s gold linker

19 Questions?

20 Future work: Validated incremental builds
Observation: most build steps produce same output files as in previous build
Go ahead and use the old versions
  – If they’re wrong, we’ll find out when that file is rebuilt
Eliminates blocking for a faster parallel build, at the cost of more rollbacks

21 Future work: Distributed parallel builds
How to automatically partition builds between machines based on dependency graph?
How to efficiently handle unexpected dependencies?

