POSH Python Object Sharing Steffen Viken Valvåg In collaboration with Kjetil Jacobsen & Åge Kvalnes University of Tromsø, Norway Sponsored by Fast Search.


1 POSH: Python Object Sharing
Steffen Viken Valvåg
In collaboration with Kjetil Jacobsen & Åge Kvalnes
University of Tromsø, Norway
Sponsored by Fast Search & Transfer

2 Python Execution Model
Test.py (source):

    for x in "TEST PROGRAM":
        if x not in "FORGET IT":
            print x,

Byte-code compilation produces Test.pyc:

     0 SETUP_LOOP
     3 LOAD_CONST    ('TEST PROGRAM')
     6 GET_ITER
     7 FOR_ITER
    10 STORE_FAST    (x)
    13 LOAD_FAST     (x)
    16 LOAD_CONST    ('FORGET IT')
    19 COMPARE_OP    (not in)
    22 JUMP_IF_FALSE (to 33)
    25 POP_TOP
    26 LOAD_FAST     (x)
    29 PRINT_ITEM
    30 JUMP_FORWARD  (to 34)
    33 POP_TOP
    34 JUMP_ABSOLUTE
    37 POP_BLOCK

Interpretation produces the output:

    SPAM
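The two stages on this slide can be reproduced with the standard library. The sketch below is an illustration in present-day Python 3, not the slide's Python 2: it compiles the same loop to a code object, lists its opcodes with the `dis` module, and then interprets it. Opcode names and offsets differ between CPython versions, so the listing will not match the slide exactly. The helper names `run` and `opcode_names` are made up for this example.

```python
import dis
import io
from contextlib import redirect_stdout

# The slide's program, rewritten with the Python 3 print function.
SOURCE = '''
for x in "TEST PROGRAM":
    if x not in "FORGET IT":
        print(x, end=" ")
'''

# Stage 1: byte-code compilation (what ends up in a .pyc file).
code = compile(SOURCE, "Test.py", "exec")


def opcode_names(code_obj):
    """Return the opcode names of a code object, in order."""
    return [ins.opname for ins in dis.get_instructions(code_obj)]


def run(code_obj):
    """Stage 2: interpret the byte code and capture what it prints."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        exec(code_obj, {})
    return buf.getvalue()


if __name__ == "__main__":
    dis.dis(code)      # human-readable byte-code listing
    print(run(code))   # the letters S, P, A, M separated by spaces
```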

3 Python Threading Model
[Diagram: Thread A, Thread B, their byte codes, and the GIL]
Each thread executes a separate sequence of byte codes.
All threads must contend for one global interpreter lock (GIL).

4 Example: Matrix Multiplication
Performs a matrix multiplication A = B x C.
The work is split between several worker threads.
The application runs on a machine with 8 CPUs.
Threads do not scale across multiple CPUs because of lock contention on the GIL.
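A minimal sketch of how such a threaded multiplication might be structured, assuming matrices stored as lists of rows and an interleaved split of the output rows among workers (the talk does not say how the work was divided). The function names are hypothetical. The code shows only the structure; on a CPython build with a GIL, the byte codes of the worker threads are still serialized, so adding threads is not expected to speed it up.

```python
import threading


def multiply_rows(B, C, A, rows):
    """Compute the given rows of A = B x C and store them in A."""
    inner, cols = len(C), len(C[0])
    for i in rows:
        A[i] = [sum(B[i][k] * C[k][j] for k in range(inner))
                for j in range(cols)]


def threaded_matmul(B, C, num_workers=4):
    """Split the rows of the result between num_workers threads."""
    n = len(B)
    A = [None] * n  # shared result; each thread writes distinct rows
    threads = [
        threading.Thread(target=multiply_rows,
                         args=(B, C, A, range(w, n, num_workers)))
        for w in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return A


if __name__ == "__main__":
    print(threaded_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Because all threads share one address space, the result matrix needs no explicit communication: each worker simply writes into `A`.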

5 Workaround: Processes
[Diagram: Process A and Process B, connected by IPC]
Each process has its own interpreter lock.
Requires inter-process communication, e.g. message passing over pipes.

6 Matrix Multiplication using Processes
A master process distributes the input matrices to a set of worker processes.
Each worker process computes some part of the output matrix and returns its result to the master.
The master process assembles the final result matrix.
More communication, and a more complex pattern, than with threads.
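The master/worker pattern described here can be sketched with the standard `multiprocessing` module, which postdates the talk and is used purely for illustration. The sketch assumes a Unix system (it requests the "fork" start method) and uses hypothetical function names. Each worker receives a block of rows of B plus all of C over a pipe, and sends its block of the result back.

```python
import multiprocessing


def multiply_rows(B_rows, C):
    """Return the rows of B_rows x C as a list of lists."""
    inner, cols = len(C), len(C[0])
    return [[sum(row[k] * C[k][j] for k in range(inner))
             for j in range(cols)] for row in B_rows]


def worker(conn):
    """Worker: receive inputs over the pipe, send the partial result back."""
    B_rows, C = conn.recv()
    conn.send(multiply_rows(B_rows, C))
    conn.close()


def process_matmul(B, C, num_workers=2):
    """Master: distribute row blocks, then assemble the result in order."""
    ctx = multiprocessing.get_context("fork")  # assumption: Unix only
    n = len(B)
    chunk = max(1, (n + num_workers - 1) // num_workers)
    jobs = []
    for start in range(0, n, chunk):
        parent_conn, child_conn = ctx.Pipe()
        proc = ctx.Process(target=worker, args=(child_conn,))
        proc.start()
        child_conn.close()  # the master keeps only its own end
        parent_conn.send((B[start:start + chunk], C))
        jobs.append((proc, parent_conn))
    A = []
    for proc, conn in jobs:
        A.extend(conn.recv())  # blocks are collected in submission order
        proc.join()
        conn.close()
    return A


if __name__ == "__main__":
    print(process_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Compared with a threaded version, every input and output matrix is pickled and copied through a pipe, which is the extra communication the slide refers to.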

7 Ways Ahead
Communication through standard Python container objects favors threads.
Scalability on multiprocessor architectures favors processes.
The GIL is here to stay, so making threads scale better is hard.
However, there might be room for improvement in inter-process communication mechanisms.

8 Using Shared Memory for IPC
[Diagram: Process A and Process B, both attached to a shared memory region]
Processes communicate by accessing a shared memory region.
Requires explicit synchronization and data marshalling, and imposes a flat data structure.
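To make "data marshalling" and "flat data structure" concrete, here is a sketch that uses the standard `multiprocessing.shared_memory` module (Python 3.8 and later, not something available at the time of the talk). A nested list has to be packed by hand into a byte layout chosen for this example: two unsigned 32-bit dimensions followed by row-major 64-bit floats. For brevity the second handle is opened in the same process; another process would attach by name in the same way, and would additionally need a lock, which is omitted here.

```python
import struct
from multiprocessing import shared_memory

HEADER = "<II"  # rows, cols (layout invented for this example)
HEADER_SIZE = struct.calcsize(HEADER)


def write_matrix(buf, matrix):
    """Marshal a list-of-lists matrix into a flat byte buffer."""
    rows, cols = len(matrix), len(matrix[0])
    struct.pack_into(HEADER, buf, 0, rows, cols)
    flat = [float(v) for row in matrix for v in row]
    struct.pack_into("<%dd" % len(flat), buf, HEADER_SIZE, *flat)


def read_matrix(buf):
    """Unmarshal the flat byte buffer back into a list of lists."""
    rows, cols = struct.unpack_from(HEADER, buf, 0)
    flat = struct.unpack_from("<%dd" % (rows * cols), buf, HEADER_SIZE)
    return [list(flat[r * cols:(r + 1) * cols]) for r in range(rows)]


def roundtrip(matrix):
    """Write through one handle, read through a second one by name."""
    size = HEADER_SIZE + 8 * len(matrix) * len(matrix[0])
    writer = shared_memory.SharedMemory(create=True, size=size)
    try:
        write_matrix(writer.buf, matrix)
        reader = shared_memory.SharedMemory(name=writer.name)
        try:
            return read_matrix(reader.buf)
        finally:
            reader.close()
    finally:
        writer.close()
        writer.unlink()  # release the shared segment


if __name__ == "__main__":
    print(roundtrip([[1.0, 2.0], [3.0, 4.0]]))
```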

9 Using POSH for IPC
[Diagram: Process A and Process B access shared objects X, Y and L in shared memory, e.g. X.method1() and L.extend([X, Y])]
Allocates regular Python objects in shared memory.
Shared objects are accessed transparently through regular method calls.
IPC is done by modifying shared, mutable objects.

10 Complications
Processes must synchronize their access to shared, mutable objects (just like threads).
Explicit synchronization of critical regions must be possible, while implicit synchronization upon accessing shared objects is desirable.
Python's regular garbage collection algorithm is inadequate for shared objects, which may be referenced by multiple processes.

11 Proxy Objects
[Diagram: calls such as X.method1() and X.method2() go to the proxy object, which forwards them to the shared object X and passes back the return value]
Provides transparent access to a shared object by forwarding all attribute accesses and method calls.
Provides a single entry point to a shared object, where synchronization policies may be enforced.
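The forwarding idea can be illustrated in plain Python. This is a single-process sketch, not POSH's implementation: the class name `Proxy`, the use of a `threading.RLock`, and the policy of "hold the lock for every access" are all assumptions made for the example. It shows how one entry point lets a synchronization policy be applied to every attribute access and method call.

```python
import threading


class Proxy:
    """Forward attribute access and method calls to a target object,
    holding a lock around each access (a hypothetical policy)."""

    def __init__(self, target, lock=None):
        # Bypass our own __setattr__ so these land on the proxy itself.
        object.__setattr__(self, "_target", target)
        object.__setattr__(self, "_lock", lock or threading.RLock())

    def __getattr__(self, name):
        # Only called for names not found on the proxy itself.
        with self._lock:
            value = getattr(self._target, name)
        if callable(value):
            def synchronized(*args, **kwargs):
                with self._lock:
                    return value(*args, **kwargs)
            return synchronized
        return value

    def __setattr__(self, name, value):
        with self._lock:
            setattr(self._target, name, value)


class Stuff(object):
    pass


if __name__ == "__main__":
    L = Proxy([])
    L.extend([1, 2])        # forwarded to list.extend under the lock
    X = Proxy(Stuff())
    X.money = 10            # forwarded attribute assignment
    print(L._target, X.money)
```

Special methods such as `__len__` are looked up on the type rather than the instance, so a complete proxy would have to define them explicitly; this sketch leaves that out.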

12 Multi-Process Garbage Collection
Must account for references from all live processes.
Must stay up-to-date when processes fork, as this may create new references to shared objects.
Should be able to handle abnormal process termination without leaking shared objects.

13 Garbage Collection in POSH
[Diagram: Process A, Process B and shared memory, with objects X, Y, L and M shown as shared objects and proxy objects]
Legend:
Regular Python reference
Reference from a process to a shared object (type I)
Reference from one shared object to another (type II)

14 Garbage Collection Details
POSH creates at most one proxy object per process for any given shared object.
Shared objects are always referenced through their proxy objects.
A bitmap in each shared object records the processes that have a corresponding proxy object. This tracks references of type I (from a process).
A separate count in each shared object records the number of references to the object from other shared objects. This tracks references of type II.
Shared objects are deleted when there are no references to them of either type.
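The bookkeeping described on this slide can be modelled in a few lines. The following is a toy, single-process model under stated assumptions, not POSH's actual data layout: the class and method names are invented, processes are identified by small integer indices, and the fork and exit handlers are one plausible reading of the requirements on the "Multi-Process Garbage Collection" slide.

```python
class SharedObjectHeader:
    """Toy model of the per-object reference bookkeeping."""

    def __init__(self):
        self.proxy_bitmap = 0  # bit i set: process i has a proxy (type I)
        self.shared_refs = 0   # references from other shared objects (type II)

    # Type I references: at most one proxy per process, so one bit suffices.
    def proxy_created(self, process_index):
        self.proxy_bitmap |= 1 << process_index

    def proxy_deleted(self, process_index):
        self.proxy_bitmap &= ~(1 << process_index)

    def process_forked(self, parent_index, child_index):
        # Assumption: a forked child inherits its parent's proxies.
        if self.proxy_bitmap & (1 << parent_index):
            self.proxy_bitmap |= 1 << child_index

    def process_exited(self, process_index):
        # Clearing the bit covers abnormal termination as well.
        self.proxy_deleted(process_index)

    # Type II references: a plain counter.
    def shared_ref_added(self):
        self.shared_refs += 1

    def shared_ref_dropped(self):
        self.shared_refs -= 1

    def is_garbage(self):
        """True when no reference of either type remains."""
        return self.proxy_bitmap == 0 and self.shared_refs == 0


if __name__ == "__main__":
    h = SharedObjectHeader()
    h.proxy_created(0)
    h.process_forked(0, 1)
    h.proxy_deleted(0)
    print(h.is_garbage())   # process 1 still holds a proxy
    h.process_exited(1)
    print(h.is_garbage())
```

A bitmap rather than a counter for type I references means a terminated process can be cleaned up by clearing a single bit, without knowing how many references it held.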

15 Performance
Performing a matrix multiplication A = B x C using POSH.
The work is split between several worker processes.
The application runs on a machine with 8 CPUs.
More overhead, but scales across multiple CPUs.

16 Summary
Python uses a global interpreter lock (GIL) to serialize execution of byte codes.
This entails a lack of scalability on multiprocessor architectures for CPU-intensive multi-threaded applications.
However, threads offer an attractive programming model, with implicit communication.
Processes + shared memory reduce IPC overheads, but normally impose flat data structures and require data marshalling.
POSH uses processes + shared memory to offer a programming model similar to threads, with the scalability of processes.

17 Availability
Open source, hosted at SourceForge.
Still not very stable.
Developers wanted.

18 Example Usage

    import posh

    class Stuff(object):
        pass

    posh.allow_sharing(Stuff, posh.generic_init)
    mystuff = posh.share(Stuff())

    def worker1():
        mystuff.money = 0

    def worker2():
        mystuff.debt =

    def worker3():
        mystuff.balance = mystuff.money - mystuff.debt

    for w in worker1, worker2, worker3:
        posh.forkcall(w)
    posh.waitall()

