1 StreamComponents: Component-based stream processing “in the cloud”.
Andrew Wendelborn, Paul Martinaitis, Craig Patten. School of Computer Science, University of Adelaide, South Australia.

2 Introduction StreamComponents: Stream processing using components.
Stream comprises a stream generator and transducers, e.g. SG1 → f2 → f3 → f4 → f5. The stream is to be highly re-configurable. In particular:
- Compute codes can be added/changed by the Client at any time.
- StreamComponents can be deployed, configured and controlled, via Web Services (WS), on a remote cloud.
- Reflective components serve as a basis for reconfigurability and adaptation.
Any serializable object can be a stream-value. This separates the topology of a Stream from its functional aspects.

3 Image-pipeline Example
Stream processing is common in data analysis and important in scientific workflow. We will use an image-processing pipeline as the example stream throughout the talk: MIS → IF1 → IF2 → IF3 → Consumer, where MIS draws images from an Image Store.
Compute codes were written using JMagick, a JNI wrapper library for the C image-manipulation library ImageMagick.
Several transducer functions have been written for the StreamF components. Each applies a single ImageMagick function such as Blur, OilPaint, Rotate90 etc.
The compute code for the Stream component is MultiImageStream. For each invocation of car(), this class returns an image which is loaded from the filesystem.
A simple application is used to consume the images one by one.
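As a sketch of this transducer pattern (not the project's actual JMagick code), the compute code for one stage is just a class implementing the single-method UnaryF interface; here an int matrix stands in for the MagickImage that the real compute codes manipulate:

```java
// Sketch of a transducer compute code. The UnaryF interface follows the
// slides, but the int[][] "image" is a stand-in for the JMagick
// MagickImage that the real transducers process via ImageMagick.
interface UnaryF {
    Object apply(Object x);
}

public class Rotate90 implements UnaryF {
    // Rotate a square pixel matrix 90 degrees clockwise.
    public Object apply(Object x) {
        int[][] img = (int[][]) x;
        int n = img.length;
        int[][] out = new int[n][n];
        for (int r = 0; r < n; r++)
            for (int c = 0; c < n; c++)
                out[c][n - 1 - r] = img[r][c];
        return out;
    }
}
```

Because every stage has this one-argument shape, a stage's behaviour can be swapped (Blur for Rotate90, say) without touching the stream topology.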

4 Image pipeline running on a local cluster
The stream controller is independent of the application code in the stream.

5 Outline of Talk
Objective: explore reflective component technologies as a basis for interaction with streams on a remote cloud.
Supporting technologies:
- ProActive: distributed objects; asynchronous communication
- Fractal / GCM components
- Science Clouds / Nimbus
Remote deployment and management of a stream on a cloud: the Web Service Stream Deployer (WSSD).
Image-pipeline example using the WSSD.
Performance measurements.
Latency tolerance in cloud-based streams.
Conclusions and future work.

6 ProActive
A 100% Java middleware for creating and managing active objects. ProActive uses a MOP to transparently enable the following properties on active objects:
- Location transparency.
- Migration between ProActive nodes.
- Asynchronous method invocations with futures-based synchronisation.
- Remote deployment and monitoring via a GUI-based tool: IC2D.
ProActive provides an implementation of the Fractal component model: Fractive, or Grid Components.

7 Fractal Components
Fractal is a general (language-independent) component model. A component consists of a Content surrounded by a membrane.
The Content implements the functionality of the component; a component provides values to other components via the methods of its Content, and obtains values from another component by invoking a method provided by a bound server interface.
The membrane consists of:
- Functional interfaces: allow transfer of information between bound components. Binding is from client to server.
- Control interfaces: these reflective interfaces control non-functional aspects such as binding (BC), lifecycle (LC) and attribute control (AC).
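A minimal sketch of the binding idea in plain Java (not the real Fractal API; the interface names and the bindFc-style method here are illustrative): a component's client interface is bound, by name, to another component's server interface, through which it then pulls values.

```java
// Conceptual sketch of Fractal-style binding, not the real Fractal API.
// StreamSource plays the server interface; TransducerComp's "upStream"
// field is its client interface, wired by a BindingController-style call.
interface StreamSource {
    Object car();                   // server interface: supply next value
}

class ConstComp implements StreamSource {
    public Object car() { return 42; }
}

public class TransducerComp implements StreamSource {
    private StreamSource upStream;  // client interface, set by binding

    // Reflective, name-based binding in the style of Fractal's
    // BindingController.bindFc(itfName, serverItf).
    public void bindFc(String itfName, Object serverItf) {
        if ("upStream".equals(itfName)) upStream = (StreamSource) serverItf;
    }

    public Object car() {           // functional (server) interface
        return (Integer) upStream.car() + 1;
    }
}
```

Binding runs from client to server: after bindFc("upStream", new ConstComp()), a call to the transducer's car() pulls 42 upstream and yields 43.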

8 The Stream Components – Interfaces
Control interfaces handle the non-functional aspects of the components: LC (LifeCycle Controller), BC (Binding Controller) and Stream(F)CompAttributes.
[Diagram: the Stream and StreamF components, showing the StreamSource server interface (car()), the StreamSink client interface (upStream), and the compute code implementing the unaryF interface.]
A server interface, StreamSource, is provided to supply the next value in the stream. It consists of a single method: public Object car();
The StreamF component is a stream transducer and must therefore obtain upstream values. It does this with a client interface: StreamSink.

9 The Stream Components – Compute Code
[Diagram as on the previous slide.]
Example: a transducer which multiplies integer stream values by ten.

public class TimesTen implements UnaryF {
    public Object apply(Object x) {
        int newValue = ((Integer) x).intValue() * 10;
        return new Integer(newValue);
    }
}

Example: an integer stream.

public class Ints implements Stream {
    private static int i = 0;
    public Object car() { return new Integer(++i); }
}
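Wiring the two compute codes together shows the demand-driven behaviour: nothing is computed until car() is invoked, and each invocation pulls exactly one value upstream. (The wiring below is a plain-Java sketch; in the real system the components are bound through their StreamSink / StreamSource interfaces.)

```java
// Plain-Java sketch of demand-driven evaluation: each demand on the
// pipeline pulls one value upstream and transforms it.
interface Stream { Object car(); }
interface UnaryF { Object apply(Object x); }

class Ints implements Stream {
    private int i = 0;
    public Object car() { return ++i; }
}

class TimesTen implements UnaryF {
    public Object apply(Object x) { return (Integer) x * 10; }
}

public class DemandDemo {
    public static void main(String[] args) {
        Stream ints = new Ints();
        UnaryF f = new TimesTen();
        for (int k = 0; k < 3; k++)
            System.out.println(f.apply(ints.car())); // prints 10, 20, 30
    }
}
```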

10 Generic Stream Component Image pipeline on a local cluster
The Stream (red) component supplies values; the StreamF (green) components are transducers.
The figure shows StreamComponents (MIS, IF1 .. IF3) distributed across three hosts (H1 .. H3).
The application (via SC_GUI) controls deployment and reconfiguration: compute codes and/or the topology can be changed, e.g. SG1 f2 f3 f4 f5 might become SG1 f2 g1 g2.
Location transparency of Fractive components handles migration of components between cluster nodes, as well as communications between remote components and the local program and SC_GUI.

11 Generic Stream Component Image pipeline on a local cluster
Fractive also transparently facilitates remote communications between bound components.
Compute codes can be set/changed via the SC_GUI on the local host; Fractive bindings allow the byte-code to be transparently uploaded to the remote components.

12 Image pipeline running on a local cluster
public class ImagePipeline {
    public static void main(String[] args) {
        StreamComponentFactory SCF = new StreamComponentFactory();
        SingleImageStream SIS = new SingleImageStream();
        unaryF R90 = new IMRotate90();
        unaryF Blur = new Blur();
        SISComp = SCF.createNewStreamComponent("StreamGen", SIS);
        IF1 = SCF.createNewStreamFComponent("FuncComp1", R90);
        // ... etc.
        IF4 = SCF.createNewStreamFComponent("FuncComp4", Blur);
        SCF.bind(SISComp, IF1);
        SCF.bind(IF1, IF2);
        // etc.
        StreamEater SE = new StreamEater(SCF.getStreamInterface(IF4));
    }
}

13 Stream Components on a Cloud
When deploying on a remote cloud, the intervening network is the Internet. Multiple 'remote object references' between the StreamComponents and the local host are not appropriate in the high-latency environment of the Internet. The inter-component (Fractive) bindings between the remote StreamComponents present no problems, as they are between cloud hosts.

14 The Web Service Stream Deployer (WSSD)
The WSSD allows creation, monitoring and reconfiguration of a stream on a remote cloud via a single WS interface:
- The cloud is presented as a single WSSD WebService, accessed via a local proxy object; all interactions from the local host occur through this proxy.
- Communications over the Internet are consolidated onto a single WS channel.
- Text-based handles replace object references.
- Inter-component communications still occur using Fractive remote references; only the WSSD on the remote cloud has direct access to the components.
- Different Scheduler components can be plugged in to provide different deployment strategies, transparent to the user.
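The "text handles replace object references" idea can be sketched as a registry on the WSSD side: only the WSSD holds direct component references, and clients name components by opaque strings over the single WS channel. (The deploy/getValue names below are illustrative, not the actual WSSD interface.)

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of text-based handles: the WSSD side keeps the only direct
// references, while clients interact purely through string handles.
// deploy/getValue are illustrative names, not the real WSSD operations.
public class WssdSketch {
    interface Stream { Object car(); }

    private final Map<String, Stream> components = new HashMap<>();
    private int nextId = 0;

    // "Cloud side": register a component, hand back a text handle.
    public String deploy(Stream comp) {
        String handle = "comp-" + nextId++;
        components.put(handle, comp);
        return handle;
    }

    // "Client side": every interaction carries only the handle.
    public Object getValue(String handle) {
        return components.get(handle).car();
    }
}
```

Because a handle is just text, it travels cheaply inside a single WS message, avoiding per-object remote references across the high-latency Internet link.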

15 Experimental Infrastructure: Cloud and local
We use the Nimbus Cloud at the University of Chicago: an IaaS private cloud, scalable to Amazon EC2.
Physical hardware: a 16-node subset of the (137-node) Teraport cluster; each node has two 2.2 GHz AMD64 processors and 4 GB RAM.
Cloud resources are provided as Xen virtual machines deployed on the above, from sample or custom images; we use a standard Ubuntu Linux image with ProActive etc. installed.
Cloud configuration is managed via the Nimbus cloud-client, a command-line utility for creation, monitoring and configuration of nodes. Nodes are accessible via ssh.
The local host used in the experiments is Zen: a Core2 Duo 2.2 GHz, 2 GB RAM laptop. We also have additional machines forming a local homogeneous cluster:
- Orac: PentiumD 3.4 GHz, 2 cores, 2 GB RAM
- D17: Core2 2.2 GHz, 4 cores, 4 GB RAM
Interconnection is via the University of Adelaide's LAN (100 Mbit Ethernet).
Now we show an example of using the WSSD ...

16 Running Image-pipeline using WSSD
Initially, components have no compute code; it must be set with the SC_GUI. We use the 'Load Class' button to specify compute code. For the Stream component, we select "MultiImageStreamMig"; this class will load a sequence of images from a specified location in the filesystem of the component's host (in this case the cloud host).

17 We have similarly loaded a StreamIdentity compute class into the three StreamF components.
We can then obtain the first stream value (unchanged image)…

18 We have now changed the StreamF compute classes to perform two 90 degree rotations plus a Blur.
The next stream value shows the same image with the net result of three image operations …

19 Finally, we have reconfigured the stream to work with JAMA matrices.
The Stream component now supplies a stream of diagonal matrices. A StreamF transducer component squares the matrix … We obtain the third value from the stream …

20 Eager Evaluation
The stream model presented so far provides a convenient means of creating and configuring stream computation on a remote cloud. However, the underlying demand-driven semantics can be improved, because often we know that all stream values will be needed, yet some components will block waiting for earlier stream values.
We can improve performance thus:
- Create a communication path for unimpeded asynchronous demand flow;
- And a separate path for the flow of return values;
- Allowing full exploitation of the pipelining parallelism inherent in the stream structure.
We have implemented such mechanisms for:
- Demand and value flow between the stream components themselves: sufficient when stream source and consumer are both on the cloud.
- Separating demands from values when returning values from the remote cloud to the local host: a WS termed ESVS (used when the stream consumer is local).
- A similar separation when uploading values from local to remote: a WS termed WSVS (used when the stream source is local).
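The benefit of separating demand flow from value flow can be sketched with a bounded queue: a producer thread computes values eagerly, ahead of demand, while the consumer drains the value path concurrently, so computation overlaps transfer. (A shape sketch only; ESVS and WSVS are web services, not in-process queues.)

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class EagerStage {
    // One eager stage: the producer thread squares 1..n ahead of demand
    // into a bounded queue (the value path); the caller drains it,
    // overlapping its consumption with the producer's computation.
    static List<Integer> run(int n) {
        BlockingQueue<Integer> values = new ArrayBlockingQueue<>(4);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) values.put(i * i);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        producer.start();
        List<Integer> out = new ArrayList<>();
        try {
            for (int i = 1; i <= n; i++) out.add(values.take());
            producer.join();
        } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(5)); // [1, 4, 9, 16, 25]
    }
}
```

The bounded capacity models the pipeline depth: the producer can run at most a few values ahead, which is exactly the window in which transfer cost is hidden behind computation.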

21 Performance
We measured the Image-pipeline as before, but with eager components. All transducer functions were set to Blur. We varied the number of Blur stages from one to five, each stage deployed on a separate cloud host. Speedup was calculated against an equivalent non-StreamComponent Java program using the ImageMagick functions directly.
Four data series were used, each consisting of 35 images:

Series     Type   Resolution   Size
950_TIFF   TIFF   950x950      1.5 MB
2100_TIFF  TIFF   2100x2100    6.8 MB
3000_TIFF  TIFF   3000x3000    13 MB
3000_JPG   JPEG   3000x3000    -

22 Speedup vs. Serial Implementation of Eager Image Pipeline
Both stream source and consumer are on the cloud. Positive speedup begins at two stages, then rises linearly to a maximum of 3.3 for five stages. Speedup is largely independent of the image size; efficiency peaks at 66% (3.3/5) for five stages.
The size-independence of the speedup suggests that the limiting factor, at least for the test data used here, is the overhead intrinsic to the StreamComponent implementation; specifically the costs of serialization / de-serialization.

23 Performance of stream using ESVS and WSVS
Both stream source and consumer are on the localhost, so both ESVS and WSVS are used.
The highest speedup is obtained with 3000_JPG, which has the largest computation/communications ratio; this performance exceeds that of the 3000_TIFF series with a cloud-based consumer (previous graph).
The 3000_TIFF series still performs well, but its speedup (2.9) is slightly lower than with a cloud-based consumer (3.3): using ESVS and WSVS to transfer values cost a 12% drop in speedup for this series.
As expected, 950_TIFF performs worst, having the lowest computation/communications ratio.

24 Network traffic between cloud and local-host during stream execution
We saw that the cost of using the ESVS + WSVS to transfer the data was a 12% drop in speedup for the 3000_TIFF series. This is an excellent result showing that most of the transfer costs have been masked through overlap with computation. Not using ESVS + WSVS would require separately: uploading the raw images, processing them, then downloading them back to the local-host. This would negate most of the speedup gained by using the eager stream.

25 Network traffic between cloud and local-host during stream execution
At this point, the pipeline begins to fill, resulting in values being transmitted, via WSVS, to the remote cloud. Once the pipeline is full, the first values are sent back to the consumer; in this region, there is overlap between the transmission and receiving of values and the computation in the pipeline.

26 Network traffic between cloud and local-host during stream execution
Last value is received by consumer Pipeline is now empty, so computation halts whilst last values are transmitted back to consumer

27 Summary
Reviewed the basic StreamComponent model:
- Stream and StreamF: stream generators and transducers.
- Demand-driven evaluation.
- Compute code can be reconfigured via the SC_GUI.
- Streams are very general: any serializable class can be a stream value.
Demonstrated the WSSD as a means to create, control and re-configure streams on a remote cloud.
Demonstrated how stream evaluation can be made more eager, with a useful amount of positive speedup.
Described mechanisms (ESVS and WSVS) for incremental streaming from the local node, through stream processing on the cloud, and back to the local node, facilitating the uploading of raw data. Results show very useful potential for latency hiding.
Future work: a sound basis for an automated adaptive stream controller; explore the dynamic properties inherent in streams and in components, together with the elasticity of resource provisioning in the cloud.


