Experimenting with Openstack Sahara on Docker

1 Experimenting with Openstack Sahara on Docker
Weiting Chen

2 Legal Disclaimers No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade. This document contains information on products, services and/or processes in development.  All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps. The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request. © 2015 Intel Corporation.

3 Agenda
Background
How to use Docker with Sahara
Performance Testing
Conclusion

4 Who We Are
We are from the Intel Big Data Technology Group.
We push big data technology forward into OpenStack.
We contribute source code to Sahara in OpenStack and brought the Cloudera CDH 5.3 plugin to Kilo.

5 Sahara Background
Sahara became a core project in Juno
  Brings Hadoop into OpenStack
  More features added in the Kilo release
Two key features
  Lets users easily provision Hadoop clusters by specifying a few parameters
  Analytics as a Service for data scientists and analysts

6 Sahara Key Features - Provision Cluster
Create/Terminate Cluster
  Heat API/Nova Direct API
  Integrate with Neutron/Nova Network
  Use Guide as a template
  Anti-affinity
Cluster Scaling
  Add Node/Remove Node
Support More Plugins in Kilo
  Vanilla/Hortonworks Data Platform/Cloudera/Spark/MapR/Storm

7 Sahara Key Features - Elastic Data Processing
Supported Job Types
  Hive/Pig/MapReduce/MapReduce Streaming/Java/Spark/Shell/HBase
Data Locality Support
  Rack/Hypervisor/Swift
Data Sources
  Internal: Internal HDFS (Ephemeral Disk/Cinder)
  External: Swift/HDFS
Run Jobs in a Transient Cluster
*Different plugins provide different capabilities

8 Sahara Working Flow
Fast Cluster Provisioning
  Flow: Select Hadoop Version > Select Base Image w/ Hadoop > Define Cluster Configuration > Provision Cluster > Operate Cluster > Terminate Cluster
  Provide the details of the Hadoop configuration, such as size, topology, and others
  Sahara will provision VMs, then install and configure Hadoop
  Supports scaling the cluster by adding/removing nodes
Analytics as a Service using Elastic Data Processing
  Flow: Select Hadoop Version > Configure Jobs > Set Limit for Cluster > Execute Jobs > Get the Result
  Choose the type of job: pig, hive, jar-file, etc.
  Select input and output data locations (Swift support)
  The cluster is removed automatically after job completion

9 Sahara Data Processing
Pattern 1: Internal - HDFS Only
  OpenStack supports creating HDFS on Cinder or Ephemeral Disk. Ephemeral Disk gives better data processing performance; Cinder persists the data, at lower performance.
Pattern 2: External - Swift
  OpenStack uses Swift as a data source to store input and output data. The benefit is that data is processed directly and persisted via Swift.
OpenStack also supports external HDFS, but this requires some manual configuration.
[Diagrams: in both patterns, collector agents collect data and MapReduce runs in virtual clusters on OpenStack. Storage is HDFS on Cinder or Ephemeral Disk in Pattern 1, and Swift (with a data stream) in Pattern 2.]

10 Docker Background
An open source project; the latest version is v1.6
Automates the deployment of applications inside software containers
Provides speed and application portability
Uses the libcontainer library to access virtualization facilities in the Linux kernel
Resource isolation using cgroups, kernel namespaces, etc.

11 Sahara + Docker
Deliver better performance (compared with hypervisors)
Optimize resource utilization
Reduce cost
Fast deployment

12 Sahara Architecture
[Architecture diagram. Components shown: Horizon (Sahara Pages), Python Sahara Client, Sahara (REST API, Auth, Vendor Plugins, EDP, Provisioning Engine, DAL, Image Registry), Keystone, Neutron, Nova|Heat|Cinder, Glance, and three Hadoop VMs.]

13 Sahara + Docker Architecture
[Architecture diagram. Components shown: Horizon (Sahara Pages), Python Sahara Client, Sahara (REST API, Auth, Vendor Plugins, EDP, Provisioning Engine, DAL, Image Registry), Keystone, Neutron, Nova|Heat|Cinder, Glance, four Hadoop VMs, plus the Docker additions: Docker, Docker Image, nova docker driver, and Docker Registry.]

14 Sahara CDH Plugin
[Diagram, with STEP1 to STEP3 markers. Components shown: End Customer; Horizon (OpenStack Dashboard); Controller running the Sahara Service with the CDH Plugin and the Cloudera Manager API Python Client (migrated from the CM-API Client); Computing Node1 hosting a CDH Cluster with VM1 - Master and VM2 - Slave. Services shown: Cloudera Manager (Cloudera Express v5.1.3, CDH v5.0.0 & CM API v7), Name Node, Secondary Name Node, Resource Manager, Job History, Oozie Server, Data Node, Node Manager.]
Step 1: Create VMs via Heat using a Cluster Template. CM must be included on one master machine.
Step 2: Use the CM API Client to connect to CM and provision the other services in the cluster.

15 Nova Docker Driver
Introduced in Havana; moved out of the Nova tree for Icehouse and Juno
For Juno, an older version of novadocker must be installed
  # git checkout -b pre-i18n 9045ca43b645e bf5f4f9e4bddbb91
  Implements a RESTful client via httplib to communicate with Docker
For Kilo (upstream), docker-py must be installed
  Uses the Docker API client to communicate with Docker

16 Authentication & Hostname Issue
Use a username and password instead of injecting an authorized key into the instance
  There is no cloud-init in the Docker image, so key injection is not available
Upgrade Docker to a version that supports changing the hostname
  Docker v1.2 or later supports changing the hostname
Change “sudo mv etc-host /etc/hosts” to “sudo cp etc-host /etc/hosts”
  Docker v1.3 reports that the device is busy when “mv” is used; replacing “mv” with “cp” lets the change succeed

17 Network Port Issue
Enable privileged mode to expose all the ports in the container
Modify the nova docker driver source code to add “privileged=True” and publish all ports
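
To make the "privileged=True and publish all ports" change more concrete, here is a minimal sketch of what those two settings look like in the body of a Docker Remote API container-create request. The `HostConfig` fields `Privileged` and `PublishAllPorts` are real Docker API fields; the helper name `build_create_payload`, the image name, and the hostname are hypothetical, and this is not the actual patch applied to the nova docker driver.

```python
import json


def build_create_payload(image, hostname, privileged=True, publish_all_ports=True):
    """Build a container-create request body (hypothetical helper, for illustration)."""
    return {
        "Image": image,
        "Hostname": hostname,
        "HostConfig": {
            # Give the container extended privileges on the host.
            "Privileged": privileged,
            # Publish every port the image EXPOSEs to the host.
            "PublishAllPorts": publish_all_ports,
        },
    }


# Hypothetical image and hostname, for illustration only.
payload = build_create_payload("example/cdh5", "cdh-master-0")
print(json.dumps(payload, indent=2, sort_keys=True))
```

Privileged mode is a blunt instrument; the deck uses it for convenience in an experiment, not as a recommended production setting.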

18 Docker Image
Build a Docker image using a Dockerfile
  Refer to sahara-image-elements to build a CDH5 Docker image
  Building a Docker image may take a lot of time (trial and error)
  Use the Dockerfile build cache to reduce image build time
Copy the Docker image to every compute node manually
  The image must be copied to all compute nodes; currently Glance cannot copy the image to compute nodes
  If the image is not listed in docker images, nova raises an error when starting an instance

19 Build Docker Image - using Dockerfile
Use docker build to build an image from a Dockerfile
  # docker build -t $image_name:$tag
Dockerfile example
  FROM centos:centos6
  MAINTAINER Weiting Chen
  ENV http_proxy
  RUN echo 'proxy= >> /etc/yum.conf
  RUN yum install -y cloudera-manager-agent
  …
  EXPOSE 21
Notes on the example
  Add ENV variables at the beginning
  Add proxy settings in each individual software configuration
  Install required software
  Expose required service ports

20 Register & Copy Docker Image to Compute Nodes
Register the Docker image with Glance
  # docker save cdh5: | glance image-create --is-public=True --container-format=docker --disk-format=raw --name cdh5:
Copy the image to all compute nodes
  # scp cdh5: tar $compute_node:./
Load the image into Docker on each compute node
  # docker load -i cdh5: tar
If no usable image is present on a compute node, nova raises an error.
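
Since the copy and load steps above have to be repeated by hand on every compute node, a small script can generate them. The following is a sketch only: it builds the `scp` and `docker load -i` command lines per node without executing anything. Running `docker load` over `ssh` is an assumption (the slide does not say how the load step is run remotely), and the function name, tarball name, and host names are hypothetical.

```python
import os


def build_distribution_commands(tarball, compute_nodes):
    """Return argv lists that copy an image tarball to each node and load it.

    Nothing is executed here; each list could be passed to subprocess.run().
    """
    remote_name = os.path.basename(tarball)  # scp drops the file in the home directory
    commands = []
    for node in compute_nodes:
        # Mirrors the manual steps: copy the tarball, then load it into Docker.
        commands.append(["scp", tarball, f"{node}:./"])
        # Assumption: the load step is run remotely over ssh.
        commands.append(["ssh", node, "docker", "load", "-i", remote_name])
    return commands


# Hypothetical tarball and host names, for illustration only.
for argv in build_distribution_commands("/tmp/image.tar", ["compute1", "compute2"]):
    print(" ".join(argv))
```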

21 Nova Docker Driver Network
Set the network to “none”
  The nova docker driver leverages the existing network configuration from Neutron
  Supports Linux Bridge or OVS
  Does NOT use docker0
Use VXLAN in our experiment
  A bridge to OVS is created automatically
Set privileged mode to True for convenience
  Port mappings must be set during docker run if privileged mode is not used

22 Docker Network
Bridge Mode (default mode): supports multiple namespaces
Host Mode: only one namespace
None Mode: the Nova Docker driver uses this; the driver configures the network and connects it to a bridge
[Diagram: three containers per mode, shown with the docker0 bridge and the eth0/eth1 interfaces.]

23 Docker Network Performance
Background: OpenStack Juno using VXLAN; Docker v1.3; 1Gb Ethernet
Host to host (physical network): 941 Mb/s
Container to a different host (via br-ex, floating IP): 941 Mb/s
Container to the same host (via br-ex, floating IP): 941 Mb/s; 14 Gb/s with DVR
Container to container on different hosts (via br-tun): 900 Mb/s
Container to container on the same host (via qbr~): 14 Gb/s
Container to the same host can be better
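
The 941 Mb figure is worth a quick sanity check: it matches the theoretical ceiling for a single TCP stream on 1Gb Ethernet, which suggests the host-to-host and floating-IP paths were running at line rate. The sketch below works through that arithmetic. It assumes a standard 1500-byte MTU, IPv4, and TCP timestamps enabled; the slides do not state the measurement tool or these settings, so treat them as assumptions.

```python
LINK_RATE_MBPS = 1000                # 1Gb Ethernet
MTU = 1500                           # assumed standard Ethernet MTU, in bytes
ETH_WIRE_OVERHEAD = 8 + 14 + 4 + 12  # preamble/SFD, header, FCS, inter-frame gap
IP_HEADER = 20                       # IPv4 header without options
TCP_HEADER = 20                      # TCP header without options
TCP_TIMESTAMP_OPTION = 12            # assumption: TCP timestamps are enabled


def tcp_goodput_mbps(link_rate_mbps=LINK_RATE_MBPS, mtu=MTU):
    """Maximum TCP payload rate for full-size frames on an Ethernet link."""
    payload = mtu - IP_HEADER - TCP_HEADER - TCP_TIMESTAMP_OPTION
    wire_bytes = mtu + ETH_WIRE_OVERHEAD
    return link_rate_mbps * payload / wire_bytes


print(round(tcp_goodput_mbps(), 1))  # 941.5
```

By the same reasoning, the 900 Mb result for containers on different hosts is consistent with the extra per-packet cost of VXLAN encapsulation on the br-tun path.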

24 Neutron VXLAN without DVR
[Network diagram. Controller/Network Node: qrouter~, snat-, and qdhcp namespaces (qg~, qr~, sg~, and tap ports), br-ex on eth1, br-int, patch-int/patch-tun, br-tun on eth2. Compute Node: VM vm0 attached through tap~, qbr~, and qvb~/qvo~ to br-int, then patch-int/patch-tun to br-tun on eth2. Subnets are labelled /16.]

25 Neutron VXLAN with DVR
[Network diagram. Controller/Network Node: qrouter~, snat-, and qdhcp namespaces (qg~, qr~, sg~, and tap ports), br-ex on eth1, br-int, patch-int/patch-tun, br-tun on eth2. Compute Node: its own br-ex on eth1 with fip- and qrouter~ namespaces (fg~, fpr~, rfp~, and qr~ ports), and VM vm0 attached through tap~, qbr~, and qvb~/qvo~ to br-int, then patch-int/patch-tun to br-tun on eth2. Subnets are labelled /16.]
DVR improves “Container to the same Host” performance from 941Mb to 14Gb.

26 Change MTU Size
Change the MTU size if you are using VXLAN
Impact: MTU size can affect network performance. If the MTU size is not changed, creating instances still works, but network performance drops to about 1MB.
Solution: Change the MTU size in the VM
  # sudo ifconfig eth1 mtu 1400 up
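
The reason the MTU matters is that VXLAN wraps each inner Ethernet frame in extra outer headers, so a full-size inner packet no longer fits in the physical network's MTU. A short sketch of the arithmetic, assuming an IPv4 underlay and a 1500-byte physical MTU (neither is stated on the slide):

```python
OUTER_IPV4 = 20      # outer IPv4 header (assumption: IPv4 underlay)
OUTER_UDP = 8        # outer UDP header
VXLAN_HEADER = 8     # VXLAN header
INNER_ETHERNET = 14  # inner Ethernet header carried inside the tunnel


def max_inner_mtu(physical_mtu=1500):
    """Largest instance MTU that fits in one underlay packet without fragmentation."""
    overhead = OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET
    return physical_mtu - overhead


print(max_inner_mtu())  # 1450
```

Under these assumptions the limit is 1450 bytes, so the 1400 used in the slide leaves some headroom.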

27 Container Disk Space
The default image disk space is only 10 GB
Impact: The HDFS configuration reserves 10GB by default, so no space is left to put data in HDFS
Solution: Pass parameters when starting the Docker service
  # sudo ./docker -d --storage-opt dm.basesize=20G --storage-opt dm.loopdatasize=200G &
*For the parameters to take effect, clean up /var/lib/docker/ and restart Docker
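
As a rough sizing aid for the two storage options above: `dm.basesize` caps the size of each container's device, while `dm.loopdatasize` sets the size of the loopback data file backing the shared pool. The sketch below computes a worst-case container count if every container fills its base device. It deliberately ignores thin provisioning and shared image layers, so real capacity is usually higher; the function name is hypothetical.

```python
def max_full_containers(pool_size_gb, base_size_gb):
    """Worst case: how many containers fit if each one fills its base device."""
    if base_size_gb <= 0:
        raise ValueError("base size must be positive")
    return pool_size_gb // base_size_gb


# Values from the slide: dm.loopdatasize=200G, dm.basesize=20G.
print(max_full_containers(200, 20))  # 10
```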

28 vCPU Numbers
The number of vCPUs is always reported as 1.
Impact: vCPU calculations may fail.
Solution: In Juno, change the number in the nova docker driver source code and set it equal to the number of physical cores.

29 Docker in OpenStack Performance
Network Performance
Instance Boot/Cluster Provision
Disk Performance using DD
HiBench Testing

30 Our Testing Environment
CLUSTER CONFIGURATION (Role: Details)
  Controller w/ Compute x 1: Controller, Network, Compute
  Compute x 5: Compute
HARDWARE CONFIGURATION
  CPU: Intel Xeon X Ghz
  Memory: 64GB (1333Mhz 8GB x 8)
  Storage: 1TB SATA HDD
SOFTWARE CONFIGURATION (Software Name: Version)
  CentOS: 7.0
  Docker: v1.6
  OpenStack: Juno

31 Create an instance/Provision a cluster
Assume the image has already been copied to all the compute nodes.
Create an instance and check the log to capture the response time.
  Docker: 1 sec
  KVM: 10 sec
Provisioning a CDH cluster still takes a long time; this issue comes from the Sahara CDH plugin.

32 DD Test
Docker container uses CentOS 6.6 on a host running CentOS 7. File system is XFS.
DD command: dd if=/dev/zero of=test1 bs=1M count=8192 conv=fdatasync
Host: 140~160MB/s
Host w/ OpenStack: 100~130MB/s (Controller), 140~160MB/s (Compute)
Container result: 100~140MB/s
Docker can provide disk IO performance close to bare metal
Result from another test machine using SSD (Write Through): Host: 180MB/s 1.3GB/s VM(Ubuntu): 111MB/s 107MB/s
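
To put the dd numbers in perspective, the command writes 8192 blocks of 1 MiB, and `conv=fdatasync` makes dd flush the data to disk before it reports a rate. The sketch below converts the reported throughput range into approximate run times. It assumes GNU dd's convention of reporting MB/s in units of 10^6 bytes; the function names are hypothetical.

```python
BLOCK_SIZE = 1024 * 1024  # bs=1M (1 MiB)
COUNT = 8192              # count=8192


def total_bytes():
    """Total data written by the dd command in the test."""
    return BLOCK_SIZE * COUNT


def seconds_at(rate_mb_per_s):
    """Approximate run time at a given rate (MB = 10**6 bytes, as GNU dd reports)."""
    return total_bytes() / (rate_mb_per_s * 1e6)


print(total_bytes())  # 8589934592
print(round(seconds_at(140), 1), round(seconds_at(160), 1))
```

So each run on the host takes roughly a minute, long enough for the result to reflect sustained disk throughput.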

33 Conclusion
Docker speeds up booting large numbers of instances
Docker provides good disk and network performance with little overhead
How to optimize resource utilization will be the focus

34 Call-For-Action
Contribute more to Docker and OpenStack
Find the critical components for Big Data on Cloud and make them better
More customer use cases for Sahara are needed
Contact:
