
OpenShift as a cloud for Data Science


1 OpenShift as a cloud for Data Science
Yegor Maksymchuk, SoftServe, Ukraine

2 Agenda
Kubernetes
OpenShift
Kubernetes vs OpenShift
Apache Spark
Radanalytics and OSHINKO

3 whoami
Yegor Maksymchuk, Software engineer at SoftServe, Ukraine
GitHub: YegorMaksymchuk LinkedIn: ymaksymchuk

4 OSHINKO: Problems
Integration with Apache Spark in OpenShift.
Usability at the UI and API level: it should be easy to use.

5 Data Science in the Cloud

6 Kubernetes
Kubernetes is a Greek word meaning helmsman or pilot. It schedules, organizes, runs, and manages containers in a cluster of virtual or physical machines. The main thing about Kubernetes is its declarative approach. Kubernetes was started by Google in 2014, building on over 10 years of experience with its internal container platform, called Borg. It was first released officially in July 2015, and Google donated Kubernetes to the Cloud Native Computing Foundation. It is 100 percent open source and written in Go.

7 Kubernetes: Pod
apiVersion: v1
kind: Pod
metadata:
  name: pod-demo
  labels:
spec:
  containers:
  - name: pod-demo
    image: yemax/pod-demo:1
    ports:
    - containerPort: 8081
The base entity in K8s. A Pod is a group of one or more containers with shared storage/network, and a specification for how to run the containers. Containers within a pod share an IP address and port space, and can find each other via localhost.

8 Kubernetes: Namespace
{
  "kind": "Namespace",
  "apiVersion": "v1",
  "metadata": {
    "name": "development",
    "labels": {
      "name": "development"
    }
  }
}
A Namespace groups entities and provides shared resources. It is not a security boundary: a K8s user can see all namespaces. A Node (also called a worker or a minion) is a VM or physical server managed by the K8s master.
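Not from the slides: a toy Python sketch of what "grouping" means here. Resource names only have to be unique within a namespace, so the natural lookup key is (namespace, kind, name); the namespace and pod names below are hypothetical.

```python
# Toy illustration (not Kubernetes code): names are unique per namespace.
store = {}

def create(namespace, kind, name):
    key = (namespace, kind, name)
    if key in store:
        raise ValueError(f"{kind} {name!r} already exists in {namespace!r}")
    store[key] = {"kind": kind, "name": name, "namespace": namespace}
    return store[key]

# The same Pod name is fine in two different namespaces...
create("development", "Pod", "pod-demo")
create("production", "Pod", "pod-demo")

# ...but a duplicate within one namespace is rejected.
try:
    create("development", "Pod", "pod-demo")
except ValueError as e:
    print(e)
```

This also illustrates why a namespace is not a security boundary in stock K8s: nothing in the lookup restricts who may list the keys.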

9 Kubernetes: Replica Sets
A ReplicaSet is the next-generation Replication Controller. The only difference between a ReplicaSet and a Replication Controller right now is selector support: a ReplicaSet supports the new set-based selector requirements described in the labels user guide, whereas a Replication Controller supports only equality-based selector requirements. We will return to labels and selectors later, in the deployment-from-Java-code demo.
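The equality-based vs set-based distinction above can be sketched in a few lines of Python (a toy illustration, not Kubernetes source; the pod names and labels are made up):

```python
# Toy label selectors (illustration only).
pods = [
    {"name": "web-1", "labels": {"app": "web", "tier": "frontend"}},
    {"name": "web-2", "labels": {"app": "web", "tier": "backend"}},
    {"name": "db-1",  "labels": {"app": "db",  "tier": "backend"}},
]

def equality_select(pods, **required):
    # Replication Controller style: every key must match exactly.
    return [p["name"] for p in pods
            if all(p["labels"].get(k) == v for k, v in required.items())]

def set_select(pods, key, op, values):
    # ReplicaSet style: supports "In" / "NotIn" over a set of values.
    if op == "In":
        return [p["name"] for p in pods if p["labels"].get(key) in values]
    if op == "NotIn":
        return [p["name"] for p in pods if p["labels"].get(key) not in values]
    raise ValueError(op)

print(equality_select(pods, app="web"))             # web-1, web-2
print(set_select(pods, "tier", "In", {"backend"}))  # web-2, db-1
```

Set-based requirements strictly generalize the equality form: app=web is just app In {web}.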

10 K8s: Deployment
A Deployment is the main entity responsible for creating a ReplicaSet, and it includes the information about which Pods the ReplicaSet should create.

11 Kubernetes: Ingress
This entity is responsible for connecting external traffic to a Service. In practice it is an NGINX web server with a special rule suite. Put another way: an Ingress is a collection of rules that allow inbound connections to reach the cluster services.
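A toy sketch of what such a rule suite does (not the actual NGINX config): map the host and path prefix of an inbound request to a backend service. All hostnames and service names here are hypothetical.

```python
# Toy Ingress router (illustration only): the first rule whose host
# matches and whose path prefix matches wins.
rules = [
    {"host": "spark.example.com", "path": "/ui",     "service": "spark-webui"},
    {"host": "spark.example.com", "path": "/",       "service": "spark-master"},
    {"host": "zeppelin.example.com", "path": "/",    "service": "apache-zeppelin"},
]

def route(host, path):
    for r in rules:
        if r["host"] == host and path.startswith(r["path"]):
            return r["service"]
    return None  # no matching rule: the request reaches no service

print(route("spark.example.com", "/ui/jobs"))   # spark-webui
print(route("spark.example.com", "/metrics"))   # spark-master
print(route("unknown.example.com", "/"))        # None
```

Note that rule order matters in this sketch: the catch-all "/" rule must come after the more specific "/ui" rule.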

12 K8s: Architecture
Kubelet: a service running on each node that manages containers and is managed by the master. It receives REST API calls from the master and manages the resources on that node. Kubelet ensures that the containers defined in the API call are created and started. Kubelet is a Kubernetes-internal concept and generally does not require direct manipulation.
etcd: a simple, distributed, watchable, and consistent key/value store. It stores the persistent state of all REST API objects: for example, how many pods are deployed on each worker node, the labels assigned to each pod (which can then be used to include the pods in a service), and the namespaces for different resources. For reliability, etcd is typically run in a cluster.
Proxy: runs on each node, acting as a network proxy and load balancer for a service on a worker node. Client requests coming through an external load balancer are redirected to the containers running in a pod through this proxy.
Docker: Docker Engine is the container runtime running on each node. It understands the Docker image format and knows how to run Docker containers.
Controller manager: a daemon that watches the state of the cluster through the API server for the different controllers and reconciles the actual state with the desired one (e.g., the number of pods to run for a replica set).
Scheduler: works with the API server to schedule pods onto nodes. The scheduler has information about the resources available on the worker nodes, as well as the ones requested by the pods.
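The controller manager's reconciliation of actual against desired state can be sketched as a simple loop (a toy model, not real controller code; the pod naming is made up):

```python
# Toy reconciliation loop (illustration only): bring the actual number
# of pods for a ReplicaSet in line with the desired replica count.
def reconcile(desired_replicas, actual_pods):
    actual_pods = list(actual_pods)
    while len(actual_pods) < desired_replicas:
        actual_pods.append(f"pod-{len(actual_pods)}")   # create a missing pod
    while len(actual_pods) > desired_replicas:
        actual_pods.pop()                               # delete a surplus pod
    return actual_pods

# Scale up from 1 to 3, then down to 2.
pods = reconcile(3, ["pod-0"])
print(pods)     # ['pod-0', 'pod-1', 'pod-2']
pods = reconcile(2, pods)
print(pods)     # ['pod-0', 'pod-1']
```

The real controllers run this kind of loop continuously against the state stored in etcd, which is what makes the Kubernetes model declarative: you state the desired count, and the loop converges to it.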

13 OpenShift

14

15

16

17

18 OpenShift: Deployment

19

20 OpenShift: S2I

21

22 s2i-lighttpd/
Dockerfile – a standard Dockerfile where we'll define the builder image
Makefile – a helper script for building and testing the builder image
test/
  run – test script, testing whether the builder image works correctly
  test-app/ – directory for your test application
.s2i/bin/
  assemble – script responsible for building the application
  run – script responsible for running the application
  save-artifacts – script responsible for incremental builds, covered in a future article
  usage – script responsible for printing the usage of the builder image

23 OpenShift vs Kubernetes
K8s:
Orchestration tool
Ingress based on NGINX
Namespaces are not "secure"
OpenShift:
Platform as a Service
Routes based on HAProxy
Namespaces are "secure" and more understandable
S2I builds new images after each push of new source
Pool of prepared images

24 Data Science uses Spark

25 Data Science in the Cloud

26 Apache Spark

27 Spark on OpenShift

28 OSHINKO: S2I

29 OSHINKO: Spark integrator

30 DEMO

31 DEMO
oc cluster up
oc new-project devops-stage-demo
oc create -f <resource-url>
oc create -f <resource-url>
oc new-app oshinko-webui
oshinko create devops-spark-cluster
oshinko get devops-spark-cluster
oc new-app --template=$namespace/apache-zeppelin-openshift \
  -p APPLICATION_NAME=apache-zeppelin \
  -p GIT_URI=<git-url> \
  -p ZEPPELIN_INTERPRETERS=md

32 Questions ?

33

