
OpenShift as a cloud for Data Science


1 OpenShift as a cloud for Data Science
Yegor Maksymchuk, SoftServe, Ukraine

2 Agenda
Kubernetes
OpenShift
Kubernetes vs OpenShift
Apache Spark
Radanalytics and OSHINKO

3 whoami
Yegor Maksymchuk, Software engineer at SoftServe, Ukraine
GitHub: YegorMaksymchuk LinkedIn: ymaksymchuk

4 OSHINKO: Problems
Integration with Apache Spark in OpenShift.
Usability at the UI and API level: it should be easy to use.

5 Data Science in the Cloud

6 Kubernetes
Kubernetes is a Greek word meaning helmsman or pilot. It schedules, organizes, runs, and manages containers in a cluster of virtual or physical machines. The main thing about Kubernetes is its declarative approach. Kubernetes was started by Google in 2014, building on over 10 years of experience with its internal container platform, called Borg. It was first released officially in July 2015, and Google donated Kubernetes to the Cloud Native Computing Foundation. It is 100 percent open source and written in Go.

7 Kubernetes: Pod
apiVersion: v1
kind: Pod
metadata:
  name: pod-demo
  labels:
spec:
  containers:
  - name: pod-demo
    image: yemax/pod-demo:1
    ports:
    - containerPort: 8081
The base entity in K8s. A Pod is a group of one or more containers with shared storage/network, and a specification for how to run the containers. Containers within a pod share an IP address and port space, and can find each other via localhost.

8 Kubernetes: Namespace
{
  "kind": "Namespace",
  "apiVersion": "v1",
  "metadata": {
    "name": "development",
    "labels": {
      "name": "development"
    }
  }
}
A Namespace groups entities and provides shared resources. It is not a security boundary: a K8s user can see all namespaces. A Node (also called a worker or a minion) is a VM or physical server managed by the K8s master.
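Not from the slides: a toy Python sketch of what "grouping" means here. Resource names only have to be unique within a namespace, so the natural lookup key is (namespace, kind, name); the namespace and pod names below are hypothetical.

```python
# Toy illustration (not Kubernetes code): names are unique per namespace.
store = {}

def create(namespace, kind, name):
    key = (namespace, kind, name)
    if key in store:
        raise ValueError(f"{kind} {name!r} already exists in {namespace!r}")
    store[key] = {"kind": kind, "name": name, "namespace": namespace}
    return store[key]

# The same Pod name is fine in two different namespaces...
create("development", "Pod", "pod-demo")
create("production", "Pod", "pod-demo")

# ...but a duplicate within one namespace is rejected.
try:
    create("development", "Pod", "pod-demo")
except ValueError as e:
    print(e)
```

This also illustrates why a namespace is not a security boundary in stock K8s: nothing in the lookup restricts who may list the keys.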

9 Kubernetes: Replica Sets
A ReplicaSet is the next-generation Replication Controller. The only difference between a ReplicaSet and a Replication Controller right now is selector support: a ReplicaSet supports the new set-based selector requirements described in the labels user guide, whereas a Replication Controller supports only equality-based selector requirements. We will return to labels and selectors later, in the deployment-from-Java-code demo.
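The equality-based vs set-based distinction above can be sketched in a few lines of Python (a toy illustration, not Kubernetes source; the pod names and labels are made up):

```python
# Toy label selectors (illustration only).
pods = [
    {"name": "web-1", "labels": {"app": "web", "tier": "frontend"}},
    {"name": "web-2", "labels": {"app": "web", "tier": "backend"}},
    {"name": "db-1",  "labels": {"app": "db",  "tier": "backend"}},
]

def equality_select(pods, **required):
    # Replication Controller style: every key must match exactly.
    return [p["name"] for p in pods
            if all(p["labels"].get(k) == v for k, v in required.items())]

def set_select(pods, key, op, values):
    # ReplicaSet style: supports "In" / "NotIn" over a set of values.
    if op == "In":
        return [p["name"] for p in pods if p["labels"].get(key) in values]
    if op == "NotIn":
        return [p["name"] for p in pods if p["labels"].get(key) not in values]
    raise ValueError(op)

print(equality_select(pods, app="web"))             # web-1, web-2
print(set_select(pods, "tier", "In", {"backend"}))  # web-2, db-1
```

Set-based requirements strictly generalize the equality form: app=web is just app In {web}.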

10 K8s: Deployment
A Deployment is the main entity responsible for creating a ReplicaSet, and it includes the information about which Pods the ReplicaSet should create.

11 Kubernetes: Ingress
This entity is responsible for connecting external traffic to a Service. In practice it is an NGINX web server with a special rule suite. Put another way: an Ingress is a collection of rules that allow inbound connections to reach the cluster services.
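A toy sketch of what such a rule suite does (not the actual NGINX config): map the host and path prefix of an inbound request to a backend service. All hostnames and service names here are hypothetical.

```python
# Toy Ingress router (illustration only): the first rule whose host
# matches and whose path prefix matches wins.
rules = [
    {"host": "spark.example.com", "path": "/ui",     "service": "spark-webui"},
    {"host": "spark.example.com", "path": "/",       "service": "spark-master"},
    {"host": "zeppelin.example.com", "path": "/",    "service": "apache-zeppelin"},
]

def route(host, path):
    for r in rules:
        if r["host"] == host and path.startswith(r["path"]):
            return r["service"]
    return None  # no matching rule: the request reaches no service

print(route("spark.example.com", "/ui/jobs"))   # spark-webui
print(route("spark.example.com", "/metrics"))   # spark-master
print(route("unknown.example.com", "/"))        # None
```

Note that rule order matters in this sketch: the catch-all "/" rule must come after the more specific "/ui" rule.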

12 K8s: Architecture
Kubelet: a service running on each node that manages containers and is managed by the master. It receives REST API calls from the master and manages the resources on that node. Kubelet ensures that the containers defined in the API call are created and started. Kubelet is a Kubernetes-internal concept and generally does not require direct manipulation.
etcd: a simple, distributed, watchable, and consistent key/value store. It stores the persistent state of all REST API objects: for example, how many pods are deployed on each worker node, the labels assigned to each pod (which can then be used to include the pods in a service), and the namespaces for different resources. For reliability, etcd is typically run in a cluster.
Proxy: runs on each node, acting as a network proxy and load balancer for a service on a worker node. Client requests coming through an external load balancer are redirected to the containers running in a pod through this proxy.
Docker: Docker Engine is the container runtime running on each node. It understands the Docker image format and knows how to run Docker containers.
Controller manager: a daemon that watches the state of the cluster through the API server for the different controllers and reconciles the actual state with the desired one (e.g., the number of pods to run for a replica set).
Scheduler: works with the API server to schedule pods onto nodes. The scheduler has information about the resources available on the worker nodes, as well as the ones requested by the pods.
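The controller manager's reconciliation of actual against desired state can be sketched as a simple loop (a toy model, not real controller code; the pod naming is made up):

```python
# Toy reconciliation loop (illustration only): bring the actual number
# of pods for a ReplicaSet in line with the desired replica count.
def reconcile(desired_replicas, actual_pods):
    actual_pods = list(actual_pods)
    while len(actual_pods) < desired_replicas:
        actual_pods.append(f"pod-{len(actual_pods)}")   # create a missing pod
    while len(actual_pods) > desired_replicas:
        actual_pods.pop()                               # delete a surplus pod
    return actual_pods

# Scale up from 1 to 3, then down to 2.
pods = reconcile(3, ["pod-0"])
print(pods)     # ['pod-0', 'pod-1', 'pod-2']
pods = reconcile(2, pods)
print(pods)     # ['pod-0', 'pod-1']
```

The real controllers run this kind of loop continuously against the state stored in etcd, which is what makes the Kubernetes model declarative: you state the desired count, and the loop converges to it.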

13 OpenShift

14

15

16

17

18 OpenShift: Deployment

19

20 OpenShift: S2I

21

22 s2i-lighttpd/
Dockerfile – a standard Dockerfile where we'll define the builder image
Makefile – a helper script for building and testing the builder image
test/
  run – test script, testing whether the builder image works correctly
  test-app/ – directory for your test application
.s2i/bin/
  assemble – script responsible for building the application
  run – script responsible for running the application
  save-artifacts – script responsible for incremental builds, covered in a future article
  usage – script responsible for printing the usage of the builder image

23 OpenShift vs Kubernetes
K8s:
Orchestration tool
Ingress based on NGINX
Namespaces are not "secure"
OpenShift:
Platform as a Service
Routes based on HAProxy
Namespaces are "secure" and more understandable
S2I builds new images after each push of new source
Pool of prepared images

24 Data Science uses Spark

25 Data Science in the Cloud

26 Apache Spark

27 Spark on OpenShift

28 OSHINKO: S2I

29 OSHINKO: Spark integrator

30 DEMO

31 DEMO
oc cluster up
oc new-project devops-stage-demo
oc create -f <resource-url>
oc create -f <resource-url>
oc new-app oshinko-webui
oshinko create devops-spark-cluster
oshinko get devops-spark-cluster
oc new-app --template=$namespace/apache-zeppelin-openshift \
  -p APPLICATION_NAME=apache-zeppelin \
  -p GIT_URI=<git-url> \
  -p ZEPPELIN_INTERPRETERS=md

32 Questions ?

33

