HADOOP IN DOCKER CONTAINERS


HADOOP IN DOCKER CONTAINERS WHAT WORKS AND WHAT DOESN’T -- IN PRODUCTION! Nasser Manesh

Who Am I? 25 years in Unix infrastructure/SRE/kernel. Startups, architect, VP Engineering/CTO roles. Petabyte-scale, production, multi-tenant Hadoop clusters. Virtualization, elasticity, container orchestration for Hadoop. Connect with me on LinkedIn: nasser@gmail.com

Taking Docker to Production. Getting it to Work for Hadoop. Pitfalls, Solutions.

Show of Hands... Operations, SRE, DevOps? Developer? User of Big Data applications / Data Scientist? Management, product managers?

Our Hadoop Clusters at Altiscale. [Diagram labels: SSH; Browser; Workbench (Apache Pig, Hive, HDFS-NFS, Data Science Apps, Machine Learning Apps); Name Node; Secondary Name Node; Resource Manager; Hadoop Slaves (NodeManagers + DataNodes).]

Hadoop as a Service: It’s not about NODES

Optimization: Business mandate. We run on bare metal. Multiple data centers. Heavily optimized for Hadoop. MARGINS: optimized resource allocation. How to partition/re-allocate physical machines?

Partition & Re-allocate. Hadoop’s built-in capabilities. Hypervisors: virtual machines. Containers: lightweight virtualization. Lightweight is important for thousands of very busy cores!

Containers. Isolation (namespaces). Resource limits (cgroups).
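
The two kernel mechanisms named on this slide can be inspected from any Linux process. The following is a minimal sketch, not part of the original talk: it lists the namespaces a process belongs to by reading /proc/<pid>/ns, and parses lines of /proc/<pid>/cgroup, which have the form hierarchy-ID:controller-list:path. The function names are illustrative.

```python
import os


def list_namespaces(pid="self"):
    """Return {namespace_name: kernel_identifier} for a process.

    Each entry in /proc/<pid>/ns is a symlink such as 'net:[4026531992]'.
    Returns an empty dict where /proc is unavailable (non-Linux hosts).
    """
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    if not os.path.isdir(ns_dir):
        return result
    for name in sorted(os.listdir(ns_dir)):
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Permission denied or the process exited; skip this entry.
            continue
    return result


def parse_cgroup_line(line):
    """Parse one line of /proc/<pid>/cgroup: 'hierarchy-ID:controllers:path'."""
    hierarchy_id, controllers, path = line.strip().split(":", 2)
    return int(hierarchy_id), [c for c in controllers.split(",") if c], path


if __name__ == "__main__":
    for name, ident in list_namespaces().items():
        print(name, ident)
```

Two processes that print the same identifier for a namespace share it; a containerized process normally shows different identifiers from the host's init process.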

Containers vs. VMs

From Chroot to Containers. chroot: limiting filesystem view. BSD jail (1995): better sandbox, networking, but limited. Linux-VServer (2001): security. Solaris Zones (2004). OpenVZ (2005) / Parallels. LXC (2006). Containers in the kernel (2007).

From Jail to Docker LXC: robust. BSD Jails: well-designed. lmctfy (Let Me Containerize That For You): Google quality. OpenVZ: active development. They have been pretty hard to use! DOCKER IS EASY TO USE. EVERYBODY CAN DO IT.

Docker Is Great For... Local develop/build/test pipelines. Builds that are “safer” to ship to production. Testing software in different environments. CI slave machines. Creating mini-clusters for development/testing. Packaging and software delivery – can replace RPMs.

YES, BUT...

Developers Love Docker, but OPS? Not operations friendly. Separate orchestration/provisioning/automation required. Logging? Are you kidding me? Docker networking considered harmful… Very simplistic. Good for single application, not so for “system” containers. Race conditions, race conditions, race conditions.

Operational Requirements. Stability, reliability, predictability. Performance and security. Enterprise-grade, high-throughput networking. Metrics and monitoring. Delivery infrastructure. Troubleshoot-a-bility.

Docker in Hadoop? YARN’s ApplicationMaster asks the NodeManager to launch containers: LinuxContainerExecutor. Docker can be used not only for fine-grained performance isolation, but also for delivering software packages.
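
For context, the NodeManager selects its container executor through the yarn.nodemanager.container-executor.class property in yarn-site.xml. The sketch below only renders that one property for the stock LinuxContainerExecutor; it does not show the Docker support discussed in the talk, and the helper names are illustrative rather than part of any Hadoop API.

```python
import xml.etree.ElementTree as ET

# Fully qualified class name of the stock Linux executor shipped with Hadoop.
LINUX_CONTAINER_EXECUTOR = (
    "org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor"
)


def yarn_property(name, value):
    """Build one <property> element in Hadoop's configuration format."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return prop


def executor_config(executor_class):
    """Render a yarn-site.xml fragment that selects a container executor."""
    root = ET.Element("configuration")
    root.append(
        yarn_property("yarn.nodemanager.container-executor.class", executor_class)
    )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(executor_config(LINUX_CONTAINER_EXECUTOR))
```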

YES, BUT...

Still Needs Work. Support in both YARN and Docker is needed. Both sets of changes take time. See YARN-1964 for details. Altiscale is working with both communities.

Hadoop in Docker Containers. The bulk of a cluster consists of DataNodes (HDFS) and NodeManagers (YARN). Traditionally, DN and NM are paired on machines. Put the DN and NM into containers, isolate them, and start moving things around. It’s repeatable, and can be automated.

How We Do It. Typical machine: 1 DN container, 1+ NM containers. Additional NM containers can float around. NM containers (and the DN container) are isolated. Each container has its own resource limits. The DN uses a lot of disk I/O, but not many cores or much memory. NMs use most of the cores and memory.
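
One way to picture the per-container limits is as a set of docker run command lines, one for the DN and one per NM. The sketch below is an assumption-laden illustration, not Altiscale's actual tooling: the image names, the 16-core/64 GB machine, and the specific split are all hypothetical. It only builds argument lists, using the current Docker CLI flags --cpuset-cpus, --memory, and --blkio-weight.

```python
DN_IMAGE = "example/hadoop-datanode"      # hypothetical image name
NM_IMAGE = "example/hadoop-nodemanager"   # hypothetical image name


def docker_run_args(name, image, cpuset, memory, blkio_weight=None):
    """Build a `docker run` argument list with CPU, memory, and I/O limits."""
    args = ["docker", "run", "-d", "--name", name,
            "--cpuset-cpus", cpuset,
            "--memory", memory]
    if blkio_weight is not None:
        # Relative block I/O weight (10-1000); higher means a larger share.
        args += ["--blkio-weight", str(blkio_weight)]
    args.append(image)
    return args


def plan_machine(total_cores=16, total_mem_gb=64, nm_count=2):
    """Give the DN few cores but a high I/O weight; split the rest among NMs."""
    dn_cores, dn_mem_gb, host_mem_gb = 2, 4, 4  # assumed reservations
    plan = [docker_run_args("dn", DN_IMAGE, f"0-{dn_cores - 1}",
                            f"{dn_mem_gb}g", blkio_weight=800)]
    cores_each = (total_cores - dn_cores) // nm_count
    mem_each = (total_mem_gb - dn_mem_gb - host_mem_gb) // nm_count
    for i in range(nm_count):
        first = dn_cores + i * cores_each
        last = first + cores_each - 1
        plan.append(docker_run_args(f"nm{i + 1}", NM_IMAGE,
                                    f"{first}-{last}", f"{mem_each}g"))
    return plan


if __name__ == "__main__":
    for command in plan_machine():
        print(" ".join(command))
```

Because the plan is a pure function of the machine size and NM count, the same layout can be regenerated on every host, which is what makes the approach repeatable.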


Disk Allocation. The bulk of the disks go to DNs. But NMs need disks too. Choose a repeatable layout for multiple disks per machine. Think both vertical and horizontal. Volumes: pass directories, not devices, to Docker. Make sure Docker does not see these as AUFS.
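
A repeatable layout can be expressed as a small function that maps disks to bind mounts. This sketch assumes, purely for illustration, that every disk is mounted on the host at /data/<n> and carries one subdirectory per role; those paths and the /hadoop/<n> container paths are not from the talk. Bind-mounted host directories passed with -v bypass the image's layered storage, which is one way to keep HDFS and YARN data off AUFS.

```python
def volume_args(role, disk_count, host_root="/data", container_root="/hadoop"):
    """Return `docker run -v` arguments for one role (for example 'dn' or 'nm1').

    Every disk contributes one directory per role, so the layout is identical
    on each machine with the same number of disks.
    """
    args = []
    for n in range(1, disk_count + 1):
        host_dir = f"{host_root}/{n}/{role}"      # directory, not a device
        container_dir = f"{container_root}/{n}"
        args += ["-v", f"{host_dir}:{container_dir}"]
    return args


if __name__ == "__main__":
    print(" ".join(volume_args("dn", 4)))
```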

Networking. Docker tries to take over the host. Default networking is simple, for ease of development. Jumbo frames are not supported out of the box: set your own MTU! Avoid race conditions by serializing network namespace operations.
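
Both networking points can be sketched briefly. Serialization is shown here with an exclusive file lock that any provisioning script takes before touching network namespaces; this is one possible approach, not necessarily the one used in the talk, and the lock path is hypothetical. The MTU is shown as a Docker daemon configuration fragment using the daemon's mtu setting, with 9000 as an assumed jumbo-frame size.

```python
import contextlib
import fcntl
import json

LOCK_PATH = "/tmp/netns-setup.lock"  # hypothetical lock file location


@contextlib.contextmanager
def netns_lock(path=LOCK_PATH):
    """Hold an exclusive lock so only one network setup runs at a time."""
    with open(path, "w") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def daemon_json(mtu=9000):
    """Render a Docker daemon config fragment that sets the network MTU."""
    return json.dumps({"mtu": mtu}, indent=2)


if __name__ == "__main__":
    with netns_lock():
        # Network namespace setup for one container would go here.
        print("holding lock; safe to configure the namespace")
    print(daemon_json())
```

Every script that creates or modifies a namespace has to take the same lock for the serialization to be effective.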

Monitoring and Metrics. You do not necessarily need to monitor the docker process. How your NM checks the health of the node may need additional mounts in the docker container. Metrics… check out cAdvisor! Disk metrics in cAdvisor are weak; Altiscale is contributing.
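
The point about additional mounts can be made concrete with a health script. The NodeManager can run a script configured through yarn.nodemanager.health-checker.script.path and marks the node unhealthy when the script prints a line beginning with ERROR. The sketch below is a hypothetical check, with assumed directory paths: if a directory it inspects is not mounted into the container, the check fails even though the host is fine.

```python
import os

# Hypothetical directories the NodeManager needs inside its container.
REQUIRED_DIRS = ["/hadoop/1", "/hadoop/2"]


def check_paths(paths):
    """Return one 'ERROR ...' line per path that is missing or not writable."""
    errors = []
    for path in paths:
        if not os.path.isdir(path):
            errors.append(f"ERROR missing directory {path}")
        elif not os.access(path, os.W_OK):
            errors.append(f"ERROR directory not writable {path}")
    return errors


if __name__ == "__main__":
    # The NodeManager reads stdout; any line starting with ERROR flags the node.
    for line in check_paths(REQUIRED_DIRS):
        print(line)
```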

Security. Isolation is important, but… Privileged mode is a big no-no. Containers share the same kernel. You have to be on top of Docker and libcontainer/lxc security. Are hypervisors safer?

Delivery Infrastructure. Docker containers are created from “images”. Docker images are served by a registry, an HTTP server. The registry has very basic functionality. Images are usually big, and can be proprietary. So you need to add authentication and per-colo caching.
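
Per-colo caching can be as simple as having each machine pull from a registry inside its own data center. The sketch below assumes one registry per colo; the host names and colo labels are invented for illustration. It only builds the image reference and the docker pull command, and authentication (for example via docker login against each registry) is left out.

```python
# Hypothetical registry endpoints, one per data center ("colo").
COLO_REGISTRIES = {
    "colo-a": "registry.colo-a.example.com:5000",
    "colo-b": "registry.colo-b.example.com:5000",
}


def local_image_ref(repository, tag, colo):
    """Build an image reference that points at the colo-local registry."""
    return f"{COLO_REGISTRIES[colo]}/{repository}:{tag}"


def pull_command(repository, tag, colo):
    """Return the `docker pull` argument list for the colo-local copy."""
    return ["docker", "pull", local_image_ref(repository, tag, colo)]


if __name__ == "__main__":
    print(" ".join(pull_command("hadoop/nodemanager", "2.4.1", "colo-a")))
```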

Orchestration Chef or Puppet: node level Kubernetes, Mesos. Libswarm? Really? Rundeck + Chef – take “scheduler” out of the picture. In-house development/custom work required.

THANK YOU FOR JOINING… QUESTIONS? Visit us at: www.altiscale.com WE ARE HIRING!

Resources. Docker website. “The Docker Book” by James Turnbull.