Making Apache Hadoop Secure Devaraj Das Yahoo’s Hadoop Team.

Berlin Buzzwords 2011
Introductions
Who I am
  – Principal Engineer at Yahoo! Sunnyvale
Working on Apache Hadoop and related projects
  – MapReduce, Hadoop Security, HCatalog
Apache Hadoop Committer/PMC member
Apache HCatalog Committer

Problem
Different yahoos need different data.
PII versus financial
Need assurance that only the right people can see data.
Need to log who looked at the data.
Yahoo! has more yahoos than clusters.
Requires isolation or trust.
Security improves ability to share clusters between groups

History
Originally, Hadoop had no security.
  – Only used by small teams who trusted each other
  – On data all of them had access to
Users and groups were added in 0.16
  – Prevented accidents, but easy to bypass
  – hadoop fs -Dhadoop.job.ugi=joe -rmr /user/joe
We needed more…

Why is Security Hard?
Hadoop is Distributed
  – Runs on a cluster of computers.
Trust must be mutual between Hadoop Servers and the clients

Need Delegation
Not just client-server: the servers access other services on behalf of others.
MapReduce needs to have the user’s permissions
  – Even if the user logs out
MapReduce jobs need to:
  – Get and keep the necessary credentials
  – Renew them while the job is running
  – Destroy them when the job finishes

Solution
Prevent unauthorized HDFS access
All HDFS clients must be authenticated.
Including tasks running as part of MapReduce jobs
And jobs submitted through Oozie.
Users must also authenticate servers
Otherwise fraudulent servers could steal credentials
Integrate Hadoop with Kerberos
Proven open source distributed authentication system.

Requirements
Security must be optional.
  – Not all clusters are shared between users.
Hadoop must not prompt for passwords
  – Makes it easy to make trojan horse versions.
  – Must have single sign on.
Must handle the launch of a MapReduce job on 4,000 Nodes
Performance / Reliability must not be compromised

Security Definitions
Authentication – Who is the user?
  – Hadoop 0.20 completely trusted the user (it sent user and groups over the wire)
  – We need it on both RPC and Web UI.
Authorization – What can that user do?
  – HDFS had owners and permissions since
Auditing – Who did that?

Authentication
RPC authentication using Java SASL (Simple Authentication and Security Layer)
  – Changes low-level transport
  – GSSAPI (supports Kerberos v5)
  – Digest-MD5 (needed for authentication using various Hadoop Tokens)
  – Simple
WebUI authentication done via plugin
  – Yahoo! uses internal plugin, SPNEGO, etc.

Authorization
HDFS
  – Command line and semantics unchanged
MapReduce added Access Control Lists
  – Lists of users and groups that have access.
  – mapreduce.job.acl-view-job – view job
  – mapreduce.job.acl-modify-job – kill or modify job
Code for determining group membership is pluggable.
  – Checked on the masters.
All servlets enforce permissions.
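The job ACL check described on this slide can be sketched roughly as follows. This is a minimal Python illustration, not Hadoop's code: the helper names `parse_acl` and `is_allowed` are hypothetical, the ACL string layout (comma-separated users, one space, comma-separated groups, with `*` meaning everyone) is an assumption, and so is the rule that the job owner always has access.

```python
def parse_acl(acl):
    """Split an ACL string into (users, groups, allow_all).

    Assumed layout: "user1,user2 group1,group2"; "*" means everyone.
    """
    if acl.strip() == "*":
        return set(), set(), True
    parts = acl.split(" ", 1)
    users = {u for u in parts[0].split(",") if u}
    groups = set()
    if len(parts) > 1:
        groups = {g.strip() for g in parts[1].split(",") if g.strip()}
    return users, groups, False


def is_allowed(user, user_groups, job_owner, acl):
    """Return True if `user` may perform the operation guarded by `acl`."""
    if user == job_owner:  # assumption: the owner is always allowed
        return True
    users, groups, allow_all = parse_acl(acl)
    return allow_all or user in users or bool(groups & set(user_groups))


# Hypothetical value for mapreduce.job.acl-view-job
view_acl = "bob,ann ops"
print(is_allowed("sam", ["ops"], "joe", view_acl))
print(is_allowed("eve", ["guests"], "joe", view_acl))
```

In this sketch the group list for a user is passed in by the caller, which mirrors the slide's point that group lookup is pluggable.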

Auditing
HDFS can track access to files
MapReduce can track who ran each job
Provides fine-grained logs of who did what
With strong authentication, logs provide audit trails

Kerberos and Single Sign-on
Kerberos allows user to sign in once
  – Obtains Ticket Granting Ticket (TGT)
      kinit – get a new Kerberos ticket
      klist – list your Kerberos tickets
      kdestroy – destroy your Kerberos ticket
      TGTs last for 10 hours, renewable for 7 days by default
  – Once you have a TGT, Hadoop commands just work
      hadoop fs -ls /
      hadoop jar wordcount.jar in-dir out-dir

Kerberos Dataflow (diagram)

HDFS Delegation Tokens
To prevent authentication flood at the start of a job, NameNode creates delegation tokens.
  – Krb credentials are not passed to the JT
Allows user to authenticate once and pass credentials to all tasks of a job.
JobTracker automatically renews tokens while job is running.
  – Max lifetime of delegation tokens is 7 days.
Cancels tokens when job finishes.
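The issue, renew, and cancel cycle on this slide can be sketched as a small state model. This is an illustration only, not Hadoop's implementation: the class and function names are hypothetical, and the one-day renewal interval is an assumption. The 7-day maximum lifetime is the figure given on the slide.

```python
from dataclasses import dataclass

DAY = 24 * 60 * 60
MAX_LIFETIME = 7 * DAY    # from the slide
RENEW_INTERVAL = 1 * DAY  # assumed value for illustration


@dataclass
class DelegationToken:
    owner: str        # user the token speaks for
    renewer: str      # service allowed to renew it (e.g. the JobTracker)
    issued_at: int
    expires_at: int
    cancelled: bool = False


def issue(owner, renewer, now):
    """Issuer side: hand out a token valid for one renewal interval."""
    return DelegationToken(owner, renewer, now, now + RENEW_INTERVAL)


def is_valid(token, now):
    return not token.cancelled and now < token.expires_at


def renew(token, caller, now):
    """Extend the expiry, never past issued_at + MAX_LIFETIME."""
    if caller != token.renewer:
        raise PermissionError("only the designated renewer may renew")
    if not is_valid(token, now):
        raise PermissionError("token is expired or cancelled")
    token.expires_at = min(now + RENEW_INTERVAL, token.issued_at + MAX_LIFETIME)
    return token.expires_at


def cancel(token):
    """Called when the job finishes."""
    token.cancelled = True


token = issue("joe", "jobtracker", now=0)
renew(token, "jobtracker", now=DAY - 1)  # job still running, extend
cancel(token)                            # job finished
print(is_valid(token, now=DAY))
```

Because the expiry is capped, a job that runs longer than the maximum lifetime eventually fails to renew in this model, regardless of how often the renewer calls `renew`.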

Other tokens…
Block Access Token
  – Short-lived tokens for securely accessing the DataNodes from HDFS Clients doing I/O
  – Generated by NameNode
Job Token
  – For Task to TaskTracker Shuffle (HTTP) of intermediate data
  – For Task to TaskTracker RPC
  – Generated by JobTracker
MapReduce Delegation Token
  – For accessing the JobTracker from tasks
  – Generated by JobTracker
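One common way to build a short-lived token that one server generates and another verifies is to sign the token's fields with a key both servers share. The sketch below shows that general technique in Python; it is not Hadoop's token format. The field layout, the `|` separator, the choice of HMAC-SHA256, and the function names `mint_token` and `verify_token` are all assumptions made for illustration.

```python
import hashlib
import hmac


def mint_token(secret, user, block_id, mode, expires_at):
    """Issuer side: sign (user, block, mode, expiry) with the shared key."""
    payload = f"{user}|{block_id}|{mode}|{expires_at}"
    tag = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return payload + "|" + tag


def verify_token(secret, token, block_id, mode, now):
    """Verifier side: check signature, scope, and expiry without
    contacting the issuer."""
    try:
        user, tok_block, tok_mode, expires, tag = token.split("|")
        expires_at = int(expires)
    except ValueError:
        return False  # malformed token
    payload = f"{user}|{tok_block}|{tok_mode}|{expires}"
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return (
        hmac.compare_digest(tag.encode(), expected.encode())
        and tok_block == str(block_id)
        and tok_mode == mode
        and now < expires_at
    )


shared_key = b"hypothetical-shared-key"
tok = mint_token(shared_key, "joe", 42, "READ", expires_at=600)
print(verify_token(shared_key, tok, 42, "READ", now=100))
print(verify_token(shared_key, tok, 42, "WRITE", now=100))
```

The design point is that the verifier needs only the shared key, so clients doing I/O do not have to authenticate again with the issuer for every access.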

Proxy-Users
Oozie (and other trusted services) run operations on Hadoop clusters on behalf of other users
Configure HDFS and MapReduce with the oozie user as a proxy:
  – Group of users that the proxy can impersonate
  – Which hosts they can impersonate from
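The two proxy-user constraints above (which groups may be impersonated, and from which hosts) can be sketched as a simple check. This is a Python illustration under assumptions: the property names follow the `hadoop.proxyuser.<name>.groups` and `hadoop.proxyuser.<name>.hosts` pattern, the values and host name are hypothetical, and the function `may_impersonate` is not a Hadoop API.

```python
# Hypothetical configuration values for the oozie proxy user
conf = {
    "hadoop.proxyuser.oozie.groups": "analysts,etl",
    "hadoop.proxyuser.oozie.hosts": "oozie1.example.com",
}


def may_impersonate(conf, proxy_user, target_groups, client_host):
    """Allow impersonation only if the target user is in a permitted group
    AND the request comes from a permitted host."""
    groups = conf.get(f"hadoop.proxyuser.{proxy_user}.groups")
    hosts = conf.get(f"hadoop.proxyuser.{proxy_user}.hosts")
    if groups is None or hosts is None:
        return False  # not configured as a proxy user: deny
    allowed_groups = {g.strip() for g in groups.split(",") if g.strip()}
    allowed_hosts = {h.strip() for h in hosts.split(",") if h.strip()}
    group_ok = "*" in allowed_groups or bool(allowed_groups & set(target_groups))
    host_ok = "*" in allowed_hosts or client_host in allowed_hosts
    return group_ok and host_ok


print(may_impersonate(conf, "oozie", ["etl"], "oozie1.example.com"))
print(may_impersonate(conf, "oozie", ["etl"], "laptop.example.com"))
```

Denying by default when either property is missing keeps an unconfigured service from impersonating anyone in this model.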

Primary Communication Paths (diagram)

Task Isolation
Tasks now run as the user.
  – Via a small setuid program
  – Can’t signal other users’ tasks or the TaskTracker
  – Can’t read other tasks’ jobconf, files, outputs, or logs
Distributed cache
  – Public files shared between jobs and users
  – Private files shared between jobs

Questions?
Questions should be sent to:
Security holes should be sent to:
Available from
  – release of Apache Hadoop
  – ch-0.20-security/
Thanks! (also thanks to Owen O’Malley for the slides)

If time permits…

Upgrading to Security
Need a KDC with all of the user accounts.
Need service principals for all of the servers.
Need user accounts on all of the slaves
If you use the default group mapping, you need user accounts on the masters too.
Need to install policy files for stronger encryption for Java

Mapping to Usernames
Kerberos principals need to be mapped to usernames on servers.
Examples: -> ddas -> mapred
Operator can define translation.
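An operator-defined translation from principals to local usernames can be sketched as an ordered list of rules where the first match wins. This Python sketch is an illustration, not Hadoop's rule syntax: the realm `EXAMPLE.COM`, the service prefixes `jt`, `tt`, `nn`, `dn`, the `hdfs` account, and the sample principals are all hypothetical and are not the examples from the slide.

```python
import re

# Ordered rules: (pattern over the full principal, replacement template).
RULES = [
    (re.compile(r"(jt|tt)/[^@]+@EXAMPLE\.COM"), "mapred"),
    (re.compile(r"(nn|dn)/[^@]+@EXAMPLE\.COM"), "hdfs"),
    (re.compile(r"([^/@]+)@EXAMPLE\.COM"), r"\1"),  # plain user principals
]


def to_username(principal):
    """Map a Kerberos principal to a local username; first matching rule wins."""
    for pattern, template in RULES:
        match = pattern.fullmatch(principal)
        if match:
            return match.expand(template)
    raise ValueError(f"no mapping rule for principal {principal!r}")


print(to_username("ddas@EXAMPLE.COM"))
print(to_username("jt/host1.example.com@EXAMPLE.COM"))
```

Rejecting principals that match no rule, rather than falling back to a default account, keeps users from foreign realms from being mapped by accident.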