Hortonworks. We do Hadoop.

HDP with Advanced Security: Comprehensive Security for Enterprise Hadoop

Agenda: Our approach across the security pillars; Component deep dive; Questions

Security needs are changing
YARN unlocks the data lake. Multi-tenant: multiple applications for data access. Changing and complex compliance environment. ETL of non-sensitive data can yield sensitive data.
Fall 2013: largely silo’d deployments with single-workload clusters. Summer 2014: 65% of clusters host multiple workloads.
Five areas of security focus: Administration (central management and consistent security); Authentication (authenticate users and systems); Authorization (provision access to data); Audit (maintain a record of data access); Data Protection (protect data at rest and in motion).

Security in Hadoop with HDP + Argus (XA Secure)
Pillars: Centralized Security Administration; Authentication (Who am I? Prove it.); Authorization (restrict access to explicit data); Audit (understand who did what); Data Protection (encrypt data at rest and in motion).
HDP 2.1: Authentication: Kerberos in native Apache Hadoop; HTTP/REST API secured with Apache Knox Gateway. Authorization: HDFS permissions, HDFS ACLs, Hive ATZ-NG. Audit: audit logs in HDFS & MR. Data Protection: wire encryption in Hadoop; open source initiatives; partner solutions.
Argus: Authentication: as-is, works with current authentication methods. Authorization: fine-grained access control and RBAC for HDFS, Hive and HBase. Audit: centralized audit reporting; policy and access history. Data Protection: future integration.

Map to Nevada Energy Requirements (question: HDP security component)
End User Security: LDAP integration: Kerberos, Argus (XA). Group-level access: Argus (XA). Multiple levels of access. Multiple environments.
Developer Security: Access control for creating tables. Limits on creating schemas and folders.

HDP w/ Advanced Security: Security Features
Authentication: Kerberos support ✔; perimeter security for services and REST API.
Authorization: fine-grained access control for HDFS, HBase and Hive; role-based access control; column level; permission support for Create, Drop, Index, Lock, User.
Auditing: resource access auditing; extensive auditing; policy auditing.

HDP w/ Advanced Security: Security Features (continued)
Data Protection: wire encryption ✔; volume encryption; file/column encryption (partners).
Reporting: global view of policies and audit data; manage user/group mapping; global policy manager with web UI; delegated administration.

Authentication w/ Kerberos

Kerberos Primer
Participants: Client, KDC, NameNode (NN), DataNode (DN).
1. kinit: the client logs in and gets a Ticket Granting Ticket (TGT).
2. The client stores the TGT in its Kerberos ticket cache.
3. The client gets a NameNode Service Ticket (NN-ST).
4. The client stores the NN-ST in its ticket cache.
5. The client reads/writes a file given the NN-ST and file name; the NN returns block locations, block IDs and Block Access Tokens if access is permitted.
6. The client reads/writes a block on the DN given the Block Access Token and block ID.
The client talks to the KDC through a Kerberos library (orange line in the diagram). Client-to-HDFS communication (green line) does not talk to Kerberos/KDC.
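Steps 5 and 6 work because the NameNode and DataNodes share a secret. The NameNode can therefore sign a Block Access Token, and the DataNode can check that token without contacting the KDC. The sketch below models this with an HMAC. It is an assumption-laden simplification, not HDFS's actual token format or key-rolling scheme. `SHARED_BLOCK_KEY`, `namenode_issue` and `datanode_verify` are hypothetical names.

```python
import hashlib
import hmac
import time
from dataclasses import dataclass

# Hypothetical secret shared by the NameNode and DataNodes. Real HDFS generates
# and rolls block keys itself; this constant only stands in for that mechanism.
SHARED_BLOCK_KEY = b"hypothetical-nn-dn-shared-secret"


@dataclass(frozen=True)
class BlockAccessToken:
    user: str
    block_id: int
    modes: str        # comma-separated access modes, e.g. "READ" or "READ,WRITE"
    expiry: float     # Unix timestamp after which the token is rejected
    signature: bytes


def _payload(user: str, block_id: int, modes: str, expiry: float) -> bytes:
    return f"{user}|{block_id}|{modes}|{expiry!r}".encode()


def namenode_issue(user: str, block_id: int, modes: str, ttl: float = 600.0) -> BlockAccessToken:
    """Step 5: after authorizing the file access, the NameNode signs a token per block."""
    expiry = time.time() + ttl
    mac = hmac.new(SHARED_BLOCK_KEY, _payload(user, block_id, modes, expiry), hashlib.sha256)
    return BlockAccessToken(user, block_id, modes, expiry, mac.digest())


def datanode_verify(token: BlockAccessToken, block_id: int, mode: str) -> bool:
    """Step 6: the DataNode validates the token locally, with no KDC or NameNode call."""
    expected = hmac.new(
        SHARED_BLOCK_KEY,
        _payload(token.user, token.block_id, token.modes, token.expiry),
        hashlib.sha256,
    ).digest()
    return (
        hmac.compare_digest(expected, token.signature)
        and token.block_id == block_id
        and mode in token.modes.split(",")
        and time.time() < token.expiry
    )
```

Suppose a client edits the `modes` field, for example upgrading READ to READ,WRITE. The signature then no longer matches, so the DataNode rejects the token.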

Kerberos Summary: Provides strong authentication. Establishes identity for users, services and hosts. Prevents impersonation of accounts by unauthorized parties. Supports a token delegation model. Works with existing directory services. Is the basis for authorization. Strong authentication means the password is never sent over the wire.
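"Password never sent over the wire" means the client only proves that it knows a key derived from the password. The sketch below shows that idea under stated assumptions:

- The key derivation follows the RFC 3962 string-to-key style: PBKDF2-HMAC-SHA1, salt = realm + principal, 4096 iterations. The final derivation step that real AES enctypes apply is omitted.
- The proof is an HMAC over a timestamp, standing in for Kerberos's encrypted-timestamp pre-authentication, because the Python standard library has no AES.
- The 300-second clock skew is an assumed default.

`derive_key`, `client_preauth` and `kdc_verify` are hypothetical names.

```python
import hashlib
import hmac
import time
from typing import Optional


def derive_key(password: str, principal: str, realm: str) -> bytes:
    # RFC 3962-style string-to-key: PBKDF2-HMAC-SHA1, salt = realm + principal,
    # 4096 iterations. Real AES enctypes apply a further derivation step omitted here.
    salt = (realm + principal).encode()
    return hashlib.pbkdf2_hmac("sha1", password.encode(), salt, 4096, dklen=32)


def client_preauth(password: str, principal: str, realm: str,
                   now: Optional[float] = None) -> dict:
    now = time.time() if now is None else now
    key = derive_key(password, principal, realm)
    ts = str(int(now)).encode()
    # Only the principal, the timestamp and a proof computed with the key leave the
    # client; the password itself is never included in the request.
    return {"principal": principal, "timestamp": ts,
            "proof": hmac.new(key, ts, hashlib.sha256).digest()}


def kdc_verify(request: dict, stored_key: bytes, max_skew: int = 300,
               now: Optional[float] = None) -> bool:
    # The KDC holds the same derived key and recomputes the proof; stale timestamps
    # are rejected to limit replay.
    now = time.time() if now is None else now
    ts = request["timestamp"]
    if abs(now - int(ts.decode())) > max_skew:
        return False
    expected = hmac.new(stored_key, ts, hashlib.sha256).digest()
    return hmac.compare_digest(expected, request["proof"])
```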

Hadoop Authentication
Users authenticate with the services: CLI & API via Kerberos kinit or keytab; web UIs via Kerberos SPNEGO or a custom plugin (e.g. SSO).
Services authenticate with each other using prepopulated Kerberos keytabs, e.g. DN->NN, NM->RM.
Services propagate the authenticated user identity through authenticated trusted proxy services, e.g. Oozie->RM, Knox->WebHCat.
Job tasks present the delegated user’s identity/access using delegation tokens, e.g. Job task->NN, Job task->JT/RM.
Strong authentication is the basis for authorization.
Diagram: Client (user) -> NameNode via Kerberos or custom; DataNode (service) -> NameNode via Kerberos; Oozie (service) -> JobTracker via Kerberos + (user) doAs; Task -> NameNode via (user) delegation token.
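A trusted proxy service such as Oozie or Knox may only use doAs for users the cluster allows it to impersonate. Hadoop controls this with the `hadoop.proxyuser.<superuser>.hosts` and `hadoop.proxyuser.<superuser>.groups` properties. The sketch below is a simplified model of that check. Real Hadoop also supports a `users` list, IP ranges and host resolution. `can_impersonate` is a hypothetical helper, not a Hadoop API.

```python
from typing import Iterable, Mapping, Set


def _split(value: str) -> Set[str]:
    return {item.strip() for item in value.split(",") if item.strip()}


def can_impersonate(conf: Mapping[str, str], superuser: str, remote_host: str,
                    target_user_groups: Iterable[str]) -> bool:
    """Return True if `superuser` may act on behalf of a user in `target_user_groups`."""
    hosts = conf.get(f"hadoop.proxyuser.{superuser}.hosts")
    groups = conf.get(f"hadoop.proxyuser.{superuser}.groups")
    if hosts is None or groups is None:
        return False  # no proxy-user entry: the service may not impersonate anyone
    host_ok = hosts.strip() == "*" or remote_host in _split(hosts)
    group_ok = groups.strip() == "*" or bool(_split(groups) & set(target_user_groups))
    return host_ok and group_ok
```

For example, Oozie may submit jobs as a member of `analysts` only when the request comes from the configured Oozie host.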

User Management: Most customers use LDAP for user information. LDAP guarantees that user information is consistent across the cluster and is an easy way to manage users and groups. The standard user-to-group mapping comes from the OS on the NameNode. Kerberos provides authentication, and PAM can automatically log users into Kerberos.

Kerberos + Active Directory: Use existing directory tools to manage users, and use Kerberos tools to manage host and service principals. A cross-realm trust links the AD/LDAP user store (users: smith@EXAMPLE.COM) to the cluster KDC (hosts: host1@HADOOP.EXAMPLE.COM; services: hdfs/host1@HADOOP.EXAMPLE.COM), which handles client authentication for the Hadoop cluster.
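With a cross-realm trust, principals from two realms reach the cluster. One is a user realm (`EXAMPLE.COM`) and the other a service realm (`HADOOP.EXAMPLE.COM`). Each principal has the form `primary[/instance]@REALM`. The sketch below parses that form and maps principals from trusted realms to a short user name. This is a deliberately simplified stand-in for Hadoop's `hadoop.security.auth_to_local` rules: real Hadoop only maps the default realm unless explicit rules are added. `parse_principal` and `to_short_name` are hypothetical helpers.

```python
import re
from typing import Iterable, NamedTuple, Optional


class Principal(NamedTuple):
    primary: str
    instance: Optional[str]
    realm: str


# primary[/instance]@REALM, e.g. "hdfs/host1@HADOOP.EXAMPLE.COM" or "smith@EXAMPLE.COM"
_PRINCIPAL_RE = re.compile(r"^([^/@]+)(?:/([^@]+))?@(.+)$")


def parse_principal(name: str) -> Principal:
    match = _PRINCIPAL_RE.match(name)
    if not match:
        raise ValueError(f"not a fully qualified principal: {name!r}")
    return Principal(match.group(1), match.group(2), match.group(3))


def to_short_name(name: str, trusted_realms: Iterable[str]) -> str:
    """Simplified auth_to_local: strip instance and realm for trusted realms only."""
    principal = parse_principal(name)
    if principal.realm not in set(trusted_realms):
        raise PermissionError(f"realm {principal.realm} is not trusted")
    return principal.primary
```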

Groups: Define groups for each required role. Hadoop has a pluggable interface for group mapping. The mapping from user to group is not stored within Hadoop. It defaults to the OS information on the master node and is typically driven from LDAP on Linux. Existing plugins: ShellBasedUnixGroupsMapping (/bin/id); JniBasedUnixGroupsMapping (system call); LdapGroupsMapping (LDAP call); CompositeGroupsMapping (combines Unix and LDAP group mapping). Strong authentication and role-based groups provide protections that enable shared clusters.
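The pluggable interface means the NameNode asks a provider for a user's groups and never stores the answer as its own data. A composite provider merges several lookups. The sketch below models that idea in the spirit of CompositeGroupsMapping. The dict-backed providers are hypothetical stand-ins for the OS and LDAP lookups, and none of the class names are Hadoop's.

```python
from typing import Dict, List, Protocol, Sequence


class GroupsMapping(Protocol):
    """Pluggable user-to-groups lookup (a simplified analogue of Hadoop's interface)."""

    def get_groups(self, user: str) -> List[str]: ...


class DictGroupsMapping:
    """Hypothetical stand-in for an OS (/bin/id) or LDAP lookup, backed by a dict."""

    def __init__(self, table: Dict[str, List[str]]):
        self._table = table

    def get_groups(self, user: str) -> List[str]:
        return list(self._table.get(user, []))


class CompositeGroups:
    """Queries each provider in order and merges the results without duplicates."""

    def __init__(self, providers: Sequence[GroupsMapping]):
        self._providers = list(providers)

    def get_groups(self, user: str) -> List[str]:
        merged: List[str] = []
        for provider in self._providers:
            for group in provider.get_groups(user):
                if group not in merged:
                    merged.append(group)
        return merged
```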

Groups (flow): The client accesses the Hadoop cluster. The NameNode plugin looks up the user’s groups in the AD/LDAP user store, then grants rw access accordingly.

Kerberos FAQ
Where do I install the KDC? On a master-type node.
User provisioning: hook up to corporate AD/LDAP to leverage existing user provisioning.
Growing a cluster: provision new services and nodes in the MIT KDC, then copy keytabs to the new nodes.
Is Kerberos a SPOF? Kerberos supports HA, and delegation tokens reduce the KDC load.
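Growing a cluster comes down to creating one principal per service on the new host and exporting its keys to a keytab. The sketch below only generates MIT `kadmin` command strings (`addprinc -randkey`, `xst -k`) for review. Several details are hypothetical and should be adjusted to local conventions:

- the service list (`dn`, `nm`, `HTTP`)
- the keytab directory
- the `<service>.service.keytab` file naming

The resulting keytabs still have to be copied securely to the node with restrictive permissions.

```python
from typing import Iterable, List


def provisioning_commands(host: str, realm: str,
                          services: Iterable[str] = ("dn", "nm", "HTTP"),
                          keytab_dir: str = "/etc/security/keytabs") -> List[str]:
    """Return kadmin commands that create and export service principals for a new node.

    Service names and keytab paths are hypothetical defaults, not HDP requirements.
    """
    commands: List[str] = []
    for service in services:
        principal = f"{service}/{host}@{realm}"
        keytab = f"{keytab_dir}/{service.lower()}.service.keytab"
        commands.append(f"addprinc -randkey {principal}")   # random key, no password
        commands.append(f"xst -k {keytab} {principal}")      # export key to a keytab
    return commands
```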

Perimeter REST API Security: Knox Gateway Overview

What does perimeter security really mean? A firewall is required at the perimeter (today). The Knox Gateway controls all Hadoop REST API access through the firewall, and the Hadoop cluster is mostly unaffected. The firewall only allows connections through specific ports from the Knox host.
Diagram: User -> REST API -> Firewall -> Knox Gateway -> REST API -> Firewall -> Hadoop services.
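The firewall rule described above can be stated as a small predicate. Outside clients may reach only the Knox gateway port. Cluster service ports accept connections only from the Knox host. The sketch below models only traffic crossing the perimeter, not node-to-node traffic inside the cluster. The class name and the port numbers in the usage example are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class PerimeterPolicy:
    knox_host: str
    gateway_port: int              # the only port exposed to outside clients
    service_ports: FrozenSet[int]  # cluster REST ports reachable from the Knox host only

    def allow(self, src_host: str, dst_host: str, dst_port: int) -> bool:
        if dst_host == self.knox_host:
            # Outside clients may only talk to the gateway itself.
            return dst_port == self.gateway_port
        # Any other destination is a cluster node: only Knox, only on service ports.
        return src_host == self.knox_host and dst_port in self.service_ports
```

Usage, with hypothetical ports: `PerimeterPolicy("knox.example.com", 8443, frozenset({50070, 50111, 11000}))`.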

Why Knox?
Enhanced Security: centralized control; protects network details; partial SSL for non-SSL services; WebApp vulnerability filter; central REST API auditing; service-level authorization; alternative to an SSH “edge node”.
Simplified Access: Kerberos encapsulation; extends API reach; single access point; multi-cluster support; single SSL certificate.
Enterprise Integration: LDAP integration; Active Directory integration; SSO integration; Apache Shiro extensibility; custom extensibility.

Current Hadoop Client Model: FileSystem and MapReduce Java APIs, plus HDFS, Pig, Hive and Oozie clients that wrap the Java APIs. Typical use of the APIs is via an “edge node” that is “inside” the cluster: users SSH to the edge node and execute API commands from the shell (User -> SSH -> Edge Node -> Hadoop).

Hadoop REST APIs (service: API)
WebHDFS: supports HDFS user operations including reading files, writing to files, making directories, changing permissions and renaming.
WebHCat: job control for MapReduce, Pig and Hive jobs, and HCatalog DDL commands.
Hive: Hive REST API operations, JDBC/ODBC over HTTP.
HBase: HBase REST API operations.
Oozie: job submission and management, and Oozie administration.
These APIs are useful for connecting to Hadoop from outside the cluster.
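A WebHDFS call is an HTTP request to `/webhdfs/v1/<path>?op=<OPERATION>`. Behind Knox, the same request is sent to the gateway instead of the NameNode, so clients never learn internal host names. The sketch below only builds the two URL forms and makes no network calls. It assumes Knox's usual `https://<gateway-host>:<port>/gateway/<topology>/...` layout. The port 8443, the topology name and the NameNode address in the usage example are assumptions to adapt to the actual deployment.

```python
from urllib.parse import quote, urlencode


def knox_base(gateway_host: str, topology: str, port: int = 8443,
              gateway_path: str = "gateway") -> str:
    """Base URL for a Knox topology (cluster); defaults are assumptions."""
    return f"https://{gateway_host}:{port}/{gateway_path}/{topology}"


def webhdfs_url(base: str, hdfs_path: str, op: str, **params: object) -> str:
    """Build a WebHDFS URL against either a NameNode or a Knox base URL."""
    if not hdfs_path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    query = urlencode({"op": op, **params})
    return f"{base.rstrip('/')}/webhdfs/v1{quote(hdfs_path)}?{query}"
```

For example, `webhdfs_url(knox_base("knox.example.com", "default"), "/user/smith", "LISTSTATUS")` yields `https://knox.example.com:8443/gateway/default/webhdfs/v1/user/smith?op=LISTSTATUS`. The direct form would expose the NameNode address instead.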

Hadoop REST API Security: Drill-Down
Diagram: REST clients connect over HTTP to the Knox Gateway (GW), which sits in a DMZ between two firewalls behind a load balancer (LB). Knox authenticates users against the enterprise identity provider (LDAP/AD) and forwards requests over HTTP to Hadoop Cluster 1 and Hadoop Cluster 2 (masters: RM, NN, WebHCat, Oozie, HS2, HBase; slaves: DN, NM). The edge node/Hadoop CLIs reach the cluster over RPC.
Note: the arrows to the Hadoop clusters are simplifications. In practice there is one arrow per port open between Knox and each Hadoop service it supports (WebHDFS, WebHCat, HiveServer2, HBase, Oozie), with more to come in the future.

Authorization and Auditing

Authorization and Audit
Fine-grained access control: HDFS (folder, file); Hive (database, table, column); HBase (table, column family, column).
Audit: extensive user access auditing in HDFS, Hive and HBase, recording IP address, resource type/resource, timestamp, and whether access was granted or denied.
Flexibility in defining policies. Control access into the system.
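Column-level Hive policies combined with per-request audit records can be modeled in a few lines. The sketch below makes several assumptions:

- a policy matches database/table/column glob patterns;
- it grants listed accesses to listed users or groups;
- anything not matched is denied;
- every decision is appended to an audit log with the fields named above.

`HivePolicy` and `check_access` are hypothetical and do not reproduce XA/Argus's actual policy model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from fnmatch import fnmatchcase
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class HivePolicy:
    database: str                     # glob pattern, e.g. "sales" or "*"
    table: str
    column: str
    accesses: FrozenSet[str]          # e.g. frozenset({"select"})
    users: FrozenSet[str] = frozenset()
    groups: FrozenSet[str] = frozenset()

    def covers(self, database: str, table: str, column: str) -> bool:
        return (fnmatchcase(database, self.database)
                and fnmatchcase(table, self.table)
                and fnmatchcase(column, self.column))


def check_access(policies: Iterable[HivePolicy], *, user: str, groups: Iterable[str],
                 database: str, table: str, column: str, access: str,
                 client_ip: str, audit_log: List[dict]) -> bool:
    user_groups = set(groups)
    allowed = any(
        p.covers(database, table, column)
        and access in p.accesses
        and (user in p.users or bool(user_groups & p.groups))
        for p in policies
    )  # default deny: no matching policy means no access
    audit_log.append({
        "user": user,
        "client_ip": client_ip,
        "resource_type": "column",
        "resource": f"{database}/{table}/{column}",
        "access": access,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "result": "granted" if allowed else "denied",
    })
    return allowed
```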

Central Security Administration: HDP Advanced Security delivers a ‘single pane of glass’ for the security administrator, centralizes administration of security policy, and ensures consistent coverage across the entire Hadoop stack.

Set Up Authorization Policies: file-level access control with flexible definitions; control permissions.

Monitor through Auditing

Authorization and Auditing w/ XA
Architecture: enterprise users and legacy tools work through the XA Administration Portal, which is backed by an RDBMS, the XA Audit Server, the XA Policy Server and an integration API. XA plugins sit inside the Hadoop components running on YARN, the data operating system: HDFS, HBase, Hive Server2, Knox, Storm and Falcon (some plugins are future integrations).

Simplified Workflow: Hive
1. The admin sets policies for Hive databases/tables/columns in the XA Policy Manager.
2. Users access Hive data using JDBC/ODBC from a user application, or IT users access Hive via the beeline command tool.
3. Hive authorizes with the XA Agent.
4. HiveServer2 provides data access to users.
5. Audit logs are pushed to the audit database.
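In steps 3 and 5, the in-process agent decides locally from policies it has pulled from the policy manager, then ships audit records to a database in batches. The sketch below models that flow, with an in-memory SQLite database standing in for the audit DB. Everything in it is a hypothetical simplification, not the XA Agent's real design:

- `HiveAuthzAgent` and the `hive_audit` table;
- the tuple-shaped policies;
- the batch size.

```python
import sqlite3
from typing import Callable, Iterable, List, Set, Tuple

Rule = Tuple[str, str, str]  # (user, resource, access), a deliberately simple policy shape


class HiveAuthzAgent:
    """Hypothetical stand-in for the XA Agent embedded in HiveServer2."""

    def __init__(self, fetch_policies: Callable[[], Iterable[Rule]],
                 db: sqlite3.Connection, batch_size: int = 2):
        self._fetch = fetch_policies
        self._policies: Set[Rule] = set()
        self._db = db
        self._buffer: List[Tuple[str, str, str, str]] = []
        self._batch_size = batch_size
        db.execute("CREATE TABLE IF NOT EXISTS hive_audit "
                   "(user TEXT, resource TEXT, access TEXT, result TEXT)")

    def refresh(self) -> None:
        # Step 1 happens in the policy manager; the agent pulls the result into a cache.
        self._policies = set(self._fetch())

    def authorize(self, user: str, resource: str, access: str) -> bool:
        # Step 3: decide locally from cached policies, then buffer an audit record.
        allowed = (user, resource, access) in self._policies
        self._buffer.append((user, resource, access, "ALLOWED" if allowed else "DENIED"))
        if len(self._buffer) >= self._batch_size:
            self.flush()
        return allowed

    def flush(self) -> None:
        # Step 5: push buffered audit records to the audit DB in one transaction.
        with self._db:
            self._db.executemany("INSERT INTO hive_audit VALUES (?, ?, ?, ?)", self._buffer)
        self._buffer.clear()
```

Batching keeps audit writes off the query's critical path. The trade-off is that records still in the buffer are lost if the process dies before `flush()`.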

Data Protection: HDP allows you to apply data protection policy at three different layers across the Hadoop stack.
Storage: What? Encrypt data while it is at rest. How? Partners, OS-level encryption, custom code.
Transmission: What? Encrypt data as it moves. How? Supported in HDP 2.1.
Upon access: What? Apply restrictions when data is accessed. How? Partners, open source initiatives.

Points of Communication
Between client nodes and Hadoop cluster nodes: 1. WebHDFS; 2. DataTransferProtocol; 3. RPC; 4. JDBC/ODBC.
Between Hadoop cluster nodes: 2. DataTransfer; 3. RPC; 4. M/R shuffle.

Data Transmission Protection in HDP 2.1
WebHDFS: provides read/write access to HDFS; optionally enable HTTPS; authenticated using a SPNEGO (Kerberos for HTTP) filter; SSL-based wire encryption.
RPC: communications between NNs, DNs, etc. and clients; SASL-based wire encryption.
DTP (DataTransferProtocol): encryption with SASL.
JDBC/ODBC: SASL-based encryption is also available.
Shuffle: mapper to reducer over HTTP(S) with SSL.
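Each channel above is switched on by its own setting, so a cluster can easily end up with a gap. The sketch below checks a flat dict of configuration values against the setting that enables encryption for each channel. The property names are real Hadoop/Hive keys. The "wanted" values are assumptions:

- `hive.server2.thrift.sasl.qop=auth-conf` only applies with Kerberos authentication;
- newer Hadoop versions accept comma-separated values for `hadoop.rpc.protection`, which this simple equality check does not handle.

`unencrypted_channels` is a hypothetical helper.

```python
from typing import Dict, List, Mapping, Tuple

# Channel -> (property, value that enables wire encryption on that channel).
WIRE_ENCRYPTION_SETTINGS: Dict[str, Tuple[str, str]] = {
    "RPC": ("hadoop.rpc.protection", "privacy"),
    "DataTransferProtocol": ("dfs.encrypt.data.transfer", "true"),
    "WebHDFS": ("dfs.http.policy", "HTTPS_ONLY"),
    "Shuffle": ("mapreduce.shuffle.ssl.enabled", "true"),
    "JDBC/ODBC": ("hive.server2.thrift.sasl.qop", "auth-conf"),
}


def unencrypted_channels(conf: Mapping[str, str]) -> List[str]:
    """Return the channels whose setting is missing or not set to the encrypting value."""
    return sorted(
        channel
        for channel, (prop, wanted) in WIRE_ENCRYPTION_SETTINGS.items()
        if conf.get(prop, "").strip().lower() != wanted.lower()
    )
```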

Data Storage Protection: Encrypt at the physical file system level (e.g. dm-crypt). Encrypt via a custom HDFS “compression” codec. Encrypt at the application level (including via a partner security service/device).
Diagram: an ETL app uses a partner security service to encrypt plaintext (ABC) before it is written to HDFS as ciphertext (1a3d), and to decrypt it on the way back.

Current Open Source Initiatives
HDFS encryption: transparent encryption of data at rest in HDFS via encryption zones. This is being worked on in the community and depends on the Key Management Server and KeyShell.
Key Management Server: Key Provider API.
Hive column-level encryption.
HBase column-level encryption: transparent column encryption, which needs more testing/validation.
Command-line key operations.

Resources

Security Page

Hortonworks Security Investment Plans
Investment themes: HDP + XA, comprehensive security for enterprise Hadoop.
Previous phases (delivered): Kerberos authentication; HDFS, Hive & HBase authorization; wire encryption for data in motion; Knox for perimeter security; basic audit in HDFS & MR; SQL-style Hive authorization; ACLs for HDFS. Goal, Comprehensive Security: meet all security requirements across authentication, authorization, audit & data protection for all HDP components.
XA Secure phase (delivered): centralized security admin for HDFS, Hive & HBase; centralized audit reporting; delegated policy administration. Goal, Central Administration: provide one location for administering security policies and audit reporting for the entire platform.
Future phases: encryption in HDFS, Hive & HBase; centralized security administration of the entire Hadoop platform; centralized auditing of the entire platform; expanded authentication & SSO integration choices; tag-based global policies (e.g. a policy for PII). Goal, Consistent Integration: integrate with other security & identity management systems for compliance with IT policies.
…all IN Hadoop