Hortonworks. We do Hadoop.

Presentation on theme: "Hortonworks. We do Hadoop."— Presentation transcript:

1 Hortonworks. We do Hadoop.
HDP with Advanced Security Comprehensive Security for Enterprise Hadoop Hortonworks. We do Hadoop.

2 Agenda
- Our approach across the security pillars
- Component deep dive
- Questions

3 Security needs are changing
- YARN unlocks the data lake
- Multi-tenant: multiple applications for data access
- Changing and complex compliance environment
- ETL of non-sensitive data can yield sensitive data
Five areas of security focus:
- Administration: centralized management and consistent security
- Authentication: authenticate users and systems
- Authorization: provision access to data
- Audit: maintain a record of data access
- Data Protection: protect data at rest and in motion
Fall 2013: largely siloed deployments with single-workload clusters
Summer 2014: 65% of clusters host multiple workloads

4 Security in Hadoop with HDP + Argus (XA Secure)
Centralized security administration across the pillars:
- Authentication (who am I / prove it?): Kerberos in native Apache Hadoop; HTTP/REST APIs secured with the Apache Knox Gateway
- Authorization (restrict access to explicit data): HDFS permissions, HDFS ACLs, Hive ATZ-NG
- Audit (understand who did what): audit logs in HDFS & MR
- Data Protection (encrypt data at rest & in motion): wire encryption in Hadoop, open source initiatives, partner solutions
Argus (future integration):
- Works as-is with current authentication methods (HDP 2.1)
- Fine-grained access control (RBAC) for HDFS, Hive and HBase
- Centralized audit reporting: policy and access history

5 Map to Nevada Energy Requirements
Requirement -> HDP security component:
- End-user security / LDAP integration -> Kerberos, Argus (XA)
- Group-level access -> Argus (XA)
- Multiple levels of access
- Multiple environments
- Developer security: access control for creating tables; limits on creating schemas and creating folders

6 HDP w/ Advanced Security
Security features:
Authentication:
- Kerberos support
- Perimeter security for services and REST APIs
Authorization:
- Fine-grained access control for HDFS, HBase and Hive
- Role-based access control
- Column-level permission support: create, drop, index, lock, user
Auditing:
- Resource access auditing
- Extensive auditing
- Policy auditing

7 HDP w/ Advanced Security
Security features (continued):
Data protection:
- Wire encryption
- Volume encryption
- File/column encryption (partners)
Reporting:
- Global view of policies and audit data
Administration:
- User/group mapping management
- Global policy manager with Web UI
- Delegated administration

8 Authentication w/ Kerberos

9 Kerberos Primer
The client talks to the KDC through the Kerberos library; client-to-HDFS communication never touches the KDC.
1. kinit: log in and get a Ticket Granting Ticket (TGT) from the KDC
2. The client stores the TGT in its ticket cache
3. Get a NameNode Service Ticket (NN-ST) from the KDC
4. The client stores the NN-ST in the ticket cache
5. Read/write a file: the client presents the NN-ST and the file name to the NameNode, which returns block locations, block IDs and Block Access Tokens if access is permitted
6. Read/write blocks: the client presents the Block Access Token and block ID to the DataNode
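The ticket chain above can be sketched as a toy model. Everything here is illustrative only: the keys, the dict-shaped tickets, and HMAC signatures stand in for the real krb5 protocol and its cryptography.

```python
# Toy model of the slide's flow: kinit -> TGT -> service ticket -> NN access.
# Not real Kerberos; HMAC-signed dicts substitute for krb5 tickets.
import hashlib
import hmac
import json

KDC_KEY = b"kdc-master-secret"       # held only by the KDC (assumption)
NN_KEY = b"namenode-service-secret"  # shared by the KDC and the NameNode

def sign(key: bytes, ticket: dict) -> str:
    blob = json.dumps(ticket, sort_keys=True).encode()
    return hmac.new(key, blob, hashlib.sha256).hexdigest()

def kinit(user: str) -> dict:
    """Step 1: the KDC issues a Ticket Granting Ticket for the user."""
    tgt = {"user": user, "type": "TGT"}
    return {"ticket": tgt, "sig": sign(KDC_KEY, tgt)}

def get_service_ticket(tgt_msg: dict, service: str) -> dict:
    """Step 3: the KDC validates the TGT and issues a service ticket."""
    if not hmac.compare_digest(tgt_msg["sig"], sign(KDC_KEY, tgt_msg["ticket"])):
        raise PermissionError("invalid TGT")
    st = {"user": tgt_msg["ticket"]["user"], "service": service}
    return {"ticket": st, "sig": sign(NN_KEY, st)}

def namenode_access(st_msg: dict) -> str:
    """Step 5: the NameNode verifies the service ticket with its own key."""
    if not hmac.compare_digest(st_msg["sig"], sign(NN_KEY, st_msg["ticket"])):
        raise PermissionError("invalid service ticket")
    return f"block locations for {st_msg['ticket']['user']}"

tgt = kinit("alice")                     # stored in the ticket cache (step 2)
st = get_service_ticket(tgt, "nn/host")  # stored in the ticket cache (step 4)
print(namenode_access(st))
```

The point of the chain is that the NameNode never talks to the KDC at access time: it can verify the ticket locally because it shares a key with the KDC.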

10 Kerberos Summary
Provides strong authentication:
- Establishes identity for users, services and hosts
- Prevents impersonation of accounts
- Supports a token delegation model
- Works with existing directory services
- Basis for authorization
Strong authentication = the password is never sent over the wire.

11 Hadoop Authentication
Users authenticate with the services:
- CLI & API: Kerberos kinit or keytab
- Web UIs: Kerberos SPNEGO or a custom plugin (e.g. SSO)
Services authenticate with each other:
- Pre-populated Kerberos keytabs (e.g. DN -> NN, NM -> RM)
Services propagate the authenticated user identity:
- Authenticated trusted proxy services using doAs (e.g. Oozie -> RM, Knox -> WebHCat)
Job tasks present the delegated user's identity/access:
- Delegation tokens (e.g. job task -> NN, job task -> JT/RM)
Strong authentication is the basis for authorization.

12 User Management
- Most customers use LDAP for user info: LDAP guarantees that user information is consistent across the cluster and is an easy way to manage users & groups
- The standard user-to-group mapping comes from the OS on the NameNode
- Kerberos provides authentication; PAM can automatically log the user into Kerberos

13 Kerberos + Active Directory
- Use existing directory tools to manage users
- Use Kerberos tools to manage host and service principals
- A cross-realm trust connects the cluster KDC (hosts and service principals) to AD/LDAP (the user store): clients authenticate against AD, and the trust lets those credentials work in the Hadoop cluster

14 Groups
Define groups for each required role.
- Hadoop has a pluggable group-mapping interface; the user-to-group mapping is not stored within Hadoop
- Defaults to the OS information on the master node; typically driven from LDAP on Linux
Existing plugins:
- ShellBasedUnixGroupsMapping: /bin/id
- JniBasedUnixGroupsMapping: system call
- LdapGroupsMapping: LDAP call
- CompositeGroupsMapping: combines Unix & LDAP group mapping
Strong authentication and role-based groups provide protections that enable shared clusters.
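Switching the plugin is a core-site.xml change. A sketch for LdapGroupsMapping follows; the server URL, bind DN and search base are placeholders for your directory, and your directory may need additional filter properties.

```xml
<!-- core-site.xml sketch: look up groups in LDAP instead of the OS.
     All values below are placeholders for an example directory. -->
<property>
  <name>hadoop.security.group.mapping</name>
  <value>org.apache.hadoop.security.LdapGroupsMapping</value>
</property>
<property>
  <name>hadoop.security.group.mapping.ldap.url</name>
  <value>ldap://ldap.example.com:389</value>
</property>
<property>
  <name>hadoop.security.group.mapping.ldap.bind.user</name>
  <value>cn=hadoop,ou=services,dc=example,dc=com</value>
</property>
<property>
  <name>hadoop.security.group.mapping.ldap.base</name>
  <value>dc=example,dc=com</value>
</property>
```

Because only the NameNode (master node) resolves groups, this mapping has to be configured there; worker nodes never need the LDAP credentials.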

15 Groups
Diagram: the client accesses the Hadoop cluster; the NameNode's group-mapping plugin looks up the user's groups in the AD/LDAP user store before granting read/write access.

16 Kerberos FAQ
- Where do I install the KDC? On a master-type node.
- User provisioning? Hook up to corporate AD/LDAP to leverage existing user provisioning.
- Growing a cluster? Provision new services and nodes in the MIT KDC and copy keytabs to the new nodes.
- Is Kerberos a SPOF? Kerberos supports HA, and with delegation tokens the KDC load is reduced.

17 Knox Gateway Overview
Perimeter REST API security

18 What does Perimeter Security really mean?
- A firewall is (today) still required at the perimeter
- The Knox Gateway controls all Hadoop REST API access through the firewall
- The Hadoop cluster itself is mostly unaffected
- The firewall only allows connections through specific ports from the Knox host
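From the client's point of view, "through Knox" means it only ever sees the gateway's URL and authenticates there. A sketch of what such a client would construct; the host name, default port 8443, the "default" topology and the credentials are placeholder assumptions:

```python
# Sketch: what a REST client sends to the Knox Gateway instead of the
# cluster. Knox proxies cluster REST APIs under /gateway/<topology>/...
import base64

def knox_url(host: str, topology: str, service_path: str) -> str:
    # Single access point: one host/port for every proxied service.
    return f"https://{host}:8443/gateway/{topology}{service_path}"

def basic_auth_header(user: str, password: str) -> dict:
    # Knox authenticates the end user (e.g. against LDAP) with HTTP Basic;
    # Kerberos to the cluster is encapsulated behind the gateway.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}

url = knox_url("knox.example.com", "default", "/webhdfs/v1/tmp?op=LISTSTATUS")
headers = basic_auth_header("guest", "guest-password")
print(url)
```

The firewall then only needs to admit HTTPS to the Knox host, not the many per-service ports of the cluster.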

19 Why Knox?
Enhanced security:
- Protect network details
- Partial SSL for non-SSL services
- WebApp vulnerability filter
- Alternative to an SSH "edge node"
Centralized control:
- Central REST API auditing
- Service-level authorization
Simplified access:
- Kerberos encapsulation
- Extends API reach
- Single access point with multi-cluster support
- Single SSL certificate
Enterprise integration:
- LDAP integration
- Active Directory integration
- SSO integration
- Apache Shiro extensibility
- Custom extensibility

20 Current Hadoop Client Model
- FileSystem and MapReduce Java APIs
- HDFS, Pig, Hive and Oozie clients (which wrap the Java APIs)
- Typical use of the APIs is via an "edge node" that sits "inside" the cluster
- Users SSH to the edge node and execute API commands from a shell

21 Hadoop REST APIs
Service | API
WebHDFS | HDFS user operations, including reading files, writing to files, making directories, changing permissions and renaming
WebHCat | Job control for MapReduce, Pig and Hive jobs, and HCatalog DDL commands
Hive | Hive REST API operations; JDBC/ODBC over HTTP
HBase | HBase REST API operations
Oozie | Job submission and management, and Oozie administration
Useful for connecting to Hadoop from outside the cluster.
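The URL shapes behind these REST APIs are uniform. A sketch for WebHDFS; the host name and the classic NameNode HTTP port 50070 are assumptions appropriate for an HDP 2.1-era cluster:

```python
# Sketch of WebHDFS request URLs: /webhdfs/v1/<path>?op=<OPERATION>&...
# Host and port are placeholders; op names come from the WebHDFS REST API.
from urllib.parse import urlencode

def webhdfs_url(host: str, path: str, op: str, **params) -> str:
    query = urlencode({"op": op, **params})
    return f"http://{host}:50070/webhdfs/v1{path}?{query}"

# Read a file, and create a directory with explicit permissions.
print(webhdfs_url("nn.example.com", "/data/file.txt", "OPEN"))
print(webhdfs_url("nn.example.com", "/data/new", "MKDIRS", permission="755"))
```

The same path-plus-`op` pattern is why a gateway like Knox can proxy these APIs generically: it rewrites the host portion and leaves the operation untouched.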

22 Hadoop REST API Security: Drill-Down
Diagram: REST clients connect over HTTP to the Knox Gateway, which sits behind a load balancer in the DMZ between two firewalls and authenticates users against the enterprise LDAP/AD identity provider. Knox forwards requests to the master services of each Hadoop cluster (NN, RM, WebHCat, Oozie, HiveServer2, HBase); edge-node Hadoop CLIs continue to use RPC directly. Note: the arrows to the clusters are a simplification; actually there is one port open per service Knox supports (WebHDFS, WebHCat, HiveServer2, HBase, Oozie), with more in the future.

23 Authorization and Auditing

24 Authorization and Audit
Fine-grained access control:
- HDFS: folder, file
- Hive: database, table, column
- HBase: table, column family, column
Audit: extensive user-access auditing in HDFS, Hive and HBase, capturing:
- IP address
- Resource type / resource
- Timestamp
- Access granted or denied
Flexibility in defining policies; control over access into the system.

25 Central Security Administration
HDP Advanced Security:
- Delivers a "single pane of glass" for the security administrator
- Centralizes administration of security policy
- Ensures consistent coverage across the entire Hadoop stack

26 Set Up Authorization Policies
File-level access control with flexible policy definition; control permissions.

27 Monitor through Auditing

28 Authorization and Auditing w/ XA
Diagram: enterprise users and legacy tools work through the XA Administration Portal, backed by an RDBMS. An integration API connects the XA Policy Server and XA Audit Server to XA plugins in the Hadoop components: HDFS*, Hive Server2*, HBase, Knox*, Storm, and Falcon, all running on YARN, the data operating system (* = future integration).

29 Simplified Workflow - Hive
1. The admin sets policies for Hive databases/tables/columns in the XA Policy Manager
2. Users access Hive data using JDBC/ODBC; IT users access Hive via the beeline command-line tool
3. Hive authorizes the request with the XA agent
4. HiveServer2 provides data access to the users
5. Audit logs are pushed to the audit database

30 Data Protection
HDP allows you to apply data protection policy at three different layers across the Hadoop stack:
Layer | What? | How?
Storage | Encrypt data while it is at rest | Partners, OS-level encryption, custom code
Transmission | Encrypt data as it moves | Supported in HDP 2.1
Upon access | Apply restrictions when data is accessed | Partners, open source initiatives

31 Points of Communication
1. WebHDFS (client to cluster over HTTP)
2. DataTransferProtocol / data transfer (client to DataNodes and between DataNodes)
3. RPC (client to services and between services)
4. JDBC/ODBC and the M/R shuffle

32 Data Transmission Protection in HDP 2.1
- WebHDFS: provides read/write access to HDFS; authenticated using a SPNEGO (Kerberos for HTTP) filter; optionally enable HTTPS for SSL-based wire encryption
- RPC: communications between NNs, DNs, etc. and clients; SASL-based wire encryption
- Data transfer: DTP encryption with SASL
- JDBC/ODBC: SASL-based encryption also available
- Shuffle: mapper to reducer over HTTP(S) with SSL
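These transmission protections are switched on via configuration. A sketch of the commonly used properties; verify the exact names and values against your HDP release's documentation before relying on them:

```xml
<!-- Sketch of HDP 2.1-era wire-encryption settings (values illustrative). -->

<!-- core-site.xml: SASL privacy (encryption) for Hadoop RPC -->
<property>
  <name>hadoop.rpc.protection</name>
  <value>privacy</value>
</property>

<!-- hdfs-site.xml: encrypt the DataTransferProtocol between clients and DNs -->
<property>
  <name>dfs.encrypt.data.transfer</name>
  <value>true</value>
</property>

<!-- mapred-site.xml: SSL for the mapper-to-reducer shuffle -->
<property>
  <name>mapreduce.shuffle.ssl.enabled</name>
  <value>true</value>
</property>
```

Note that `privacy` implies integrity and authentication as well; the weaker settings (`authentication`, `integrity`) trade protection for throughput.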

33 Data Storage Protection
- Encrypt at the physical file system level (e.g. dm-crypt)
- Encrypt via a custom HDFS "compression" codec
- Encrypt at the application level (including a security service/device)
Diagram: an ETL app hands plaintext ("ABC DEF") to a partner security service, which encrypts it before it is written to HDFS (stored as ciphertext, e.g. "1a3d") and decrypts it again on access.
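The encrypt-before-write / decrypt-on-access idea can be shown with a toy transform. XOR with a repeating key stands in for a real cipher such as AES here, so this is illustration only and offers no actual protection:

```python
# Toy illustration of application-level storage protection: data is
# transformed before it reaches HDFS and restored for authorized readers.
from itertools import cycle

def xor_transform(data: bytes, key: bytes) -> bytes:
    # Symmetric toy "cipher": applying it twice restores the input.
    # A real deployment would use AES via a security service or codec.
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

plaintext = b"ABC DEF"                       # what the ETL app produces
stored = xor_transform(plaintext, b"key!")   # what would land in HDFS
restored = xor_transform(stored, b"key!")    # what an authorized reader sees
assert stored != plaintext and restored == plaintext
```

The point of the diagram stands regardless of cipher choice: HDFS itself only ever holds the transformed bytes, so the protection travels with the data rather than with the cluster.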

34 Current Open Source Initiatives
- HDFS encryption: transparent encryption of data at rest in HDFS via encryption zones; being worked on in the community; depends on the Key Management Server and key shell
- Key Management Server: Key Provider API
- Hive column-level encryption
- HBase column-level encryption: transparent column encryption (needs more testing/validation); command-line key operations

35 Resources

36 Security Page

37 Hortonworks Security Investment Plans
Investment theme: HDP + XA - comprehensive security for enterprise Hadoop.
Previous phases (delivered):
- Kerberos authentication
- HDFS, Hive & HBase authorization
- Wire encryption for data in motion
- Knox for perimeter security
- Basic audit in HDFS & MR
- SQL-style Hive authorization
- ACLs for HDFS
Goal: comprehensive security - meet all security requirements across authentication, authorization, audit & data protection for all HDP components.
XA Secure phase (delivered):
- Centralized security administration for HDFS, Hive & HBase
- Centralized audit reporting
- Delegated policy administration
Goal: XA Secure central administration - one location for administering security policies and audit reporting for the entire platform.
Future phases:
- Encryption in HDFS, Hive & HBase
- Centralized security administration of the entire Hadoop platform
- Centralized auditing of the entire platform
- Expanded authentication & SSO integration choices
- Tag-based global policies (e.g. a policy for PII)
Goal: consistent integration - integrate with other security & identity management systems for compliance with IT policies... all IN Hadoop.

