1 How to Solve the Big Data Security Puzzle?

2 Two Reasons for Security in Big Data
1. Hadoop contains sensitive data. As Hadoop adoption grows, so do the types of data organizations look to store. The data is often proprietary or personal, and it must be protected. In this context, Hadoop is governed by the same security requirements as any data-center platform.
2. Hadoop is subject to compliance adherence. Organizations must often comply with regulations such as HIPAA, PCI DSS, and FISMA that require protection of personal information such as PHI, and must adhere to other corporate security policies as well.

3 Big Data Security: Key Aspects
Centralized security administration across four aspects:
Authentication – Who am I, and can I prove it? Kerberos; API security.
Authorization – What can I do? Fine-grained access control.
Audit – What did I do? Centralized audit reporting.
Data Protection – Can data be encrypted at rest and over the wire? Wire encryption in Hadoop; native and partner encryption.

4 Authentication: Kerberos
Provides strong authentication
Prevents impersonation by unauthorized accounts
Supports a token-delegation model
Works with existing directory services
Serves as the basis for authorization
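
As a concrete illustration, here is a minimal sketch of how a Java client authenticates to a Kerberized Hadoop cluster with Hadoop's UserGroupInformation API; the principal and keytab path are hypothetical placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosLoginExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Tell the Hadoop client that the cluster requires Kerberos.
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Authenticate with a keytab instead of a password (hypothetical names).
        UserGroupInformation.loginUserFromKeytab(
                "etl-svc@EXAMPLE.COM",
                "/etc/security/keytabs/etl-svc.keytab");

        // Subsequent HDFS calls run as the authenticated principal.
        FileSystem fs = FileSystem.get(conf);
        System.out.println(fs.exists(new Path("/data")));
    }
}
```

Keytab-based login is what services and scheduled jobs typically use; interactive users would instead obtain a ticket with kinit before running the client.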

5 API Security Pattern
[Architecture diagram: applications (App A–N), data-ingest/ETL tools (Falcon, Oozie, Sqoop, Flume), data operators, and business users reach the Hadoop cluster through an API gateway over REST/HTTPS and JDBC/ODBC; admins/operators connect through a bastion node over SSH and RPC calls.]

6 API Security: Why Do We Need It?
Why we need it
Extend the reach of Hadoop APIs to anyone on any device
Provide enterprise authentication, and apply enterprise capabilities to all REST APIs – IdM integration, SSO, OAuth, SAML
Avoid exposing cluster ports and hostnames to all users
Challenges/Limitations
Not suitable for heavy data-ingestion activities
Supports only specific services: WebHDFS (HDFS), Templeton (HCatalog), Stargate (HBase), Oozie, Hive/JDBC, YARN RM, Storm (see the sketch after this list)
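
As an illustration, here is a minimal sketch of a client listing an HDFS directory through such a gateway's WebHDFS endpoint over HTTPS. The gateway hostname, port, topology path ("gateway/default"), and credentials are hypothetical placeholders; the URL layout follows the common Apache Knox convention, and the gateway's TLS certificate is assumed to be trusted by the JVM.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class GatewayWebHdfsExample {
    public static void main(String[] args) throws Exception {
        // Basic-auth header for a hypothetical gateway user.
        String creds = Base64.getEncoder().encodeToString("analyst:secret".getBytes());

        // WebHDFS LISTSTATUS call routed through the gateway over HTTPS.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://gateway.example.com:8443/gateway/default"
                        + "/webhdfs/v1/data?op=LISTSTATUS"))
                .header("Authorization", "Basic " + creds)
                .GET()
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // JSON FileStatuses listing
    }
}
```

Note that the client never sees NameNode or DataNode hostnames; the gateway terminates TLS, authenticates the user, and proxies the call into the cluster.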

7 Authorization and Audit
Fine-grained access control
HDFS – folder, file
Hive – database, table, column
HBase – table, column family, column
Storm, Knox, and more
Flexibility in defining policies
Audit: monitoring through auditing
Extensive user-access auditing in HDFS, Hive, HBase, etc., capturing IP address, resource type/resource, timestamp, and whether access was granted or denied
(A sketch of one fine-grained control, an HDFS ACL, appears below.)
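
As a concrete building block, here is a minimal sketch of granting one user read access to one HDFS folder via an HDFS ACL, using Hadoop's FileSystem API; the path and user name are hypothetical.

```java
import java.util.Collections;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.AclEntry;
import org.apache.hadoop.fs.permission.AclEntryScope;
import org.apache.hadoop.fs.permission.AclEntryType;
import org.apache.hadoop.fs.permission.FsAction;

public class HdfsAclExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Grant a single user read+execute on one folder (hypothetical names).
        AclEntry entry = new AclEntry.Builder()
                .setScope(AclEntryScope.ACCESS)
                .setType(AclEntryType.USER)
                .setName("analyst")
                .setPermission(FsAction.READ_EXECUTE)
                .build();

        // Adds the entry without disturbing the folder's existing ACL.
        fs.modifyAclEntries(new Path("/data/sales"), Collections.singletonList(entry));
    }
}
```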

8 Authorization: Apache Ranger
Delivers a ‘single pane of glass’ for the security administrator
Centralizes administration of security policy
Ensures consistent coverage across the entire Hadoop stack
(Policies can also be managed programmatically; see the sketch below.)
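
A minimal sketch of creating an HDFS path policy through Ranger's public REST API. The admin URL, service name, credentials, policy name, path, and group are hypothetical placeholders; the JSON shape follows Ranger's v2 policy model, but field names should be verified against your Ranger version.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class RangerPolicyExample {
    public static void main(String[] args) throws Exception {
        // Grant the 'analysts' group read/execute on /data/sales (hypothetical).
        String policy = "{"
                + "\"service\":\"cl1_hadoop\","
                + "\"name\":\"sales-read\","
                + "\"resources\":{\"path\":{\"values\":[\"/data/sales\"],\"isRecursive\":true}},"
                + "\"policyItems\":[{\"groups\":[\"analysts\"],"
                + "\"accesses\":[{\"type\":\"read\",\"isAllowed\":true},"
                + "{\"type\":\"execute\",\"isAllowed\":true}]}]}";

        String creds = Base64.getEncoder().encodeToString("admin:secret".getBytes());
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://ranger.example.com:6182/service/public/v2/api/policy"))
                .header("Authorization", "Basic " + creds)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(policy))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```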

9 Audit

10 Data Protection in Hadoop
Data protection must be applied at three different layers in Apache Hadoop:
Storage: encrypt data while it is at rest. Direct data flows into and out of third-party encryption tools, and/or rely on hardware-specific techniques (e.g., drive-level encryption).
Transmission: encrypt data while it is in motion. Native Apache Hadoop 2.0 provides wire encryption.
Upon access: apply restrictions when data is accessed. Direct data flows into and out of third-party encryption tools.

11 Data Protection: Data at Rest
Encryption: data is converted to ciphertext using a mathematical algorithm; it can be one-way (a hash) or reversible (symmetric/asymmetric).
Tokenization: real data is replaced with randomly generated characters of the same data type.
(The sketch below contrasts one-way and reversible protection.)
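
A minimal sketch contrasting a one-way hash with reversible AES-GCM encryption of a sensitive field, using only the JDK's java.security and javax.crypto APIs. Key handling is deliberately simplified for illustration; a real deployment would obtain keys from a KMS and never reuse an IV.

```java
import java.security.MessageDigest;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class FieldProtectionExample {
    public static void main(String[] args) throws Exception {
        byte[] ssn = "123-45-6789".getBytes();

        // One-way: a SHA-256 hash cannot be reversed to recover the value.
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(ssn);
        System.out.println("hash bytes: " + digest.length);

        // Reversible: AES-GCM ciphertext can be decrypted with the same key.
        SecretKey key = KeyGenerator.getInstance("AES").generateKey();
        byte[] iv = new byte[12]; // all-zero IV only for illustration; never reuse IVs
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ciphertext = cipher.doFinal(ssn);

        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        System.out.println(new String(cipher.doFinal(ciphertext))); // original value
    }
}
```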

12 Data Protection: Transmission
WebHDFS – provides read/write access to HDFS; HTTPS can optionally be enabled; authenticated using a SPNEGO (Kerberos for HTTP) filter
RPC – communications between NameNodes, DataNodes, etc., and clients; SASL-based wire encryption
JDBC/ODBC – SSL-based wire encryption
Shuffle – mapper to reducer over HTTP(S) with SSL
(The relevant configuration knobs are sketched below.)
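
For orientation, a minimal sketch of the Hadoop configuration properties commonly used to switch on these wire-encryption paths. In practice they live in core-site.xml, hdfs-site.xml, and mapred-site.xml rather than in code; property names are from Hadoop 2.x and should be checked against your distribution.

```java
import org.apache.hadoop.conf.Configuration;

public class WireEncryptionConfigExample {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // SASL quality of protection for Hadoop RPC; 'privacy' adds encryption.
        conf.set("hadoop.rpc.protection", "privacy");
        // Encrypt DataNode block-transfer (data) traffic.
        conf.set("dfs.encrypt.data.transfer", "true");
        // Serve HDFS web endpoints, including WebHDFS, over HTTPS only.
        conf.set("dfs.http.policy", "HTTPS_ONLY");
        // Encrypt the MapReduce shuffle between mappers and reducers.
        conf.set("mapreduce.shuffle.ssl.enabled", "true");
        System.out.println(conf.get("hadoop.rpc.protection"));
    }
}
```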

13 Data Protection: Upon Access
[Diagram: a KMS and an encryption server/agent sit alongside the Hadoop ecosystem; data is protected in the ingestion stream, via Hadoop-component UDFs (e.g., Hive) that encrypt/decrypt, and on through to reporting.]
File/directory encryption – HDFS
Column-level encryption – Hadoop ecosystem components or the ingestion framework
Data tokenization – encryption server/agent
(A sketch of a column-protecting Hive UDF appears below.)
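
As one way to realize column-level protection on access, here is a minimal sketch of a Hive UDF that masks a sensitive column, standing in for the encrypt/decrypt UDFs mentioned above. It uses Hive's legacy UDF API; the class name and function name are hypothetical.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class MaskUdf extends UDF {
    // Replace all but the last four characters with '*'.
    public Text evaluate(Text value) {
        if (value == null) return null;
        String s = value.toString();
        int keep = Math.min(4, s.length());
        return new Text("*".repeat(s.length() - keep) + s.substring(s.length() - keep));
    }
}
```

It would be registered and used along the lines of CREATE TEMPORARY FUNCTION mask_val AS 'MaskUdf'; SELECT mask_val(ssn) FROM customers; (table and column hypothetical).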

14 Infrastructure Security
A physically zoned-off data lake, using firewalls with network segmentation
Data-lake cluster nodes running with only specific ports open and iptables turned on
SSSD or Centrify for users and groups
OS hardening: security scans to shut down non-essential services on individual hosts
Up-to-date patch levels applied to the servers

15 Enterprise-wide Authorization Policy Management
AD is the identity store, with the existing user- and group-provisioning process prescribed at the enterprise level. The enterprise policy-authorization software is the enterprise-level repository for defining policies across all applications managed under this umbrella, providing a seamless authorization experience based on role and group memberships in AD.
[Diagram: Active Directory groups and users sync into the Hadoop policy server through the enterprise policy-authorization software's integration API; the policy store (DBMS) feeds policies to Hadoop components across the ecosystem.]
(The sketch below shows the kind of AD group lookup behind such a sync.)
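
A minimal sketch of the LDAP lookup a group-sync job might perform against AD using the JDK's JNDI API, before pushing identities to the Hadoop policy server. The server URL, bind account, base DN, and group name are hypothetical placeholders.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

public class AdGroupSyncExample {
    public static void main(String[] args) throws Exception {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldaps://ad.example.com:636"); // hypothetical
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        env.put(Context.SECURITY_PRINCIPAL, "svc-sync@example.com");  // hypothetical
        env.put(Context.SECURITY_CREDENTIALS, "secret");

        // Ask AD for the members of one group (base DN and group hypothetical).
        SearchControls controls = new SearchControls();
        controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
        controls.setReturningAttributes(new String[] {"member"});

        NamingEnumeration<SearchResult> results = new InitialDirContext(env)
                .search("dc=example,dc=com", "(cn=hadoop-analysts)", controls);
        while (results.hasMore()) {
            System.out.println(results.next().getAttributes().get("member"));
        }
    }
}
```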

16 Application Access Security Pattern
Users and groups are synced from LDAP into the Policy Manager; every hop uses SSL; at-rest data encryption is an option at the HDFS level.
1. The O/JDBC client sends the original request with user id/password to the gateway (Knox).
2. Knox authenticates the user/password against LDAP.
3. Knox gets a service ticket for Hive from the KDC.
4. Knox calls HiveServer2 as a proxy user over SASL.
5. Ranger authorizes the request (AuthZ).
6. Hive gets a NameNode (NN) service ticket and creates the MapReduce job using it; the client receives the query result back through the gateway.
(A matching JDBC connection sketch appears below.)
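
A minimal sketch of step 1 from the client's side: a JDBC connection to HiveServer2 through the gateway in HTTP transport mode. The gateway host, topology path, trust store, credentials, and query are hypothetical placeholders; it requires the Hive JDBC driver on the classpath and follows the common Knox connection-string convention.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveViaGatewayExample {
    public static void main(String[] args) throws Exception {
        // HTTP transport mode through the gateway over SSL (hypothetical values).
        String url = "jdbc:hive2://gateway.example.com:8443/default"
                + ";ssl=true;sslTrustStore=/etc/pki/gateway.jks;trustStorePassword=changeit"
                + ";transportMode=http;httpPath=gateway/default/hive";

        try (Connection conn = DriverManager.getConnection(url, "analyst", "secret");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM sales")) {
            while (rs.next()) {
                System.out.println(rs.getLong(1)); // query result returned via gateway
            }
        }
    }
}
```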

17 Lessons Learned
Start early
Think enterprise
Engage experts
Collaborate and socialize
Manage vulnerabilities
Monitor, report, and audit

