DATA SECURITY AND BIG DATA Carole Murphy November 20, 2013.


1 DATA SECURITY AND BIG DATA Carole Murphy November 20, 2013

2 Big Data Conferences Major conferences are opportunities to learn, meet colleagues, see vendor demos

3 Executive Summary: Five Things You Need To Know About Big Data Security
1. Time-to-insight is even more important than cost savings as a business driver for Hadoop.
2. Unless you take action, security is likely to be addressed late, and it then applies the brakes to the Big Data project.
3. Data security in the Hadoop ecosystem is about much more than authorization and authentication.
4. Traditional data security solutions protect data at rest, but not in use or in motion. The best solutions retain data value even as they remove security and compliance obstacles to the project.
5. Big Data presents an opportunity to address security and compliance across your IT environment. Look for adaptable and extensible security solutions.

4 Big Data IS Now!
Biggest growth drivers
 –Accelerating enterprise adoption
 –Maturing software
 –Increasingly sophisticated professional services
 –Continued investment
Transforming the Data Center
“By 2017, Big Data will be the norm for data management…”*
*Forrester, The Top Emerging Technologies To Watch: Now Through 2018, by Brian Hopkins and Frank E. Gillett, February 7, 2013

5 Background: Big Data – What’s Different?
[Diagram labels: ETL, DW, BI; Raw Load, Hadoop, BI]
Data coming from many sources
Doesn’t need a schema
 –Dump raw loads of data into Hadoop
Hadoop processing is so fast
 –Compute in minutes what would take a night to batch process
BI is real-time
 –Ask questions you didn’t know you needed to ask
Elephant in the room
 –Data “lake” many times cheaper than DW path

6 ETL Offload Use Case*
[Diagram labels: Hadoop (HDFS, MapReduce, Pig); BI]
* Presented by MapR at Hadoop Summit, San Jose, June 2013

7 Taming the Explosion in Data: Optimizing Time-to-Insight
The explosion in data fuels growth and agility
But time to data value is gated by risk and compliance
Attacks on data are here to stay, and big data means a big target
Balancing data access and data security is critical
“90% of the data in the world today has been created in the last two years alone”* - IBM
Parabolic growth in data created and consumed* - Cisco

8 Risk Increases as Data Moves to Cloud and Big Data Environments
[Diagram: risk increases across Individual Apps, Mainframes, OLTP, Data Warehouse (Oracle, Teradata, Netezza, etc.), Hadoop, Cloud]
Not created for the enterprise
Security is just starting to be bolted on
Who has control of your data?

9 Extracting Value from Data: Big Data Includes Sensitive Data
Marketing – analyze purchase patterns
Social media – find best customer segments
Financial systems – model trading data
Banking and insurance – 360° customer view
Security – identify credit card fraud
Healthcare – advance disease prevention
How do you liberate the value in data – without increasing risk?

10 Why Projects Get Stopped: Hidden Risks in Big Data Adoption
Big Data
 –Enables deeper data analysis
 –More value from old data
 –New risks if data is not protected
Data Concentration Risks
 –Financial position
 –Market position
 –Corporate compliance risk
Cloud Adoption Risks
 –Sensitive data in untrusted systems
 –Data in storage, in use, transmitted to cloud
Data Sharing Risks
 –Compliance challenges with 3rd party risk
 –Cross-border, data residency
 –Data in and out of the enterprise
Breach Risks
 –Internal users
 –External shares
 –Backups, Hadoop stores, data feeds

11 Take Advantage of Big Data Benefits: Identifying an Effective Data Security Strategy
Integrate security, enable access
 –Protect sensitive data before entering Hadoop, in Hadoop and on the way out
 –Enable accurate analytics on encrypted data
Assure compliance
 –Address global compliance comprehensively
 –Reduce audit scope for PCI to cut costs
 –Provable, verified, published, peer-reviewed, NIST recognized security techniques
Optimize performance and extensibility
 –High performance
 –Adapt to the newest tools in the ecosystem
 –Fit into infrastructure, fast and easy to implement

12 Options for Security: Hadoop Community
SSL
 –Disabled by default; doesn’t cover all paths, adds latency and CPU load
Existing Hadoop access controls
 –Kerberos is still the primary way to secure a Hadoop cluster
 –Not fine-grained, can’t limit by data type or column
 –Inappropriate access post-analysis
Sentry from Cloudera
 –Offers permission controls for accessing data through Hive
Knox from Hortonworks
 –Gateway server provides a single point of authentication and access for Hadoop services in a cluster
MapR native authentication and authorization
 –Transparent integration with Kerberos OR option for native authentication

13 Options for Security: Commercial Data Security Products
Container-based encryption
 –Data-at-rest security at the block or file level
 –Do you want different people/applications to have access to different data types?
Traditional data-masking
 –1-way-only limits use cases (e.g. fraud analysis)
 –Technique doesn’t support production use cases
Application level
 –Encryption and tokenization options
 –Consider standards-based approaches, key management
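The difference between one-way masking and reversible tokenization noted on this slide can be sketched in Python. This is a minimal illustration under stated assumptions, not any vendor's product: `mask_pan`, `TokenVault`, and the in-memory dictionaries are hypothetical names invented for the example, and a real deployment would need protected storage, key management, and access control.

```python
import secrets


def mask_pan(pan):
    """One-way masking: keep the last four digits, discard the rest."""
    return "*" * (len(pan) - 4) + pan[-4:]


class TokenVault:
    """Toy in-memory token vault (hypothetical; for illustration only)."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value):
        # Reuse the existing token so one value always maps to one token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        while True:
            # Random digits of the same length, unrelated to the input.
            token = "".join(secrets.choice("0123456789") for _ in value)
            if token not in self._token_to_value and token != value:
                break
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token):
        # Only a party with access to the vault can reverse a token.
        return self._token_to_value[token]


vault = TokenVault()
pan = "4111111111111111"
token = vault.tokenize(pan)
masked = mask_pan(pan)
```

The masked value cannot be turned back into the card number, and two cards that share their last four digits mask to the same string. The token keeps the length and digit format and can be reversed by whoever is allowed to query the vault, which is why tokenization can still serve use cases such as fraud analysis.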

14 Goals
All sensitive data must be stored on disk in protected form (encrypted or tokenized)
 –Compliance requirements (PCI, HIPAA)
 –Disks are often removed from data center for servicing
 –There are many ways that data can flow into HDFS, such as unstructured data being copied directly in
Sensitive data also should be protected during analysis
 –Because Hadoop has insufficient access controls
Provide access controls to data based on data type and project (data set)
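The last goal, access control keyed on data type and project, can be illustrated with a small default-deny policy lookup. This is only a sketch: the role names, project names, data types, and the `POLICY` table are all assumptions made up for the example and do not come from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    project: str    # data set the grant applies to
    data_type: str  # e.g. "pan", "ssn", "diagnosis"


# Hypothetical policy: which (project, data type) pairs a role may read in clear.
POLICY = {
    "fraud_analyst": {Grant("card_fraud", "pan")},
    "marketing_analyst": set(),
}


def may_read_clear(role, project, data_type):
    """Return True only if the role holds an explicit grant; default is deny."""
    return Grant(project, data_type) in POLICY.get(role, set())
```

With this shape, `may_read_clear("fraud_analyst", "card_fraud", "pan")` is true, while the same analyst asking for `"ssn"`, or an unknown role asking for anything, is refused.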

15 Solutions for Handling Structured and Unstructured Data
Disk Volume-level (whole file) encryption
 –Enables compliance
 –Covers unstructured data, from all sources
 –Provides protection against drive loss
 –Good, but may not be sufficient
 –Doesn’t reduce audit scope for PCI DSS
 –Access controls in Hadoop can’t control user access at the field level, so access to the cluster may need to be restricted to pass PCI or HIPAA audit
Field-level tokenization and/or encryption
 –Enables wider use of the cluster by multiple teams
 –Data sharing with certain fields remaining protected
 –Protects against failures at multiple layers
 –Required for regulatory compliance in many cases
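Field-level protection, as opposed to whole-file encryption, means that only the sensitive columns of a record are replaced while the rest stays usable. The sketch below shows the idea in Python. It uses a keyed HMAC as a simple, non-reversible stand-in for the tokenization or encryption the slide refers to; the field names, `DEMO_KEY`, and both function names are assumptions for illustration.

```python
import hashlib
import hmac

SENSITIVE_FIELDS = {"card_number", "ssn"}  # assumed policy for this example
DEMO_KEY = b"demo-key-not-for-production"   # a real key would come from a key manager


def pseudonymize(value, key=DEMO_KEY):
    """Keyed, deterministic stand-in for a sensitive string (not reversible)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def protect_record(record):
    """Replace only the sensitive fields; leave everything else in the clear."""
    return {
        name: pseudonymize(value) if name in SENSITIVE_FIELDS else value
        for name, value in record.items()
    }


row = {"card_number": "4111111111111111", "merchant": "ACME", "amount": 42.50}
protected = protect_record(row)
```

Because the replacement is deterministic, the protected column can still be used for joins and group-by operations, and teams without access to the clear values can work on the same data set.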

16 All Hadoop Integration Options
[Diagram labels: Data Sources; Landing Zone; ETL; Batch; Data Warehouse; BI; Applications; Storage Encryption; HDFS; Sqoop, Flume, Hive, MapReduce + more; Key Management, Tokenization and Policy Control]

17 Protecting Data Inbound to Hadoop: Before Ingestion (same diagram as slide 16)

18 Protecting Data Inbound to Hadoop: During Ingestion (same diagram as slide 16)

19 Protecting Data Inbound to Hadoop: After Ingestion (same diagram as slide 16)

20 Retrieving Clear Data from Hadoop: Before export/query (same diagram as slide 16)

21 Retrieving Clear Data from Hadoop: During export/query (same diagram as slide 16)

22 Retrieving Clear Data from Hadoop: After export/query (same diagram as slide 16)

23 PCI Data – Keep Hadoop and Data Warehouse out of Audit Scope
[Diagram labels: Data Sources; Landing Zone; ETL; Batch; Data Warehouse; BI; Applications; Storage Encryption; HDFS; Sqoop, Flume, Hive, MapReduce + more; Key Management, Tokenization and Policy Control]

24 PHI Data – Encrypted in Hadoop for HIPAA; Minimized Application Changes
[Diagram labels: Data Sources; Data Warehouse; BI; Applications; Storage Encryption; HDFS; Sqoop, Flume, Hive, MapReduce + more; Key Management, Tokenization and Policy Control]

25 Private Application Data – Critical part of Compliance – 100% Transparent
[Diagram labels: Data Sources; Data Warehouse; BI; Applications; Storage Encryption; HDFS; Sqoop, Flume, Hive, MapReduce + more; Key Management, Tokenization and Policy Control]

26 Use Case: Healthcare Company
Challenge
 –Big Data team tasked with securing large multi-node Hadoop cluster for HIPAA, HITECH
 –Challenging time-frames
Solution
 –Data de-identified in ETL move before entering Hadoop
 –Ability to decrypt analytic results when needed, through multiple tools
Benefits
 –Ability to leverage medical data to develop more targeted marketing strategies and services to key demographics

27 Use Case: Multi-national Bank
Challenge
 –PCI compliance is #1 driver
 –ETL offload use case with Hadoop alongside a traditional data warehouse
Solution
 –Integrate with Sqoop on ingestion; Hive on the applications / query side to protect dozens of data types
 –Fraud analysts work with tokenized credit card numbers
Benefits
 –Enable fraud analytics directly on protected data in Hadoop
 –Fraud analysts have ability to de-tokenize as needed with strict controls
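Many fraud checks only need to know whether two transactions used the same card, not what the card number is, so they can run directly on tokens. A minimal Python sketch of that idea follows; the sample rows, the token strings, and the "three or more merchants" rule are all invented for illustration and are not the bank's actual method.

```python
from collections import defaultdict

# Hypothetical, already-tokenized transactions: (card_token, merchant, amount).
transactions = [
    ("tok_A", "store_1", 20.0),
    ("tok_B", "store_2", 35.0),
    ("tok_A", "store_3", 900.0),
    ("tok_A", "store_4", 850.0),
]


def tokens_seen_at_many_merchants(rows, threshold=3):
    """Return the tokens used at `threshold` or more distinct merchants."""
    merchants_by_token = defaultdict(set)
    for token, merchant, _amount in rows:
        merchants_by_token[token].add(merchant)
    return sorted(t for t, m in merchants_by_token.items() if len(m) >= threshold)


flagged = tokens_seen_at_many_merchants(transactions)
```

Only the flagged tokens would then need to be de-tokenized, and only by an analyst who is permitted to do so.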

28 Use Case: U.S. Military Organization
Challenge
 –US Surgeon General directive – share healthcare data with medical research institutes
 –Maintain HIPAA/HITECH Compliance
Solution
 –De-identified 100+TB dataset at field level before release
 –Format-preserving encryption enables distributed analytics in Hadoop
 –Usable data values for accurate analytics
Benefits
 –Secure re-identification by Agency as needed
 –Improved healthcare with compliance
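Format-preserving encryption turns a value into ciphertext of the same length and character set, so downstream schemas and tools keep working, and the key holder can recover the original. The Python sketch below illustrates the idea with a toy alternating Feistel network over decimal digits, using HMAC-SHA256 as the round function. It is an assumption-laden teaching example: it is not the NIST FF1 algorithm, not the product described in the presentation, and not suitable for production use. All function names are hypothetical.

```python
import hashlib
import hmac

ROUNDS = 10  # must be even so the two halves end up back in place


def _round_value(key, round_no, half, total_len):
    """Keyed pseudo-random integer derived from one half of the digits."""
    msg = f"{round_no}|{total_len}|{half}".encode("ascii")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big")


def _check(digits):
    # isascii() rejects non-ASCII digit characters that isdigit() would accept.
    if len(digits) < 2 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected a string of at least two decimal digits")


def fpe_encrypt(key, digits):
    _check(digits)
    u = len(digits) // 2
    v = len(digits) - u
    a, b = digits[:u], digits[u:]
    for i in range(ROUNDS):
        m = u if i % 2 == 0 else v  # width of the half being replaced
        c = (int(a) + _round_value(key, i, b, len(digits))) % 10**m
        a, b = b, str(c).zfill(m)
    return a + b


def fpe_decrypt(key, digits):
    _check(digits)
    u = len(digits) // 2
    v = len(digits) - u
    a, b = digits[:u], digits[u:]
    for i in reversed(range(ROUNDS)):
        m = u if i % 2 == 0 else v
        prev_b = a  # the half that round i left untouched
        prev_a = (int(b) - _round_value(key, i, prev_b, len(digits))) % 10**m
        a, b = str(prev_a).zfill(m), prev_b
    return a + b
```

Encrypting a nine-digit identifier yields another nine-digit string, and decrypting with the same key returns the original, which corresponds to the "secure re-identification as needed" benefit listed above.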

29 Key Considerations
Most Big Data projects are associated with Data Warehouse projects…
 –What is your data warehouse strategy (e.g. expansion, ETL offload to Hadoop, integrating new data sources…)?
 –What are your use cases? What does the business need?
If you use de-identified data in Hadoop, would you ever need to get back to the original data?
Will you have sensitive data going into Hadoop (PII, PCI, PHI)?
What compliance or privacy regulations are you concerned about addressing?
Do you need data protection across disparate systems (open systems to mainframe)?

30 Security Checklist to Make Big Data Safe
Solves complex global compliance issues
Ensures data stays protected wherever it goes
Enables accurate analytics on encrypted data
Optimizes performance
Flexibly adapts to the fast-growing Hadoop ecosystem
Reduces PCI audit scope where applicable

31 About Voltage Security
Origins: DARPA Funded Research at Stanford University
Patented Innovations: 27
Unstructured data: Identity Based Encryption (IBE)
Structured data: Format Preserving Encryption (FPE), Tokenization, Data Masking, Stateless Key Management
Leader in large scale data-centric security solutions.
Customers: Enterprise Customers/Government Agencies.
Analyst Recognition: Gartner, Forrester, Burton IT1, Mercator
Contact Voltage Security:
Copyright 2013 Voltage Security

32 THANK YOU
