1. Cloud Computing: Hadoop Security Design (2009)
Kaveh Noorbakhsh, Kent State: CS
Owen O’Malley | Kan Zhang | Sanjay Radia | Ram Marti | Christopher Harrell | Yahoo!
*All opinions and information are mine and do not represent the view(s) of my employer.
2. Brief History: Cloud Computing as a Service
1961: John McCarthy introduces the concept of cloud computing as a business model
1969: ARPANET
1997: “Cloud Computing” coined by Ramnath Chellappa
1999: Salesforce.com (enterprise applications via a simple web interface)
2002: Amazon Web Services
2004: HDFS & Map/Reduce in Nutch
2006: Google Docs; Amazon EC2; Yahoo hires Doug Cutting
2008: Eucalyptus (1st open source AWS API for private clouds); OpenNebula (private and hybrid clouds); Hadoop hits web scale
2009: MS Azure; Amazon RDS (MySQL supported)
2011: Amazon RDS supports Oracle; Office 365
6. Two Layers
MapReduce: Code runs here
HDFS: Data lives here
7. Advantages of the Cloud
Database as a Service = DBaaS
Infrastructure as a Service = IaaS
Software as a Service = SaaS
Platform as a Service = PaaS
Share hardware and energy costs
Share employee costs
Fast spin-up and tear-down
Expand quickly to meet demand
Costs ideally proportional to usage
Scalability
10. Security Challenges of the Cloud
Where is my data living? You may not know exactly where your data is, since it can be distributed among many physical disks.
Where is my data going? In the cloud, especially in map/reduce, data is constantly moving from node to node, and the nodes may span multiple mini-clouds.
Who has access to my data? Other clients using the cloud, as well as administrators and others who maintain it, could have access to the data if it is not properly protected.
11. Hadoop Security Concerns
Hadoop services do not authenticate users or other services.
(a) A user can access an HDFS or MapReduce cluster as any other user. This makes it impossible to enforce access control in an uncooperative environment. For example, file permission checking on HDFS can be easily circumvented.
(b) An attacker can masquerade as Hadoop services. For example, user code running on a MapReduce cluster can register itself as a new TaskTracker.
DataNodes do not enforce any access control on their data blocks. An unauthorized client can read a data block as long as she can supply its block ID. Anyone can also write arbitrary data blocks to DataNodes.
12. Security Requirements for Hadoop
Users are only allowed to access HDFS files that they have permission to access.
Users are only allowed to access or modify their own MapReduce jobs.
User to service mutual authentication to prevent unauthorized NameNodes, DataNodes, JobTrackers, or TaskTrackers.
Service to service mutual authentication to prevent unauthorized services from joining a cluster’s HDFS or MapReduce service.
The degradation of performance should be no more than 3%.
13. Proposed Solution – Use Case 1: Accessing Data
1) User/App requests access to a data block.
2) Name Node authenticates the user and gives the user a block token.
3) User/App presents the block token to the Data Node to access the block for READ, WRITE, COPY or REPLACE.
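The three steps of Use Case 1 can be sketched in a few lines of Python. This is a minimal illustration, not the Hadoop implementation: it assumes the Name Node and Data Node share a secret key and that the token is protected with an HMAC. The names `SHARED_KEY`, `PERMISSIONS`, `issue_block_token` and `datanode_check` are hypothetical and chosen only for this example.

```python
import hashlib
import hmac
import json
import time

# Assumption: Name Node and Data Node share this secret (hypothetical value).
SHARED_KEY = b"namenode-datanode-shared-secret"

# Assumption: a toy permission table held by the Name Node.
PERMISSIONS = {("alice", 42): {"READ"}}


def issue_block_token(user, block_id, lifetime_s=600, now=None):
    """Name Node side (step 2): check permissions, then sign a token."""
    now = time.time() if now is None else now
    modes = PERMISSIONS.get((user, block_id))
    if not modes:
        raise PermissionError(f"{user} may not access block {block_id}")
    payload = json.dumps(
        {
            "user": user,
            "block_id": block_id,
            "modes": sorted(modes),
            "expires": now + lifetime_s,
        },
        sort_keys=True,
    ).encode()
    tag = hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()
    return payload, tag


def datanode_check(payload, tag, block_id, mode, now=None):
    """Data Node side (step 3): verify the signature, block, mode and expiry."""
    now = time.time() if now is None else now
    expected = hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tag):
        return False  # token was forged or altered
    claims = json.loads(payload)
    return (
        claims["block_id"] == block_id
        and mode in claims["modes"]
        and now < claims["expires"]
    )
```

In this sketch the Data Node never contacts the Name Node: it trusts whatever a correctly signed, unexpired token says. That keeps the per-block check cheap, at the price of not seeing permission changes made after the token was issued.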
14. Proposed Solution – Use Case 2: Submitting Jobs
1) A user may obtain a delegation token through Kerberos.
2) The token is given to the user's jobs for subsequent authentication to the NameNode as the user.
3) Jobs can use the delegation token to access data that the user/app has access to.
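The delegation-token flow of Use Case 2 can be sketched as follows. This is an illustration under stated assumptions, not Hadoop's code: the Kerberos step is reduced to a boolean flag, and the token password is assumed to be an HMAC of the token ID under a key known only to the NameNode. The `NameNode` class and its method names are hypothetical.

```python
import hashlib
import hmac
import secrets
import time


class NameNode:
    """Toy NameNode that issues and validates delegation tokens."""

    def __init__(self):
        self._master_key = secrets.token_bytes(32)  # never leaves the NameNode
        self._tokens = {}  # token_id -> (owner, expiry time)

    def _password(self, token_id):
        return hmac.new(
            self._master_key, token_id.encode(), hashlib.sha256
        ).hexdigest()

    def get_delegation_token(self, user, kerberos_authenticated,
                             lifetime_s=86400, now=None):
        """Step 1: only a Kerberos-authenticated user may obtain a token."""
        if not kerberos_authenticated:
            raise PermissionError("Kerberos authentication required")
        now = time.time() if now is None else now
        token_id = secrets.token_hex(8)
        self._tokens[token_id] = (user, now + lifetime_s)
        return token_id, self._password(token_id)

    def authenticate(self, token_id, password, now=None):
        """Steps 2-3: a job presents the token; return the user it acts as."""
        now = time.time() if now is None else now
        entry = self._tokens.get(token_id)
        if entry is None:
            return None
        owner, expires = entry
        if now >= expires:
            return None
        if not hmac.compare_digest(self._password(token_id), password):
            return None
        return owner
```

A job that holds `(token_id, password)` can call `authenticate` without a Kerberos ticket of its own, which is the point of delegation: the user authenticates once, and the tasks reuse the token.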
15. Core Principles Analysis: Confidentiality
Analysis:
Pass: Users/Apps will only have access to the data blocks they should have, via block tokens.
16. Core Principles Analysis: Integrity
Analysis:
Pass: Data is only available at the block level if the block token matches.
Fail: The data is assumed to be good because the blocks themselves are not checked.
17. Core Principles Analysis: Availability
Analysis:
Fail: Job Tracker and Name Node are single points of failure for the system.
Pass: Tokens persist for a short period of time, so the system is resilient to short outages of the Name Node and Job Tracker.
18. Conclusion
The token method of authentication for both data and process access makes sense in a highly distributed system like Hadoop. However, because tokens carry so much power and are not re-checked after they are issued, the design is open to very serious time-of-check to time-of-use (TOCTOU) attacks.
Compared to the current model (i.e., no security), this represents a major step forward.
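The TOCTOU concern can be made concrete with a small sketch. It is a toy model, not Hadoop code: `permissions`, `issue_token`, `use_token` and `use_token_rechecked` are hypothetical names, and the token is left unsigned to keep the example short.

```python
# Assumption: a toy access-control table that can change at any time.
permissions = {("alice", "/data/report"): True}


def issue_token(user, path, lifetime_s, now):
    """Time of check: permissions are consulted once, when the token is issued."""
    if not permissions.get((user, path), False):
        raise PermissionError(f"{user} may not access {path}")
    return {"user": user, "path": path, "expires": now + lifetime_s}


def use_token(token, path, now):
    """Time of use: only the token is consulted, not the current permissions."""
    return token["path"] == path and now < token["expires"]


def use_token_rechecked(token, path, now):
    """Hypothetical mitigation: re-check current permissions on every use."""
    return use_token(token, path, now) and permissions.get(
        (token["user"], path), False
    )
```

If access is revoked after `issue_token` returns, `use_token` keeps accepting the token until it expires, while `use_token_rechecked` rejects it at once. The re-check costs a permission lookup on every access, which would have to be weighed against the 3% performance budget stated in the requirements.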