SecureMR: A Service Integrity Assurance Framework for MapReduce Author: Wei Wei, Juan Du, Ting Yu, Xiaohui Gu Source: Annual Computer Security Applications.


1 SecureMR: A Service Integrity Assurance Framework for MapReduce
Authors: Wei Wei, Juan Du, Ting Yu, Xiaohui Gu
Source: Annual Computer Security Applications Conference, 2009, pp. 73-82.
Presenter: Tsuei-Hung Sun (孫翠鴻)
Date: 2010/9/17

2 Outline
– Introduction
– Motivation
– Contribution
– Scheme
– Security analysis
– Performance evaluation
– Comment

3 Introduction
MapReduce
– A parallel data processing model that simplifies data processing on large clusters.
– Proposed by Google.
– It mainly runs on clusters belonging to a single administration domain.
→ Yahoo's Hadoop
– Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (Amazon S3).
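As a reminder of what the map and reduce phases do, here is a minimal single-process word-count sketch. It is not Hadoop or Google's implementation; the function names (`map_phase`, `partition`, `reduce_phase`, `run_job`) and the toy partitioner are assumptions made for illustration only.

```python
from collections import defaultdict


def map_phase(block):
    """Map task: emit a (word, 1) pair for every word in one input block."""
    return [(word, 1) for word in block.split()]


def partition(pairs, num_reducers):
    """Split a mapper's intermediate pairs into one partition per reducer."""
    parts = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        # Toy deterministic partitioner (hypothetical): byte sum of the key.
        parts[sum(key.encode()) % num_reducers].append((key, value))
    return parts


def reduce_phase(pairs):
    """Reduce task: sum the values of every key in one partition."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def run_job(blocks, num_reducers=2):
    """Run all map tasks, shuffle partitions to reducers, then reduce."""
    reducer_inputs = [[] for _ in range(num_reducers)]
    for block in blocks:
        for i, part in enumerate(partition(map_phase(block), num_reducers)):
            reducer_inputs[i].extend(part)
    result = {}
    for pairs in reducer_inputs:
        result.update(reduce_phase(pairs))
    return result
```

Calling `run_job(["a b a", "b c"])` counts "a" twice, "b" twice and "c" once. Because the same key always lands in the same partition, each reducer can finish its keys independently.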

4 Introduction
Fig. The MapReduce data processing reference model (mappers M1-M3, reducers R1-R3, Distributed File System).

5 Introduction
Fig. Combining multiple map and reduce phases.

6 Introduction
Data processing service integrity can be protected by:
– Replication-based techniques
– Sampling techniques
– Checkpoint-based verification

7 Motivation
Existing work addresses service integrity, but not for data processing services.
Drawbacks of replication-based techniques:
– Replicating all distributed computing tasks for consistency verification is not efficient.
– Centralized consistency verification over massive result data is not scalable.

8 Contribution
– Decentralized replication-based integrity verification for MapReduce in open systems.
– Achieves security properties: non-repudiation, resilience to DoS attacks and replay attacks.
– Security components can be easily integrated into existing MapReduce implementations.
– Low performance overhead.
– The first attempt to address data processing service integrity.

9 Scheme
Three types of entities: a DFS, a master and workers.
Assumptions
1. Each worker has a public/private key pair associated with a unique worker identifier.
2. The master is trusted and its public key is known to all; workers are not necessarily trusted.
3. A good worker is honest and always returns the correct result for its task, while a bad worker may behave arbitrarily.
4. The DFS provides data integrity protection; each node can verify the integrity of data.
5. If a worker is good, then others cannot tamper with its data.

10 Scheme
SecureMR: Architecture Design

11 Scheme
SecureMR: Communication Design
– Commitment protocol
– Verification protocol

12 Scheme
Commitment Protocol
ID_Map: a monotonically increasing identity of a map task.
Data_Loc: input data block location.
sig: master's signature.
K_pubM: mapper's public key.
sigM: mapper's signature.
H_P1, …, H_Pr: hash value for each partition of the mapper's intermediate result.
Figure labels: Scheduler, Task Executor, Commit Manager.
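The commitment step can be sketched as follows: the mapper hashes each of its r partitions and signs the list of hashes together with the task identity. This is only an illustration under stated assumptions. The slides do not name a hash function or message encoding, so SHA-256 and JSON are assumptions, and HMAC with a shared key is a stand-in for the mapper's public-key signature sigM (Python's standard library has no public-key signing). All function names are hypothetical.

```python
import hashlib
import hmac
import json


def partition_hash(partition_bytes):
    """H_Pi: hash of one partition of the intermediate result (SHA-256 assumed)."""
    return hashlib.sha256(partition_bytes).hexdigest()


def _commit_body(map_id, hashes):
    """Canonical byte encoding of the fields covered by the signature."""
    return json.dumps({"id_map": map_id, "hashes": hashes}, sort_keys=True).encode()


def make_commitment(map_id, partitions, mapper_key):
    """Mapper side: hash every partition and sign the commitment.

    HMAC is a stand-in for sigM; the real protocol uses the mapper's
    private key so that the commitment is non-repudiable.
    """
    hashes = [partition_hash(p) for p in partitions]
    sig = hmac.new(mapper_key, _commit_body(map_id, hashes), hashlib.sha256).hexdigest()
    return {"id_map": map_id, "hashes": hashes, "sig": sig}


def check_commitment(commitment, mapper_key):
    """Master side: recompute the tag and compare in constant time."""
    body = _commit_body(commitment["id_map"], commitment["hashes"])
    expected = hmac.new(mapper_key, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, commitment["sig"])
```

Committing to hashes rather than to the data keeps the message to the master small, which is what lets the later consistency check avoid moving the intermediate results through a central node.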

13 Scheme
Verification Protocol
Pi: partition of intermediate results that the reducer will process.
AD_M: mapper's address.
H_Pi: hash of partition Pi committed by the Committer.
Req_Seq: sequence number.
Figure labels: Task Executor, Manager, Scheduler, Verifier, Committer, sigR.
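On the reducer side, the core check is to recompute the hash of the partition received from the mapper and compare it with the committed H_Pi. A minimal sketch, assuming SHA-256 as the hash and a response that echoes the request's sequence number; the dictionary layout and the name `verify_response` are hypothetical, and signature checks are omitted.

```python
import hashlib
import hmac


def sha256_hex(data):
    """Hash helper; SHA-256 is an assumption, the slides name no hash function."""
    return hashlib.sha256(data).hexdigest()


def verify_response(response, committed_hash, expected_seq):
    """Reducer-side check of a mapper's response.

    response: dict with 'req_seq' (int) and 'data' (bytes), a hypothetical layout.
    committed_hash: H_Pi, the hash the mapper committed for this partition.
    expected_seq: the Req_Seq the reducer put in its request.
    """
    # A response that does not match the outstanding request is rejected.
    if response["req_seq"] != expected_seq:
        return False
    # The received partition must match what the mapper committed to.
    return hmac.compare_digest(sha256_hex(response["data"]), committed_hash)
```

If the check fails, the reducer holds data that contradicts the mapper's own commitment, which is the evidence the non-repudiation property relies on.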

14 Scheme
Extension for Reducers and MapReduce Chain
Figure labels: Map Phase, Reduce Phase, Verify Phase; add Verifier component; add Committer component.

15 Security analysis
Collusive Attack: attacker behavior analysis
– Periodical attacker
  Naive attacker
  Attacker without collusion
  Attacker with collusion
– Strategic attacker

16 Security analysis
Fig. Detection Rate for Non-Collusion Naive Attacker (b = 20; Pm = 1).
Fig. Detection Rate for Non-Collusion Periodical Attacker (b = 20; Pm = 0.5).
b: number of blocks in one input job.
Pm: misbehaving probability.
l: number of jobs a mapper performs before its misbehavior is detected.
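To get a feel for how detection depends on the misbehaving probability, here is a deliberately simplified model. It is not the analysis function from the paper, which did not survive in this transcript. It assumes that each task is duplicated independently with probability `p_dup` (a hypothetical parameter), that the duplicate always goes to a good worker, and that a non-colluding attacker misbehaves on each task with probability Pm. Under those assumptions the chance of being caught within k tasks is 1 - (1 - p_dup * Pm)^k.

```python
def detection_probability(p_dup, p_m, k):
    """Probability that a non-colluding attacker is caught within k tasks.

    Simplified model (an assumption, not the paper's formula):
    a task exposes the attacker only if it is both duplicated (p_dup)
    and misbehaved on (p_m), independently for each of the k tasks.
    """
    per_task = p_dup * p_m
    return 1.0 - (1.0 - per_task) ** k


def tasks_needed(p_dup, p_m, target):
    """Smallest k whose detection probability reaches the target (0 < target < 1)."""
    if p_dup * p_m <= 0.0:
        raise ValueError("attacker is never exposed when p_dup * p_m is zero")
    k = 1
    while detection_probability(p_dup, p_m, k) < target:
        k += 1
    return k
```

In this model, lowering Pm (a periodical rather than naive attacker) slows detection but never prevents it as long as some tasks are duplicated.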

17 Security analysis
Fig. Detection Rate for Collusion Periodical Attacker (n = 50; Pm = 0.5; b = 20; l = 15).
Fig. Misbehaving Probability vs. Duplication Rate (n = 50; b = 20; l = 15).
n: total number of workers.
m: number of malicious workers.

18 Performance evaluation
T: time.
D: data transmission cost.
r: number of reducers.

19 Performance evaluation
Fig. Response Time vs. Number of Reduce Tasks (number of map tasks = 60; data size = 1 GB).
Fig. Response Time vs. Data Size (number of map tasks = 60; number of reduce tasks = 25).

20 Performance evaluation
Fig. Response Time vs. Duplication Rate.
Fig. Response Time vs. Number of Reduce Tasks.
Number of map tasks = 60; data size = 1 GB.

21 Comment
– Assign and Notify can be combined into one step.
– Ticket_M contains some parameters that are the same as those in the part of the request message signed by the reducer.
– If the first request fails, what can the reducer do? (How are Ticket_M and Req_Seq renewed?)
– In the Response message, the mapper could sign the data together with the other fields, which would save one hash, and the reducer would not need to check it either.

22 References
– MapReduce: Simplified Data Processing on Large Clusters. http://labs.google.com/papers/mapreduce.html
– Computer cluster. http://en.wikipedia.org/wiki/Computer_cluster
– Monotonic function. http://en.wikipedia.org/wiki/Monotonic_function

23 Analysis Function
– Naive attacker and attacker without collusion
– Attacker with collusion

24 Analysis Function
– Strategic attacker

25 Appendix
– MapReduce: Simplified Data Processing on Large Clusters
– Computer cluster: a group of linked computers, working together closely, connected to each other through fast local area networks. Clusters are usually deployed to improve performance and/or availability over that of a single computer, while typically being much more cost-effective than single computers of comparable speed or availability.
– Monotonic function: a function which preserves the given order.

26 MapReduce in open systems
– Entities in MapReduce come from different domains, which are not always trusted.
– Communications and data transferred among entities go through public networks (and can be eavesdropped).
Most research focuses on using MapReduce to solve problems in specific applications; little work pays attention to service integrity protection.

27 Advantage vs. weakness
Advantage
– Inconsistencies for mappers: No False Alarm, Non-Repudiation.
– Ensures MapReduce data processing service integrity by scalable decentralized replication-based verification.

28
1) Enables mappers to examine the integrity of data blocks from the DFS.
2) Enables reducers to verify the authenticity and correctness of the intermediate results generated by mappers.
3) Enables users to check whether the final result produced by reducers is authentic and correct.
4) The combination of the three ensures MapReduce data processing service integrity for users.

29
First step: ensures the integrity of inputs for MapReduce in open systems.
Second step: provides reducers with integrity assurance for their inputs.
Third step: guarantees the authenticity and correctness of the final result for users.

30
DoS attacks
– Sending requests to a good worker and asking for intermediate results.
– Impersonating the master to send fake task assignments.
Replay attack
– Sending old task assignments to workers to keep them busy.
Eavesdropping attack
Message tampering
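One way a worker could use a monotonically increasing task identity to resist the replay and impersonation attacks listed above is to accept an assignment only if the master's signature verifies and the identity is newer than anything seen so far. The slides do not spell out this check, so the class below is a hypothetical sketch; signature verification is represented by a boolean computed elsewhere.

```python
class AssignmentFilter:
    """Worker-side filter for task assignments (hypothetical sketch)."""

    def __init__(self):
        # Highest task identity accepted so far; -1 means none yet.
        self.last_id = -1

    def accept(self, task_id, signature_valid):
        """Return True only for a fresh assignment signed by the master."""
        # Fake assignments from an impersonator fail the signature check.
        if not signature_valid:
            return False
        # Replayed assignments carry an identity that is not newer.
        if task_id <= self.last_id:
            return False
        self.last_id = task_id
        return True
```

Rejected assignments do not update `last_id`, so a forged message with a large identity cannot block later genuine ones.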

31
Non-collusive malicious behavior
– Workers misbehave independently; they do not necessarily agree or consult with each other.
– Ex. When returning wrong results for the same input, they may return different wrong results (detected).
Collusive malicious behavior
– Depends on other collusive workers: they communicate, exchange information, and make an agreement with each other.
– Ex. When assigned tasks by the master, they know whether their colluders received tasks with the same input blocks (not detected).
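The difference between the two behaviors shows up directly in a replication-based consistency check, sketched below. Results for the same input block are compared by hash: independent wrong answers disagree and are flagged, while colluders who agree on the same wrong answer pass the check. SHA-256 and the function name `consistent` are assumptions for illustration.

```python
import hashlib


def consistent(results):
    """True if every replica returned byte-identical output for the same input."""
    digests = {hashlib.sha256(r).hexdigest() for r in results}
    return len(digests) == 1


correct = b"word\t3\n"

# Non-collusive: each bad worker invents its own wrong result.
non_collusive = [b"word\t7\n", b"word\t9\n"]

# Collusive: both bad workers agree on the same wrong result.
collusive = [b"word\t7\n", b"word\t7\n"]

detected_non_collusive = not consistent(non_collusive)
detected_collusive = not consistent(collusive)
detected_with_good_replica = not consistent(collusive + [correct])
```

Here `detected_non_collusive` is True and `detected_collusive` is False. Colluders are only exposed when at least one replica of the task lands on a good worker, as `detected_with_good_replica` shows.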

