RABA’s Red Team Assessments 14 December 2005 QuickSilver.

1 RABA’s Red Team Assessments 14 December 2005 QuickSilver

2 Agenda
Tasking for this talk…
Projects Evaluated
Approach / Methodology
Lessons Learned
 o and Validations Achieved
The Assessments
 o General Strengths / Weaknesses
 o AWDRAT (MIT)
   Success Criteria
   Assessment Strategy
   Strengths / Weaknesses
 o LRTSS (MIT)
 o QuickSilver / Ricochet (Cornell)
 o Steward (JHU)

3 The Tasking “Lee would like a presentation from the Red Team perspective on the experiments you've been involved with. He's interested in a talk that's heavy on lessons learned and benefits gained. Also of interest would be red team thoughts on strengths and weaknesses of the technologies involved. Keeping in mind that no rebuttal would be able to take place beforehand, controversial observations should be either generalized (i.e., false positives as a problem across several projects) or left to the final report.” -- John Frank (November 28, 2005)

4 Specific Teams We Evaluated
Architectural-Differencing, Wrappers, Diagnosis, Recovery, Adaptive Software and Trust Management (AWDRAT)
 o October 18-19, 2005
 o MIT
Learning and Repair Techniques for Self-Healing Systems (LRTSS)
 o October 25, 2005
 o MIT
QuickSilver / Ricochet
 o November 8, 2005
 o Cornell University
Steward
 o December 9, 2005
 o JHU

5 Basic Methodology
Planning
 o Present High-Level Plan at July PI Meeting
 o Interact with White Team to schedule
 o Prepare Project Overview
 o Prepare Assessment Plan
   Coordinate with Blue Team and White Team
Learning
 o Study documentation provided by team
 o Conference calls
 o Visit with Blue Team day prior to assessment
   Use system, examine output, gather data
Test
 o Formal de-brief at end of test day

6 Lessons Learned (and VALIDATIONS achieved)

7 Validation / Lessons Learned
Consistent Discontinuity of Expectations
 o Scope of the Assessment / Success Criteria
   Boiling it down to "Red Team Wins" or "Blue Team Wins" on each test required significant clarity
    o Unique to these assessments because the metrics were unique
   Lee/John instituted an assessment-scope conference call halfway through
    o We think that helped a lot
 o Scope of Protection for the systems
   Performer's Assumptions vs. Red Team's Expectations
   In all cases, we wanted to see a more holistic approach to the security model
   We assert each program needs to define its security policy
    o And especially document what it assumes will be protected / provided by other components or systems

8 LL: Scope of Protection

9 Validation / Lessons Learned
More time would have helped A LOT
 o Longer test period (2-3 day test vice 1-day test)
   Having an evening to digest, then return to test, would have allowed more effective additional testing and insight
 o We planned an extra 1.5 days for most, and that was very helpful
   We weren't rushing to get on an airplane
   We could reduce the data and come back for clarifications if needed
   We could defer non-controversial tests to the next day to allow focus with Government present
More Communication with Performers
 o Pre-test site/team visit (~2-3 weeks prior to test)
   Significant help in preparing the testing approach
   The half-day that we implemented before the test was crucial for us
 o More conference calls would have helped, too
 o Hard to balance against performers' main focus, though

10 Validation / Lessons Learned
A series of tests might be better
 o Perhaps one day of tests similar to what we did
 o Then a follow-up test a month or two later as prototypes matured
   With the same test team to leverage the understanding of the system gained
We underestimated the effort in our bid
 o Systems were more unique and complex than we anticipated
 o 20-25% more hours would have helped us a lot in data reduction
Multi-talented team proved vital to success
 o We had programming (multi-lingual), traditional red team, computer security, systems engineering, OS, system admin, network engineering, etc., talent present for each test
Highly tailored approach proved appropriate and necessary
 o Using a more traditional network-oriented Red Team assessment approach would have failed

11 The Assessments

12 Overall Strengths / Weaknesses of Projects
Strengths
 o Teams worked hard to support our assessments
 o The technologies are exciting and powerful
Weaknesses
 o Most suffered a lack of system documentation
   We understand there is a balance to strike -- these are essentially research prototypes, after all
   Really limited our ability to prepare for the assessment
 o All are prototypes -- stability needed for deterministic test results
 o All provide incomplete security / protection almost by definition
 o Most suffered a lack of configuration management / control
 o Test "harnesses" far from optimal for Red Team use
   Of course, they are oriented around supporting the development
   But we're fairly limited in using other tools due to the uniqueness of the technologies

13 AWDRAT Assessment October 18-19, 2005

14 Success Criteria
The target application can successfully and/or correctly perform its mission
The AWDRAT system can
 o Detect an attacked client's misbehavior
 o Interrupt a misbehaving client
 o Reconstitute a misbehaving client in such a way that the reconstituted client is not vulnerable to the attack in question
The AWDRAT system must
 o Detect / diagnose at least 10% of attacks / root causes
 o Take effective corrective action on at least 5% of the successfully identified compromises / attacks

15 Assessment Strategy
Denial of Service
 o Aimed at disabling or significantly modifying the operation of the application to an extent that mission objectives cannot be accomplished
 o Attacks using buffer overflow and corrupted data injection to gain system access
False Negative Attacks
 o A situation in which a system fails to report an occurrence of anomalous or malicious behavior
 o Red Team hoped to perform actions that would fall "under the radar". We targeted the modules of AWDRAT that support diagnosis and detection.
False Positive Attacks
 o System reports an occurrence of malicious behavior when the activity detected was non-malicious
 o Red Team sought to perform actions that would excite AWDRAT's monitors. Specifically, we targeted the modules supporting diagnosis and detection.
State Disruption Attacks
 o Interrupt or disrupt AWDRAT's ability to maintain its internal state machines
Recovery Attacks
 o Disrupt attempts to recover or regenerate a misbehaving client
 o Target the Adaptive Software and Recovery and Regeneration modules in an attempt to allow a misbehaving client to continue operating
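As a side note on how the false negative / false positive categories above feed the 10% / 5% success criteria: the scoring logic can be sketched as below. This is purely an illustrative sketch of the arithmetic, not AWDRAT or RABA tooling; the trial record format is a hypothetical assumption.

```python
# Illustrative sketch (not AWDRAT/RABA code): scoring detector trials
# against the thresholds in the success criteria. Each trial records
# whether an attack was actually present, whether the monitors fired,
# and whether corrective action succeeded (hypothetical record format).

def score_trials(trials):
    """trials: list of dicts with keys 'attack', 'detected', 'corrected'."""
    attacks = [t for t in trials if t["attack"]]
    detected = [t for t in attacks if t["detected"]]          # true positives
    corrected = [t for t in detected if t["corrected"]]
    false_positives = [t for t in trials
                       if not t["attack"] and t["detected"]]
    detect_rate = len(detected) / len(attacks) if attacks else 0.0
    correct_rate = len(corrected) / len(detected) if detected else 0.0
    return {
        "detect_rate": detect_rate,        # criterion: at least 0.10
        "correct_rate": correct_rate,      # criterion: at least 0.05
        "false_positives": len(false_positives),
        "meets_criteria": detect_rate >= 0.10 and correct_rate >= 0.05,
    }
```

Note that false positives do not enter the pass/fail thresholds as quoted; they were tracked separately as an attack category in their own right.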

16 Strengths / Weaknesses
Strengths
 o With a reconsideration of the system's scope of responsibility, we anticipate the system would have performed far better in the tests
 o We see great power in the concept of wrapping all the functions
Weaknesses
 o Scope of responsibility / protection far too limited
 o Need to develop a full security policy
 o Single points of failure
 o Application-specific limitations
 o Application model issues
   Incomplete -- by design?
   Manually created
   Limited scope
   Doesn't really enforce multi-layered defense

17 LRTSS Assessment October 25, 2005

18 Success Criteria
The instrumented Freeciv server does not core dump under a condition in which the uninstrumented Freeciv server does core dump
The LRTSS system can
 o Detect a corruption in a data structure that causes an uninstrumented Freeciv server to exit
 o Repair the data corruption in such a way that the instrumented Freeciv server can continue running
The LRTSS system must
 o Detect / diagnose at least 10% of attacks / root causes
 o Take effective corrective action on at least 5% of the successfully identified compromises / attacks
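The detect-and-repair behavior these criteria describe can be illustrated with a toy invariant check. This is a minimal sketch over a hypothetical doubly linked list, assuming a single hand-written invariant; LRTSS's actual learned invariants and repair algorithms are far richer than this.

```python
# Illustrative sketch (not LRTSS code): instead of letting a corrupted
# structure crash the program, walk it, detect violations of a known
# invariant (node.next.prev must point back at node), and repair them.

class Node:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None

def check_and_repair(head):
    """Scan the list; restore broken back-pointers rather than failing.

    Returns the number of repairs performed."""
    repairs = 0
    node = head
    while node is not None and node.next is not None:
        if node.next.prev is not node:
            node.next.prev = node   # re-establish the invariant
            repairs += 1
        node = node.next
    return repairs
```

In this toy form, code that naively dereferenced a corrupted `prev` pointer would crash; running the check-and-repair pass first lets it continue, which is the essence of the "server keeps running" criterion.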

19 Assessment Strategy
Denial of Service
 o Aimed at disabling or significantly modifying the operation of the Freeciv server to an extent that mission objectives cannot be accomplished
 o In this case, not achieving mission objectives is defined as the Freeciv server exiting or dumping core
 o Attacks using buffer overflow, corrupted data injection, and resource utilization
 o Various data corruptions aimed at causing the server to exit
 o Formulated the attacks by targeting the uninstrumented server first, then running the same attack against the instrumented server
State Disruption Attacks
 o Interrupt or disrupt LRTSS's ability to maintain its internal state machines

20 Strengths / Weaknesses
Strengths
 o Performs very well under simple data corruptions (that would cause the system to crash under normal operation)
 o Performs well under a large number of these simple corruptions (200 to 500 corruptions are repaired successfully)
 o Learning and repair algorithms well thought out
Weaknesses
 o Scope of responsibility / protection too limited
 o Complex data structure corruptions not handled well
 o Secondary relationships are not protected against
 o Pointer data corruptions not entirely tested
 o Timing of check-and-repair cycles not optimal
 o Defining "mission failure" as a core dump may be excessive

21 QuickSilver Assessment November 8, 2005

22 Success Criteria
Ricochet can successfully and/or correctly perform its mission
 o "Ricochet must consistently achieve a fifteen-fold reduction in latency (with benign failures) for achieving consistent values of data shared among one hundred to ten thousand participants, where all participants can send and receive events."
Per client direction, elected to use average latency time as the comparative metric
 o Ricochet's average recovery latency demonstrates a 15-fold improvement over SRM
 o Additional constraint levied requiring 98% update saturation (imposing the use of the NACK failover for Ricochet)
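The comparative metric above reduces to a simple check. The sketch below is illustrative only: the units (per-packet recovery latencies) and the way saturation is counted are assumptions for the example, not details of the actual test harness.

```python
# Illustrative sketch of the pass/fail check implied by the criteria:
# Ricochet's average recovery latency must beat SRM's by a given factor,
# AND at least 98% of updates must reach receivers (update saturation).
# Input format (lists of per-packet latencies) is an assumption.

def meets_criteria(ricochet_ms, srm_ms, delivered, sent,
                   factor=15.0, saturation=0.98):
    """True when both the 15-fold latency criterion and the 98%
    saturation constraint are satisfied."""
    avg_ricochet = sum(ricochet_ms) / len(ricochet_ms)
    avg_srm = sum(srm_ms) / len(srm_ms)
    return (avg_srm / avg_ricochet >= factor
            and delivered / sent >= saturation)
```

The saturation constraint matters because a protocol can post excellent average recovery latency while silently never recovering some packets; requiring 98% delivery forces the NACK failover path to be exercised and counted.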

23 Assessment Strategy
Scalability Experiments -- test scalability in terms of number of groups per node and number of nodes per group. Here, no node failures will be simulated, and no packet losses will be induced (aside from those that occur as a by-product of normal network traffic).
 o Baseline Latency
 o Group Scalability
 o Large Repair Packet Configuration
 o Large Data Packet Storage Configuration
Simulated Node Failures -- simulate benign node failures.
 o Group Membership Overhead / Intermittent Network Failure
Simulated Packet Losses -- introduce packet loss into the network.
 o High Packet Loss Rates
   Node-driven Packet Loss
   Network-driven Packet Loss
   Ricochet-driven Packet Loss
 o High Ricochet Traffic Volume
 o Low Bandwidth Network
Simulated Network Anomalies -- simulate benign routing and network errors that might exist on a deployed network. The tests will establish whether or not the protocol is robust in its handling of typical network anomalies, as well as those atypical anomalies that may be induced by an attacker.
 o Out-of-Order Packet Delivery
 o Packet Fragmentation
 o Duplicate Packets
 o Variable Packet Sizes

24 Strengths / Weaknesses
Strengths
 o Appears to be very resilient when operating within its assumptions
 o Very stable software
 o Significant performance gains over SRM
Weaknesses
 o FEC orientation -- the focus of the statistics obscures valuable data regarding complete packet delivery
 o Batch-oriented test harness -- impossible to perform interactive attacks
   Very limited insight into blow-by-blow performance
 o Metrics collected were very difficult to understand fully

25 STEWARD Assessment December 9, 2005

26 Success Criteria
The STEWARD system must:
 o Make progress in the system when under attack. Progress is defined as the eventual global ordering of, execution of, and reply to any request which is assigned a sequence number within the system
 o Maintain consistency of the data replicated on each of the servers in the system

27 Assessment Strategy
Validation Activities -- tests we will perform to verify that STEWARD can endure up to five Byzantine faults while maintaining a three-fold reduction in latency with respect to BFT
 o Byzantine Node Threshold
 o Benchmark Latency
Progress Attacks -- attacks we will launch to prevent STEWARD from progressing to a successful resolution of an ordered client request
 o Packet Loss
 o Packet Delay
 o Packet Duplication
 o Packet Re-ordering
 o Packet Fragmentation
 o View Change Message Flood
 o Site Leader Stops Assigning Sequence Numbers
 o Site Leader Assigns Non-Contiguous Sequence Numbers
 o Suppressed New-View Messages
 o Consecutive Pre-Prepare Messages in Different Views
 o Out-of-Order Messages
 o Byzantine-Induced Failover
Data Integrity Attacks -- attempts to create an inconsistency in the data replicated on the various servers in the network
 o Arbitrarily Execute Updates
 o Multiple Pre-Prepare Messages Using the Same Sequence Numbers and Different Request Data
 o Spurious Prepare, Null Messages
 o Suppressed Checkpoint Messages
 o Prematurely Perform Garbage Collection
 o Invalid Threshold Signature
Protocol State Attacks -- attacks focused on interrupting or disrupting STEWARD's ability to maintain its internal state machines
 o Certificate Threshold Validation Attack
 o Replay Attack
 o Manual Exploit of Client or Server
Note: We did not try to validate or break the encryption algorithms.
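For background on the "up to five Byzantine faults" figure: the classic Byzantine fault-tolerance bound says tolerating f arbitrary failures requires at least 3f + 1 replicas, with quorums of 2f + 1 so that any two quorums intersect in at least one correct replica. The sketch below shows only this textbook arithmetic; STEWARD's hierarchical, multi-site protocol layers additional structure on top of it, so these numbers are illustrative rather than a description of STEWARD's deployment.

```python
# Textbook BFT replica arithmetic (illustrative background only; not a
# description of STEWARD's actual topology).

def min_replicas(f):
    """Minimum replica count to tolerate f Byzantine faults: 3f + 1."""
    return 3 * f + 1

def quorum_size(f):
    """Quorum of 2f + 1: any two quorums of this size out of 3f + 1
    replicas overlap in at least f + 1 nodes, hence in at least one
    correct (non-Byzantine) replica."""
    return 2 * f + 1
```

Under this bound, enduring f = 5 Byzantine faults implies at least 16 participating replicas, which is why the Byzantine Node Threshold test drives the system to (and past) its stated fault ceiling.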

28 Strengths / Weaknesses
Strengths
 o First system that assumes and actually tolerates corrupted components (Byzantine attack)
 o Blue Team spent extensive time up front on analysis, design, and proof of the protocol -- it was clear in the performance
 o System was incredibly stable and resilient
 o We did not compromise the system
Weaknesses
 o Limited scope of protection
   Relies on an external entity to secure and manage keys, which are fundamental to the integrity of the system
   STEWARD implicitly and completely trusts the client
    o Client-side attacks were out of scope of the assessment

29 Going Forward
White Team will generate the definitive report on this Red Team test activity
 o It will have the official scoring and results
RABA (Red Team) will generate a test report from our perspective
 o We will publish to:
   PI for the Project
   White Team (Mr. Do)
   DARPA (Mr. Badger)

30 Questions or Comments
Any questions, comments, or concerns?
