CS 5150 Software Engineering, Lecture 22: Reliability 3


1 CS 5150 Software Engineering, Lecture 22: Reliability 3

2 Administration
  Final presentations: Sign up for your presentations now.
  Weekly progress reports: Remember to send your progress reports to your TA.

3 The Heisenbug

4 Failures and Faults
  Failure: Software does not deliver the service expected by the user (e.g., mistake in requirements, confusing user interface).
  Fault (bug): Programming or design error whereby the delivered system does not conform to specification (e.g., coding error, interface error).

5 Faults and Failures
  Actual examples:
  (a) An application crashes with an emulator, even though the emulator is bug free. (Compensating bug problem.)
  (b) A mathematical function never converges with badly formulated data. (Round-off error.)
  (c) After a network is hit by lightning, it crashes on restart. (Problem of incremental growth.)
  (d) The head of an organization is paid $5 a month instead of $10,005 because the maximum salary allowed by the program is $10,000. (Requirements problem.)
  (e) An operating system fails because of a page-boundary error in the firmware. (Different operating environment problem.)

6 Terminology
  Fault avoidance: Build systems with the objective of creating fault-free (bug-free) software.
  Fault tolerance: Build systems that continue to operate when faults (bugs) occur.
  Fault detection (testing and validation): Detect faults (bugs) before the system is put into operation.

7 Fault Avoidance
  Software development process that aims to develop zero-defect software:
    Careful requirements and detailed specification
    Incremental development with customer input
    Constrained programming options
    Static verification
    Statistical testing
  It is always better to prevent defects than to remove them later.
  Example: The four color problem.

8 Defensive Programming
  Murphy's Law: If anything can go wrong, it will.
  Defensive programming:
    Redundant code is incorporated to check system state after modifications.
    Implicit assumptions are tested explicitly.
    Risky programming constructs are avoided.

9 Defensive Programming: Error Avoidance
  Avoid risky programming constructs except where really necessary:
    Pointers
    Dynamic memory allocation
    Floating-point numbers
    Parallelism
    Recursion
    Interrupts
    Multi-threading
  All are valuable in certain circumstances, but should be used with discretion.

10 Defensive Programming Examples
  Use a boolean variable, not an integer
  Test i <= n, not i == n
  Assertion checking (e.g., validate parameters)
  Build debugging code into the program, with a switch to display values at interfaces
  Error checking codes in data (e.g., checksum or hash)
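
Note (illustrative sketch, not from the slides): three of the examples above might look as follows in Python. The function names `count_up`, `with_checksum`, and `verify_checksum` are hypothetical, and CRC-32 from the standard `zlib` module stands in for "checksum or hash".

```python
import zlib


def count_up(start, n, step=1):
    """Collect start, start+step, ... while the counter has not passed n."""
    # Implicit assumption tested explicitly (assertion checking).
    # Note: Python skips assert statements when run with -O.
    assert step > 0, "step must be positive"
    values = []
    i = start
    # Defensive bound: `i <= n` still terminates if a step jumps over n,
    # whereas looping until `i == n` would never stop in that case.
    while i <= n:
        values.append(i)
        i += step
    return values


def with_checksum(payload):
    """Append a 4-byte CRC-32 so that later corruption can be detected."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def verify_checksum(framed):
    """Return the payload if its CRC-32 matches; otherwise raise ValueError."""
    if len(framed) < 4:
        raise ValueError("frame too short")
    payload, stored = framed[:-4], int.from_bytes(framed[-4:], "big")
    if zlib.crc32(payload) != stored:
        raise ValueError("checksum mismatch")
    return payload
```

With `start=0, n=5, step=2` the counter never equals 5, which is the case where an equality test would fail to terminate.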

11 Some Notable Bugs
  Even commercial systems may have serious bugs:
    Built-in function in Fortran compiler (e^0 = 0)
    Japanese microcode for Honeywell DPS virtual memory
    The microfilm plotter with the missing byte (1:1023)
    The Sun 3 page fault that IBM paid to fix
    Left-handed rotation in the graphics package
    The preload system with the memory leak
  Good people work around problems. The best people track them down and fix them!

12 Fault Tolerance
  Aim: A system that continues to operate even when problems occur.
  Examples:
    Invalid input data (e.g., in a data processing application)
    Overload (e.g., in a networked system)
    Hardware failure (e.g., in a control system)
  General approach:
    Failure detection
    Damage assessment
    Fault recovery
    Fault repair

13 Fault Tolerance
  Basic techniques:
    Timers and timeouts in real-time and networked systems
    After an error, continue with the next transaction (e.g., discard data record, drop packet)
    Allow user break options (e.g., force quit, cancel)
    Use error correcting codes in data (e.g., RAID)
    Spare hardware (e.g., bad block tables on disk drives)
    Redundancy in data structures (e.g., forward and backward pointers)
    Report all errors for quality control
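
Note (illustrative sketch, not from the slides): "after an error, continue with the next transaction" and "report all errors" could be combined roughly as below. The name `process_records`, the logger name, and the choice of which exceptions count as a bad record are all assumptions.

```python
import logging

logger = logging.getLogger("batch")


def process_records(records, handler):
    """Apply handler to each record; on a bad record, log it and move on."""
    processed, rejected = [], []
    for index, record in enumerate(records):
        try:
            processed.append(handler(record))
        except (ValueError, KeyError) as exc:
            # Discard the bad record, but report it for quality control
            # instead of hiding the error or aborting the whole batch.
            logger.warning("record %d rejected: %s", index, exc)
            rejected.append((index, record))
    return processed, rejected
```

Only expected data errors are caught here. Anything else still propagates, so a genuine programming fault is not silently swallowed.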

14 Fault Tolerance
  Backward recovery:
    Record system state at specific events (checkpoints). After failure, recreate state at last checkpoint.
    Combine checkpoints with system log (audit trail of transactions) that allows transactions from last checkpoint to be repeated automatically.
  Test the restore software!
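
Note (illustrative sketch, not from the slides): a toy version of checkpoint-plus-log backward recovery. The `Ledger` class is hypothetical and keeps everything in memory; a real system would write both the checkpoint and the log to stable storage.

```python
import copy


class Ledger:
    """Toy account store with checkpoint-plus-log backward recovery."""

    def __init__(self):
        self.balances = {}
        self._checkpoint = {}
        self._log = []  # transactions applied since the last checkpoint

    def apply(self, account, amount):
        # Log the transaction first, then change the live state.
        self._log.append((account, amount))
        self.balances[account] = self.balances.get(account, 0) + amount

    def checkpoint(self):
        # Record the state and start a fresh log.
        self._checkpoint = copy.deepcopy(self.balances)
        self._log.clear()

    def recover(self):
        """Recreate state at the last checkpoint, then replay the log."""
        self.balances = copy.deepcopy(self._checkpoint)
        for account, amount in self._log:
            self.balances[account] = self.balances.get(account, 0) + amount
```

In the spirit of "test the restore software", a check can corrupt `balances` deliberately, call `recover()`, and compare the result with the expected state.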

15 Fault Tolerance
  Google and Hadoop File Systems
    Clusters of commodity computers (1,000+ computers, 1,000+ TB)
    "Component failures are the norm rather than the exception.... We have seen problems caused by application bugs, operating system bugs, human errors, and the failures of disks, memory, connectors, networking, and power supplies."
    Data is stored in large chunks (64 MB).
    Each chunk is replicated, typically with three copies.
    If a component fails, new replicas are created automatically.
  Ghemawat, et al., The Google File System. 19th ACM Symposium on Operating Systems Principles, October 2003.
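
Note (illustrative sketch, not from the slides): the idea that new replicas are created automatically after a failure can be shown with a small placement function. This is not the actual Google File System or Hadoop algorithm; `re_replicate`, the server names, and the "pick spare servers in sorted order" policy are assumptions made only for illustration.

```python
REPLICATION_FACTOR = 3  # "typically with three copies"


def re_replicate(chunk_locations, live_servers):
    """Return a chunk -> set-of-servers map restored toward three replicas."""
    live = set(live_servers)
    repaired = {}
    for chunk, servers in chunk_locations.items():
        alive = set(servers) & live
        # Live servers that do not yet hold this chunk, in a stable order.
        spare = sorted(live - alive)
        missing = max(0, REPLICATION_FACTOR - len(alive))
        if alive:
            # At least one replica survives, so it can be copied elsewhere.
            alive |= set(spare[:missing])
        # If no replica survives, the chunk is lost and stays empty.
        repaired[chunk] = alive
    return repaired
```

A real file system would also weigh disk usage, rack placement, and network load when choosing where the new copies go.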

16 Fault Tolerance
  N-version programming:
    Execute independent implementations in parallel, compare results, accept the most probable.
    Used when extreme reliability is required with no opportunity to repair (e.g., spacecraft).
    Difficulty is to ensure that the implementations are truly independent (e.g., separate power supplies, sensors, algorithms).
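
Note (illustrative sketch, not from the slides): the "compare results, accept the most probable" step can be expressed as a majority vote. The function `vote` is hypothetical, runs the versions one after another rather than in parallel, and assumes results are hashable and compared for exact equality.

```python
from collections import Counter


def vote(implementations, *args):
    """Run every version and return the result a strict majority agrees on."""
    results = []
    for impl in implementations:
        try:
            results.append(impl(*args))
        except Exception:
            # A version that crashes simply loses its vote.
            continue
    if not results:
        raise RuntimeError("all versions failed")
    value, count = Counter(results).most_common(1)[0]
    # Require agreement from more than half of all versions, not just
    # more than half of the ones that happened to return a value.
    if count * 2 <= len(implementations):
        raise RuntimeError("no majority among versions")
    return value
```

Voting helps only if the versions fail independently. Versions that share a misunderstanding of the specification will outvote a correct one.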

17 Software Engineering for Real Time
  The special characteristics of real time computing require extra attention to good software engineering principles:
    Requirements analysis and specification
    Special techniques (e.g., locks on data, semaphores, etc.)
    Development of tools
    Modular design
    Exhaustive testing
  Heroic programming will fail!

18 Software Engineering for Real Time
  Testing and debugging need special tools and environments:
    Debuggers, etc., cannot be used to test real time performance
    Simulation of the environment may be needed to test interfaces (e.g., adjustable clock speed)
    General purpose tools may not be available

19 Validation and Verification
  Validation: Are we building the right product?
  Verification: Are we building the product right?
  In practice, it is sometimes difficult to distinguish between the two.
  That's not a bug. That's a feature!

20 Security in the Software Development Process
  The security goal:
    The security goal is to make sure that the agents (people or external systems) who interact with a computer system, its data, and its resources, are those that the owner of the system would wish to have such interactions.
  Security considerations need to be part of the entire software development process. They may have a major impact on the architecture chosen.
  Example: Integration of Internet Explorer into Windows.

21 Agents and Components
  A large system will have many agents and components:
    Each is potentially unreliable and insecure
    Components acquired from third parties may have unknown security problems
    Commercial off-the-shelf (COTS) problem
  The software development challenge:
    Develop secure and reliable components
    Protect the whole system so that security problems in parts of it do not spread to the entire system

22 Techniques: Barriers
  Place barriers that separate parts of a complex system:
    Isolate components, e.g., do not connect a computer to a network
    Firewalls
    Require authentication to access certain systems or parts of systems
  Every barrier imposes restrictions on permitted uses of the system.
  Barriers are most effective when the system can be divided into subsystems with simple boundaries.

23 Techniques: Authentication & Authorization
  Authentication establishes the identity of an agent:
    What the agent knows (e.g., password)
    What the agent possesses (e.g., smart card)
    What the agent has access to (e.g., Ctrl-Alt-Del)
    What the physical properties of the agent are (e.g., fingerprint)
  Authorization establishes what an authenticated agent may do:
    Access control lists
    Group membership
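
Note (illustrative sketch, not from the slides): the split between authentication (who is the agent?) and authorization (what may the agent do?) might be modeled as below. The `AccessControl` class, its method names, and the PBKDF2 iteration count are assumptions. Only the "what the agent knows" factor is shown, and authorization uses group membership with a per-resource access control list.

```python
import hashlib
import hmac
import os


def hash_password(password, salt):
    """Derive a salted password hash with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


class AccessControl:
    def __init__(self):
        self._users = {}   # user name -> (salt, password hash)
        self._groups = {}  # user name -> set of group names
        self._acl = {}     # resource -> {group name: set of actions}

    def add_user(self, name, password, groups):
        salt = os.urandom(16)
        self._users[name] = (salt, hash_password(password, salt))
        self._groups[name] = set(groups)

    def grant(self, resource, group, actions):
        self._acl.setdefault(resource, {}).setdefault(group, set()).update(actions)

    def authenticate(self, name, password):
        """Establish identity: does the agent know the password?"""
        if name not in self._users:
            return False
        salt, stored = self._users[name]
        # Constant-time comparison avoids leaking how many bytes matched.
        return hmac.compare_digest(stored, hash_password(password, salt))

    def authorize(self, name, resource, action):
        """Decide what an already authenticated agent may do."""
        entries = self._acl.get(resource, {})
        return any(action in entries.get(group, set())
                   for group in self._groups.get(name, set()))
```

Callers are expected to check `authenticate` first and call `authorize` only for an agent whose identity has been established.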

24 Example: An Access Model for Digital Content
  [Diagram; labels as transcribed: Digital material Attributes User Roles Actions Operations Access Policies]

25 Techniques: Encryption
  Allows data to be stored and transmitted securely, even when the bits are viewed by unauthorized agents
  Private key and public key
  Digital signatures
  [Diagram: encryption transforms X into Y; decryption transforms Y back into X]

26 Security and People
  People are intrinsically insecure:
    Careless (e.g., leave computers logged on, use simple passwords, leave passwords where others can read them)
    Dishonest (e.g., stealing from financial systems)
    Malicious (e.g., denial of service attack)
  Many security problems come from inside the organization:
    In a large organization, there will be some disgruntled and dishonest employees
    Security relies on trusted individuals. What if they are dishonest?

27 Design for Security: People
  Make it easy for responsible people to use the system
  Make it hard for dishonest or careless people (e.g., password management)
  Train people in responsible behavior
  Test the security of the system thoroughly and repeatedly, particularly after changes
  Do not hide violations

28 Suggested Reading
  Trust in Cyberspace, Committee on Information Systems Trustworthiness, National Research Council (1999)
  http://www.nap.edu/readingroom/books/trust/
  Fred Schneider, Cornell Computer Science, was the chair of this study.

