CSCI 5801: Software Engineering


1 CSCI 5801: Software Engineering
Software Reliability

2 Software Reliability

3 Software Reliability
What you know after testing:
- Software passes all cases in the test suite
What the customer wants to know:
- Is the code well written in general?
- How often will it fail?
- What has to happen for it to fail?
- What happens when it fails?

4 Larger Context of Reliability
- Fault detection (testing and validation): detect faults before the system is put into operation
- Fault avoidance: build systems with the objective of creating fault-free software
- Fault tolerance: build systems that continue to operate when faults occur

5 Code Reviews
Examining code without running it
- Removes the dependency on test cases
Methodology: look for typical flaws
Best done by others who have a different point of view:
- Code walkthroughs done by other programmers
- Pair programming in XP
- Static analysis tools
Goal: detect flaws before they become faults (fault avoidance)

6 Code Walkthroughs
Going through code by hand, statement by statement
- 90 to 125 statements/hour on average
Team of about 4 members, with specific roles:
- Moderator: runs the session, ensures it proceeds smoothly
- Code author
- Inspectors (at least 2)
- Scribe: writes down results/suggestions
Estimated to find 60% to 90% of code errors

7 Code Walkthroughs
Preparation:
- Developer provides colleagues with the code listing and documentation
- Participants study the documentation in advance
Meeting:
- Developer leads reviewers through the code, describing what each section does and encouraging questions
- Inspectors look for possible flaws and suggest improvements

8 Code Walkthroughs
Example checklist:
- Data faults: initialization, constants, array bounds, character strings
- Control faults: conditions, loop termination, compound statements, case statements
- Input/output faults: all inputs used; all outputs assigned a value
- Interface faults: parameter numbers, types, and order; structures and shared memory
- Storage management faults: modification of links, allocation and de-allocation of memory
- Exceptions: possible errors, error handlers
Several of these are illustrated in the annotated fragment below.
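A hypothetical Java fragment seeded with the kinds of faults the checklist targets; the names and logic are illustrative, not from the course:

```java
public class RosterReport {
    // Tally grades into buckets of 25 points each.
    public static int[] tally(int[] grades) {
        int[] counts = new int[5];
        int total = 0;                                 // Data fault: computed but never used
        for (int i = 0; i <= grades.length; i++) {     // Control fault: '<=' runs past the last index
            counts[grades[i] / 25]++;                  // Data fault: any grade over 100 overflows the array
            total += grades[i];
        }
        return counts;                                 // I/O fault: 'total' is an output that never reaches the caller
    }
}
```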

9 Static Analysis Tools
Scan source code for possible faults and anomalies
- Lint for C programs
- PMD for Java
Examples:
- Control flow: loops with multiple exit or entry points
- Data use: undeclared or uninitialized variables, unused variables, multiple assignments, array bounds
- Interface faults: parameter mismatches, non-use of function results, uncalled procedures
- Storage management: unassigned pointers, pointer arithmetic
Good programming practice eliminates all warnings from source code

10 PMD Example
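The original slide showed a tool screenshot. As a stand-in, here is a small Java class of the kind PMD flags; the rule names in the comments (UnusedLocalVariable, EmptyCatchBlock) come from PMD's standard Java rule sets, though the exact output depends on the tool version and configuration:

```java
import java.io.FileReader;
import java.io.IOException;

public class PmdDemo {
    public void load(String path) {
        int retries = 3;                      // PMD: UnusedLocalVariable -- assigned but never read
        try {
            FileReader in = new FileReader(path);
            in.close();
        } catch (IOException e) {
            // PMD: EmptyCatchBlock -- the error is silently swallowed
        }
    }
}
```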

11 Static Analysis Tools
- Cross-reference table: shows every use of a variable, procedure, object, etc.
- Information flow analysis: identifies the input variables on which an output depends
- Path analysis: identifies all possible paths through the program

12 Software Reliability
Definition: probability that the system will not fail during a certain period of time in a certain environment
- Measured as failures/CPU hour, etc.
Questions:
- How much more testing is needed to reach the required reliability?
- What is the expected reliability gain from further testing?
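As a worked example of the definition (not on the slide), assume a constant failure rate lambda, so that reliability over time t follows the exponential model:

```latex
R(t) = e^{-\lambda t},
\qquad
\lambda = 0.01\ \text{failures/CPU hour}
\;\Rightarrow\;
R(100) = e^{-1} \approx 0.37
```

That is, such a system has about a 37% chance of running 100 CPU hours without failure. The constant-rate assumption is ours; in practice the rate changes as faults are removed, which is what the growth models below capture.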

13 Statistical Testing
Testing software for reliability rather than fault detection
- Measuring the number of errors/transaction allows the reliability of the software to be predicted
Key problem: software will never be 100% reliable!
- An acceptable level of reliability should be specified in the RSD, and the software tested and modified until that level of reliability is reached

14 Reliability Prediction
Reliability growth model: mathematical model of how system reliability is predicted to change over time as faults are found and removed
- Extrapolated from current data about failures
- Can be used to determine whether the system meets reliability requirements
Typical measures: mean time to failure, average failures per transaction
Can be used to predict when testing will be completed and what level of reliability is feasible (see the sketch below)
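A minimal sketch of the simplest such measure, assuming mean time to failure is estimated as the average of observed inter-failure intervals (the data values are hypothetical):

```java
import java.util.List;

public class MttfEstimate {
    // Mean time to failure = average of observed inter-failure intervals.
    static double mttf(List<Double> interFailureHours) {
        return interFailureHours.stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(Double.POSITIVE_INFINITY);    // no failures observed yet
    }

    public static void main(String[] args) {
        // Hypothetical CPU-hour gaps between successive failures during test.
        List<Double> gaps = List.of(12.0, 19.5, 31.0, 48.0);  // growing gaps suggest reliability growth
        System.out.printf("Estimated MTTF: %.1f CPU hours%n", mttf(gaps));
    }
}
```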

15 Operational Profile
Problem: statistical testing requires a large number of test cases for statistical significance (thousands)
Where do such test cases come from?
- Often too many to create by hand
- Random generation is not sufficient

16 Operational Profile
Operational profile: set of test data whose frequency matches the actual frequency of those inputs in 'normal' usage of the system
- A close match with actual usage is necessary, or the measured reliability will not reflect the actual usage of the system
- Can be generated from real data collected from an existing system, or (more often) depends on assumptions made about the pattern of usage of the system (see the sketch below)
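A minimal sketch of drawing test inputs with frequencies that match a profile; the input classes and weights below are hypothetical:

```java
import java.util.NavigableMap;
import java.util.Random;
import java.util.TreeMap;

public class OperationalProfile {
    private final NavigableMap<Double, String> classes = new TreeMap<>();
    private final Random rng = new Random();
    private double total = 0;

    // Register an input class with its observed relative frequency.
    void add(String inputClass, double frequency) {
        total += frequency;
        classes.put(total, inputClass);   // store cumulative weight as the key
    }

    // Draw an input class with probability proportional to its frequency.
    String sample() {
        return classes.higherEntry(rng.nextDouble() * total).getValue();
    }

    public static void main(String[] args) {
        OperationalProfile profile = new OperationalProfile();
        profile.add("add course", 0.55);       // hypothetical usage frequencies
        profile.add("drop course", 0.25);
        profile.add("print roster", 0.15);
        profile.add("admin override", 0.05);
        for (int i = 0; i < 5; i++) {
            System.out.println(profile.sample());  // feed into a test-case generator
        }
    }
}
```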

17 Example Operational Profile
Note that some types of inputs are much more likely than others

18 LPM Estimates
Logarithmic Poisson execution time model (LPM):
- Major bugs are found quickly
- Those major bugs cause most failures
- Effectiveness of fault correction decreases over time
- There exists a point at which further testing has little gain (made precise below)
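For reference (not shown on the slide), the standard form of the logarithmic Poisson execution time model, with initial failure intensity lambda_0 and decay parameter theta, is

```latex
\mu(\tau) = \frac{1}{\theta}\,\ln(\lambda_0 \theta \tau + 1),
\qquad
\lambda(\tau) = \frac{\lambda_0}{\lambda_0 \theta \tau + 1}
```

where mu(tau) is the expected number of failures observed by execution time tau. The failure intensity lambda(tau) decays toward zero, which is why further testing eventually yields little reliability gain per hour spent.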

19 Reliability Prediction

20 Reliability Measurement Problems
- Operational profile uncertainty: the operational profile may not be an accurate reflection of the real use of the system
- High costs of test data generation: costs can be very high if the test data for the system cannot be generated automatically
- Statistical uncertainty: you need a statistically significant number of failures to compute the reliability, but highly reliable systems will rarely fail

21 Stress Testing
Goal of stress testing: determine what it will take to "break" the system
"Break" = no longer meets requirements in some way:
- Functional: fails to perform required functions
- Reliability: fails more often than specified
- Performance: slower than required
Approaches:
- Increase load/decrease resources until the system breaks
- Perform "attacks" designed to produce an undesirable result

22 Stress Testing
Increase load on the system in different ways:
- Number of students simultaneously adding courses
- Size of files/databases that must be read
Decrease resources available to the system (may require fault injection software):
- Increase the number of other processes running on the system
- Increase the lag time of networked resources
Goal: the point at which the system fails should be much greater than the scenarios listed in the RSD (see the sketch below)
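A minimal sketch of a load-ramp harness for the "students adding courses" case; enroll() and the 1% failure threshold are hypothetical stand-ins for a real client stub and a requirement from the RSD:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class LoadRamp {
    // Hypothetical system-under-test call; replace with a real client stub.
    static boolean enroll(int studentId) throws Exception {
        Thread.sleep(1);                  // stand-in for network/database work
        return true;
    }

    public static void main(String[] args) throws Exception {
        for (int load = 100; load <= 10_000; load *= 2) {   // ramp up concurrent requests
            ExecutorService pool = Executors.newFixedThreadPool(64);
            AtomicInteger failures = new AtomicInteger();
            CountDownLatch done = new CountDownLatch(load);
            for (int i = 0; i < load; i++) {
                final int id = i;
                pool.submit(() -> {
                    try {
                        if (!enroll(id)) failures.incrementAndGet();
                    } catch (Exception e) {
                        failures.incrementAndGet();     // timeouts/errors count as failures
                    } finally {
                        done.countDown();
                    }
                });
            }
            done.await();
            pool.shutdown();
            double rate = failures.get() / (double) load;
            System.out.printf("load=%d failure rate=%.3f%n", load, rate);
            if (rate > 0.01) {            // "break" point: more than 1% of requests fail
                System.out.println("System broke at load " + load);
                break;
            }
        }
    }
}
```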

23 Stress Testing
"Attack" testing is common in security
Goal of normal testing: for a specific test-case input, the system produces the desired response for that case
Goal of secure programming: for any input, the system does not produce an undesirable result

24 Stress Testing
Based on the risk analysis from the design stage:
- Can the roster database be deleted?
- Can an intruder read files (in violation of FERPA)?
- Can a student add a course but not be added to the roster?

25 Fault Tolerance
Goals:
- System continues to operate when problems occur
- System avoids critical failures (data loss, etc.)
Problems can occur from many sources:
- Anticipated at the design stage
- Unanticipated (hardware faults, etc.)
Cannot prevent all failures!

26 Fault Tolerance
Usually based on the idea of "backward recovery":
- Record system state at specific events (checkpoints); after a failure, recreate the state at the last checkpoint
- Combine checkpoints with a system log (audit trail of transactions) that allows transactions since the last checkpoint to be replayed automatically (sketched below)
Note that backward recovery software must also be thoroughly tested!
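A minimal sketch of checkpoint-plus-log recovery over a single integer of state; all names are illustrative, and the log is kept in memory only to show the control flow (a real system would make both durable):

```java
import java.util.ArrayList;
import java.util.List;

public class BackwardRecovery {
    private int balance = 0;                               // the system state being protected
    private int checkpointBalance = 0;                     // state at the last checkpoint
    private final List<Integer> log = new ArrayList<>();   // transactions since the checkpoint

    void apply(int delta) {
        log.add(delta);                  // record the transaction before applying it
        balance += delta;
    }

    void checkpoint() {
        checkpointBalance = balance;     // snapshot the current state
        log.clear();                     // the log restarts from this point
    }

    void recover() {
        balance = checkpointBalance;     // roll back to the checkpoint...
        for (int delta : log) {
            balance += delta;            // ...then replay the logged transactions
        }
    }

    public static void main(String[] args) {
        BackwardRecovery sys = new BackwardRecovery();
        sys.apply(100);
        sys.checkpoint();
        sys.apply(-30);
        sys.balance = -999;              // simulate state corruption from a failure
        sys.recover();
        System.out.println(sys.balance); // prints 70: checkpoint (100) + replayed -30
    }
}
```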

