Tom Gilb Result Planning Limited


1 Tom Gilb (www.Gilb.com, Tom@Gilb.com), Result Planning Limited
“Ten Quality Methods which probably won't improve product quality; and ten quality methods that more probably will succeed - for all aspects of quality.” Tom Gilb, Result Planning Limited. Master version, May 8th 2002.

2 “won’t improve” (means, in this talk)
You do not end up getting ‘as good as you expected’ when you invested in the method. You would not have used the method if that were the result.
Alternative talk title: “Ten Quality Methods which probably won't improve product quality, and ten quality methods that more probably will succeed - for all aspects of quality - not just bugs.”

3 Software: all elements of the software (not just code)
The code
The updates
The data and databases
The online user instruction
The interfaces
The user manuals
The training materials
The development and maintenance documentation
The test planning
The test scripts
Anything else which is not clearly hardware

4 Quality: all stakeholder-valued aspects of system performance, including quality and savings
Speed, Capacity, Adaptability, Maintainability, Availability, Reliability, Portability, Reusability, Testability, Usability, and very many more

5 Here are some popular methods or approaches from which people expect some software quality. I suggest that in practice they will be disappointed - often because of poor teaching and implementation, and often because of a lack of quality focus.

6 1. Go for CMM Level X
The Software Engineering Institute’s Capability Maturity Models, CMM and CMMI, Levels 2 to 5 (results of Level 3)
Why not?
Not “quality” oriented: CMM bureaucracy overwhelms any idea of quality
Intended mainly to put reasonable software engineering processes in place, but does not directly address any quality aspect of a system
Maybe you can get quality in spite of CMM, but not because of it.

7 2. Demand ‘Better’ (Conventional) Testing
Conventional software testing is not normally directed towards product or system quality levels
It looks for bugs (to oversimplify quite a bit!)
Conventional testing is ‘function’ oriented, not quality-oriented: it does not measure multiple quality-type levels
Conventional testing is too late in the development cycle: you get quality by designing it in, not testing it in!
Tests can prove the presence of bugs/defects, but cannot prove their absence
(Note: I will suggest Evolutionary Testing as a way of improving software quality later. Evo testing is not conventional, yet!)

8 3. Use Cases
Use cases are not directed to the qualities of a system
Use cases cannot express quality requirements
Use cases are not judged on the degrees of quality they deliver to an architecture
There is no published evidence about the relationship between use cases and any sort of quality
I’d be happy to be informed of evidence I have overlooked!

9 The list of problems with Use Cases and UML
I have no intention of going through this in detail during my talk, but I wanted to make the details available to the participant - to lend more credibility to my point. The details are at the end of these slides.

10 A Use Case Critique Summary By Don Mills [Mills01]
This appendix lists the “problems with use cases” that I found in my brief, and unscientific, survey of “the literature” (a mixture of books on my and my employer’s shelves, with articles found by browsing the Internet). The first eight entries come from the UI Design.net editorial for October. Solutions to all of the problems exist, but not within the RUP or the UML (or only clumsily, ambiguously, or inconsistently), while outside those strictures many competing solutions have been proposed. Note that this is not intended as an exhaustive list. DETAILS AT END OF THESE SLIDES.

11 4. RUP, RUP SE
“System Quality – Provides the views to support addressing system quality issues in an architecture driven process” [RUP SE]
“In RUP SE, [RUP] this idea is carried forward, adding systems engineers to the mix. Their area of concern is the design and specification of the hardware and system deployment to ensure that the overall system requirements are addressed.”
The Rational Unified Process never did address quality. RUP SE (Systems Engineering) is a belated, but weak (TG opinion), attempt to patch that hole in RUP.

12 RUP SE Example of ‘dealing with quality’ [RUP]

13 5. Conventional Inspection, Peer reviews, Reviews
Reviews do not generally focus on quality. Specific reviews may attempt to address quality, but in my view not professionally (quantified!).
Conventional Inspections, as they are usually done, will fail to deal with quality in general, and will be very cost-ineffective for quality in terms of bugs.
Why are conventional Inspections a failure route?
They focus on cleaning up bad work (high bug-injection rates)
Their effectiveness for bugs is at maximum 60% (one pass)
They are rarely done at full effect (likely effect 10%-30%)

14 5. (continued) Inspections, to deal with quality, must:
Deal with all aspects of quality engineering, including quality requirements and quality design
Define required quality practices in terms of process ‘Rules’ (failed rule = defect, detected by Inspection), like:
“All quality requirements will be defined with a scale of measure”
“All design specifications will be evaluated quantitatively on an impact estimation table”

15 6. Extreme Programming XP
XP has no direct focus on quality
There are several mechanisms in XP which can help reduce the injection of bugs
But that does not deal with many other types of quality
XP can’t hurt you, but it does not pretend to solve the larger quality-attribute problem

16 Kent Beck XP (partial email to Kent, Dec 18 2001)
Emotion can be a good thing. Fad, in my view, is different: fad·dish adj 1. very popular but only for a short time 2. tending to have strongly held, but brief, enthusiasms. (Encarta® World English Dictionary © 1999 Microsoft Corporation. All rights reserved. Developed for Microsoft by Bloomsbury Publishing Plc.) More emphasis on the temporal, because people find that it does not really work for them.
I would like to suggest a simple and basic remedy: Quantify. Quantify the effects expected by a method (the performance, quality, costs). Show how to measure that the expected effects are happening. Give people diagnostics to help them understand why effects are not reached as expected.
I think your remarks are dangerous (to you and your readers) in the sense that you are making an imprecise analogy between XP methods and cultural history. I think this is an interesting background analysis. But it is not sufficient to: 1. claim that the XP methods are sufficiently identical to the historical references to ensure that any given results will be achieved; 2. guarantee that the use of an XP method will give expected benefits.
My experience is that even well defined and 'good' methods, such as Inspection, can totally fail in real implementation for subtle reasons (like failing to use numeric exit levels, or failing to use optimum checking rates). The only reasonable defence against failure to gain expected benefits is to specifically state numerically what those expected benefits are (good engineering and scientific practice!), and to insist that implementers measure that the expected benefits and costs have really happened.
I know you are keen to get some better data on XP and I look forward to hearing about it. There might be some good things here, but I cannot afford to take that on blind faith. Nor should anyone! Best wishes

17 XP Pair Programming (IEEE Software, July/August 2000)
“Strengthening the Case for Pair Programming”: Laurie Williams (North Carolina State University), Robert R. Kessler (University of Utah), Ward Cunningham (Cunningham & Cunningham), Ron Jeffries. IEEE Software, July/August 2000.
As Beck writes, “Even if you weren’t more productive, you would still want to pair, because the resulting code quality is so much higher.”
By working in tandem, the pairs completed their assignments 40% to 50% faster.

18 A Different View (12 March 2002)
Dear Tom, browsing through your presentation "10 guaranteed ways ..." that I did not have the opportunity to listen to, I noticed that you also have a slide concerning the XP practice of pair programming. You might be interested in a new study on pair programming, to be found at cki.pdf. The study essentially contradicts the earlier findings by Laurie Williams. I actually set up a paper, "Extreme Programming Considered Harmful for Reliable Software Development", that you might want to have a look at. Regards, Gerold Keefer
AVOCA GmbH - Advanced Visioning of Components and Architectures, Kronenstrasse 19, Stuttgart

19 Woodward asks about XP (1/3)
Questions on XP from Stuart Woodward:
1. How do you manage required changes in software architecture? Not all programmers are architects, and not all architects are programmers, so who does the work, and what do the programmers do while the architecture is changed?
2. It seems to assume that all team members are equally experienced and skilled, i.e. can make changes to the system with equal levels of confidence and competence. Otherwise, who is responsible for the integrity of the system, data models etc.?
3. Who specifies the requirements? How are they specified? Or do the programmers have free rein to interpret often-fuzzy statements by the users however they want to?
4. What does the Project Manager do?
5. Why is XP different to what is known as RAD? Or DSDM? Or Evo? Or RUP?
6. XP promotes good practice, right? So where is the Process?
7. How does a system programmed via XP allow changing requirements to be implemented more easily than in other methods? Getting early feedback will not itself provide the answers.
8. How does XP help to prevent bugs getting into code in the first place? You cannot test quality into software; you must build it in.
9. It assumes very close contact with end users, right? This is rarer than you might think. And who co-ordinates, organises and presents the user requirements? Who checks them and makes sure that they do not invalidate the integrity of the system, current or proposed?
10. All the XP documentation that I have seen seems to set it up as the only way to handle changing requirements. I refer again to point 5 above.
11. How does XP mitigate risk?

20 Woodward on XP (2/3)
12. How can XP handle projects with many man-years of estimated effort? Or many and complex interfaces?
13. (deleted as redundant)
14. How are the goals of XP different to those of any other method, i.e. to produce software to the customer on time and to budget? Why should XP have different goals (if they do)? (Possibly redundant SW)
15. Why should XP make it any easier to produce quality products than any other method? Why should software engineering be easy just because the rules are? (Possibly redundant SW)
16. What’s the difference between User Stories (XP) and Use Cases + UML? Why should XP be better in this respect?
17. What is refactoring, and how does it produce the most effective architecture? How does this differ from what we do already?
18. Is XP telling me that programmers can do effective functional testing in pairs or otherwise? How? What does XP see as the purpose of testing?
19. If the Customers are expected to write User Stories and they do not use some form of precise language, then where is the quality, accuracy, consistency etc. built in? Is this not a recipe for getting all the ambiguities into the code, i.e. hacking?

21 Woodward on XP (3/3)
20. Don't bother dividing the project velocity by the length of the iteration or the number of developers. This number isn't any good for comparing two projects' productivity, because each project team will have a different bias in estimating stories and tasks; some estimate high, some estimate low. It doesn't matter in the long run. Tracking the total amount of work done during each iteration is the key to keeping the project on an even keel. [TG: I agree – you must measure and compare estimates with actuals to learn!]
21. Iterative development adds agility to the development process. Divide your development schedule into about a dozen iterations of 1 to 3 weeks in length. [SW: Gilb says 2%. I think this is arbitrary and a natural size develops (environmental factors). Team size plays a part – see OMAR.]
22. Don't schedule your programming tasks in advance. Instead, have an iteration planning meeting at the beginning of each iteration to plan out what will be done. It is also against the rules to look ahead and try to implement anything that is not scheduled for this iteration. There will be plenty of time to implement that functionality when it becomes the most important story in the release plan. When you never add functionality early and practice just-in-time planning, it is easy to stay on top of changing user requirements. [YUP!]
23. What if the real customers cannot be available?


23 Stuart Woodward comments on XP (s.woodward@computer.org)

24 7. Better Programmers
Programmers do not design quality into systems; designers, engineers and architects do.
Good programmers will correctly program low quality into a system, to meet bad requirements or design, on time.

25 8. Outsourcing
Outsourcing will not in itself give you better software quality:
You have to contract for it
You have to specify the levels you want
You have to confirm you got it

26 Evolutionary Project Management Contract Modifications 1/2
Design idea: designed to work within the scope of the present contract with minimum modification. An Evo step is considered a step on the path to delivering a phase. You can choose to declare that this paragraph has priority over conflicting statements, or to clean up the other conflicting statements.
§30. Evolutionary Result Delivery Management.
30.1 Precedence. This paragraph has precedence over conflicting paragraphs.
30.2 Steps of a Phase. The Society may optionally undertake to specify, accept and pay for evolutionary usable increments of delivery, of the defined Phase, of any size. These are hereafter called “Steps”.
30.3 Step Size. Step size can vary as needed and desired by the Society, but is assumed to usually be based on a regular weekly cycle duration.
30.4 Intent. The intent of this evolutionary project management method is that the Society shall gain several benefits: earlier delivery of prioritised system components, limited risk, the ability to improve specification after gaining experience, incremental learning of use of the new system, better visibility of project progress, and many other benefits. This method is the best known way to control software projects (now a US DoD Mil Standard).
30.5 Specification Improvement. All specification of requirements and design for a phase will be considered a framework for planning, not a frozen definition. The Society shall be free to improve upon such specification in any way that suits their interests, at any time. This includes any extension, change or retraction of framework specification which the Society needs.

27 Evolutionary Project Management Contract Modifications 2/2
30.6 Payment for Acceptable Results. Estimates given in proposals are based on initial requirements, and are for budgeting and planning purposes. Actual payment will be based on successful acceptable delivery to the Society in Evolutionary Step deliveries, fully under Society control. The Society is not obliged to pay for results which do not conform to the Society-agreed Step Requirements Specification.
30.7 Payment Mechanism. Invoicing will be on a Step basis, triggered by end-of-Step preliminary (same day) signed acceptance that the Step is apparently as defined in the Step Requirements. If Society experience during the 30-day payment due period demonstrates that there is a breach of specified Step requirements, and this is not satisfactorily resolved by the Company, then a Stop Payment signal for that Step can be sent, and will be respected until the problem is resolved to meet the specified Step Requirements.
30.8 Invoicing Basis. The documented time and materials will be the basis for invoicing a Step. An estimate of the Step costs will be made by the Company in advance and form a part of the Step Plan, approved by the Society.
30.9 Deviation. Deviation plus or minus of up to 100% from Step cost and time estimates will normally be acceptable (because they are small in absolute terms), as long as the Step Requirements are met. (The Society prioritises quality above cost.) Larger deviations must be approved by the Society in writing before proceeding with the Step or its invoicing.
30.10 Scope. This project management and payment method can include any aspect of work which the Company delivers, including software, documentation and training, maintenance, testing and any requested form of assistance.

28 A Subcontracting Policy
1. Specifications are to be made to give both us and the suppliers the highest degree of flexibility (for changes and unforeseen things) to carry out the real intent of the contract. For example: we shall avoid giving detailed design or feature lists when we can control the product or service quality and performance better by a higher-level statement which forces all necessary detail to happen. For instance, instead of a list of usability features, we should make sure we have the measurable, testable usability quality requirements specified. If necessary, the proposed detail can be a variable attachment which itself is not mandatory, but for guidance.

29 Policy Quality Control
All contracts, requests for proposal, and attached technical specifications will be Inspected, using a rigorous inspection process, against our current specification rules for contracts (or whatever document types we are using). Exit (for signing or reviewing) will be given when it is measured that fewer than 0.1 major defects/logical page probably remain.
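The exit rule above can be sketched as a small calculation. This is a minimal illustration, not Gilb's official procedure: it assumes that checking finds some known fraction of the majors actually present (the 60% single-pass effectiveness mentioned elsewhere in this talk is one plausible value), and uses invented function names and sample numbers.

```python
def estimated_remaining(found_per_page: float, effectiveness: float) -> float:
    """Estimate majors/page probably remaining after the found ones are removed,
    assuming checking detects `effectiveness` (0..1) of all majors present."""
    total = found_per_page / effectiveness   # estimated true defect density
    return total - found_per_page            # the ones checking did not find

def may_exit(found_per_page: float, effectiveness: float, limit: float = 0.1) -> bool:
    """Exit (allow signing/review) only below 0.1 majors/page probably remaining."""
    return estimated_remaining(found_per_page, effectiveness) < limit

# Finding 0.05 majors/page with 60%-effective checking implies ~0.033
# remaining, so exit is allowed; finding 1.0 majors/page is far too dirty.
print(may_exit(0.05, 0.6), may_exit(1.0, 0.6))  # True False
```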

30 Evo Form for quantified stepwise specification of the quality levels you want
Buyer Requirements:
Functional Requirements
Benefit/Quality/Performance Requirements: Tag / Gist / Scale / Meter [end-step acceptance test] / Past [when?, where?] / Must [when?, where?] / Plan [when?, where?] / Ambition level
Resource Constraints: Calendar Time / Work-Hours / Qualified People / Money (specific cost constraints for this step)
Other Constraints: Design Constraints / Legal Constraints / Generic Cost Constraints / Quality Constraints
Assumptions / Dependencies
Design: Technical Design (for Benefit/Cost requirements): Tag / Description (or pointer to tags defining it) / Expected impacts / Evidence (for expected level of impacts) / Source (of evidence)

31 9. Deadline Pressure
When the deadline is clear and holy, but the quality is not clear and not holy:
The deadline will win
You will fail to get the quality you want

32 10. Define ‘Quality’ in terms of Bugs in code
Do you define food quality in terms of bugs per liter? The qualities you and your stakeholders want are many and varied; bugs are only one measure, and not the most important one.

33 11. Re-usable software
One client of mine invested on a very large scale in reusable modules. But when it came time to reuse them, over 60% of the modules had far too many bugs in them to use at all. What is the lesson?

34 Summary of 10+1 Ways to Fail at Improving Software Quality
1. Go for CMM Level X 2. Demand Better Testing 3. Use Cases 4. RUP 5. Inspection, Peer reviews, Reviews 6. Extreme Programming 7. Better Programmers 8. Outsourcing 9. Deadline Pressure 10. Define ‘Quality’ in terms of Bugs in code 11. Re-usable software

35 Ten Better Approaches to Improve Software Quality
More effective
More efficient (effect/cost)
A better-proven, documented track record available
A more direct attack on the measurable quality levels themselves
Improve? A quantitative, significant increase in the quality levels attainable at a given cost.

36 10. Evolutionary Testing
What is it?
All quality attributes can be measured at each Evo step
There are many steps (about 50)
Delivered quality levels are compared to numeric plans
Tracking is done on an impact estimation table
Delivery steps are to real stakeholders, not just testers
Why is it better?
The focus is on the total system (people, data, platforms, real work), not code alone
Early and frequent measurement
Opportunity to learn from small failures, and to prevent big ones

37 Philips Evo Pilot, May 2001. Frank van Latum, The Manager:
The GxxLine PXX Optimizer EVO team proudly presents the success of the Timing Prediction Improvement EVO steps. Shown are the results of the test set used to monitor the improvement process. The size of the test set has grown, as can be seen in the first column. (In the second column the week number is shown.) We measured the quality of the timing prediction in percentages, in which –5% means that the prediction by the optimizer is 5% too optimistic. Excellent quality (–5% to +10%) is given the color green, very good quality is yellow, good quality is orange, and the rest is red. The results are for the ToXXXz X(i) and EXXX X(i), and are accomplished by thorough analysis of the machines, and appropriate adaptation of the software.
The GXXline Optimiser Team presented this document to the Business Creation Process review team. The results were received with great applause. The graphics are based on the timing accuracy scale of measure that was defined with Jan Verbakel. Classification: Unclassified

38 Erieye Project: Inspection Cleanup per Evo Delivery
Getting at all causes of bad quality at early stages. The deliveries in the graph below are ordered in time. Observe also that the deliveries differ quite a lot in size (e.g. numbers 6 and 20 are very small). The graph shows the total major defects/page, for all document types, for all inspections in each delivery. The total number of inspections is 994. Source: Leif Nyberg, Project Manager, Ericsson Sweden, in a case study [personal communication to TG]

39 Value delivery in Omar Project

40 An example of a typical one-week Evo cycle at the HP Manufacturing Test Division during a project. [MAY96]

41 Impact Table for Step Management

42 Evo and Requirements, Conceptually
‘Design’ is what delivers performance, and costs resource. Evo development gradually delivers performance (e.g. Reliability, Usability), while eating up resources (e.g. Storage), by implementing one ‘design’ per step: Design X (done on step 1), Design Y (done on step 2), ... Design _ (done on step n). One or more constraints apply to each performance and resource attribute, alongside the functions (e.g. Terminal).

43 Multiple Test Levels of Microsoft Evo
Daily builds, 6->10 week milestones, and a shippable quality level at each milestone (e.g. Office 2002). Reference: Cusumano, Microsoft Secrets. Drawing by TG. See reference [MacCormack2001]

44 Intel View of Industrial Evo cycle
Notes from Erik Simmons: I changed the name to reflect the specific tailoring, in this version of the EPLC, to the FAB Sort Manufacturing Virtual Factory Software Automation teams. Keep this in mind when you speak to others in regards to Intel and EVO. The movement has just begun... (In fact, it began two years ago with my hire, but only after that much work introducing Planguage and better evolutionary thinking into the mix can we now start in earnest with EVO.) Courtesy: Erik Simmons, Intel Oregon. © 2002 Intel Corporation.

45 9. Defect Prevention Process (DPP)
What is DPP?
CMM Level 5 continuous process learning
Maybe 2,000 small changes per year (IBM MN)
Avoiding defect injection (the bad doesn’t happen!)
13x more cost-effective than defect removal (Inspection)
50% to 95% of all defects can be prevented
Why is it better for quality?
It attacks upstream (requirements, design, contracts)
It is completely general (deals with all quality aspects, not just bugs)
For more detail on DPP see Gilb, Software Inspection, Ch. 7 & 17 (by Robert Mays, the DPP inventor)

46 The Bottom Line for Process Improvement ...
[Chart: appraisal cost, prevention cost and cost of rework, 1987-1992; the improvement initiative starts in 1988, after which the cost of rework falls steadily.] Savings in rework alone: $15.8 million. ROI = 770%. And this is what it meant for their bottom line: every dollar spent on it paid for itself almost eight times over! Raymond Dion, “Process Improvement and the Corporate Balance Sheet”, IEEE Software, July 1993, pp. 28-35.
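The Dion figures above can be sanity-checked with one line of arithmetic. Note the derivation of the implied investment is ours, not stated on the slide: given rework savings of $15.8 million and a reported ROI of 770%, the process-improvement spend must have been roughly $2 million.

```python
savings = 15.8e6   # rework savings reported by Dion
roi = 7.7          # 770%, expressed as a ratio of savings to investment

# Implied investment in the improvement initiative:
implied_investment = savings / roi
print(round(implied_investment / 1e6, 2))  # ≈ 2.05 (million dollars)
```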

47 Reduced Cost of Quality
Philip Crosby’s “Cost of Quality”: Cost of Quality = Cost of Conformance + Cost of Non-Conformance. COC = Appraisal + Prevention (the cost of doing it right). CONC = the cost of “fix and check fix” (“rework”: the cost of doing it wrong).
Making these gains was not a totally free process: Raytheon had to take into account the “Cost of Quality”. Philip Crosby defined this as “the Cost of Conformance, plus the Cost of Non-Conformance.” The Cost of Conformance is how much it costs you to appraise the quality of products and find or prevent quality defects - by Inspection and testing, for example. The Cost of Non-Conformance is the cost of fixing and re-testing defects that your conformance practices allowed through.
At Raytheon, the process improvement program initially created an increase in the Cost of Conformance, from about 20% of total project cost to up to 30%. This was entirely funded from within the available discretionary funding of the IT department, and in any case, by the end of 1994 the Cost of Conformance (Inspection and testing) had fallen below the 20% it started at. Meanwhile, the Cost of Non-Conformance - that is, the cost of bugs - had dwindled from over 40% of total project cost to little more than 5%. Overall, the Cost of Quality at Raytheon went down from 65% of total project cost in 1988, to a mere 23% in 1995.
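Crosby's decomposition is a one-line identity, shown here with approximate start and end figures consistent with the Raytheon numbers on this slide (the exact COC/CONC split of the final 23% is our illustrative assumption):

```python
def cost_of_quality(coc: float, conc: float) -> float:
    """COQ = Cost of Conformance + Cost of Non-Conformance,
    both expressed as fractions of total project cost."""
    return coc + conc

print(cost_of_quality(0.20, 0.45))  # 1988: ~20% COC + >40% CONC ≈ 65% of project cost
print(cost_of_quality(0.18, 0.05))  # 1995: COC below 20%, CONC ~5% ≈ 23% of project cost
```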

48 Defect Prevention Experiences (Half-day Inspection Economics, Gilb@acm.org)
Most defects can be prevented from getting in there at all. Cleanroom levels approach zero defects; IBM MN achieved 99.99%+ correct fixes; the key is DPP.
[Chart: % of usual defects prevented versus years (1-6) of continuous improvement effort. Mays & Jones (IBM), 1990: 50% prevented. Mays 1993, and a user in 1996 (“72% in 2 years”): 70%-90% prevented.] Source: IBM Research Triangle Park Networking Laboratory, North Carolina.

49 Prevention + pre-test detection is the most effective and efficient
[Chart: cumulative defect containment over years 1-6 of continuous improvement. Prevention: 50% prevented (Mays & Jones, IBM, 1990), rising to 70% (Mays, 1993) and beyond (70%-90%). On top of prevention, Inspection detects up to 95% cumulatively (the state-of-the-art limit); the remainder is “detected cheaply” by test, towards 100% in use.]
Prevention data is based on state-of-the-art prevention experiences (IBM RTP; others, e.g. the Space Shuttle, IBM Systems Journal 1-95: 95%+, with 99.99% in fixes). Cumulative Inspection detection data is based on state-of-the-art Inspection, in environments where prevention is also being used (IBM MN, Sema UK, IBM UK).

50 8. Motivate by Reward for Quality
What is motivation by reward? Connecting actual delivery of specific quality levels to some sort of personal and team rewards (not necessarily money).
Why is it better?
We don’t normally do this at all
We reward on-time delivery of bad qualities

51 8. Reward Quality (see the contracts in earlier slides)
Example: Define the quality you want in ‘Planguage’ [see refs CE, POSEM]
Maintainability:
Scale: Average minutes to find, correct and regression-test a random bug.
Meter [Evo Step Acceptance]: at least 10 average bugs and 2 qualified maintainers.
Plan [Contract, Each Evo Step]: 60 minutes.
Then stipulate:
In a sub-supplier contract: payment is invoice-able when all defined quality levels are proven delivered.
For an in-house team: delivery can only be considered ‘done’ when the defined tests prove that the defined levels of all qualities due are in fact delivered.
No quality? You are late!
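The "no quality, no payment" stipulation above amounts to a simple gate. A hypothetical sketch, with invented names and data; the Maintainability scale is in minutes, so lower measured values are better:

```python
import math

def may_invoice(required: dict, delivered: dict) -> bool:
    """Invoice (or call the step 'done') only when every defined quality
    level is proven delivered by its acceptance test.
    required:  quality tag -> planned level (minutes; lower is better)
    delivered: quality tag -> measured level from the step acceptance test
    A quality with no measurement at all cannot pass."""
    return all(delivered.get(tag, math.inf) <= plan_level
               for tag, plan_level in required.items())

# Plan [Contract, Each Evo Step]: 60 minutes average to find/fix/retest a bug.
print(may_invoice({"maintainability_minutes": 60}, {"maintainability_minutes": 45}))  # True
print(may_invoice({"maintainability_minutes": 60}, {"maintainability_minutes": 90}))  # False
```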

52 7. Entry-Level Defect Control: No Garbage In
What is it?
All software engineering processes (contracting to coding) will make sure that the specifications they get are reasonably ‘good’
Good practice is defined by a set of ‘Rules’ (like Clear, Complete, Consistent)
A sample (1 or more pages) of incoming information will be taken (Inspection)
A measure of major defects per page will be taken
A maximum allowed level of defects will be used
Why is it better?
Right now we have a major defect level of about 150 ± 100 major defects/page, against a simple basic set of rules
Acceptance levels should be at less than 1.0
The average cost of a major defect is about 3-10 hours of project time lost
Current levels of major defects have delayed real projects by 2 years (Ohio case)

53 7. No Garbage In (continued)
Policy:
“No software process shall use input specifications with more than one major defect per page (300 non-commentary words)”
“Exceptions shall be documented and approved formally”
Practice (how to measure the garbage level):
1. Rules agreed (3 go a long way)
2. Sample size set (1 page is fine)
3. Processes are officially redefined to include this Entry control
4. Time level is set (up to 30 minutes is fine)
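The entry gate above is easy to express as code. A minimal sketch: normalize the major-defect count of a checked sample to the 300-non-commentary-word "page" from the policy, and refuse the input above 1 major/page (function names are illustrative; counting the majors is, of course, the human Inspection work):

```python
WORDS_PER_PAGE = 300  # one 'page' = 300 non-commentary words, per the policy

def majors_per_page(majors_in_sample: int, sample_words: int) -> float:
    """Normalize a sample's major-defect count to majors per 300-word page."""
    return majors_in_sample * WORDS_PER_PAGE / sample_words

def entry_ok(majors_in_sample: int, sample_words: int, limit: float = 1.0) -> bool:
    """Entry condition: at most 1 major defect per page of input specification."""
    return majors_per_page(majors_in_sample, sample_words) <= limit

print(entry_ok(1, 600))  # 0.5 majors/page → accept the input
print(entry_ok(4, 600))  # 2.0 majors/page → garbage: send it back
```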

54 “Rules”: Best-Practice Strong Advice
Introduce the following three rules for inspecting a requirements document:
1. Unambiguous to the intended readership
2. Clear enough to test
3. No design specs (= ‘how to be good’) mixed up with the requirements (= ‘how good to be’)

55 Report for page 82 (reported inspection results on a requirements document, 4 managers)
Total defects (majors + minors), majors, and design defects (design is part of the totals):
41, , D=1
33, , D=5
44, , D=10
24, , D=5
The team would log unique majors: about ~2×30=60 (2× the high score). That is 30% of the total, so the total for this page is about ~180 majors. If we attempt to fix the 60 we log, and correctly fix 5/6 of them, then ~10 are failed fixes, so the total remaining after inspection and editing = 180 - 50 = ~130 majors per page.

56 Extrapolation to Total Majors in Whole Document
Page 81: 120 Majors/page (3/4 of the page checked by 4 other managers). Page 82: 180 Majors/page. Average: 150 Majors per physical page x 82 pages = 12,300 Majors in the document. If a Major has a 1/3 chance of causing loss downstream, then 4,100 Majors will cause a loss; and if each loss averages 10 hours (9.6 hours median at one client, over 1,000 Majors), then the total project rework cost is about 41,000 hours. (This project was in reality over a year late; 1 year = 2,000 hours for each of 10 people.)
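The extrapolation above is plain multiplication; this quick sketch simply reproduces the slide's estimates:

```python
# Reproducing the slide's extrapolation (all figures are the slide's own estimates).
pages = 82
avg_majors_per_page = (120 + 180) / 2        # average of the two sampled pages: 150
total_majors = avg_majors_per_page * pages   # ~12,300 Majors in the whole document
loss_causing = total_majors / 3              # 1/3 of Majors cause downstream loss: ~4,100
rework_hours = loss_causing * 10             # ~10 hours lost per loss-causing Major: ~41,000
team_years = rework_hours / (10 * 2_000)     # 10 people at 2,000 hours/year: ~2 years
```

The point of the back-of-envelope form is that any of the slide's assumptions (loss probability, hours per loss) can be varied to see how robust the "two years of rework" conclusion is.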

57 More feedback “Love the slides on in-process document review.
We are using this with requirements documents, and have been able to double the quality of the documents with only a few hours of effort.” - Erik Simmons, Intel, Oregon, January 9th 2002

58 6. Exit Level Defect Control: No Garbage Out
What is Exit control about? It is the same as Entry control, except that you do the quality control on your own work: you check a spec against your rules for good specs, and you determine the defect density (the defect injection rate). We can perform checks using samples during the work, so we don’t get surprised at the end. Why is it better? It discovers problems very early. It works at all levels of the development and maintenance processes, not just at test and operation of code. It can impact all types of quality (not just ‘bugs’). It is very inexpensive and fast (10-30 minutes per check).

59 The No ‘Garbage Out’ Policy and its Practical Implementation
Policy (kept simple): We will not release any work which has an unacceptable defect density. We will check our work as it emerges, not just at the end. If bad work is being produced, we will change ‘whatever it takes’ to avoid defect injection (= CMM Level 5 DPP). Practical Implementation: Exit Condition: “Maximum 1 Major defect/300 NC words”. Sampling Rate: check a page about every 10 pages. Checkers: the author and/or one colleague.

60 How to Inspect a large amount of specification or code!
Sampling for Dummies “Do a page and then decide what to do.”

61 Sample “During” Authoring 1
The Author is expected to write about 45 pages. First we write only 5 of these; then we sample one page with Inspection. [Flowchart: Write New Pages -> Sample -> Exit? A ‘Good Enough’ sample (e.g. 4 Majors) exits; ‘Too many defects’ means Re-Write all 5 pages.] What to do based on the results: If the one sampled page is good enough to exit, then exit the 5 written pages, let the author write another ~5 pages, and again sample 1 page from the newly written pages. If the one sampled page is not good enough for exit, then give the 5 pages back to the author, let the author re-write all 5 pages based on the feedback from the one sampled page, and then inspect a different page than the one inspected before, measuring whether all 5 pages can exit based on it. Don’t let the author write new pages before they get the first ones right.
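The decision loop on this slide can be sketched in code. This is only an illustration: `check_page` stands in for a real one-page Inspection, and the names, batch size, and exit threshold are my assumptions, not a prescribed tool:

```python
# Sketch of the sample-as-you-write loop (names and thresholds are illustrative).

EXIT_MAX_MAJORS = 1   # a sampled page may have at most this many Majors to exit
BATCH = 5             # pages written between samples

def author_pages(total_pages, check_page):
    """Write in small batches; sample one page per batch with Inspection and
    only let the author continue once the sampled page is clean enough."""
    exited = 0
    while exited < total_pages:
        batch = min(BATCH, total_pages - exited)
        while check_page() > EXIT_MAX_MAJORS:
            # Too many defects: the author re-writes all pages in the batch,
            # then a *different* page is sampled on the next check.
            pass
        exited += batch  # sampled page was good enough: the whole batch exits
    return exited
```

Each failed sample stands for a full re-write of the batch before a different page is inspected, which is exactly the "don't write new pages before the first ones are right" rule above.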

62 Sample “During” Authoring 2
The Author is expected to write about 45 pages. Now the Author can write 5 more pages; then we sample one page with Inspection. [Flowchart: Write New Pages -> Sample -> Exit? ‘Yes, Good Enough’ (e.g. 5 Majors logged) moves the batch to Exited Pages; ‘No, too many defects’ means Re-Write all 5 pages.] Caption: “I’ve been driving for 2 hours without an accident, so I can now close my eyes while driving.” What to do based on the results: the same as on the previous slide; exit each 5-page batch only when its sampled page is good enough, otherwise the author re-writes all 5 pages and a different page is sampled.

63 5. Quantify Quality Requirements
What does that mean? Specify a number, on a scale of measure, indicating how much quality you want. Do this for all types of quality you want to manage (reliability, maintainability, usability). Use ‘Planguage’ [CE reference, back of slides], for example, as a format. How do we do it? Identify the critical quality types and give each a name tag (Availability:). Define a scale of measure for them (Scale: Hours MTBF). Decide on a good-enough level of quality for the application (Plan [First One] 30,000).

64 5. Quantify Quality 2 Policy Practical
All critical quality requirements will always be specified quantitatively We will measure the level of quality actually delivered During development At acceptance In operation Practical Train people in Planguage Make specification templates (next slide) available Make knowledge of good scales of measure and practical meters (tests) available.

65 Scalar Requirements Template + <Hints>
<name tag of the objective>
Ambition: <give overall real ambition level in 5-20 words>
Type: <quality | objective | constraint>
Stakeholder: { , , } “who can influence your profit, success or failure?”
Scale: <a defined unit of measure, with [parameters] if you like>
Meter [<for what test level?>]
==== Scalar Benchmarks ==== (the past)
Past [ ]: <estimate of past> <- <source>
Record [<where>, <when record set>]: <estimate of record level> <- <source of record data>
Trend [<future date>, <where?>]: <prediction of level> <- <source of prediction>
==== Scalar Constraints ==== (fail borders)
Limit [ ]: <level> <- <source of Limit>
Must [ ]: <level> <- <source>
==== Scalar Targets ==== (the future values and needs)
Wish [ ]: <level> <- <source of wish>
Plan […]: <target level> <- <source>
Stretch [ ]: <motivating ambition level> <- <source of level>
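One hedged way to make such a template machine-checkable is to hold the scalar levels in a small data structure. The field names below mirror the Planguage keywords, but the class itself, and the 20,000-hour Must level in the example, are my illustrations, not an official Planguage tool:

```python
# Illustrative container for a filled-in scalar requirement (not an official tool).
from dataclasses import dataclass
from typing import Optional

@dataclass
class ScalarRequirement:
    tag: str                          # name tag of the objective
    scale: str                        # defined unit of measure
    past: Optional[float] = None      # benchmark level
    must: Optional[float] = None      # worst acceptable level (Fail border)
    plan: Optional[float] = None      # committed target level
    stretch: Optional[float] = None   # motivating ambition level

    def plan_met(self, measured, higher_is_better=True):
        """Did a measured level reach the Plan target?"""
        if self.plan is None:
            raise ValueError(f"{self.tag}: no Plan level specified")
        return measured >= self.plan if higher_is_better else measured <= self.plan

# The Availability example from the earlier slide: Scale: Hours MTBF, Plan 30,000.
# (The Must level here is invented purely for illustration.)
availability = ScalarRequirement(tag="Availability", scale="Hours MTBF",
                                 must=20_000, plan=30_000)
```

Keeping the levels as data rather than prose is what lets delivered measurements (during development, at acceptance, in operation) be checked automatically against the requirement.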

66 Erieye Project: Usability.Intuitiveness Requirement (Real Example)
Ambition: High probability (in %) that the operator will <immediately>, within a specified time from deciding the need to perform the task (without reference to handbooks or help facility), find a way to accomplish their desired task.
Scale: Probability that an <intuitive>, TRAINED operator will
• find a way to do whatever they need to do,
• without reference to any written instructions (i.e. on paper or on-line in the system, other than help or guidance instructions offered by the system on the screen during operation of the system),
• within 1 second of deciding that there is a necessity to perform the task.
<-- MAB “I’m not sure if 1 second is acceptable or realistic, it’s just a guess”
Meter: To be defined. Not crucial in this 1st draft - TG
Past [GRAPES]: ~80%? <- LN
Record [MAC]: 99%? <- TG
Assumption: we have human operators!
Must [TRAINED, RARETASKS [{<1/week, <1/year}]]: %? <- MAB
Plan [TASKS DONE [<1/week (but more than 1/Month)]]: 99%? <- LN
Plan [TASKS DONE [<1/year]]: 20%? <- JB
Plan [Turbulence, TASKS DONE [<1/year]]: 10%? <- TG

67 4. Contract Towards Quality
What does that mean? When you contract for software work, you will define the work partly by quantified quality levels expected. This is the same as the quantified qualities in the last point, just that we do it in legal contracts. It gets taken more seriously than mere requirements! Why is it better for software quality? You are more likely to get the quality levels you want At least you shouldn’t pay if you don’t! All aspects of the development process will have to find a way to deliver the contracted levels.

68 Symbolic ‘Quality’ Contract
The Availability will be at 99.98% The Maintainability will be 60 minutes/bug to find, fix and test. The Usability will be at 30 seconds for average task familiarization.
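Checking delivered levels against such a contract is mechanical once the clauses are data. The thresholds below are the slide's; the checking code and names are my sketch:

```python
# Symbolic quality contract as data: quality -> (direction, contracted level).
CONTRACT = {
    "availability_pct":        ("min", 99.98),  # at least 99.98% availability
    "maintainability_minutes": ("max", 60),     # find, fix and test a bug within 60 min
    "usability_seconds":       ("max", 30),     # average task familiarization time
}

def breaches(delivered):
    """Return the contract clauses the delivered system fails to meet."""
    failed = []
    for quality, (direction, level) in CONTRACT.items():
        actual = delivered[quality]
        ok = actual >= level if direction == "min" else actual <= level
        if not ok:
            failed.append(quality)
    return failed
```

If `breaches` returns a non-empty list, the "you shouldn't pay" principle from the previous slide has something concrete to point at.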

69 3. Reuse Known Quality What does that mean?
The various quality dimensions of a reusable software component are known, measured, predictable, quantified, documented Why is it better for quality? The qualities you get are ‘by selection’, rather than ‘by process’. This is a conventional engineering paradigm (use known components with known attributes)
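"Quality by selection" can be sketched as a filter over documented component attributes. The catalog entries and figures below are invented for illustration only:

```python
# Pick reusable components whose *documented* attributes already meet the
# requirement, instead of hoping the development process delivers them.
# Component names and attribute values are invented.

CATALOG = [
    {"name": "logger_a", "mtbf_hours": 12_000, "port_effort_days": 3},
    {"name": "logger_b", "mtbf_hours": 35_000, "port_effort_days": 10},
]

def select(catalog, minimums=None, maximums=None):
    """Keep components meeting every 'at least' and 'at most' requirement."""
    minimums, maximums = minimums or {}, maximums or {}
    return [c["name"] for c in catalog
            if all(c[attr] >= level for attr, level in minimums.items())
            and all(c[attr] <= level for attr, level in maximums.items())]
```

This is the conventional engineering paradigm in miniature: the qualities come from choosing among components with known, measured attributes, not from the build process.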

70 2. Evolve Towards Quality
What does that mean? It means that your projects should be divided up into small (2%) stakeholder-result-delivery increments Each one to deliver at planned quantified levels Optionally going initially for the ‘final quality levels’ at an initially low level of functionality It means that you have to prove you know how to get your quality levels, early and frequently. Why is it (Evo) better for quality? You have to prove all mechanisms; early and frequently for Contracts Requirements Design Reused components Development process Staff Subcontractors Stakeholder reactions

71 Microsoft IE 3.0 During December, detailed coding of the individual modules started. But the IE3 team was still making decisions about the overall product architecture — decisions that would not only affect the features in the final product but also the development process itself. A team member explained, “We had a large number of people who would have to work in parallel to meet the target ship date. We therefore had to develop an architecture where we could have separate component teams feed into the product. Not all of these teams were necessarily inside the company. The investment in architectural design was therefore critical. In fact, if someone asked what the most successful aspect of IE3 was, I would say it was the job we did in ‘componentizing’ the product.” The first integration of the new component modules into a working system occurred in the first week of March 1996. Although only about 30% of the final functionality was included in IE3 at that point, it was enough to get meaningful feedback on how the product worked. It also provided a base-line product, or alpha version, that could be handed to Microsoft’s development partners. From that point on, the team instituted a process of “daily builds,” which integrated new code into a complete product every day. Once new code was “checked in” (integrated into the master version), getting performance feedback through a series of automated tests typically took less than three hours. With the rapid feedback cycle, the team could add new functionality to the product, test the impact of each feature and make suitable adjustments to the design. In mid-April, Microsoft distributed the first beta version of IE3 to the general public. That version included about 50% to 70% of the final functionality in the product. A second beta version followed in June and included 70% to 90% of IE’s final functionality. 
The team used the beta versions (as well as the alpha version) to gather feedback on bugs and on possible new features. Customers had a chance to influence the design at a time when the development team had the flexibility to respond. A significant proportion of the design changes made after the first beta release resulted from direct customer feedback. Some of the changes introduced features that were not even present in the initial design specification. The cycle of new-feature development and daily integration continued frenetically through the final weeks of the project. As one program manager said, “We tried to freeze the external components of the design three weeks before we shipped. In the end, it wasn’t frozen until a week before. There were just too many things going on that we had to respond to…but, critically, we had a process that allowed us to do it.” Models of the Software-Development Process: The Explorer team’s process, increasingly common in Internet-software development, differs from past software-engineering approaches. (See “The Evolution of the Evolutionary-Delivery Model,” p. 78.) The waterfall model emerged 30 years ago from efforts to gain control over the management of large custom-software-development projects such as those for the U.S. military. (See “The Waterfall Model of Software Development Is the Traditional Approach,” p. 78.) The model features a highly structured, sequential process geared to maintaining a docu- [Figure: Specs -> Architecture -> Design -> Architecture Evolution] Source: MacCormack, “Product-Development Practices That Work: How Internet Companies Build Software,” MIT Sloan Management Review, Winter 2001.

72 Linux Evolution Source: MacCormack, Product-Development Practices That Work: How Internet Companies Build Software in WINTER 2001 MIT SLOAN MANAGEMENT REVIEW

73 Design to Quality What does that mean?
It means we get the qualities we want by actively designing/engineering and architecting That means by choosing the design ideas which predictably will give us the qualities we require. It means defining all critical quality dimensions quantitatively It means evaluating all design options quantitatively in relation to our quality requirements levels. Why is it better for software quality? Because your design process is then focused on the qualities you want and on the designs which will give those qualities. Because this is the historically proven way to get quality in engineering and architectural disciplines Because current so-called ‘software engineering’ (example CMM, RUP) does not even have this ‘design’ idea on the agenda!

74 Design process example: an example of considering two alternatives, based on their impacts on qualities, their cost, and their risk (Impact Estimation tool [CE, PoSEM]). “99%-99.9%” means that the target is 99.9% and the benchmark (old system) is at 99%. The % impact (in the right-hand 2 columns) is the % impact relative to the benchmark (0% impact = equal to the benchmark; 100% impact = meeting the target, on time etc.), so the A impact of 50% means we expect to get halfway to the target (to a Reliability level of 99.45%, in other words). A’s Credibility = 0.8 (High); B’s Credibility = 0.2 (Low). See slide note for explanation.
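The % impact scale in the example is easy to compute directly. This sketch uses the slide's Reliability numbers (benchmark 99%, target 99.9%); the function names are mine:

```python
# Impact Estimation scale: the benchmark maps to 0% impact, the target to 100%.

def percent_impact(estimate, benchmark, target):
    """How far an estimated level moves the system from benchmark toward target."""
    return (estimate - benchmark) / (target - benchmark) * 100.0

def level_for_impact(impact_pct, benchmark, target):
    """Inverse: the measured level corresponding to a given % impact."""
    return benchmark + impact_pct / 100.0 * (target - benchmark)
```

Design A's 50% impact on Reliability [99% -> 99.9%] corresponds to `level_for_impact(50, 99.0, 99.9)`, i.e. the 99.45% the slide mentions; a credibility factor (0.8 for A, 0.2 for B) can then be used to discount the raw estimate.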

75 Requirements and Architecture
The Head:Body Model of Evo: architecture-level design combined with step-level design. [Diagram: Requirements and Architecture at the Project Architecture and Management Level form the “Head” (Plan/Study/Act); each Step forms the “Body”: Requirements -> Design -> Quality Control -> (Construction/Acquisition) -> Testing -> Integration -> Delivery -> Stakeholder -> Measure & Study Results, in a Plan-Do-Study-Act cycle.]

76 Some Better Ways to Get Software Quality you might like to learn more about.
10. Evolutionary Testing 9. Defect Prevention Process 8. Motivate by Reward for Quality 7. Entry Level Defect Control: No Garbage In 6. Exit Level Defect Control: No Garbage Out 5. Quantify Quality Requirements 4. Contract Towards Quality 3. Reuse Known Quality 2. Evolve Towards Quality 1. Design to Quality

77 Next slides are for extra detail later.
End of Talk! Next slides are for extra detail later.

78 A Use Case Critique Summary By Don Mills [Mills01]
This Appendix lists the “problems with use cases” that I found in my brief, and unscientific, survey of “the literature” (a mixture of books on my and my employer’s shelves, with articles found by browsing the Internet). The first eight entries come from the UI Design.net editorial for October. Solutions to all of the problems exist, but not within the RUP or the UML (or only clumsily, ambiguously, or inconsistently), while outside those strictures many competing solutions have been proposed. Note that this is not intended as an exhaustive list ...

79 Use Cases ? 1 [The precise role of use cases is defined in The UML User Guide to be the description of a set of actions performed by a system to deliver value to a user: that is, system process design (at the user interface level).] Understanding the problem -- the business and its rules -- must happen first. Defining business process, system operating procedures or lines of communication is secondary. Use Cases lead to definition of procedures without proper understanding of the problem domain. Developing Use Cases with a User Group or Business Analyst group leads to premature interaction design by unskilled practitioners. It’s hard to determine the completeness of Use Cases because of their “single path” nature. This can lead to developers using their imagination to complete exception handling cases or rarely taken paths. This can quickly ruin a good Interaction Design. Use Cases do not lend themselves to OO development due to their nature as procedural descriptions of functional decomposition.

80 Use Cases ? 2 The User Group defining them are required to second guess the future system operation. They find this difficult or even impossible. This leads to new systems which don’t make an adequate improvement in operations procedures and can miss the opportunity to simplify a process and remove unnecessary people. Use Cases because of their procedural nature lend themselves to action-object User Interface designs. If you need or want to have an object-action UI Design (aka OOUI) then Use Cases are a poor foundation. Use Cases can end up as the repository for the whole requirements. Everything goes into the Use Cases and the Business Analyst group will claim, “the design is done already, now write the code”. This is very very bad for Interaction Design. Use Cases are poor input for Object Modeling. They can lead to poor definition of classes from noun extraction as you may otherwise be hoping to eliminate some of the domain terms used within the object model. The UML Specification is so non-specific and lacking in obligatory integrity checking that it is easy to produce fragmentary, inconsistent, ambiguous use cases while still following an arguably correct interpretation of all of the UML’s requirements. Cockburn identified 18 different definitions of Use Cases, yielding over 24 different combinations of Use Case semantics.

81 Use Cases ? 3 Use cases do not require backward or forward traceability of requirements. Standard UML specifications of use cases, together with descriptions in the Rational Object Technology Series of publications, lack a number of important testability elements, such as domain definitions for input and output variables, testable specifications of input-output relationships, and sequential and interactional constraints and dependencies between use cases. Use cases, by definition in the UML Specification, emphasise ordering (“sequences of messages exchanged ... [and] actions performed by the system”, V1.3). Physical sequence of operations is normally a process restriction, not a true requirement, and when truly required can be defined more abstractly by preconditions. Early emphasis on ordering is among the worst mistakes an O-O project can make, but is hard to avoid if use cases are relied on for analysis, since the UML Specification provides no standard way of expressing the common situation of optional or flexible sequences of action. Because the UML can neither express structure between use cases nor a structural hierarchy of use cases in an easy and straightforward way, use cases are developed as an “uncoordinated sprawl” of (by definition) discrete and unrelated functions. This creates a loose collection of separate partial models, addressing narrow areas of the system requirements, and presenting problems of relating these partial models and keeping them consistent with each other.

82 Use Cases ? 4 The UML Specification provides no clear semantics of what a use case really is (“representing a coherent unit of functionality” — but representing in what way(s)?), and no consistent guidelines on how it should be described. This “flexibility” may be seen as a good thing, but as the scale of design problems rises, with larger design teams and more and more use cases, the sort of “studied sloppiness” that can be beneficial for rapid design of modest problems begins to become a stumbling block. The UML Specification requires a use case to “represent” “actions performed by the system”, but (despite a popular interpretation) does not restrict these to externally visible actions. It is not clear what kind of events we should concentrate on while describing use cases: external-stimuli and responses only, or internal system activities as well. Use cases may not overlap, occur simultaneously, or influence one another, although actual uses of a computer system may do all of these. The level of abstraction of use cases, and their length, are a matter of arbitrary choice — “just enough detail, but not too much”. The only level of detail that is “enough” is a level that removes all ambiguity.

83 Use Cases ? 5 Furthermore, no modularisation concepts are given to manage large use case models. The include and extend concepts are presented as a means to provide extensibility, but no rigorous semantics are provided for these concepts, allowing for multiple disparate interpretations and uses. Use cases in general are descriptions of specific business processes from the perspective of a particular actor. As such they do not give a clear picture of the overall business context and imperatives that actually generate the requirements for these business processes. This means that they can be quite incomprehensible to non-domain experts. For the same reasons, the important business requirements and imperatives underlying the use case model become invisible when taken out of business context and expressed in discrete use cases. Subsequent readers of the use case model may be quite unable to explain the forces and business requirements that shaped the model. Developing Use Cases with a User Group or Business Analyst group leads to a focus on how users see the system’s operation. But the system doesn’t exist yet. (A previous system might exist, but if it were fully satisfactory you would not be asked to change or rewrite it.) So the system picture that use cases will present is based on existing processes, computerised or not. The system builder’s task is to come up with new, better scenarios, not to perpetuate antiquated modes of operation.

84 Use Cases ? 6 of 6 slides A UML use case model can’t specify interaction requirements where the system initiates an interaction between the system and an external actor. Because the UML Specification forbids interactions between actors, use cases cannot model a rich system context involving such interactions. The UML requires use cases to be independent of one another, which means that it offers no way to model persistent state across use cases, or to identify how the initial system state required by a use case (specified in Pre-conditions) is to be achieved.

85 References 1 RPL: www.result-planning.com (Gilb site)
Requirements slides, Evo method slides, Inspection slides and papers, Planguage Glossary (part of the CE book)
CE: Competitive Engineering, book by Tom Gilb, forthcoming 2002, Addison Wesley. A systems engineering and software engineering handbook, based on Planguage (parts at
Inspection:
GG: Gilb and Graham: “Software Inspection” (1993)
RR: Ronald A. Radice: “High Quality Low Cost Software Inspections”, 2002, Paradoxicon Publishing, Andover MA, USA
PoSEM: Gilb: “Principles of Software Engineering Management” (1988, Addison Wesley)

86 References 2 Mills01:”What’s the Use of a Use Case?”
RUPSE: Rational Unified Process for Systems Engineering, RUP SE 1.0, A Rational Software White Paper (possibly available via TP 165, 8/01). This paper attempts to tackle the problem of system architecture for multiple quantified quality requirements. TG: It fails in that it does not deal with multiple quality requirements simultaneously, and does not do much more than arm-waving. It does not do what I would call a good job of quantifying quality, nor of showing the relation between a design and multiple qualities and costs. But it is the best attempt to recognize the need and the problem to come out of Rational so far.
Mills01: “What’s the Use of a Use Case?”, Don Mills, Copyright © Software Education Associates Ltd, Wellington, New Zealand, 2001.
MacCormack2001: MacCormack, “Product-Development Practices That Work: How Internet Companies Build Software”, MIT Sloan Management Review, Winter 2001 (Evo in MIT Sloan Review).

87 Slides added after printed documentation made for conference

88 Kent Beck eXtreme Programming (QUOTED WITH PERMISSION)
On 18/01/02 14:25, "Kent Beck" wrote: > I think you are conflating two concepts--how you create a process and how > you create a community to use the process. > > I was quite "scientific" in my creation of XP. First I read voraciously and > asked lots of questions about a topic. Then I experimented with a technique > myself, generally to extremes so I understood the range of possible > behavior. Whatever worked best for me I taught to a few people I trusted. If > they reported good results I taught it to people I didn't know. Only if they > reported good results would I begin recommending the practice in speeches > and in print. I tried combinations of practices (not exhaustively, but I > tried to be aware of interactions when they occurred). > I put "scientific" in quotes above, because it isn't science like physics is > science, but it is science as described by Sir Francis Bacon, and as > contrasted to Aristotelian pure reasoning. My notebooks certainly wouldn't > survive review by a physical scientist. But we aren't in the physical > science business. > Now I had some tested ideas, and I was ready to see them implemented on a > large scale (we can get into motivation later). Given my resources, viral > marketing driven by storytelling was the only option. > Does that answer your question? Yes, it would be fine for you to quote the message. I haven't really talked about it in public, but mostly because no one else ever asked. Kent, Jan 2002.
======== An earlier message, for which I did not originally have permission to quote publicly; he preferred to craft it to that level first, which he has since done (see the main message above):
“I don't have data on pair programming vs. inspections. I would love for someone to do the studies, but it won't be me. I'm a storyteller, not a scientist. One thing I like about pair programming is it is addictive. After a while, as the stress level increases, you are increasingly likely to search out a partner. Some people say XP requires discipline; I think it mostly requires an addictive personality :-)/2 Kent”
“I think you've thought a lot more about evolutionary delivery than I have,” Kent, January 2001.
I don't believe we've ever met in person. I would like to some time soon. We seem to have a lot in common. We have just moved to rural southern Oregon after two years in Zurich. My work is still evolving, but it looks like it will be something like long-term consulting/coaching with small teams, so far mostly from .com startups. I'm amazed at how slow these guys move. They have so much fear that they are afraid to evolve their systems, when that is really their only chance. I think you've thought a lot more about evolutionary delivery than I have, and that XP has more to say about how technically to achieve predictable evolutionary development indefinitely. I'll be very interested to hear your comments on XP. Return to main sequence

89 CMM Level 3 Results From Ralph Young, Effective Requirements Practices, originally from Paulk SEI

90 This is the last slide of the set of slides!

