Slide 3. The impossible dream: 1. Software contains no more errors.
When the dream is realised, software delivered to a customer will contain no more errors.
Slide 4. The impossible dream: 1. Software contains no more errors; software is the most reliable component in any system or product that contains it.
Everybody will always regard software as the most reliable component of any device or system or product that contains it.
Slide 5. The sordid reality: 1. If it’s switched on and it stops working, the fault is probably in the software. Whatever it is!
Contrast this dream with the sordid reality of a great deal of today’s software, running on the computers embedded in devices that everywhere surround us – in televisions, refrigerators, and even in beds. Here is a general rule of thumb. If your device is switched on and it stops working, the fault is most probably in the software.
Slide 6. The sordid reality: 1. If it’s switched on and stops working, probably the fault is in software. If you switch it off and on again, and it now works again, certainly the fault is in the software. Whatever it is!
For confirmation of the diagnosis, just switch the device off and on again. If necessary, unplug it and take out the battery. If it now works again, you can be practically certain that the fault is in the software. So certain that you will stop looking for any other cause. And this is a test that can be easily performed and understood by your grandmother – and unfortunately it often is. It is she who knows how bad software can be today. Surely we don’t want things to stay this way forever.
Slide 7. A more possible dream: 1. Software contains no more errors than any other engineering product.
Here is a more possible version of the same dream. Software does not have to be perfect – it only has to be just as reliable as every other engineering product or device or component. So when there is a fault, it will be just as often in the hardware as in the software. For example, when it stops working, your grandmother will blame the switch rather than the software.
Slide 8. A more impossible dream: 2. Programmers make no more mistakes.
I have a second and even more impossible dream, that the improved reliability of software is achieved because programmers themselves make no more mistakes.
Slide 9. The impossible dream: 2. Programmers make no more mistakes; programs work the first time they are run, and forever after, even when you change them.
The programs that they write work perfectly the very first time that they are submitted to test; and when delivered, they continue to run correctly forever after. We are freed from the tyranny of debugging. And when the program is adapted to meet new needs, the improved program works just as well as before.
Slide 10. The sordid reality: 2. Programmers spend half their time detecting, removing or working round mistakes made by themselves (or their colleagues) in the other half of their time.
Compare this again to the sordid reality of the present day. The programming profession as a whole probably spends up to half its time detecting, removing, or working around errors that its members or their colleagues have made in the other half of their time. You could check this assessment by your own experience at Microsoft. Count up the time spent by developers in unit test. Add the full time of the full-time testers. Add the time of integration testing after code complete. Add the time spent on analysis and correction of errors detected in the field. Compare this with the time spent by developers in design and coding. What is your assessment? Can I take a vote? How many of you think I am too pessimistic?
Slide 11. A more possible dream: 2. Programmers make no more mistakes than any other professional engineer.
Fortunately, there is a more possible version of this dream too, one that may come true within the next fifty years. We ask only that programmers make no more mistakes than any other professional engineer: no more mistakes than airline pilots, surgeons, structural engineers, hardware logic designers, ... Surely, this is a goal that we should pursue, simply as a matter of professional pride.
Slide 12. $100 billion per year: world-wide annual cost of software error. 40% falls on developers, 60% on users. Estimate based on survey of US industry: Planning Report 02-03, prepared by NIST for US Department of Commerce, May 2002.
But there are also other, more sordid, financial reasons for pursuit of this goal – the saving of money. In 2002, the US Department of Commerce commissioned a survey with the strange title ‘The Economic Impacts of Inadequate Infrastructure for Software Testing’. The resulting report summarised figures derived from several major US industrial sectors – Automotive, Aerospace, Transportation and Financial Services. Extrapolating to all of US industry, the total added up to $60 billion per year. To cover the rest of the world, I have extrapolated again by rounding up to a hundred billion. It is the target of what could be saved by fulfilment of my dream of zero defect programming.
Over a half of this significant cost falls upon the users of the software. Much of this is spent on work-arounds and precautions against the effect of errors. This is what would be saved by realisation of my first dream, that software contains no more errors. Just under a half of the cost falls on the producers of software, and is spent on testing and debugging. That is what would be saved by my second dream, that programmers make no more mistakes.
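The arithmetic behind the slide's headline figures can be sketched in a few lines. The 40/60 split and the two totals are taken from the slide and the talk; the names used below are illustrative only, and integer arithmetic is used so that the shares add up exactly.

```python
# Figures quoted in the talk (NIST survey plus the speaker's own rounding).
US_COST_USD = 60_000_000_000       # extrapolated annual cost to US industry
WORLD_COST_USD = 100_000_000_000   # speaker's rounded-up world-wide estimate

DEVELOPER_SHARE_PCT = 40  # per the slide: spent on testing and debugging
USER_SHARE_PCT = 60       # per the slide: spent on work-arounds and precautions


def split_cost(total_usd: int) -> dict[str, int]:
    """Split an annual cost between developers and users using the slide's percentages."""
    return {
        "developers": total_usd * DEVELOPER_SHARE_PCT // 100,
        "users": total_usd * USER_SHARE_PCT // 100,
    }


if __name__ == "__main__":
    for who, usd in split_cost(WORLD_COST_USD).items():
        print(f"{who}: ${usd // 1_000_000_000} billion per year")
```

On these figures, the first dream targets the users' share and the second dream targets the developers' share.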
Slide 13. Still impossible: 3. The program verifier. An intelligent programmers’ assistant that knows what the program should do and what it should not do. Verifies that the program is correct, with the certainty of mathematical proof, and gives a simple counterexample if not. Applied also to requirements and designs.
My third impossible dream explains how the two previous dreams are going to be realised. It is a dream of an intelligent programmer’s assistant, also known as a program verifier. This is a programmer productivity tool, like the type-checkers of modern programming languages, that detects any possible error even before the program is tested. In my dream, you have to tell the verifier what your program is intended to do, or at least tell it what it should not do. Program assertions are one of the ways of conveying this essential information. The verifier then certifies that the program is correct according to these declared intentions. The certification is based on a machine-generated and machine-checked mathematical proof, which is as certain as mortal man is allowed to be. If there is an error, the verifier generates test cases that reveal it as simply as possible. I do not think either of my previous dreams could be realised without this one.
The extra bonus of the verifier is that it can be applied right from the beginning of a software project, to analyse the consistency of requirements, to detect feature interactions, and to ensure the correctness of a software architecture, as well as checking the ultimate code.
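To make the idea of declared intentions and simple counterexamples concrete, here is a minimal sketch. It is not a verifier: it merely enumerates small inputs, which is testing rather than proof, whereas the verifier of the dream would establish the property for all inputs. The function names (`buggy_max3`, `meets_spec`, `find_counterexample`) are invented for this illustration.

```python
from itertools import product
from typing import Callable, Optional, Tuple

Triple = Tuple[int, int, int]


def buggy_max3(a: int, b: int, c: int) -> int:
    """Intended to return the largest of three integers; contains a deliberate mistake."""
    if a >= b:
        return a  # bug: a is never compared with c
    return b if b >= c else c


def meets_spec(args: Triple, result: int) -> bool:
    """Declared intention: the result is one of the inputs and no input exceeds it."""
    return result in args and all(result >= x for x in args)


def find_counterexample(
    program: Callable[[int, int, int], int],
    spec: Callable[[Triple, int], bool],
    bound: int,
) -> Optional[Triple]:
    """Try every triple with components in [0, bound]; return the first violation, if any."""
    for args in product(range(bound + 1), repeat=3):
        if not spec(args, program(*args)):
            return args
    return None


if __name__ == "__main__":
    print(find_counterexample(buggy_max3, meets_spec, bound=2))
```

Because the inputs are enumerated from the smallest upwards, the first violation found is also a small one, which is the property one would want from a counterexample meant to help a programmer locate the mistake.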
Slide 14. The sordid reality: 3. Computers can’t understand the real world. It’s too hard to tell them what we want. They’re bad at proof, and worse at counter-examples. …but still we dream…
Let us again contrast the sordid reality. Computers have almost no understanding of the real world environment in which they operate. They have even less understanding of the people who populate their environment. It is therefore very laborious to specify what we want a program to do – sometimes it is just as difficult as writing the program itself, and just as prone to error. And the rewards of doing so are at present slight to non-existent. Computers are extraordinarily bad at mathematical proof, and the theorems required for program correctness, though often not deep, are extraordinarily large. And if the program is in fact incorrect, it is still difficult to generate an automatic counter-example that is simple enough to help in removal of the error.
All this is true, but still we dream. And as scientists, it is our duty to do so.
Slide 15. Impossible dreams of science. Physics: accuracy of measurement.
Many of the amazing scientific advances, on which our modern technological society depends, have been originally motivated by impossible dreams. For hundreds of years, physicists have been pursuing the dream of accurate measurement of physical properties – speed, temperature, distance, etc. If they already know how to measure things to an accuracy of 99.9 percent, then just for the sake of science, the scientist wants to increase this accuracy to 99.99 percent. Or to ninety nine point nine nine … nine.
Why do they always seek the extra nine? For the same reason as a sportsman seeks to break a record, not just by the minimum one hundredth of a second but by a tenth of a second or even by a whole second. This is because they want their record to remain unsurpassed for many years to come. Similarly, scientists want their own publication to be cited again and again for many years, and not to be quickly superseded by a superior achievement of another scientist.
Slide 16. Impossible dreams of science. Physics: accuracy of measurement. Chemistry: purity of materials.
Similarly, chemists pursue the goal of purity of the chemical ingredients used in their laboratories. They aim at levels of purity far beyond the current needs of the market place, because one day the market evolves to exploit their discoveries. Now there are whole industries whose existence is due to the success of their research, including our own IT industry. Modern computer chips are made in vast factories that achieve a purity of air filtration that was not long ago just an impossible dream, even for the chemical laboratory.
Slide 17. Impossible dreams of science. Physics: accurate measurement. Chemistry: purity of materials. Biology: rational drug design.
Perhaps the closest analogy with my dreams for zero defect programming is provided by the molecular biologists, particularly the human geneticists. They have long been pursuing the dream of rational drug design, which enables the design of a medical treatment to be targeted directly at particular pathogens and even tailored to the genetic make-up of particular patients. Their impossible dream includes that of drugs that have no more side-effects. And most impossible of all, they want this to be done by pure application of pure science, without the massive and expensive testing regimes that are required by law today for use of drugs on human beings.
Slide 18. A Grand Challenge. The human genome project (1991-2003): planned 15 years ahead, involving worldwide collaboration, dedicated to open publication of results and radical improvement of tools, to answer fundamental questions of Nature’s blueprint for the human being.
The impossible dream of biologists was the inspiration for an extraordinarily ambitious Grand Challenge project to transcribe the entire Human Genome, around a gigabyte of it. The project started in 1991 with plans stretching ahead for fifteen years. It involved explicit collaboration of laboratories of at least seven officially participating countries. During the course of the project, the experimental tools made such great advances that it was completed two years ahead of schedule.
This was acknowledged to be a purely scientific project, following scientific ideals of immediate and open publication of all results. And it was driven by pure scientific curiosity, to answer fundamental questions of biology, indeed, to discover Nature’s blueprint for every single human being on this planet. The project made no practical promises to deliver profitable drugs or to cure a single human ill. It is only after the result has been obtained that the drug industries are beginning to exploit it in the interests of human health, and their own commercial advantage.
Slide 19. Impossible dreams of science. Physics: accuracy of measurement. Chemistry: purity of materials. Biology: rational drug design. Computer Science: zero defect programs.
Following the example of Physics and Chemistry and Biology, why shouldn’t Computer Science pursue its own impossible dreams, and why should not one of these dreams be that of zero defect programming? Perhaps we Computer Scientists need our own Grand Challenge project to help us to realise our dream?
Slide 20. Verified Software: Theories, Tools, Experiments. IFIP Working Conference, Zurich, October 10 – 13, 2005. A hundred leading researchers from around the world discussed a possible Grand Challenge. Follow-up meetings: US, China, EC, ... Microsoft Research a leading participant.
That was the question discussed by a Conference sponsored by the International Federation for Information Processing, held in Zurich, Switzerland in October 2005. The title was Verified Software: Theories, Tools and Experiments, to emphasise a commitment to the normal goals and methods of scientific research. We gathered a hundred leading researchers from around the world, covering a range of specialist areas of Computer Science, all of which can contribute towards a Grand Challenge project to achieve verified software. Since then, there have been follow-up meetings in the US, Asia, and the European Community. Researchers from the Microsoft Laboratories are playing an important role in planning and initiating the project.
Slide 21. A glimmer of hope. Programs have already been verified: for a control system for Paris Metro; Mondex cash-card; programs simulating hardware designs; Sizewell B nuclear power station; ... Praxis Ltd. guarantees their software.
The project starts with a glimmer of hope. This slide shows a number of examples in which verification technology has already delivered benefit in the achievement of zero defect programs. They provide promising evidence of the feasibility and the desirability of program proofs, even if they have to be conducted with significant manual assistance. An early example was a control program for the Paris Metro, developed with the aid of the B refinement tool. There was a manual proof of the software of the Mondex cash card, ensuring that it could not be used to forge money. The Bank of England had to believe the proof. After a notoriously expensive error in a floating point unit, hardware simulation programs are often proved correct, as a protection against bugs in delivered computer chips. The software controlling the UK Sizewell B nuclear power station was exhaustively checked by hand, with machine assistance. And the UK software company Praxis routinely offers a conventional guarantee for their delivered software, that they will without charge correct errors detected in software after delivery, and there are usually still one or two errors to correct.
Slide 22. But: proofs are often manual; programs have been limited in size and do not evolve. A Grand Challenge must solve these problems.
But of course, these achievements are on a relatively small scale. The actual proofs have required a lot of manual assistance; and manual proofs are only possible for programs of limited size. Proofs are fragile, and once code has been proved, people are naturally reluctant to change the program after delivery. These limitations must be overcome if the Grand Challenge project is to succeed in the pursuit of its ideals.
Slide 23. Progress at Microsoft. Programmer Productivity tools: driven by immediate need, exploiting results of earlier pure research, to find obscure bugs before delivery of software.
Up to this point in my talk, I have concentrated on scientific research, motivated by scientific ideals. In the remainder of my talk, I wish to concentrate on a far more practical approach to the achievement of zero defect software. It is the engineering approach adopted by our own Company. Engineers are driven not just by dreams, but rather by the immediate and pressing needs of their clients. The responsibility of engineering research is to exploit the results of earlier scientific research, but not to advance the science itself. In the case of Microsoft, the clients of the research are many thousands of Microsoft developers and testers. And their most pressing need is to discover as many as possible of the obscure errors that afflict their programs, before they are discovered by our customers.
Slide 24. Progress at Microsoft. Programmer Productivity tools: driven by immediate need, exploiting results of earlier pure research, to find obscure bugs before delivery of software. Four steps.
In the remainder of this lecture, I would like to summarise for you some of the ways in which Programmer Productivity Tools developed initially by Microsoft Research are delivering present benefit to Microsoft, a benefit which is increasing as a result of improvements made in the light of practical experience. In the last five years, we have taken four major steps, each involving installation of a new tool. These tools are exploiting the same mathematical theories and proof techniques that are needed for a program verifier. I want you to share in my hope that the natural evolution of these tools will approximate closer and closer to the ideal of the program verifier.
Slide 25. First step. Program analysers like PREfix, PREfast detect obscure bugs, reduce the cost of testing. They evolve by reducing false positives and false negatives.
The first major advance made by Microsoft was the introduction of automatic program analysers. Routine use of tools like PREfix and PREfast has detected many thousands of errors, which would have been very expensive to find by any strategy of testing, however systematic. These tools are subject to continuing improvement in the light of experience of their use. They are finding more errors more quickly, and their proportion of false alarms is being reduced.
Slide 26. First step. Program analysers like PREfix, PREfast detect obscure bugs, reduce the cost of testing ... and they are improving. But removing bugs is also error prone.
There is still a problem with program analysis in the current state of the art. The removal of errors often runs the risk of introducing more errors. This will get worse, because the tools will improve their power to find even more obscure errors. Often it seems better to leave an error in the code, if it is unlikely to lead to much harm. But it can be just as difficult to predict the harm as it is to correct the error. And the consequences of an incorrect prediction that a warning can be ignored can be severe – a world-wide infestation of malware.
Slide 27. First step. Program analysers like PREfix, PREfast detect obscure bugs, reduce the cost of testing ... and they are improving. But removing bugs is also error prone. Analysis favours malware attackers.
The second problem is that advances in program analysis actually give more advantage to the writers of viruses, worms, bots and other malware, because the analysers can be perverted to detect potential vulnerabilities in delivered code. The problem is that just a single vulnerability can be exploited by an attacker, whereas the developer of the code has to protect against all of them.
Slide 28. The next step. Program analysers like ESP certify absence of some generic errors, like buffer overflow, with the certainty of mathematical proof.
To meet the problem posed by the threat of malware, the next step is to increase the power of the program analyser till it can guarantee the complete absence of some generic kind of vulnerability, such as buffer overflow. For that particular kind of error there are guaranteed to be no false negatives – if the program passes the check, it is in fact free of that kind of error. The guarantee is based on mathematical calculation and proof. That is the achievement of the Microsoft tool ESP, which has been used extensively and effectively by the developers of Vista.
Slide 29. The next step. Program analysers like ESP certify absence of some generic errors, like buffer overflow, with the certainty of mathematical proof; proof is automatic in 96% of cases.
The performance of this tool is world-leading. It can provide an automatic check of the impossibility of buffer overflow in ninety six percent of all cases in Windows code. That is the good news. But that is also the bad news. The remaining four percent of the cases is still an awful lot to check by human intuition.
Slide 30. The next step. Program analysers like ESP certify absence of some generic errors, like buffer overflow, with the certainty of mathematical proof; proof is automatic in 96% of cases (improving to 99% or 99.9% or ...).
We really need to improve this figure to ninety nine percent – or ninety nine point nine ... nine ... nine percent. For this we need to harness the idealism of scientists, pursuing an impossible dream.
Slide 31. The next step. Program analysers like ESP certify absence of specific kinds of error, like buffer overflow, with the certainty of mathematical proof; proof is automatic in 96% of cases; programmer annotation is required.
A second disadvantage is that use of this tool requires the programmer to provide a certain amount of extra annotation to the source program. Specialised notations have been designed to specify how knowledge of the length of each buffer is transmitted across every API. Fortunately, further advance in the technology of automatic program analysis can help with this aspect of the problem.
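The idea of an annotation that carries knowledge of a buffer's length across an API can be sketched as follows. This is only an illustration under stated assumptions: the talk does not show the specialised notations, so the `SizedBuffer` type and `copy_into` function are hypothetical, and the contract is enforced here by runtime assertions, whereas an analyser of the kind described would aim to discharge the same conditions statically, before the program runs.

```python
from dataclasses import dataclass


@dataclass
class SizedBuffer:
    """A buffer that travels together with its declared capacity (hypothetical type)."""
    data: bytearray
    capacity: int

    def __post_init__(self) -> None:
        # The annotation must agree with the storage it describes.
        assert len(self.data) == self.capacity, "declared capacity must match storage"


def copy_into(dst: SizedBuffer, src: bytes) -> int:
    """Copy src into the start of dst and return the number of bytes written.

    Contract (the 'annotation'): requires len(src) <= dst.capacity;
    ensures the storage never grows past the declared capacity.
    """
    assert len(src) <= dst.capacity, "precondition violated: write would overflow dst"
    dst.data[: len(src)] = src
    assert len(dst.data) == dst.capacity  # postcondition: size unchanged
    return len(src)
```

A caller that cannot show `len(src) <= dst.capacity` is exactly the kind of case that would be left for a human to check.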
Slide 32. Automatic annotation. Program analysers like SLAM use abstract symbolic interpretation to discover plausible annotations and then check them by proof. Counter-example driven predicate abstraction.
That is the achievement of another kind of program analyser, known as an abstract interpreter. A good example is the SLAM tool developed by Microsoft Research. It has attracted favourable attention from academic researchers, and has been incorporated in the Static Driver Verifier to detect errors that may lead to a crash of Windows. The errors include the violation of the calling conventions for the kernel API. SLAM uses symbolic interpretation to discover plausible assertions to annotate the relevant parts of driver code, and it then uses mathematical proof technology to check the assertions, and strengthen them if necessary, until they are strong enough to prove absence of the specified defect. The technical name of this process is counter-example-driven predicate abstraction, and researchers in Microsoft have made an enormous contribution to this field.
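A calling convention of the kind mentioned here can be written down as a small finite-state rule. The sketch below checks one concrete sequence of calls against such a rule; it is not predicate abstraction and not SLAM's algorithm, which according to the talk reasons symbolically about all runs of the driver. The rule, the event names and the function `first_violation` are assumptions made for illustration.

```python
from typing import Iterable, Optional

# Hypothetical rule: a lock must not be acquired twice or released while free,
# and must not still be held when the routine returns.
TRANSITIONS = {
    ("free", "acquire"): "held",
    ("held", "release"): "free",
}


def first_violation(trace: Iterable[str]) -> Optional[str]:
    """Run the rule's state machine over a trace; describe the first violation, or None."""
    state = "free"
    for step, event in enumerate(trace):
        next_state = TRANSITIONS.get((state, event))
        if next_state is None:
            return f"step {step}: '{event}' while lock is {state}"
        state = next_state
    if state == "held":
        return "end of trace: lock still held"
    return None


if __name__ == "__main__":
    print(first_violation(["acquire", "release", "release"]))
```

Checking a single trace like this is ordinary testing; the step described in the talk is to establish that no possible path through the driver code can produce a violating trace.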
Slide 33. Automatic annotation. Program analysers like SLAM use abstract symbolic interpretation to discover plausible annotations and then check them by proof. Specialised to one application area: device drivers.
The implementation of SLAM exploits the particular features of its application area in driver verification, and it does not easily scale out to wider use. Fortunately, the general technology of abstract interpretation that it has incorporated is being developed for more general use, both by Microsoft researchers and by academics.
Slide 34. A prototype program verifier. The most advanced program analysers, like Spec# in Microsoft Research, certify absence of any kind of error for any kind of application. It is a prototype program verifier for C#.
The most advanced program analysers combine and generalise the achievements that I have described. A good example is the Spec# system, which is under development by Wolfram Schulte and his team in the Research Division of Microsoft. It works on programs that are written in C#, without limitation to any particular area of application. Its purpose is to certify that all the assertions included in the program will be true on all possible runs of the program. So it is capable of detecting any kind of defect, provided that its occurrence can be signalled by violation of an assertion. A program analyser with these ambitions may be called a program verifier, because its aim is to produce no false negatives, and no false positives either. When a program has passed the check of a program verifier, it is known to have zero defects, and the guarantee is backed with the confidence of machine-checked mathematical proof.
Slide 35. The long-term goal. Certify the absence of any kind of error, for any kind of application, for any programming language, with the certainty of mathematical proof.
It is my hope that the practical needs and practical experience of Microsoft will lead to the gradual evolution of a comprehensive general-purpose program verifier. Here is a summary of the properties of such a verifier. The gaps in the slide symbolise the gaps that remain to be filled. And I believe they can only be filled by a combination of engineering progress and scientific breakthroughs contributed by a Grand Challenge project, conducted by our colleagues in the academic world.
Slide 36. Filling the gaps. Certify the absence of any kind of error that can be specified by assertions/contracts, for any kind of application, for any programming language, with the certainty of mathematical proof.
The first gap is in our capability to specify what errors we want to avoid, for example by means of assertions. It is only specified errors that a program verifier can guarantee against. It cannot protect against errors that are entirely unpredictable or unpredicted. So the languages in which we specify our programs need to be extended in power and usability.
Slide 37. Filling the gaps. Certify the absence of any kind of error that can be specified by assertions/contracts, for any kind of application which is well enough understood, for any programming language, with the certainty of mathematical proof.
Secondly, we need libraries of generic re-usable specifications relevant to all specific areas of significant application. These will provide a standard terminology and a design trajectory and a framework of reasoning to guide applications programmers away from their habit of starting every new project from scratch. Furthermore, the specifications can be understood by the tools, to enable designs to be subjected to mathematical analysis of correctness.
Slide 38. Filling the gaps. Certify the absence of any kind of error that can be specified by assertions/contracts, for any kind of application which is well enough understood, for any programming language whose mathematics is fully understood, with the certainty of mathematical proof.
Thirdly, the programming language in all its complexity must be fully understood, so that the methods used in program verification are logically sound, and as complete as possible. An error in the verifier itself could have incalculable consequences. A common framework of theory is required for current languages in wide-spread use.
Slide 39. Filling the gaps. Certify the absence of any kind of error that can be specified by assertions/contracts, for any kind of application which is well enough understood, for any programming language whose mathematics is fully understood, with the certainty of mathematical proof in a theory covered by an automatic prover.
Fourthly, the theorem proving capability of the verifying compiler must be powerful enough to prove the necessary theorems. This will probably require an amalgamation of all the known technologies of logical proof and mathematical symbol manipulation.
Slide 40. The dream is possible! By combining the research of scientists who pursue long-term ideals with the work of engineers who pursue immediate advantage, to develop a program verifier, and realise the dream of zero defect programming.
The time has come to summarise the message of my talk. It is that my dream is possible after all. It can be achieved by a long-term collaboration. First, collaboration of scientists motivated by long-term ideals, as I described in the first part of my talk. And second, collaboration of engineers, pursuing the immediate advantage of their clients using early versions of a programmer productivity tool that evolves gradually towards the power of a program verifier.
Slide 41. The dream is possible! By combining the work of scientists who pursue long-term ideals with the work of engineers who pursue immediate advantage, to develop a program verifier, and realise the dream of zero defect programming, within the next fifty years.
Surely by combination of top-down scientific research with bottom-up practical development we can solve this problem within the next fifty years.
Slide 42. The dream is possible! By combining the work of scientists who pursue long-term ideals with the work of engineers who pursue immediate advantage, to develop a program verifier, and realise the dream of zero defect programming, within the next fifteen years.
Did I say fifty years? We can’t wait that long. We need the result within fifteen years, surely. When every year that we delay is costing a hundred billion dollars, surely we can get everybody to help. Researchers, developers and testers at Microsoft are currently leading the field. I hope that all of you in this audience will find a way some time soon of contributing to the endeavour, making experimental use of new tools that emerge, suggesting improvements to them, and above all, benefitting from progress of the research.