Wednesday, July 20, 2011
Coding errors - how many errors are on average in a lengthy code?
by
Hans von Storch
This problem is discussed in a thread on Stack Overflow.
One statement there reads:
The book by Steve McConnell (Code Complete, 2nd Edition. Redmond, WA: Microsoft Press, 2004. 960 pages. ISBN: 0735619670) has a brief section about error expectations. He basically says that the range of possibilities is as follows:
(a) Industry Average: "about 15 - 50 errors per 1000 lines of delivered code." He further says this is usually representative of code that has some level of structured programming behind it, but probably includes a mix of coding techniques.
(b) Microsoft Applications: "about 10 - 20 defects per 1000 lines of code during in-house testing, and 0.5 defect per KLOC (a KLOC is 1000 lines of code) in released product (Moore 1992)." He attributes this to a combination of code-reading techniques and independent testing (discussed further in another chapter of his book).
(c) "Harlan Mills pioneered 'cleanroom development', a technique that has been able to achieve rates as low as 3 defects per 1000 lines of code during in-house testing and 0.1 defect per 1000 lines of code in released product (Cobb and Mills 1990). A few projects - for example, the space-shuttle software - have achieved a level of 0 defects in 500,000 lines of code using a system of format development methods, peer reviews, and statistical testing."
Obviously, the references cited are rather old, but from browsing through different posts in that thread, I got the impression that a rate of 0.1 bugs per KLOC in released code would be a conservative estimate, i.e. likely an underestimate.
Are such estimates known for climate models and satellite retrieval products?
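Even without measured numbers for climate codes, the quoted densities give a feel for the orders of magnitude involved. Below is a minimal back-of-the-envelope sketch in Python; the one-million-line size assumed for a large climate model is a purely illustrative figure, not a number taken from the post or the cited references.

# Back-of-the-envelope illustration: expected residual defects for a code
# base of a given size at the defect densities quoted above.
# NOTE: the one-million-line figure assumed for a large climate model is
# purely illustrative; it is not taken from the post or the cited references.

DENSITIES_PER_KLOC = {
    "industry average, lower bound (McConnell)": 15.0,
    "Microsoft, released product (Moore 1992)": 0.5,
    "cleanroom, released product (Cobb and Mills 1990)": 0.1,
}

def expected_defects(lines_of_code: int, defects_per_kloc: float) -> float:
    """Expected residual defects = density (per KLOC) * size in KLOC."""
    return defects_per_kloc * lines_of_code / 1000.0

if __name__ == "__main__":
    assumed_model_size = 1_000_000  # assumed lines of code in a large climate model
    for label, density in DENSITIES_PER_KLOC.items():
        n = expected_defects(assumed_model_size, density)
        print(f"{label}: ~{n:,.0f} expected defects")

Even at the cleanroom rate of 0.1 defects per KLOC, such a million-line code would be expected to contain on the order of a hundred residual defects.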
4 comments:
Well, Steve Easterbrook comes to mind (University of Toronto, http://www.easterbrook.ca/steve/). His blog suggests that there has been a master's thesis by Jon Pipitone on the topic. Easterbrook blogged a preliminary number of 0.03 defects/KLoC for the Met Office Hadley Centre's Unified Model and compares that number to the 0.1 defects/KLoC for the space shuttle flight software.
The number of errors is not necessarily a particularly useful measure, as the nature of their consequences varies greatly. Some coding errors may lead to occasional failures to produce any results at all, while others lead to erroneous results.
When we are mainly interested in the correctness of the results that have been obtained (rather than in the certainty that every attempt gives results), it may be more effective, and in some cases sufficient, to verify the results in some external way over a wide enough range of inputs. That approach may leave many errors in the code but still provide enough certainty that the results are correct, or close enough to correct, for the actual use of the results. On the other hand, a total lack of coding errors would not be of much help if a model has been built on erroneous physics or mathematics (in the mathematics I include the validity of the numerical methods).
Hans, I'd note that the Stack Overflow discussion was closed as 'Not a real question', and I think that is an accurate summary.
I'd classify it as 'thinking like an accountant': just for starters, how will you ever get anyone to agree on exactly what constitutes an 'error'?
This issue became important in connection with Edward Teller's 'Star Wars', where, on receipt of a signal of an approaching Russian missile, a parabolic mirror would be launched into space and aimed so that the X-rays emitted by an exploding H-bomb would be focused on the travelling missile. One of the many technical issues was that the device involved some millions of lines of computer code and could not be tested in advance. One computer expert said that the maximum number of lines of untested code without a bug was about 9; hence the probability of correct functioning was very low. In spite of such unanimous expert criticism, the 'Star Wars' project had Reagan's support. Worse, at the Reykjavik conference with Gorbachev, negotiations broke down because the Russians insisted that the USA abandon the project. The interaction of fantasy with hardware has been particularly important in the modern period.