‘* commentary’ Category Archives


Reading Programs

by Tony in * commentary, The Psychology of Computer Programming

A young novelist of our time was recently asked who were his favorite authors. He responded simply that he never read novels, as his ideas were so new and superior to anyone else’s that reading would only be a waste of his time. As you might expect, his work did not support his thesis. Perhaps the same could be said for some of our radical young programmers. Perhaps there is something to be gained from reading other people’s programs—if only the amusement engendered by their bad examples. Perhaps if we want to understand how programmers program—to lift the veil of the programming mystique—we could fruitfully begin by seeing what is to be learned from the reading of programs.

— Gerald Weinberg, The Psychology of Computer Programming, Chapter 1

The meme that programming is a write-only skill is one that recurs from time to time. With relatively little software these days needing to be highly optimised, and developer time orders of magnitude more expensive than hardware, the dictum that software should be written primarily for humans to understand, and only secondarily for machines, is truer than ever. The context, and examples, in this book are rather quaint and almost humorous now (like noting that now that programmers actually work at a terminal they can just see what code does rather than having to read it offline).

But there is a great psychological point raised here too, one that I’ve never heard anyone talk about before and that still holds true today. Often in a software project, things are implemented in a clumsy, awkward, or otherwise non-standard manner. A lot of the time this is because the programmer was unaware of the better way to do things, or needed to get on to working on other things as soon as the code “worked”, without having time to tidy it up. But sometimes there was a good reason for the approach at the time it was written that is no longer true now (to work around a limitation in a library that has long since been fixed, for example). If it’s in a relatively stable area of the codebase, it can languish there in that form for many years, until one day, needing to make a change nearby, a junior programmer will come in, see what to them is a needlessly complex piece of code, and congratulate themselves for knowing more than the programmer who originally wrote this nonsense. It’s not unusual, however, for that other programmer to now be their boss, and for this to subtly contribute to building an unhealthy relationship there.


Computer Programming as a Human Activity

by Tony in * commentary, The Psychology of Computer Programming

This book has only one major purpose—to trigger the beginning of a new field of study: computer programming as a human activity, or in short, the psychology of computer programming. There are, by various estimates, hundreds of thousands of programmers working today. Each of them could be functioning more efficiently, with greater satisfaction, if he and his manager would only learn to look upon the programmer as a human being, rather than another one of the machines.

At the moment, programming—sophisticated as it may be from an engineering or mathematic point of view—is so crude psychologically that even the tiniest insights should help immeasurably.

— Gerald Weinberg, The Psychology of Computer Programming, Preface

This book was originally published in 1971. A decade ago there was a “Silver Anniversary Edition” published with additional commentary by the author from a perspective of 25 years on, but I don’t have that version; here I’m working from the original text.

There are very few computing-related books that are still relevant almost 40 years on. Hardware and software have changed beyond recognition multiple times, with the steady drive of Moore’s Law leading to computers that are a hundred million times more powerful. Many of today’s young programmers can barely conceive of a time before Google was the first port of call when something went wrong, never mind when computer time itself was so expensive that you had to do all your programming on paper, and, when you were ready, give it to an operator who would load it in for you and let you know the result! At the time this book was written, the debate was still raging over whether the newfangled approach of letting programmers actually work directly at the computer led them to grow lazy and adopt careless and inefficient work habits.

However, although the technical references in this book are often so dated they’re almost unintelligible, the observations on the human side of programming are still scarily accurate. Update the code snippets from FORTRAN to Ruby and the casual reader wouldn’t even realise for large chunks of the book that it wasn’t written last year. The reviews of the Silver Anniversary Edition seem almost to think this is a good thing, as it means the book is still relevant. To me it’s a hugely scary thing. How can we have learned so little in forty years? We should be looking back in horror at how programming was managed then, and the stories should sound as archaic as the technology.

In many ways, then, this book was a failure. The psychological aspects of programming are still poorly understood. Weinberg himself has developed many of the ideas much further in later works (such as the excellent Quality Software Management series), but these are even less widely read. But that doesn’t mean it should be abandoned. It’s every bit as important now as it was then, and the underlying promise quoted above still holds true: “even the tiniest insights should help immeasurably”.


Measuring Programming Quality and Productivity

by Tony in * commentary, * papers

In the field of computer programming, the lack of precise and unambiguous units of measure for quality and productivity has been a source of considerable concern to programming managers throughout the industry.

This paper, from the IBM Systems Journal in 1978, is one of the earliest by Capers Jones on Software Productivity, but it still seems that little has changed from this assessment in the last 25 years.

Jones discusses the problems with the two most common units of measurement used in IBM at the time: Lines of Code per Programmer Month, and Cost per Defect, showing that these measures can slow down the acceptance of new methods because the methods may – when measured – give the incorrect impression of being less effective than former techniques, even though the older approaches actually were more expensive.

In the last 25 years, Jones has devoted much of his time to studying productivity, and promoting a vast array of metrics, but in this paper he promotes one key coarse measure: “Cumulative defect removal efficiency”, which is the defects found before release divided by the defects found after release. Plotting this against “total defects per KLOC” then produces a useful graph of the “maintenance potential” of a program.

More interestingly, this seems to have been one of the first papers to note that productivity rates decline as the size of the program increases. Jones details that programs of less than 2 KLOC usually take about 1 programmer month/KLOC, whereas programs of over 512 KLOC take 10 programmer months/KLOC. Similarly, when it comes to maintenance, smaller changes imply larger unit costs, because it is necessary to understand the base program even to add or modify a single line of code, and “the overhead of the learning curve exerts an enormous leverage on small changes”.
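To make the scale of that effect concrete, here is a minimal Python sketch using only the two rates quoted above. The function name is hypothetical, and leaving the range between 2 and 512 KLOC undefined is my own choice: the intermediate rates aren't reproduced here, so the sketch refuses to guess at them.

```python
def effort_programmer_months(kloc):
    """Rough effort estimate from the two rates quoted above.

    Only the two endpoints are given in the text, so sizes in between
    raise an error rather than guessing at an interpolation.
    """
    if kloc < 2:
        rate = 1.0    # programmer months per KLOC for small programs
    elif kloc > 512:
        rate = 10.0   # programmer months per KLOC for very large programs
    else:
        raise ValueError("no rate quoted for programs between 2 and 512 KLOC")
    return kloc * rate


small = effort_programmer_months(1.5)   # a 1.5 KLOC program
large = effort_programmer_months(600)   # a 600 KLOC program
print(f"{small:g} vs {large:g} programmer months")
print(f"{600 / 1.5:g}x the code costs {large / small:g}x the effort")
```

With these illustrative sizes, 400 times as much code comes out at 4,000 times the effort, which is the tenfold rate difference applied on top of the size difference.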


Sackman Revisited

by Tony in * commentary, * papers

[Continued from Substantiating Programmer Variability]

The Dickey paper helpfully reproduces a table of the key data from the Sackman experiment. (I haven’t been able to find the original version of the Sackman paper yet, so I haven’t been able to verify the data, but I’ll assume for now that Proceedings of the IEEE verified it when they published Dickey’s paper!)

I’ve eliminated the developers who produced their solutions in machine code, the one developer who had no prior experience of time-sharing, and the developer whose first experience of JTS was this test, leaving a result sample of 7 developers. I’ve also combined the time taken to code the solution with the time taken to debug it. The average debug time for the on-line vs. the off-line group for the more difficult test (Algebra) was 29 hours vs 28 hours, so I’m choosing not to further subdivide according to platform.

The results are quite illuminating:

Algebra
Developer Hours
1 26
11 30
8 50
12 54
5 81
10 87
4 115

Maze
Developer Hours
8 4
11 5.5
12 11
5 13
10 18
1 24.5
4 32

In each case the distance between the worst time and the median is approx 2:1. From the median to the best is just over 2:1 for Algebra, and just over 3:1 for Maze: the “superprogrammers” don’t seem that much better any more.

Even more notable is the performance of Programmer 1. Although he is the fastest at solving the Algebra task, he is one of the worst at the Maze task (this was due to a much higher time spent in development of the Maze solution than all the other programmers, so the issue of on-line vs off-line debugging seems not to be relevant here either).

When we take the total time spent on the two tasks combined the picture is even more surprising:

Developer Hours
11 35.5
1 50.5
8 54
12 65
5 94
10 105
4 147

Now we have a factor of 2.25:1 from worst to median, and only 1.8:1 from median to best.
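For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes these ratios from the figures above. It takes the first table as Algebra and the second as Maze, which is what the surrounding text implies; the function and variable names are my own.

```python
from statistics import median

# Hours per developer (coding + debugging), copied from the tables above.
algebra = {1: 26, 11: 30, 8: 50, 12: 54, 5: 81, 10: 87, 4: 115}
maze = {8: 4, 11: 5.5, 12: 11, 5: 13, 10: 18, 1: 24.5, 4: 32}
combined = {dev: algebra[dev] + maze[dev] for dev in algebra}


def spread(hours):
    """Return (worst/median, median/best) for a dict of developer -> hours."""
    times = list(hours.values())
    mid = median(times)
    return max(times) / mid, mid / min(times)


for name, hours in [("Algebra", algebra), ("Maze", maze), ("Combined", combined)]:
    worst_to_median, median_to_best = spread(hours)
    print(f"{name}: worst:median {worst_to_median:.2f}, "
          f"median:best {median_to_best:.2f}")
```

The median is used as the reference point because, with only seven developers, a single outlier at either end would otherwise dominate a plain best-to-worst ratio.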

In case all these numbers have made your eyes glaze over, I’ll restate it: this is the test that is often cited as showing a productivity variance of 28:1!


Substantiating Programmer Variability

by Tony in * commentary, * papers

[Continued from Programmer Variability]

In the same issue as the Dickey paper there was another small follow-up article by Bill Curtis attempting to put forward other data in support of the high degree of variability, in light of the problems with the data from Sackman.

The approach this time was simpler, although still aimed at debugging: 27 programmers were given a modular-sized Fortran program with a simple bug, and the time taken to find it was measured. (There were actually two such experiments, but the first was deemed too difficult). The times taken were then grouped and tabulated:

Mins to Find # of People
1-5 5
6-10 10
11-15 4
16-20 3
21-25 1
26-30 0
31-35 0
36-40 1

(one programmer could not find the bug at all, giving up after 67 minutes)

Although there is again a factor of 20+:1 between best and worst, Curtis points out this relies on having both a brilliant programmer and a horrid one in the same sample, and that this is thus not a particularly sustainable measure of performance variability among programmers.

In addition he points out that:

Substantial variation in programmer performance can be attributed to individual differences in experience, motivation, intelligence etc. Thus, important productivity gains could be realized through improved programmer selection, development, and training techniques. These gains would be achieved through eliminating the skewed tails often observed in distributions of programmer performance data.

As with the original Sackman conclusion, the emphasis here is on removing the weaker programmers (although potentially by training, rather than not hiring – Curtis points out that the programmer who failed to find the bug at all substantially improved in later trials when he had gained more programming experience), not on attempting to find the brilliant ones.

[Continued in Sackman Revisited]


Programmer Variability

by Tony in * commentary, * papers

[Continued from: Exploratory Experimental Studies Comparing Online and Offline Programming Performance]

In July 1981, thirteen and a half years after the Sackman paper, Proceedings of the IEEE published a little-known response from Thomas Dickey.

In it, he points out that the now oft-quoted 28:1 productivity difference is an inaccurate reading of the data. The CACM article usually referenced was only a summary of the full paper, excluding the actual data, so Dickey returned to the original sources and discovered that the ratios cited are misleading, as they do not differentiate between the impact of:

  • programmers on the time-sharing system versus those on the batch system
  • those who programmed in JTS, an ALGOL variant (one of whom learnt the language in order to do the experiment), and those who programmed in machine code
  • the programmers who had poor, or in some cases no, knowledge of the time-sharing system.

In this case, the 28:1 figure (which, we should remember, only applies to debugging time), was the difference between the 6 hours taken by one programmer to debug his JTS solution on a time-share platform, versus 170 hours taken to debug a machine-language solution in a batch environment!

After accounting for the differences in the classes, only a range of 5:1 can be attributed to programmer variability. The casual researcher, in encountering Sackman’s paper, seizes on the 28:1 figure primarily to support arguments to the effect that programmer variability is “orders of magnitude” larger than effects due to language or system differences.

Dickey also goes on to show how the figure made it into common use: The CACM paper was cited at the NATO Conference on Software Engineering, 1968, which in turn provoked an article in Infosystems: “The Mongolian Hordes Versus Superprogrammer” (J L Ogdin, December 1973), bringing the number to the wider industry, to be picked up and used by Yourdon, Boehm, Brooks, Constantine, Weinberg et al, often mutating in the process.

Strangely neither this paper nor its conclusions seem to have made much of an impact on the popular view.

[Continued in Substantiating Programmer Variability]


Exploratory Experimental Studies Comparing Online and Offline Programming Performance

by Tony in * commentary, * papers

I spent a very productive morning at the library in search of the original articles on “order of magnitude productivity differences”.

The original paper that everyone seems to point back to, either directly, or by pointing to other references that in turn point to this one, recursively, is this article by Sackman, Erikson, and Grant in CACM 1968 (two issues before Dijkstra’s famous “Go To statement considered harmful”). This claims to be a report on one of the “first known studies that measure programmers’ performance under controlled conditions for standard tasks”, conducted by the authors at DARPA.

The background of the research was to investigate experimentally the differences in productivity between time-shared computing systems and batch-processed ones. Time-sharing was becoming more and more popular, and there was much spirited controversy on both sides of the debate. The proponents of time-sharing claimed that the productivity benefits easily outweighed the associated costs of moving to such a system. Detractors claimed that “programmers grow lazy and adopt careless and inefficient work habits under time sharing”, leading to a performance decrease.

The issue of whether to move to time-shared systems was fast becoming one of the most significant choices facing managers of computing systems, but little scientific research had been done.

DARPA therefore carried out two studies of on-line versus off-line debugging performance, one with experienced programmers (averaging 7 years’ experience), and the other with trainees.

The experienced programmers were divided into two sub-groups and each given two tasks, one group working (individually, not as a team) on task A on the time-sharing system, and task B off-line, the other group the reverse of this. The tasks were moderately difficult: finding the one and only path through a 20×20 cell maze given as a 400 item table, with each cell containing the directions in which movement is possible from that cell, and interpreting inputted algebraic equations and computing the value of the single dependent variable. However, in the Algebra problem, the developers were referred to a published source for suggested workable logic to solve the problem.

The study was mostly interested in debugging time, which was considered to begin when the programmer had successfully compiled a program with no serious compiler errors, and ended when tests were shown to be successfully run. (The underlying assumption being that the time to actually write the code would be unchanged whether an on-line or off-line environment was being used, whereas the approach to finding and fixing bugs would differ significantly.)

The results showed that, for the experienced programmers, using an on-line environment dropped the mean debug time for the Algebra problem from 50.2 hours to 34.5 hours, and for the Maze program from 12.3 hours to 4.0 hours, thus validating the idea that the productivity gains from developing on a time-sharing platform would indeed probably outweigh the costs of setting up such an environment.
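As a quick sanity check on the size of that effect, this small Python sketch turns the four mean debug times quoted above into percentage reductions. The names are my own, and it uses nothing beyond those four means.

```python
# Mean debug hours for the experienced group, as quoted above: (off-line, on-line).
mean_debug_hours = {"Algebra": (50.2, 34.5), "Maze": (12.3, 4.0)}


def reduction(offline, online):
    """Fractional reduction in mean debug time when moving on-line."""
    return (offline - online) / offline


for task, (offline, online) in mean_debug_hours.items():
    print(f"{task}: {offline} -> {online} hours, "
          f"{reduction(offline, online):.0%} less")
```

That works out at roughly a third less debugging time for Algebra and about two thirds less for Maze.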

Almost in passing however, the researchers also discovered another interesting fact: that the difference between the best, and the worst, developer, on any given metric, was much higher than expected:

Measure Algebra Maze
Coding Time 16:1 25:1
Debug Time 28:1 26:1
Size of Program 6:1 5:1
CPU Time 8:1 11:1
Run Time 5:1 13:1

To paraphrase a nursery rhyme:

When a programmer is good
He is very, very good,
But when he is bad,
He is horrid

The authors thus concluded that “validated techniques to detect and weed out these poor performers could result in vast savings in time, effort and cost”. Interestingly they do not seem to consider any benefit from attempting to detect the best performers – only the worst!

[Continued in: Programmer Variability]


Better Productivity through Avoidance

by Tony in * commentary, * papers

I also found an interesting recent article by Barry Boehm on “Managing Software Productivity and Reuse” [pdf], that details the results of an extensive analysis conducted with the DOD to discover savings over a business-as-normal approach.

In this study, they discovered that you could achieve an 8% improvement through “working harder”, a 17% improvement through “working smarter”, and a 47% improvement through “work avoidance”.

Better still, all three are mostly complementary, and the gains can be accumulated by avoiding whatever work is possible and working smarter and harder on the rest.

He also provides a useful graph of how productivity has risen over the last 40 years, through the use of assembler, high-level languages, databases, regression testing, prototyping, 4GLs, sub-second time sharing, small-scale reuse, OOP and large scale reuse, providing an order-of-magnitude increase in productivity, on the general scale, every 20 years.
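To put “an order of magnitude every 20 years” into more familiar terms, here is a short Python sketch converting it to an equivalent annual rate. It assumes the gain compounds steadily from year to year, which is my simplification and not something the graph claims.

```python
# Assumption: productivity compounds at a steady annual rate.
factor_per_period = 10   # one order of magnitude...
period_years = 20        # ...every 20 years

annual_growth = factor_per_period ** (1 / period_years) - 1
forty_year_factor = (1 + annual_growth) ** 40

print(f"implied annual growth: {annual_growth:.1%}")
print(f"gain over 40 years: {forty_year_factor:.0f}x")
```

On that assumption the trend is equivalent to a little over 12% a year, or a hundredfold gain across the 40 years covered.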

He also argues that with “stronger technical foundations for software architectures and component composition; more change-adaptive components, connectors, and architectures; creating more effective reuse incentive structures; domain engineering and business case analysis techniques; better techniques for dealing with COTS integration; and creating appropriate mechanisms for dealing with liability issues, intellectual property rights, and software artifacts as capital assets” even greater gains can be achieved.


Assessing the Impact of Reuse on Software Quality and Productivity

by Tony in * commentary, * papers

I’ve been trying to find more information on the “factor of 10” productivity differences, between either teams or individuals, frequently cited, but most of the primary articles don’t seem to be available on-line. A trip to the library is probably in order again early next week.

I did come across this study from 1995, however, which set out to measure the impact of reuse in OO systems. A graduate class was divided into teams, each of which was set the same programming task: to develop a system for a video rental store. Generic and domain specific libraries were made available for reuse, but they were free to choose whether or not to use these.

So far so good. Where the study seems to go bizarre, however, is in how productivity was actually measured: a team’s productivity was taken as “lines of code delivered” divided by “hours spent on analyzing, designing, implementing and repairing the system”. The authors point out that other measures than LOC “could have been used, but this one fulfilled our requirements and could be collected easily. More importantly, we are looking at the relative size of systems addressing similar requirements and, therefore, of similar functionality.”

This seems most bizarre. If the systems are all the same, why does the number of lines of code produced matter in the slightest? Surely all that matters is the time taken to produce the system – especially as this is meant to be testing reuse. If one team could reuse a sufficient quantity of code to enable them to write the system in 10,000 lines, taking 100 hours (productivity = 100), but another team wrote an entire 250,000 LOC system from scratch taking 1250 hours (productivity = 200), is the second team really twice as productive?

But the paper seems even stranger than that. It counts the reused code within the total LOC for the team, thus distorting the productivity of a team who pull in a 10,000 line library that provides more functionality than they actually need (compare a team writing a 1,000 line subset of this in 40 hours with another team who only need to write 10 extra lines to use this library, but who then have an extra 10,000 lines in their final total).

Using this methodology, the paper manages to show a productivity difference of 8.74 between the top and bottom teams, with a factor of 4.8 in LOC submitted. However, if you work back from their figures to calculate the actual lines of code written by the team (as opposed to being in their completed system), there only ends up being a factor of about 1.8 in the LOC (8,845 for team 4 against 4,900 for team 5), and 2.6 in the time taken: still significant, but hardly as impressive:

Project  LOC Delivered  Reused  Productivity  Reuse rate  LOC Written  Time Spent
1        24698          16776   159.34        67.92%      7922         155
2        5105           113     18.23         2.21%       4992         280
3        11687          3061    32.01         26.19%      8626         365
4        10390          1545    34.3          14.87%      8845         303
5        8173           3273    51.4          40.05%      4900         159
6        8216           3099    31.12         37.72%      5117         264
7        9736           4206    69.54         43.20%      5530         140
8        5255           0       19.9          0.00%       5255         264

(italicised columns extrapolated from published results).
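The working-back is simple enough to show as a minimal Python sketch. It assumes, as the paper's definition quoted earlier suggests, that productivity is LOC delivered per hour, so hours can be recovered by dividing one by the other. The variable and function names are my own.

```python
# (LOC delivered, LOC reused, published productivity in LOC/hour) per project,
# copied from the table above.
projects = {
    1: (24698, 16776, 159.34),
    2: (5105, 113, 18.23),
    3: (11687, 3061, 32.01),
    4: (10390, 1545, 34.3),
    5: (8173, 3273, 51.4),
    6: (8216, 3099, 31.12),
    7: (9736, 4206, 69.54),
    8: (5255, 0, 19.9),
}

# Derived columns: LOC actually written by the team, and hours spent.
written = {p: delivered - reused for p, (delivered, reused, _) in projects.items()}
hours = {p: delivered / prod for p, (delivered, _, prod) in projects.items()}


def ratio(values):
    """Largest value divided by smallest value."""
    values = list(values)
    return max(values) / min(values)


print(f"productivity: {ratio(prod for _, _, prod in projects.values()):.2f}")
print(f"LOC delivered: {ratio(d for d, _, _ in projects.values()):.2f}")
print(f"LOC written: {ratio(written.values()):.2f}")
print(f"time spent: {ratio(hours.values()):.2f}")
```

Each ratio is simply the largest value over the smallest across the eight teams, so the teams at the two extremes can differ from one measure to the next.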

It’s also notable that the fastest/slowest teams in question are entirely different with each approach, and the “outlying” teams which deserve special explanation in the paper fare considerably differently.

The low productivity of Team 6 in the original paper, “considering its reuse rate”, is explained in terms of the team providing a particularly sophisticated “gold-plated” GUI. By solely measuring time taken to do the task, however, this team is one of the fastest.

Gotta go find those other papers…


Boo Hoo

by Tony in * commentary, * one-offs

I was starting to wonder how we had ever believed we were only weeks away from launch. It was a mass delusion. We either hadn’t seen, or had simply closed our eyes to, all the warning signs. Instead of focussing singlemindedly on just getting the website up and running, I had tried to implement an immensely complex and ambitious vision in its entirety. Our online magazine, the rollout of overseas offices, the development of new product lines to sell on our site – these were all things that could have waited until the site was in operation. But I had wanted to build utopia instantly. It had taken eleven Apollo missions to land on the moon; I had wanted to do it all in one.

— Ernst Malmsten, boo hoo

This is a scary book. Malmsten retells the story of how boo spent $135m over 18 months, to achieve total sales of less than $1.5m, and never really seems to understand just how badly they went wrong. He actually seems to believe they achieved something important, or at least interesting. At the point of Boo’s collapse, we’d built BlackStar to a turnover of $1m per month, with a total operating spend (excluding marketing) of less than $2m in the two years we’d been trading. Our product development costs (i.e. the website, and all our fulfilment and customer service systems etc) had been less than $200k, whereas Boo had spent $250k solely on the feasibility study for theirs! By the time they were on the verge of collapse, even after significant cuts, Boo still needed $2m per week to survive. BlackStar, with 100 employees, and still growing fast, needed less than $100k per week.

Although Malmsten attempts to take responsibility for many of the shortcomings, it’s mostly in an “it was someone else’s fault, but I should really have sorted it out” way. Other than the quote above, he never really seems to realise that it wasn’t the execution that was flawed – it was the entire approach.

Malmsten even has the gall to finish the book with the final press release on Boo’s bankruptcy, which ended: “We believe very strongly that in boo.com there is a formula for a successful business and fervently hope that those who are now responsible for dealing with the company will be able to recognize this.”

Hopefully the readers of the book will be able to see what Malmsten can’t.