Exploratory Experimental Studies Comparing Online and Offline Programming Performance
I spent a very productive morning at the library in search of the original articles on “order of magnitude productivity differences”.
The original paper that everyone seems to point back to, either directly, or by pointing to other references that in turn point this one, recursively, is this article by Sackman, Erikson, and Grant in CACM 1968 (two issues before Dijkstra’s famous “Go To statement considered harmful”). This claims to be a report on one of the “first known studies that measure programmers” performance under controlled conditions for standard tasks, conducted by the authors at DARPA.
The background of the research was to investigate experimentally the differences in productivity between time-shared computing systems over batch-processed ones. Time-sharing was becoming more and more popular, and there was much spirited controversy on both sides of the debate. The proponents of time-sharing claimed that the productivity benefits easily outweighed the associated costs of moving to such a system. Detractors claimed that “programmers grow lazy and adopt careless and inefficient work habits under time sharing”, leading to a performance decrease.
The issue of whether to move to time-shared systems was fast becoming one of the most significant choices facing managers or computing systems, but little scientific research had been done.
DARPA therefore carried out two studies of on-line versus off-line debugging performance, one with experienced programmers (averaging 7 years experience), and the other with trainees.
The experienced programmers were divided into two sub-groups and each given two tasks, one group working (individually, not as a team) on task A on the time-sharing system, and task B off-line, the other group the reverse of this. The tasks were moderately difficult: finding the one and only path through a 20×20 cell maze given as a 400 item table, with each cell containing the directions in which movement is possible from that cell, and interpreting inputted algebraic equations and computing the value of the single dependent variable. However, in the Algebra problem, the developers were referred to a published source for suggested workable logic to solve the problem.
The study was mostly interested in debugging time, which was considered to begin when the programmer had successfully compiled a program with no serious compiler errors, and ended when tests were shown to be successfully run. (The underlying assumption being that the time to actually write the code would be unchanged whether an on-line or off-line environment was being used, whereas the approach to finding and fixing bugs would differ significantly.)
The results showed that, for the experienced programmers, using an on-line environment dropped the mean debug time for the Algebra problem from 50.2 hours to 34.5 hours, and for the Maze program from 12.3 hours to 4.0 hours, thus validating the idea that the productivity gains from developing on a time-sharing platform would indeed probably outweigh the costs of setting up such an environment.
Almost in passing however, the researchers also discovered another interesting fact: that the difference between the best, and the worst, developer, on any given metric, was much higher than expected:
|Size of Program||6:1||5:1|
To paraphrase a nursery rhyme:
When a programmer is good
He is very, very good,
But when he is bad,
He is horrid
The authors thus concluded that “validated techniques to detect and weed out these poor performers could result in vast savings in time, effort and cost”. Interestingly they do not seem to consider any benefit from attempting to detect the best performers – only the worst!