‘* papers’ Category Archives


Twenty dirty tricks to train software engineers

by Tony in * papers

Many employers find that graduates and sandwich students come to them poorly prepared for the everyday problems encountered in the workplace. Although many university students undertake team projects at their institutions, an educational environment has limitations that prevent the participants from experiencing the full range of problems encountered in the real world. To overcome this, action was taken on courses at the Plessey Telecommunications company and Loughborough University to disrupt the students’ software development progress. These actions appear mean and vindictive, and are labelled ‘dirty tricks’ in this paper, but their value has been appreciated by both the students and their employers:

  1. Give an Inadequate Specification
  2. Make Sure All Assumptions are Wrong
  3. Change the Requirements and Priorities
  4. Present an Uncertain and Naive Customer
  5. Have Conflicting Requirements and Pressures
  6. Present Customers with Conflicting Ideas
  7. Present Customers with Different Personalities
  8. Ban Overtime
  9. Give Additional Tasks to Disrupt the Schedule
  10. Change the Deadlines
  11. Introduce Quality Inspections
  12. Present a ‘Different Truth’
  13. Change the Team
  14. Change the Working Procedures
  15. Upgrade the Software
  16. Change the Hardware
  17. Crash the Hardware
  18. Slow the Software
  19. Disrupt the File Store
  20. Say “I Told You So!”

— Ray Dawson, Proceedings of the 22nd International Conference on Software Engineering


Sorting Out Software Complexity

by Tony in * papers

Why are there so many different correct approaches to designing the solution to a problem? Because the solution space is so complex.

Why is estimation so difficult? Because our solutions are so much more complicated than our problems appear to be.

Why is reuse-in-the-large so elusive? Because complexity magnifies the solution diversity that limits the value of large-scale reuse.

Why do top people matter? Because it takes considerable intelligence and skill to overcome complexity.

Why do the best designers use iterative, heuristic approaches? Because there are seldom any simple and obvious design solutions, and even more seldom any optimum ones.

Why is software maintenance such a time consumer? Because it is seldom possible to determine, at the outset, all the ramifications of a problem solution.

Why does software have so many errors? Because it is so difficult to get it right the first – or even the Nth – time.

Complexity, I would assert, is the biggest factor involved in anything having to do with the software field. It is explosive, far reaching, and massive in its scope.

— Robert L Glass, Communications of the ACM, November 2002


How To Pick Eagles

by Tony in * papers

All available research indicates that the ability of a manager to predict how a future employee will perform, based upon a one hour interview, is very low. Yet most managers have great confidence in their predictive ability based upon impressions formed in a brief interview.

There appear to be two main reasons for this mismatch of effectiveness. First, interviewees tend to give socially desirable answers to the interviewer. Second, the interviewer’s biases are formed by a poor research methodology. Assume an interviewer interviews five people for a high-scope job. After interviewing each candidate, the interviewer selects the person who was rated the highest. This person joins the company and is an above-average employee for the next two or three years. The interviewer’s impressions about his predictive ability are reinforced because he sees the positive results of his interview. What the interviewer doesn’t know is how the other four would have performed. They may even have been superior to the person hired.

— Robert A. Zawacki, “How To Pick Eagles”, Datamation, September 15, 1985.


Two Subcultures of Software Engineering

by Tony in * papers

… software engineering is polarized around two subcultures – the speculators and the doers. The former invent but do not go beyond publishing novelty, hence never learn about the idea’s usefulness – or the lack of it. The latter, not funded for experimentation but for efficient product development, must use proven, however antiquated, methods. Communication between them is sparse …

— L A Belady and R Leavenworth, “Program Modifiability”, IBM Research Report RC8147, cited in Robert Glass, “Reuse: Software Parts – Nostalgia and Deja Vu”

(If anyone can find me a copy of the original Belady and Leavenworth report, I’d be very grateful!)


Benchmarking Software Development Productivity

by Tony in * papers

This interesting paper by Maxwell and Forselius (IEEE Software, Jan/Feb 2000) attempted to produce benchmarks on productivity taken from a sample of 26 companies in Finland.

Unfortunately there wasn’t really enough data for many of the benchmarks to convey much meaning (e.g. they discovered that the programming language used wasn’t significant, but noted that this result may have been due to the fact that COBOL was used in most of the language combinations that had enough observations to analyse).

One interesting result, though, was the mean productivity index (in Function Points per month) that they produced for various business sectors (business sector was the second most significant contributor to productivity variance, after the company itself):

Sector         FP/month
Manufacturing  0.337
Retail         0.253
Public Admin   0.232
Banking        0.116
Insurance      0.116
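As a rough illustration of how such a benchmark might be applied (the 100 Function Point project size is a hypothetical input; the productivity figures are the sector means above):

```python
# Sector means from Maxwell and Forselius, in Function Points per month.
SECTOR_FP_PER_MONTH = {
    "Manufacturing": 0.337,
    "Retail": 0.253,
    "Public Admin": 0.232,
    "Banking": 0.116,
    "Insurance": 0.116,
}

def estimate_effort_months(size_fp, sector):
    """Estimate person-months as size (FP) divided by productivity (FP/month)."""
    return size_fp / SECTOR_FP_PER_MONTH[sector]

# The same hypothetical 100 FP system costs nearly three times as much
# effort in banking as in manufacturing.
print(round(estimate_effort_months(100, "Manufacturing")))  # 297
print(round(estimate_effort_months(100, "Banking")))        # 862
```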

Measuring Programming Quality and Productivity

by Tony in * commentary, * papers

In the field of computer programming, the lack of precise and unambiguous units of measure for quality and productivity has been a source of considerable concern to programming managers throughout the industry.

This paper, from the IBM Systems Journal in 1978, is one of the earliest by Capers Jones on Software Productivity, but it still seems that little has changed from this assessment in the last 25 years.

Jones discusses the problems with the two most common units of measurement used in IBM at the time, Lines of Code per Programmer Month and Cost per Defect, showing that these measures can slow the acceptance of new methods: when measured, a new method may give the incorrect impression of being less effective than former techniques, even though the older approaches were actually more expensive.
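Jones’s point about Lines of Code per Programmer Month can be seen with a small sketch (all figures here are invented for illustration, not taken from the paper):

```python
# Two hypothetical projects delivering the same functionality, where a
# newer method halves the code needed and is cheaper overall.
def loc_per_month(loc, months):
    return loc / months

# Old method: 10,000 lines in 20 programmer-months.
# New method: 5,000 lines in 12 programmer-months.
old = loc_per_month(10_000, 20)   # 500 LOC/month
new = loc_per_month(5_000, 12)    # ~417 LOC/month

# Measured in LOC/month, the cheaper method looks *less* productive,
# even though it delivered the same functionality in 8 fewer months.
print(old, round(new))  # 500.0 417
```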

In the last 25 years, Jones has devoted much of his time to studying productivity, and promoting a vast array of metrics, but in this paper he promotes one key coarse measure: “cumulative defect removal efficiency”, which is the number of defects found before release divided by the total number of defects found (both before and after release). Plotting this against “total defects per KLOC” then produces a useful graph of the “maintenance potential” of a program.
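A minimal sketch of the measure, taking the usual form (pre-release defects as a fraction of all defects found; the counts here are hypothetical):

```python
# Cumulative defect removal efficiency: the fraction of all defects
# that were found before release.
def defect_removal_efficiency(found_before_release, found_after_release):
    total = found_before_release + found_after_release
    return found_before_release / total

# Hypothetical figures: 95 defects caught in development and test,
# 5 reported by users after release.
print(defect_removal_efficiency(95, 5))  # 0.95
```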

More interestingly, this seems to have been one of the first papers to note that productivity rates decline as the size of the program increases. Jones details that programs of less than 2 KLOC usually take about 1 programmer month/KLOC, whereas programs of over 512 KLOC take 10 programmer months/KLOC. Similarly, when it comes to maintenance, smaller changes imply larger unit costs, because it is necessary to understand the base program even to add or modify a single line of code, and “the overhead of the learning curve exerts an enormous leverage on small changes”.
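The diseconomy of scale in those figures is easy to make concrete:

```python
# Figures from the paper: ~1 programmer-month per KLOC for programs under
# 2 KLOC, but ~10 programmer-months per KLOC for programs over 512 KLOC.
def total_effort_months(kloc, months_per_kloc):
    return kloc * months_per_kloc

small = total_effort_months(2, 1)      # 2 programmer-months
large = total_effort_months(512, 10)   # 5120 programmer-months

# 256x the code takes 2560x the effort.
print(large / small)  # 2560.0
```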


Sackman Revisited

by Tony in * commentary, * papers

[Continued from Substantiating Programmer Variability]

The Dickey paper helpfully reproduces a table of the key data from the Sackman experiment. (I haven’t been able to find the original version of the Sackman paper yet, so I haven’t been able to verify the data, but I’ll assume for now that the Proceedings of the IEEE verified it when they published Dickey’s paper!)

I’ve eliminated the developers who produced their solutions in machine code, the one developer who had no prior experience of time-sharing, and the developer whose first experience of JTS was this test, leaving a result sample of 7 developers. I’ve also combined the time taken to code the solution with the time taken to debug it. The average debug time for the on-line vs. the off-line group for the more difficult test (Algebra) was 29 hours vs. 28 hours, so I’m choosing not to further subdivide by platform.

The results are quite illuminating:

Algebra

Developer  Hours
1          26
11         30
8          50
12         54
5          81
10         87
4          115

Maze

Developer  Hours
8          4
11         5.5
12         11
5          13
10         18
1          24.5
4          32

In each case the ratio between the worst time and the median is approximately 2:1. From the median to the best it is just over 2:1 for Algebra, and just over 3:1 for Maze: the “superprogrammers” don’t seem that much better any more.

Even more notable is the performance of Programmer 1. Although he is the fastest at solving the Algebra task, he is one of the worst at the Maze task (this was due to a much higher time spent in development of the Maze solution than all the other programmers, so the issue of on-line vs off-line debugging seems not to be relevant here either).

When we take the total time spent on the two tasks combined, the picture is even more surprising:

Developer  Hours
11         35.5
1          50.5
8          54
12         65
5          94
10         105
4          147

Now we have a factor of 2.25:1 from the median to the worst, and just 1.8:1 from the best to the median.

In case all these numbers have made your eyes glaze over, I’ll restate it: this is the test that is often cited as showing a productivity variance of 28:1!
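These median-based ratios can be reproduced directly from the tables above:

```python
from statistics import median

# Total (coding + debugging) hours per developer, from the tables above.
algebra = {1: 26, 11: 30, 8: 50, 12: 54, 5: 81, 10: 87, 4: 115}
maze = {8: 4, 11: 5.5, 12: 11, 5: 13, 10: 18, 1: 24.5, 4: 32}
combined = {dev: algebra[dev] + maze[dev] for dev in algebra}

def spread(hours):
    """Return (worst:median, median:best) ratios for a set of times."""
    mid = median(hours.values())
    return max(hours.values()) / mid, mid / min(hours.values())

print(spread(algebra))   # ~ (2.13, 2.08)
print(spread(maze))      # ~ (2.46, 3.25)
print(spread(combined))  # ~ (2.26, 1.83)
```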


Substantiating Programmer Variability

by Tony in * commentary, * papers

[Continued from Programmer Variability]

In the same issue as the Dickey paper there was another small follow-up article by Bill Curtis attempting to put forward other data in support of the high degree of variability, in light of the problems with the data from Sackman.

The approach this time was simpler, although still aimed at debugging: 27 programmers were given a modular-sized Fortran program with a simple bug, and the time taken to find it was measured. (There were actually two such experiments, but the first was deemed too difficult). The times taken were then grouped and tabulated:

Mins to Find  # of People
1-5           5
6-10          10
11-15         4
16-20         3
21-25         1
26-30         0
31-35         0
36-40         1

(One programmer could not find the bug at all, giving up after 67 minutes.)

Although there is again a factor of more than 20:1 between the best and the worst, Curtis points out that this relies on having both a brilliant programmer and a horrid one in the same sample, and that it is thus not a particularly sustainable measure of performance variability among programmers.
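The sensitivity of that figure to a single outlier can be sketched by taking the midpoint of each time bin (an assumption on my part – the raw times are not given) together with the 67 minutes spent by the programmer who gave up:

```python
# Bin midpoints (minutes) -> number of programmers, from the table above.
bins = {3: 5, 8: 10, 13: 4, 18: 3, 23: 1, 38: 1}
times = [mid for mid, count in bins.items() for _ in range(count)] + [67]

with_outlier = max(times) / min(times)  # ~22:1, the "20+:1" factor
without_outlier = 38 / min(times)       # ~13:1 once the non-finder is dropped

print(round(with_outlier, 1), round(without_outlier, 1))  # 22.3 12.7
```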

In addition he points out that:

Substantial variation in programmer performance can be attributed to individual differences in experience, motivation, intelligence etc. Thus, important productivity gains could be realized through improved programmer selection, development, and training techniques. These gains would be achieved through eliminating the skewed tails often observed in distributions of programmer performance data.

As with the original Sackman conclusion, the emphasis here is on removing the weaker programmers (although potentially by training, rather than not hiring – Curtis points out that the programmer who failed to find the bug at all substantially improved in later trials when he had gained more programming experience), not on attempting to find the brilliant ones.

[Continued in Sackman Revisited]


Programmer Variability

by Tony in * commentary, * papers

[Continued from: Exploratory Experimental Studies Comparing Online and Offline Programming Performance]

In July 1981, thirteen and a half years after the Sackman paper, Proceedings of the IEEE published a little-known response from Thomas Dickey.

In it, he points out that the now oft-quoted 28:1 productivity difference is an inaccurate reading of the data. The CACM article usually referenced was only a summary of the full paper, excluding the actual data, so Dickey returned to the original sources and discovered that the ratios cited are misleading, as they do not differentiate between the impact of:

  • programmers on the time-sharing system versus those on the batch system
  • those who programmed in JTS, an ALGOL variant (one of whom learnt the language in order to do the experiment), and those who programmed in machine code
  • the programmers who had poor, or in some cases no, knowledge of the time-sharing system.

In this case, the 28:1 figure (which, we should remember, only applies to debugging time), was the difference between the 6 hours taken by one programmer to debug his JTS solution on a time-share platform, versus 170 hours taken to debug a machine-language solution in a batch environment!

After accounting for the differences in the classes, only a range of 5:1 can be attributed to programmer variability. The casual researcher, in encountering Sackman’s paper, seizes on the 28:1 figure primarily to support arguments to the effect that programmer variability is “orders of magnitude” larger than effects due to language or system differences.

Dickey also goes on to show how the figure made it into common use: the CACM paper was cited at the 1968 NATO Conference on Software Engineering, which in turn provoked an article in Infosystems – “The Mongolian Hordes Versus Superprogrammer” (J L Ogdin, December 1973) – bringing the number to the wider industry, to be picked up and used by Yourdon, Boehm, Brooks, Constantine, Weinberg et al., often mutating in the process.

Strangely, neither this paper nor its conclusions seem to have made much of an impact on the popular view.

[Continued in Substantiating Programmer Variability]


Exploratory Experimental Studies Comparing Online and Offline Programming Performance

by Tony in * commentary, * papers

I spent a very productive morning at the library in search of the original articles on “order of magnitude productivity differences”.

The original paper that everyone seems to point back to, either directly, or by pointing to other references that in turn point to this one, recursively, is this article by Sackman, Erikson, and Grant in CACM 1968 (two issues before Dijkstra’s famous “Go To statement considered harmful”). This claims to be a report on one of the “first known studies that measure programmers’ performance under controlled conditions for standard tasks”, conducted by the authors at DARPA.

The background of the research was to investigate experimentally the differences in productivity between time-shared computing systems over batch-processed ones. Time-sharing was becoming more and more popular, and there was much spirited controversy on both sides of the debate. The proponents of time-sharing claimed that the productivity benefits easily outweighed the associated costs of moving to such a system. Detractors claimed that “programmers grow lazy and adopt careless and inefficient work habits under time sharing”, leading to a performance decrease.

The issue of whether to move to time-shared systems was fast becoming one of the most significant choices facing managers of computing systems, but little scientific research had been done.

DARPA therefore carried out two studies of on-line versus off-line debugging performance, one with experienced programmers (averaging 7 years experience), and the other with trainees.

The experienced programmers were divided into two sub-groups and each given two tasks, one group working (individually, not as a team) on task A on the time-sharing system and task B off-line, the other group the reverse. The tasks were moderately difficult: finding the one and only path through a 20×20 cell maze, given as a 400-item table in which each cell lists the directions in which movement is possible from that cell; and interpreting inputted algebraic equations and computing the value of the single dependent variable. However, for the Algebra problem, the developers were referred to a published source for suggested workable logic to solve the problem.
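The Maze task lends itself to a small sketch. This toy version uses a 3×3 grid invented for illustration (the study used a 20×20 grid, hence the 400-item table); each cell lists the directions open from it, and a backtracking search finds the single path:

```python
# Offsets for moving in a 3-wide grid (edge wrap-around is ignored in this toy).
MOVES = {"N": -3, "S": 3, "E": 1, "W": -1}

# maze[i] is the string of directions open from cell i; the only path through
# this particular maze is 0 -> 3 -> 4 -> 7 -> 8.
maze = ["S", "", "", "E", "S", "", "", "E", ""]

def find_path(maze, start=0, goal=None):
    """Depth-first search for a path from start to goal (default: last cell)."""
    goal = len(maze) - 1 if goal is None else goal
    path, seen = [start], {start}
    def dfs(cell):
        if cell == goal:
            return True
        for d in maze[cell]:
            nxt = cell + MOVES[d]
            if 0 <= nxt < len(maze) and nxt not in seen:
                seen.add(nxt)
                path.append(nxt)
                if dfs(nxt):
                    return True
                seen.remove(nxt)
                path.pop()
        return False
    return path if dfs(start) else None

print(find_path(maze))  # [0, 3, 4, 7, 8]
```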

The study was mostly interested in debugging time, which was considered to begin when the programmer had successfully compiled a program with no serious compiler errors, and to end when tests were shown to run successfully. (The underlying assumption being that the time to actually write the code would be unchanged whether an on-line or off-line environment was being used, whereas the approach to finding and fixing bugs would differ significantly.)

The results showed that, for the experienced programmers, using an on-line environment dropped the mean debug time for the Algebra problem from 50.2 hours to 34.5 hours, and for the Maze program from 12.3 hours to 4.0 hours, thus validating the idea that the productivity gains from developing on a time-sharing platform would indeed probably outweigh the costs of setting up such an environment.
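Put as ratios, those means amount to roughly a 1.5x debugging speed-up for Algebra and a 3x speed-up for Maze:

```python
# Mean debug hours reported in the study: off-line vs. on-line.
algebra_offline, algebra_online = 50.2, 34.5
maze_offline, maze_online = 12.3, 4.0

print(round(algebra_offline / algebra_online, 2))  # 1.46
print(round(maze_offline / maze_online, 2))        # 3.08
```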

Almost in passing, however, the researchers also discovered another interesting fact: that the difference between the best and the worst developer on any given metric was much higher than expected:

                 Algebra  Maze
Coding Time      16:1     25:1
Debug Time       28:1     26:1
Size of Program  6:1      5:1
CPU Time         8:1      11:1
Run Time         5:1      13:1

To paraphrase a nursery rhyme:

When a programmer is good
He is very, very good,
But when he is bad,
He is horrid

The authors thus concluded that “validated techniques to detect and weed out these poor performers could result in vast savings in time, effort and cost”. Interestingly they do not seem to consider any benefit from attempting to detect the best performers – only the worst!

[Continued in: Programmer Variability]