Examples of Size Measures

In our exploration of different units of measurement for software, we'll take a look at some of the most commonly used ones, including:

• Function points;

• Number of bubbles on a data flow diagram (DFD);

• Number of entities on an entity relationship diagram (ERD);

• Count of process/control (PSPEC/CSPEC) boxes on a structure chart;

• Number of "shalls" versus "wills" in a government specification;

• Amount of documentation;

• Number of objects, attributes, and services on an object diagram.

There are many academic arguments over which measure is best, and there is merit in measuring whichever fits your application. It doesn't matter which one you choose as long as you use it consistently; remember that historical data is your best ally.

Lines of Code as a Unit of Size

How do you know how many LOC will be produced before they are written or even designed? Why does anyone think that LOC has any bearing on how much effort will be required when product complexity, programmer ability and style, and the power of the programming language are not taken into consideration? How can the vast differences in the number of LOC needed to do equivalent work be explained? Questions like these have made the LOC measure infamous, yet it is still the most widely used metric. The relationship between LOC and effort is not linear. Despite the introduction of new languages, average programmer productivity has remained at about 3,000 delivered LOC per programmer-year over the last two decades. This tells us that no matter how languages improve, how hard managers push, or how fast programmers work or how much overtime they put in, cycle time improvements cannot come directly from squeezing more out of programming productivity. The real concerns involve software functionality and quality, not the number of LOC produced.

Estimating LOC Using Expert Opinions and Bottom-Up Summations

We will assume that our WBS contains many levels of decomposition in the product/project hierarchy. Requirements on the WBS product hierarchy have been decomposed into actual software system components, beginning at a generic subsystem level (such as Accounts Receivable or Accounts Payable in a financial system) and refined down to a precise, or primitive, level of abstraction (such as Editor, GET/PUT I/O routines, or Screen Formatter). This lowest level can rarely be known at the time of initial sizing; it is usually completed much later in the project. However, several levels of abstraction can typically be determined even at very early stages of project planning. The more complete the WBS, the more accurate the sizing, and accurate sizing leads to accurate estimating.

When the WBS has been decomposed to the lowest level possible at this time, a "statistical" size may be created through a sizing and summing process. The size of each component may be obtained by asking experts who have developed similar systems, or by asking potential developers of this system to estimate the size of each box on the lower levels of the WBS. When the sizes are summed, the total is called a "bottom-up" size estimate. A much better size estimate is usually obtained if each estimator is asked to provide optimistic, pessimistic, and realistic size estimates. A weighted average based on the beta distribution may then be formed by multiplying the realistic size estimate by 4, adding the optimistic and pessimistic estimates, and dividing the total by 6. This weighted average helps compensate for the uncertainty inherent in estimating. For example, if a given window object appears on the WBS for a system, the supporting code required to process the editing for that window might be estimated at between 200 and 400 lines of code, with a belief that it will be closer to 200. Asking the estimator to think about optimistic and pessimistic scenarios might produce this final estimate:
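The three-point weighted average and the bottom-up summation can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the component names echo the WBS examples above, but every optimistic, realistic, and pessimistic value is hypothetical, including the realistic value of 250 LOC assumed for the window-editing code.

```python
from dataclasses import dataclass


@dataclass
class ComponentEstimate:
    """Three-point size estimate (in LOC) for one lowest-level WBS box."""
    name: str
    optimistic: float
    realistic: float
    pessimistic: float

    def expected(self) -> float:
        # Beta-distribution weighted average: (O + 4R + P) / 6
        return (self.optimistic + 4 * self.realistic + self.pessimistic) / 6


def bottom_up_size(components: list[ComponentEstimate]) -> float:
    """Sum the weighted estimates of all lowest-level components."""
    return sum(c.expected() for c in components)


# Hypothetical estimates, for illustration only.
components = [
    ComponentEstimate("Window editing", 200, 250, 400),
    ComponentEstimate("Screen Formatter", 300, 400, 700),
    ComponentEstimate("GET/PUT I/O routines", 150, 200, 300),
]

for c in components:
    print(f"{c.name}: {c.expected():.0f} LOC")
print(f"Bottom-up total: {bottom_up_size(components):.0f} LOC")
```

Because the realistic value carries four times the weight of either extreme, each result stays near it, yet a long pessimistic tail still pulls the estimate upward (250 becomes about 267 for the window-editing code).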

The number of thousands of source lines of code (KSLOC) delivered is a common metric, carried through to estimations of productivity, which are usually expressed as KSLOC/SM or KLOC/SM (where SM = staff-month). Barry Boehm, one of the most highly regarded researchers in this area, has been looking for many years for a better product metric to correlate with effort and schedule, but he has not found one. LOC is a universal metric because all software products are essentially made of them.

Guidelines for Counting LOC

Counting lines of existing code manually is far too tedious and time-consuming, so most organizations purchase or build an automated LOC counter. Automating the count raises some tricky questions about what exactly constitutes a line of code. Again, it matters less how you define LOC than that the definition is used consistently. The following counting guidelines have been in use for many years, both for recording the size of existing programs and for estimating the size of programs to be developed:

• Ensure that each "source code line" counted contains only one source statement. If two executable statements appear on one line, separated by a semicolon, the count is two; if one executable statement is spread across two "physical" lines, the count is one. Programming languages allow all manner of coding options, but it is usually easy to identify a single executable statement, because the compiler or interpreter has to do so.

• Count all delivered, executable statements. The end user may not directly use every statement, but the product may need it for support (e.g., utilities).

• Count data definitions once.

• Do not count lines that contain only comments.

• Do not count debug code or other temporary code such as test software, test cases, development tools, prototyping tools, and so on.

• Count each invocation, call, or inclusion (sometimes called a compiler directive) of a macro as part of the source in which it appears (don't count reused source statements).

• Translate the number of lines of code to assembly-language-equivalent lines so that comparisons may be made across projects.
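To make the first and fourth guidelines concrete, here is a minimal sketch of an automated counter for a C-like language. It assumes that every executable statement ends in a semicolon, so it counts semicolons after removing comments and the contents of string and character literals. It is deliberately simplistic: a `for (;;)` header counts as two statements, block statements such as `if (x) {` are not counted, and the remaining guidelines (data definitions, debug code, macros, assembler translation) are left to your own counting standard.

```python
def strip_comments_and_literals(src: str) -> str:
    """Remove // and /* */ comments and the contents of string/char
    literals, so that semicolons inside them are not counted."""
    out = []
    i, n = 0, len(src)
    while i < n:
        if src.startswith("//", i):
            # Line comment: skip up to (not past) the newline.
            end = src.find("\n", i)
            i = n if end == -1 else end
        elif src.startswith("/*", i):
            # Block comment: skip past the closing */.
            end = src.find("*/", i + 2)
            i = n if end == -1 else end + 2
        elif src[i] in "\"'":
            quote = src[i]
            j = i + 1
            while j < n and src[j] != quote:
                j += 2 if src[j] == "\\" else 1  # skip escaped characters
            out.append(quote * 2)  # keep an empty literal as a placeholder
            i = j + 1
        else:
            out.append(src[i])
            i += 1
    return "".join(out)


def count_logical_loc(src: str) -> int:
    """Approximate logical LOC: one per ';' statement terminator.
    Two statements on one physical line count as two; one statement
    spread over several lines counts as one; comment-only lines add 0."""
    return strip_comments_and_literals(src).count(";")


sample = """
// comment-only line: not counted
int a = 1; int b = 2;
printf("a;b = %d\\n",
       a + b);
/* block comment; also not counted */
"""
print(count_logical_loc(sample))
```

The sample contains three logical statements on three physical lines of code: two statements share one line, and the `printf` call spans two lines.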

The first and second columns of Table 10-1 represent a widely used method of translating SLOC in various languages to the average number of basic assembler SLOC. (Note that SLOC and LOC are used interchangeably.) Many project managers want all languages translated into basic assembler so that an apples-to-apples comparison can be made across projects. Another use of this data is to project size from a known language into a conversion language. For example, suppose a 50,000 SLOC system written in C will be converted to C++. According to Table 10-1, the basic assembler level for C is 2.5, so the 50,000 SLOC C system would be equivalent to 125,000 SLOC if written in basic assembler (50,000 × 2.5). Because C++ has a level of 6, those 125,000 assembler SLOC would be equivalent to 125,000/6, or about 20,833 SLOC, in C++.
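The conversion arithmetic can be expressed as a pair of small functions. The two levels below are copied from Table 10-1 (C = 2.5, C++ = 6); the function and dictionary names are our own, chosen for illustration.

```python
# Basic assembler levels from Table 10-1 for the two languages in the example.
LEVELS = {"C": 2.5, "C++": 6.0}


def to_assembler_sloc(sloc: float, level: float) -> float:
    """Express a size as equivalent basic assembler SLOC."""
    return sloc * level


def convert_sloc(sloc: float, from_lang: str, to_lang: str,
                 levels: dict[str, float] = LEVELS) -> float:
    """Project a size from one language to another via basic assembler."""
    assembler = to_assembler_sloc(sloc, levels[from_lang])
    return assembler / levels[to_lang]


asm = to_assembler_sloc(50_000, LEVELS["C"])
cpp = convert_sloc(50_000, "C", "C++")
print(round(asm), round(cpp))
```

Routing every conversion through basic assembler means each language needs only one number, rather than a separate factor for every pair of languages.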

Estimating LOC by Analogy

One way to estimate the size of an undeveloped software system is to compare its functionality with that of existing ones. Imagine that you have an existing software component, Module A, which will have to be rebuilt for a new system. Module A is 2,345 LOC, and you believe that the new Module A will be more efficient (maintaining the original taught you how to make the code tighter), yet you also know that some additional features can be added. Weighing these factors, you might estimate the new A at 3,000 LOC.

This is certainly not a very accurate method, because the new A may differ from the original in programming language, application domain, algorithms, level of complexity, or untried functionality, or it may operate at a different level of reality (simulation, emulation, actual application).

Consider another example: software written in COBOL using no design technique was converted to C++ using an object-oriented design. The size decreased because the system was designed better the second time, and functionality and quality went up. However, the cost per line of code was 10% higher. Is this the productivity loss it might appear to be? Of course not. It was an improvement in productivity as well as in functionality and maintainability.
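A short calculation shows why a higher cost per line need not mean lower productivity. Total cost is size multiplied by cost per line, so with a 10% higher rate the rewritten system costs less whenever it is smaller than 1/1.10, or about 91%, of the original. The sizes below are hypothetical, and cost is normalized so that the original rate is 1.0 per line.

```python
def total_cost(sloc: float, cost_per_loc: float) -> float:
    """Total cost is simply size times cost per line."""
    return sloc * cost_per_loc


OLD_SLOC = 100_000        # hypothetical size of the COBOL system
NEW_SLOC = 60_000         # hypothetical size of the C++ rewrite
OLD_RATE = 1.00           # normalized cost per line
NEW_RATE = OLD_RATE * 1.10  # 10% higher per line, as in the text

old_cost = total_cost(OLD_SLOC, OLD_RATE)
new_cost = total_cost(NEW_SLOC, NEW_RATE)

# The rewrite is cheaper whenever NEW_SLOC / OLD_SLOC < 1 / 1.10.
break_even_ratio = 1 / 1.10

print(f"new cost is {new_cost / old_cost:.0%} of the old cost")
print(f"break-even size ratio: {break_even_ratio:.1%}")
```

Judged per line, the rewrite looks 10% worse; judged by total cost for more functionality, it is clearly better.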

Advantages of Using LOC as a Unit of Measure

Advantages of using lines of code as a unit of software measurement include:

• It is widely used and universally accepted.

• It permits comparison of size and productivity metrics between diverse development groups.

• It directly relates to the end product.

• LOC are easily measured upon project completion.

• It measures software from the developer's point of view: what he actually does (write lines of code).

• Continuous improvement activities exist for estimation techniques: the estimated size can be easily compared with the actual size during post-project analysis. (How accurate was the estimate? Why was it off by a certain percent? What can be learned for the next project's size estimation?)
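For the post-project comparison described in the last item, one common way to express how far off an estimate was is its error relative to the actual size. The sketch below uses the magnitude of relative error, |estimate - actual| / actual, which is one convention among several. The project figures are hypothetical apart from the 3,000 LOC estimate borrowed from the Module A example.

```python
def relative_error(estimated: float, actual: float) -> float:
    """Signed error as a fraction of actual size; positive = overestimate."""
    return (estimated - actual) / actual


def mean_magnitude_of_relative_error(
        history: list[tuple[float, float]]) -> float:
    """Average |estimate - actual| / actual over (estimated, actual) pairs."""
    return sum(abs(relative_error(e, a)) for e, a in history) / len(history)


# Hypothetical (estimated LOC, actual LOC) pairs from past projects.
history = [(3_000, 2_500), (12_000, 15_000), (800, 800)]

for est, act in history:
    print(f"estimated {est}, actual {act}: {relative_error(est, act):+.0%}")
print(f"mean MRE: {mean_magnitude_of_relative_error(history):.1%}")
```

Keeping the signed error as well as its magnitude shows whether an organization's estimates lean consistently high or low.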

Table 10-1. Conversion from Programming Language to Basic Assembler SLOC to SLOC per Function Point

Language                                Basic Assembler   Average SLOC per
                                        SLOC (Level)      Function Point
Basic Assembler                         1                 320
Autocoder                               1                 320
Macro Assembler                         1.5               213
C                                       2.5               128-150
DOS Batch Files                         2.5               128
Basic                                   3                 107
LOTUS Macros                            3                 107
ALGOL                                   3                 105-106
COBOL                                   3                 105-107
FORTRAN                                 3                 105-106
JOVIAL                                  3                 105-107
Mixed Languages (default)               3                 105
Pascal                                  3.5               91
COBOL (ANSI 85)                         3.5               91
RPG                                     4                 80
MODULA-2                                4.5               80
PL/I                                    4.5               80
Concurrent PASCAL                       4                 80
FORTRAN 95                              4.5               71
BASIC (ANSI)                            5                 64
FORTH                                   5                 64
LISP                                    5                 64
PROLOG                                  5                 64
LOGO                                    5.5               58
Extended Common LISP                    5.75              56
RPG III                                 5.75              56
C++                                     6                 53
JAVA                                    6                 53
YACC                                    6                 53
Ada 95                                  6.5               49
CICS                                    7                 46
SIMULA                                  7                 46
Database Languages                      8                 40
CLIPPER DB and dBase III                8                 40
INFORMIX                                8                 40
ORACLE and SYBASE                       8                 40
Access                                  8.5               38
DBase IV                                9                 36
FileMaker Pro                           9                 36
Decision Support Languages              9                 35
FOXPRO 2.5                              9.5               34
APL                                     10                32
Statistical languages (SAS)             10                32
DELPHI                                  11                29
Object-Oriented Default                 11                29
OBJECTIVE-C                             12                27
Oracle Developer/2000                   14                23
SMALLTALK                               15                21
awk                                     15                21
EIFFEL                                  15                21
UNIX Shell Scripts (PERL)               15                21
4th Generation Default                  16                20
Application Builder                     16                20
COBRA                                   16                20
Crystal Reports                         16                20
Datatrieve                              16                20
CLIPPER                                 17                19
Database Query Languages (SQL)          25                13-16
HTML 3.0                                22                15
Information Engineering Facility (IEF)/ 23                14
  Information Engineering Workbench (IEW)
EASYTRIEVE+                             25                13
SQL (ANSI)                              25                13
Spreadsheet Languages (EXCEL)           50                6
QUATTRO PRO                             51                6
Graphic Icon Languages                  75                4
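The third column of Table 10-1 lets a size in one language be carried to function points and back, a technique often called backfiring. The third column is roughly 320 divided by the level in the second column (for example, 320 / 6 ≈ 53 for C++). The sketch below copies a few rows from the table. Where the table gives a range, the lower bound is used, which is our own choice, and the helper names are illustrative.

```python
# (level, average SLOC per function point) for a few rows of Table 10-1.
TABLE_10_1 = {
    "C": (2.5, 128),      # table gives 128-150; lower bound used here
    "COBOL": (3, 105),    # table gives 105-107; lower bound used here
    "C++": (6, 53),
    "JAVA": (6, 53),
    "SQL (ANSI)": (25, 13),
}


def sloc_to_function_points(sloc: float, language: str) -> float:
    """Backfire: approximate function points from a SLOC count."""
    _, sloc_per_fp = TABLE_10_1[language]
    return sloc / sloc_per_fp


def function_points_to_sloc(fp: float, language: str) -> float:
    """Approximate SLOC needed to implement fp function points."""
    _, sloc_per_fp = TABLE_10_1[language]
    return fp * sloc_per_fp


fp = sloc_to_function_points(50_000, "C")
cpp = function_points_to_sloc(fp, "C++")
print(f"{fp:.0f} function points, about {cpp:,.0f} SLOC in C++")
```

For the 50,000 SLOC C system, this route gives about 391 function points and roughly 20,703 SLOC of C++, within about 1% of the 20,833 SLOC obtained from the basic assembler levels.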

Disadvantages of Using LOC

Disadvantages of using lines of code as a unit of software measurement include the following:

• LOC is difficult to estimate for new software early in the life cycle.

• Source instructions vary with the type of coding language, with design methods, and with programmer style and ability.

• There are no industry standards (such as ISO) for counting lines of code.

• Software involves many costs that may not be considered when just sizing code: "fixed costs" such as requirements specifications and user documents are not included with coding.

• Programmers may be rewarded for large LOC counts if management mistakes them for productivity; this penalizes concise design. Source code is not the essence of the desired product; functionality and performance are.

• LOC counts should distinguish between generated code and hand-crafted code, which is more difficult than a "straight count" obtained from a compiler listing or code-counting utility.

• LOC cannot be used for normalizing if platforms or languages are different.

• The only way to predict a LOC count for new software to be developed is by analogy to functionally similar existing software products and by expert opinion, both imprecise methods.

• Code generators often produce excess code that inflates or otherwise skews the LOC count.

Unfortunately, productivity is often measured by LOC produced. If a programmer's average output increases from 200 LOC per month to 250 LOC per month, a manager may be tempted to conclude that productivity has improved. This is a dangerous perception that often results in encouraging developers to produce more LOC per design. Not only is the developer rewarded with a seemingly higher productivity rating, but he is also perceived to produce cleaner code. Many organizations use this metric to measure quality:
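The distortion is easy to reproduce with hypothetical numbers. Assume the same feature is delivered twice over ten months with the same ten defects, once in 2,000 concise LOC and once in 2,500 padded LOC, and assume quality is tracked as defects per thousand LOC (a common defect-density measure, used here as an assumption). The padded version scores better on both measures even though it delivers nothing more.

```python
from dataclasses import dataclass


@dataclass
class Delivery:
    """One hypothetical delivery of the same functionality."""
    sloc: int
    months: float
    defects: int

    def loc_per_month(self) -> float:
        # LOC-based "productivity"
        return self.sloc / self.months

    def defects_per_kloc(self) -> float:
        # Defect density; lower looks "cleaner"
        return self.defects / (self.sloc / 1000)


concise = Delivery(sloc=2_000, months=10, defects=10)
padded = Delivery(sloc=2_500, months=10, defects=10)

for label, d in (("concise", concise), ("padded", padded)):
    print(f"{label}: {d.loc_per_month():.0f} LOC/month, "
          f"{d.defects_per_kloc():.1f} defects/KLOC")
```

Both ratios put LOC in the denominator or numerator in a way that rewards extra lines, which is exactly the perception the paragraph above warns against.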
