Complexity Metrics and Software Maintenance - a Systematic Literature Review

Author: Tim • September 7, 2018 • 5,838 Words (24 Pages) • 745 Views


...

Decades of research have failed to produce a consensus on which single complexity metric best captures the inherent complexity of a software system. A large number of metrics are used to measure the complexity of a given piece of code; a few that are used more often than the others are described below.

2.2 Lines of Code

This is the most commonly used metric. It is a count of all the lines in the code, including comments and blank lines. Over the years, researchers have found a relationship between the lines of code in a system and its defect rate. Early studies [2] show this relationship to be negative, i.e. the bigger the module, the lower its defect rate. However, more recent studies [3] have corrected the notion of a purely negative relation between defect rate and LOC. The relationship is now accepted to be curvilinear: defect density decreases as size increases, but when modules become very large, the defect rate goes up again. The data in Table 1 [3] [9] illustrate this curvilinear relation between size and defect rate.

2.3 Source Lines of Code

This is similar to LOC, but it ignores blank lines, comments, etc. and counts only the instruction statements or executable lines in the code.
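The difference between the two counts can be illustrated with a minimal sketch. It assumes a language with single-line comments introduced by a prefix (here `#`); the function names `count_loc` and `count_sloc` are hypothetical, and block comments or trailing comments on code lines are deliberately not handled.

```python
def count_loc(source: str) -> int:
    """LOC: every physical line, including comments and blank lines."""
    return len(source.splitlines())


def count_sloc(source: str, comment_prefix: str = "#") -> int:
    """SLOC: lines that are neither blank nor whole-line comments.

    Assumption: comments start with a single-line prefix; block comments
    and trailing comments on code lines are not treated specially.
    """
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            count += 1
    return count


sample = """# add two numbers
def add(a, b):

    # return the sum
    return a + b
"""

print("LOC: ", count_loc(sample))   # 5 physical lines
print("SLOC:", count_sloc(sample))  # 2 executable lines
```

Real counting tools differ in how they treat multi-line statements and mixed code/comment lines, so absolute SLOC values are only comparable when produced by the same counter.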

2.4 Number of Functions

This represents the total number of distinct functions that are part of a file.

2.5 McCabe’s Cyclomatic Complexity

---------------------------------------------------------------

Maximum SLOC of Modules    Avg Defects/KSLOC
(missing in source)        1.5
100                        1.4
158                        0.9
251                        0.5
398                        1.1
630                        1.9
1000                       1.3
>1000                      1.4

Table 1: Curvilinear relationship between module size and defect rates in Ada modules
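The curvilinear shape can be checked directly against the figures in Table 1. The sketch below is only an illustration: it uses the rows whose size bound survives in the text, leaves out the leading 1.5 value because its size bound is missing, and the variable names are hypothetical.

```python
# (maximum SLOC of module, average defects per KSLOC), as listed in Table 1.
# The size bound for the first value (1.5) is missing, so that row is omitted.
table1 = [
    (100, 1.4),
    (158, 0.9),
    (251, 0.5),
    (398, 1.1),
    (630, 1.9),
    (1000, 1.3),
]
over_1000_rate = 1.4  # the open-ended ">1000" bucket

# Size bucket with the lowest defect rate.
best_size, best_rate = min(table1, key=lambda row: row[1])

smaller = [rate for size, rate in table1 if size < best_size]
larger = [rate for size, rate in table1 if size > best_size] + [over_1000_rate]

print(f"lowest defect rate: {best_rate} for modules up to {best_size} SLOC")
print("rates for smaller modules:", smaller)
print("rates for larger modules: ", larger)
```

The defect rate falls as modules grow towards the 251-SLOC bucket and is higher again in every larger bucket, which is the pattern the text describes.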

Metric            Definition
Length (N)        N = N1 + N2
Volume (V)        V = N log2(n1 + n2)
Level (lv)        lv = V*/V = (2/n1)(n2/N2)
Difficulty (D)    D = 1/lv
Effort (E)        E = V/lv

Table 2: Halstead's Software Science metrics

McCabe's complexity metrics [12] are defined over complete functions, unlike the other metrics, which are defined at the file level. The cyclomatic complexity of a program is defined as the number of regions in the control flow graph (CFG) of that program. A program with no bifurcation in its CFG, i.e. no loops, no if statements, etc., has a cyclomatic complexity of 1. Formally, let G be the CFG of a program. For a graph G with n vertices, e edges and p connected components (p = 1 for a single function), the cyclomatic complexity v is defined as follows: v(G) = e - n + 2p

For example [13], the program whose CFG is shown in Figure 1 has a cyclomatic complexity of 3.

[pic 1]

Figure 1: A graph with cyclomatic complexity of 3
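The formula v(G) = e - n + 2p can be evaluated mechanically from an edge list. The following is a minimal sketch, not McCabe's own procedure: it takes p to be the number of connected components of the graph (1 for a single function), and both the function name `cyclomatic_complexity` and the example graphs are hypothetical. The second graph is not the one in Figure 1; it is simply a CFG with one if/else branch followed by one loop.

```python
from collections import defaultdict


def cyclomatic_complexity(edges):
    """Return v(G) = e - n + 2p for a CFG given as (source, target) edges."""
    nodes = {node for edge in edges for node in edge}
    neighbours = defaultdict(set)
    for src, dst in edges:
        # Connectivity ignores edge direction.
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    # Count connected components with an iterative depth-first search.
    seen = set()
    components = 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(neighbours[node] - seen)

    return len(edges) - len(nodes) + 2 * components


# Straight-line code: no branches, no loops.
straight = [("entry", "stmt"), ("stmt", "exit")]

# Hypothetical CFG: an if/else followed by a loop.
branch_and_loop = [
    ("entry", "cond"),
    ("cond", "then"),
    ("cond", "else"),
    ("then", "loop"),
    ("else", "loop"),
    ("loop", "body"),
    ("body", "loop"),
    ("loop", "exit"),
]

print(cyclomatic_complexity(straight))         # 2 - 3 + 2 = 1
print(cyclomatic_complexity(branch_and_loop))  # 8 - 7 + 2 = 3
```

Each decision point (the if and the loop condition) adds one to the base value of 1, which matches the region-counting definition for a planar CFG.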

2.6 Halstead’s Software Science Metrics

These metrics are based on an analogy between programming and natural language processing. The basic idea is that, as with a natural language, the fewer distinct operators and operands a program uses, the easier it is for a programmer to read and understand. In other words, the higher the redundancy in the code, the lower the mental effort required to understand it. The measures used are as follows:

n1 - the number of distinct operators
n2 - the number of distinct operands
N1 - the total number of operators
N2 - the total number of operands
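Given these four counts, the derived metrics of Table 2 follow by direct arithmetic. The sketch below assumes the definitions N = N1 + N2, V = N log2(n1 + n2), lv = (2/n1)(n2/N2), D = 1/lv and E = V/lv; the function name `halstead` and the example counts are hypothetical and do not come from any measured program.

```python
import math


def halstead(n1: int, n2: int, N1: int, N2: int) -> dict:
    """Compute Halstead's derived metrics from the four basic counts.

    n1, n2: distinct operators and operands
    N1, N2: total operators and operands
    """
    length = N1 + N2                      # N = N1 + N2
    volume = length * math.log2(n1 + n2)  # V = N log2(n1 + n2)
    level = (2 / n1) * (n2 / N2)          # lv = (2/n1)(n2/N2)
    difficulty = 1 / level                # D = 1/lv
    effort = volume / level               # E = V/lv
    return {
        "length": length,
        "volume": volume,
        "level": level,
        "difficulty": difficulty,
        "effort": effort,
    }


# Hypothetical counts chosen so the arithmetic is easy to follow.
metrics = halstead(n1=4, n2=4, N1=8, N2=8)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

With these counts the length is 16, the volume is 16 x log2(8) = 48, the level is 0.25, the difficulty is 4 and the effort is 192. Deciding what counts as an operator or an operand is language-specific and is left outside this sketch.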

---------------------------------------------------------------

- SOFTWARE METRICS CORRELATION

In [6], Herraiz and Hassan try to find which metrics, out of a set of nine code metrics, are redundant and contribute no more information than the other metrics. They evaluated the metrics using the ArchLinux repository, which contains more than a million files written in C. They divided the sample into header files and non-header files. They found that for non-header files all the complexity metrics are highly correlated with lines of code and therefore provide no extra information. Cyclomatic complexity is related to the structure of the code: the more bifurcations and loops, the higher its value. Since header files in C mostly consist of definitions of different functions and hardly contain any implementations, the correlation between cyclomatic complexity and other metrics is generally very low, irrespective of the lines of code. Figure 2 shows that the metrics have a high degree of correlation with Lines of Code. The only relatively poor correlation coefficients are those between MCYCLO (maximum cyclomatic complexity in a file) and LOC, and between ACYCLO (average cyclomatic complexity of a file) and LOC. However, this was attributed to the inclusion of header files in calculating the coefficients for these two metrics. When non-header files were analyzed separately, it was confirmed that both ACYCLO and MCYCLO are highly correlated with Lines of Code. Since header files are generally flat, cyclomatic complexity may not be the best metric for measuring their complexity. Hence, one of the main conclusions

...
