Code Metrics

Meaningful code metrics

The software industry has a cyclic love-hate relationship with code metrics. At times, metrics appear quite helpful; at other times, doubt about their efficacy casts a short-lived shadow, and once it passes, the cycle starts over. Because it’s my bag, baby, it appears that closing in on 2009, we’re entering a period of disco-induced code metrics love, in particular because two interestingly hip blog entries and an innovative open source project suggest simple, yet helpful ways to leverage metrics.

First, my friend Erik Doernenburg, posted an interesting entry entitled “How toxic is your code?” in which he suggests utilizing a simple bar chart that measures the toxicity of each class in a code base. Toxicity, in this case, is a function of a particular metric compared to a threshold. For instance, Erik notes that

the method length metric has a threshold of 30. If a class contains a method that is longer it gets points proportional to the length of the method in relation to the threshold, e.g. a method with 45 lines of code would get 1.5 points because its length is 1.5 times the threshold. The score for all elements is added up. So, if a class comprises two methods, one that is 45 lines and another that is 60 lines long, the method length component of the score for the class will be 3.5 points.

Erik then goes on to list 11 various metrics and their corresponding thresholds. Indeed, using 11 hip metrics sounds a bit toxic: as I’ve found, many metrics are inter-related, and thus cyclomatic complexity, method size, and fan-out are usually adequate indications of issues (that is, should these metrics be large for a class). Nevertheless, the metrics he’s listed are easy enough to obtain, and Erik’s even included a tiny VBA utility to generate the toxicity chart.
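To make the scoring rule in Erik's quote concrete, here is a minimal Java sketch of the method length component only. The class and method names are my own and hypothetical (his utility is written in VBA); the rule itself, points equal to value over threshold when the threshold is exceeded, comes straight from the quote.

```java
import java.util.List;

final class ToxicityScore {
    /** Points for one measurement: value/threshold when over the threshold, otherwise 0. */
    static double points(double value, double threshold) {
        return value > threshold ? value / threshold : 0.0;
    }

    /** Sums the points of every method length in a class against one threshold. */
    static double methodLengthScore(List<Integer> methodLengths, double threshold) {
        double score = 0.0;
        for (int length : methodLengths) {
            score += points(length, threshold);
        }
        return score;
    }

    public static void main(String[] args) {
        // The two-method class from the quoted example: 45 and 60 lines, threshold 30.
        System.out.println(methodLengthScore(List.of(45, 60), 30)); // 1.5 + 2.0
    }
}
```

A method at or under the threshold contributes nothing, so only the offenders add up in the chart.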

James Shore, the author of O’Reilly’s The Art of Agile Development, recently wrote an engaging blog entry dubbed “An Approximate Measure of Technical Debt” in which he suggests that one can measure technical debt via lines of code. Accordingly, the more bogue lines of code a code base has, the higher the technical debt conceivably becomes.

James asserts that

If we define technical debt as high cost of change, then SLOC fits the bill perfectly. Estimation gurus have long known that lines of code are correlated to effort and defects. In fact, many estimation tools work by taking a size estimate (either in lines of code or its language-neutral equivalent, function points), then running it through an algorithm that estimates project length, effort, and cost.

Indeed, baby! The fewer lines of code, the better! James goes on to suggest a new metric, Spag, which is a convenient way to count executable lines of code (i.e. “one spag equals 1,000 statements”). His emphasis on executable lines of code is important as it does drastically reduce the overall SLOC metric; for instance, in James’s HelloWorld example, executable code is 1/5 the total SLOC.

Lastly, I was recently informed of a fairly captivating open source project dubbed Sonar — it’s a dashboard that reports various metrics over time with snazzy charts and data points. Quality management dashboards aren’t new and frankly, from my experience, it appears few teams actually use them (and believe me when I say even fewer teams will buy them!); nevertheless, this is another attempt at capturing a user base. Currently, Sonar requires the project you wish to monitor be built using Maven 2, which is bound to scare off a large portion of the population; however, I’m betting eventually they’ll provide some Ant targets to facilitate adoption. They’ve even provided a Hudson plug-in!

All in all, code metrics are what you make of them — they can be useful and they can also be meaningless. Applying them in simple and innovative ways can unveil a whole new world of meaning and ultimately help you manage and maintain a code base more effectively.

Code coverage coterie confab

Every once in a while the topic of code coverage surfaces, which invariably leads to a number of interesting views and comments. Recently, my friend Meera Subbarao mused about the subject by rhetorically questioning “Is Code Coverage Important?”

As Meera points out, code coverage unveils a number of interesting aspects about code including:

how effective our tests are, what parts of our source code are thoroughly executed.

Yet, the most important facet of code coverage is that which isn’t measured; that is:

You can also look at the code coverage report and find out specific areas of code which are not exercised by our tests.

And therein lies the most telling metric that a code coverage report can convey: that which isn’t tested.

Interestingly, Meera’s article generated quite a lot of comments, which are all excellent reads as various members of the code coverage coterie weigh in, including Rasmus Grouleff, who notes:

You can achieve 100% coverage without writing a single assert, so coverage as a measure of code quality isn’t really all that good, unless you factor in some sort of measure of how good the tests are along with different kinds of metrics such as cyclomatic complexity, lines of code per method and so on.

Which, of course, is exactly why the numbers code coverage by itself provides are essentially useless– the real value is in what a code coverage report doesn’t divulge– that which isn’t tested!
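Rasmus's point is easy to demonstrate. The following is a small Java sketch with hypothetical names and no test framework: both "tests" execute the same buggy line, so a coverage tool would count it as covered either way, but only the one that compares the result notices the defect.

```java
final class CoverageWithoutAsserts {
    /** Deliberately buggy: subtracts instead of adding. */
    static int add(int a, int b) {
        return a - b;
    }

    /** Executes add() but checks nothing, so it "passes" and still counts as coverage. */
    static boolean testWithoutAssert() {
        add(2, 3);
        return true;
    }

    /** Executes the same line but compares the result, so the defect surfaces. */
    static boolean testWithAssert() {
        return add(2, 3) == 5;
    }

    public static void main(String[] args) {
        System.out.println("no-assert test passes: " + testWithoutAssert());
        System.out.println("asserting test passes: " + testWithAssert());
    }
}
```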

Interestingly, npellow asserts that:

I would rather have a project with low code coverage, and high quality tests, than one with high coverage and low quality tests. Since coverage is a negative metric, the uncovered code in a coverage report actually holds more information than the covered lines. ie if a statement is uncovered, you can be certain it is not tested. The same is not true for a covered line.

Indeed, and even more dangerous is the fact that a hip conditional, while actually “touched” still isn’t proven to be correct– especially if that conditional has a short-circuit operation in it!

Jean-Francois adroitly pointed out that:

having specific unit tests providing 100% coverage of Java Beans methods is generally a total waste of time. Whereas complex methods (with business rules) should call for extra testing efforts.

Right on, baby! Hence, metrics alone are usually boring– it is the combination of various metrics that imparts real meaning.

And weighing in on the value of percent coverage numbers, my good friend Alex Ruiz remarks:

I use code coverage tools as a reference only. I don’t have a specific percentage in mind. What I do is look at the least tested code, and depending on its complexity or how critical that code is, I decide to add a test or not. In another words, I don’t aim for a specific number blindly. I use coverage only to analyze where tests are needed.

You got it, man! Indeed, the confab in this article alone is a wealth of great information regarding code coverage. Once you’ve pondered this article and its manifold comments, take a peek at my friend Jason Rudolph’s blog write-up entitled “Testing Anti-Patterns Potpourri – Quotes, Resources, and Collective Wisdom” — this is a veritable Holy Writ on all things related to where we stand as an industry on testing and related metrics.

The subject of code coverage pops up from time to time– as to whether or not it is important is a matter of opinion; suffice to say, what’s most important is what a coverage report doesn’t tell you directly. As I wrote about in IBM developerWorks’ “Don’t be fooled by the coverage report”, the value of a code coverage report is in exposing

code that hasn’t been tested, on a micro level and on a macro level. You can facilitate deeper coverage testing by analyzing your code base from the top level as well as analyzing individual class coverage. Once you’ve integrated this principle you and your organization can use coverage measurement tools where they really count, such as to estimate the time needed for a project, continuously monitor code quality, and facilitate QA collaboration.

Can you dig it, man?

Irish Continuous Integration gabfest

On June 10th, I’ll be giving a hip tutorial on CI at the International Conference on Agile Processes and eXtreme Programming in Software Engineering.

The tutorial will walk students (with or without hangovers) through a series of exercises on a project where an automated build system is created that facilitates compilation, testing, inspection, and deployment. This copasetic build system will then be plugged into a CI server (Hudson in this case, baby) and students will code a series of features using Agile techniques like developer testing, which will ultimately demonstrate how a Continuous Integration process reduces risk and improves software quality. Students will then toast to CI over yet another pint of Guinness for lunch!

All you need for this tutorial is a laptop with Java installed (Java 1.5, please)– I will provide everything else (Hudson, Ant, required libraries, etc) except beer.

If you are in Ireland (or just feel like going there) the week of June 9th (and you have a high tolerance for alcohol), then you’ll want to come to the International Conference on Agile Processes and eXtreme Programming in Software Engineering, baby! Drop me a line if you are attending (or are located in Ireland)– if you are up for a round of Guinness, even better!

Unambiguously analyzing metrics

Software metrics are objective measurements of particular aspects of code– for instance, Cyclomatic complexity measures complexity without any regard for why code contains a certain number of paths. For metrics to be useful, baby, they must be applied subjectively. In the case of complexity, there may be circumstances that warrant such code (although, I’ve yet to find complex code that still can’t benefit from refactoring). I’ve also found that, on the whole, metrics are more copasetic when combined with other metrics and trended– for instance, complexity alone is somewhat interesting, but pairing complexity with code coverage paints a much more detailed metric that bears understanding. High complexity with low coverage is clearly more risky than the same complexity with high code coverage– even the CRAP metric holds this relationship.
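For the curious, here is a hedged Java sketch of the CRAP formula as I understand it from the Crap4j authors: complexity squared, times the uncovered fraction cubed, plus complexity. The class name is hypothetical, and the formula is quoted from memory rather than from this post, so treat it as an illustration of the complexity-versus-coverage relationship rather than a reference implementation.

```java
final class CrapScore {
    /**
     * CRAP(m) = comp(m)^2 * (1 - cov(m)/100)^3 + comp(m)
     * where comp is cyclomatic complexity and cov is coverage in percent.
     */
    static double crap(int complexity, double coveragePercent) {
        double uncovered = 1.0 - coveragePercent / 100.0;
        return complexity * complexity * Math.pow(uncovered, 3) + complexity;
    }

    public static void main(String[] args) {
        // Same complexity, opposite ends of the coverage scale.
        System.out.println("CCN 10, 0% covered:   " + crap(10, 0));
        System.out.println("CCN 10, 100% covered: " + crap(10, 100));
    }
}
```

With full coverage the score collapses to the complexity itself; with none, the squared term dominates, which is the risk relationship described above.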

One particular hip metric that I find helpful is the ratio of copy and pasted code within a code base as unknown copy and pasted code will haunt you, man. For instance, copy and pasted code replicates bugs and poorly coded algorithms to name a few nefarious aspects; consequently, understanding what code has been replicated can help teams refactor offending code. Having run various copy and paste analyzers on more code bases than I care to admit, (and because it’s my bag) I’ve found that all code bases have a certain level of offending code that triggers a copy and paste detection. One particular tool, CPD, is nice enough to create a report containing the offending code like so:

<duplication lines="7" tokens="53">
 <file line="36" path="cbd4/blackjack/src/com/stelligent/blackjack/Hand.java"/>
 <file line="42" path="cbd4/blackjack/src/com/stelligent/blackjack/Hand.java"/>
 <file line="48" path="cbd4/blackjack/src/com/stelligent/blackjack/Hand.java"/>
 <codefragment>
 <![CDATA[
    } else if (first.equals("Jack")) {
      if(!second.equals("King") && !second.equals("Queen") && !second.equals("Jack")){
         return Integer.parseInt(second) + 10;
     } else {
      return 20;
    }
   }else if(first.equals("9")){
]]>
</codefragment>
</duplication>

As you can see, CPD reports the total number of lines of copy and pasted code and where that bogue code can be found. This data is certainly helpful; however, it doesn’t paint the entire story– while 7 lines doesn’t seem like all that much code, you’d probably reconsider if it were 7 lines of code in a 30 line code base or more realistically– 700 lines in a few thousand line code base. Therein lies the catch– CPD’s data is really only helpful when viewed on the whole (or a ratio– that is, total lines of copy and pasted code over total lines of code). Unfortunately, CPD doesn’t report the total lines of code scanned– only the total lines of copy and pasted code. For instance, in this sample code base, there were 9 suspected copy and pasted code fragments totaling about 120 lines of code (or CPLOC).

Luckily, there’s another handy tool which reports the total lines of code (or LOC) in a code base– JavaNCSS. Running JavaNCSS yielded a value of about 610 LOC; therefore the ratio of copy and pasted code is CPLOC/LOC or 120/610, which is roughly 20%.

20% CPLOC is probably a bad thing– at a minimum it is worth knowing about. 20% today might not be too important to know, but knowing that it increased to 25% next week would be an indication that things are degrading– likewise, seeing a value decrease over time indicates the code base is actively being improved. Yet, how can teams possibly monitor this trippin’ data?

Reports are hip, but in truth, reports by tools like CPD are essentially read once– the first time they are generated. After that, it’s anyone’s guess when someone will actively read the report again. Hence, I find it particularly helpful to essentially throw the report out and let the build itself proactively tell me when a particular metric gets out of hand. This essentially means that my build has to monitor a particular metric– and in the case of the CPLOC ratio, my build has to gather data from two sources– JavaNCSS’s report and CPD’s.

Fortunately, this is easy with Groovy– if, for instance, you are using Ant for builds, you can first generate the two reports as follows:

<target name="cpd">
 <mkdir dir="target/reports"/>
 <taskdef name="cpd" classname="net.sourceforge.pmd.cpd.CPDTask"
   classpathref="classpath"/>
 <cpd minimumTokenCount="10" outputFile="target/reports/cpd.xml" format="xml">
  <fileset dir="src">
   <include name="**/*.java"/>
  </fileset>
 </cpd>
</target>

The code above generates a CPD XML report from all the code in a src directory and the following code creates a JavaNCSS report from the same code base:

<target name="javancss">
 <taskdef name="javancss" classname="javancss.JavancssAntTask"
   classpathref="classpath" />
 <javancss srcdir="src" generateReport="true"
   abortOnFail="true" ccnPerFuncMax="100"
   outputfile="target/reports/javancss_metrics.xml" format="xml" />
</target>

The only high-level step left to do is to put the two metrics together; however, this step actually takes a few sub-steps, baby. For instance, obtaining the total lines of CPLOC requires iterating over a collection of duplication elements in the CPD xml file. Consequently, the following steps detail the effort required to obtain this metric:

  • parse the JavaNCSS xml report and obtain the total LOC
  • parse the CPD xml report and obtain the total CPLOC
  • divide the two and compare the result to some threshold
  • if the threshold is exceeded, fail the build

Groovy, by the way, is particularly well suited for such a task (as if you didn’t know that, man?)– parsing XML with Groovy is practically effortless– like disco dancing, eh? For instance, obtaining the total LOC from JavaNCSS’s xml file is as easy as

int ncss = Integer.parseInt(jncssroot.packages.total.ncss.text())

Note, I’m coercing integer values as I’d like to divide (and round) my result– if I don’t explicitly specify ints, I’ll be left with String division, which doesn’t work so well.

Parsing CPD’s xml document is slightly more complex– slightly in that it takes 3 times as much code:

def cpdtot = 0
cpdroot.duplication.each { elem ->
 cpdtot += Integer.parseInt(elem.@lines.text())
}

Again, parsing an XML document yields String values; accordingly, I need to use Integer’s parseInt method.

Next, all I need to do is divide the two and, in my case, I’m aggressively rounding up via Java’s ceiling call as follows:

def ratio = Math.ceil((cpdtot / ncss) * 100)

Multiplying the result by 100 gives me a percentage value, of course, and lastly, I compare that to a threshold value:

if(ratio > Double.parseDouble(properties.cpd_threshold)){
  ant.fail(message:
   "cut and paste ratio was greater than ${properties.cpd_threshold}%, it was ${ratio}%")
}

Puttin’ it all together, baby, yields a groovy Ant script with a hip target:

<target name="cpd-threshold" depends="metrics">
 <groovy>
 def jncssroot = new XmlSlurper().parse("target/reports/javancss_metrics.xml")
 int ncss = Integer.parseInt(jncssroot.packages.total.ncss.text())

 def cpdroot = new XmlSlurper().parse("target/reports/cpd.xml")

 def cpdtot = 0
 cpdroot.duplication.each { elem ->
  cpdtot += Integer.parseInt(elem.@lines.text())
 }

 def ratio = Math.ceil((cpdtot / ncss) * 100)

 if(ratio > Double.parseDouble(properties.cpd_threshold)){
  ant.fail(message:
   "cut and paste ratio was greater than ${properties.cpd_threshold}%, it was ${ratio}%")
 }
 </groovy>
</target>
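If Groovy isn't available in your build, the same calculation can be sketched in plain Java with the JDK's own XML APIs. This is a rough equivalent under stated assumptions: the class name is hypothetical, the sample XML strings are trimmed-down stand-ins whose root element names are assumed, and only the paths the Groovy code already walks (packages/total/ncss and the lines attribute of each duplication element) are relied upon.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class CopyPasteRatio {
    /** Parses an XML string; checked exceptions are wrapped to keep callers simple. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("unreadable report", e);
        }
    }

    /** Total LOC: the text of packages/total/ncss, the same path the Groovy code walks. */
    static int totalNcss(Document javancss) {
        try {
            String text = XPathFactory.newInstance().newXPath()
                    .evaluate("//packages/total/ncss", javancss);
            return Integer.parseInt(text.trim());
        } catch (Exception e) {
            throw new IllegalArgumentException("no total ncss found", e);
        }
    }

    /** Total CPLOC: the sum of the lines attribute over every duplication element. */
    static int totalCploc(Document cpd) {
        NodeList dups = cpd.getElementsByTagName("duplication");
        int total = 0;
        for (int i = 0; i < dups.getLength(); i++) {
            total += Integer.parseInt(((Element) dups.item(i)).getAttribute("lines"));
        }
        return total;
    }

    /** CPLOC/LOC as a percentage, rounded up like the Math.ceil call in the Groovy version. */
    static double ratioPercent(String cpdXml, String javancssXml) {
        return Math.ceil(100.0 * totalCploc(parse(cpdXml)) / totalNcss(parse(javancssXml)));
    }

    public static void main(String[] args) {
        // Hypothetical, trimmed-down reports; real ones carry many more elements.
        String cpd = "<pmd-cpd><duplication lines=\"70\"/><duplication lines=\"50\"/></pmd-cpd>";
        String ncss = "<javancss><packages><total><ncss>610</ncss></total></packages></javancss>";
        double ratio = ratioPercent(cpd, ncss);
        double threshold = 15.0; // hypothetical value standing in for cpd_threshold
        if (ratio > threshold) {
            System.out.println("cut and paste ratio was " + ratio + "%, over " + threshold + "%");
        }
    }
}
```

In a real build you would read the two report files and fail the build instead of printing; the Groovy target above remains the shorter route.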

Reports are hip, but they are usually only read once– the first time they are generated. Rather than waiting to find out that there’s a problem, proactively analyzing a hip metric (such as CPLOC/LOC) enables rapid feedback and rapid corrections– is that unambiguous or what?

Chewing the fat over cyclomatic complexity

The hip folks at Enerjy talked with a copasetic crowd recently asking their thoughts on cyclomatic complexity. It seems most (including this disco dancer) find that CCN is an excellent indicator of risk– I haven’t found a better metric yet– in fact, a recent addition to the hip metrics crowd, dubbed C.R.A.P, which was donated by the folks at Agitar, even builds upon CCN by combining code coverage. What do you think, man?

This podcast doesn’t stink, man

I recently had a copasetic conversation with Alberto Savoia regarding the hip CRAP metric– my parents would be appalled with our language (I think the word in question is used at least 135 times); however, we had a good time discussing the efficacy of the metric, its future, and of course, its malodorously applied name.

In short, the CRAP metric effectively marries code coverage with cyclomatic complexity in an effort to delineate risk associated with change. Have a listen (courtesy of JavaWorld) and stay tuned for more podcasts, baby!

Solving the code coverage dilemma with Emma

Because it’s my bag, I pointed out recently that copasetic coverage tools (like Cobertura) can inadvertently hide defects by reporting specific lines of code as covered. But, while I often use Cobertura in my examples, I have found that Emma is fairly smart in its reporting of code coverage values. As such, I often find myself running both hip tools for projects.

For example, the same branchIt method from my previous posting is displayed slightly differently in Emma as shown below. Specifically, note line 10– it’s colored yellow in an attempt to show that not all conditions of the conditional were executed.


While Emma probably reports coverage more accurately (note, it reports block coverage), I still find Cobertura’s reports more aesthetically pleasing. In truth, I’m a firm believer that coverage reports are more effective at telling you what’s not covered; accordingly, both tools are quite accurate in this regard. Can you dig it, man?

Short-circuiting code coverage

As I’ve written about before, code coverage numbers can be misleading. 100% line coverage and 100% branch coverage don’t necessarily mean your code is defect free– all they mean is that the code was touched. In fact, code coverage values are far more effective at telling you what’s not covered by a test, man.

There are actually quite a few hip different ways that coverage values can be misleading. One particular coding construct can easily mislead the unsuspecting eye: the short-circuit operator. These operators, such as the short-circuit AND (&&) and OR (||) are quite handy in conditionals. For example, here’s a fictional code snippet with a copasetic OR short-circuit operator in the first conditional.

public void branchIt(int value){
 if((value > 100) || (HiddenObject.doWork() == 0)){
  this.doIt();
 }else{
  this.doItAgain();
 }
}  

In the snippet above, both the doIt and the doItAgain methods aren’t particularly important– what’s key here is the second part of the if conditional. Let’s imagine HiddenObject is a 3rd party API call, an object in another package, etc– the point being, you don’t necessarily know how doWork does its work; you just know (because it’s its bag) that a return value of 0 is a valid condition.

It just so turns out that the doWork method on the HiddenObject class isn’t perfect– it can throw an exception. I’ll force one, but the scenario demonstrates a trippin’ point.

public static int doWork(){
 throw new RuntimeException("surprise!");
}

Thus far we know that if the doWork method is executed an exception will be thrown. Imagine I don’t know that though. Let me write a quick test (via JUnit) to make sure things are working.

public final void testBranchIt() {
 AnotherBranchCoverage clzzUnderTst = new AnotherBranchCoverage();
 clzzUnderTst.branchIt(101);
}

When I run this test, things work fine. I can even check out the coverage values as reported by Cobertura.

Not bad– I’ve got reasonable line coverage here and 100% branch coverage. It turns out the 100% value for branch coverage is a slight defect within Cobertura, but regardless, I’ve got copasetic coverage, don’t I? Check it out, man, the if statement was touched! Because of the short-circuit OR, I triggered a true condition via the 101 value; accordingly, the second clause was short-circuited. The 75% line coverage makes sense too– I failed to execute the else block, hence the other method within the class wasn’t touched and the line coverage value was accordingly deducted.

As you can see, coverage reports are hip in ascertaining what’s not tested (which, in this case, would practically force me to execute the 2nd condition in the if) but don’t depend too heavily on them telling you what’s tested. Otherwise you could end up short-circuiting yourself into a false sense of security…or is that coverage?
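To see what forcing the second condition would look like, here is a self-contained Java sketch of the same scenario. It is a simplification under my own assumptions: the class name and the throwsFor helper are hypothetical, branchIt returns a label instead of calling doIt and doItAgain, and no JUnit is involved.

```java
final class ShortCircuitDemo {
    static final class HiddenObject {
        /** Stand-in for the imperfect third-party call: it always throws. */
        static int doWork() {
            throw new RuntimeException("surprise!");
        }
    }

    /** Same shape as branchIt: the right-hand clause only runs when value <= 100. */
    static String branchIt(int value) {
        if ((value > 100) || (HiddenObject.doWork() == 0)) {
            return "doIt";
        }
        return "doItAgain";
    }

    /** Returns true when calling branchIt with the given value throws. */
    static boolean throwsFor(int value) {
        try {
            branchIt(value);
            return false;
        } catch (RuntimeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("value 101 throws: " + throwsFor(101));
        System.out.println("value 100 throws: " + throwsFor(100));
    }
}
```

The 101 case touches the if statement and sails through; only a value at or below 100 reaches doWork and exposes the exception.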

Sketching complexity with Groovy

Not long ago, I wrote up a nifty dashboarding application, à la Groovy, in an effort to abate the visual pain associated with report overload syndrome. These kinds of applications are perfect for languages like Groovy as you can knock them out in a matter of hours (including a test suite to verify kosherness).

One of the applications which adds to an increase in the number of reports one must digest is JavaNCSS. This disco tool analyzes a code base and reports everything relating to code length, including class sizes, method sizes, and the number of methods found in a class. What’s more, JavaNCSS reports method complexities, Cyclomatic style.

While all this data is helpful in some scenarios, it is probably too much to digest at a glance, man. For example, the report generated via Ant is shown to the right. That’s a lot of data, don’t you think?

Interestingly, one scenario came to mind that the data from JavaNCSS could provide– what is the distribution of complexity across the entire code base? Because it’s my bag, this actually can be helpful in understanding, at a high level, what’s going on. A “healthy” code base would not have a high degree of highly complex methods; conversely, a highly complex code base would have a high number of complex methods.

Yet, the default report generated via Ant doesn’t really tell this story– while the data is there, it doesn’t stand out. What’s needed is a visual guide that quickly demonstrates the distribution of complexity– in this case, something like a bar chart would work just fine.

Before a chart can be generated, however, the data needs to be mined. This is where Groovy comes in. Via its copasetic built-in XML parsing, I can quickly get all the data I need to understand the distribution of complexity. For example, I was interested in five different ranges of complexity:

  • Methods with a CCN of 1
  • Methods with a CCN between 2 and 5
  • Methods with a CCN between 6 and 10
  • Methods with a CCN over 10 but less than or equal to 20
  • Everything greater than 20

These ranges could, of course, vary depending on what you’re trying to understand. In my case, I wanted to get a feel for the number of really simple methods (like getters and setters) and constructors (which JavaNCSS treats just like normal methods by reporting CCN). A healthy code base, in the Age of Aquarius, would probably have the majority of its methods falling within the first two buckets and hopefully not have any data in the last two. The middle one may have a few methods here and there.

Obtaining the method complexity distribution in a JavaNCSS XML report via Groovy is absurdly simple as this code snippet shows:

def range1 = 2..5
def range2 = 6..10
def range3 = 11..20

this.doc.functions.function.each{ mthd ->
 int ccn = Integer.parseInt(mthd.ccn.text())

 if(ccn == 1){
  ones << mthd.name.text()
 }else if (range1.contains(ccn)){
  low << mthd.name.text()
 }else if(range2.contains(ccn)){
  medium << mthd.name.text()
 }else if(range3.contains(ccn)){
  midMax << mthd.name.text()
 }else{
  max << mthd.name.text()
 }
}

Using Groovy’s range feature, I can define some simple ranges that correspond with my desired distribution. Plus, in a closure that iterates over each function element in the JavaNCSS XML document, I can obtain the ccn element’s value and place it within the proper collection via the handy << syntax.
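For comparison, the same bucketing can be sketched in plain Java without ranges. The class name and the bucket labels are hypothetical, and the sketch starts from a list of CCN values rather than parsing the JavaNCSS report; it assumes every CCN is at least 1.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ComplexityBuckets {
    /** Maps a CCN value (assumed >= 1) to one of the five ranges listed above. */
    static String bucket(int ccn) {
        if (ccn == 1) return "1";
        if (ccn <= 5) return "2-5";
        if (ccn <= 10) return "6-10";
        if (ccn <= 20) return "11-20";
        return ">20";
    }

    /** Counts how many methods land in each range, keeping the ranges in order. */
    static Map<String, Integer> distribution(List<Integer> ccns) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String label : List.of("1", "2-5", "6-10", "11-20", ">20")) {
            counts.put(label, 0);
        }
        for (int ccn : ccns) {
            counts.merge(bucket(ccn), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical CCN values for six methods.
        System.out.println(distribution(List.of(1, 1, 3, 7, 12, 25)));
    }
}
```

Seeding the map with every label keeps empty buckets visible, which matters once the counts feed a bar chart.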

With this hip data, I can then feed it to a charting utility, like JFreeChart, and generate a bar chart like so:

dataset.addValue(val1 * 100, disdta[0].name, disdta[0].name)
dataset.addValue(val2 * 100, disdta[1].name, disdta[1].name)
dataset.addValue(val3 * 100, disdta[2].name, disdta[2].name)
dataset.addValue(val4 * 100, disdta[3].name, disdta[3].name)
dataset.addValue(val5 * 100, disdta[4].name, disdta[4].name)

def chart = ChartFactory.createBarChart(
 "Complexity Distribution",
 "CCN Range",
 "% Total Methods",
 dataset,
 PlotOrientation.VERTICAL,
 false,
 false,
 false
)

In this snippet the val variables above are each multiplied by 100 to create a percentage value (e.g., 66%). In earlier code, the total number of methods is obtained (e.g., 22,334) and each collection’s size is divided by this number to create a ratio (e.g., 543/22,334). What’s more, after creating the chart instance, you can customize various aspects of the chart, like its colors. For example, to change the five bars’ colors, you can obtain a BarRenderer instance from the chart’s plot instance and set the series color as follows:

def rendr = plot.getRenderer()

rendr.setSeriesPaint(0, new Color(102,205,000))
rendr.setSeriesPaint(1, new Color(000,100,000))
rendr.setSeriesPaint(2, new Color(255,215,000))
rendr.setSeriesPaint(3, new Color(255,140,000))
rendr.setSeriesPaint(4, new Color(139,000,000))

Lastly, you can save the chart instance to a file too like so:

ChartUtilities.saveChartAsPNG(
  new File("C:\\dev\\projects\\acme\\target\\mc.png"),
    chart, 375, 200)                            

The resulting bar chart is shown to the right and displays the distribution of complexity across all methods within a code base. In this case, this code base has roughly 55% of its methods with a CCN of 1, man. One could infer that there are a lot of smokin’ JavaBean style classes, which in this case is true. A small portion of methods, unfortunately, do have some high complexity values, which does cause some concern.

Of course, this is only a partial picture, right? This bar chart doesn’t tell me anything about the associated coverage of those complex methods and it’s only a snapshot in time, man– tomorrow, if this utility is run and the far right bar grew, you’d know that things are getting worse.

As you can see, Groovy is an excellent choice for generating simple reports as you can knock them out in a snap. Plus, by building intelligent charts, you can further help save people from report overload syndrome. Dig it?

Curtail complexity with a rules engine

Complexity can manifest itself within a software application in a number of hip ways, including dependency management (i.e. 3rd party libraries required for runtime, etc), architectural adherence patterns (think old style EJBs), and even coding constructs (in particular, excessive use of conditionals). When it comes to coding constructs, the resulting complexity is often related to the problem being solved. For example, imagine a recommendation wizard for sales associates selling hip disco LPs. Quite simple, right? You have two groovy choices– anything from Donna Summer or the Bee Gees. If only life were this easy, eh?

Now imagine a recommendation wizard for smokin’ sales associates trying to move beer. Now that’s more real, isn’t it? Imagine the store is trying to move (i.e. sell to customers) seven different types of beers– all varying in taste and characteristics. The store wants to develop an application that walks someone through a series of questions and based upon their answers, will recommend one of seven beers. Think of this application as an expert beer system– and while it may start with only seven beers, over time more beers will be added, especially if the system proves itself to move beers efficiently. What’s more, you the developer aren’t a beer aficionado (i.e. a domain expert on beers)– your job is to make an application that beer experts can modify so customers can pick beers more easily.

Logically, you can build a hip beer expert system with a couple of conditionals– if you like this characteristic, then you should buy this beer, right? In pseudocode, your logic could look like this (after you’ve had a beer session with the beer experts):

Do you like a light beer or a dark beer?
 if light beer:
  Do you like crisp, smooth beers or prefer a more hoppy one?
   if crisp:
    then Pils
   else:
    Do you like light hops or more aggressive hops?
      if light:
       then Pale Ale
      else:
       IPA
else:
  Do you like the taste of coffee?
   if yes:
     Chocolate Stout
   else:
     Do you like spiciness?
       if yes:
         try Winter Ale
       else:
          Do you like high alcohol content?
            if yes:
              try an Eisbock
            else:
              try a Lager

This particular block of code (which enables one to pick one of seven beers and is by no means an accurate expert system), if isolated in a method, would have a cyclomatic complexity of at least 13, which presents a challenge– methods over 10, with conditional nesting, are havens for defects, especially if this code changes often. What if next week, the Pilsner brand is sold out? You’ll need to modify the logic to select perhaps another type of beer. In fact, the logic may not be as easy as replacing the Pilsner with another neat-o beer– it may involve a new series of questions.
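To get a feel for how this reads as real code, here is a direct Java translation of the pseudocode. It is only a sketch: the class name is hypothetical, and reducing each question to a boolean parameter is my simplification, not how a real wizard would gather answers.

```java
final class BeerWizard {
    /** Direct translation of the pseudocode; each boolean is the answer to one question. */
    static String recommend(boolean light, boolean crisp, boolean lightHops,
                            boolean coffee, boolean spicy, boolean highAlcohol) {
        if (light) {
            if (crisp) {
                return "Pils";
            } else if (lightHops) {
                return "Pale Ale";
            } else {
                return "IPA";
            }
        } else {
            if (coffee) {
                return "Chocolate Stout";
            } else if (spicy) {
                return "Winter Ale";
            } else if (highAlcohol) {
                return "Eisbock";
            } else {
                return "Lager";
            }
        }
    }

    public static void main(String[] args) {
        // A light-beer drinker who wants aggressive hops.
        System.out.println(recommend(true, false, false, false, false, false));
    }
}
```

Even in this tidy form, swapping out one beer or adding a question means editing and retesting the whole method, which is the maintenance burden described above.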

It turns out that in these scenarios, a rules engine may actually be beneficial– in fact, rules engines (or expert systems) are well suited to replace excessive if, else, switch logic, especially if that logic is the domain of non-technical experts (in the case above, the beer experts haven’t a clue about coding nor hygiene, for that matter).

Using a rules engine, however, requires you to flatten business logic somewhat; in fact, in the copasetic beer expert system above, it requires you to focus on particular goals (i.e. moving a particular beer brand) and work backwards from that. For example, if I want to move an IPA, the attributes are:

  • Likes a light beer as opposed to a dark one
  • Prefers a hoppy taste
    • And tends to like a more aggressively hopped one too

Keep in mind, that in a real expert system for making recommendations, the number of attributes would most likely be greater. Based on the attributes of beer elaborated in the pseudocode above, however, I can group them into three categories, which I’ll designate as Java 5 enums:

public enum Color {
  LIGHT, DARK
}
public enum Taste {
  CRISP, HOPPY, AGGRESSIVE_HOPS,
  LIGHT_HOPS, COFFEE, SPICY, MALTY
}
public enum ABV {
  HIGH_ALCOHOL, NORMAL_ALCOHOL,
  LIGHT_ALCOHOL, NO_ALCOHOL
}

These enumerations will live inside of a BeerPreference object, which holds a Color, a Collection of Tastes, and an ABV:

public class BeerPreference {
 private Color color;
 private Collection<Taste> tastes;
 private ABV abv;
 //...
}

The class will also hold a recommendedBeer property, which the rules engine will appropriately set based upon the other attributes’ values:

private String recommendedBeer;

public String getRecommendedBeer() {
  return recommendedBeer;
}
public void setRecommendedBeer(String recommendedBeer) {
 this.recommendedBeer = recommendedBeer;
} 
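Pulling the fragments above together, a minimal sketch of the whole class might look like the following. The addTastePreference method shows up later in the test case; its body here, the EnumSet backing the tastes, and the read-only getTastes accessor are assumptions made for illustration, not the original source. The types are package-private only so that the sketch fits in a single file.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.EnumSet;

enum Color { LIGHT, DARK }

enum Taste { CRISP, HOPPY, AGGRESSIVE_HOPS, LIGHT_HOPS, COFFEE, SPICY, MALTY }

enum ABV { HIGH_ALCOHOL, NORMAL_ALCOHOL, LIGHT_ALCOHOL, NO_ALCOHOL }

class BeerPreference {
    private Color color;
    // Assumption: an EnumSet, so adding the same taste twice is harmless.
    private final Collection<Taste> tastes = EnumSet.noneOf(Taste.class);
    private ABV abv;
    private String recommendedBeer; // meant to be set by the rules engine

    public Color getColor() { return color; }
    public void setColor(Color color) { this.color = color; }

    public ABV getAbv() { return abv; }
    public void setAbv(ABV abv) { this.abv = abv; }

    // Read-only view, so callers must go through addTastePreference.
    public Collection<Taste> getTastes() {
        return Collections.unmodifiableCollection(tastes);
    }

    // Assumed body; only the method name appears in the test case.
    public void addTastePreference(Taste taste) { tastes.add(taste); }

    public String getRecommendedBeer() { return recommendedBeer; }
    public void setRecommendedBeer(String recommendedBeer) {
        this.recommendedBeer = recommendedBeer;
    }
}
```

The getters matter because a rules engine typically reads bean properties through them, even though the rule itself refers to the bare property name.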

In my case, I’ll use Drools, which is an excellent open source expert system, to define my rules. For example, below is a tripped out rule for determining if the choices present mean a person should try out an IPA.

rule "Mendocino White Hawk IPA Rule"
 when
   $beer: BeerPreference(color == Color.LIGHT,
   tastes contains Taste.HOPPY,
   tastes contains Taste.AGGRESSIVE_HOPS,
   tastes excludes Taste.SPICY,
   tastes excludes Taste.COFFEE)
 then
   $beer.setRecommendedBeer("Mendocino White Hawk IPA");
end

Note that the copasetic rules syntax isn’t too hard to pick up– it’s quite logical: if the BeerPreference’s color property is light and the collection of Tastes includes Taste.HOPPY and Taste.AGGRESSIVE_HOPS and contains neither Taste.SPICY nor Taste.COFFEE, then the rule engine will take the BeerPreference instance (which is bound to $beer) and set the recommended beer to "Mendocino White Hawk IPA" (which, by the way, is an excellent beer). Drools’ rules syntax is simple– object attributes are referenced by their property name rather than by a getter method, logical ands are denoted by commas, and variables are bound via the : operator.
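To make the rule’s when clause concrete, here is roughly the same condition written as plain Java. It is only a reading aid, not a description of how Drools evaluates rules internally; the IpaRuleSketch class and its matches method are hypothetical names, and the enums are repeated so the sketch compiles on its own.

```java
import java.util.EnumSet;
import java.util.Set;

final class IpaRuleSketch {
    enum Color { LIGHT, DARK }
    enum Taste { CRISP, HOPPY, AGGRESSIVE_HOPS, LIGHT_HOPS, COFFEE, SPICY, MALTY }

    // Each comma in the rule's pattern becomes a logical AND here.
    static boolean matches(Color color, Set<Taste> tastes) {
        return color == Color.LIGHT                    // color == Color.LIGHT
            && tastes.contains(Taste.HOPPY)            // tastes contains Taste.HOPPY
            && tastes.contains(Taste.AGGRESSIVE_HOPS)  // tastes contains Taste.AGGRESSIVE_HOPS
            && !tastes.contains(Taste.SPICY)           // tastes excludes Taste.SPICY
            && !tastes.contains(Taste.COFFEE);         // tastes excludes Taste.COFFEE
    }

    public static void main(String[] args) {
        Set<Taste> hopHead = EnumSet.of(Taste.HOPPY, Taste.AGGRESSIVE_HOPS);
        System.out.println(matches(Color.LIGHT, hopHead)); // true
        System.out.println(matches(Color.DARK, hopHead));  // false
    }
}
```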

Testing rules is most easily done via table-based frameworks like Fit. Writing tests via JUnit or TestNG, while possible, can become laborious due to the number of combinations one must code. Nevertheless, I can code a simple sunny-day scenario test case via JUnit to demonstrate Drools in action.
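The table-based idea can be approximated in plain Java by listing each combination as a row and looping over the rows. The sketch below is not Fit, and it doesn’t touch Drools: recommend is a hypothetical stand-in for the rules engine, the Taste enum is trimmed for brevity, and the dark-plus-coffee pairing is purely illustrative.

```java
import java.util.EnumSet;
import java.util.Objects;
import java.util.Set;

final class RuleTableSketch {
    enum Color { LIGHT, DARK }
    enum Taste { HOPPY, AGGRESSIVE_HOPS, COFFEE, SPICY } // trimmed for brevity

    // Hypothetical stand-in for the rules engine; returns null when no rule matches.
    static String recommend(Color color, Set<Taste> tastes) {
        if (color == Color.LIGHT
                && tastes.contains(Taste.HOPPY)
                && tastes.contains(Taste.AGGRESSIVE_HOPS)
                && !tastes.contains(Taste.SPICY)
                && !tastes.contains(Taste.COFFEE)) {
            return "Mendocino White Hawk IPA";
        }
        if (color == Color.DARK && tastes.contains(Taste.COFFEE)) {
            return "Chocolate Stout";
        }
        return null;
    }

    // One table row: the inputs plus the expected recommendation.
    static final class Row {
        final Color color;
        final Set<Taste> tastes;
        final String expected;

        Row(Color color, Set<Taste> tastes, String expected) {
            this.color = color;
            this.tastes = tastes;
            this.expected = expected;
        }
    }

    static final Row[] TABLE = {
        new Row(Color.LIGHT, EnumSet.of(Taste.HOPPY, Taste.AGGRESSIVE_HOPS), "Mendocino White Hawk IPA"),
        new Row(Color.LIGHT, EnumSet.of(Taste.HOPPY, Taste.AGGRESSIVE_HOPS, Taste.SPICY), null),
        new Row(Color.DARK, EnumSet.of(Taste.COFFEE), "Chocolate Stout"),
        new Row(Color.DARK, EnumSet.noneOf(Taste.class), null),
    };

    // Runs every row and returns how many produced an unexpected result.
    static int countFailures() {
        int failures = 0;
        for (Row row : TABLE) {
            String actual = recommend(row.color, row.tastes);
            if (!Objects.equals(row.expected, actual)) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("failing rows: " + countFailures());
    }
}
```

Adding a combination then means adding a row rather than writing another test method, which is the appeal of the table-based approach.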

First, I must initialize Drools, which involves loading my rules (found in the file beer-guide.drl) and adding them to a Drools RuleBase like so:

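import java.io.InputStreamReader;
import java.io.Reader;

// Drools imports assume the 3.x-era package layout
import org.drools.RuleBase;
import org.drools.RuleBaseFactory;
import org.drools.compiler.PackageBuilder;
import org.drools.rule.Package;
import org.junit.BeforeClass;

public class BeerPreferenceTest {
 private static RuleBase ruleBase;

 @BeforeClass
 public static void setUpBeforeClass() throws Exception {
  // Load the rules file from the classpath, next to BeerPreference
  Reader source =
   new InputStreamReader(
     BeerPreference.class.getResourceAsStream("beer-guide.drl"));

  // Compile the DRL into a package and add it to a fresh RuleBase
  PackageBuilder builder = new PackageBuilder();
  builder.addPackageFromDrl(source);
  final Package pkg = builder.getPackage();
  BeerPreferenceTest.ruleBase = RuleBaseFactory.newRuleBase();
  BeerPreferenceTest.ruleBase.addPackage(pkg);
 }
}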

Now that Drools is ready to go and because it’s my bag, I can create an instance of WorkingMemory and pass in my BeerPreference instance. Remember, you must call the fireAllRules method on your WorkingMemory instance to force the rules to be evaluated.

@Test
public void verifyIPA() throws Exception{
 WorkingMemory workingMemory =
 BeerPreferenceTest.ruleBase.newWorkingMemory();

 BeerPreference beer = new BeerPreference();
 beer.setColor(Color.LIGHT);
 beer.addTastePreference(Taste.HOPPY);
 beer.addTastePreference(Taste.AGGRESSIVE_HOPS); 

 workingMemory.assertObject(beer);
 workingMemory.fireAllRules();

 assertEquals("Should be Mendocino White Hawk IPA",
   "Mendocino White Hawk IPA",
    beer.getRecommendedBeer());
}

Using a hip rules engine doesn’t necessarily reduce complexity– it just isolates portions of it into a format that can be manipulated by non-programmers. In essence, a rules engine creates flexibility, while also providing for more testability. Note how in the test above, I was able to isolate my logic for IPAs without having to deal with any of the other six beers. With normal conditionals, I might have had to concern myself with the other choices, so as to force the IPA one. Luckily, my logic is quite simple so this testing challenge may not be entirely apparent.

If you find excessive logic that:

  • Changes often
  • Is the purview of domain experts who don’t write the code

then you may want to look into an expert system, which can centralize human-readable logic into one location. Rules engines aren’t a silver bullet, nor are they perfect for all scenarios; however, if applied correctly, they can decrease conditional complexity quite nicely.
