September 2007

Book review: Generating [hip] Parsers with JavaCC

Before the age of Disco, I once found myself in need of obtaining specific data elements from a series of log files generated by a large order processing system. Essentially, a chain of copasetic state machines would log their status while they processed various aspects of an order– line items, billing, notification, etc. It turned out that the log format was uniform– it followed a format that enabled one to understand who was writing, when it was written, and why. As you can probably imagine, the folks in operations would monitor these logs and when problems arose, they’d ping development.

Invariably, a hip developer would ssh onto a production box and literally tail -f said log file and watch things progress (in real time). Some developers were more savvy and would pipe the contents into a grep command looking for error messages, but ultimately, it varied from person to person how they’d actually assess the situation.

I, indubitably being of the lazy disco type, wanted to press a button (or run a simple command) and receive a report when I found myself in the hot seat. Of course, this problem has been solved the world over, but I had to do it my way because I’m a developer. So I started investigating parsing libraries and ran across what was, at the time, some WebGain documentation on JavaCC. Unfortunately for me, I didn’t have the attention span to figure things out and eventually went the regex route via Jakarta’s ORO library.
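For a feel of what the regex route involves, here is a minimal sketch. It is Python rather than Java with ORO, and it is not the original utility: the log layout (timestamp, writer in brackets, level, message) is an assumed stand-in for a uniform who/when/why format, and the names LOG_LINE, parse_line, and error_report are hypothetical.

```python
import re
from collections import Counter

# Assumed line layout, e.g. "2007-09-21 14:03:07 [billing] ERROR card declined".
# The real log format is not given in the post; this one is made up.
LOG_LINE = re.compile(
    r"^(?P<when>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<who>[^\]]+)\] "
    r"(?P<level>[A-Z]+) "
    r"(?P<why>.*)$"
)

def parse_line(line):
    """Return a dict with when/who/level/why, or None if the line doesn't match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None

def error_report(lines):
    """Count ERROR entries per writer, skipping lines that don't parse."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry and entry["level"] == "ERROR":
            counts[entry["who"]] += 1
    return dict(counts)

sample = [
    "2007-09-21 14:03:07 [billing] ERROR card declined",
    "2007-09-21 14:03:08 [notification] INFO email queued",
    "2007-09-21 14:03:09 [billing] ERROR gateway timeout",
    "not a log line at all",
]
print(error_report(sample))
```

The catch with this approach is that every new wrinkle in the format means another tweak to the pattern, which is the kind of heavy lifting a generated parser takes over.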

Tom Copeland’s “Generating Parsers with JavaCC” has, without a doubt, shown me the error of my ways all those years ago. His masterpiece on JavaCC serves as the reference for this handy library– indeed, a major portion of this book documents every detail of generating parsers by clearly unveiling the particulars of tokenizing, parsing, error handling, and even testing JavaCC parsers, just to name a few. I particularly enjoyed the chapter on JavaCC’s JJTree preprocessor as it tied in a lot of the details, for me personally, of writing custom PMD rules.

Indeed, now that DSLs are all the rage (I’d go so far as to label them hip, baby), “Generating Parsers with JavaCC” can easily enable adventurous types to assemble mini-languages (and obviously parse and handle them via JavaCC). In fact, because it’s his bag, Tom does a great job in chapter 11 of enumerating a few examples of doing so. What’s particularly amusing for me is that he shows an example of parsing Apache’s web logs.

I was eventually able to keep on truckin’ by running a single command to receive a detailed report of various goings on in the order processing application I mentioned earlier– my trippin’ little utility made heavy use of regular expressions and served its purpose well enough. But, after reading “Generating Parsers with JavaCC” I realize that my job could have been a bit easier had I just relied on JavaCC to do the heavy lifting of parsing the application’s log files. You can bet that if I find myself in a similar situation in the future, you’ll find me coding away with a well marked up, heavily worn copy of “Generating Parsers with JavaCC” by my side. Give this groovy book a read– you’ll find yourself smarter for it.

The three step CI boogie

The process of Continuous Integration (or CI, man) is about building software components often– in many instances, this means anytime code within a hip repository (such as Subversion, ClearCase, Perforce, etc.) changes. The benefit of CI is simple: by building software often, issues can be found early, as opposed to later in the software development life cycle, where issues (like defects) are more expensive to address.


While CI is a process, the term often gets associated with a tool– but please keep in mind that CI is much more than a tool, man. In fact, the tool is probably the least important aspect of CI, because all the tool does is run your hip build (which, as you’ll find, is far more important) when a change is detected within a code repository.

Getting started in Continuous Integration then requires three things, baby:

  1. An automated build process with a platform like Ant or Maven, for example
  2. A code repository, like Subversion
  3. A CI server such as Hudson, CruiseControl, LuntBuild, etc, although a cron job can suffice

As the process of Continuous Integration is about integrating software often, it stands to reason that the integration of software is fulfilled through the concept of a build. In the Java world, Ant stands as the ubiquitous build platform. With Ant, you can reliably perform (in an automated fashion) otherwise manual tasks like compilation, testing, and even more interesting things like software inspections and deployments. There are plenty of players in this space– Maven, Raven, etc.– so don’t get hung up on needing Ant, man. Once everything has been wired together though, your build strategy is by far the most important aspect of a successful CI process– without a solid build that does more than compilation, CI will wither in the absence of something interesting to do (almost like Rock music in the face of a Disco inferno).

Next, for CI to properly take shape, a repository for storing code (or SCM) must be in place to monitor. Essentially, a hip CI server polls a given SCM for changes; consequently, if any are found, the CI server will perform a checkout (or an update of a local sandbox) and execute a build (which is, more often than not, the same build developers can also execute in their local environment).
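The poll-then-build loop described above can be sketched in a few lines. This is an illustration only, not how any particular CI server is implemented: get_revision and run_build are hypothetical callables standing in for a real SCM query and a real build invocation, and the repository is simulated with a list of revision numbers.

```python
import time

def poll_once(get_revision, run_build, last_seen):
    """Check the repository once; run a build only if the revision changed.

    get_revision and run_build are hypothetical stand-ins for a real SCM
    query and a real build invocation.
    """
    current = get_revision()
    if current != last_seen:
        run_build(current)
    return current

def poll_forever(get_revision, run_build, interval_seconds=60):
    """Poll on a fixed interval; the first poll always triggers a build."""
    last_seen = None
    while True:
        last_seen = poll_once(get_revision, run_build, last_seen)
        time.sleep(interval_seconds)

# Simulated repository: the revision changes on the third poll.
revisions = iter([41, 41, 42])
built = []
last = 41  # pretend revision 41 has already been built
for _ in range(3):
    last = poll_once(lambda: next(revisions), built.append, last)
print(built)
```

Only the third poll sees a new revision, so only one build runs. Everything interesting happens inside run_build, which is the point: the loop itself is trivial.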

Lastly, for a copasetic CI process, it helps to have an actual automated process that monitors an SCM and runs builds when changes are detected. There are a host of CI servers available for the Java platform, both open source and commercial– all are similar in their basic configuration; that is, they aim to monitor a particular SCM and run builds when changes are detected. They differ in their various bells and whistles; Hudson, for example, is particularly interesting given its ease of configuration and its compelling plug-ins, which provide increased visibility into such aspects as test result trends.

The process of CI isn’t all that esoteric after all, is it, man? Three simple things are needed and you are disco dancing– of course, as I’ve mused about before, CI is really about your build process. If that isn’t copasetic, spend time making builds a non-event and life will be trippin’. Dig it?

Poll: is NUnit good enough or what, man?

The copasetic gurus who brought you NUnit have decided to create a hip new framework dubbed xUnit.net. Clearly, xUnit.net has a lot of compelling features based upon the many lessons learned from having created a rather successful framework (that is, NUnit, man); however, it seems to me that NUnit is as ubiquitous in the .NET community as Disco music is to the general world population (did you know that disco music sales were up 400% in the year ending 1976?).

Is it your bag to switch to xUnit.net or to stay with NUnit, man? I want to know.

Will you stop using NUnit in favor of xUnit.net?

If you are using another trippin’ framework instead of NUnit (like MSTest, etc) let me know if you plan on jumping ship as well. And if the term NUnit is strangely foreign to you, man, you may have better luck at the Java test framework poll. As always, copious thanks for participating in this most scientific poll.

The weekly bag– September 21

Here you go, man:

Stacking it up with BDD, baby

As I’ve mentioned before, test-driven development (or TDD, man) is a copasetic idea in practice, but some jive turkeys just can’t get over the conceptual leap associated with that word test. Check out this month’s “In pursuit of code quality” article, entitled “Adventures in behavior-driven development”, to see what’s arguably a more natural way to integrate the momentum of TDD into your programming practice. Get started with behavior-driven development (aka BDD, via JBehave, baby) and see for yourself what happens when you focus on program behaviors, rather than outcomes.
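To give a taste of the behavior-first mindset, here is a minimal sketch in plain Python. It does not use JBehave or mirror its API, and the Stack class and the should_* function names are hypothetical; the only idea being shown is naming checks after behaviors and laying them out as given/when/then.

```python
class Stack:
    """A tiny hypothetical stack, just enough to describe some behavior."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def is_empty(self):
        return not self._items

def should_return_the_last_item_pushed():
    # Given a stack holding two items
    stack = Stack()
    stack.push("first")
    stack.push("second")
    # When an item is popped
    popped = stack.pop()
    # Then it is the most recently pushed item
    assert popped == "second"

def should_complain_when_popped_while_empty():
    # Given an empty stack
    stack = Stack()
    # When an item is popped, then the stack complains
    try:
        stack.pop()
    except IndexError:
        return
    raise AssertionError("expected IndexError")

should_return_the_last_item_pushed()
should_complain_when_popped_while_empty()
```

Nothing here is mechanically different from a unit test; the shift is in the vocabulary, with "should" standing in for that troublesome word test.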

As always, don’t forget to let it all hang out at the “Improve Your Java Code Quality” forum while you are at it, man!

The weekly bag– September 14

Good reads this week, baby:

The weekly bag– September 7

Back on track:

Don’t belie CI

Because it’s his bag, a few weeks back Paul Duvall pointed out that “Continuous Integration is NOT about the CI server [man]” but about the process. I happen to agree with Paul, and in fact, as I like to point out (when given the opportunity), one of the most important aspects of a hip CI process is the build itself. Think about it: if you take the bells and whistles out of a CI server, all you have is a sophisticated cron job that runs a build anytime an SCM changes.

With that painfully obvious fact looming, I’m still impressed with the number of people who get excited about neat-o CI servers without having examined what it is the CI server will actually execute when something changes. If your CI server is an automated compilation engine, that’s certainly a great start, but that’s not going to save your tail anytime soon, man.

To drive home this point, man, I like to demonstrate two scenarios when I give talks on CI. In scenario one, I execute a series of steps that show that life with CI isn’t any different for a developer than life without it (on the surface, that is). I:

  • check out the latest version of a project in SVN
  • add a feature
  • write a test to verify that feature
  • run a local build that compiles, tests, inspects, and even deploys
  • execute an SCM update; if there is a change, I keep on truckin’ and run another build
  • check the modified code in along with the corresponding test (assuming everything was kosher, baby)

So far, nothing new, right? These are the steps most developers take with or without CI in place. It is always fun to show the dashboard of my flavor-of-the-day CI server, which demonstrates it found the change and also ran a build without any errors. I then attempt to show how the process of Continuous Integration saves one’s tail by executing a few hurried steps. First, let me set the tone…

There is a problem in production; customers and management are in panic mode! We’ve got to figure out the problem ASAP or else!

With the ever so mellow ambience established, I:

  • check out the latest smokin’ version of a project in SVN
  • add a feature
  • run a local build that compiles
  • quickly check in the modified code and prepare to split the joint (it’s Friday night, baby)
  • demand that my hip code be pushed into production as soon as possible

As you can probably see, in this case, I didn’t run a full build (which ostensibly includes tests); thankfully, however, the CI server then runs a full build and, presto! A test fails, causing a flurry of emails, IMs, etc. to go out indicating there’s an issue. CI saved the day! Or did it?

As it turns out, in my case, it was the copasetic test that saved the day, man. All CI did for me was to execute it (via a build process that defined a test target) when there was a change– if there wasn’t a test that actually hit the modified code (and consequently failed), chances are that the offending code would have been pushed into production as soon as possible.
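The difference between the two builds can be sketched as follows. This is a toy model, not a real build tool: run_build, the step functions, and the buggy discount function are all hypothetical, with compiles standing in for a compile target and tests_pass standing in for a test target.

```python
def run_build(steps):
    """Run named steps in order; return the first failing step's name, or None."""
    for name, step in steps:
        if not step():
            return name
    return None

def discount(price):
    # The hurried Friday-night change: adds 10% instead of taking it off.
    return price * 1.1

def compiles():
    # Stand-in for a compile target; the buggy code compiles just fine.
    return True

def tests_pass():
    # Stand-in for a test target that actually hits the modified code.
    return discount(100) < 100

compile_only = [("compile", compiles)]
full_build = [("compile", compiles), ("test", tests_pass)]

print(run_build(compile_only))  # the local build reports no failure
print(run_build(full_build))    # the full build reports the failing step
```

The compile-only build has nothing to object to, while the full build stops at the test step. Take tests_pass out of full_build and a CI server running it would stay just as quiet as the local build did.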

Continuous Integration is all about your build, man. Before you download that hip CI server, make sure you’ve got a solid build process that does more than just compilation– and as I’ve noted before, if you want to find issues quickly, start by writing some tests. Dig it?

The weekly bag– August 31

Back on track:

Lastly, my apologies to those that were hip enough to comment on my recent post “Is BDD TDD done right?”– apparently my hosting provider’s database went down and they had to restore it with a backup that didn’t include that post or its comments. Luckily, I had a copasetic copy of the post; however, I didn’t have the comments. Thanks again, folks– your feedback and links were excellent.