Objectively Measuring 'Good Enough'

FSGD

For any development team - especially a product team - the pressure to ship is a very real factor every day. Most developers can relate to the internal battle between the pragmatist and the perfectionist. Left to their own stereotypical devices, the pragmatist will ship to the detriment of future flexibility and adaptability, while the perfectionist will ship one week after the company closes down (if that).

Not satisfied to view these as mutually exclusive choices, we've continually endeavored to improve the model we follow internally at LeanKit - aiming for the best of both worlds. Over the last year, Jon Terry and Daniel Norton coined the acronym "FSGD" (which we affectionately pronounce "fizz good"): Frequent, Small, Good & Decoupled. Jon has a great slide deck exploring this in depth - I highly recommend going through it.

FSGD has proven to be a great mental framework through which we can evaluate many aspects of how we work. Frequently shipping beats shipping only quarterly or annually, for example. Keeping what we ship small helps us focus on maintaining flow while continually & incrementally expanding our product with new features. Decoupling tasks (and teams) can be challenging, but it has not only helped us creatively address cross-cutting concerns, it has also enabled our teams to work independently without losing flow while waiting on external dependencies. But what about the "G" - what constitutes "Good Enough"?

Good Enough

In the FSGD model, we consider these two things to be the opposite of "Good" (aside from the obvious "Bad"): Damaging & Gold-plating.

Damaging

By "Damaging", we mean the kinds of changes where one team tweaks an HTTP service, for example, and it breaks contracts that other teams depend on. The feature might otherwise be exactly what the customer needs, but if it breaks how other teams consume the behavior, it's not good enough yet.

Gold-plating

Despite the best of intentions, I've yet to be a part of a team that hasn't, in some way, faced the siren call of adding features that weren't explicitly requested. It's a healthy thing for a team to develop a collaborative discipline, policing themselves against giving in to this temptation. However, we often over-compensate and run the car off the road into the ditch on the opposite side by shipping something too minimal. Over time, we noticed a pattern emerging when a "too-minimal" feature was shipped:

  • The effort for a developer to set the new feature up locally (infrastructure, new services, etc.) was either too time-consuming or revealed a knowledge bottleneck where only one other developer knew the secret handshake(s).
  • A lack of essential runtime visibility.
  • A high level of volatility/unpredictability when making changes to the code, revealing defects that could have been caught before they shipped to production.

I really liked how memorable "FSGD" had become, and wanted to complement it with a similar definition of "Good" to help provide a mental framework our teams could use to evaluate if a feature was ready to ship. To this end, I hijacked the well-known acronym "TLDR": Tested, Logged, Documented & Reviewed.

You might be thinking, "Wow, that's a tall order. How do you balance that with 'Frequent' and 'Small'?" Great question! Let's explore.

Tested

Like many teams, we're faced with a mix of really well-tested code and the digital equivalent of "here there be dragons". I strongly believe in thoroughly tested code, so all new features have a high bar to meet - with test coverage expected to be above 90%. Given that our new feature work is done in node.js, we can take advantage of amazing tools like these:

  • gulp (build tool & task runner)
  • mocha (test runner)
  • chai (spec assertion syntax with amazing plugins)
  • istanbul (code instrumentation and test coverage)
  • jshint (code quality & linting)
  • jscs (format linting/fixing)

We've abstracted the common concerns around testing, linting, formatting and coverage into a project we're able to pull into any of our repos, and we're in the process of creating generators for many of the kinds of modules and test suites we frequently use. In a nutshell, nearly everything you need is in place when you start a project; you just have to provide the tests. If you're going to expect 90-100% coverage on every project, you need to do everything possible to help your teams meet that goal!
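
To give a sense of how those shared concerns fit together, here's a sketch of a gulp task that runs mocha under istanbul and fails the build when coverage dips below the bar - the globs, task names and threshold are illustrative rather than our exact shared setup:

```js
var gulp = require('gulp');
var mocha = require('gulp-mocha');
var istanbul = require('gulp-istanbul');

// Instrument the source files and hook require() so coverage is tracked.
gulp.task('pre-test', function () {
  return gulp.src(['src/**/*.js'])
    .pipe(istanbul())
    .pipe(istanbul.hookRequire());
});

// Run the specs, write coverage reports, and fail the build below 90% coverage.
gulp.task('test', ['pre-test'], function () {
  return gulp.src(['spec/**/*.spec.js'])
    .pipe(mocha({ reporter: 'spec' }))
    .pipe(istanbul.writeReports())
    .pipe(istanbul.enforceThresholds({ thresholds: { global: 90 } }));
});
```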

Logged

Logging and metrics collection are crucial pieces many teams attempt to bolt on after the fact. In addition, there's often a need to log to multiple back-end/remote systems for things like performance monitoring, access logs and marketing. Learning from what we believe has gone well (and not-so-well) for us in the past, we've endeavored to make logging and metrics-gathering transparent to the developer implementing a feature. In other words - the libraries we use have to play well with how we log and measure things, and they should do so automatically, while still providing a friendly API for the specific one-off cases where logging/performance behavior has to appear in application logic (instead of infrastructure).
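
To illustrate the kind of "friendly API" we're after, here's a rough sketch of a logger facade that fans a single call out to multiple backends. The Logger name, its methods and the backends are hypothetical, not our actual libraries:

```js
// A hypothetical logger facade: application code makes one call, and the facade
// fans it out to whatever backends are configured (console, statsd, access logs, etc.).
function Logger(backends) {
  this.backends = backends || [];
}

Logger.prototype.info = function (msg, meta) {
  var entry = { level: 'info', msg: msg, meta: meta, time: new Date().toISOString() };
  this.backends.forEach(function (backend) {
    backend.write(entry);
  });
};

Logger.prototype.timing = function (metric, milliseconds) {
  this.backends.forEach(function (backend) {
    if (backend.timing) {
      backend.timing(metric, milliseconds);
    }
  });
};

module.exports = Logger;

// Usage, for the rare case where logging belongs in application logic:
//   var log = new Logger([consoleBackend, statsdBackend]);
//   log.info('card.moved', { cardId: 42, laneId: 7 });
//   log.timing('card.move.duration', 38);
```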

Documented

I've heard the typical developer's adverse reaction to writing documentation (which I share!) described in many ways, my favorite being a variation of the "tree falls in the forest" joke:

If a developer actually writes documentation, are they still a developer?

We've learned to accept the reality that the only thing more painful than writing documentation is having to deal with someone else's code that lacks documentation. We have also rejected the notion that code, itself, is sufficient documentation. We certainly expect code to be readable and self-documenting, but it just isn't the place for documenting how to set up hardware and software dependencies used by a new feature, for example. We're not expecting a full-length novel, either. Most of the time, a moderately detailed README works just fine. We usually expect this README to cover these kinds of concerns:

  • How to build, run tests, etc.
  • How to set up any dependencies (for example, Riak, RabbitMQ, etc.).
  • Configuration options & examples.
  • API usage examples (especially for internal infrastructure libs, etc.).
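
As an example of that last point, the "API usage" section of a README for an internal infrastructure lib might contain a snippet along these lines - the module name, options and methods are entirely made up for illustration:

```js
// Hypothetical README usage example for an internal messaging wrapper.
var messaging = require('internal-messaging-lib');

// Connection options would be documented alongside the example.
var connection = messaging.connect({
  host: 'localhost',
  port: 5672,
  vhost: '/'
});

// Publish a domain event to an exchange/topic.
connection.publish('card.events', { type: 'card.moved', cardId: 42 });
```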

Reviewed

If there's any practice we've adopted internally that I believe could qualify as one of the most transformative, it would be our approach to code reviews. The only way a change can make its way into master is via a pull request. All pull requests have to go through a review process, and no one is allowed to merge their own PRs, regardless of position. With a distributed team, the PR review process serves as both a means to disseminate knowledge and a way to build team relationships. In addition to spreading app knowledge, it's proven to be invaluable for cross-training. I've personally learned more about CSS & LESS than I would have imagined (and in less time than I would have on my own) from Hunter Heitzman and Doug Neiner through the course of code reviews, for example.

The most critical improvement, however, is that our review process has caught a number of issues before they ever reach our QA team. Our reviewers don't just compare the code difference in git; they have to pull the code and run it locally (including setting up any dependencies by following the README), and verify that all tests pass, etc. This gives everyone a stake in guarding the code base, and confidence that well-covered code will give us predictable, useful test feedback.

The Elephant in the Room

The book "Working Effectively With Legacy Code" defines legacy code as "code without tests". We've unintentionally upped the ante, as our internal definition of legacy code is now code which hasn't been tested, logged, documented or reviewed! When your team is tasked with rolling new features as well as maintaining an existing system that may not always fit these criteria, you're going to have to make some difficult choices. How, then, should we address situations where we have poor test coverage, inadequate documentation and non-existent logging?

Overall, we try to follow the "scout rule" of "leaving the campsite cleaner than we found it". In the case of a bug fix, this means we address the defect in the "legacy" code and also attempt to improve the situation by adding tests, logging and/or documentation where appropriate. The bulk of our effort, though, is focused on meeting our "TLDR" standards in new features, prioritizing our ability to meet customer needs over any headaches we encounter. New features often mean the opportunity to retire old parts of the system where a heavy investment in retrofitting would not have made sense. It might frustrate the perfectionists in us, but it's a sensible compromise - one that allows us to continue to ship great features while taking significant strides forward in new architectural directions.