Links for 2016-10-06

  • Simple testing can prevent most critical failures

    Specifically, the following 3 classes of errors were implicated in 92% of the major production outages in this study and could have been caught with simple code review:

    Error handlers that ignore errors (or just contain a log statement); error handlers with “TODO” or “FIXME” in the comment; and error handlers that catch an abstract exception type (e.g. Exception or Throwable in Java) and then take drastic action such as aborting the system.
    (Interestingly, the latter was a particular favourite approach of some misplaced “fail fast”/“crash-only software design” dogma in Amazon. I wasn’t a fan.)

    (tags: fail-fast crash-only-software coding design bugs code-review review outages papers logging errors exceptions)
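
    To make the three handler patterns concrete, here is a minimal Java sketch. Everything in it is hypothetical and for illustration only: the class ErrorHandlerSketch, the always-failing flushToDisk(), and the injected abort callback (standing in for something like System.exit) are assumptions, not code from the paper or from any real system.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ErrorHandlerSketch {

    /** Hypothetical operation that always fails, standing in for real I/O. */
    static void flushToDisk() throws IOException {
        throw new IOException("disk full");
    }

    /** Pattern 1: the handler only logs, so the caller believes the flush worked. */
    static boolean flushIgnoringErrors(List<String> log) {
        try {
            flushToDisk();
        } catch (IOException e) {
            log.add("flush failed: " + e.getMessage()); // logged, then forgotten
        }
        return true; // reports success even after a failure
    }

    /** Pattern 2: the handler is a placeholder that was never filled in. */
    static void flushWithTodo() {
        try {
            flushToDisk();
        } catch (IOException e) {
            // TODO: handle this properly
        }
    }

    /** Pattern 3: an over-broad catch that takes drastic action on any error. */
    static void flushOrAbort(Runnable abort) {
        try {
            flushToDisk();
        } catch (Throwable t) { // also swallows bugs such as NullPointerException
            abort.run();        // in real code this might be System.exit(1)
        }
    }

    /** One possible alternative: catch the specific type and let the caller decide. */
    static void flushPropagating() {
        try {
            flushToDisk();
        } catch (IOException e) {
            throw new UncheckedIOException("flush failed", e);
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        System.out.println("ignoring handler reports success: " + flushIgnoringErrors(log));
        flushWithTodo(); // fails silently
        flushOrAbort(() -> System.out.println("abort requested"));
        try {
            flushPropagating();
        } catch (UncheckedIOException e) {
            System.out.println("caller saw: " + e.getCause().getMessage());
        }
    }
}
```

    All three of the first handlers are easy to spot in review: an empty or log-only catch block, a TODO/FIXME comment inside a catch, and a catch of Exception or Throwable followed by a shutdown call. The last method is only one reasonable alternative; whether to retry, degrade, or propagate depends on the system.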