Resilience in Software Development - A Simple Checklist

When joining a team on an existing software development project, it can be challenging to get up to speed with all of the practices and nuances of the development environment. However, this situation is also an ideal opportunity to review those practices, because a new team member can look at things without any prior history and, hopefully, offer a fresh perspective.

As a development consultancy, we frequently find ourselves in this situation - it is common for us to be engaged on projects that are already well underway. Over time, this has led us to identify some key indicators of whether good development practices are in use. These practices offer a form of resilience - they allow development to continue in spite of unexpected difficulties such as key staff leaving, late feature additions and rapid increases in team size, among others.

Following on from the ideas of The Checklist Manifesto, we believe it is important to review internal practices regularly to make sure that there aren’t unexplored avenues that could improve performance, team health, accuracy or any other metric that matters to the team. We feel that a set of criteria that is easy to apply helps with this.

The criteria we look to apply are:

  • What is the onboarding process, and how long does it take?

    • How long does it take a new developer to be capable of editing, building and committing code?

    • A long onboarding process is often indicative of complex development processes. Just because development is complex doesn’t mean that the processes need to be.

  • Can an individual's development environment be set up easily?

    • An easy setup has the dual advantage of onboarding new team members faster and making hardware failures easier to recover from.

  • If a bug is found, is the test system sufficient to reproduce and confirm the bug before it is fixed?

    • While high test coverage is a worthy goal, it is often not feasible to achieve immediately on an existing code base. However, it is important to have enough infrastructure in place that new tests can be readily added when needed.

    • Asking developers to write test code is often difficult, and it is even more difficult if they first have to implement the underlying test infrastructure. Making sure this infrastructure is in place, even if it is only used by superficial example tests, is a good way to ease into the process.

  • How manual is the build system?

    • A developer should ideally be able to execute a single command to get a fully built system up and running. This may not be 100% achievable, but minimising the steps will increase developer productivity and reduce the chance of human error and frustration during the process.

    • Improvements here tend to flow on smoothly into continuous integration systems as well.

  • Are you able to support remote workers?

    • Remote worker and work-from-home requirements often overlap with those for supporting cloud infrastructure. If migrating to cloud infrastructure is on the project road map, then these can often be tackled simultaneously.

    • Making sure this support is available adds resilience against unforeseen circumstances. It also forces a minimum level of documentation and system separation, since implicit reliance on internal systems becomes much more apparent.
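
To make the test-infrastructure item more concrete, here is a minimal sketch of confirming a bug with a test before fixing it, using Python's standard unittest module. The function split_tags and the bug report it refers to are hypothetical, invented purely for illustration. The point is only that once a harness exists, reproducing a report costs a few lines.

```python
import unittest


def split_tags(text: str) -> list[str]:
    """Hypothetical function under test: split a comma-separated tag string.

    Imagined bug report: "a,,b" produced an empty tag. The filter on
    non-empty parts below is the fix; the regression test was written
    first and failed until the filter was added.
    """
    parts = (part.strip() for part in text.split(","))
    return [part for part in parts if part]


class SplitTagsRegressionTest(unittest.TestCase):
    def test_plain_list_still_works(self):
        # Superficial example test: proves the harness runs at all.
        self.assertEqual(split_tags("a, b"), ["a", "b"])

    def test_empty_entries_are_dropped(self):
        # Reproduces the input from the (hypothetical) bug report.
        self.assertEqual(split_tags("a,,b"), ["a", "b"])
```

Saved in a module, these tests can be run with `python -m unittest`. Even the first, trivial test earns its place: it shows that the infrastructure works, so the second test is cheap to add when a report arrives.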
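
For the build-system item, one possible shape for a single entry point is a small script that runs every step in order and stops at the first failure. This is a sketch under assumptions, not a recommendation of a particular tool: the step list is a placeholder that only invokes the Python interpreter, and a real project would substitute its own commands.

```python
import subprocess
import sys

# Hypothetical steps: each entry is (label, argv). A real project would list
# its own commands here (dependency install, compile, test, package).
STEPS = [
    ("check interpreter", [sys.executable, "--version"]),
    ("placeholder build", [sys.executable, "-c", "print('building...')"]),
]


def run_steps(steps) -> bool:
    """Run each step in order and stop at the first one that fails."""
    for label, argv in steps:
        print(f"==> {label}")
        completed = subprocess.run(argv)
        if completed.returncode != 0:
            print(f"step failed: {label} (exit code {completed.returncode})")
            return False
    return True


def main() -> int:
    """Return a process exit code: 0 if every step succeeded, 1 otherwise."""
    return 0 if run_steps(STEPS) else 1
```

Ending the file with `raise SystemExit(main())` would turn it into a one-command entry point. Because the script reports success or failure through its exit code, the same command can be reused unchanged by a continuous integration system.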

It is easy for resilience to erode over time through small incremental changes to development policy, which is why it is important to review your own checklist regularly.

The list in this article represents the criteria we have built up over time. It is not exhaustive by any means, but we regularly review and test our processes against it so that, when the time comes, we can rely on them fully and continue with minimal disruption.

Following these processes internally has made our transition to working from home very smooth.