Interrupt-Driven Development: Eight Best Practices That Help You Succeed

In this article

1 A little context

2 Discomfort as a team diagnostic tool

3 How to act in an interruption-driven development environment

3.1 1. Create and shorten feedback loops

3.2 2. Any activity leaves publicly available artifacts

3.3 3. We respect each other’s right to focus on their work

3.4 4. We avoid multitasking

3.5 5. We make architectural decisions as late as possible

3.6 6. The code is operational at any time

4 How we maintain constant efficiency

4.1 7. We are a team, not a development group

I’m a team leader, and my job is to ensure team productivity and the highest possible quality of software delivery. This is not easy because there is no ready-made recipe for success. While there are recognized methodologies such as Agile, Lean, and Value Stream Mapping, they offer only general guidelines and tips. That’s not bad, but tips are not action items, and when it comes to making specific team-related decisions, I’m expected to use my common sense, critical thinking, and basic knowledge of human psychology to find ways to overcome challenges and limitations and deliver value to the business.

In this article, I will share a story of how we managed to turn one of our enterprise client-tailored teams into a well-oiled machine working effectively in a highly interrupt-driven environment. This helped us revisit and refine processes within other dedicated customer teams and fine-tune our staff augmentation offering to be as seamless and all-encompassing as possible.

A little context

At 8allocate, we are engaged in custom web, mobile, and enterprise development. We run a long-standing FinTech team dedicated to one of our EU clients (under NDA), which we scale up and down depending on the project stage, peak times, current budget, and other factors.

As an offshore standalone unit, our Ukraine-based team has been working on building a robust FinTech application for two years now. The client has a core development team in Spain, which defines strategic directions and uses the team in Ukraine as an auxiliary one. Over this time, our team has completely revamped the product, replaced legacy elements, and built new functional features that take advantage of cutting-edge tech such as data science, AI, and machine learning. The team works under conditions in which the project is constantly interrupted, either because of issues on the core team’s side or because of a lack of funding. Despite this, it’s one of the most effective teams in our portfolio. Let’s see why.

Specifically, the client’s Ukraine-based team consists of seven people: three web developers, one data scientist, one team leader (who is also involved in writing code), one QA engineer, and one PM.

Discomfort as a team diagnostic tool

To find and understand the problems within the team, we use a relatively simple tool – pulling people away from their comfort zones.

Of course, I’m not talking here about a situation where one developer sits under the air conditioner while another one sweats. I am talking about failures or interruptions in the regular project development workflow.

Let’s say the release went wrong, although everyone did their job well. Or the project was estimated incorrectly from the very beginning, and now it’s the team that has to bear the consequences of that wrong decision. Or your startup client didn’t get the additional funding they were counting on.

How to act in an interruption-driven development environment

Every software dev team today should be ready to work in an interruption-driven environment, i.e., one where unexpected bumps in the road can lead to project failure or temporary suspension.

So what do we do when such detrimental things happen?

Stop panicking and consider why your team is facing such discomfort at this particular project phase.
Get to the root cause of the interruption (by using common sense and/or methods like Five Whys).
Figure out how to solve the issue. Do remember that the key reason why you exist as a team is to bring value to the business. No one needs a happy team that delivers nothing, delivers late, or delivers poor quality.
Retrospective! Always conduct one to see whether and how the decision made in the previous iteration affected the solution. If the impact was neutral or negative, go back to the root cause of the problem and try to find an alternative way of solving it. If the impact was positive, then automate the solution and document it in detail for future use.

Now let’s look closely at the key principles that we, as a team, learned from this approach and demonstrate how they can be applied in real life.

1. Create and shorten feedback loops

All human interactions with the outside world are based on feedback, without which it is impossible to check the correctness of any performed action. Imagine what our life would be like if we didn’t feel pain when jumping from a height of four meters or grabbing a hot kettle.

An example of a good yet short feedback loop in software development is code completion – it informs us about the correctness of the action right at the moment of programming.

Every action in the development process can and should give you feedback: build, lint, automated tests passed, testing performed, a discovery workshop with product stakeholders, successful deployment to the server, monitoring of production – all of this aims to detect errors and correct your further actions.

It is also worth noting that the cost of an error increases as you move forward. If we have released a production bug that corrupts data, the task is not only to fix it, but also to restore the data (if that’s possible at all). The cost of late bug fixing is very high, not to mention the consequences for business continuity.

Therefore, it’s vital to create a lot of fast and informative feedback loops.

Below are examples of feedback loops that we consciously support and shorten if possible. I think you know most of them. But do you really use them to benefit your work?

The ability to run and roll out the project locally;
Fail Fast development;
A fast and informative CI build;
Continuous code review and work with code through pull requests;
Automated testing provides a lot of data about your errors;
Automated deployment;
Frequent releases instead of accumulating and releasing a version weeks after the tasks are completed;
Informative logs, monitoring, diagnostic tools. The whole team has access to them;
Log filtering and graphical visualization;
Constant tracking of technical and functional parameters of the system as a part of daily work;
Google Analytics for empirical data collection and analysis;
Storage of data change history;
Collaboration between Dev, Ops, QA, and product owners instead of “throwing over the fence” the results of the previous stage;
Conducting regular retrospectives both within the team and with the business owners;
Regular feedback from both target users and stakeholders;
Going to the field to see how potential users interact with your product prototype.

In general, the feedback collected should be as apparent as a broken build.

What’s remarkable is that sometimes a minimal change is enough to improve the feature radically.

Let’s say you send logs to ELK. They are structured, analyzed, and publicly available, so everything seems fine. But how often does a developer check them when debugging? Probably very seldom, if ever.

If you configure warning messages to be displayed directly in the IDE, the developer has a chance to notice, for example, that a query has exceeded its expected execution time, even if it is unrelated to the current task. The problem is noticed earlier, and the cost of fixing it is lower.
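As a rough illustration of this kind of local feedback, here is a minimal Python sketch that logs a warning whenever a wrapped block runs longer than a threshold. The helper name `warn_if_slow`, the logger name, and the 0.5-second limit are assumptions made for this example, not part of our actual setup; the warning goes to the standard `logging` module, so it shows up in whatever console the developer runs the code from.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("slow_query")

# Hypothetical limit for this sketch; tune it to your own system.
SLOW_QUERY_THRESHOLD_SECONDS = 0.5


@contextmanager
def warn_if_slow(label, threshold=SLOW_QUERY_THRESHOLD_SECONDS, clock=time.perf_counter):
    """Log a warning when the wrapped block runs longer than `threshold` seconds."""
    started = clock()
    try:
        yield
    finally:
        # Runs even if the wrapped block raises, so slow failures are reported too.
        elapsed = clock() - started
        if elapsed > threshold:
            logger.warning("%s took %.3fs (limit %.3fs)", label, elapsed, threshold)


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)  # print warnings to the local console
    with warn_if_slow("load_accounts"):
        time.sleep(0.6)  # stand-in for a real database call
```

The point of the design is that the developer does nothing extra: the warning appears next to the output they are already looking at.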

2. Any activity leaves publicly available artifacts

The artifacts must be publicly available and useful. Thanks to this principle, we reduce bus-factor risk, i.e., the risk that arises when information and capabilities are not shared among team members. It also gives everyone a common understanding of the situation and lets us work consciously while constantly drawing conclusions.

Some practices are obvious and common: informative commit messages, the connection of commits with tasks, descriptions of How To Test, Definition of Done, etc.

There are also less apparent points:

You can’t “just screw up”: every failure must be analyzed and thought through. If the analysis reveals ill-conceived requirements, the artifact will be the documented clarifications. If the problem is in the architecture of the system, the artifact will be a description of the technical debt with a clear deadline for eliminating it.
All clarifications are reflected in the knowledge base or the chosen task tracker. So, when testers accept a task, the changed requirements will not be news to them. When a business accepts the result, everyone understands what they have to get as an outcome. Such a state turns the work into a continuous stream. It’s the task of each team member to find out the details, update the knowledge base and description of tasks, and monitor the process in general.
Test results should be delivered as a publicly available list of all test cases that have been passed, which is compiled and discussed before the test, not during or after it. The list can be explored and supplemented by each participant of the process.

3. We respect each other’s right to focus on their work

We encourage work with headphones/earplugs;
We foster asynchronous communication within the team: do not distract your colleague with a small question; ask it in the task tracker instead (see the section on publicly available artifacts).

Many things can interrupt a regular workflow: an accident in production, unclear requirements for the task, etc. A telltale sign is a noisy discussion in the office involving three or more people. If such a situation is not resolved in a few minutes, I suggest you appoint one person responsible for clarifying the details. The others return to regular work until the person in charge brings back the information for further analysis.

4. We avoid multitasking

Because multitasking doesn’t work. It only devastates, distracts, and causes delays.

Practices that help avoid multitasking:

Reduce the number of your Work In Progress tasks.
Focus on the flow of value, not on resource utilization. For example, one developer can complete a task in a day, while another needs three days for the same task. But the first one will not be available for a week. So the task gets assigned to the second developer. We will spend more effort on implementation, but we will deliver the result faster (three days instead of a week and one day) and move on to the next task.
If several people are involved in one task and the work is 90% done, the team’s number one goal is to finish the last 10%. Only after that can it move on.
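The assignment example above boils down to comparing lead time (waiting plus effort) rather than effort alone. Here is a small Python sketch of that arithmetic; the `Option` class and the assumption that "a week" means five working days are ours, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    developer: str
    wait_days: int    # days until the developer can start
    effort_days: int  # days of actual work on the task

    @property
    def lead_time_days(self) -> int:
        """Days from now until the result is delivered."""
        return self.wait_days + self.effort_days


def fastest_delivery(options):
    """Pick the option that delivers first, regardless of the effort spent."""
    return min(options, key=lambda option: option.lead_time_days)


# Numbers from the example above; a "week" is assumed to be 5 working days.
options = [
    Option("first developer", wait_days=5, effort_days=1),
    Option("second developer", wait_days=0, effort_days=3),
]
choice = fastest_delivery(options)
print(choice.developer, choice.lead_time_days)
```

Optimizing for effort would pick the first developer (one day of work); optimizing for lead time picks the second one, which is the choice described above.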

5. We make architectural decisions as late as possible

This is not our know-how, but one of the basic principles of lean software development.

A decision, once made and implemented, restricts the possibility of further changes. And if a decision is made with a lack of information, the chances of it being wrong are much higher.

If failure to make a decision does not block the work and does not lead to exponential growth of technical debt, it should be postponed, leaving the system ready for any decision in the future, when we have more data at our fingertips.

This is the basis of development – we do not build “big” architectures before starting the project. In return, we make the refactoring process safe and turn it into a natural part of the development process.

Similarly, we do not try to guess future requirements for the system or build a universal solution. The possibility of safe refactoring is more universal because it allows us to make any changes in the future.

6. The code is operational at any time

Of course, this state is unattainable in absolute terms, and the system will break down periodically after changes have been made. But this does not mean that this characteristic should not be sought by software teams.

When breakage is an emergency situation, not a norm of life, its causes are easy to find. This is usually the last commit. Therefore, it’s clear who’s responsible, what actions should be taken to eliminate the issue, and it is evident when we return to a stable state.

The resulting confidence that the system is operational gives us a valuable opportunity to make a release at any time.

The second benefit is that we can make more confident promises about delivery dates. If we divide the work into two phases, “development” and “stabilization”, it is difficult to make a specific promise, because “stabilization” means working on problems that we do not know about yet, so we can’t estimate them accurately.

If stabilization is inextricably linked to development and there are all the necessary tools to ensure it, the situation is more predictable.

How we maintain constant efficiency

Obviously: code review, automated tests, and feature flags.
Any changes are immediately deployed to the test environment. If it is broken, you won’t be able to fix it “later” – QA work is blocked.
Testing takes place right after the tasks have been finished, as long as the developer remembers the job and code and can make quick fixes.
We don’t do the work in parts. If two people are needed for implementation, they work in pairs and deploy the code into the main branch when it is completely ready and covered by tests.
Every team member knows diagnostic tools, knows how to work with them, and knows how to make releases.
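Of the tools listed above, feature flags are probably the least self-explanatory, so here is a minimal Python sketch of the idea: unfinished behavior is merged into the main branch but stays switched off until the flag is turned on. Everything in it is hypothetical and chosen for illustration only: the `FEATURE_` environment-variable convention, the flag name, and both fee formulas. Real projects often use a configuration service or a dedicated library instead.

```python
import os

TRUTHY_VALUES = {"1", "true", "on", "yes"}


def flag_enabled(name, environ=None):
    """Return True when FEATURE_<NAME> is set to a truthy value."""
    env = os.environ if environ is None else environ
    return env.get(f"FEATURE_{name.upper()}", "").strip().lower() in TRUTHY_VALUES


def legacy_fee_cents(amount_cents):
    # Hypothetical current behavior: a flat 2% fee.
    return amount_cents * 2 // 100


def new_fee_cents(amount_cents):
    # Hypothetical new behavior: 1.5% with a 50-cent minimum.
    return max(amount_cents * 15 // 1000, 50)


def transfer_fee_cents(amount_cents, environ=None):
    """The new code path ships to production but runs only when the flag is on."""
    if flag_enabled("new_fee_model", environ):
        return new_fee_cents(amount_cents)
    return legacy_fee_cents(amount_cents)


if __name__ == "__main__":
    print(transfer_fee_cents(10_000, {}))
    print(transfer_fee_cents(10_000, {"FEATURE_NEW_FEE_MODEL": "on"}))
```

Because the flag defaults to off, deploying the change does not alter behavior, which is what lets half-finished features coexist with code that stays releasable.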

7. We are a team, not a development group

Working as a cohesive team means the following:

All code is reviewed by at least one team member. If you find a serious problem, you are encouraged to sit down together and do pair programming. Sharing a tip, a book, an article, or a detailed explanation is considered more valuable than voicing an opinion or criticizing.
Instead of working on separate pieces and then painfully integrating the results, we work closely in pairs when necessary.
Developers never throw a task “over the fence”; they hand it over to QA engineers carefully, after checking the happy path themselves. They help QA understand what to test and how, and help break the system artificially if needed.
QA, in turn, studies the internal structure of the system, knows how to collect all the necessary details (logs, data state), and files extremely informative bug reports.

To do the work in progress in the most efficient and focused way, we eliminate the “debts” associated with the work already done:

We deploy tasks to the production server as fast as possible. They’re considered done only after that.
We eliminate technical debt continuously so that it does not increase and block the work, ruining business functionality delivery plans.
We don’t start tasks that we will complete “someday”, and we delete long-lived tasks. The business will definitely come for the task when (if) the time really comes to do it. And just in case, you can restore the deleted task in the task tracker. But this function has never come in handy for us.
We fix or delete unstable tests right away and replace them with lower-level ones.
We track all “creeping” tasks, i.e., those that have been in progress for ages and never get completed.
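Tracking “creeping” tasks is easy to automate once task data can be exported from the tracker. The Python sketch below is only an illustration under our own assumptions: the task fields (`key`, `status`, `started`), the sample tasks, and the ten-day limit are hypothetical and do not correspond to any particular tracker’s API.

```python
from datetime import date

# Hypothetical limit for this sketch; pick one that fits your iteration length.
MAX_DAYS_IN_PROGRESS = 10


def creeping_tasks(tasks, today, max_days=MAX_DAYS_IN_PROGRESS):
    """Return in-progress tasks started more than `max_days` ago, oldest first."""
    stale = [
        task for task in tasks
        if task["status"] == "in_progress"
        and (today - task["started"]).days > max_days
    ]
    return sorted(stale, key=lambda task: task["started"])


# Made-up sample data standing in for a task tracker export.
tasks = [
    {"key": "FIN-1", "status": "in_progress", "started": date(2024, 2, 1)},
    {"key": "FIN-2", "status": "in_progress", "started": date(2024, 3, 10)},
    {"key": "FIN-3", "status": "done", "started": date(2024, 1, 5)},
    {"key": "FIN-4", "status": "in_progress", "started": date(2024, 2, 20)},
]

for task in creeping_tasks(tasks, today=date(2024, 3, 15)):
    print(task["key"], "in progress since", task["started"].isoformat())
```

A report like this can be reviewed at a stand-up or retrospective, where the team decides whether to finish, split, or delete each flagged task.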

The result of this long-running refinement of our team workflows is a happy client that has kept its team with 8allocate for two years now and plans to expand its augmented team to add more muscle to its core product team. And how do you improve your team processes to increase productivity?