Executive Summary
A central question in the evolution of software processes over the last 20+ years has been how best to address variability in the work.
- In the waterfall approach, we aim to predict and control every detail of the development process, thus (supposedly) driving out variability by creating a perfect plan and sticking to it.
- In first-generation agile approaches (e.g., Scrum by the book), we work around variability by keeping scope flexible throughout the entire release and – at least canonically – only making short-term commitments that teams feel extremely confident about.
- Today – especially in larger organizations with complex products that require extensive integration and synchronization across multiple teams, product families, functional groups, and legacy systems – market and competitive forces are driving us to seek new approaches:
  - We require mechanisms that allow us to make feature-level commitments over a time frame of 6+ months and then maintain a very high probability of delivering the “must have” functionality that those features depend on.
  - To achieve this, we need techniques and tools that allow us to quickly and reliably identify patterns of variability, mitigate them when they appear, and – when cost-effective mitigation isn’t attainable – manage the variability in a smart and transparent way.
- In the software industry we are still in the early stages of this evolution, but we have started to see compelling results from system-focused approaches (e.g. as found in certain types of lean thinking and practice) that allow us to optimize flow and systematically drive out unnecessary risk and delays.
Background
In software development we operate in an environment of high complexity and technical uncertainty. We constantly encounter new types of problems, some of which can only be solved by making systems do things that they weren’t originally designed to do. Inevitably, when we make changes to these systems, we trigger side effects that result in large amounts of unplanned work.
Separately, we operate in a highly dynamic market context where customer needs are constantly changing and it is often difficult to know what customers want (and, more importantly, don’t want) until they themselves have seen some amount of real-world functionality. Even so, many customers – in particular large and influential ones – expect us to make and meet delivery commitments in the 6-12 month time frame (or longer).
Together, these conditions create high stress and high variability around scheduling, which can easily compound into cascading delays. As it happens, this schedule risk is one of the top reasons we spend time, effort, and money on product management methodologies in the first place.
Waterfall
Historically, waterfall methodologies sought to mitigate schedule risk by creating detailed upfront plans intended to help us predict and control every detail of the development process. In essence, the waterfall philosophy holds that a well-designed and well-run project will by definition contain no significant variability, or at the very least that whatever variability exists will average out over the duration of the project (i.e., early slips and early gains will cancel each other out). This approach often looked satisfactory before the work began, but in reality the plan was so brittle that there was no way to recover once a major delay was introduced. Accordingly, the overwhelming majority of waterfall projects are thought to have been completed late, over budget, with poor quality, or some combination of the three.
Scrum
First-generation agile approaches like the initial implementations of Scrum (and to some extent Extreme Programming) were revolutionary in that they not only acknowledged that variability would emerge during the work, but actually sought to embrace it. This led to a rewriting of the social contract between development teams and their funders: Scrum promised more throughput and better quality in return for flexible scope.
Properly speaking, in Scrum we abstain from making delivery commitments beyond the horizon that we can see with high clarity and confidence. By not promising what we can’t deliver, we significantly reduce the chance that we will miss release commitments. In other words, in Scrum we assume that systemic variability is not something we can reliably impact at the team level – an assumption that may in fact be quite credible given a specific team’s organizational context.
Just as importantly, by making this assumption explicit, we use transparency and expectation management to help business stakeholders and customers understand that they cannot expect to know in advance when a large, tightly specified body of work will be ready to ship. That said, we seek to organize our work so that *something* is shippable at any given time – a condition that is easier to achieve where release engineering technologies and practices are advanced, and harder where there are large amounts of legacy code and/or many legacy subsystems.
A further innovation of Scrum was based on the insight that engaged, coherent, and motivated teams who are not forced into extended states of peak utilization (i.e., “death marches”) tend to have more positive variability in their capacity. In other words, these teams can stretch and adjust during short periods to optimize the flow of work, in part because they have some reserves to draw on (“slack” in queuing terminology), and in part because they feel a meaningful allegiance to the work and one another. By putting team members’ well-being at the center of the process, mature Scrum teams have in some cases been able to absorb the variability in the work as it emerged. The caveat is that this tends to happen reliably only insofar as the team and the work are highly synchronized, and neither the team nor the work is constrained by technical or organizational dependencies.
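The link between utilization and slack can be made concrete with Kingman's approximation for a single-server queue, a standard queueing-theory result that the argument above does not itself invoke. Treating a team as one server that processes work items is a simplifying assumption, and the function name `kingman_wait` and all parameter values below are hypothetical. The sketch shows how expected waiting time behaves as utilization approaches 100%:

```python
def kingman_wait(utilization: float, ca: float, cs: float, mean_service: float) -> float:
    """Approximate mean queueing delay with Kingman's G/G/1 formula.

    E[Wq] ~= (rho / (1 - rho)) * ((ca**2 + cs**2) / 2) * mean_service

    utilization  -- rho, the fraction of capacity in use; must be in [0, 1)
    ca, cs       -- coefficients of variation of arrival and service times
    mean_service -- average time to finish one item (any time unit)
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return (utilization / (1 - utilization)) * ((ca**2 + cs**2) / 2) * mean_service


# Hypothetical team: items take 2 days on average; ca = cs = 1 (exponential-like variability).
for rho in (0.5, 0.7, 0.8, 0.9, 0.95, 0.99):
    wait = kingman_wait(rho, ca=1.0, cs=1.0, mean_service=2.0)
    print(f"utilization {rho:.0%}: about {wait:.1f} days waiting per item")
```

Under these assumptions, waiting time more than doubles between 80% and 90% utilization, grows roughly elevenfold from 90% to 99%, and increases without bound as utilization approaches 100%. That is one way to read the claim that teams held at sustained peak utilization have no reserves left to absorb variability.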
Finally, having a “voice of the customer” directly involved in the work on a day-to-day basis – a game-changing innovation from XP – has, compared to waterfall methods, made it more likely that wrong assumptions about customer desires and expectations are discovered early. This helped to reduce at least some of the most egregious waste that tended to occur in large projects (e.g., multiple years of work thrown away).
Evolving Agile
For several years, evidence has been emerging that larger development organizations with multiple products running on complex legacy systems have not seen sustained improvements beyond the initial boost that occurred early in their Scrum adoptions. I believe this is because Scrum’s approach to dealing with variability is not well suited to the complex dependencies and synchronization, both technical and market-facing, inherent in these sorts of projects.
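A back-of-the-envelope calculation shows why cross-team synchronization is so demanding. The sketch below assumes, as a simplification, that each team's chance of finishing on time is independent of the others and that a release ships on time only if every contributing team does; the function names and the 90% figure are hypothetical.

```python
import math
import random


def prob_all_on_time(per_team_probs: list[float]) -> float:
    """Closed form: with independent teams, multiply the individual probabilities."""
    return math.prod(per_team_probs)


def simulate_release(n_teams: int, p_on_time: float, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo check: fraction of simulated releases in which every team is on time."""
    rng = random.Random(seed)
    on_time = sum(
        all(rng.random() < p_on_time for _ in range(n_teams))
        for _ in range(trials)
    )
    return on_time / trials


for n in (1, 3, 5, 10):
    exact = prob_all_on_time([0.9] * n)
    simulated = simulate_release(n, 0.9)
    print(f"{n:>2} teams at 90% each -> release on time {exact:.0%} (simulated {simulated:.0%})")
```

Five teams that are each 90% likely to deliver give the combined release only about a 59% chance of shipping on time, and ten such teams about 35%. Real dependencies are rarely independent, so this illustrates the direction of the effect rather than forecasting any particular release.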
Instead, more recent approaches – especially those inspired by lean thinking – aim to improve throughput and predictability by studying the variability inherent in the work as well as in the organizational system that surrounds it. By measuring and visualizing these attributes and, in general, increasing situational awareness about why and how variability manifests, we can begin to discover opportunities to optimize for the whole (that is, across the entire value stream).
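As one illustration of what measuring variability might look like in practice, the sketch below summarizes the cycle times of completed work items. The choice of metrics (coefficient of variation, median, 90th percentile), the function name, and the sample data are assumptions for illustration, not measures prescribed by any particular lean method.

```python
import statistics


def variability_summary(cycle_times_days: list[float]) -> dict[str, float]:
    """Summarize how variable a team's cycle times are.

    cycle_times_days -- elapsed days from start to done for each completed item
                        (needs at least two items)
    """
    mean = statistics.mean(cycle_times_days)
    stdev = statistics.stdev(cycle_times_days)
    deciles = statistics.quantiles(cycle_times_days, n=10)  # 9 cut points
    return {
        "mean": mean,
        "cv": stdev / mean,  # coefficient of variation: spread relative to the mean
        "p50": statistics.median(cycle_times_days),
        "p90": deciles[8],  # 90th percentile
    }


# Hypothetical cycle times (days) for recently completed items.
sample = [2, 3, 3, 4, 4, 5, 6, 7, 12, 21]
for name, value in variability_summary(sample).items():
    print(f"{name}: {value:.2f}")
```

A median of a few days alongside a 90th percentile several times larger is the kind of long tail worth investigating rather than averaging away. With only a handful of items the percentile estimates are rough, so a real analysis would track them over time.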
It’s important to understand that these optimization opportunities themselves vary enormously. In some cases we may see improvement by mitigating the factors under our control that prevent smooth flow; in others we may benefit from systematically investing in capabilities that improve flow; elsewhere still we may be best served by creating appropriate buffers where variability cannot be reduced cost-effectively. The litmus test for whether an intervention is successful should be whether it produces sustained improvements in throughput, quality, AND predictability of schedule across multiple releases. An action that improves only one of these performance metrics is unsatisfactory, as is an action that produces a one-time improvement but fizzles out as soon as conditions change.
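Where variability cannot be reduced cost-effectively, one common way to size a buffer is Monte Carlo forecasting from historical data. The sketch below is one such approach, not necessarily the one intended here: it resamples past item durations to simulate many possible outcomes, then compares the median total with a higher-confidence percentile. It assumes items are worked one after another, so total time is the sum of durations; the function name, the 85% confidence level, and the sample history are hypothetical.

```python
import random
import statistics


def schedule_buffer(history_days: list[float], n_items: int,
                    confidence: float = 0.85, trials: int = 20_000,
                    seed: int = 1) -> tuple[float, float, float]:
    """Estimate a schedule buffer by bootstrapping historical item durations.

    history_days -- durations (days) of comparable, previously completed items
    n_items      -- number of items covered by the commitment, worked sequentially
    confidence   -- desired probability of finishing within the committed duration
    Returns (median_total, committed_total, buffer).
    """
    rng = random.Random(seed)
    # Each trial draws n_items durations (with replacement) and adds them up.
    totals = sorted(sum(rng.choices(history_days, k=n_items)) for _ in range(trials))
    median_total = statistics.median(totals)
    committed_total = totals[int(confidence * (trials - 1))]  # empirical percentile
    return median_total, committed_total, committed_total - median_total


history = [2, 3, 3, 4, 5, 5, 6, 8, 13, 21]  # hypothetical past durations in days
median_total, committed_total, buffer = schedule_buffer(history, n_items=20)
print(f"median: {median_total:.0f} days, commit at 85%: {committed_total:.0f} days, "
      f"buffer: {buffer:.0f} days")
```

Raising `confidence` widens the buffer; the gap between the median and the committed figure is, in effect, the schedule cost of whatever variability remains after mitigation.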
I also want to highlight my belief that the Scrum- and XP-led innovations that increased human connection and fostered more rewarding working conditions are, if anything, even more essential in this project management approach than in past models. It can be tempting for analysts and engineers to tackle system-level problems from a deterministic, mathematical, and tool-driven perspective, which would be fine if we were optimizing an engineered system but is utterly unsuited to understanding and improving living systems. One of the key challenges for us as a generation of software leaders is how to implement effective measures for studying and managing variability that do not suppress, but instead ultimately enhance, the engagement, alignment, and job satisfaction needed to grow and sustain high-performing agile organizations.