Becoming an organism

When I think of “big trends” in technology, four things come to my naive mind:

  • Agile
  • Cloud computing
  • Big data
  • Remote-first

Agile was enshrined in the Agile manifesto in 2001 and is now the plaything of the laggards.

The cut-throat commercial competition surrounding cloud computing began with Amazon’s AWS in 2002 and manifests today as enterprise organisations running from on-premise IT and startups leveraging serverless.

Big data, to me, includes not only business intelligence and analytics but also contemporary developments in machine learning and artificial intelligence. 

Remote-first gained serious momentum at least a decade ago (see the Fried/Hansson book and their playbooks for remote-first orgs) and its society-wide adoption has been accelerated by Covid-19.

I would like to nominate a companion to these four “big trends”: team-first.


Contents

  1. The Agile of organisation design
  2. Team-first: in principle and in practice
  3. The remaining option
  4. The team-first approach
  5. Refactoring the org chart
  6. Cognitive load and other performance factors
  7. Team types
  8. Interaction modes
  9. Becoming a sensing organisation
  10. An organism

The Agile of organisation design

The notion of team-first as a “big trend” comes from my recent read of Team Topologies by Matthew Skelton and Manuel Pais. Of the four “big trends” above, the ideas within Team Topologies remind me most of Agile; it seems like the Agile of organisation design. This is for three reasons.

The first reason is that the Team Topologies framework currently has a “The future is already here—it’s just not very evenly distributed yet” vibe. For Agile to be formulated within its manifesto in 2001, it had to have been an actual pattern already in use among practitioners. Similarly, the authors of Team Topologies have been preaching and workshopping the framework for several years and the book uses examples that are a result of that experience; it’s a framework that’s already seeing serious use, right now.

The second reason is that, like Agile, Team Topologies contains a minimal framework with much latent power and just as much risk of being diluted by over-complication. The core of Agile was disseminated and developed into a whole landscape of fragmented interpretations, extensions, modifications, re-imaginings and versions; the team-first approach faces a similar future.

The third reason is that, when sincerely adopted, Team Topologies and the team-first approach bring about a paradigm shift for the adoptees.


Team-first: in principle and in practice

The paradigm shift caused by the adoption of the ideas within Team Topologies rests upon two foundational principles. Both of these contain a lot of depth and nuance but briefly stated they look something like this:

  • The team—not the individual—is the fundamental unit of an organisation.
  • Team structures and communication protocols have a disproportionate impact on what and how an organisation ships.

In practice, these principles represent a big change from how organisations are traditionally structured and how they traditionally execute. Below is an overview of the main components of the team-first approach. It comes from a sequence of cuttings from the conclusion of Team Topologies.

“Team Topologies [sets] forth a team-first approach to software delivery predicated on four fundamental team types, three team interaction patterns, and ways of using difficulties in delivery that empower the organization to sense its surroundings.”

Team type – “Stream aligned: a team aligned to the main flow of business change, with cross-functional skills mix and the ability to deliver significant increments without waiting on another team.”

Team type – “Platform: a team that works on the underlying platform supporting stream-aligned teams in delivery. The platform simplifies otherwise complex technology and reduces cognitive load for teams that use it.”

Team type – “Enabling: a team that assists other teams in adopting and modifying software as part of a transition or learning period.”

Team type – “Complicated subsystem: a team with a special remit for a subsystem that is too complicated to be dealt with by a normal stream-aligned team or platform team. Optional and only used when really necessary.”

Interaction pattern – “Collaboration mode: two teams work together on a shared goal, particularly during discovery of new technology or approaches. The overhead is valuable due to the rapid pace of learning.”

Interaction pattern – “X-as-a-Service mode: one team consumes something provided by another team (such as an API, a tool, or a full software product). Collaboration is minimal.”

Interaction pattern – “Facilitating mode: one team (usually an enabling team) facilitates another team in learning or adopting a new approach.”

“The Team Topologies approach treats the team as the fundamental means of delivery, where a team is not simply a collection of individuals with the same manager but an entity with its own learning, goals, mission, and reasonable autonomy. A team learns and delivers together because when this happens, the results far outperform mere collections of individuals. The team considers not just its code as part of its external “API” but also its documentation, onboarding processes, interactions with other teams in person and via chat tools, and anything else that other teams need in order to interact with its members.”
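For readers who think better in code, here is a minimal sketch of the vocabulary above as Python enums. It is only my own illustration: the team names, the `Interaction` record and the `unusual_facilitators` helper are hypothetical and do not come from the book. The one rule it encodes is the quote's remark that the facilitating team is "usually" an enabling team, so anything else is flagged for a second look rather than treated as an error.

```python
from dataclasses import dataclass
from enum import Enum


class TeamType(Enum):
    STREAM_ALIGNED = "stream-aligned"
    PLATFORM = "platform"
    ENABLING = "enabling"
    COMPLICATED_SUBSYSTEM = "complicated subsystem"


class InteractionMode(Enum):
    COLLABORATION = "collaboration"
    X_AS_A_SERVICE = "x-as-a-service"
    FACILITATING = "facilitating"


@dataclass(frozen=True)
class Interaction:
    provider: str  # team offering the service or doing the facilitating
    consumer: str  # team on the receiving end
    mode: InteractionMode


def unusual_facilitators(teams, interactions):
    """Return facilitating interactions whose provider is not an enabling team."""
    return [
        i for i in interactions
        if i.mode is InteractionMode.FACILITATING
        and teams[i.provider] is not TeamType.ENABLING
    ]


# Hypothetical organisation, purely for illustration.
teams = {
    "checkout": TeamType.STREAM_ALIGNED,
    "cloud-platform": TeamType.PLATFORM,
    "sre-coaching": TeamType.ENABLING,
}
interactions = [
    Interaction("cloud-platform", "checkout", InteractionMode.X_AS_A_SERVICE),
    Interaction("sre-coaching", "checkout", InteractionMode.FACILITATING),
]
flagged = unusual_facilitators(teams, interactions)
```

Here nothing is flagged, because the only facilitating interaction is led by the enabling team.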

Allow me to take each of the ideas above and unpack them. I’ll aim for brevity (historically not my strong point). For a full and more ordered treatment read the book: Team Topologies.

Here’s what I’ll cover (summarise, really) below: the complexity of organisations, the team-first principle, Conway’s law, cognitive load, the team types (stream-aligned, platform, enabling, complicated subsystem), their interaction modes (collaboration, X-as-a-service, facilitating) and organisational sensing. I’ll also throw in some final miscellaneous comments. Included quotes come from Team Topologies unless otherwise stated.

Let’s get to it.


The remaining option

Organisations—like human beings themselves—are living, breathing organisms. They are not static collections of components that come with one-and-done definitions of relationships and expressions of purpose. As an organisation grows it becomes an unruly, complex thing. Software systems are also unruly, complex things.

Efforts to manage this combined complexity and all the challenges and opportunities that come along with it are aided by remembering that, “building and running software systems is a socio-technical activity, not an assembly line in a factory.” More specifically:

“As members of the technology teams managing these interfaces, we must shift our thinking from treating teams as collections of interchangeable individuals that will succeed as long as they follow the “right” process and use the “right” tools, to treating people and technology as a single human/computer carbon/silicon socio-technical ecosystem.”

But, as responsible components of these socio-technical assemblages, we have to do something.

We can’t steer the organisation as a whole, abstracted entity—beyond a certain scale, an org’s size and its inevitable complexity mean it can’t be effectively managed as a single unit from the top down. We can’t manage every individual within the organisation on a one-to-one basis, continuously and simultaneously. Even established mechanisms for periodically managing individuals within an organisation are too slow and/or too ineffective for the contemporary landscape; this is especially true in anything directly or indirectly related to software, which is basically everything.

The remaining option? Take the middle way; the team-first approach.


The team-first approach

I’m an advocate for an idea stated by John Boyd, a fighter pilot and conflict theorist. The socio-technical systems producing software could be described as people implementing ideas using technology. Alongside the OODA loop, one of Boyd’s critical ideas—he had many—was not to value each of those things equally. As his canonical phrase goes:

“People, ideas and technology—in that order.”

Typically, the “people” component of the above phrase has been assumed to mean individuals. But it can actually be interpreted as meaning teams.

What is a team? Seven people performing related activities in service of a similar objective is not the same thing as a team of seven. Team Topologies says:

“In this book, “team” has a very specific meaning. By team, we mean a stable grouping of five to nine people who work toward a shared goal as a unit. We consider the team to be the smallest entity of delivery within the organization. Therefore, an organization should never assign work to individuals; only to teams. In all aspects of software design, delivery, and operation, we start with the team.”

The impact of this deliberate change in perception and behaviour?

“By empowering teams, and treating them as fundamental building blocks, individuals inside those teams move closer together to act as a team rather than just a group of people. On the other hand, by explicitly agreeing on interaction modes with other teams, expectations on behaviors become clearer and inter-team trust grows.”

Contrast this with the historical precedent for structuring teams:

“Historically, most organizations have seen software development as a kind of manufacturing to be completed by separate individuals arranged into functional specialties, with large projects planned up front and with little consideration for socio-technical dynamics.”

It’s also worth remembering that the old-school view of teams mirrors the old-school view of individuals within those teams: discrete pieces that can be picked up, moved around, replaced, upgraded and removed without a drop in function.

Of course, the switch to team-first brings about a complementary change in processes, too. And this change is how teams start to embody some of the most desirable qualities of the best-managed and most effective software systems.

“Loose coupling—components do not hold strong dependencies on other components
High cohesion—components have clearly bounded responsibilities, and their internal elements are strongly related
Clear and appropriate version compatibility
Clear and appropriate cross-team testing”

Actually, the switch to team-first won’t bring about any changes if the understanding of what a team should be able to do doesn’t shift. 

Teams, as a normative ideal, need to represent something different from the usual siloed, specialised collections of specific or complementary expertise. They need to be built and optimised for autonomy, for throughput, for end-to-end capabilities. They need to be stream-aligned.

Before we get to what a stream-aligned team is, however, we need to take a look at Conway’s law.


Refactoring the org chart

Conway’s law: “Organizations which design systems . . . are constrained to produce designs which are copies of the communication structures of these organizations.”

Here’s a slightly rephrased version that illustrates the connection between software design and organisational communications:

“This quote from Ruth Malan provides what could be seen as the modern version of Conway’s law: “If the architecture of the system and the architecture of the organization are at odds, the architecture of the organization wins.” Malan reminds us that the organization is constrained to produce designs that match or mimic the real, on-the-ground communication structure of the organization.”

The law in practice:

“In other words, the use of a shared DBA team is likely to drive the emergence of a single shared database; and the use of separate front-end and back-end developers is likely to drive a separation between UI and app tiers, due to the nature of the communication taking place. If this single shared database and four, two-tier apps is the software architecture we want, then all is well. However, if we do not want a single shared database, we have a problem. The homomorphic force identified by Conway’s law is exerting a strong pull on the “natural” software architecture to emerge from the current organization design and communication paths.”

Recognising the validity of Conway’s law has an immediate implication that applies to the people at the upper echelons of modern software organisations:

“If we accept that the self-similar force (between architecture and team organization) described by Conway is real, then we also need to accept that anyone who makes decisions about the shape and placement of engineering teams is strongly influencing the software systems architecture. There is a logical implication of Conway’s law here, in the words of Ruth Malan: “if we have managers deciding . . . which services will be built, by which teams, we implicitly have managers deciding on the system architecture.”

…Given that there is increasing evidence for the homomorphism behind Conway’s law, it is very ineffective (perhaps irresponsible) for organizations that build software systems to decide on the shape, responsibilities, and boundaries of teams without input from technical leaders. Organization design and software design are, in practice, two sides of the same coin, and both need to be undertaken by the same informed group of people.”

Naturally, restructuring teams to mimic the desired software system architecture is only one part of the team-first approach. Teams don’t function in isolation—they communicate. And that communication has to enter into the discussion around team and organisational structures.

Anyone who’s worked in any organisation anywhere can affirm that the org chart is an inaccurate representation of reality. 

“Most organizations want or are required to have a single view of their teams and people called the “org chart.” This chart depicts the teams, departments, units, and other organizational entities, as well as how they relate to each other. It usually shows hierarchical lines of reporting, which imply lines of communication running “up and down” the organization.

…The problem with taking the org chart at face value is that we end up trying to architect people as if they were software, neatly keeping their communication within the accepted lines. But people don’t restrict their communications only to those connected lines on the chart. We reach out to whomever we depend on to get work done. We bend the rules when required to achieve our goals. That’s why actual communication lines look quite different from the org chart…

…In practice, people communicate laterally or “horizontally” with people from other reporting lines in order to get work done. This creativity and problem solving needs to be nurtured for the benefit of the organization, not restricted to optimize for top-down/bottom-up communication and reporting.”

The intent behind the org chart is valid—document and limit communication—but its implementation is too black-and-white and too focused on verticality for modern ways of working. Communication still needs to be limited in some way, though.

“One key implication of Conway’s law is that not all communication and collaboration is good. Thus it is important to define “team interfaces” to set expectations around what kind of work requires strong collaboration and what doesn’t. Many organizations assume that more communication is always better, but this is not really the case. …

If we can achieve low-bandwidth communication—or even zero-bandwidth communication—between teams and still build and release software in a safe, effective, rapid way, then we should. …

Communication within teams is high bandwidth. Communication between two “paired” teams can be mid bandwidth. Communication between most teams should be low bandwidth. …

With open-plan offices and, particularly, with ubiquitous, instant communication via chat tools, anyone can communicate with anyone else. In this situation, one can accidentally fall into a pattern of communication and interaction where everyone needs to communicate with everyone else (putting the onus on the consumer to distill what is relevant) in order to get work done. From the viewpoint of Conway’s law, this will drive unintended consequences for the software systems, especially a lack of modularity between subsystems. …

If the organization has an expectation that “everyone should see every message in the chat” or “everyone needs to attend the massive standup meetings” or “everyone needs to be present in meetings” to approve decisions, then we have an organization design problem. Conway’s law suggests that this kind of many-to-many communication will tend to produce monolithic, tangled, highly coupled, interdependent systems that do not support fast flow. More communication is not necessarily a good thing.”

Deliberately and appropriately limiting communication doesn’t only contribute to the shape of desired software systems. It also uplifts the performance of a team and the quality of their work and/or deliverables. This in turn enhances an organisation’s ability to manage and understand complex problems and domains.

““Technologies and organizations should be redesigned to intermittently isolate people from each other’s work for best collective performance in solving complex problems.” —Ethan Bernstein, Jesse Shore, and David Lazer, “How Intermittent Breaks in Interaction Improve Collective Intelligence” …

Bernstein and colleagues found that “groups whose members interacted only intermittently . . . had an average quality of solution that was nearly identical to those groups that interacted constantly, yet they preserved enough variation to find some of the best solutions too.” Intermittent collaboration found better solutions than constant interaction.”

What does “deliberately and appropriately” limiting communication actually look like? Here’s an overview that’s especially applicable to remote-first/distributed teams:

“The virtual environment is increasingly important as many organizations adopt a remote-first policy. The virtual environment comprises digital spaces such as a wiki, internal and external blogs and organization websites, chat tools, work tracking systems, and so forth. Effective remote work goes beyond having the necessary tools; teams need to agree on ground rules around working hours, response times, video conferencing, tone of communication, and other practical aspects that, if underestimated, can make or break a distributed team, even when all the right tools are available.”

That overview looks suspiciously like a construct used to manage interaction with software.

“With stable, long-lived teams that own specific bits of the software systems, we can begin to build a stable team API: an API surrounding each team. An API (application programming interface) is a description and specification for how to interact programmatically with software, so we extend this idea to entire interactions with the team.

The team API includes: 

Code: runtime endpoints, libraries, clients, UI, etc. produced by the team 
Versioning: how the team communicates changes to its code and services (e.g., using semantic versioning [SemVer] as a “team promise” not to break things) 
Wiki and documentation: especially how-to guides for the software owned by the team 
Practices and principles: the team’s preferred ways of working 
Communication: the team’s approach to remote communication tools, such as chat tools and video conferencing 
Work information: what the team is working on now, what’s coming next, and overall priorities in the short to medium term 
Other: anything else that other teams need to use to interact with the team

For effective team-first ownership of software, teams need to continuously define, advertise, test, and evolve their team API to ensure that it is fit for purpose for the consumers of that API: other teams.

An even more stringent team API approach is taken at cloud vendor AWS, where CEO Jeff Bezos insisted on almost paranoid levels of separation between teams. For example, each team at AWS must assume that “every [other team] becomes a potential DOS [denial of service] attacker requiring service levels, quotas, and throttling.””
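To make the team API idea a little more tangible, here is a minimal sketch of what a machine-readable version might look like in Python. The fields mirror the list in the quote; the class name `TeamAPI`, the `missing_sections` helper and the example team are my own hypothetical choices, not something the book prescribes.

```python
from dataclasses import dataclass, field


@dataclass
class TeamAPI:
    """Hypothetical record of how other teams interact with one team."""

    team: str
    code: list[str] = field(default_factory=list)              # endpoints, libraries, clients, UI
    versioning: str = ""                                        # how changes are communicated
    documentation: list[str] = field(default_factory=list)     # wiki pages, how-to guides
    practices: list[str] = field(default_factory=list)         # preferred ways of working
    communication: dict[str, str] = field(default_factory=dict)  # tool -> agreed expectation
    work_information: list[str] = field(default_factory=list)  # current and upcoming work
    other: list[str] = field(default_factory=list)             # anything else consumers need

    def missing_sections(self) -> list[str]:
        """Return the names of the core sections that are still empty."""
        sections = {
            "code": self.code,
            "versioning": self.versioning,
            "documentation": self.documentation,
            "practices": self.practices,
            "communication": self.communication,
            "work_information": self.work_information,
        }
        return [name for name, value in sections.items() if not value]


# Hypothetical team with only part of its API written down so far.
payments = TeamAPI(
    team="payments",
    code=["payments-client library"],
    versioning="SemVer",
)
gaps = payments.missing_sections()
```

A check like `missing_sections` is one way a team could "test" its API in the sense of the quote: the gaps it reports are the things other teams currently have to ask about in person.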

For larger organisations with a lot of history and established protocols this sounds like a lot of (scary) work. But is it more work, in total, than the confusion and missteps caused by unwieldy, inefficient, no-longer-fit-for-purpose structures of a bygone era? 

For smaller organisations this is also a lot of work. But it is also a tremendous opportunity; something so fundamental and so often de-prioritised has the potential to be a significant contributing factor to an organisation’s core competencies, if not a differentiator in and of itself.


Cognitive load and other performance factors

Sorry to be that guy: one more meta issue before we get to the actual team types and interaction modes. This one is important: performance. 

The primary factor affecting a team’s performance is cognitive load. A simple definition:

“When we talk about cognitive load, it’s easy to understand that any one person has a limit on how much information they can hold in their brains at any given moment. The same happens for any one team by simply adding up all the team members’ cognitive capacities.”

Unfortunately, it’s regularly overlooked.

“The number of services and components for which a product team is responsible (in other words, the demand on the team) typically keeps growing over time. However, the development of new services is often planned as if the team had full-time availability and zero cognitive load to start with. This neglect is problematic because the team is still required to fix and enhance existing services. Ultimately, the team becomes a delivery bottleneck, as their cognitive capacity has been largely exceeded, leading to delays, quality issues, and often, a decrease in team members’ motivation.”

The result of ignoring cognitive load is a drop in team function and/or a big hit to the motivation of individual team members.

“This is not surprising if we consider Dan Pink’s three elements of intrinsic motivation: autonomy (quashed by constant juggling of requests and priorities from multiple teams), mastery (“jack of all trades, master of none”), and purpose (too many domains of responsibility).”

If cognitive load is so important to team performance, and the team is the fundamental unit of an organisation, then the obvious response is to actively manage the cognitive loads of teams. Two related sequences from Team Topologies suggest how to do that.

First:

“One of the least acknowledged factors that increases friction in modern software delivery is the ever-increasing size and complexity of codebases that teams have to work with. This creates an unbounded cognitive load on teams. …

Cognitive load also applies to teams that do less coding and more execution of tasks, like a traditional operations or infrastructure team. They can also suffer from excessive cognitive load in terms of domains of responsibility, number of applications they need to operate, and tools they need to manage. …

For software-delivery teams, a team-first approach to cognitive load means limiting the size of the software system that a team is expected to work with; that is, organizations should not allow a software subsystem to grow beyond the cognitive load of the team responsible for the software. This has strong and quite radical implications for the shape and architecture of software systems, as we shall see later in the book. …

Sweller defines three different kinds of cognitive load:

Intrinsic cognitive load—relates to aspects of the task fundamental to the problem space (e.g., “What is the structure of a Java class?” “How do I create a new method?”)

Extraneous cognitive load—relates to the environment in which the task is being done (e.g., “How do I deploy this component again?” “How do I configure this service?”)

Germane cognitive load—relates to aspects of the task that need special attention for learning or high performance (e.g., “How should this service interact with the ABC service?”)

Broadly speaking, for effective delivery and operations of modern software systems, organizations should attempt to minimize intrinsic cognitive load (through training, good choice of technologies, hiring, pair programming, etc.) and eliminate extraneous cognitive load altogether (boring or superfluous tasks or commands that add little value to retain in the working memory and can often be automated away), leaving more space for germane cognitive load (which is where the “value add” thinking lies). …

When measuring cognitive load, what we really care about is the domain complexity—how complex is the problem that we’re trying to solve with software? A domain is a more largely applicable concept than software size. For example, running and evolving a toolchain to support continuous delivery typically requires a fair amount of tool integration and testing. Some automation code will be needed, but orders of magnitude less than the code needed for building a customer-facing application. Domains help us think across the board and use common heuristics.”

Second, four heuristics for managing a team’s cognitive load:

“The first heuristic is to assign each domain to a single team. If a domain is too large for a team, instead of splitting responsibilities of a single domain to multiple teams, first split the domain into subdomains and then assign each new subdomain to a single team. …

The second heuristic is that a single team (considering the golden seven-to-nine team size) should be able to accommodate two to three “simple” domains. Because such domains are quite procedural, the cost of context switching between domains is more bearable, as responses are more mechanical. In this context, a simple domain for a team might be an older software system that has only minor, occasional, straightforward changes. However, there is a risk here of diminishing team members’ motivation due to the more routine nature of their work. …

The third heuristic is that a team responsible for a complex domain should not have any more domains assigned to them—not even a simple one. This is due to the cost of disrupting the flow of work (solving complex problems takes time and focus) and prioritization (there will be a tendency to resolve the simple, predictable problems as soon as they come in, causing further delays in the resolution of complex problems, which are often the most important for the business). …

The last heuristic is to avoid a single team responsible for two complicated domains. This might seem feasible with a larger team of eight or nine people, but in practice, the team will behave as two subteams (one for each domain), yet everyone will be expected to know about both domains, which increases cognitive load and cost of coordination.”
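The four heuristics are mechanical enough to sketch as a check over a list of domain assignments. What follows is my own simplified reading in Python, not a tool from the book: the function name, the tuple layout, the example teams and the hard cut-off of three simple domains are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

SIMPLE, COMPLICATED, COMPLEX = "simple", "complicated", "complex"


def check_cognitive_load(assignments):
    """Return warnings for a list of (team, domain, complexity) tuples."""
    warnings = []
    owners = defaultdict(set)    # domain -> teams that own it
    by_team = defaultdict(list)  # team -> complexities of its domains
    for team, domain, complexity in assignments:
        owners[domain].add(team)
        by_team[team].append(complexity)

    # Heuristic 1: each domain belongs to a single team.
    for domain, teams in sorted(owners.items()):
        if len(teams) > 1:
            warnings.append(f"{domain}: shared by {sorted(teams)}")

    for team, complexities in sorted(by_team.items()):
        counts = Counter(complexities)
        # Heuristic 2: a team can hold two to three simple domains, not more.
        if counts[SIMPLE] > 3:
            warnings.append(f"{team}: more than three simple domains")
        # Heuristic 3: a complex domain should be the team's only domain.
        if counts[COMPLEX] and len(complexities) > 1:
            warnings.append(f"{team}: complex domain plus other domains")
        # Heuristic 4: avoid two complicated domains in one team.
        if counts[COMPLICATED] >= 2:
            warnings.append(f"{team}: two or more complicated domains")
    return warnings


# Hypothetical assignments, purely for illustration.
example = [
    ("checkout", "payments", COMPLEX),
    ("checkout", "legacy-invoicing", SIMPLE),
    ("search", "indexing", COMPLICATED),
    ("search", "ranking", COMPLICATED),
    ("growth", "indexing", COMPLICATED),
]
for warning in check_cognitive_load(example):
    print(warning)
```

The classification of each domain as simple, complicated or complex is the hard, human part; the code only keeps the bookkeeping honest once that judgement has been made.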

Two other things to keep in mind regarding team performance. The first is that cognitive load is impacted by inter-team dependencies:

“To achieve teams that have well-defined responsibilities, can work independently, and are optimized for flow, it is essential to detect and track dependencies and wait times between teams. …

In their 2012 paper, “A Taxonomy of Dependencies in Agile Software Development,” Diane Strode and Sid Huff propose three different categories of dependency: knowledge, task, and resource dependencies. …

Whichever tool is used, it is important to track the number of dependencies per area, and to establish thresholds and alerts that are meaningful for a particular situation. The number of dependencies should not be allowed to increase unchecked. Instead, such an increase should trigger adjustments in the team design and dependencies.”
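Tracking dependencies against a threshold can be as simple as counting. Below is a minimal Python sketch, assuming each dependency is recorded as a (dependent team, provider team, category) tuple with the three Strode and Huff categories. The function name, the team names and the threshold of two are hypothetical; a meaningful threshold depends on the particular situation, as the quote says.

```python
from collections import Counter

# The three categories proposed by Strode and Huff.
CATEGORIES = {"knowledge", "task", "resource"}


def teams_over_threshold(dependencies, threshold):
    """Return {team: count} for teams with more than `threshold` dependencies.

    `dependencies` is a list of (dependent_team, provider_team, category) tuples.
    """
    counts = Counter()
    for dependent, _provider, category in dependencies:
        if category not in CATEGORIES:
            raise ValueError(f"unknown dependency category: {category}")
        counts[dependent] += 1
    return {team: n for team, n in counts.items() if n > threshold}


# Hypothetical dependency log, purely for illustration.
dependencies = [
    ("checkout", "cloud-platform", "resource"),
    ("checkout", "search", "task"),
    ("checkout", "data-science", "knowledge"),
    ("search", "cloud-platform", "resource"),
]
alerts = teams_over_threshold(dependencies, threshold=2)
```

With this log only the checkout team crosses the threshold, which would be the trigger to revisit its boundaries rather than let the count keep climbing.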

The second is that team performance is like trust: it takes a long time to build and only a little time to demolish. So don’t repeat the following anti-patterns:

“The first anti-pattern is ad hoc team design. This includes teams that have grown too large and been broken up as the communication overhead starts taking a toll, teams created to take care of all COTS software or all middleware, or a DBA team created after a software crash in production due to poor database handling. …

The other common anti-pattern is shuffling team members. This leads to extremely volatile teams assembled on a project basis and disassembled immediately afterward, perhaps leaving one or two engineers behind to handle the “hardening” and maintenance phases of the application(s).”

Teams are perpetually storming, norming and performing, and time is needed for the consequences of alterations to teams to become apparent. The good consequences at least; it can take anywhere from three to twelve months for a team to flourish but only minutes to nuke it.

If we recognise this asymmetry when tampering with teams and decide to give them time, then we get an extra benefit, one that makes not interfering a little bit more palatable: compounding.

“A further benefit of taking a team-first approach to software boundaries is that the team tends to easily develop a shared mental model of the software being worked on. Research has shown that the similarity of team mental models is a good predictor of team performance, meaning fewer mistakes, more coherent code, and more rapid delivery of outcomes. As we begin to optimize more and more for the team, the benefits begin to compound in a positive way.”


Team types

Team Topologies proposes four team types: stream-aligned, enabling, complicated subsystem and platform teams. Stream-aligned teams are the most numerous team type within an organisation and are supported by the other three team types.

First remark, on the use of the word “stream”:

“A “stream” is the continuous flow of work aligned to a business domain or organizational capability. Continuous flow requires clarity of purpose and responsibility so that multiple teams can coexist, each with their own flow of work. A stream-aligned team is a team aligned to a single, valuable stream of work; this might be a single product or service, a single set of features, a single user journey, or a single user persona. Further, the team is empowered to build and deliver customer or user value as quickly, safely, and independently as possible, without requiring hand-offs to other teams to perform parts of the work. …

Not only is the term “stream aligned” more suited to a wider range of situations than either “product” or “feature,” but “stream aligned” also incorporates and helps to emphasize a sense of flow (because a stream flows). Finally, not all software situations need products or features (especially those focused on providing public services), but all software situations benefit from alignment to flow. …

In line with the principle “you build it, you run it” popularized by Werner Vogels, CTO of Amazon, “service teams” (as they’re called internally) must be cross-functional and include all the required capabilities to manage, specify, design, develop, test, and operate their services (including infrastructure provisioning and client support). These capabilities are not necessarily mapped to individuals; the team as a whole must provide them. Each individual has a primary area of expertise, but their contributions are not limited to it.”

Second remark, on picking a team’s stream:

“Different streams can coexist in an organization: specific customer streams, business-area streams, geography streams, product streams, user-persona streams, or even compliance streams (in highly regulated industries). … A stream can even take the form of a micro-enterprise within a large firm, with an independent focus and purpose (e.g., innovating on products that do not exist yet). Whichever kind of stream of changes a stream-aligned team is aligned to, that team is funded in a long-term, sustainable manner as part of a portfolio or program of work, not as a fleeting project.”

As I hope is obvious by now, stream-aligned teams are optimised for autonomous, end-to-end capability.

“Generally speaking, each stream-aligned team will require a set of capabilities in order to progress work from its initial (requirements) exploration stages to production. These capabilities include (but are not restricted to):

  • Application security
  • Commercial and operational viability analysis
  • Design and architecture
  • Development and coding
  • Infrastructure and operability
  • Metrics and monitoring
  • Product management and ownership
  • Testing and quality assurance
  • User experience (UX)

It’s critical not to assume each capability maps to an individual role in the team; that would mean teams would have to include at least nine members to match the list above. Instead, we’re talking about being able, as a team, to understand and act upon the above capabilities. This might mean having a mix of generalists and a few specialists. Having only specialized roles would lead to a bottleneck every time a piece of work depended on a specialist who might be currently busy.”
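
To make the “capabilities, not roles” point concrete, here is a minimal Python sketch. The capability names come from the list above; everything else (the five-person team, the member names, the `capability_report` helper) is hypothetical and only there for illustration. It checks two things: whether the team as a whole covers every capability, and which capabilities rest on a single person and so risk the bottleneck the authors warn about.

```python
from collections import defaultdict

# The nine capabilities listed in the quoted passage.
REQUIRED_CAPABILITIES = {
    "application security",
    "commercial and operational viability analysis",
    "design and architecture",
    "development and coding",
    "infrastructure and operability",
    "metrics and monitoring",
    "product management and ownership",
    "testing and quality assurance",
    "user experience",
}


def capability_report(team):
    """Return (missing, single_points) for a team.

    team maps a member name to the set of capabilities that member can act on.
    missing: required capabilities that nobody covers.
    single_points: required capabilities covered by exactly one member,
    mapped to that member's name (a potential bottleneck).
    """
    holders = defaultdict(set)
    for member, capabilities in team.items():
        for capability in capabilities:
            holders[capability].add(member)

    missing = REQUIRED_CAPABILITIES - set(holders)
    single_points = {
        capability: next(iter(names))
        for capability, names in holders.items()
        if capability in REQUIRED_CAPABILITIES and len(names) == 1
    }
    return missing, single_points


# Hypothetical five-person team: fewer members than capabilities.
example_team = {
    "Asha": {
        "product management and ownership",
        "commercial and operational viability analysis",
        "user experience",
    },
    "Ben": {
        "development and coding",
        "design and architecture",
        "testing and quality assurance",
    },
    "Chen": {
        "development and coding",
        "infrastructure and operability",
        "metrics and monitoring",
    },
    "Dana": {
        "application security",
        "infrastructure and operability",
        "testing and quality assurance",
    },
    "Eli": {
        "user experience",
        "design and architecture",
        "development and coding",
    },
}

missing, single_points = capability_report(example_team)
```

In this made-up team nothing is missing, yet four capabilities each depend on one person. Five people can cover nine capabilities, but the specialists still need watching.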

Stream-aligned teams are supported by enabling, complicated sub-system and platform teams.

What an enabling team is:

“An enabling team is composed of specialists in a given technical (or product) domain, and they help bridge this capability gap. Such teams cross-cut to the stream-aligned teams and have the required bandwidth to research, try out options, and make informed suggestions on adequate tooling, practices, frameworks, and any of the ecosystem choices around the application stack. This allows the stream-aligned team the time to acquire and evolve capabilities without having to invest the associated effort (in our experience, such efforts and their impact on the rest of the team also tend to be dramatically underestimated by ten to fifteenfold).”

What an enabling team isn’t:

“Enabling teams actively avoid becoming “ivory towers” of knowledge, dictating technical choices for other teams to follow, while helping teams to understand and comply with organization-wide technology constraints. This is akin to the idea of “servant leadership” but applied to team interactions rather than individuals. The end goal of an enabling team is to increase the autonomy of stream-aligned teams by growing their capabilities with a focus on their problems first, not the solutions per se. …

Enabling teams do not exist to fix problems that arise from poor practices, poor prioritization choices, or poor code quality within stream-aligned teams. Stream-aligned teams should expect to work with enabling teams only for short periods of time (weeks or months) in order to increase their capabilities around a new technology, concept, or approach. After the new skills and understanding have been embedded in the stream-aligned team, the enabling team will stop daily interaction with the stream-aligned team, switching their focus to a different team.”

What a complicated sub-system team is:

“A complicated-subsystem team is responsible for building and maintaining a part of the system that depends heavily on specialist knowledge, to the extent that most team members must be specialists in that area of knowledge in order to understand and make changes to the subsystem.”

Complicated sub-system teams exist for a very specific reason: to act as a cognitive load shield for stream-aligned teams that can’t replicate the depth of expertise needed to work with a complicated sub-system.

“The goal of this team is to reduce the cognitive load of stream-aligned teams working on systems that include or use the complicated subsystem. The team handles the subsystem complexity via specific capabilities and expertise that are typically hard to find or grow. …

The critical difference between a traditional component team (created when a subsystem is identified as being or expected to be shared by multiple systems) and a complicated-subsystem team is that the complicated-subsystem team is created only when a subsystem needs mostly specialized knowledge. The decision is driven by team cognitive load, not by a perceived opportunity to share the component.”

Finally, we have platform teams:

“The purpose of a platform team is to enable stream-aligned teams to deliver work with substantial autonomy. The stream-aligned team maintains full ownership of building, running, and fixing their application in production. The platform team provides internal services to reduce the cognitive load that would be required from stream-aligned teams to develop these underlying services. …

This definition of “platform” is aligned with Evan Bottcher’s definition of a digital platform: A digital platform is a foundation of self-service APIs, tools, services, knowledge and support which are arranged as a compelling internal product. Autonomous delivery teams can make use of the platform to deliver product features at a higher pace, with reduced coordination. …

The platform team’s knowledge is best made available via self-service capabilities via a web portal and/or programmable API (as opposed to lengthy instruction manuals) that the stream-aligned teams can easily consume. “Ease of use” is fundamental for platform adoption and reflects the fact that platform teams must treat the services they offer as products that are reliable, usable, and fit for purpose, regardless of if they are consumed by internal or external customers.”

These four team types are able to utilise three patterns of interaction: collaboration, X-as-a-service and facilitating.


Interaction modes

Running with the team types above without also standardising interaction modes is like fishing without bait; it won’t accomplish much. A reminder:

“Formalizing the ways in which teams should interact when building software systems helps to more easily assess the effectiveness of many aspects of software delivery by more explicitly defining interfaces between teams; in turn, it is expected (from Conway’s law) that these interfaces will be reflected in the software systems being built. …

Teams should ask: “What kind of interaction should we have with this other team? Should we be collaborating closely with the other team? Should we be expecting or providing a service? Or should we be expecting or providing facilitation?” …

Leading technologist James Urquhart, writing about team intercommunication with Conway’s law in mind, describes the need for “a communication backchannel that avoids much of the politics, bandwidth constraints, and simple inefficiency of human-to-human communication.” This is exactly the kind of outcome this chapter’s well-defined team interactions should provide.”

Now, the interaction types.

First, collaboration (“working closely together with another team”):

“The collaboration team mode is suitable where a high degree of adaptability or discovery is needed, particularly when exploring new technologies or techniques. The collaboration interaction mode is good for rapid discovery of new things, because it avoids costly hand-offs between teams. …

There are two useful ways to visualize teams interacting using the collaboration mode. 

The first is to visualize two teams with distinct expertise and responsibilities working together on a small set of things. In this first collaboration interaction, the two teams substantially retain their responsibility and expertise for their natural area of focus, and work together on a specific subset of activities and details.

The second visualization of collaboration mode identifies that the nature of working together between teams can be almost total: although there were originally two teams with different skills and expertise, now there is effectively a single team pooling expertise and responsibilities.

In both cases—with a small defined overlap, and with a full overlap of focus and responsibilities—the two teams must take on joint responsibility for the overall outcomes of their collaboration, because the act of collaborating creates a blurring of responsibility boundaries. Without joint responsibility, there is a danger of loss of trust if something goes wrong.”

Second, X-as-a-service (“consuming or providing something with minimal collaboration”):

“The X-as-a-Service team interaction mode is suited to situations where there is a need for one or more teams to use a code library, component, API, or platform that “just works” without much effort, where a component or aspect of the system can be effectively provided “as a service” by a distinct team or group of teams. …

During later phases of systems development and periods where predictable delivery is needed (rather than discovery of new approaches), the X-as-a-Service model works best. …

With X-as-a-Service, there is great clarity about who owns what: one team consumes something that the other team provides. Less context is needed by each team compared to working in collaboration mode, so the cognitive load on each side can be lower. By design, innovation across the boundary happens more slowly than with collaboration, precisely because X-as-a-Service has a nice, clean API that has defined the service well. …

For a component or aspect of a system to be provided effectively as a service, not only must the responsibility boundary make sense in the context of the business or technical domain, but the team providing the service will also be required to be adept at understanding the needs of the teams that consume its service and managing their aspect of the system using service-management principles (through the use of versioning, product management, and so forth). …

For something to be provided as a service—whether a component, an API, a testing tool, or an entire delivery platform—the team responsible must have a strong sense of responsibility toward both the consumers and the viability of the thing they are providing. …

Furthermore, the service they provide must be managed in a way that keeps it viable over time: requests for new features from consuming teams are considered but not built just because a team has asked for them. Instead, the purpose and remit of the thing is evolved with the best interest of all consumers in mind, with enhancements carefully scheduled and planned in consultation with other teams.”

Third, facilitating (“helping (or being helped by) another team to clear impediments”):

“The facilitating team interaction mode is suited to situations where one or more teams would benefit from the active help of another team facilitating (or coaching) some aspect of their work. The facilitating interaction mode is the main operating mode of an enabling team (see Chapter 5) and provides support and capabilities to many other teams, helping to enhance the productivity and effectiveness of these teams. …

The remit of the team undertaking the facilitation is to enable the other team(s) to be more effective, learn more quickly, understand a new technology better, and discover and remove common problems or impediments across the teams. The facilitating team can also help to discover gaps or inconsistencies in existing components and services used by other teams. …

A team with a facilitating remit does not take part in building the main software systems, supporting components, or platform but, instead, focuses on the quality of interactions between other teams building and running the software.”

Naturally, some of these interaction modes map particularly well to certain team types. For example:

  • Enabling teams will facilitate increases in the expertise and autonomy of stream-aligned teams
  • Stream-aligned teams will co-ordinate with complicated sub-system teams to access X-as-a-service
  • Platform teams will collaborate with complicated sub-system teams to formalise and democratise access to X within a platform layer
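
The vocabulary is small enough to write down as data. The sketch below is my own illustration, not something from the book: it encodes the four team types, the three interaction modes, and only the three pairings from the bullets above. The `check_interaction` helper and its messages are hypothetical.

```python
from enum import Enum


class TeamType(Enum):
    STREAM_ALIGNED = "stream-aligned"
    ENABLING = "enabling"
    COMPLICATED_SUBSYSTEM = "complicated sub-system"
    PLATFORM = "platform"


class InteractionMode(Enum):
    COLLABORATION = "collaboration"
    X_AS_A_SERVICE = "X-as-a-service"
    FACILITATING = "facilitating"


# Only the three example pairings from the bullets above, keyed as
# (initiating team type, other team type). Not an exhaustive table.
EXPECTED_MODES = {
    (TeamType.ENABLING, TeamType.STREAM_ALIGNED): InteractionMode.FACILITATING,
    (TeamType.STREAM_ALIGNED, TeamType.COMPLICATED_SUBSYSTEM): InteractionMode.X_AS_A_SERVICE,
    (TeamType.PLATFORM, TeamType.COMPLICATED_SUBSYSTEM): InteractionMode.COLLABORATION,
}


def check_interaction(source, target, mode):
    """Compare an observed interaction against the example pairings."""
    expected = EXPECTED_MODES.get((source, target))
    if expected is None:
        return "no example pairing; agree a mode explicitly"
    if expected is mode:
        return "matches the example pairing"
    return f"expected {expected.value}, got {mode.value}; worth a conversation"


observed = check_interaction(
    TeamType.STREAM_ALIGNED,
    TeamType.COMPLICATED_SUBSYSTEM,
    InteractionMode.COLLABORATION,
)
```

A mismatch is not an error. It is a prompt for the question the authors want teams to ask: “What kind of interaction should we have with this other team?”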

But this is not the endgame. There is another layer, another glass ceiling which can be shattered. 

Organisations can do all of the above—adopt a team-first approach; leverage Conway’s law; restructure teams, actively manage their cognitive load and enhance their interactions—and see significant advantages as a result. But they can go a bit further.


Becoming a sensing organisation

Earlier on, I stated the radical idea that organisations are like organisms. Revolutionary stuff. Extend that a little: organisms sense.

“Historically, many organizations have treated “develop” and “operate” as two distinct phases of software delivery, with very little interaction and certainly almost no feedback from operate to develop. Modern software delivery must take a completely different approach: the operation of the software should act as and provide valuable signals to the development activities. By treating operations as rich, sensory input to development, a cybernetic feedback system is set up that enables the organization to self steer. …

With well-defined, stable teams taking effective ownership of different parts of the software systems and interacting using well-defined communication patterns, organizations can begin to activate a powerful strategic capability: organizational sensing. …

Organizational sensing uses teams and their internal and external communication as the “senses” of the organization (sight, sound, touch, smell, taste)—what Peter Drucker calls “synthetic sense organs for the outside.” Without stable, well-defined neural communication pathways, no living organism can effectively sense anything. To sense things (and make sense of things), organisms need defined, reliable communication pathways.”

In contrast to an organisation that senses, most organisations remain senseless: not by intent, but by design.

“Many organizations—those with unstable and ill-defined teams, relying on key individuals and (often) suppressing the voices of large numbers of staff—are effectively “senseless” in both meanings of the word: they cannot sense their environmental situation, and what they do makes no sense. …

When speed of change was measured in months or years (as in the past), organizations could manage with very slow and limited environmental sensing; however, in today’s network-connected world, high-fidelity sensing is crucial for organizational survival, just as an animal or other organism needs senses to survive in a competitive, dynamic natural environment.”

I mentioned John Boyd above. The OODA loop is applicable in this context but it’s a bonus. Just remember that sensing implies not only perception but also decision making and action.

“Not only do organizations need to sense things with high fidelity, they also need to respond rapidly. Organisms generally have separate specialized organs for sensing (eyes, ears, etc.) and responding to input (limbs, body, etc.). The kinds of signals that different teams will be able to detect will differ depending on what the team does and how close it is to external customers, internal customers, other teams, and so on, but each team will be capable of providing sensory input to the organization and responding to the information by adjusting their team interaction patterns.”

Sensing is the recognition and transmission of, and response to, a signal or stimulus. In the modern environment, the things that can be sensed are at once so numerous and so subtle that traditional organisational structures can’t be trusted to pick them up.

“Increasingly, software is less of a “product for” and more of an “ongoing conversation with” users. To make this ongoing conversation effective and successful, organizations need a “continuity of care” for its software. The team that designs and builds the software needs to be involved in its running and operational aspects in order to be able to build it effectively in the first place. …

The team providing this “design and run” continuity of care also needs to have some responsibility for the commercial viability of the software service; otherwise, decisions will be made in a vacuum separate from financial reality. …

One of the most important changes to improve the continuity of care is to avoid “maintenance” or “business as usual” (BAU) teams whose remit is simply to maintain existing software. …

Having separate teams for new-stuff and BAU also tends to prevent learning between these two groups. The new-service team gets to implement new technologies and approaches but without any ability to see whether these approaches are effective.”

In the same way that sensory information is continuous, so is the attention that must be paid to the sensory organs themselves. Organisations, in this respect, are not like organisms: their structures must be deliberately monitored, evaluated and adapted. Team Topologies provides three examples of scenarios that could trigger an organisational adaptation.

First, software has grown too large for one team:

“Another aspect at play occurs when the team no longer holds a holistic view of the system; thus, it loses the self-awareness to realize when the system has become too large. While there is some correlation between system size in terms of lines of code or features, it is the limit on cognitive capacity to handle changes to the system in an effective way that is most of concern here.”

Second, delivery cadence is becoming slower:

“A long-lived, high-performing product team should be able to steadily improve their delivery cadence as they find ways to work more efficiently together and remove bottlenecks in delivery. However, a pre-requisite for these teams to flourish is to grant them autonomy over the entire life cycle of the product. This means no hard dependencies on external teams, such as waiting for another team to create new infrastructure. Being able to self-serve new infrastructure via an internal platform is a soft dependency (assuming the provisioning self-service is maintained by a platform team).”

Third, multiple business services rely on a large set of underlying services:

“In order to deliver useful business value, the higher-level streams need to integrate with many lower-level services (the realm of enterprise service management). If the streams have to integrate separately with each underlying service, it can be challenging to assess the effectiveness of flow and to diagnose errors in long-running processes that may have some human-decision input. For example, the underlying services may not expose tracking mechanisms or may each have a separate way to identify transactions.”
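
The last sentence of that quote describes a concrete technical problem: each underlying service identifies a transaction in its own way, so nobody can follow one piece of work end to end. A common remedy is a thin layer that issues a single correlation ID per transaction and records each service’s own reference against it. What follows is a minimal sketch of that idea, not anything prescribed by the book; the `TransactionTracker` class, the service names and the reference formats are all hypothetical.

```python
import uuid


class TransactionTracker:
    """Maps one stream-level correlation ID to each service's own identifier."""

    def __init__(self):
        # correlation_id -> {service name: that service's own reference}
        self._records = {}

    def start(self):
        """Open a new transaction and return its correlation ID."""
        correlation_id = str(uuid.uuid4())
        self._records[correlation_id] = {}
        return correlation_id

    def record(self, correlation_id, service, service_ref):
        """Note the identifier an underlying service uses for this transaction."""
        self._records[correlation_id][service] = service_ref

    def trace(self, correlation_id):
        """Return a copy of every service reference seen for this transaction."""
        return dict(self._records[correlation_id])

    def find(self, service, service_ref):
        """Look up the correlation ID from one service's own reference."""
        for correlation_id, refs in self._records.items():
            if refs.get(service) == service_ref:
                return correlation_id
        return None


tracker = TransactionTracker()
order = tracker.start()
tracker.record(order, "billing", "INV-0042")  # hypothetical invoice number
tracker.record(order, "crm", 98765)           # hypothetical numeric case id
```

With this in place, someone holding only the billing reference can call `tracker.find("billing", "INV-0042")` and recover the whole trail. In Team Topologies terms, this is the kind of capability a platform team might offer as a service rather than leaving each stream to reinvent it.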

Naturally, every adaptation to team structures and interaction modes raises the question of boundaries. Specifically, “where should they be placed?” The boundaries of a year-old startup will not be the same as those of an established, multi-national enterprise, and both sets of boundaries will shift as the organisations themselves change over time.

Team Topologies suggests using the concept of fracture planes to orientate the setting of boundaries for teams and their interactions.

“A fracture plane is a natural seam in the software system that allows the system to be split easily into two or more parts. This splitting of software is particularly useful with monolithic software. …

It is usually best to try to align software boundaries with the different business domain areas. A monolith is problematic enough from a technical standpoint (particularly, the way it slows down the delivery of value over time as building, testing, and fixing issues takes increasingly more time). If that monolith is also powering multiple business domain areas, it becomes a recipe for disaster, affecting prioritization, flow of work, and user experience. …

Most of our fracture planes (software responsibility boundaries) should map to business-domain bounded contexts. A bounded context is a unit for partitioning a larger domain (or system) model into smaller parts, each of which represents an internally consistent business domain area (the term was introduced in the book Domain-Driven Design by Eric Evans).”

There is a wide range of fracture planes to choose from: regulatory compliance requirements, tempo or rate of change, user numbers or personas, revenue-generating versus customer-acquisition risk profiles, geography. Regardless of the fracture plane identified, the following question must be asked:

“Does the resulting architecture support more autonomous teams (less dependent teams) with reduced cognitive load (less disparate responsibilities)?”
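
That question can be roughly approximated with two crude proxies, both of which are my assumptions rather than measures from the book. Dependencies that cross a team boundary stand in for lost autonomy, and the number of business domain areas a team owns stands in for cognitive load. The sketch below compares two candidate splits of a hypothetical four-component system; the component, team and domain names are invented.

```python
def evaluate_split(ownership, dependencies, domains):
    """Score a candidate assignment of components to teams.

    ownership: component -> owning team
    dependencies: iterable of (component, component it depends on)
    domains: component -> business domain area
    Returns (cross_team_dependencies, domain_count_per_team).
    """
    cross_team = sum(
        1
        for source, target in dependencies
        if ownership[source] != ownership[target]
    )
    areas_by_team = {}
    for component, team in ownership.items():
        areas_by_team.setdefault(team, set()).add(domains[component])
    domain_counts = {team: len(areas) for team, areas in areas_by_team.items()}
    return cross_team, domain_counts


# Hypothetical system: four components across two business domain areas.
domains = {
    "checkout": "orders",
    "payments": "orders",
    "catalogue": "products",
    "search": "products",
}
dependencies = [
    ("checkout", "payments"),
    ("search", "catalogue"),
    ("checkout", "catalogue"),
]

# Candidate A: split by technical layer.
by_layer = {"checkout": "api", "search": "api", "payments": "data", "catalogue": "data"}
# Candidate B: split along the business-domain fracture plane.
by_domain = {"checkout": "orders", "payments": "orders", "catalogue": "products", "search": "products"}

layer_score = evaluate_split(by_layer, dependencies, domains)
domain_score = evaluate_split(by_domain, dependencies, domains)
```

On this made-up data the layer split leaves three cross-team dependencies with both teams straddling two domains, while the domain split leaves one dependency and one domain per team. Real systems will not score this cleanly, but the comparison is the point: a fracture plane is worth using only if both numbers move in the right direction.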

This all falls under the “organisational sensing” category. External stimuli and signals must be perceived, transmitted, processed and responded to via internal structures that, in turn, generate their own stimuli and signals. 


An organism

The ultimate goal of the team-first approach is simple: transition away from being an organisation and become an organism. Everything above is a step on that journey, and there are already numerous groups skipping lightly down the road. 

The key question is, “Do we want to join them?”