Learning from Amazon

Over the winter break I read Working Backwards, written by a couple of Amazon OGs. I thought it was quite good, definitely worth the read. I had two big takeaways from the book that are very relevant to startup/scale-up life.

Being metrics driven

Not surprisingly, Amazon is a very data-driven company, and the authors put a lot of emphasis on data-driven decision making. Their biggest insight is the idea of focusing on controllable input metrics rather than output metrics. The argument is easy to understand. Output metrics, such as revenue, P&L, or even customer growth, are generally the result of many complex factors. For example, we have all seen over the past six months how “the state of the market” can be an overriding input to those metrics! So while it is important to measure your outputs, it is hard to drive action and behavior around a complex output like “grow revenue”.

So the key is to determine which inputs to your system strongly correlate with your outputs, and in particular, which of those inputs you directly control (we don’t control interest rates, but they are an important input for many FinTech businesses). If you can identify those controllable inputs, you can set goals to improve them and thus drive your outputs.
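To make the “find controllable inputs that correlate with outputs” step concrete, here is a minimal sketch. It is my own illustration, not the book’s method: it assumes you already have equal-length weekly series for an output metric and a few candidate inputs, and it ranks only the candidates you control by their Pearson correlation with the output. The metric names and numbers are hypothetical, and the interest-rate series stands in for an uncontrollable input that gets filtered out.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_inputs(inputs, output, controllable):
    """Rank controllable candidate inputs by |correlation| with the output.

    Uncontrollable inputs are skipped entirely: even if they correlate
    strongly, you cannot set goals to move them.
    """
    scores = {
        name: pearson(series, output)
        for name, series in inputs.items()
        if name in controllable
    }
    # Strongest relationship first, whether positive or negative.
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical weekly data, for illustration only.
weekly_revenue = [100, 110, 125, 130, 150, 160]
candidates = {
    "onboarding_calls": [5, 6, 8, 8, 10, 11],
    "page_load_ms": [900, 880, 850, 840, 800, 790],
    "interest_rate_pct": [4.5, 4.5, 4.75, 5.0, 5.0, 5.25],
}
controllable = {"onboarding_calls", "page_load_ms"}

for name, r in rank_inputs(candidates, weekly_revenue, controllable):
    print(f"{name}: r = {r:+.2f}")
```

A strong correlation only nominates a candidate; in the spirit of the book’s “intense and iterative” process, you would still set a goal on that input and check whether the output actually moves before treating it as a key metric.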

Intuitively, this is not a surprising insight: figure out “what works” in your business that leads to good outcomes! I think Amazon’s big contribution is the emphasis on determining the right inputs to measure. The book goes into depth explaining how intense and iterative the process of determining the right “controllable inputs” is. Amazon obsesses over finding the right input metrics first, and then drives to improve them.

I have been working on engineering and business metrics at Tatari for at least six quarters. Just getting the infrastructure in place to measure reliably is significant work, even for the obvious output metrics. Amazon’s practice confirms that there is a whole other level of work in uncovering the key metrics you can intentionally move to deliver improvements in output. The book focuses on product and business metrics, but I am interested in applying this process to engineering metrics. What are the key input metrics that will drive improved quality, higher velocity, and lower costs?

Team autonomy

The other section of the book that I found insightful was the discussion of how Amazon landed on the “two pizza team” idea. The authors go into some detail about how Amazon observed that, as the company grew, it became increasingly difficult to prioritize across all of its projects. Constant jockeying for resources and attention resulted in team priorities changing frequently and work arriving from out of left field.

This led to the idea of breaking the company down into smaller autonomous teams that wouldn’t need to coordinate so much. They started the “two pizza” team idea in the product development org. Teams quickly observed that they would need to invest a lot of time breaking their dependencies on other teams by creating bounded services with clean contracts between them. The authors write that this service-oriented re-architecting took a number of years. They observed that the approach worked well overall, and that many teams achieved a significant level of autonomy. But they concede that many projects still spanned multiple teams and required coordination between them, regardless of architecture.

This matched my experience at Heroku, where we used a very effective service-oriented architecture overall. While this architecture and org design worked well when a feature was completely within the control of a single team, there were numerous cases where features still required coordination amongst multiple teams. Building a feature across multiple services (which required changes to their connecting interfaces) was difficult and unnatural. This led to anti-patterns, such as teams attempting to “hack” their features so that they didn’t require changes to connected services. I remember one feature whose development caused multiple platform downtime incidents because the implementing team was so reluctant to coordinate its work with other teams.

At Amazon they eventually observed that the “two pizza” team size (10 people) wasn’t suitable for all cases, and that the key idea was the focus on a single problem domain. This led to the evolution of the single-threaded owner team. The core idea is that the key to making progress in a problem domain is to have a dedicated leader and dedicated resources. This can feel like an obvious conclusion, but in my experience many companies struggle to manage a large set of goals and initiatives that exceeds the “single-threaded owner” capacity of the org. Apple tries to solve this problem by allowing the Directly Responsible Individual to come from any level in the company, so there are many more potential candidates. This can be a good fix for the “leader” problem, but it doesn’t address dedicating resources to work on the problem.