Data Process

Good Process: social conventions that improve overall productivity and reduce wasted/duplicative work. Bad Process: mandated rules that create additional work.


Over the last few decades, engineers have fine-tuned the process for shipping product changes. While automation (of testing or the deploy process) has reaped massive productivity gains, the last few years have taught me the value of engineering process: it lets a team parallelize work on a shared goal and still avoid the pitfalls of The Mythical Man-Month by breaking a project down into smaller changes.

The paradigm and underlying tooling have become so commonplace and valuable that even for single-developer hobby projects, it’s still useful to set up git or continuous integration. Interestingly, improved tooling has led to a decline of specialization. Today, most people I know are “full-stack” developers. Instead of being responsible for “the website JavaScript” or “the billing system”, they own the whole shipping process: taking the idea, breaking it down into tasks, landing the code, and shipping those changes.


Just as engineering is centered on deploying product, analytics is centered on building a shared and cohesive understanding of our business. That understanding, just like a codebase, is both contextual to the product and ever-changing as the business (in its product offering, market position, and audience) changes. Even if you hire a team of analysts, enabling that work to be parallelized and still cohesive without top-down management is insanely hard.

Evolving understanding

Most organizations have a “waterfall” approach to understanding their business. Executives define metrics, which are then split out to the smaller teams. At Dropbox, each product area had a single PM, who would either run analysis or delegate it to an analyst. Splitting the product into “areas” was the equivalent of having “frontend” and “backend” engineers: it helped reduce duplicate work and fulfilled demand from the product team for data. Analysts ended up more akin to sales engineers, filling data requests but never acting as a cohesive org building a shared view of our business.

On the API team, our analyst helped us build a common understanding of what an “active user” of Dropbox’s API was. API users were amazing: they were disproportionately paid and had immensely high retention. As Dropbox grew, a large portion of the product engineering org was reorganized around “high value actives” or HVAs. These were different from monthly active users, because the company had decided that three behaviors (sending, sharing, and syncing files) were the key to our business.

This was problematic for the API team: using Dropbox via a third-party app wasn’t part of the definition, so if you used Dropbox to sync your 1Password data, you weren’t considered active. That might have been fine, except that API users were actually better on almost every metric than “sending files” users. For them, Dropbox had a highly differentiated value, much more than “I use you to send large files.”

If you were creating a definition for active, you’d probably focus on its connection with paid. Including API app usage as part of the definition increased both the number of HVAs and their correlation with our paid users (the company metric that truly underlay everything). And while my manager agreed, HVAs were a top-down concept. Changing the definition required squeezing in lots of meetings to get executive buy-in. Even more problematically, there was no way to “measure the metric.” It was a subjective decision and that was that.

But because API users weren’t part of the HVA concept, the API group eventually became grouped under the “revenue” goals, which was even more confusing because the API wasn’t a paid feature and had even more trouble contributing to that metric. Dropbox managed to create a new shared understanding with HVAs…but our process for arriving at this understanding was too cumbersome for it to adapt when problems arose. As with all products that don’t iterate rapidly, the problems accumulated, led to misinformed decisions, and eventually the metric was discarded. Dropbox’s S-1 makes no mention of actives, despite MAUs being a commonly understood metric for SaaS companies.

“No battle plan survives contact with the enemy.” No product survives contact with the user (they find all the bugs you couldn’t imagine), and no metric survives contact with the market. Every metric definition has flaws, so it’s vital to allow the people closest to the battle to improvise and iterate on them.

Convergent evolution

How do we think about the funnel? It’s honestly hilarious that there are entire bootcamps around this content, as if the knowledge itself is some holy secret and not just the consistent application of common sense and common vocabulary.

It’s fine when metrics are an ivory tower, and the shared understanding is meted out in occasional all-hands. But what happens when you start democratizing data? Duplication through convergent evolution. You get a ton of people writing very similar queries, yet the analyses can’t be combined or de-duplicated into a greater whole because of incompatible assumptions.

For instance, at Sentry, we recently discovered that getting folks to send their first event was just the very beginning of their journey. We decided to define “active” as organizations that have at least one active user and have sent at least 100 events. Yet a huge portion of people who sent events in month 1 dropped off in month 2. Why? Our first instinct is “maybe they’re not using the entire product”. So you might perform an MLR on whether month 1 actives remain active, with feature adoption/usage as the independent variables. At the same time, our growth team is curious what makes people active in their first month. What features do they use?

At the end of this, we’ve spent days writing queries…but have we built a shared understanding for the rest of the company? Does your analysis take into account my assumptions (for instance, ignoring orgs that deleted themselves)? How do we build a shared vocabulary to talk about these assumptions? How do we question them, evaluate the pros and cons, and automatically roll out changes across all our existing reports?
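To make the question concrete, here is a minimal sketch of what a shared, named definition of “active” might look like if every report imported it instead of re-deriving it in its own query. The thresholds come from the Sentry definition above; everything else (the class names, the fields, the `exclude_deleted` flag, and the sample rows) is hypothetical and made up purely for illustration, not a real schema or real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActiveDefinition:
    """One place to write down the assumptions behind "active" (hypothetical)."""
    min_active_users: int = 1     # "at least one active user"
    min_events: int = 100         # "at least 100 events"
    exclude_deleted: bool = True  # the "ignore orgs that deleted themselves" assumption


@dataclass(frozen=True)
class OrgMonth:
    """One organization's usage in one month (illustrative fields only)."""
    org_id: str
    active_users: int
    events_sent: int
    deleted: bool = False


def is_active(org: OrgMonth, definition: ActiveDefinition = ActiveDefinition()) -> bool:
    # Apply the deleted-org assumption first, then the two thresholds.
    if definition.exclude_deleted and org.deleted:
        return False
    return (
        org.active_users >= definition.min_active_users
        and org.events_sent >= definition.min_events
    )


def count_active(orgs, definition: ActiveDefinition = ActiveDefinition()) -> int:
    return sum(is_active(org, definition) for org in orgs)


# Made-up rows, only to show how one assumption changes the answer.
orgs = [
    OrgMonth("a", active_users=3, events_sent=250),
    OrgMonth("b", active_users=0, events_sent=500),
    OrgMonth("c", active_users=2, events_sent=99),
    OrgMonth("d", active_users=5, events_sent=1000, deleted=True),
]

with_assumption = count_active(orgs)
without_assumption = count_active(orgs, ActiveDefinition(exclude_deleted=False))
```

The point isn’t the code itself. Once the assumption has a name, two analysts can see that their numbers differ because one of them counted deleted orgs, and changing the default changes every report that imports the definition.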

Dropbox’s bizops team fought this problem with rollups, but because our tool was centered on the what (the query that generated the rollup) and not the collaborative decision-making that generated them, new hires would always ask, “Why did we structure this rollup this way?” and as older team members left, there were ever fewer people who could answer that.

It’s especially interesting because we often think of engineers as being incredibly collaborative, and analysts as being more like “business people”, more concerned about getting credit than sharing it. Yet…when you make collaboration so hard, why would anyone collaborate?

Why haven’t we solved this?

The current world of analytics is centered on charts and dashboards. But they are more like analytics IDEs, meant to improve the productivity of individual analysts without improving the productivity of the team. Some of these tools, like Looker or Mode, attempt to dictate their version with templates or showcases. Analytics tooling companies don’t have the best answer about how to arrive at a shared understanding of your business. Why? Because they’re all VC-funded analytics tool SaaS businesses. If your company sells high-end tea and tea accessories to businesses in Dallas, it’s doubtful anything they have to say is useful. To be clear, almost nothing our VCs have told me about Sentry’s business has turned out to be useful, and a lot of it was downright wrong.

It’d be as if GitHub published their own per-language ORMs and told everyone to use them. Not only is it doubtful that they’d be able to write the best ORM, but it’s also unlikely their ORM is the right answer for everyone. Instead of focusing on what code to write, they focused on improving collaboration and process around engineering, and it was the community, in the form of Rails, SQLAlchemy, etc., that built the “best” ORMs. The same goes for analytics. If there were a canonical, one-size-fits-all way of looking at A Business, then there’d be no point for investors: we’d just use a standard P/E ratio for all companies. Instead, our understanding of existing businesses and even new business models seems to be ever-evolving.

There is a need for a system that takes the best ideas from engineering process and, instead of simplifying the creation of the dashboard (and in the process, just multiplying the number of dashboards), recognizes that “our shared understanding” is evolving and facilitates communication and iteration over perfection.