How To Actually Be Data-Driven

In one of my previous jobs, the thing which I dreaded more than anything in the world was a 2-hour meeting every Monday.

In that meeting, the entire department would sit down and painstakingly go through the sales numbers for every single city in our network. City by city, we’d be asked to explain why the sales numbers were up or down that week. If Kuala Lumpur sales were down, the person managing KL would explain that consumers were avoiding purchases because of an upcoming election. If sales were up, they’d say that consumers were buying more ahead of the elections. Every week was an exercise of “Let’s try to come up with a better story to explain why the data went up or down”.

Chances are, you’ve probably been a part of these meetings yourself. Companies want their employees to be more “data-driven”, so managers force their teams to spend more time looking at the data. After all, if revenue is important, we should spend more time analysing the data, so that we can uncover some hidden secret in the data… right?

Well yes, except that I usually walked away from those meetings feeling like they were a colossal waste of time. We had mountains of data, so why didn’t any of it feel useful?

The Practical Powers Of Data

Of course, it’s not just about having more data, but understanding what you can actually DO with that data.

Unless you’re a researcher, you’re probably not that interested in intellectual back-scratching to uncover some esoteric insight that nobody cares about. No, most of us are interested in data only insofar as it helps us make practical decisions to get the results we want.

For example, we all know that weight loss is largely a function of 1) your diet, and 2) how much exercise you get. But which factor is more important? If your data uncovered the insight that diet helps you to lose weight 3X more effectively than exercise, you might focus your efforts on say, intermittent fasting, instead of spending 5 hours a day at the gym.

So how do you actually bridge the gap between 1) having the data and 2) uncovering the right ACTIONS to do?

I’ve been mulling over this problem for a couple of years now, partly because I want to avoid more time-wasting Monday morning meetings. But also because having this ability is like having a superpower. If I get to run a team or a startup in the future, I’d want to understand how the most sophisticated companies in the world make data-driven decisions, so that I can emulate them.

So while I’m not a data practitioner, I’ve researched approaches that others have adopted. And since one of the goals of this blog is to distill learnings from interesting (well, at least to me) topics, I’ll do my best to do an overview here. If there’s interest, I might do a deeper dive into these specific sub-topics. (let me know!)

Here are three non-exhaustive aspects of becoming more data-driven:

  • Understand the difference between input and output metrics
  • Understand the relationship between inputs and outputs
  • Understand the causality of inputs to your outputs

Understand the Difference Between Input and Output Metrics

In the book High Output Management, author Andy Grove (former chairman & CEO of Intel) described a company as simply a system that uses inputs to generate outputs. Outputs are things like software, products, and revenue, while inputs are things like labour, time, IP, and raw materials. It’s a laughably simple idea – you might have learned some version of this in Econ 101 – but I’ve seen so many businesses completely ignore this when it comes to analysing data.

For example, in my painful Monday morning meeting, our key objective was “revenue”. Therefore, our meeting focused almost exclusively on revenue: Slicing & dicing different revenue cuts, looking at different time periods, etc. The problem with this approach is that revenue was an output metric, and not very actionable for us.

Why? Because there are dozens, if not hundreds of factors which impact revenue: Seasonality, demand, competitor activity, political climate, the economy, commodity prices, etc. Many of these were beyond our control. Whenever revenue went down, it was easy to blame these external factors, absolving ourselves of any responsibility. Whenever revenue went up, we could simply claim credit for it by attributing it to some activity we did like a promotion or more sales calls.

As an output metric, revenue was something that we wanted to move, but we didn’t have actionable influence over it. Conversely, we were not focusing enough on input metrics, which we did have influence over. Input metrics are things that you can directly control, which are correlated with output metrics. Things like sales calls, promos launched, percentage of discounts offered, and marketing investment were all things we should have been tracking closely, but weren’t.

The book Working Backwards has an excellent chapter on Amazon’s famous Weekly Business Review (WBR) meeting, where they go through 400-500 metrics in a single hour. However, unlike my Monday morning meeting, the meeting focuses more on input metrics. This makes the meeting a lot more actionable, since attendees know that if an input metric is trending downwards, it is their responsibility to fix it. They can’t blame anything else.

As Cedric from Commoncog writes (emphasis mine):

Amazon divides its metrics into ‘controllable input metrics’ and ‘output metrics’. Output metrics are not typically discussed in detail, because there is no way of directly influencing them. (Yes, Amazon leaders understand that they are evaluated based on their output metrics, but they recognise these are lagging indicators and are not directly actionable)…

an insider who has been engaged in the WBR process over a period of months won’t see 500 different metrics — instead, their felt experience of a WBR deck is that they are looking at clusters of controllable input metrics that map to other clusters of output metrics.

You can apply the concept of input and output metrics to virtually any domain of life:

  • Weight Loss: Output metric = number of kilograms lost, input metrics = number of calories consumed, minutes spent exercising, intensity score of exercise, etc
  • Sales: Output metric = revenue generated, input metrics = number of sales calls/emails sent, number of pitches made, value of pitches made
  • Paid marketing: Output metric = sales generated, input metrics = Marketing investment made, reach, frequency, quality score, etc

In summary, output metrics are the results which we want to achieve, and input metrics measure the impact of our ACTIONS. To be truly data-driven, we need to focus on the input metrics to know where to focus our efforts.

Understand the Relationship Between Inputs & Outputs

Next, we need to understand how our output metric relates to our input metrics. Average companies might know that “A affects B”, but great companies are able to come up with specific predictions like “An X% increase in A predicts a Y% increase in B”.

We can use models to approximate the relationship between inputs and outputs. A model doesn’t have to be complicated. E = mc^2 is a model. So is F = ma. In college, one of my favourite courses was Econometrics. Not because I was particularly interested in the subject, but because I believed I could use models like the Fama-French model to get rich in the stock market (I soon found out that this was a lot harder than I thought).

There are different ways to build a model from a bunch of data, but the simplest way is probably to do a regression analysis. Now, “regression analysis” might bring back nightmares from your JC math classes, but thankfully there are now packages you can run on Excel or Sheets to generate those in seconds.

An example might help. Let’s say you want to create a model of how your diet and exercise affect your weight loss. First, you’d need to collate the number of kilos you lose in a week, the number of calories burned through exercise, and the number of calories ingested through food. You would then track this data over several weeks, compile it into Excel, and run your model. This would give you a simple model like:

Number of kilos lost = A * (calories burnt through exercise) + B * (calories ingested this week) + c

Here, A and B are the coefficients which estimate the impact of that input. If A = 0.00001, then every calorie burnt through exercise would lead to you losing 0.00001kg (I’m just making this up). B will most likely be negative, since every calorie you ingest is likely inversely related to the number of kilos you lose (unless you are one of those annoying people who lose weight while eating more, in which case we can’t be friends anymore). c represents the number of kilos you have lost that are unexplained by diet or exercise. Maybe you have a particularly active lifestyle, or the type of foods you eat helped you lose weight – these are not accounted for in your model and are captured by the c parameter.
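
If you’d rather skip the Excel add-in, here’s a rough sketch of what that regression could look like in Python. Every number below is invented purely for illustration – this just shows the mechanics, not a real dataset.

```python
# A minimal sketch of the weight-loss regression described above (made-up data).
# It fits: kilos lost = A * (calories burnt through exercise) + B * (calories ingested) + c
import numpy as np

# Hypothetical weekly data collected over 8 weeks
calories_burnt    = np.array([2000, 2500, 1800, 3000, 2200, 2700, 1500, 2400], dtype=float)
calories_ingested = np.array([14000, 13500, 15000, 12500, 14200, 13000, 15500, 13800], dtype=float)
kilos_lost        = np.array([0.4, 0.6, 0.2, 0.9, 0.5, 0.7, 0.1, 0.6])

# Design matrix [calories_burnt, calories_ingested, 1], solved by ordinary least squares
X = np.column_stack([calories_burnt, calories_ingested, np.ones_like(calories_burnt)])
(A, B, c), *_ = np.linalg.lstsq(X, kilos_lost, rcond=None)

print(f"A (kg per calorie burnt):    {A:+.6f}")
print(f"B (kg per calorie ingested): {B:+.6f}")  # expect this one to be negative
print(f"c (unexplained baseline):    {c:+.3f} kg")
```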

Okay, but why go through all the trouble of creating a model in the first place? Some reasons:

First, models help you to estimate the impact of an input metric on an output metric. As I mentioned earlier, companies with models can make specific statements like: “A salesperson who makes 20% more sales calls will typically also see 15% more revenue”, or “A 10% increase in digital video investment is typically correlated with 20% more sales.” You can see why companies armed with insights like these might have a tremendous edge over companies which don’t.

Second, models can help you map the relationships of multiple input metrics together. For example, you could create a model of the future value of your savings by using your starting principal, time, and interest rate as variables. Your model will help you uncover the interplay between these variables. If you need $100,000 next year for a house downpayment, then you know that you must either 1) find a bank account with a higher interest rate, 2) increase your monthly deposits, 3) increase the frequency of your deposits, or any combination of these. More importantly, you can forecast quantitatively what your output might be based on your inputs.
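
To make that concrete, here’s a quick sketch of such a savings model. The formula is the standard compound-interest-with-monthly-deposits calculation, and the figures are ones I’ve made up:

```python
# A sketch of the savings model described above: future value as a function of
# starting principal, monthly deposit, interest rate, and time. All inputs are hypothetical.
def future_value(principal: float, monthly_deposit: float, annual_rate: float, months: int) -> float:
    i = annual_rate / 12                              # monthly interest rate
    growth = (1 + i) ** months
    lump_sum = principal * growth                     # growth of the starting principal
    deposits = monthly_deposit * ((growth - 1) / i)   # growth of the stream of monthly deposits
    return lump_sum + deposits

# Will $50,000 today plus $3,500/month at 2% p.a. get me to $100,000 in a year?
print(f"${future_value(50_000, 3_500, 0.02, 12):,.0f}")
```

Playing with the inputs quickly tells you whether a higher rate, bigger deposits, or more time matters most for hitting your target.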

Thirdly, you can use models to make strategic decisions. Marketers who manage multiple channels (e.g. digital, out of home, TV, print, etc) might use a Marketing Mix Model (MMM) to estimate the contribution of each marketing channel to overall revenue or sales. An MMM’s coefficients can quantify how much each channel contributes. For example, if I know that digital contributes 1.7X more sales compared to print, I might use that insight to allocate more marketing budget to that channel when I’m doing my annual planning.

Now, models might sound like a fancy, sophisticated way to be data-driven, but on their own, they are insufficient. Why? Because models are merely approximations of the real world. No one really believes that your weight strictly follows a specific linear formula. Models rely on simplifying math and static assumptions, and the real world doesn’t always obey those rules.

Still, if we can find a model that broadly approximates reality, it’s still far better than making guesses all the time. British statistician George Box famously said that “all models are wrong, (but) some are useful”. If your model can broadly forecast your results with a reasonable margin of error, you’ll have an edge over those who are just shooting in the dark.

Understand the Causality Between Input & Outputs

However, models do have one major shortcoming which we can’t ignore: They measure correlation, not causation.

For example, let’s say a UK retailer runs a model estimating store sales as an output metric, and temperature as an input metric. The model might predict that people spend more money when it’s cold. However, this doesn’t mean that cold weather causes people to spend more. Instead, it could be that people spend more during the Christmas season, when it’s usually cold. The retailer who puts their trust in temperature as an input metric, and turns their air conditioning to full blast, is mistaking correlation for causation.

The way to escape the correlation trap is to run experiments. In an experiment, you would split your audience into two groups: a control group where everything remains the same, and a test group where you would change one thing. And then you would measure the difference between the two.

For example, if you wanted to estimate the impact of TV advertising on revenue, you would assign a bunch of cities which don’t run any TV advertising as a control. Then you would pick a few cities to run some TV advertising in, and measure the overall uplift in sales between the control and test cities.  (Edward Nevraumont cites this example in his excellent article about experiments vs models here)
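
To make the arithmetic concrete, here’s a rough sketch of how you might compute the uplift from such a geo holdout test. The cities and sales figures are entirely invented, and a real test would also need matched markets and a proper significance check:

```python
# A rough sketch of measuring uplift from a geo holdout test (invented figures).
pre  = {"KL": 540_000, "JB": 455_000, "Penang": 410_000, "Ipoh": 305_000}   # sales before the campaign
post = {"KL": 612_000, "JB": 498_000, "Penang": 422_000, "Ipoh": 310_000}   # sales during the campaign
test_cities, control_cities = ["KL", "JB"], ["Penang", "Ipoh"]              # TV ads ran only in the test cities

def avg_growth(cities):
    """Average percentage growth in sales across a group of cities."""
    return sum(post[c] / pre[c] - 1 for c in cities) / len(cities)

# Difference-in-differences: growth in the test cities minus growth in the control cities
uplift = avg_growth(test_cities) - avg_growth(control_cities)
print(f"Estimated uplift from TV advertising: {uplift:.1%}")
```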

Experimentation is a great way to answer the question “What would the impact on my output be if I removed this input?”

Marketers love to say that they run experiments. Experiments are much easier to understand than models (nothing gets people’s eyes closing faster than the words “linear regression”), and they tell a more compelling story. A sure way to shut most critics up is to say that you ran a “controlled, double-blind experiment” with a smirk, while simultaneously stirring a cup of tea.

But experiments have their own shortcomings too. They represent findings from a specific point in time, and the results might be highly dependent on the circumstances in which the experiment was run. Perhaps your marketing experiment showed amazing results because your particular offer or campaign messaging was really compelling. For this reason, while experiments might show causality, they also tend to be difficult to generalise.

Another major shortcoming of experiments: They are helluva time-consuming. It takes time to come up with a hypothesis, set up the experiment, and analyse the results. Furthermore, as in the case of “holdback” tests (tests where you would go dark in some markets to test if that change had any impact on sales), there is the opportunity cost of potentially losing some sales and revenue in the “go dark” markets.

(Side note: This is also why you wouldn’t run experiments on just about anything. It would take up so much time that you wouldn’t have time to do anything else. It drives me crazy whenever marketers say that they want to test a minor tweak, especially one that’s already been so conclusively proven by other companies that it’s become a best practice. In those scenarios, it’s better to simply apply the best practice, move on, and save your experiment resources for bigger, more strategic bets.)

So we now know that neither models nor experiments are a silver bullet. We need to use them both in conjunction with each other. Here’s how they play together:

How Metrics, Models & Experiments Play Together

Now that we have all the building blocks in place, let’s go back to the question that we started with: How do you get from simply having the data to understanding what actions to take?

To illustrate this, let’s revisit my previous department which ran that dreaded Monday morning meeting every week. Instead of overanalysing revenue metrics, how could they figure out a data-driven approach to improve revenue?

First, they could start by differentiating between output and input metrics. The output metric is easy enough: Revenue. They would then collate all the possible input metrics that might impact it: number of sales calls/emails sent, number of pitches made, value of pitches made, size of promotions, amount of discounts offered, etc. Remember that input metrics should be what they can CONTROL, while output metrics are the RESULTS they want.

Next, they could try creating a model to map the relationship between revenue and their input metrics. They would collate all the data on a weekly basis, compile it into Excel, and run a simple formula to generate a linear regression. This would spit out a model like Revenue = a * (sales calls) + b * (pitches made) + c * (size of promotions) + … Remember: the goal of this exercise is to be able to state a sentence like: “An X% increase in sales calls predicts a Y% increase in revenue”.
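
Here’s a sketch of what that could look like, including how you’d translate a raw regression coefficient into that kind of sentence. The data below is fabricated – in practice the columns would come from wherever the sales team logs its activity:

```python
# A sketch of the weekly revenue regression described above (fabricated data).
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "sales_calls":  [120, 150, 90, 200, 170, 130, 160, 110],
    "pitches_made": [30, 42, 25, 55, 48, 33, 40, 28],
    "promo_size":   [5_000, 0, 10_000, 8_000, 0, 6_000, 3_000, 0],
    "revenue":      [210_000, 265_000, 180_000, 340_000, 300_000, 225_000, 270_000, 195_000],
})

# Revenue = a * (sales calls) + b * (pitches made) + c * (size of promotions) + intercept
X = sm.add_constant(df[["sales_calls", "pitches_made", "promo_size"]])
model = sm.OLS(df["revenue"], X).fit()

# Turn the sales-call coefficient into an actionable sentence:
# "a 10% increase in sales calls predicts roughly a Y% increase in revenue"
avg_calls, avg_revenue = df["sales_calls"].mean(), df["revenue"].mean()
pct_change = model.params["sales_calls"] * (0.10 * avg_calls) / avg_revenue
print(f"A 10% increase in sales calls predicts roughly a {pct_change:.1%} increase in revenue")
```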

However, since the model is just an approximation of reality, they will need to figure out how to validate and adjust it to make it more accurate.

To do this, they would need to run a series of experiments to validate their model. For example, the model might predict that every sales call generates a $100 increase in revenue. Let’s run an experiment and see if that’s true. First, pick 50 random salespeople and ask them to stop making sales calls for a week, while the other 50 salespeople continue making sales calls. After the experiment ends, compare the revenue difference between both groups. (Of course, it’s not so simple in reality, as we’d have to account for time lags, not to mention angry salespeople who hate to see their revenue fall, but for this example let’s just go with it). We can then calculate the incremental revenue per sales call. Let’s say that this number comes up to $200 in incremental revenue per sales call. This would indicate that the model is undervaluing the impact of sales calls, and so we’d need to adjust the model’s sales call coefficient upwards. (Here is a great article explaining how marketers can use experiments – what they call “incrementality” – to adjust your model)
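
A quick sketch of that adjustment step, again with invented numbers:

```python
# Comparing the model's estimate with the experiment's measurement (invented figures).
revenue_kept_calling = 5_200_000     # weekly revenue from the 50 salespeople who kept calling
revenue_paused_calls = 4_400_000     # weekly revenue from the 50 salespeople who stopped calling
calls_made_by_test_group = 4_000     # total sales calls made by the group that kept calling

incremental_revenue_per_call = (revenue_kept_calling - revenue_paused_calls) / calls_made_by_test_group
model_coefficient = 100.0            # the model's current estimate: $100 of revenue per call

calibration_factor = incremental_revenue_per_call / model_coefficient
print(f"Experiment measures ${incremental_revenue_per_call:.0f} per call "
      f"-> scale the model's sales-call coefficient by {calibration_factor:.1f}x")
```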

You might then ask, “If experiments are so conclusive, why do we need to have models? Why not just run experiments?” The answer is that 1) the experiment might not hold true across all periods, and 2) the experiment doesn’t account for the other input metrics. For example, my experiment wouldn’t show if my salespeople were simply making sales calls to go drinking with clients and not actually pitching anything while they’re there. Whereas in a model, I can use “pitch value” as a variable to account for whether a pitch was made, and isolate the impact of a sales call on its own.

As an end state, I would have an experiment-calibrated model that shows the relationship between my input and output metrics.

I can then make a reasonable forecast of what revenue will look like when, for example, 1) Salespeople increase sales calls by 20% a week, and 2) they increase the value of each pitch by 10% each time, and 3) they take their customers out for drinks at least once a month. (Hopefully, this last factor is statistically significant).

This is what it means to arrive at some ACTIONABLE insights. Not only can you see which inputs affect the output, but you can also observe the interplay between inputs and even prioritise them if time is short.

Is This More Complicated Than It Has To Be?

Whew, this article turned out to be a lot longer than I anticipated.

The unexpected length of this article forces us to confront the uncomfortable question – does this approach overcomplicate things? Are we simply throwing unnecessary math at a problem to arrive at broadly similar conclusions?

In some aspects, yes. If you’re thinking about losing weight or finding the best flight prices to Bangkok, it’s probably not worth your time to come up with a regression model and run incrementality experiments (although that would be super interesting, wouldn’t it?)

But at a company level, where millions of dollars are at stake, I feel that this is often an under-appreciated aspect of business decision making. Most companies aren’t willing to hire an analyst or take the time to be truly data-driven. And they often end up losing out to those who are willing to make those investments.

The upshot of this is that the market will probably pay a premium to hire people who are truly data-driven, since they are so rare. Writing this post is my attempt to try to make sense of all this and get closer to that skill level, and I hope it was helpful to you too.

As always, open to feedback and discussion!

Disclaimer: The opinions stated here are my own, not those of my company
