Why Do Most Data Science Projects Fail? They Don’t Ask Good Questions
If I had to sum up the problem with newly formed data science teams in 2019, it would be this: far too much focus on tools and techniques, and too little on asking good questions that drive actual business value.
Yes, advanced machine learning methods sometimes generate huge value within organizations, and we should apply them where appropriate. In many situations, however, an advanced data science method is not suited to the question at hand. Or the method may be suitable, but the question itself may not drive sufficient business value.
I’ve witnessed data science teams rush through the project selection phase, guided not by the beacons of business value and project feasibility, but by the shining lights of the latest tools and techniques. I’ve attempted to capture this bias towards fancy tools in the series of matrices below:
Put another way, some data science teams will select their preferred tools and techniques first and then go searching for a problem that fits nicely into their desired toolset. The better approach is to identify potential projects based on business value and feasibility and to then go searching for the best tools/techniques to solve them. And the dirty secret? A lot of the time, simple totals, counts, percentages and bar charts in Excel will get the job done. The secret sauce is the quality of the question and the business impact of the answer, rather than the method or tool.
But what about the analytics value chain? Most of us have seen some variation of the popular diagram below, which purports to show that only machine learning and predictive analytics projects can extract high value from business data:
I think it includes the word ‘chain’ because it needs to be chained to the nearest boat anchor and sent to the bottom of the ocean.
For many organizations, this popular diagram is just plain wrong. Good descriptive analytics (i.e. great questions answered by skilled analysts, packaged in a way the business can understand and act on) often generates much more value than any machine learning model. Descriptive analytics can inform the strategic direction of a company, guide product development and alert management to competitive threats. Take the work of the analyst below, who found that Italian customers were proving to be problematic for a global marketing consulting firm.
The analysis shows that there seems to be a country-level invoicing issue in Italy. This suggests that we should examine the strength of our contracts in the region or assess the quality of our service delivery to Italian clients.
Upon drilling deeper to the customer level, the analyst found five Italian customers in particular who had successfully disputed a large share of their invoices, accounting for 44% of the lost revenue:
The analyst recommended immediately negotiating with these clients to pay upfront, and the revenue leak was fixed within days. No machine learning model in sight, just good, timely descriptive analytics that allowed management to take informed, decisive action. As a result of the analysis, the business was able to save hundreds of thousands of dollars. If our analyst had instead decided to move higher up the ‘value chain’ and pursue a machine learning model, they might have wasted months without generating business value while the customer problem got worse!
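To make the point concrete, the two-step drill-down described above (country level first, then customer level) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the analyst's actual workflow: the records, customer names, amounts and the `lost_revenue_by` helper are all invented for the example, and the same totals and percentages could just as easily be produced with a pivot table in Excel.

```python
from collections import defaultdict

# Hypothetical invoice records: (country, customer, invoiced, lost_to_disputes).
# All names and amounts are invented for illustration only.
INVOICES = [
    ("Italy", "Customer A", 120_000, 60_000),
    ("Italy", "Customer B", 80_000, 30_000),
    ("Italy", "Customer C", 100_000, 10_000),
    ("Germany", "Customer D", 150_000, 3_000),
    ("France", "Customer E", 90_000, 2_000),
    ("Italy", "Customer A", 40_000, 20_000),
]


def lost_revenue_by(records, key_index):
    """Sum lost revenue per group; return (group, lost, share) rows, largest first."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key_index]] += record[3]
    grand_total = sum(totals.values())
    rows = [
        (group, lost, lost / grand_total if grand_total else 0.0)
        for group, lost in totals.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Step 1: country-level view of revenue lost to disputed invoices.
by_country = lost_revenue_by(INVOICES, key_index=0)

# Step 2: drill into the worst country at the customer level.
worst_country = by_country[0][0]
worst_country_records = [r for r in INVOICES if r[0] == worst_country]
by_customer = lost_revenue_by(worst_country_records, key_index=1)

for group, lost, share in by_country:
    print(f"{group:<12} lost {lost:>10,.0f} ({share:.0%} of all lost revenue)")
for group, lost, share in by_customer:
    print(f"{group:<12} lost {lost:>10,.0f} ({share:.0%} of {worst_country}'s losses)")
```

Nothing here goes beyond sums, shares and sorting, which is the point: the value comes from asking where the revenue is leaking, not from the sophistication of the method.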
In summary, we all recognize the tremendous value produced by machine learning methods in the last two decades. But let’s not forget where the data science process starts: the right business question. Let’s choose our tools and techniques in the same way that a skilled surgeon selects her instruments. Sometimes, it is the remotely controlled robotic arm that gets the job done. At other times, it is just a simple scalpel and sutures that save the patient’s life.