Despite all the talks about big data and data science, it seems that at many companies, we are still in the era of data monkey. Hours are wasted copy/pasting cells from tools to Excel and making manual reports. Few insights, if any, are generated. “Most of my time is spent churning out a bunch of fairly standard reports on a monthly/quarterly basis.” – The Data Analyst guys complains
This is not business intelligence and it is not creating valuable insights. This is wasting time, plain and simple.
Let’s go back to the roots of the problems :
- The data is in different systems. Hence the copy/paste to have all the data in one single location. You can’t have a proper view of your business in either Google Analytics, Mixpanel or your accounting software. You need all of them.
- The data need to be processed for visualization. For instance, a webserver log is not valuable in itself. Neither is a detailed list of sales. But the conversion rate from web visit to sales by acquisition channel (derived from this data source) is something very valuable. Hence the manual report and the copy/paste madness.
Let’s see how those problems can be handled.
Solution 1 : the IT dirty fix
Usually, the solution is to throw the problem to the IT department (or more precisely the definition of the data you want). Some hours/days/weeks later, they provide a way to get the data through a query (which gives you a CSV file, or an Excel/Google Sheet). Still a bit of manual labor but the data is already in good shape.
Days later, a new issue arises, a new query is the solution.
Soon after, again, a new problem shows up. It is close to be solved by the first two queries, but not exactly. You still need IT to create a new one for this specific problem. And maintain it. Ouch !
The downside of this solution is that soon you end up with 10’s of queries that provide what should be comparable measures (let’s say revenues) but they don’t match because they weren’t designed for the same issue or by the same person. Everyone start to forget how the results are produced, why there is some filters in this particular query, …
After some level of complexity, you start to see queries that are built above the previous queries that no one want to touch anymore. It doesn’t feel right.
Should you avoid such solution? Probably not. It’s cheap at the beginning and allows you to move fast. But when the burden is too heavy to carry you need to clean this data debt. The logical way is to go for solution 2.
Solution 2 : the IT data warehouse
Data Warehousing is an old concept dating from early 1980 at least. The idea is to get all the corporate data in one place, modeled with care and easy to query. Clean and ordered. Sounds too good to be true? Well it is.
The main issue is that up to now, those data warehouse are IT domain and business users have few powers over it. Every change takes months and direct access is usually forbidden (for safety reasons). It is sad that so much power stay sleeping.
In 2000, the data warehouse was depicted by the number 3 : 3 years, 3 million, 3 reports.
In the search of perfection, the corporate data warehouse goes too far the opposite direction of solution 1. It tries to define business reality as an immutable thing (good luck in this fast-paced environment). It needs corporate wide consensus (where the middle ground satisfies no one). It assumes data governance (probably a synonym of political games or IT despotism). In short, it’s too ambitious to be achieved.
This is why after spending millions dollars of budget, people still end up using Excel instead of the onerous Business Intelligence tools : “export to excel is the third most common button in data and business intelligence app … after OK and Cancel”. What is the value of a BI tool that is just a way to export Excel files?
Solution 3 : Did I told you about Excel with Super Powers ?
Technology is getting better and better. On the hardware side you have more power on your laptop that most data warehouses two decade ago. On the software side, believe it or not, Excel contains a good part of what run data warehouses (and not two decade ago … right now, SQL Server Analysis Services). Check out my article to learn about Excel super powers.
Other tools are available with Tableau, Looker, … on one side (trying to abstract anything technical) and Jupyter, Metabase, Mode on the other side (staying technical but still remote from software development).
It’s now easy to manipulate huge amount of data and to connect to data sources. And by easy, I mean people outside of the IT department with a slim computer background make it happen it everyday.
Is it easy? Heck no, you need a real data sensibility. Some people have it, some don’t. But you no longer need a computer degree to make Business Intelligence. All you need is curiosity and business understanding.
Still, two gap remains:
- Data can be dirty in the operational databases (if ever IT gives you access to it).
- Data is more and more scattered around many databases (structured and unstructured) and across web services.
The technical complexity is (somewhat) easy to manage, but the data complexity is here stay. Remember the awkward excel database you maintain with great pain (15 sheets, 1000’s of formulas, 10’s of bugs)? Well most databases are like that.
That’s why you still need a data warehouse, to remove this complexity.
Solution 4 : the business data warehouse
Even the most brightest Excel and Power BI users recognize that adding a database for data shapping is invaluable. But you don’t want a monolith data warehouse, you want agility. My views on the subject are best synthetized in my Data Warehouse Manifesto where I introduce the module-based data warehouse.
You can rely on SQL, the language of databases. Here again, I can quote Excel experts saying that it is “easy and valuable to learn some simple SQL code”.
SQL is widely used for 30 years ago. You learn the basics and use contractors for more complex parts. That way, you will have a common language.
With a rock solid (but always evolving) data warehouse you command and empowering tools on top (like Data Brewery), expect to produce great results.
Conclusion: There is a choice to be made
Those previous solution can be divided by a profound choice you have to make. Who handle data complexity for insight generation? You can answer IT, in which case solution 1 and 2 are for you. You can also answer Business and focus on solution 3 and 4. Business Intelligence is neither IT, neither Business. It’s a blend. And someone need to take care of it for your company to thrive.
My take is obviously toward the business side. I’m a techy data guy and I haven’t work in IT departments for 10 years now (but still enjoy working with them). I have seen firsthand what is possible when the people in pain takes care of their destiny. Less technical focus and more business value.
What is your take on this? Let me know.
Let's stay in touch with the newsletter