Data is power
Conferences and vendor marketing materials are full of trite sayings. Say something that sounds profound, and perhaps the audience will assume everything else you say is just as deep. One refrain you'll hear at many an AI- and data-focused event is the pithy claim that “data is the new oil,” as if that settles something. The first time I heard the expression (about a decade ago, I should add), it made an interesting point about how “important” and “strategic” data is. But every time I’ve heard it since, it has been bandied about to imply more than it actually means. Yes, we get it: data is important. But oil, really?
Comparing a dwindling, dirty resource we’re increasingly turning away from to an increasingly valuable resource that seems to generate itself in almost infinite quantities doesn’t make sense. We want more data. We want less oil. We want to get more value from existing data, and we want to move away from oil-derived value altogether. Frankly, it’s a terrible expression. Can we finally, please, kill it? More importantly, we’re missing the forest for the trees: data holds a huge reservoir of untapped value that AI uniquely unlocks, and we overlook it by focusing on the bottom of the data value pyramid.
Focusing on data, losing focus on value
The expression “data is the new oil” dates back at least to 2006, when Clive Humby, the British data scientist behind Tesco’s Clubcard loyalty program, said that “Data is the new oil. It’s valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc. to create a valuable entity that drives profitable activity; so must data be broken down, analyzed for it to have value.” Since then, this perhaps throw-away analogy has been picked up by more and more people with ever-louder bullhorns, including Ginni Rometty, then CEO of IBM, and Peter Sondergaard, SVP at Gartner. The analogy aims to make a few points. First, like oil, data has strategic value: whoever owns it controls significant resources. Second, raw data on its own has limited value, just as oil must be refined before it is useful. Third, oil is a commodity that can be exchanged for goods and money, and data can likewise be exchanged, traded, and treated as a commodity with inherent value. Those are good points. But you can make the same points about any natural resource, including solar energy.
But the value of the analogy stops there. Oil is a finite natural resource whose quantity and availability shrink over time. It’s a fossil fuel, formed from the remains of long-dead organisms; it’s literally made of dead things. Oil is dirty and takes enormous effort to extract from the earth, refine, and transport around the globe. It’s stored and sits idle, often for long periods of time. And once it’s used, it’s gone. Oil wealth is concentrated in a few nations with spotty political and social records, and it has been the subject of wars and international disputes. We’re trying to free ourselves of the oil economy.
Data, on the other hand, is for all intents and purposes limitless in volume and availability. It grows even when you don’t want it to. Go to sleep with 1 terabyte of data and wake up with 2 terabytes. Data is easy to generate and cheap to transport. Data can be reused and repurposed, and new insights can continue to be gleaned from old data. In the information economy, data is the byproduct of our advancement. Data is both the input and the output of our technological organism. Like oil, data can be dirty, but unlike oil, it can be cleansed with more data.
The real untapped resource: unstructured data
The one useful part of the analogy is the idea that a raw resource must be refined to extract more value. Put simply, structured data is information kept in data stores that enforce a schema, with defined types and often defined relationships between records, such as the rows of a relational database. Unstructured data lacks that schema context: images, video, emails, documents, text files, and many other sources. By most estimates, the majority of the data companies collect is unstructured: over 80% for most organizations, and closer to 90% for some. Some call this unstructured data “dark data” because much of its value has yet to be extracted. The real opportunity, therefore, lies in extracting value not only from structured information but also from unstructured data. The oil barons of the information economy are the ones doing this best: the Googles, Facebooks, Amazons, and Microsofts of the world.
Artificial intelligence and machine learning are very data-hungry. Huge amounts of data are needed to train AI models. The challenge is turning big data assets into valuable machine learning models. Just as oil wealth became concentrated among a few oligarchs and state-run companies, big-data-powered AI is starting to concentrate in a handful of large companies that have amassed tremendous quantities of data. But unlike oil, big-data-powered AI is available to any organization that can devise a strategy for collecting structured and unstructured data suitable for machine learning, then refining that data to extract more value from it over time. Getting this right is part of what will propel companies to the next stage of their digital transformation.
Forward-thinking organizations are becoming “AI first,” and to be AI first, you need to be data first. But don’t fall for the backward notion that data is the new oil, because it isn’t. Data is a nearly limitless resource, but it’s up to you to extract its value. Data is potential, and it’s up to you to realize that potential, especially in the context of AI.