The amount of data we accumulate grows exponentially with each passing day, and the “Big” in “Big Data” is slowly getting too big to help us find the information we need. That’s why we at Wizzie believe Smart Data is the next step in the evolution of Big Data, and we are working towards it.
What is Big Data? Well, today almost everyone knows the answer: vast amounts of data, either structured or unstructured, coming from different data sources. This data can come from databases, logs, social networks, hardware sensors… from almost anywhere. IoT devices, for example, are a tremendous data source nowadays.
It is also something of a buzzword for progress and for solving many problems, whose solutions may be buried inside the tons of data that companies have accumulated over the past years and are still accumulating today. But Big Data isn’t enough anymore. There is, simply put, too much of it. And we need a solution.
In the beginning there was… the problem
One of the most common mistakes a company can make is to use a Big Data system just for Big Data’s sake. Because it’s modern. And because “somewhere in there must be information we can use”. That is not how things work. You don’t buy a car because the technology is great. You buy a car because you want to go somewhere. Using technology for technology’s sake will inevitably lead to problems, unnecessary expenses and wasted time.
Therefore, the right place to start any Big Data initiative is with a problem or a need. Your company may need to solve something or improve a process: get insight into customer behaviour, analyze production data, follow up on customer service requests… Whatever it is, that is your starting point.
Once you’ve defined the problem, the next step is to find out whether it is actually a Big Data problem. Many times things can be solved or discovered with other, more traditional means. But if they can’t, you may consider using some Big Data technology. That means digging through tons of different data types, looking for the ones you need to answer the questions you have. And here is the thing: not all of your data is relevant for a particular purpose.
Why? Well, it’s called “Big Data”, and that may give you a clue. You may get your information from dozens of sources: sensors, databases, online sources, barcode readers, video or audio feeds, streams… but not all of them are relevant to what you need to know. That’s when your data has to become smarter, not just bigger, in order to serve the purpose you are looking for.
V, V, V, V, and…oh yes, V
You might have heard of them: the 5 V’s. They are often mentioned when talking about Big Data and stand for Volume, Velocity, Variety, Veracity and Value. Sometimes the “Value” part is left out, but for Smart Data it’s actually the most important “V”, because we are looking for relevant data, in order to slim our available data pool (or lake) down to only those components that are relevant to us right now, for the current question or purpose.
And here is the first pitfall: don’t look in your data for confirmation of what you suspect or want to be true. Don’t forget the “V” for Veracity. It’s about the truth buried in the data, not about conforming the data to your truth. Let the data speak for itself. And for this, you actually need it to be smart.
Making Data Smart
In a sense, Smart Data is a subset of Big Data. More precisely, it is the subset of information we need for the problem at hand, without the noise we don’t need or care about right now, and with the right context. In our Wizzie Data Platform, for instance, these tasks fall mainly to the Enricher and the Correlator. They pull data from different sources and build the best possible dataset for the current task.
However, Smart Data doesn’t always hold the answer we are looking for. On certain occasions it is just an intermediate step, or it helps us break a large problem into smaller chunks that are easier to process. Or it puts us on track to the right answer.
Why do we actually need the data to be smart and not just big? Well, it’s a question of size and scale. In today’s world the sheer volume of Big Data sets or streams can be overwhelming, and we need to take that volume down a notch, to something more useful and practical. We are in what some call an “Algorithm Economy” (I’ll write about this topic in a future post), and with tons of unstructured or irrelevant data flying around in our data pipelines, we need to make sense of it all, applying rules, AI, ML or whatever is needed to “tame the beast”, so to speak.
But, you may argue, then it is a mere question of size. No. Not exactly. It’s a question of getting data we can act upon. The right data. And with the ever growing amount of data, the “making it smart” part is a must.
To get your actionable data, the first thing you need to know is what you want to capture, why, and what you want to do with it. And there is a process to all of this:
- From all your available data sources you need to select those that have the promise to contain an answer to the question you are going to formulate. Remember, without a question (the “problem”) Big Data just doesn’t make much sense.
- Combine the data sources that make sense together to enrich the information, so that it better suits the goal. It’s handy to have a Data Scientist for this step.
- Feed the resulting data into your analytics module or system, or into your visualization system.
- Align the results with your business processes. If they don’t fit, it may not be the data’s fault, but the processes’. Don’t be afraid to change those, if needed.
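To make the first three steps more concrete, here is a minimal Python sketch under clearly hypothetical assumptions: the question is “which customer segments wait longest on support tickets?”, the sources are small in-memory lists, and the rule “a ticket open longer than 120 minutes is relevant” stands in for whatever business rule you would actually define. The source names, fields, threshold and helper functions are all invented for illustration; this is not how the Wizzie Enricher or Correlator are implemented. The fourth step, aligning results with business processes, is organizational and has no code counterpart.

```python
from collections import defaultdict

# Hypothetical raw sources; names and fields are illustrative only.
SOURCES = {
    "support_tickets": [
        {"customer_id": 1, "topic": "billing", "minutes_open": 30},
        {"customer_id": 2, "topic": "login", "minutes_open": 240},
        {"customer_id": 1, "topic": "login", "minutes_open": 300},
    ],
    "crm": [
        {"customer_id": 1, "segment": "enterprise"},
        {"customer_id": 2, "segment": "retail"},
    ],
    # Irrelevant to this question, so it will not be selected.
    "factory_sensors": [
        {"machine": "press-7", "temp_c": 81.5},
    ],
}


def select_sources(sources, relevant_names):
    """Step 1: keep only the sources that may answer the question."""
    return {name: rows for name, rows in sources.items() if name in relevant_names}


def enrich(tickets, crm):
    """Step 2: add CRM context to each ticket by joining on customer_id."""
    segment_by_id = {row["customer_id"]: row["segment"] for row in crm}
    return [
        {**t, "segment": segment_by_id.get(t["customer_id"], "unknown")}
        for t in tickets
    ]


def apply_rule(rows, max_minutes=120):
    """Hypothetical business rule: only slow tickets matter for this question."""
    return [r for r in rows if r["minutes_open"] > max_minutes]


def analyze(rows):
    """Step 3: a stand-in for analytics, counting slow tickets per segment and topic."""
    counts = defaultdict(int)
    for r in rows:
        counts[(r["segment"], r["topic"])] += 1
    return dict(counts)


if __name__ == "__main__":
    selected = select_sources(SOURCES, {"support_tickets", "crm"})
    enriched = enrich(selected["support_tickets"], selected["crm"])
    smart = apply_rule(enriched)
    print(analyze(smart))
```

The point of the sketch is the order of operations: selecting and filtering happen before analysis, so the analytics step only ever sees the small, contextualized subset that is relevant to the question.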
Avoid the Data Swamp
Many companies, in the wake of Big Data’s omnipresence, have resorted to simply storing all the data they can, in the hope that it will some day be useful for some as-yet-unknown purpose. And that is a fast track to turning your Data Lake into a messy Data Swamp.
Unless you are Facebook, who keeps everything, there is no need to store it all, just in case at some point in the future someone comes up with a new way of looking at the data in order to extract something super-valuable. For companies with limited resources, it is much better to store only the Smart Data they have put together with their own rules, so it is much more oriented towards the company’s business goals.
Although storage is relatively cheap, keeping data just because we can is usually not a good strategy. On the other hand, having to think about what really matters to us in all the data we’ve gathered is a good exercise for examining our goals and the means to reach them.
Smart Data gets even smarter if smart people decide what “smart” actually means for them and their business.
What about Machine Learning?
Yes, we know. Finding a good Data Scientist is harder than finding the proverbial pink elephant (fortunately we’ve got lots of yellow elephants in the Big Data world). Anyhow, this difficulty is why so many companies resort to Machine Learning (ML) instead. These AI-based systems are able to find patterns, trends and the like in huge datasets and, over time, get better and better at it. With enough training they can be faster, more accurate and more efficient than humans at these tasks. In the context of Smart Data, they can be used as “superfilters” on your data, in order to “keep the best and trash the rest”.
We must not forget that everything AI-related is still largely a work in progress, but with appropriate guidance, machine learning systems can follow the guidelines, needs and interests of a specific company to provide only those datasets that are relevant to the problem at hand.
We at Wizzie, for example, are working hard to improve our Enricher and our Correlator to provide a solid foundation to make your data as smart as possible, in order to get the maximum value out of it. Because we know what a big asset company data is.