The changing data landscape: The future is little data

Published on January 22, 2015

The fact that we generate more data today than we ever have has become such a truism that it’s hard to appreciate the sheer magnitude of what’s happening (the buzzword “Big Data” certainly doesn’t help). To put it in perspective: in 2002, everything stored on the print, film, magnetic, and optical media produced that year was estimated at 5 exabytes. By 2020, the digital universe is estimated to reach 40,000 exabytes, or more than 5,200 gigabytes for every man, woman, and child alive that year. That’s an 8,000-fold increase in less than 20 years.

As we have gone from information scarcity to abundance, data has become a mantra for businesses – but more for their own use than to benefit their customers. Most of it is being used by companies to gain knowledge, and the knowledge they are seeking is the unusual, not the obvious. Supermarkets already know that hot days will stimulate sales of ice cream. There’s no need to invest in expensive data scientists for that. What they are looking for are the little nuggets that shift the needle a little bit. Spotting that buyers of Call of Duty also buy adult diapers, as Amazon’s algorithm did (draw your own conclusions), is the sort of thing that might just get you a bump in sales. An incremental change in behavior can translate to a lot of money in the aggregate, but this kind of mining is just as likely to turn up false positives (aka “spurious correlations”).

What gets less attention, yet might very well be more valuable, is what’s in it for us as consumers. Little Data is the data we know about ourselves, the data that’s useful to us, especially when it’s relevant to the time, place, and activity we’re focused on at that moment. The technologies that give us this insight remove barriers, helping us to live our lives more mindfully. This in turn leads us to gravitate to services that offer such data – a win for their creators, but also for us. In the future, we can expect Little Data to keep growing in importance.

There are three hallmarks of this type of data thinking:

1. Data streams that can be combined with others are far more valuable than data that sits in isolation – especially to the individual. Take, for example, London’s transport authority, TfL, which sits on masses of real-time data on the movements of trains and buses. By opening its APIs to apps that overlay live transport information with location, mapping, and other data, it has allowed for a plethora of new ways for Londoners to find the best and fastest way to get where they need to be.

2. We spend our days making decisions – from what to wear in the morning to whether we should refinance that mortgage. Naturally, making decisions can get exhausting. Just as businesses use data for decision management, we consumers are increasingly drawn to services that help us make decisions. Think about the last time you did a search on Google: did it autosuggest the full sentence you had in your head? And do you get annoyed when you don’t get autosuggestions online? That’s the web getting smarter about guessing your intent and completing it for you. Machine learning and predictive analytics are behind some of the most compelling brands online, from Netflix to Amazon to Google. One wonders if they are really just data companies masquerading as video, retail, and search companies…

3. We can now collect our own data streams from Nest units, FitBits, and personal finance apps, learning how to improve and simplify our lives using the information they provide. This is only going to increase and improve: by 2020, the Internet of Things will bring us data from a projected 50 billion items, giving us even more ways to understand and control our lives.

To some, all this might sound like the machines are taking over, leading us to a dystopian future in which we’ll surrender our serendipity, or worse.
To others, a future in which we can harness our own data looks bright – and far better than one in which companies keep our data to themselves.

This article was first published on Wired.