The amount of data in the world is doubling every 2 years, but we are still at the very beginning of a huge explosion in information. Research firm IDC forecast that by 2020, 40 zettabytes of data will have been created, equivalent to a stack of DVDs reaching halfway to Mars. That’s more than 10 times the amount of data created in 2012 and 250 times that created in 2006. We are entering a new age of ‘big data’ – data on an unprecedented scale – that will change the technology we build and use.
We now create digital content whenever we log into our computers or turn on our smartphones, from the pictures we take and the text we type, to the recordings of our voice as we talk. Much of it is transient and deleted almost immediately, but a large amount is stored on computers around the world. It’s not just the data we generate about ourselves that is proving interesting, but also the data that is generated about us. Every time we browse the web or use our phone on the move, computers log information about what we do and where we are when we do it. Understanding this data will let us uncover patterns in behavior that we couldn’t see before because we simply didn’t have enough data.
The world’s largest technology companies have been quick to understand the potential of big data. Engineers at Google noticed early on that lots of people searched for the same thing, but there was no obvious way to know what people would search for in the future. They hit upon the idea of predicting what people were going to type based on all the data they’d ever collected in the past about searches and where they were done. Their unnervingly accurate auto-complete, launched in 2004, uses millions of past queries to guess what you’re searching for as you type, based on where you are, and shows relevant web pages straight away. We’ve become accustomed to this level of prediction, but it was unimaginable a decade ago, and only works now because of the huge amount of data that Google has collected.
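The idea of guessing a search from past queries can be sketched in a few lines of Python. This is a toy illustration only, not Google’s actual method: the names `build_index`, `suggest` and `LOCAL_BOOST`, the sample search log and the simple rule of favoring searches made in the same place are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical weighting: a past search made in the user's own location
# counts three times as much as one made elsewhere.
LOCAL_BOOST = 3

def build_index(past_queries):
    """Count how often each (location, query) pair appears in the log."""
    return Counter((loc, query.lower()) for loc, query in past_queries)

def suggest(index, prefix, location, k=3):
    """Return up to k past queries that start with prefix, best guess first."""
    prefix = prefix.lower()
    scores = Counter()
    for (loc, query), count in index.items():
        if query.startswith(prefix):
            weight = LOCAL_BOOST if loc == location else 1
            scores[query] += count * weight
    return [query for query, _ in scores.most_common(k)]

# A made-up search log of (location, query) pairs.
log = [
    ("London", "weather london"),
    ("London", "weather london"),
    ("Paris", "weather paris"),
    ("Paris", "weather paris"),
    ("Paris", "weather paris"),
    ("London", "west end shows"),
]
index = build_index(log)
london_guesses = suggest(index, "we", "London")
paris_guesses = suggest(index, "we", "Paris")
```

With only six logged searches the guesses are crude; the point of the paragraph above is that the same counting becomes far more accurate when it draws on millions of past queries.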
Outside the realm of technology, big data is already proving valuable in scientific fields. In the past, scientists have had to carefully design their experiments and sift through the results by hand. With the advent of big data, scientists will be able to automatically perform many more experiments and let the computer discover the patterns. This is already being put into practice at the Large Hadron Collider, which is best known for its particle physics experiments. What is less well known is that these experiments generate huge amounts of data. Only 1% of information from the collider’s sensors is stored, but this amounts to 25 million gigabytes of data a year, which is accessed by teams of physicists all over the world searching for clues to help understand the beginning of our universe.
There are challenges to be overcome before we can truly make the most of big data. In the past, data sets were small and so could be examined by hand, labeled and tidied up. With vast amounts of data this just isn’t possible, and so computers of the future will need to be able to handle messy, unstructured data that may contain lots of mistakes. And, despite the huge amounts of data being created today, less than 1% is currently analyzed. New ways of storing and accessing this data need to be invented if we are to take full advantage. We can no longer rely on single computers as we have done in the past, so data must be split across many machines and shared among lots of people. This has driven the recent move towards cloud computing, where data is stored on hard drives in different locations and can be accessed from all over the world. But more still needs to be done to handle the ever-increasing amounts of information we are creating.
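One common way to split data across many machines is to hash each record’s key and use the result to pick a machine. The Python sketch below illustrates that idea under stated assumptions: the function names `shard_for` and `partition`, the sample records and the choice of four machines are hypothetical, and real cloud storage systems use more elaborate schemes.

```python
import hashlib

def shard_for(key, num_machines):
    """Map a record key to a machine number using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap it
    # into the range 0 .. num_machines - 1.
    return int.from_bytes(digest[:8], "big") % num_machines

def partition(records, num_machines):
    """Split (key, value) records into one list per machine."""
    shards = [[] for _ in range(num_machines)]
    for key, value in records:
        shards[shard_for(key, num_machines)].append((key, value))
    return shards

# Made-up records: a user name and the size of a photo they uploaded.
records = [
    ("alice", 120),
    ("bob", 340),
    ("carol", 95),
    ("alice", 410),
    ("dave", 60),
]
shards = partition(records, 4)
```

Because the hash depends only on the key, both of the records for "alice" land on the same machine, so anyone looking for her data knows where to go without searching every machine.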
As well as logistical issues, there are privacy, ethical and security concerns to be overcome. In a world where so much information about us is stored, data protection laws will become even more important. Yet advances in the past decade have opened our eyes to the potential of big data, and the next decade will see innovations that we haven’t yet begun to imagine.