In February 2016, a press conference revealing the discovery of a gravitational wave was streamed live into our Edinburgh office to a small group of, frankly, quite emotional people. Predicted by Einstein over 100 years ago, this discovery was the culmination of a 25-year international project which included many of the Scottish universities.
As you’d expect of a big data analytics company, we have our fair share of physicists, engineers and mathematicians at Aquila Insight. So we pay attention when science confirms a significant breakthrough. After all, it’s hard to beat an accurate prediction made across space-time from a distance of one century ago.
In our day-to-day roles, we analysts are used to dealing with large data sets comprising billions of records and terabytes of data. With scalable resources, we can usually turn this data round quickly to give actionable business insights for our clients. But storing, processing and analysing the interactions that people make with our clients every day pales into insignificance compared to the sheer volume of data that the Square Kilometre Array (SKA) will produce.
The SKA is an advanced radio telescope, designed to survey our sky and the universe beyond far more quickly than anything humankind has ever built before. The telescope is planned for sites in South Africa and Western Australia, where millions of receivers will be grouped into stations, each generating between 10 and 30 terabytes of data per second.
When complete and operating at full capacity, the stations will generate up to 7.5 petabytes per second. The aggregate data rate into SKA’s central processor is estimated to be ten petabytes per second. I’ll say that again slowly … ten petabytes per second. That is simply staggering by today’s standards.
To put those numbers into perspective – just one petabyte of storage on a smartphone would hold the DNA sequences of 750 million different people, around one-tenth of the world’s population. Or you’d be able to listen to your music continuously without repetition for 2,000 years. That’s enough storage even for some early Genesis tracks.
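Those comparisons are easy to sanity-check with a quick back-of-envelope calculation. The figures below are my own assumptions, not from the SKA: music streamed at 128 kbit/s, and a DNA sequence stored as a compressed diff against a reference genome, which works out at roughly 1.3 MB per person.

```python
# Back-of-envelope check of the "one petabyte" comparisons.
# Assumptions (illustrative, not official SKA figures):
#   - music encoded at 128 kbit/s
#   - 1 PB = 10^15 bytes (decimal petabyte)

PB = 10**15  # bytes in one petabyte

# Music: how long does 1 PB of 128 kbit/s audio last?
bytes_per_second = 128_000 / 8                    # 16 kB of audio per second
seconds_of_music = PB / bytes_per_second
years_of_music = seconds_of_music / (60 * 60 * 24 * 365)
print(f"{years_of_music:,.0f} years of continuous music")  # ~1,982 years

# DNA: the storage budget per person if 1 PB holds 750 million sequences
bytes_per_person = PB / 750_000_000
print(f"{bytes_per_person / 1e6:.2f} MB per DNA sequence")  # ~1.33 MB
```

The ~1.33 MB per person only makes sense for reference-compressed genomes (storing each person’s differences from a standard human genome), since a raw sequence runs to hundreds of megabytes, but on that assumption both of the article’s comparisons hold up.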
Can you imagine what processing ten times that amount of data, each and every second, would involve?
Impressive, yes, but what does this mean for the future of big data analytics and business? First of all, we have to recognise that we are talking about ‘internet scale’ volumes of data here. Currently the only companies dealing with such quantities of data are the likes of Facebook and Google. Most businesses will not have data sets of this size, even taking into account the exponential increase due to the rise and rise of mobile. But these projects do set an expectation of what should be possible in terms of scale in future.
It also raises the bar for companies anticipating how large their own streams of data may grow. Already, anyone working in commercial analytics knows that the amount of data currently being produced is fast becoming large enough to break legacy systems based on traditional technology. To cope with this, it’s no good looking backwards. Bigger machines and bigger pipes may help, but they are not a solution. The rate at which we produce data isn’t going to slow down, and this could mean increasing outages as systems break.
We need to look at the leaps made by scientific projects such as the SKA – how they manage the throughput, their use of distributed computing and cloud servers – and have the vision to do things just as quickly. These people are pioneers and have provided us with the patterns and the packages; it’s up to us now to take these and apply them.
Dr Jonathan Forbes, Chief Architect Engineering & Analytics, Aquila Insight