At the recent Cloud Expo event in London, one of the most interesting themes was that of Big Data and the Cloud.
While it is clear that Cloud-based consumer applications are some of the biggest drivers and creators of Big Data, organisations’ propensity to perform their Big Data analysis in the Cloud is less certain. There are several reasons for this, including:
1) a general lack of experience among users in addressing and solving Big Data problems – using cloud-based solutions introduces yet another unknown
2) typical analysis requirements combine data held in-house (eg customer data) with data in disparate cloud-located systems and public cloud data (eg social network data) – it may be easier and more controllable to bring everything needed in-house for processing
3) the big analytics technology vendors, from whom users are accustomed to taking their solution ideas, are primarily promoting on-premise solutions: these include IBM, Microsoft, Oracle, SAP, Teradata…
4) the IaaS vendors’ charging models are not yet ready for Big Data – they are still evolving. So while the scalability of cloud is a key “plus” in Big Data analysis, the pricing may still not be cost-effective.
This last factor is changing rapidly and is likely to affect users’ readiness to build and use cloud-based solutions. We have recently seen announcements from Amazon (Redshift) and others in this regard, reducing prices and changing the offer. On the price side, we are seeing something of a Dutch auction at play.
But an interesting presentation at the event by CloudSigma’s COO, Bernino Lind, pointed out that a Big Data solution includes a number of different workloads – some CPU-intensive, others not – and he argued that pricing should reflect that reality. As CloudSigma’s clients for its Big Data IaaS service include the CERN physics research laboratory, amongst others, his views carry weight. CloudSigma offers CPU, RAM, data storage and bandwidth as separate items in its pricing “menu.” Lind also pointed out that sharing HDDs on a public cloud service leads to bottlenecks, and argued that SSD is a better bet for many workloads and should also be available as an option – a theme picked up by other speakers such as Glyn Bowden of Lucr.
In PAC’s view, the use of Cloud services for Big Data will increase significantly over the coming couple of years. In the early years, the main driver of this change is likely to be the increasing maturity and flexibility of pricing models, as advocated by Lind.
Beyond that, we think that the real catalyst will be a move to higher-value services based on Cloud. Not just managed services to remove the need for Hadoop (and similar) administration knowledge – though this will be important – but true value-add: moving from Big Data IaaS to Big Data platform as a service (PaaS), where tools are linked to pre-analysed data sets for particular problem sets, and even Big Data business process as a service (BPaaS), where analytic services are added in.
At the moment this service-based approach is very nascent. We believe that many new and innovative offerings will emerge over the coming years to drive the market, for the growth of services-oriented Big Data solutions will make it possible for a much wider set of organisations – from the mid-market upwards – to take advantage of Big Data technologies.