The plummeting cost of storage, combined with rocketing quantities of available data, means that companies can now retain vast amounts of information in the hope of one day turning it into useful business intelligence. This opens up new opportunities for data analytics, enabling lines of business to increase revenues and pursue new commercial openings.
For example, a recent report by consultancy firm McKinsey & Company estimated that retailers embracing big data initiatives could increase operating margins by more than 60 percent. Businesses with substantial data reserves can use them to unlock competitive advantage.
A big caveat here is that these reserves can also contain information that could cause damage in the event of an accidental leak or intentional hack. That damage might come in the form of reputational harm from media headlines or financial harm from regulatory fines. When you consider that the best-known big data platforms – Hadoop, MongoDB, Cassandra and CouchDB – provide few security features out of the box, businesses must take it upon themselves to implement appropriate security measures if they wish to avoid data breach incidents or running afoul of data protection laws.
Defining big data
While there are various definitions of big data, the industry is coalescing around a definition that focuses on attributes rather than specific technologies. The attributes include: volume (large quantities of data), velocity (data that changes quickly) and variety (a range of types of data from different sources, from transactions to sensor data to web analytics).
The combination of these attributes enables enterprises to use the insights gleaned from big data implementations to offer new services, operate more efficiently and bring competitive advantage.
Big data generates considerable discussion around whether it is a matter of scale or of technology. To some, it is simply a question of scale: ever larger volumes of data. Others frame it as a technology discussion around NoSQL platforms such as Apache Hadoop, MongoDB, CouchDB and Cassandra.
While technologists might argue over whether big data means SQL or NoSQL technology, from a security perspective the challenges are the same. Whatever implementation is in use – SQL database or NoSQL environment – you need to secure the data, control access to it, and report on that access.
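As an illustration of those three requirements, the sketch below wraps a simple key-value store (standing in for any SQL or NoSQL backend) with role-based access checks and an audit trail. The class and field names are hypothetical and this is a minimal sketch, not a production control; in practice the write path would also encrypt values before they reach the store.

```python
from datetime import datetime, timezone
from typing import Dict, List, Set

class DataGatekeeper:
    """Hypothetical gatekeeper: secures data behind one choke point,
    controls access by role, and reports on every access attempt."""

    def __init__(self, allowed_roles: Dict[str, Set[str]]):
        # Maps a field name to the roles permitted to read it.
        self._allowed = allowed_roles
        self._store: Dict[str, str] = {}
        self.audit_log: List[dict] = []   # the "report on access" record

    def put(self, field: str, value: str) -> None:
        # In a real deployment, encrypt before the value touches the store.
        self._store[field] = value

    def get(self, field: str, user: str, role: str) -> str:
        granted = role in self._allowed.get(field, set())
        # Every attempt is logged, whether or not it succeeds.
        self.audit_log.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "user": user, "field": field, "granted": granted,
        })
        if not granted:
            raise PermissionError(f"{user} ({role}) may not read {field}")
        return self._store[field]

gate = DataGatekeeper({"card_number": {"payments"}})
gate.put("card_number", "4111 1111 1111 1111")
print(gate.get("card_number", "alice", "payments"))  # allowed, and logged
try:
    gate.get("card_number", "bob", "analytics")      # denied, and logged
except PermissionError as err:
    print(err)
print(len(gate.audit_log))  # → 2: both attempts were recorded
```

The point of the design is that the same choke point serves both worlds: the backing `_store` dictionary could just as easily be a SQL table or a NoSQL document collection without changing the access-control or audit logic.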
Big data security challenges
The benefits of big data projects are well documented, but such projects also pose some taxing security and compliance challenges:
– Ensuring the protection of big data repositories, which can contain sensitive and confidential information
– Meeting compliance requirements for certain data within a big data environment
– Ensuring that response times are not impacted when deploying security solutions to protect the data
– Maintaining the security and control of data as it moves in private and public clouds
– Controlling data access so root administrators can do their jobs but not see sensitive data
– Minimising the administrative cost of securing big data and avoiding the creation of another "security silo", with the increase in administrative overhead that entails
Knowing where the sensitive bits of data reside is difficult: data discovery and classification is an ongoing challenge in both the SQL and NoSQL worlds. The sheer volume of data in big data environments makes it highly probable that sensitive data is sprinkled throughout, so you will need to secure the entire repository to avoid a breach that could cause brand damage or fines from regulatory authorities.
In the SQL database realm, the security ecosystem is well established and you have a mature stable of tools that you can choose from – database activity monitoring (DAM) inside of the database, encryption for data at rest, security information and event management (SIEM) to gather log data and glean security intelligence, etc. No matter what your SQL database might be, from DB2 to Oracle to SQL Server to MySQL, there is a fairly well-developed set of tools to choose from, and some well-established best practices that administrators, security practitioners and auditors are comfortable using.
On the other hand, security in the NoSQL world is still in its infancy, with many deployments having little protection beyond rudimentary passwords. The security ecosystem is sparse, and many tools are not optimised for the distributed nature of NoSQL data repositories. Enterprises are rushing to take advantage of low-cost big data compute clusters for data analysis, but such clusters offer little security beyond network and perimeter protection.
The big opportunity
We are awash in data. Indeed, Forrester estimates that we generate 2.5 quintillion bytes of data every day, with 90 percent of the world's data having been created in the last two years. While many enterprises are accelerating their big data deployments – awed by the promise of driving efficiencies as well as gaining valuable business insights – using these platforms without carefully considering the technical security issues involved can needlessly put an organisation's reputation on the line. Security teams need to get ahead of the game by engaging with lines of business and big data architects to ensure security considerations are understood.
Broadly, the best way to address the security challenge posed by big data deployments is to place strong controls around the data itself, independent of file format, size or location. Creating a data firewall of sorts ensures that the information in big data platforms is protected against unauthorised disclosure, without having to modify applications or re-architect the storage infrastructure.
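One way to picture such a data firewall is application-level encryption: values are sealed before they ever reach the repository, so the store only holds ciphertext regardless of file format, size or location, and administrators without the key see nothing useful. The sketch below is illustrative only – a SHA-256 keystream stands in for a proper cipher such as AES purely to keep the example dependency-free; it is not production cryptography.

```python
import hashlib
import os

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream from SHA-256 in counter mode (NOT production crypto;
    a real deployment would use an authenticated cipher such as AES-GCM)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def seal(key: bytes, plaintext: bytes) -> bytes:
    # A fresh random nonce per record, prepended to the ciphertext.
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))

def open_sealed(key: bytes, blob: bytes) -> bytes:
    nonce, ciphertext = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, stream))

key = os.urandom(32)                      # held by the security team, not the cluster
record = seal(key, b"patient-id:12345")   # what the repository actually stores
assert record != b"patient-id:12345"      # admins without the key see only ciphertext
assert open_sealed(key, record) == b"patient-id:12345"
```

Because the sealing happens before storage, the protection travels with the data – the same opaque blob can sit in HDFS, a MongoDB document or a public cloud bucket without the application or storage layer needing to change.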
—————
Paul Ayers, VP EMEA, Vormetric. Ayers leads Vormetric’s channel programme in EMEA. He was previously sales director for PGP Europe and senior sales director for Northern Europe for PGP Corporation until its acquisition by Symantec. Vormetric (@Vormetric) is the leader in enterprise encryption. www.vormetric.com.