How did Apache Cassandra start?
Cassandra was developed at Facebook as a way to handle the huge volumes of data that the company was creating as part of its messaging services. In 2008, very few companies had to deal with that challenge of scale, so the Facebook team had to develop a new way to manage all that data. Following that development phase, the team decided to open source Cassandra, and it was picked up by the Apache Software Foundation.
At the same time, Jonathan Ellis discovered that Cassandra was a great fit to scale the data processing for Rackspace’s thousands of customers, and for the still-young but increasingly important space of cloud-native applications. Building for the cloud was a big change in thinking, and Cassandra was well suited to that new approach. He took on the lead role for the Apache Cassandra™ Project team for the next seven years.
Following this, he formed DataStax as a commercial company to support organisations that wanted to deploy applications based on Cassandra. Since then, the company has grown and now has more than 400 customers worldwide that rely on DataStax for an enterprise-ready version of Cassandra.
How has DataStax supported Apache Cassandra between then and now? Any changes?
Jonathan was instrumental in the beginning. He helped set up and run how Cassandra was supported and developed at the start. He handed the project chair role over a couple of years ago, when we felt the community could take over more of the direction and management of the open source development side. Since then, more developers and committers have joined. Companies like Uber and Apple have contributed their own improvements and development work back to the community.
DataStax continues to support Cassandra – around 85 percent of the commits on the code base come from developers working for DataStax. We want Cassandra to continue as a successful open source project that meets the needs of developers.
How do you see Cassandra fitting into modern applications and services?
I joined DataStax three years ago, and I have seen how companies have had to change their approach towards data during that time. It’s not just about big data now – it’s about using that data at the moment someone is making a decision. This could be a shopping purchase, a content recommendation, a public safety decision… anything that has to be done in the moment. Cassandra is a perfect fit for that, and that’s why so many internet-scale companies choose Cassandra for their applications.
However, there’s a much wider trend taking place around data too – multi-model. There are other new technologies like real-time analytics and graph analytics that need their own approaches. Companies don’t want to run in silos, where they have to move or transform their data multiple times in order to analyse it and use it effectively.
DataStax Enterprise (DSE) provides a multi-model approach where you can process and analyse data in multiple ways, based on what each application requires and what serves its customers best. Graph is better for surfacing relationships between items within your data – for example, similarities between customers and their purchases. Real-time analytics is essential so you can make decisions around transactions as they are taking place, rather than after the fact. Search is essential too, so you can query and interrogate that data over time.
All of these technologies help turn data from something you store and look at later into an asset that you can use in the moment. This ability to run in real-time makes a huge difference to digital transformation initiatives, and multi-model data management helps make that happen.
What’s the future for Cassandra, and for cloud?
The world is turning towards the cloud. There’s been a push towards public cloud services due to the flexibility and speed that they support, but there’s also been a recognition that these services come with costs. Firstly, cloud services are not necessarily cheaper when you factor in all the costs involved. Secondly, there’s an element of lock-in that can occur when you start building your applications and services on multiple tools from a single cloud provider. Thirdly, as enterprises move to the cloud, they end up with a hybrid architecture, which has some very important implications, particularly with respect to the data.
Platforms like Cassandra and DSE can help here. You can run Cassandra across multiple clouds – so AWS and Azure, or Azure and Google Cloud Platform – and across private and public cloud services. This latter approach is what we see enterprises adopting.
Enterprises want the flexibility and expansion capabilities that public clouds provide, but they also want to own their destinies when it comes to their data. They see the potential costs that can come with migrating away from, or between, public clouds, and they don’t want to fall into that trap. They have also seen the hidden costs of cloud migration – where you have to change or rewrite apps in order to move them from one cloud to another, for example. Using Cassandra means that you can avoid this cost, rather than being locked in to a specific public cloud provider.
For enterprises that want to run hybrid cloud, and want help supporting those applications, DataStax supports production deployments of Cassandra using DSE. This approach makes sense to enterprises that want to run at scale and manage their own data, rather than being too reliant on another enterprise that might compete with them in the future.