The penalties for getting cloud wrong can be high. Forums and conversations with end users are regularly peppered with horror stories of cloud instances left spinning, racking up bombshell bills. Managing cloud spend is a fine art, as many are learning.
Deloitte notes that companies worldwide spent $34.6bn on cloud services in Q2, an 11% rise on the previous quarter. Cloud is an increasingly chunky part of a CIO’s budget, yet cost control is often sub-optimal, with significant savings possible without impacting performance.
The cloud is horrible. Just discovered we accidentally left a GPU instance running that billed us $3000. https://t.co/sZ2SphRLSw
Brad Dwyer (@braddwyer), September 9, 2020
Selecting the right instance type is important. Choosing instance families that suit your application workload is essential to keeping costs down. The options range from general-purpose instances, to compute-optimised instances that offer more computational power, to cheaper alternatives such as spot instances.
Maksym Schipka, CTO of oil data company Vortexa, told Tech Monitor: “Looking at people’s cloud infrastructure, one of the things that people often do is use the cloud providers in a very simplistic way. If people want to run a SQL database, for example, they often spin up a [virtual] machine, then they install a database on that machine and use that database, on that machine, while every single cloud provider, everyone, provides you with already managed PostgreSQL for example, which in many cases may end up being cheaper.”
Simon Timms, a principal architect at cloud services company Inventive Works, LLC, agrees: “People tend to get locked into virtual machines, which are fabulous tools, but they’re a pretty big hammer to bring to the cloud party. So you might want to look at doing things like containers, which can be short-lived and live on Kubernetes and give you a much finer-grain control over the scalability of your application.
“Serverless applications are also a fantastic approach. If your application runs very few requests, then you can run on serverless for next to nothing, then if your application becomes very loaded, it scales up really nicely and very quickly.”
Are you building something on the cloud that’s already built?
Building things that already exist on the cloud is another common mistake, Timms says: “I’d much rather people focus on their company’s differentiators, rather than spending a lot of time and effort building something out like email, which everybody has, where there are cheap services that allow you to do that.”
Identity and file storage are a case in point: “People used to run their own Active Directory clusters inside of their organisations, and now it is just as cheap and easy to position those in the cloud,” he added.
“The bonus is that because it’s provisioned in the cloud, it allows you to hook in all sorts of other identity providers and users to authenticate against services, while also providing things like single sign-on. It’s probably not your company’s core competency, so it makes a lot of sense to outsource that one.”
Want to keep cloud costs down? Check capacity closely…
“People always overestimate how much space they need for safety,” said Schipka.
“They just request much bigger resources from the cloud provider than they actually need, just to be on the safe side.
“Quite often, when people are just at the beginning of their cloud journey, they’re keen to prove the business’s value, rather than optimise costs. They try to scale up or scale out their infrastructure too quickly.”
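As a rough illustration of checking capacity, the sketch below compares what has been provisioned with observed peak usage and estimates what a smaller allocation might save. Everything in it is an assumption made for illustration: the field names, the 30% safety headroom, the sample figures, and the simplification that cost scales linearly with vCPU count, which real pricing may not follow.

```python
import math

def rightsizing_report(resources, headroom=1.3):
    """List resources provisioned well above peak usage plus headroom.

    Each resource is a dict with hypothetical keys: name,
    provisioned_vcpus, peak_vcpus_used and monthly_cost.
    Assumes cost is proportional to vCPU count (a simplification).
    """
    report = []
    for res in resources:
        # Capacity actually needed: observed peak plus a safety margin.
        needed = max(1, math.ceil(res["peak_vcpus_used"] * headroom))
        if needed < res["provisioned_vcpus"]:
            fraction_kept = needed / res["provisioned_vcpus"]
            saving = res["monthly_cost"] * (1 - fraction_kept)
            report.append(
                (res["name"], res["provisioned_vcpus"], needed, round(saving, 2))
            )
    # Largest estimated saving first.
    return sorted(report, key=lambda row: row[3], reverse=True)

# Hypothetical sample data.
sample_resources = [
    {"name": "batch-worker", "provisioned_vcpus": 16,
     "peak_vcpus_used": 4.0, "monthly_cost": 800.0},
    {"name": "api-server", "provisioned_vcpus": 4,
     "peak_vcpus_used": 3.5, "monthly_cost": 200.0},
]
report = rightsizing_report(sample_resources)
print(report)
```

Here only the hypothetical batch-worker is flagged, since its peak usage plus headroom fits in 6 of its 16 vCPUs, while api-server is already running close to its allocation.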
And now we come to up-scaling, or more specifically, the failure to remember to scale down after an up-scale has taken place. This is where most horror stories come from, where companies lose thousands of pounds in a matter of hours.
Timms explained: “The thing with cloud costs is they spiral out of control really easily. It’s very easy to click that button to scale up or to provision new instances. So that’s something you need to keep in mind when you’re setting up new stuff.”
Schipka reiterated: “There are fantastic services available with Azure. And those services are very easy to switch on and to forget to switch off. I’ve seen situations in companies where, to give you an example, in AWS, a set of instances by SageMaker, which is a top-notch tool for data scientists to explore big data, was spun up and people, just by mistake, totally forgot to spin them down. They only discovered it something like two weeks later, when it was already several tens of thousands of pounds on the bill, all from an idle instance doing nothing.”
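One way to catch a forgotten instance before the bill arrives is to scan utilisation data for machines that have done nothing for a long stretch. The following is a minimal sketch of that idea, not any provider's tooling: the `InstanceUsage` record, the 5% CPU threshold, the 24-hour window and the sample prices are all hypothetical, and in practice the hourly figures would come from whatever monitoring service is in use.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class InstanceUsage:
    name: str
    hourly_cost: float               # hypothetical price per hour
    cpu_percent_by_hour: List[float]  # hourly CPU averages, oldest first

def find_idle_instances(usages, cpu_threshold=5.0, min_idle_hours=24):
    """Return (name, idle_hours, cost_while_idle) for instances whose
    most recent hours have all been below the CPU threshold."""
    flagged = []
    for usage in usages:
        idle_hours = 0
        # Walk back from the newest sample, counting the trailing idle streak.
        for cpu in reversed(usage.cpu_percent_by_hour):
            if cpu >= cpu_threshold:
                break
            idle_hours += 1
        if idle_hours >= min_idle_hours:
            wasted = round(idle_hours * usage.hourly_cost, 2)
            flagged.append((usage.name, idle_hours, wasted))
    return flagged

# Hypothetical fleet: one forgotten GPU notebook, one busy web server.
fleet = [
    InstanceUsage("notebook-gpu", 3.0, [1.0] * 48),
    InstanceUsage("web-1", 0.1, [40.0] * 48),
]
idle = find_idle_instances(fleet)
print(idle)
```

Run on a schedule and wired to an alert, a check along these lines would surface the idle machine after a day rather than after two weeks.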
Security failings often start with simple human forgetfulness, such as failing to kill off VPN credentials for someone who has left the company, and the same applies to cloud services. A lot of the time, people will put rules in place to scale services up so that they can handle an increased load, but not to bring them back down.
Schipka says: “That’s fantastic for something like a Black Friday sale or Christmas sale or something like that, where all of a sudden you have a lot of load on your site. You have the much-needed ability to scale up to hundreds of instances.
“However, remembering to rapidly scale that back down again is almost as important as being able to scale up in the first place.”
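A scaling rule that works in both directions can be sketched as a single proportional calculation: size the fleet so that average load sits near a target, whether that means adding instances or removing them. The function name, the 60% CPU target and the instance limits below are assumptions chosen for illustration rather than any provider's defaults.

```python
import math

def desired_instances(current, avg_cpu_percent, target_cpu_percent=60.0,
                      min_instances=2, max_instances=200):
    """Return the instance count that would bring average CPU near the target.

    The same formula scales up when load is above target and scales
    down when load falls, so there is no separate step to forget.
    """
    if current <= 0:
        raise ValueError("current must be a positive instance count")
    wanted = math.ceil(current * avg_cpu_percent / target_cpu_percent)
    # Clamp to the configured floor and ceiling.
    return max(min_instances, min(max_instances, wanted))

# Sale traffic arrives: 10 instances running hot at 90% CPU.
print(desired_instances(10, 90.0))
# Sale is over: 100 instances idling at 6% CPU.
print(desired_instances(100, 6.0))
```

With these hypothetical numbers the first call asks for 15 instances and the second for 10, so the fleet shrinks as soon as the load does. The ceiling also caps how far a runaway rule can push the bill.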
Get this right and the savings can be substantial.
As Jarod Greene, general manager, TBM Council, notes: “As a variable spend, the impact of making proper adjustments to cloud expenditure can be realised almost instantly. With appropriate management and optimisation, around 25% to 30% of this wasted cloud spend can be trimmed back, which can make a significant difference to an organisation’s bottom line.
“Businesses need to deploy automated tools that will help them gain quick insight into their cloud costs, grounded by utilisation and spend data, which can then quickly highlight areas of spend to be rightsized or cut back.”
“Data access is also a thing that can be problematic,” warns Timms.
“So running complex queries against a well-designed SQL database, you tend to cross a lot of tables together and you end up with fairly big queries that talk to a lot of tables throughout your database. This is just a product of normalisation, which we’ve all been through in school.
“If you’re going to be cost-efficient, then sometimes you have to sacrifice that normalisation and hold multiple views of that data allowing you to kind of pre-populate those queries, so that instead of doing big joins, across a lot of tables, you end up just doing key lookups.
“Key lookups can be very, very fast and very, very cheap. They also have the added advantage of not needing the power of something like an SQL server behind them. You can move those off into a product, like table storage, where you can do thousands of queries for a penny.”
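The trade-off Timms describes can be sketched with a small in-memory example: the same answer obtained once by joining normalised tables and once by reading a pre-populated view by key. The schema and data are invented for illustration, and the plain Python dict merely stands in for a key-value product such as table storage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE products  (id INTEGER PRIMARY KEY, title TEXT, price REAL);
CREATE TABLE orders    (id INTEGER PRIMARY KEY, customer_id INTEGER,
                        product_id INTEGER);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO products  VALUES (10, 'Widget', 2.5), (11, 'Gadget', 4.0);
INSERT INTO orders    VALUES (100, 1, 10), (101, 1, 11), (102, 2, 10);
""")

SUMMARY_SQL = """
SELECT c.id, c.name, COUNT(o.id), SUM(p.price)
FROM customers c
JOIN orders o   ON o.customer_id = c.id
JOIN products p ON p.id = o.product_id
{where}
GROUP BY c.id, c.name
"""

def order_summary_by_join(customer_id):
    """Normalised route: join three tables on every request."""
    sql = SUMMARY_SQL.format(where="WHERE c.id = ?")
    _, name, count, total = conn.execute(sql, (customer_id,)).fetchone()
    return {"name": name, "orders": count, "total": total}

def build_summary_view():
    """Denormalised route: run the join once and store results by key."""
    sql = SUMMARY_SQL.format(where="")
    return {
        cid: {"name": name, "orders": count, "total": total}
        for cid, name, count, total in conn.execute(sql)
    }

summary_view = build_summary_view()   # refreshed whenever orders change
print(order_summary_by_join(1))       # big join
print(summary_view[1])                # key lookup, same answer
```

The cost of this approach is that the view has to be rebuilt or updated when the underlying data changes, which is the normalisation being sacrificed.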