Philip Williams
on 19 January 2023
The flexibility of public cloud infrastructure means little to no upfront expense, which is great when starting a venture or testing an idea. But once a dataset grows and its usage becomes predictable, it can turn into a significant base cost, compounded further by additional charges that depend on how you consume that data.
Public clouds were initially popularised under the premise that workloads are dynamic: you could easily match available compute resources to the peaks and troughs in your consumption, rather than maintaining mostly idle buffer capacity to meet peak user demand. Essentially, this shifts sunk capital expenditure into variable operational expense.
However, it has become apparent that this isn’t necessarily true when it comes to public cloud storage. What is typically observed in production environments is continual growth across all datasets. Data actively used for decision making or transactional processing in databases ages out of active use, but still needs to be retained for audit and accountability purposes. Training data for AI/ML workloads grows, allowing models to become more refined and accurate over time. Content and media repositories grow daily, and even faster as higher-quality recording equipment comes into use.
How is public cloud storage priced?
Typically, costs are incurred in three areas:
- Capacity ($/GB): the amount of space you consume storing your data, or the amount of space you allocate/provision for a block volume.
- Transactional charges incurred when you interact with the dataset. In an object storage context, these are PUT/GET/DELETE operations. In a block storage context, these can be allocated IOPS or throughput (MB/s).
- Bandwidth charges (egress), which object storage can also incur when you access your data from outside the cloud provider’s infrastructure, or from a VM or container in a different compute region. These charges can even apply when you have deployed your own private network links to the cloud provider!
If in the future you decide to move your data to another public cloud provider, you would incur these costs during migration too!
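Putting these three dimensions together, a monthly object storage bill can be modelled roughly as capacity plus transactions plus egress. Here is a minimal sketch in Python; the function name and all prices are illustrative placeholders, not any specific provider’s rates:

```python
# A minimal sketch of how the three pricing dimensions combine into a
# monthly object storage bill. All prices below are illustrative
# placeholders, not any specific provider's actual rates.

def monthly_object_storage_cost(
    stored_gb: float,        # capacity dimension ($/GB per month)
    put_requests: int,       # transactional dimension (PUT/GET/DELETE)
    get_requests: int,
    egress_gb: float,        # bandwidth dimension (data leaving the cloud)
    price_per_gb: float = 0.023,
    price_per_1k_puts: float = 0.005,
    price_per_1k_gets: float = 0.0004,
    price_per_egress_gb: float = 0.09,
) -> float:
    capacity = stored_gb * price_per_gb
    transactions = (put_requests / 1_000) * price_per_1k_puts \
        + (get_requests / 1_000) * price_per_1k_gets
    egress = egress_gb * price_per_egress_gb
    return capacity + transactions + egress
```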
Calculating cloud storage TCO
Imagine you have a 5PB dataset and you want to understand its total cost of ownership (TCO) over 5 years. First, we need to make some assumptions about the dataset and how frequently it will be accessed.
Over the lifetime of the dataset we will assume that it is written twice, giving 10PB of written data, and read 10 times, giving 50PB of read data, with an average object size of 10MB.
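Under these assumptions, the data volumes and request counts follow directly. A quick back-of-the-envelope sketch, using decimal units:

```python
PB_IN_GB = 1_000_000   # decimal units: 1 PB = 1,000,000 GB
MB_IN_GB = 1_000       # 1 GB = 1,000 MB

dataset_gb = 5 * PB_IN_GB        # 5 PB dataset
written_gb = 2 * dataset_gb      # written twice -> 10 PB written
read_gb = 10 * dataset_gb        # read 10 times -> 50 PB read

avg_object_mb = 10
put_requests = written_gb * MB_IN_GB / avg_object_mb  # 1.0e9 PUTs
get_requests = read_gb * MB_IN_GB / avg_object_mb     # 5.0e9 GETs
```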
In a popular public cloud, object storage capacity starts at $0.023/GB per month, and as usage increases the price decreases to $0.021/GB. You are also charged for the transactions to store and retrieve the data. These prices sound low, but as you scale up and consider the multi-year cost, they quickly add up to significant numbers.
For the 5PB example, the TCO over 5 years is over $7,000,000, and that’s before you even consider any charges for compute to interact with the data, or egress charges to access the dataset from outside the cloud provider’s infrastructure.
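To see how the numbers compound, here is a simplified 5-year sketch. The tier boundaries and per-request prices are assumptions modelled on typical published list pricing; a fuller model, like the one behind the figure above, would also account for factors this sketch leaves out, such as how the dataset ramps up over time.

```python
# Back-of-the-envelope 5-year TCO for the 5 PB example.
# Tier boundaries and request prices are illustrative assumptions
# modelled on typical published list pricing, not a quote.

GB_PER_PB = 1_000_000
MONTHS = 60

def monthly_capacity_cost(stored_gb: float) -> float:
    """Tiered $/GB per month: first 50 TB, next 450 TB, then the rest."""
    tiers = [(50_000, 0.023), (450_000, 0.022), (float("inf"), 0.021)]
    cost, remaining = 0.0, stored_gb
    for tier_size, price in tiers:
        used = min(remaining, tier_size)
        cost += used * price
        remaining -= used
        if remaining <= 0:
            break
    return cost

stored_gb = 5 * GB_PER_PB
put_requests = 1.0e9   # 10 PB written / 10 MB average objects
get_requests = 5.0e9   # 50 PB read / 10 MB average objects

tco = (
    monthly_capacity_cost(stored_gb) * MONTHS
    + put_requests / 1_000 * 0.005    # assumed price per 1,000 PUTs
    + get_requests / 1_000 * 0.0004   # assumed price per 1,000 GETs
)
print(f"${tco:,.0f}")  # ≈ $6.3M for capacity and requests alone
```

Even this simplified sketch lands well into the millions for capacity and requests alone, before any compute or egress charges enter the picture.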
Balancing costs with flexibility
Is there another way to tackle these mounting storage costs, yet also retain the flexibility of deploying workloads in the cloud?
IT infrastructure is increasingly flexible, so with some planning it is possible to operate open-source storage infrastructure based on Charmed Ceph, fully managed by experts, adjacent to a public cloud region and connected to the public cloud via private links for the highest availability and reliability. Using the same usage assumptions as before, a private storage solution like this can reduce your storage costs by a factor of 2-3 over a 3-5 year period.
Having your data stored with open-source Charmed Ceph in a neutral location, yet near multiple public cloud providers, unlocks a new level of multi-cloud flexibility. For example, should one provider start offering a specific compute service that is not available elsewhere, you can make your data accessible to that provider without incurring the significant access or migration costs you would face when accessing one provider’s storage from another provider’s compute offering.
Additionally, you can securely expose your storage system to your users via your own internet connectivity, without incurring public cloud bandwidth fees.
Later this quarter we will publish a detailed whitepaper with a breakdown of all the costs of both of these solutions, alongside a blueprint of the hardware and software used. Make sure to sign up for our newsletter using the form on the right-hand side of this page (cloud and server category) to be notified when it is released.