On the future of cloud services and BYOC

My job at Confluent involves a mixture of research, engineering and helping us figure out the best technical strategy to follow. BYOC is something I’ve been thinking about recently so I decided to write down the thoughts I have on it and where I think cloud services are going in general.

Bring Your Own Cloud (BYOC) is a deployment model that sits somewhere between a SaaS cloud service and an on-premise deployment. The vendor deploys their software into a VPC in the customer’s account but manages most of the administration for the customer. It’s not a new idea: the term Managed Service Provider (MSP) has been around since the 90s and refers to the general practice of outsourcing the management and operation of IT infrastructure deployed within customer or third-party data centers.

MSPs can be attractive to customers who are used to the on-premise, self-hosted model and who want to keep some degree of control and visibility but no longer want to operate the software themselves. It is becoming a popular model with data start-ups, albeit with a new name (BYOC). Examples are Databricks (the modern pioneers of BYOC), StarTree BYOC, Redpanda BYOC and StreamNative BYOC.

The promises of BYOC seem to be:

  • You get better security by keeping the data in your account.

  • It is cheaper, with a lower total cost of ownership (TCO).

These seem plausible on the face of it, but when you look deeper they don’t always hold up to close scrutiny. Also missing here is everything that is given up by adopting this “in your own account” model. Both customer and vendor lose:

  • Serverless/resource pooling: The economics, elasticity and reliability of large-scale cloud services which can draw on effectively infinite compute.

  • Operational efficiency: BYOC has structural disadvantages that add extra overhead and friction to the operating model, which can manifest to the customer as poorer quality of service and to the vendor as difficulty sustaining momentum and evolving the service.

  • A clear responsibility boundary.

In this post we’ll look at all of these points with the aim of coming away with a realistic view of BYOC and SaaS cloud services. We’ll also look at why BYOC cloud deployments are particularly alluring for start-ups, but ultimately end up as a long-term architectural trap.

Databricks is a good example. When asked “[What is] your worst mistake as an entrepreneur (and what you learned from it)”, Databricks’ Ali Ghodsi conceded:

“Not building a complete SaaS offering from day one and dabbling around with more "secure" semi-hybrid SaaS architectures. Most shortcuts don't pay off.”

The promise of security

On the face of it, it seems intuitive that things in your account must be safe and secure. However, that neglects the obvious fact that you are bringing the vendor’s product, and their security regime, into your account. If you think about it, it isn’t a question of which account the product runs under, but rather who has access to deploy code or access your data. Unfortunately what this means is that security isn’t a single silver-bullet thing, but a long list of painstaking tasks that any product must do, regardless of where it runs. With BYOC, most of your data never leaves your account, but that doesn’t mean you’ve solved security. The key risks remain. Who has access to the machines where data resides? Who has access to install code onto those machines? What does that code do? And so on.

With the BYOC model the vendor could operate at two extremes:

  • Extreme closed: The vendor has no access to deploy code, change infrastructure, debug etc.

  • Extreme open: The vendor has full access to deploy, make changes, debug, access running instances and data etc.

The same is true of a SaaS cloud service:

  • Extreme closed: The vendor fully locks down all access outside their own control plane, just as in the extreme closed BYOC case.

  • Extreme open: The vendor provides full access to the servers, applications, code, and data to all (or at least a good chunk of) their staff.

Extreme closed restrictions don’t work in practice because you can’t hold the vendor accountable for the operation of your service, and reliability under those restrictions would be severely hampered. When the BYOC vendor must debug issues in an environment over a Zoom call, this isn’t cloud, it’s just overly complex self-managed software. Vendors that start here will quickly add some kind of break-glass emergency access option to allow them to debug and remediate difficult incidents. However, security is only as strong as the weakest link, and BYOC agent software that is able to open a reverse SSH tunnel to the mothership significantly changes the isolation guarantee.

On the other hand, the extreme open model doesn’t work for more self-evident reasons: there’s simply nothing preventing anyone, BYOC or SaaS, from accessing whatever they feel like.

Neither BYOC vendors nor SaaS providers want extreme levels of restriction and no-one should feel comfortable about leaving environments wide open as a free-for-all.

In reality the best implementation of either model involves careful restrictions, layering, JIT access, least privilege, escalations, detailed audit logging, identity integrations, backups, recovery planning, and other good-practice security principles. This level of maturity takes time.
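To make one of those principles concrete, here is a minimal sketch of just-in-time, least-privilege access using short-lived AWS STS credentials. The role name, ticket reference and session policy are hypothetical; a real setup would wrap this in approval workflows, alerting and tighter scoping, whichever account the service runs in.

```python
# A minimal sketch of just-in-time, least-privilege access using AWS STS.
# The role ARN, session name and session policy are hypothetical examples.
import boto3

sts = boto3.client("sts")

response = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/BreakGlassOperator",  # hypothetical role
    RoleSessionName="incident-4711-jane",   # ties the session to a ticket and an operator
    DurationSeconds=900,                    # credentials expire after 15 minutes
    # A session policy further narrows what the assumed role can do (least privilege).
    Policy='{"Version":"2012-10-17","Statement":[{"Effect":"Allow",'
           '"Action":["logs:GetLogEvents","logs:FilterLogEvents"],"Resource":"*"}]}',
)

creds = response["Credentials"]  # short-lived keys; every call made with them is auditable
```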

Security isn't a single silver-bullet thing, but a long list of painstaking tasks that any product must do, regardless of where it runs.

The other side of this is sovereignty. Who really controls the data? Can you take it back if you need to in an extreme situation? This would seem to be an advantage of running in your account. After all, if need be, you could revoke access to your data from the vendor. But this kind of control and revocation is not unique to the controls you have to lock down your account. SaaS cloud services handle this with a mechanism that provides exactly the same thing—customer controlled encryption at rest (you do encrypt your data, right?!). For example in Confluent, Snowflake, Mongo, and most other SaaS data products you can revoke the encryption keys at any time to shut off the vendor’s access to data.
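As an illustration of what customer-controlled keys give you, here is a minimal sketch using AWS KMS via boto3, assuming the service encrypts data at rest under a customer-managed key. The key ARN is hypothetical and the exact BYOK mechanism varies by vendor, but the principle is the same: disable the key (or revoke the vendor’s grant) and the vendor loses the ability to read your data.

```python
# A minimal sketch of customer-controlled encryption at rest with AWS KMS.
# The key ARN is hypothetical; the exact BYOK mechanism varies by vendor.
import boto3

kms = boto3.client("kms")

KEY_ID = "arn:aws:kms:us-east-1:123456789012:key/11111111-2222-3333-4444-555555555555"

# Day to day, the vendor's service decrypts data using this customer-managed key
# via grants or key policy statements the customer has approved.

# In an extreme situation, the customer disables the key; anything encrypted
# under it becomes unreadable to the vendor until the customer re-enables it.
kms.disable_key(KeyId=KEY_ID)

# Revoking a specific grant (e.g. one issued to the vendor's service role) is a
# more targeted way to cut off access without disabling the key entirely.
# kms.revoke_grant(KeyId=KEY_ID, GrantId="<grant id issued to the vendor>")
```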

Encryption comes in many forms and can be used to limit access to data by third parties.

If what we are actually worried about is access to data, rather than whose account the machines are in, then there is something that resolves this concern regardless of which account is involved: end-to-end encryption. If we encrypt data in the clients, then the vendor never has access to it. Doing this by hand is a pain, but many SaaS vendors, including Confluent, now offer end-to-end field-level encryption. Confluent will be announcing a significant addition to this capability at the Current conference this week. End-to-end encryption is the true gold standard for data confidentiality and sovereignty, as it significantly reduces your data risk regardless of BYOC or SaaS. A security posture that relies on having the data in your account isn’t inherently safer than one based on the industry best practice of encryption.
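To illustrate the principle (this is not any particular vendor’s implementation), here is a small sketch of client-side field-level encryption using the Python cryptography package. The record and field names are made up; the point is that the sensitive field is ciphertext before it ever leaves the client.

```python
# A generic illustration of client-side (end-to-end) field-level encryption.
# The field is encrypted before the record leaves the client, so the service
# that stores and moves it never sees the plaintext.
from cryptography.fernet import Fernet
import json

# The key is generated and held by the data owner (in practice a KMS/HSM, never the vendor).
key = Fernet.generate_key()
fernet = Fernet(key)

def encrypt_sensitive_fields(record: dict, fields: list[str]) -> dict:
    """Encrypt selected fields; the rest stay readable for routing/filtering."""
    out = dict(record)
    for f in fields:
        out[f] = fernet.encrypt(str(record[f]).encode()).decode()
    return out

order = {"order_id": "o-1001", "customer_email": "jane@example.com", "amount": 42.50}
protected = encrypt_sensitive_fields(order, ["customer_email"])
payload = json.dumps(protected)  # what gets produced/stored; the email field is ciphertext

# Only a client holding the key can recover the plaintext.
plaintext_email = fernet.decrypt(protected["customer_email"].encode()).decode()
```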

The argument that BYOC is the silver bullet for security is so over-simplified that it is essentially nonsense. What is definitely true is that with BYOC, the buck stops with the customer. Since the product deploys in the customer environment, but isn’t at quite the same trust level as the rest of their code, the customer needs to make sure they control its access to other systems and services. The customer ultimately must invest the time and money to lock down the environment and perform the proper security scanning and monitoring. It is true that the customer gets more controls and more visibility, but they also inherit the work that comes with that. These responsibilities and workloads are seldom, if ever, represented in the TCO equation.

Speaking of responsibilities, it’s time to talk about BYOC and networking.

The complexities and costs of BYOC networking

BYOC isn’t as simple as dropping a VPC into your account and you’re done. BYOC depends on private networking for inter-VPC connectivity (something which is avoidable with SaaS). This is an additional headache for the customer, who must now figure out an inter-VPC connectivity strategy.

In the case of AWS, the free option is VPC Peering, but it can come with a significant operational burden due to the complexities it introduces for network management (a sketch of some of these steps follows the list):

  • Routing tables must be updated which can be complex, especially in scenarios with multiple VPCs and complex routing requirements.

  • IP address overlaps between peered VPCs can cause conflicts and routing issues. 

  • There can be issues due to transitive peering limitations. Achieving transitive connectivity requires additional configurations or the use of AWS Transit Gateway (not free). 

  • You must configure security groups and network ACLs to allow the necessary traffic between peered VPCs. Misconfigurations can lead to connectivity problems or security vulnerabilities. 

  • DNS resolution between peered VPCs can be complex, especially if you have custom DNS configurations. 

  • Route propagation and timely updates are critical for VPC connectivity.
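To make the burden concrete, here is a minimal sketch (with hypothetical VPC IDs, route table IDs and CIDRs) of just two of the steps above: establishing the peering connection and updating route tables on both sides. A real setup also needs security group rules, ACLs and DNS settings, and all of it has to be kept in sync as VPCs and subnets change.

```python
# A minimal sketch of two VPC Peering steps using boto3. All IDs and CIDRs
# below are hypothetical placeholders.
import boto3

ec2 = boto3.client("ec2")

CUSTOMER_VPC = "vpc-0customer000000000"
BYOC_VPC = "vpc-0byoc0000000000000"
CUSTOMER_CIDR = "10.0.0.0/16"
BYOC_CIDR = "10.1.0.0/16"      # must not overlap with CUSTOMER_CIDR

# 1. Request and accept the peering connection.
peering = ec2.create_vpc_peering_connection(VpcId=CUSTOMER_VPC, PeerVpcId=BYOC_VPC)
pcx_id = peering["VpcPeeringConnection"]["VpcPeeringConnectionId"]
ec2.accept_vpc_peering_connection(VpcPeeringConnectionId=pcx_id)

# 2. Add a route in every affected route table, on both sides.
ec2.create_route(RouteTableId="rtb-0customer00000000",   # repeat per route table
                 DestinationCidrBlock=BYOC_CIDR,
                 VpcPeeringConnectionId=pcx_id)
ec2.create_route(RouteTableId="rtb-0byoc000000000000",
                 DestinationCidrBlock=CUSTOMER_CIDR,
                 VpcPeeringConnectionId=pcx_id)
```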

Another free option which avoids this additional complexity is VPC Sharing. The idea with this network architecture is that the vendor deploys right into your existing VPC where your applications are deployed. This avoids the need for VPC inter-connectivity but now the vendor software operates inside the same trust boundary as the rest of your deployed code. Security and convenience are seldom friends.

To avoid the operational hassle of VPC Peering or the security risk of VPC Sharing, customers can choose either AWS PrivateLink (PL) or Transit Gateway (TGW), but these both come with a price tag. The general point I am making here is that all this increases the TCO, because the customer must either pay for PL/TGW or assume the additional management costs that VPC Peering introduces. This is something never highlighted as a cost of BYOC.

The promise of “It’s cheaper”

BYOC pricing is based on a subscription to the software; it does not include the underlying infrastructure the software needs, nor the budget needed for the additional overheads of private networking and security.

BYOC can often be made to look cheaper because the vendor focuses on the price of the software, but the customer still has to pay for the infrastructure from the cloud service provider (CSP). Other costs such as the additional responsibilities of securing the environment and private networking are also not factored in. The SaaS provider on the other hand includes all costs, including the underlying compute, storage, networking, security personnel/infrastructure and support.

That initial BYOC price isn’t the true cost the customer will end up paying. Even worse, the customer gets billed twice and has to work out, across two different bills, which expenses belong to the BYOC service. The true BYOC costs end up buried in a mountain of other CSP costs.

The start-up usually can’t negotiate discounts with the CSPs that are significantly better than their customers’ own discounts. This is where the argument comes from that BYOC is cheaper because customers can leverage their own CSP discounts. But over time, as a cloud vendor grows, its collective cloud consumption grows far larger than that of all but the few largest customers. What was once a disadvantage becomes an advantage: the vendor is able to negotiate heavy discounts which can be better targeted to their workload and which can in part be passed down to its customers.

In fact the BYOC model has some tension and perverse incentives. Most BYOC billing models are priced on the number of nodes under management, so the vendor’s way to increase revenue is to add more nodes to your cluster! An optimization that shrinks node count will drop revenue. The SaaS provider has the opposite incentive: it optimizes its stack at every level to gain every bit of cost efficiency and performance that it can. Economics 101 tells us that, given price elasticity, as the base cost drops the profit-maximizing price also drops, so even a greedy SaaS vendor would pass savings on to its customers.

But the most important reason why SaaS cloud services have a better potential to deliver the most cost-effective service is related to precisely what BYOC gives up: 

  • The economics, elasticity and reliability of large-scale cloud services.

  • The operational efficiency of locked down large-scale cloud services.

In contrast, BYOC is fundamentally single-tenant: typically on-premise software co-managed by the vendor. However, once you see the structural benefits inherent to large-scale multi-tenant architectures, you really appreciate why the CSPs build the vast majority of their data services that way.

The economics, elasticity and reliability of large-scale multi-tenant (serverless) systems

Consider the efficiency at scale of thousands of single-tenant clusters, each sized with enough margin for steady performance, compared to a large-scale multi-tenant system based on resource pooling.

Let’s take S3 as a case study for the large-scale multi-tenant (MT) system. S3 was something new: it was cloud-native in a world still firmly following the on-premise way of thinking.

There were four attributes that made S3 a success:

  • Simple

  • Very low cost

  • Scale-free (as far as the customer is concerned, scale is not something they have to think about)

  • Extreme durability (and high availability).

Two distinguished engineers at AWS, Marc Brooker and Andy Warfield, have both written about the benefits of large-scale MT systems. Marc Brooker wrote the excellent blog post titled “Surprising Scalability of Multitenancy” and Andy Warfield wrote “Building and operating a pretty big storage system called S3”. Let’s look at what they say about the economics and elasticity of MT systems.

Resource utilization (aka cost of infrastructure)

A fundamental property of the economics of scaling is that you provision your infrastructure for peak load but extract the value from the system over the long-term average. Imagine you receive 1 million orders per day at a constant rate per second. This would be a dream for cost efficiency because you could scale your infrastructure to exactly match the constant load on the system. Now imagine instead that you receive those 1 million orders per day in one-minute bursts every five minutes. Your peak load is now five times higher than the average, so you must scale your system to be five times larger to cope with the same total load. The real world is not constant, and the further your peak load deviates from the average, the less cost-effective your system is.
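Here is that example as a quick calculation:

```python
# A worked version of the example above: 1 million orders per day arriving
# either at a constant rate, or in one-minute bursts every five minutes.
ORDERS_PER_DAY = 1_000_000
SECONDS_PER_DAY = 24 * 60 * 60

average_rate = ORDERS_PER_DAY / SECONDS_PER_DAY           # ~11.6 orders/sec

# Bursty case: all traffic arrives in 1 minute out of every 5,
# so the active minutes carry 5x the average rate.
duty_cycle = 1 / 5
peak_rate = average_rate / duty_cycle                     # ~57.9 orders/sec

# You provision for peak but earn on the average.
overprovision_factor = peak_rate / average_rate           # 5.0
utilization = average_rate / peak_rate                    # only 20% of paid-for capacity does useful work

print(f"average: {average_rate:.1f}/s, peak: {peak_rate:.1f}/s, "
      f"provisioned {overprovision_factor:.0f}x the average, utilization {utilization:.0%}")
```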

“The gap between "paying for peak" and "earning on average" is critical to understand how the economics of large-scale cloud systems differ from traditional single-tenant systems.” Marc Brooker, Surprising Scalability of Multitenancy.

Individual workloads can be bursty or have hourly changes in demand, which leads to poor resource utilization: you are literally throwing money away on idle hardware. However, in a multi-tenant system like S3, as you add more and more workloads, the aggregate demand tends to flatten out, and the more workloads you aggregate, the more pronounced this flattening becomes.

“But as we aggregate millions of workloads a really, really cool thing happens: the aggregate demand smooths and it becomes way more predictable.” Andy Warfield, Building and operating a pretty big storage system called S3.
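A toy simulation makes the point. Each tenant below is bursty on its own, yet the aggregate’s peak-to-average ratio shrinks as more tenants are pooled. The numbers are illustrative only; the trend is the structural point.

```python
# A toy simulation of the smoothing effect: independent bursty workloads,
# each idle most of the time, aggregated into one pool.
import random

random.seed(42)

def bursty_workload(steps: int, burst_prob: float = 0.2, burst_load: float = 5.0):
    """One tenant: usually ~1 unit of load, occasionally a 5x burst."""
    return [burst_load if random.random() < burst_prob else 1.0 for _ in range(steps)]

def peak_to_average(series):
    return max(series) / (sum(series) / len(series))

STEPS = 1_000
for tenants in (1, 10, 100, 1_000):
    workloads = [bursty_workload(STEPS) for _ in range(tenants)]
    aggregate = [sum(w[t] for w in workloads) for t in range(STEPS)]
    print(f"{tenants:>5} tenants -> peak/average = {peak_to_average(aggregate):.2f}")
# The single tenant must provision ~2-3x its average; the pooled fleet of
# 1,000 tenants runs close to a peak/average ratio of 1.
```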

Single-tenant systems don’t benefit from these flattened, predictable load trends and instead must be over-sized to handle the load peaks of their single tenant. Amazon RDS is a great example of a single-tenant system requiring often-idle dedicated provisioning.

RDS (excluding Aurora) is single-tenant because it is a managed traditional RDBMS service. Have you ever read the S3 and RDS SLAs? Have you noticed that the RDS SLA has a number of exclusions related to issues resulting from database overload and the customer not following operational guidelines (adequate sizing, proper timing of backup jobs, etc.)? None of those exclusions exist for S3 because the load from any single customer is insignificant compared to the aggregate load. Customers don’t have similar operational guidelines for S3 because S3 isn’t an on-premise-era technology. It’s a cloud-scale multi-tenant service that distributes a shared pool of resources as necessary, depending on customer load.

But what about auto-scaling? That is a good point which leads us to elasticity.

Elasticity and excess capacity

Many distributed, single-tenant data systems can be scaled in/out and some can auto-scale according to demand, but this autoscaling is notoriously difficult to get right. Scaling a single-tenant system usually involves the overhead of standing up new infrastructure, draining nodes, potentially pausing workloads and perturbing client software, as well as data movement costs. This makes it best suited to seasonal and other predictable, slow-moving changes in load.

What is scaling in a large-scale MT system? The answer is that it is logical scaling, not physical scaling. Logical scaling can be performed across multiple dimensions (some visible to the customer and some not):

  • Variable quotas and rate limits. Scaling out is as fast as the system can apply the new limits.

  • An upper-bound set of quotas/rate limits plus consumption-based billing. Simply place an upper bound on resource consumption (throughput, requests/sec, etc.) and charge the customer for what they actually use, hour by hour.

  • Dynamic allocation of underlying physical resources to logical resources. This type of logical scaling may be hidden from the customer but it can be just as responsive. It simply expands/shrinks the amount of the physical infrastructure that the logical resource is run on.

The nature of the logical scaling depends on the API of the service. In the case of Confluent, it is currently Basic and Standard clusters, which are really just logical clusters. A more sophisticated autoscaling logical cluster offering will be announced at the Current conference this week. For S3 it’s a bucket and its objects.

Confluent offers auto-scaling of logical Kafka clusters without requiring the customer to wait for any physical scaling. The customer has an upper range for the size of the cluster (in terms of resources like throughput and connections/sec) implemented as a set of quotas/rate limits. The customer is then simply charged for what they actually consume, hour by hour. S3 is somewhat similar: it applies upper-bound rate limits and charges you based on the number and types of requests.
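As a schematic (the names, limits and prices here are made up, not any vendor’s actual API), a logical cluster from the scaling and billing point of view boils down to a set of enforced quotas plus charges for hourly consumption:

```python
# A schematic of a "logical cluster": quotas the platform enforces, plus
# consumption-based hourly billing. Names, limits and prices are made up.
from dataclasses import dataclass

@dataclass
class LogicalCluster:
    max_ingress_mbps: float      # upper-bound quotas / rate limits
    max_egress_mbps: float
    max_connections: int

    def scale(self, **new_limits):
        """'Scaling' is just updating the limits - no nodes are added or drained."""
        for name, value in new_limits.items():
            setattr(self, name, value)

def hourly_charge(ingress_gb: float, egress_gb: float, base_rate: float = 0.10,
                  per_gb_in: float = 0.05, per_gb_out: float = 0.09) -> float:
    """Consumption-based billing: pay for what was actually used this hour."""
    return base_rate + ingress_gb * per_gb_in + egress_gb * per_gb_out

cluster = LogicalCluster(max_ingress_mbps=100, max_egress_mbps=300, max_connections=1_000)
cluster.scale(max_ingress_mbps=500, max_egress_mbps=1_500)   # effective as soon as the new limits propagate

bill = hourly_charge(ingress_gb=12.0, egress_gb=30.0)        # charged on usage, not on the upper bound
```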

This comes back to how large-scale MT architectures work - the load of a single customer is small compared to the total capacity! What auto-scaling really means for me as a customer is: limit my upper bound with rate limits, and just charge me for the consumption I actually used. The underlying cloud service can handle large volumes of traffic, the changes in demand of any given customer workload are absorbed over the short term, and aggregate trends are predictable enough that the underlying infrastructure can be scaled ahead of time. If the general trend for a given physical cluster shows that aggregate load is increasing, the SaaS provider can grow the physical cluster well before the aggregate load reaches a level that could impact performance.

Elasticity also plays a role in reliability as individual workload spikes can be accommodated without stressing the system. However there is another key property that contributes to reliability: excess capacity.

Single-tenant systems have a smaller margin of extra capacity to work with when failures occur or maintenance operations such as cluster rolls are needed. For example, a single-tenant three-broker Kafka cluster will lose 33% of its capacity when a single broker fails or is restarted; each remaining broker must go from handling 33% to 50% of total load. When a disk or server experiences degraded performance, remediation requires expanding the cluster and then decommissioning the problematic VM. These kinds of operations can be time consuming and nerve-wracking when the cluster is already under heavy load. Interestingly, the per-node pricing of BYOC actually encourages the customer to deploy fewer nodes to keep licensing costs down, but from an operational perspective this is far from the optimal deployment strategy.

By comparison, large-scale MT systems have enough capacity to absorb failures and server upgrades with only minor impact on individual server loads. Degraded hardware is handled by rebalancing reads and writes for this data to other servers with minimal/no service interruption and replacing the problematic hardware without worries of overloading the rest of the servers.
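The headroom arithmetic behind that comparison is simple. Losing one node from a small single-tenant cluster shifts a large fraction of load onto each survivor; losing one server from a large fleet barely registers:

```python
# The headroom arithmetic: losing one node from a small single-tenant cluster
# vs. one server from a large multi-tenant fleet.
def load_increase_on_survivors(total_nodes: int) -> float:
    """Fraction by which each surviving node's share grows when one node fails,
    assuming load was evenly spread and is redistributed evenly."""
    before = 1 / total_nodes
    after = 1 / (total_nodes - 1)
    return (after - before) / before

for n in (3, 6, 12, 100, 1_000):
    print(f"{n:>5} nodes: each survivor's load grows by {load_increase_on_survivors(n):.1%}")

# 3 nodes:   50.0%  (each broker goes from 1/3 to 1/2 of the load)
# 1000 nodes: 0.1%  (the failure is barely noticeable)
```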

All this translates into a more reliable and dependable service for customers with greater cost efficiency. There is a reason why the hyper-scalers (AWS, GCP, Azure) don’t implement most of their services as single-tenant systems. The economics, elasticity, reliability and simplicity that a true cloud service architecture offers are hard to dispute.

It should be noted that AWS is now offering most of its initially single-tenant data services as multi-tenant serverless. For example, there is a serverless SKU for EMR, Redshift, OpenSearch, Aurora and Database Migration Service (DMS). 

Here comes the fundamental truth and main crux of why I believe the future of cloud services is large-scale and multi-tenant: S3 is able to deliver amazing durability, scalability and low price because the technology itself is structurally oriented to deliver those things. There is no magic here, you build the software architecture to support the attributes that customers need.

I will say it again:

“It is the underlying technology that enables the outward facing attributes we value like simplicity, reliability, elasticity, and cost efficiency. We obtain those key attributes because we architect specifically for them”. Jack Vanlightly, this post.

Single-tenant systems cannot match the potential of large-scale MT systems on these critical attributes. The underlying design has fundamental inescapable properties.

The operational efficiency of large-scale cloud services (or “The massive overhead of single-tenant BYOC”)

Imagine you are a very senior engineer tasked with supporting and evolving a data service in the cloud. Would you rather have to support potentially tens of thousands of single tenant on-premise clusters deployed across tens of thousands of customer accounts, with a wide variety of instance types, storage types + sizes, with varying agreements and contracts with those customers in terms of the type of access, who is responsible for various parts of security and operations? Or would you prefer a large fleet of servers with a disaggregated architecture deployed in a pristine uniform environment, fully under the vendor's control? 

This large-scale multi-tenant architecture comes with all the benefits of its inherent traits, such as excess capacity and elasticity, as well as:

  • Per customer quotas.

  • A stable environment.

  • A standard set of instance and storage types where your software has a well-known performance profile.

  • A minimized set of versions and configurations.

  • A standard set of tools and access controls.

Now imagine you are a senior executive who has to run this ship: deliver it at the lowest price and the best level of service for customers in terms of stability and reliability, and, on top of all of that, be able to evolve and innovate the product over the long run.

One of those models is sustainable, efficient and actually able to deliver a great experience to customers. The other is a kind of architectural dead-end. BYOC is a nice enough street, but it doesn't go anywhere in the long run. The BYOC business model is at odds with itself, as it requires managing and maintaining an ever-increasing count of single-tenant clusters with a diverse set of hardware and software configurations, while at the same time seeking out efficiencies to keep it competitive with other cloud vendors. Even worse, it fails to provide the simple hands-off experience that made services like S3 such a definitive success.

The responsibility boundary

BYOC is not only an operational battle for the vendor, but also for the customer. The Shared Responsibility Model for BYOC is a three-way partnership between the customer, the CSP and the BYOC vendor. This presents some challenges which can strain the vendor-customer relationship and result in poorer quality of service. It might not always be clear who is responsible for what, and the resulting confusion can be frustrating for both parties. The customer, which may be used to an on-premise model (perhaps why it picked BYOC), no longer has full access to debug and troubleshoot, while the vendor may have restrictions in place that limit its ability to step in and remediate. Remember the “data in your account” security silver bullet? Customers don’t want an extreme-open security model, so there is a tricky balance of restrictions and separation of responsibilities involved.

When the system goes down, tickets get logged, stress mounts, and the question of who is responsible for the outage comes up. Maybe the customer overloaded the system? Or perhaps it’s a bug in the software that just looks like an overload? Perhaps it’s the CSP’s fault: a degraded storage drive, a degraded network, or just that the customer didn’t size the hardware adequately? Remember those RDS SLA exclusions?

The boundary between responsibilities is much clearer with a SaaS cloud service. The customer gets an API and a set of quotas, and holds the vendor accountable for the quality of service. The separation of responsibilities is clear. The customer can get rate limited, yes, but it won’t overload the underlying large-scale service. There can still be ambiguities, usually around performance issues which could be customer-caused or vendor-caused. A good SaaS vendor will have heavily instrumented their system so that issues can be quickly diagnosed and remediated if there is an ongoing cause like a degraded disk.

The single-stack, multiple deployment model fallacy

It seems reasonable that a vendor could offer both a SaaS and a BYOC deployment model - simply architect it so the data plane can be placed in any VPC (customer or vendor owned) and the control plane stays in the vendor account. However, this is the same on-premise architectural thinking.

The single-cloud-stack: bringing single-tenancy to any VPC.

The design of on-premise software tends to be monolithic - everything in a single binary! However, large-scale MT systems are not monolithic; they are decoupled into different services which can be scaled and deployed independently. S3, for example, has a front-end layer of API endpoints, a storage fleet of hard drives, a storage management layer and a namespace service. Each of those components is itself composed of many services. While this is the right architecture for S3, it isn’t the right architecture for a small-scale single-tenant deployment.

The vendor that wants to offer a fully managed SaaS service and BYOC has two choices: opt for the single stack and stay the course with the traditional ST architecture, or choose to support both ST BYOC and MT SaaS. Supporting two stacks requires much more work, including development, maintenance, documentation, and support. The ability to execute on that plan is challenging to say the least.

Further, not only does the vendor need to do all those things well, they also need to make a profit (or at least not run out of runway)! Maturing and hardening two platforms at once is no easy task, forcing the vendor into a position where they must divide their limited resources between one platform and the other. Big-impact features will require both an ST and an MT implementation, leading to repeated similar-yet-different work, and resulting in an inability to mature either platform to a level that can match SaaS competitors. While you go chasing BYOC and serverless at the same time, MongoDB Atlas or Snowflake just ate your lunch. For a start-up considering BYOC vs. SaaS, this can become existential stuff!

Single-tenancy is not the future of cloud services

I am not as against BYOC or single-tenancy as this post may make it seem so far.

As far as single-tenancy goes, Apache Kafka and similar distributed streaming options such as Apache Pulsar and RabbitMQ, and databases like MySQL and Postgres, were born in the on-premise world and are best-in-class options in that context. Many are designed for fault tolerance and high availability, with some amount of elasticity built in. These are all great systems for running in the on-premise world, where the almost infinite compute scale of the cloud is not available. I have spent half of my career dedicated to working on those kinds of systems.

But in the cloud, we do have effectively infinite compute, storage, and networking, allowing us to build services that are orders of magnitude larger. We can create services that deliver better economics, better elasticity and better reliability than anything the on-premise world is capable of. The architecture of massive resource pooling simply offers an unbeatable cost-efficiency that single-tenant systems cannot match.

So why do start-ups offer single-tenant BYOC as cloud services? Single-tenant BYOC may well be the best way for an early start-up to turn a piece of self-managed software into an MVP cloud product. This path lets them treat the cloud like an RDS-style management framework for their on-premise offering. It will also be cheaper for them to get going: companies like Confluent and Snowflake made very heavy investments in cloud infrastructure and absorbed negative cloud margins early on as they built their cloud businesses to scale.

While BYOC can be a shortcut to a cloud product it is very far from being "the future of the cloud".

The best cloud services leverage the scale of the cloud

Single-tenant BYOC has a place, and it is best suited to the start-up which needs to get to market fast. I get it, and honestly, if I worked at a start-up I might consider that model myself. These open-source and source-available distributed data systems are designed to be fault tolerant and highly available, and can do the job while the company is in the early stages.

But BYOC is not a long-term business model, and once in it, the vendor is in a trap that is hard to escape. Trying to build and scale a business on literally tens of thousands of tiny (compared to the aggregate) clusters distributed across thousands of customer cloud accounts is an architecture for hardship, for both vendor and customer.

It is very hard to argue with a straight face that the single-tenant BYOC architecture is superior to the huge-scale multi-tenant systems of S3, BigQuery or Spanner. The cloud enables you to leverage the inherent benefits of infinite and elastic compute to build more reliable and cost-efficient services than was previously possible.

The reality is that the CSPs built cloud-native SaaS first because they saw the benefits of large-scale MT architectures. Now, popular software vendors have seen the success of the CSP offerings and have moved to follow in their footsteps. Among the software vendors, there are pioneers such as Snowflake who adopted the large-scale MT SaaS cloud architecture from the start. Other companies, such as MongoDB and Confluent, are also quickly maturing their own MT SaaS services to leverage the massive compute infrastructure of the cloud. Databricks is also rapidly making the switch to SaaS with products such as Serverless SQL warehouses.

AWS’s single-tenant services, such as EMR, Redshift, OpenSearch and Aurora, are going serverless. AWS stated that this move “makes it significantly easier and more cost effective for customers to modernize their infrastructure...—without having to even think about managing infrastructure”.

It’s not just the structural benefits of large-scale MT that we’re aiming for; it’s also the security maturity that comes with it. Things are maturing very fast in this arena, with features such as end-to-end encryption, Bring-Your-Own-Key (BYOK), private networking, audit logging, JIT access… the list goes on.

To repeat myself again, because this really is the fundamental insight here:

Top tier SaaS services like S3 are able to deliver amazing simplicity, reliability, durability, scalability and low price because their technologies are structurally oriented to deliver those things. Serving customers over large resource pools provides unparalleled efficiency and reliability at scale.

BYOC is not that architecture - it is the on-premise architecture that someone else co-manages with you in your cloud account.

Final thoughts on the future

The trend has always been towards higher-level abstractions, making the individual software engineer more and more productive at each stage of the evolution of our industry. The explosion of the cloud and SaaS is just one manifestation of that trajectory towards doing more with less.

While offering a string of new features to customers is attractive (in fact very hard to resist), in the end building a cost-effective, elastic and reliable cloud service is the main goal. It also happens to be the hardest one to achieve and the longest in terms of engineering time. Reliability and stability are perhaps the most important of all for customers, as nothing else matters if you don’t have them.

The direction that cloud services are heading is not back to the on-premise architecture deployed in your cloud account but towards “serverless” large-scale MT systems. This shift to large-scale MT is already underway and the proportion of workloads run on single-tenant systems will gradually shrink over time.

I think the same will be true for BYOC. Just as customers moved away from racking their own hardware to move to the cloud, most of those that try BYOC will similarly migrate to SaaS for its simplicity, reliability, scalability and cost-effectiveness. There are some legitimate reasons for needing to run software inside your own account. For example, many organizations have agreements with their own customers that their data won’t leave their jurisdiction, and therefore are unable to use a SaaS vendor. That’s fine, and I think BYOC can make sense for those situations. Some organizations just want to keep their data in their account, and that is that. But these customers are a minority, and vendors focusing on BYOC as their core product offering will find themselves losing the competitive battle against SaaS due to the factors I have outlined in this post, inducing them to shift to their own multi-tenant SaaS offering, or die trying.



Thanks for reading.
