Why Snowflake wants streaming — Jack Vanlightly

Why Snowflake wants streaming

Disclaimer: I am a researcher and advisor at Confluent, specializing in technology strategy. The insights shared here are my own, shaped by years of experience in the data industry.

Rumors are swirling that Snowflake intends to acquire Redpanda and many are questioning why and what impact this might have on Confluent. First, let’s remember that these are just rumors and there’s nothing official. But given that people are speculating, here are my thoughts on how to interpret such an acquisition, whether it ends up happening or not.

There are a number of market trends in play right now, such as the rise of Iceberg and open data, as well as the war with Databricks and Snowflake’s refocus on AI. While it may not be evident at first, these are all driving Snowflake towards streaming.

Driving factor #1 - Generative AI in the enterprise is more about streaming than batch

Enterprises are not developing their own generative AI models. Simply put, it is too complex and too costly. Not only does it require millions of dollars, but it also requires a lot of data acquisition, compute infrastructure, and expertise. It makes no sense for enterprises to develop their own. Fine-tuning is also not showing great ROI, given the ever more powerful foundation and frontier models and the rapid evolution of models in general.

The shift is away from complex feature engineering and custom model building to reliance on models built by third parties (Anthropic, OpenAI, Meta, DeepSeek) who do the hard and expensive training on vast troves of raw data. AI in the enterprise has firmly moved to inference time, in applications that combine the powerful foundation models (with general-purpose reasoning) with domain-specific enterprise data via RAG.

Inference with RAG is most valuable when used in real-time or streaming applications based on real-time data. There is a stark difference in utility between ChatGPT before it had web search and ChatGPT now. AI without up-to-date data is frustrating and its value is limited.

An acquisition of a streaming vendor starts to make sense, given that AI infrastructure for enterprise workloads is primarily about streaming, and Snowflake has little in the way of streaming infrastructure. The Information, which first reported the rumor, also concluded that this was mostly about augmenting Snowflake’s AI capabilities.

Driving factor #2 - Snowflake is under pressure from open-data platforms

Snowflake was one of the first cloud-native data warehouse platforms on the market. Since its inception, Snowflake has been a walled garden. They own the data, they have full control, and don’t even think about bringing your own query engines.

The open table formats such as Apache Iceberg and Delta Lake have fundamentally shifted the market towards open data. Databricks invested early on in its Delta Lake format, providing sharing protocols and allowing customers to store their data in their own accounts. Iceberg is now causing a shift even further towards open. Snowflake has been forced to open up, partly by customers and partly due to the fierce battle with Databricks. But while Snowflake has begun adopting Iceberg and contributing to the project, is this enough?

In a market of open data storage and Bring-Your-Own-Query-Engine, Snowflake will increasingly compete workload by workload with other vendors, whereas before it was used to owning the whole vertical of storage and compute. Surveys have shown that a significant portion of Snowflake customers also use Databricks and vice versa. Customers are happy to split workloads between platforms based on the strengths and cost models of each platform.

But it goes deeper: the shift to open data destabilizes Snowflake’s underlying business model, which has always been centered around being a walled garden, with data gravity, ownership of the compute, and a premium charged for the “Apple experience” where everything seamlessly works. The writing is on the wall: the data gravity that was once so strong is now weakening.

One way of gaining back some control is to own a larger slice of the end-to-end lifecycle of data. Over the past year and a half, Snowflake and Databricks have pursued vertical integration by incorporating ingestion as part of their ecosystem. In 2023, Databricks acquired Arcion, a connectors start-up with several connectors for enterprise databases. That same year, Databricks announced Lakehouse Federation, and in 2024 it announced LakeFlow. In November 2024, Snowflake announced its intention to acquire Datavolo, a start-up built on Apache NiFi. In the press release, they said, “By bringing Datavolo into the Snowflake fold, we are expanding how much of the data lifecycle Snowflake captures.”

Redpanda has its own connectors offering, Redpanda Connect, which is mostly complementary to NiFi. Snowflake could build out a streaming/batch ingestion platform using Datavolo and Redpanda IP as the foundation. Redpanda’s integration could manifest first as ingestion to Iceberg tables, leveraging Redpanda’s Iceberg integration, but would presumably be more tightly coupled to the Snowflake ecosystem over time.

If Snowflake can bring to market such a streaming/batch ingestion capability that integrates with a large ecosystem of other data sources, it protects its core role as a place where customers store their data. With a greater slice of the end-to-end data lifecycle, built into one easy-to-use platform, they can retake some of the control they lose from the trend towards open data. It also allows them to catch up with Databricks’ moves in ingestion. Snowflake would presumably make ingestion as slick as the rest of its product, allowing customers to benefit from an easy on-ramp. Snowflake customers would make the conscious choice to buy into a closed platform in return for ease of use.

“Look at the price tag,” they say…

Others have commented on the reported ARR of Redpanda. I won’t say any more, except that if it were true, it would represent a tiny fraction of the overall commercial streaming market, and a speck relative to the broader open source community’s spend on streaming. Snowflake drives more revenue than that from single customers. Naturally, it has people wondering why Snowflake would pay a premium for a streaming vendor that has struggled to gain market traction.

Again, let’s remember this is all rumor. Some have speculated that the large price tag means Snowflake wants to expand into the general streaming market. The argument goes that only that strategy explains the price tag. But I think that speculation is missing some fundamental conflicts.

The cognitive dissonance of conflicting business models

Streaming as a platform is all about sharing and liberating data. Vendors in that space aren’t trying to drive data gravity; their value-add is opening up data to be consumed by multiple platforms (including Snowflake and Databricks). 

The data gravity model focuses on creating a walled garden where data becomes a proprietary asset of the platform, designed to lock customers in and maximize lifetime value. These two business models inherently conflict: one seeks to restrict data mobility to maintain control and revenue, while the other advocates for making data freely available for any use case or platform a customer desires.

The dual strategy would create a state of cognitive dissonance within Snowflake leadership and engineering/product teams. Executives might argue that embracing both models provides the best of both worlds—maximizing revenue through data gravity while also expanding into the wider data streaming market. However, the inherent tension between the two opposing business models means that strategic decisions, product development, and marketing messages could end up contradictory. The delicate balance, if not managed carefully, could lead to brand dilution, internal conflicts, investor confusion, and a failure to meet customer expectations.

This dual strategy seems fraught with intractable problems. Would anyone really believe that Snowflake can be an open streaming platform? Wouldn’t that just confuse their investors? What exactly would Snowflake be to users? Why would users trust the development of capabilities that enable streaming to competitors, when that’s not in the primary business’s best interests? When might financial pressure change market strategy?

I can just imagine the internal conflict of helping Snowflake streaming customers get data into Databricks (rather than their own platform).

The more believable reason why Snowflake might acquire a streaming vendor

Choosing the two business models route seems like a recipe for confusion. Trying to enforce data gravity while championing open access is like steering two ships with one rudder.

What is the alternative? Let’s think through why Snowflake might pay a premium for a streaming vendor. 

Snowflake is already under huge pressure in its current market from the likes of Databricks, BigQuery, and Microsoft Fabric. Snowflake is also in a do-or-die pivot to AI. If they pull off their vertical integration and expansion to AI, they will be in a strong position, but if they fumble the opportunity, it could result in an existential crisis. While the pivot to AI and the first moves into ingestion are underway, Snowflake is still trailing Databricks in both areas.

An inflated price tag can make sense when you look at how Snowflake could use Redpanda talent and technology to augment its platform and catch up with Databricks. In one acquisition, Snowflake can help protect its core business, which is under threat, and support its huge pivot to AI. All the while, Snowflake remains Snowflake, no cognitive dissonance required!

Why don’t they acquire a cheaper start-up, such as S2 or Bufstream? These are tiny start-ups without much IP, battle scars, or experience. Redpanda, by contrast, has both streaming and data integration capabilities with some production usage. Snowflake might as well choose the build option if they only consider the tiny S2s and Bufstreams for acquisition.

So yes, it’s a high price tag for such a low-revenue start-up, but we’ve seen a few of these happen over the last couple of years, with Tabular being a prime example (another case of Snowflake vs Databricks mania). Looking at the market trends and the pressure Snowflake is under, we can understand the motivations for the acquisition.

On the other hand, becoming a streaming competitor to a data streaming platform like Confluent seems a bit of a stretch and a strategy with huge headwinds. Snowflake is a data destination company with a data-gravity business model. Extending its reach to encompass data ingestion extends the gravity field farther out. The DNA required for a streaming Central Nervous System (CNS) platform vastly differs from that of a Data Gravity platform. Streaming CNS platforms may end up competing with Snowflake on the edges. But these will overwhelmingly be contests over ingestion into Snowflake, not over connecting diverse systems of record or serving as the event infrastructure for microservices.

Time will tell. My bet is that Snowflake will continue to be Snowflake with a coherent business model: a data destination platform with AI capabilities.
