Commercializing open source and the cloud threat

A look at the recent changes in Elastic’s licensing model as well as Confluent’s changes. Also covers developer incentives and the long-term success of open source projects.

Hi All,

I hope you all are doing well, and welcome (or welcome back, if you aren’t new) to Dozen Worthy Reads, a newsletter where I talk about the most interesting things in tech that I read over the past couple of weeks, or write about tech happenings. You can sign up here or just read on …

Elastic not too long ago changed its licensing for Elasticsearch and Kibana, two widely used open-source projects in enterprise tech (part of their ELK stack).

I’m going to explore, in this article, what these changes mean and why Elastic made them. These products were originally available under the Apache 2.0 License but now require an SSPL (Server Side Public License). The key reason for this change is that Elastic does not want BigCloud (AWS, Azure, Google Cloud) to just take these open source products and make them available as a cloud “service”.

Like so many of you, I find the dynamics of open source fascinating, and I thought it would be good to take a small step back to the inception of open source to understand how we reached this point and why companies such as Elastic and Confluent changed their licensing (Confluent’s Kafka-based platform has an Apache license as well as the Confluent Community License for exactly the same reason: to prevent BigCloud from taking advantage of their hard work).

But I digress, so let’s get back to the inception of open source:

Open source is basically a branch of the free-software movement, which began in the 1980s with the GNU (GNU’s Not Unix) project. The term free software refers to the freedom to exchange and modify software, not to its price. Both the free-software movement and the open-source movement share this view of free exchange of code, and they are collectively referred to as “Free and Open-Source Software” or “Free/Libre Open-Source Software” (FOSS or FLOSS, as you may have read).

Today we don’t appreciate how important open source is, but take Linux, for example: it is open source and has a large community of contributors that made the operating system what it is today. Its creator and most famous contributor is Linus Torvalds (who is also known for insulting members of the community). In a lot of ways Linux is the poster child for successful open source projects (back in the day people thought this was crazy, but look how far we have come).

In this case Linux was free to download, compile, and use, but adoption took a long time because companies that ran mission-critical software were afraid to use it: the software could potentially fail or have unexpected bugs that might take long to fix, and there isn’t one “throat to choke”. CTOs, CIOs and IT managers love to have a “throat to choke”, so to speak. For example, recall Heartbleed (OpenSSL). From The Complicated Economy of Open Source Software:

The code change that Henson approved on that fateful night in December had been submitted by a German developer named Robin Seggelmann who helped write the “heartbeat” standard in OpenSSL. Henson and Seggelmann had been workshopping the code for weeks before it was approved, but nevertheless failed to catch a bug that would allow an attacker to intercept information that was passed to any site secured by OpenSSL.

Most CEOs would never have imagined taking the risk (funnily enough, SO MANY large companies do use OpenSSL; I digress though).

Red Hat was one of the first companies to provide “one neck to choke” with Red Hat Enterprise Linux. By letting companies pay for support, they created a huge market for themselves and gained significant adoption, since this made companies more amenable to experimenting, at first with test/dev systems and then with production systems!

It is estimated that at least 60–70% of servers now use Linux. With the arrival of other open source technologies came what is called the LAMP stack (Linux-Apache-MySQL-PHP), which was literally all free and all a developer needed to get started. This was a huge deal back in the day and a great piece of counter-positioning (and a future moat). As I wrote:

Counter-positioning, as defined by Hamilton Helmer, the author of 7 Powers:

“A newcomer adopts a new, superior business model which the incumbent does not mimic due to anticipated damage to their existing business.”

A great example of this, I think, is Dollar Shave Club. Dollar Shave Club was predicated on having cheaper blades shipped to your door. They didn’t need to have retail partnerships or high margins, and their marketing channel was social media. They did a fantastic job of creating a subscription product by counter-positioning.

Another great example is Robinhood or Wealthfront: they positioned their products at younger, less wealthy but tech-savvy people who did not have the funds to even justify a personal investment advisor.

This counter-positioning, along with the “one neck to choke” and the hugely stabilized versions of Red Hat Enterprise Linux, was the key reason Linux adoption took off!

This was very much by design; it follows from the way the GNU license was structured. From the wiki page:

The free software licenses, on which the various software packages of a distribution built on the Linux kernel are based, explicitly accommodate and encourage commercialization; the relationship between a Linux distribution as a whole and individual vendors may be seen as symbiotic. One common business model of commercial suppliers is charging for support, especially for business users. A number of companies also offer a specialized business version of their distribution, which adds proprietary support packages and tools to administer higher numbers of installations or to simplify administrative tasks.

Licensing models:

There are MANY different types of licenses, but think of licensing as a range from public domain (copy and reuse as you need to) all the way to trade secret (this framing is from Wiki).

Key to the argument (and to the licensing changes) is how permissive you want the license to be. A permissive software license, sometimes also called a BSD-like or BSD-style license, allows a company (take any company that uses open source) to take the software, make improvements to the product, make it proprietary, and sell it without giving anything back to the community (I am simplifying this a bit).

A copyleft license generally requires the reciprocal publication of the source code of any modified versions under the original work’s copyleft license. A copyleft license tries to ensure that modified versions of the software will remain free and publicly available. A permissive license, in contrast, makes no such guarantee, generally requiring only that the original copyright notice be retained.

Note that there are many nuances here, but for all practical purposes how you license open source software greatly impacts what companies and individuals can do with it.

I want to take a second to call out the Affero GPL. From Wiki:

Both versions of the Affero GPL were designed to close a perceived application service provider (ASP) loophole in the ordinary GPL, where, by using but not distributing the software, the copyleft provisions are not triggered. Each version differs from the version of the GNU GPL on which it is based in having an added provision addressing use of software over a computer network. This provision requires that the full source code be made available to any network user of the AGPL-licensed work, typically a web application.

Essentially this means that when software is only run as a service and not “distributed”, the copyleft (giving changes back to the community) of a GPLv3 license is never triggered; the Affero GPL closes that gap. The GNU project’s guide to the GPLv3 states that nobody should be restricted by the software they use. There are four freedoms that every user should have:

the freedom to use the software for any purpose,

the freedom to change the software to suit your needs,

the freedom to share the software with your friends and neighbors, and

the freedom to share the changes you make.

When a program offers users all of these freedoms, it’s called free software.

Developers who write software can release it under the terms of the GNU GPL. When they do, it will be free software and stay free software, no matter who changes or distributes the program. We call this copyleft: the software is copyrighted, but instead of using those rights to restrict users like proprietary software does, we use them to ensure that every user has freedom.

We update the GPL to protect its copyleft from being undermined by legal or technological developments. The most recent version protects users from three recent threats:

Tivoization: Some companies have created various different kinds of devices that run GPLed software, and then rigged the hardware so that they can change the software that’s running, but you cannot. If a device can run arbitrary software, it’s a general-purpose computer, and its owner should control what it does. When a device thwarts you from doing that, we call that tivoization.

Laws prohibiting free software: Legislation like the Digital Millennium Copyright Act and the European Union Copyright Directive make it a crime to write or share software that can break DRM (Digital Restrictions Management; see below). These laws should not interfere with the rights the GPL grants you.

Discriminatory patent deals: Microsoft has recently started telling people that they will not sue free software users for patent infringement — as long as you get the software from a vendor that’s paying Microsoft for the privilege. Ultimately, Microsoft is trying to collect royalties for the use of free software, which interferes with users’ freedom. No company should be able to do this.

Version 3 also has a number of improvements to make the license easier for everyone to use and understand. But even with all these changes, GPLv3 isn’t a radical new license; instead it’s an evolution of the previous version. Though a lot of text has changed, much of it simply clarifies what GPLv2 said. With that in mind, let’s review the major changes in GPLv3, and talk about how they improve the license for users and developers.

Of course these changes did not fix the problem with “network” usage, hence the aforementioned Affero license. OK, that was a lot of context, very much simplified, which brings me to the MongoDB battle and its license, the SSPL. Recall I started the article with:

These products were originally available under the Apache 2.0 License but now require an SSPL (Server Side Public License).

From Wiki:

According to MongoDB, the SSPL is based on the AGPL3 license, with the addition of a new section that “clearly and explicitly sets forth the conditions to offering the licensed program as a third-party service,” requiring that those making the software publicly available as part of a “service” must make the service’s entire source code available under this license.

According to Packt writer Richard Gall, the most important new sentence in the license reads “If you make the functionality of the Program or a modified version available to third parties as a service, you must make the Service Source Code available via network download to everyone at no charge, under the terms of this License.”
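To make the differences between these license families concrete, here is a rough sketch in Python of when each one asks you to publish source code. Everything in it is my own illustration: the names (`License`, `Usage`, `Obligation`, `source_obligation`) are hypothetical, and the rules are a deliberate simplification of the descriptions quoted above, not a reading of the actual license texts.

```python
from dataclasses import dataclass
from enum import Enum


class License(Enum):
    PERMISSIVE = "permissive (Apache 2.0, BSD-style)"
    GPL = "copyleft (GPLv3)"
    AGPL = "network copyleft (Affero GPL)"
    SSPL = "service copyleft (SSPL)"


class Obligation(Enum):
    NONE = "no source publication triggered"
    PROGRAM_SOURCE = "publish the source of the (modified) program"
    SERVICE_SOURCE = "publish the source of the entire service"


@dataclass(frozen=True)
class Usage:
    """Hypothetical description of how a company uses the software."""
    distributes_copies: bool   # ships copies of the software to others
    offers_over_network: bool  # users interact with it remotely
    sells_as_service: bool     # the program's functionality is the product


def source_obligation(lic: License, usage: Usage) -> Obligation:
    """Simplified model of what triggers source publication (an assumption)."""
    if lic is License.PERMISSIVE:
        # Permissive licenses only ask that the copyright notice be retained.
        return Obligation.NONE
    if lic is License.SSPL and usage.sells_as_service:
        # The SSPL's added section: offering the program as a service.
        return Obligation.SERVICE_SOURCE
    if usage.distributes_copies:
        # Classic copyleft: distribution triggers publication.
        return Obligation.PROGRAM_SOURCE
    if lic in (License.AGPL, License.SSPL) and usage.offers_over_network:
        # The Affero provision closes the "use but don't distribute" loophole.
        return Obligation.PROGRAM_SOURCE
    return Obligation.NONE


# A cloud provider hosting the software without ever shipping copies of it.
cloud_provider = Usage(
    distributes_copies=False,
    offers_over_network=True,
    sells_as_service=True,
)

for lic in License:
    print(f"{lic.value}: {source_obligation(lic, cloud_provider).value}")
```

The point of the sketch is the middle two branches: a hosted offering never “distributes” anything, so only the Affero-style and SSPL-style clauses reach it, and only the SSPL extends the obligation from the program to the whole service.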

From Protocol: Elastic takes aim at AWS, limits Elasticsearch and Kibana

Last Thursday, Elastic published a blog post — curiously titled “Doubling down on open, Part II” — announcing that Elasticsearch and Kibana, two widely used open-source projects in enterprise tech, would no longer be available under the permissive Apache 2.0 license. Instead, all subsequent releases to those projects will only be available under either a controversial new license known as the SSPL, or the Elastic License, both of which were designed to make it difficult for cloud companies to sell managed versions of the open-source projects they’re applied to.

This of course makes sense: AWS competes at a much larger scale and lower cost, so by offering Amazon Elasticsearch Service it basically takes open source software that is mostly maintained and improved by Elastic and benefits from selling it.

Elastic has never tried to hide its disdain for AWS, a feud that dates back to the 2015 launch of Amazon Elasticsearch Service. The introduction of that AWS service, a managed version of the Elasticsearch open-source project, was arguably the low point in the strained history between enterprise tech companies based around open-source projects and AWS.

AWS of course has done nothing illegal: the more permissive license that Elastic first used allowed AWS to do exactly this. What is more, AWS has signalled plans to fork the two projects, or to take them in a new AWS-led direction, under the same permissive Apache 2.0 license.

This is of course also problematic for the developer community. As a developer I contribute to open source for two reasons: one is to homestead and the other is to give back to the open source community. This change might lead open source developers who contribute to Elastic to either stop or go to Amazon’s Open Distro for Elasticsearch and contribute there instead, since that continues to be licensed under Apache 2.0 (the more permissive license).

As Banon said:

“I’m very worried about [alienating community members]; this is why we didn’t make this change lightly,” Banon said. “Regardless of how much we try to relax our user base, some people will end up being alienated, and others, which I’m more worried about, might be fed by FUD,” the tried-and-true “fear, uncertainty and doubt” campaigns that have been part of enterprise tech marketing for decades.

“They are aiming to hit Amazon, but what they are doing is throwing a boulder at Amazon floating peacefully in a pond of an ecosystem,” said VM (Vicky) Brasseur, a corporate strategist and former vice president of the Open Source Initiative. “I don’t think it’s worth it. They are going to destroy their ecosystem.”

However, I think this was a question of survival for them. Elastic makes money by charging for the cloud or the non-basic self-managed versions of the product, so they don’t really have a choice. Amazon has used an Elastic trademark, which has left consumers unsure whether this is a partnership between Elastic and Amazon (to be clear, it is not). This of course results in customers going to Amazon instead of Elastic because 1) they have all their infra there and 2) the price is probably cheaper (I say probably because I cannot compare it, and understanding AWS’s pricing is, uh, well, rocket science; also see The Duckbill Group, “Lower Your AWS Bill by 15–20%”).

One can argue that this is too little too late. Maybe Elastic will get out of this or maybe they won’t. In fact, Confluent was quicker to make this licensing change for specific components (Dec 2018), whereas Elastic published its licensing changes in January 2021, a full two years later.

Let’s take a second and look at Confluent’s licensing model. From their licensing blog post:

We’re changing the license for some of the components of Confluent Platform from Apache 2.0 to the Confluent Community License. This new license allows you to freely download, modify, and redistribute the code (very much like Apache 2.0 does), but it does not allow you to provide the software as a SaaS offering (e.g. KSQL-as-a-service).

What this means is that, for example, you can use KSQL however you see fit as an ingredient in your own products or services, whether those products are delivered as software or as SaaS, but you cannot create a KSQL-as-a-service offering. We’ll still be doing all development out in the open and accepting pull requests and feature suggestions. For those who aren’t commercial cloud providers, i.e. 99.9999% of the users of these projects, this adds no meaningful restriction on what they can do with the software, while allowing us to continue investing heavily in its creation.

Essentially, this is what their community license says (the important section is the definition of “Excluded Purpose” at the end):

License. Subject to the terms and conditions of this Agreement, Confluent hereby grants to Licensee a non-exclusive, royalty-free, worldwide, non-transferable, non-sublicenseable license during the term of this Agreement to: (a) use the Software; (b) prepare modifications and derivative works of the Software; (c) distribute the Software (including without limitation in source code or object code form); and (d) reproduce copies of the Software (the “License”). Licensee is not granted the right to, and Licensee shall not, exercise the License for an Excluded Purpose. For purposes of this Agreement, “Excluded Purpose” means making available any software-as-a-service, platform-as-a-service, infrastructure-as-a-service or other similar online service that competes with Confluent products or services that provide the Software.

Confluent is protecting its business by clearly outlining that the excluded purpose includes any service “that competes with Confluent products or services that provide the Software”. This makes a lot of sense for Confluent. They are not really restricting usage or disallowing anything for KSQL, except that you can’t offer it as a competing hosted service.

What is KSQL?

KSQL is a SQL streaming engine for Apache Kafka. It provides an easy-to-use, yet powerful interactive SQL interface for stream processing on Kafka, without the need to write code in a programming language like Java or Python. KSQL is scalable, elastic, and fault-tolerant. It supports a wide range of streaming operations, including data filtering, transformations, aggregations, joins, windowing, and sessionization.

At this point Confluent has 3 licenses. Let’s use the same lens as before:

This makes a lot of sense. This allows Confluent to continue to develop open source Kafka while ensuring that their business cannot be usurped by BigCloud, while at the same time providing additional avenues for Confluent to monetize (and they should, it’s a business not a charity). This is known as the open-core model. As mentioned in the blog (and any open source dev will tell you) it is hard to monetize open source but the key is a good balance of monetization and community engagement.

You don’t need to take my word for it, though, it turns out this experiment has been done. Dozens of NoSQL databases emerged in the 2009–2010 timeframe. Some were created as part-time projects, some came out of the internal infrastructure of large web companies, and some were created as commercial ventures. What I think is most stark is that the only systems that remained relevant through to today are those that, whatever their origin, managed to develop a stable commercial entity that helped sustain ongoing investment. Those that did this (MongoDB, ElasticSearch, Cassandra, Hadoop) all continue to thrive and have become part of the modern stack. Those that didn’t (Voldemort, Dynomite, CouchDB, and a dozen others) have all fallen by the wayside, despite early popularity. They still exist, but most likely you have never heard of them.

The idea being to become one of these:

In the long run open source will continue to flourish and we’ll see lots of new open source products gain adoption for new use cases. Having a community play, engaging developers, and having them contribute is key, but only to initial growth. Developers don’t just contribute code for the heck of it, however. They want their work to see the light of day, to have millions of people use it, and to be proud of it. This is where commercialization comes in. The ultimate objective is to be part of a large open source project that becomes huge and leads to outsized outcomes for the company as well as for the developer, in terms of hiring optionality.

Thank you for reading. Stay safe, be well! If you enjoyed reading this please consider sharing with a friend or two (or sign up here if you came across this or were forwarded this)
