PubPub's Plan to Support Self-Hosting

It won't be easy, but we have a plan — and more importantly, a business case — for removing proprietary dependencies and supporting self-hosting.
Published on Jun 30, 2022
We sometimes hear from potential PubPub users that they’d like to be able to host PubPub on their own servers. We’d like this, too, but to date we’ve had to answer those queries with an honest but understandably disappointing response: we’re happy to help people try it, but our focus right now is on the hosted version, which requires using some proprietary technologies. We are planning to remove those dependencies from the hosted version and support self-hosting in the medium term. In the meantime, PubPub is a fully open-source project that anyone can download, inspect, propose changes to, and, with some effort, run on their own.

We want to take some time to add context to that answer, and lay out our roadmap for achieving self-hosting so that you can better understand our plans and hold us accountable for our progress. This is not meant to be a firm commitment or timeline, because circumstances change, and, as you’ll see below, openness can conflict with our commitments to our community, members, and employees. Generally, however, we plan to make PubPub, and all the services it implements, fully open-source and self-hostable within three years. It will likely be much more easily self-hostable, but still reliant on some proprietary services, well before that. Here’s how we plan to get there, and why it’s important for us to achieve this goal.

Our Philosophy: Balancing Openness, Mission, and Sustainability

We’ve been committed to building open publishing technology since PubPub started seven years ago as a graduate student project. We’re also deeply focused on solving core needs for our users quickly and effectively. Finally, we’re committed to our own sustainability, because it’s important to us that we don’t abandon our users, as so many well-meaning projects have had to do when they could not become sustainable.

Sometimes, these three commitments come into conflict. It’s often faster and more cost-effective to build what users need on top of closed platforms, and we’re a small non-profit operating in a resource-constrained environment. Making decisions in those scenarios is always difficult, and though we don’t always get it right, we try to strike the right balance of using as much fully open technology as possible while also making sure the product meaningfully addresses core community needs and doesn’t overextend our resources. We’ve also committed to making these decisions publicly as much as possible so that users and the community can weigh in. Our entire codebase, roadmap, and day-to-day project planning are all publicly viewable on GitHub.

The core need PubPub was built to address is the ability to draft and publish web-native scholarly content without relying on costly in-house technical capacity. Over and over, we hear from communities that our user-friendly approach, which requires no server setup or maintenance, is unique to the ecosystem and the only reason they’re able to publish at all. We take our responsibility to this community seriously because even in a world with an increasing number of open publishing technologies, users with limited time or technical capacity have very few options. It’s always gratifying to hear from librarians, editors, teachers, and researchers who are relieved to find us after reading blog posts extolling just how easy it is to learn how to set up a server and install a package without pausing to consider how chronically overworked and underpaid knowledge workers already are. But our commitment to these users comes with tradeoffs.

The main tradeoff is that we’ve focused almost exclusively on building the central PubPub instance, rather than dedicating resources to maintaining a self-hostable version. These are two very different use cases, each of which will require dedicated resources to build and maintain. In an ideal world, we’d have the resources to do both, and we’re working towards that world. The easiest path would be to split the company into a for-profit in charge of maintaining the central instance, and a non-profit in charge of maintaining the self-hosted codebase, much like WordPress. We’ve resisted that approach because such a split would distract from our mission by forcing us to build our core product for users who can afford to pay for-profit prices, which is exactly the opposite of what we’ve worked so hard to accomplish.

Instead, we’re planning to support both centralized and self-hosted versions within our existing nonprofit by slowly building our new membership and community services offerings, as described in our sustainability update earlier this year. Our goal is to become deeply sustainable. Not just enough to keep the lights on, but enough to pay fair salaries and offer great benefits to our employees, to continue developing innovative features, and to build a robust rainy day fund. Unfortunately, some people in the ecosystem don’t like this approach. They see it as a false choice, and believe that focusing on making our core use-case sustainable compromises on openness. We respectfully disagree. Learning from the legions of projects that have either pivoted to serve the needs of well-funded incumbents, or simply failed when their funding ran out, we believe we are on the right path to sustainably meet the needs of an underserved set of users while maintaining our commitment to being as open as possible.

The Path to Self-Hosting PubPub

On a technical level, there are two main projects we will need to complete to enable self-hosting: removing PubPub’s reliance on proprietary systems, and working to containerize and document the application so it can be deployed on lots of different architectures.

Removing Proprietary Dependencies

The path to removing proprietary dependencies will be time-consuming, but relatively straightforward. It would have been considerably more difficult when we started building PubPub seven years ago. At that time, one of the only ways a single developer could build and maintain a production-ready, reliable, real-time, collaborative WYSIWYM [1] editor (the core feature that originally made PubPub so useful) was to use proprietary real-time database technology from a startup called Firebase, which was later acquired by Google. [2]

Today, open-source technologies such as Supabase that replicate key parts of Firebase are commonplace, which makes the prospect much less daunting, though still very involved. Replacing Firebase is on our long-term roadmap for both hosted and self-hosted versions of PubPub, because its proprietary nature and ownership by Google do not align with our values, it is a large cost center, and it has become less reliable and flexible over time.

In addition to Firebase, there are two truly proprietary services PubPub integrates that cannot be easily replaced with a compatible, self-hosted version: Heap, for impact measurement; and Algolia, for search. In a self-hosted version, Heap could be easily replaced by a tool of the user’s choice. We will recommend privacy-respecting analytics tools, and we are considering replacing Heap with the open-source tool PostHog, which also has great privacy features, for the hosted version as well. For search, we will likely continue to rely on Algolia for the time being. Unfortunately, it’s become harder to self-host scalable search tools, with even open-source pioneer Elasticsearch recently switching to a more restrictive license. [3] Luckily, Algolia is an independent, value-driven company that offers steep discounts for non-profits and open-source maintainers. For self-hosting, which does not require Algolia’s scale, we will likely replace it with a simpler search feature.
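To make the idea of a simpler search feature concrete, here is a minimal sketch of an in-memory inverted index in Python. It is an illustration only, not PubPub’s planned implementation: the class name `SimpleSearchIndex`, its methods, and the document IDs are all hypothetical, and a small self-hosted instance could just as well rely on the full-text search built into its database.

```python
import re
from collections import defaultdict


class SimpleSearchIndex:
    """Hypothetical in-memory search index; not part of PubPub's codebase."""

    def __init__(self):
        # Maps each token to the set of document IDs that contain it.
        self._postings = defaultdict(set)

    @staticmethod
    def _tokenize(text):
        # Lowercase the text and split it into alphanumeric tokens.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id, text):
        """Index a document under the given (assumed unique) ID."""
        for token in self._tokenize(text):
            self._postings[token].add(doc_id)

    def search(self, query):
        """Return the sorted IDs of documents containing every query token."""
        tokens = self._tokenize(query)
        if not tokens:
            return []
        matches = set.intersection(
            *(self._postings.get(token, set()) for token in tokens)
        )
        return sorted(matches)


index = SimpleSearchIndex()
index.add("pub-1", "Open publishing and self-hosting scholarly content")
index.add("pub-2", "Hosting on proprietary services")
results = index.search("hosting")
```

An index like this has no ranking, typo tolerance, or persistence, which is roughly the tradeoff between a hosted service operating at Algolia’s scale and a search feature sized for a single community.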

Containerizing and Documenting

Containerizing and documenting the application is in some ways less straightforward than removing proprietary dependencies, but it is also less complex because it can be done in stages. PubPub is currently deployed on Heroku, which is both expensive and beginning to stagnate after being acquired by Salesforce in 2011. Luckily, Docker, an open-source technology that allows developers to containerize their applications so that they can be run on lots of different hosting services, is now commonplace and, thanks to efforts like the Open Container Initiative, is developing into a community-led standard. We are beginning to explore ways that we could transition PubPub to a containerized build system that would give us more flexibility for hosting. This would immediately benefit potential self-hosters because they would be able to easily deploy and modify PubPub locally with minimal setup and would not be locked into any particular hosting architecture for deployment.

Even once the hosted version is containerized and our tight coupling with specific proprietary services is removed, our application will most likely still delegate some functionality to third-party services of the user’s choice. For instance, we may ask users to specify a file storage provider, giving them a choice between Amazon’s S3, another cloud service, or a locally-hosted tool.
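One common way to offer that kind of choice is to put each provider behind a shared interface and pick an implementation from configuration. The Python sketch below shows the pattern under stated assumptions: the `FileStorage` interface, the `STORAGE_PROVIDER` and `STORAGE_ROOT` settings, and both backends are hypothetical names invented for illustration, not PubPub’s actual design. An S3-backed class would implement the same two methods using a cloud SDK and is omitted here to keep the example dependency-free.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class FileStorage(ABC):
    """Hypothetical storage interface; every backend implements these methods."""

    @abstractmethod
    def save(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def load(self, key: str) -> bytes: ...


class LocalStorage(FileStorage):
    """Stores files under a directory on the host's own disk."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def load(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


class InMemoryStorage(FileStorage):
    """Keeps files in a dict; useful for tests and local experiments."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}

    def save(self, key: str, data: bytes) -> None:
        self._files[key] = data

    def load(self, key: str) -> bytes:
        return self._files[key]


def storage_from_config(config: dict) -> FileStorage:
    """Pick a backend from settings (the setting names are assumptions)."""
    provider = config.get("STORAGE_PROVIDER", "local")
    if provider == "local":
        return LocalStorage(config["STORAGE_ROOT"])
    if provider == "memory":
        return InMemoryStorage()
    raise ValueError(f"Unsupported storage provider: {provider!r}")


storage = storage_from_config({"STORAGE_PROVIDER": "memory"})
storage.save("pubs/example.pdf", b"%PDF-")
```

Because the rest of the application only ever calls `save` and `load`, switching providers becomes a configuration change rather than a code change.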

By the time we achieve containerization and the removal of proprietary services for the hosted version, we plan to be sustainable enough to dedicate resources to fully documenting the self-hosted version as well as developing necessary infrastructure to maintain it, such as release/upgrade infrastructure, expanded testing, and quality assurance.

We’re not just committed to doing this because it’s the right thing to do. It’s also in our best interest to do so.

Business Model Alignment

As you may have noticed in the previous section, most of the changes required to build a self-hosted version of PubPub are already on our roadmap. This isn’t solely due to value alignment. Many open-source companies (nonprofit and for-profit) are criticized for developing business models that compete with the open-source versions of their software, which leads them to under-support that version. On the flip side, we believe companies that pursue openness for the sake of it without a clear understanding of how it fits into their business model should be just as closely scrutinized. These companies often fail to build useful technology, fail to find sustainability and disappear, or, like Elasticsearch, ultimately pivot to restrictive models that more closely align with their revenue goals, leaving users stranded.

At the Knowledge Futures Group (KFG), we carefully designed our sustainability model to incentivize us to be as open as possible. Our membership program and services offerings are perfect complements to self-hosting. In the future, anyone should be able to self-host PubPub for free. If they want to support us and receive support and additional benefits, they can choose to become a member. And if they need extra help setting up, customizing, or publishing on their instance, we (and others!) can provide those services without needing to care whether they host themselves or on our core version.

We’re also betting on self-hosting being a revenue driver in the future. As PubPub becomes more popular, we’re receiving an increasing number of requests for managed, on-premises hosting, often due to data sovereignty laws. We’re excited about being able to provide this service, both because it opens up a new line of sustainable, mission-aligned revenue, and because enabling data and technology sovereignty is well aligned with our mission.

As always, we hope this provides an honest, transparent window into our thinking to help you assess whether to use PubPub. If you have any questions, would like to help, or are still wondering if PubPub is right for you, please feel free to leave a note on our GitHub Discussion Forum, reach out via Twitter (@pubpub), or contact us via email at [email protected]. We look forward to hearing from you, and working with you to make PubPub even better.

— Gabriel Stein, KFG Head of Operations and Product
