Reply on Bluesky and Decentralization

This is a reply to Christine Lemmer-Webber's thoughtful (and widely read) "How decentralized is Bluesky really?" blog post.

I am so happy and grateful that Christine took the time to write up her thoughts and put them out in public. Her writing sheds light on substantive differences between protocols and projects, and raises the bar on analysis in this space.

However, I disagree with some of the analysis, and have a couple of specific points to correct.

This response is split into a few sections. The first talks about architecture and large infrastructure. The second gets into goals, and the third touches on specific terminology ("federated" and "decentralized"). The following sections acknowledge identified challenges, pick a couple of nits, share some thoughts on trust, and wrap up.

The Big World Picture

Bluesky's mission is to build tools for "open and decentralized public conversation". Global visibility, discovery, and interactions with strangers and remote organizations are essential functionality.

Christine makes the distinction between "message passing" systems (email, XMPP, and ActivityPub) and "shared heap" systems (atproto). Yes! They are very different in architecture, which will likely have big impacts on outcomes and network structure. Differences like this make "VHS versus BetaMax" analogies inaccurate.

The "shared heap" concept goes deeper than just the relay and AppView network roles. atproto is a global self-authenticating network: a single gigantic heap. Accounts have global identifiers (DIDs), which they use to publicly publish data (repos and records). That data is published into an intertwingled ocean of public graph data: records with AT-URI references to other records. Applications are indexes or "views" of that global data graph. Details like the firehose and relay are simply mechanisms for accessing and synchronizing this public graph data. Other data transfer mechanisms, such as batched backfill, or routed delivery of events (closer to "message passing") are possible and likely to emerge. But the "huge public heap" concept is pretty baked-in.
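To make the "records with AT-URI references to other records" idea concrete, here is a minimal sketch in Python. It assumes the at://authority/collection/rkey shape of an AT-URI and does no real validation. The did:example:... identifiers, the "parent" field name, and the toy in-memory heap are placeholders for illustration, not actual Bluesky schema.

```python
from typing import NamedTuple, Optional


class AtUri(NamedTuple):
    authority: str              # a DID (or handle) naming the account
    collection: Optional[str]   # record type, e.g. "app.bsky.feed.post"
    rkey: Optional[str]         # record key within that collection


def parse_at_uri(uri: str) -> AtUri:
    """Split an at:// URI into authority, collection, and record key."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an AT-URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if not parts[0] or len(parts) > 3:
        raise ValueError(f"malformed AT-URI: {uri!r}")
    collection = parts[1] if len(parts) > 1 and parts[1] else None
    rkey = parts[2] if len(parts) > 2 and parts[2] else None
    return AtUri(parts[0], collection, rkey)


# A toy "heap": public records keyed by AT-URI, one referencing another.
# The "parent" field is a hypothetical name used only in this sketch.
heap = {
    "at://did:example:alice/app.bsky.feed.post/1": {"text": "hello"},
    "at://did:example:bob/app.bsky.feed.post/2": {
        "text": "hi alice",
        "parent": "at://did:example:alice/app.bsky.feed.post/1",
    },
}


def follow_parent(heap: dict, uri: str) -> Optional[dict]:
    """Return the record that `uri` points at, if it is in our view of the heap."""
    parent_uri = heap[uri].get("parent")
    return heap.get(parent_uri) if parent_uri else None
```

An application "view" in this picture is just some index over such a heap. The reference is resolvable by anyone who holds both records, regardless of which host each account lives on.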

This architecture facilitates big-world indices. It doesn't necessarily require them, but making them simple to build and operate is a big priority. Such indices are not cheap at scale! Fast disks, RAM, and bandwidth all cost money. Why prioritize these big indices? ActivityPub, Secure Scuttlebutt, and other social web and dweb projects previously demonstrated it is feasible to build network systems which don't involve big expensive servers.

It comes down to a combination of goals, design approach, and an expectation that incumbent corporations will enter the network.

As mentioned up top, we are building for big world public networks. This isn't to say that we think everybody should interact publicly all the time! Smaller more densely-connected communities are super important, and are where folks should probably spend the majority of their hours. Smaller communities are probably not well served by global public broadcast systems. But mass public networking has an important role in society, complements smaller communities, isn't going away any time soon, and should not be ceded to the incumbent closed platforms.

Given our focus on big-world public spaces, which have strong network effects, our approach is to provide a "zero compromises" user experience. We want Bluesky (the app) to have all the performance, affordances, and consistency of using a centralized platform. We don't think that we will succeed in our mission by trying to convince users that they don't really need things like consistent views of large discussion threads, or that they should settle for hashtag search limited to a "friends of friends" scope, or for delayed notifications. A convenient and competitive user experience is a "must", and this currently means large, low-latency, full-context indices.

As an aside, there are cool distributed projects out there like YaCy. Some day it might be realistic to index cooperatively with many small-scale sharded participants, but this tech needs more R&D and polish, and would be too risky to build on today.

Lastly, for public networks, we think big full-network indices are basically inevitable. If a public network is successful, somebody will build a Google Search (Web), Google Reader (RSS), Google Groups (NNTP), Google Shopping... you get the picture. This kind of service can capture control of open networks if they aren't planned for from the start. atproto helps turn these extensions and "views" into commodity API services, and ensures that new providers have full easy access to data needed for indexing (in contrast to the many challenges with web crawling). This keeps the network resilient and interoperable even when (not if!) the largest tech companies in the world get involved.

The recent Fediverse Discovery Providers initiative seems to acknowledge the potential for large-scale indices in the Fediverse, as well as the possibly growing role of ActivityPub relays, which in my understanding have existed for a long time but did not attract much interest or have much impact until recently. These are framed as optional add-ons, and some folks might be willing to go without, but I think this functionality is important enough that it should be included in the overall protocol architecture.

So, yes, the atproto network today involves some large infrastructure components, including relays and AppViews, and these might continue to grow over time. Our design goal is not to run the entire network on small instances. It isn't peer-to-peer, and isn't designed to run entirely on phones or Raspberry Pis. It is designed to ensure "credible exit", adversarial interop, and other properties, for each component of the overall system. Operating some of these components might require collective (not individual) resources.

This doesn't mean only well-funded for-profit corporations can participate! There are several examples in the fediverse of coop, club, and non-profit services with non-trivial budgets and infrastructure. Organizations and projects like the Internet Archive, libera.chat, jabber.ccc.de, Signal, Let's Encrypt, Wikipedia (including an abandoned web search project), the Debian package archives, and others all demonstrate that non-profit orgs have the capacity to run larger services. Many of these are running centralized systems, but they could be participating in decentralized networks as well.

Developers and communities with different visions and priorities could also build alternative dataflows using the same data and Lexicons. Posts for a single account can be fetched and rendered directly from the account's PDS, as demonstrated by athome. Resilient small-world bsky AppViews could work by periodically polling specific accounts from their PDS and indexing relevant posts. There are likely more approaches and architectures we have not thought of. The ability to scale down atproto is similar to the ability to scale up ActivityPub to mega-instances like Threads: it wasn't a major original design focus, and there might be a bit of friction, but it is possible.
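As a rough sketch of the "poll specific accounts from their PDS" idea: the code below builds a com.atproto.repo.listRecords request per account and folds any new records into a local index. The host names and DIDs are placeholders. The response shape (a "records" list of objects with "uri" and "value") is what I would expect from that endpoint, but treat it as an assumption. Pagination, rate limits, deletions, and signature verification are all left out.

```python
import json
from typing import Callable, Dict
from urllib.parse import urlencode
from urllib.request import urlopen


def list_records_url(pds_host: str, did: str, collection: str, limit: int = 50) -> str:
    """Build the XRPC query URL for listing one account's records."""
    query = urlencode({"repo": did, "collection": collection, "limit": limit})
    return f"https://{pds_host}/xrpc/com.atproto.repo.listRecords?{query}"


def fetch_json(url: str) -> dict:
    """Plain HTTP GET returning parsed JSON (no auth needed for public data)."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def poll_accounts(
    accounts: Dict[str, str],            # DID -> PDS host name
    index: Dict[str, dict],              # AT-URI -> record value
    fetch: Callable[[str], dict] = fetch_json,
    collection: str = "app.bsky.feed.post",
) -> int:
    """Poll each account once; return how many new records were indexed."""
    added = 0
    for did, host in accounts.items():
        page = fetch(list_records_url(host, did, collection))
        for rec in page.get("records", []):
            if rec["uri"] not in index:
                index[rec["uri"]] = rec["value"]
                added += 1
    return added
```

Passing `fetch` as a parameter keeps the polling logic testable without a network. A real small-world AppView would call `poll_accounts` on a timer for the handful of accounts it cares about.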

A specific form of scale-down which is an important design goal is that folks building new applications (new Lexicons) can "start small", with server needs proportional to the size of their sub-network. We will continue to prioritize functionality that ensures independent apps can scale down. The "Statusphere" atproto tutorial demonstrates this today, with a full-network AppView fitting on tiny server instances.
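The reason a new app can "start small" is that its index only has to keep events for its own Lexicon and can discard everything else. A minimal sketch, assuming JSON-style events loosely modeled on what a firehose consumer might see: the field names ("kind", "commit", "collection", "operation", "rkey", "record") and the com.example.status collection are assumptions for illustration, not a documented schema.

```python
from typing import Dict, Iterable, Iterator

APP_COLLECTION = "com.example.status"  # hypothetical Lexicon for a small app


def relevant_events(events: Iterable[dict], collection: str = APP_COLLECTION) -> Iterator[dict]:
    """Yield only commit events that touch our app's collection."""
    for event in events:
        commit = event.get("commit") or {}
        if event.get("kind") == "commit" and commit.get("collection") == collection:
            yield event


def apply_event(index: Dict[str, dict], event: dict) -> None:
    """Apply one create/update/delete event to an in-memory index keyed by AT-URI."""
    commit = event["commit"]
    uri = f"at://{event['did']}/{commit['collection']}/{commit['rkey']}"
    if commit["operation"] == "delete":
        index.pop(uri, None)
    else:  # treat anything else as create or update
        index[uri] = commit.get("record", {})
```

Storage and compute then scale with the number of records in the app's own sub-network, not with the volume of the whole network passing by.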

Design Goals

To some degree, I don't really want to spend time in a terminology debate. I don't think anybody's real goal is to have something "federated" for the sake of being federated. They either have more concrete properties that they are interested in ("can I self-host my own instances for my own needs"), or want to avoid centralization of power in a general sense. Shifting the discussion to more clearly assessable properties is healthy, I think.

Over the summer, I wrote a summary of Bluesky's progress on atproto on my personal blog: "Progress on atproto Values and Value Proposition". Christine identified "Credible Exit" as one of these key properties. Some of the other high-level goals mentioned there were:

Own Your Identity and Data
Algorithmic Choice
Composable Multi-Party Moderation
Foundation for New Apps

Any of these could be analyzed individually; I have my own self-assessment of our progress in the linked article.

One thing I'd be curious to see is an equivalent set of design goals for ActivityPub (or for Spritely's work, for that matter). This might exist somewhere obvious and I just haven't seen it. It might all differ for the distinct original projects and individuals which participated in the standards process.

Terminology

On the other hand, I do feel the need to defend the use of certain terms to describe our work to date.

So: is atproto "decentralized"? Is Bluesky "federated"?

One thing I would not debate is that much of the protocol and the network is reliant on Bluesky in material terms today. Almost everybody in the atproto network is hosted on a Bluesky PDS instance. Most self-hosted folks run the Bluesky PDS software. Most services use a Bluesky relay as a firehose. Most users run the Bluesky app. Most developers are working with the Bluesky-defined application schemas (Lexicons). I think a lot of people are waiting to see progress on some or all of these. There is a sense in which it is fair to say a system isn't federated or decentralized until it has become so in a material sense.

But if Christine wanted to define things this way, she could have just cited some quick statistics and been done in a paragraph. I think what we are really discussing is whether the network architecture facilitates and lends itself to a "decentralized" or "federated" outcome.

One form of decentralization I could try to emphasize here is independent Lexicons. For example, this blog post is published using whtwnd.com, a blogging platform built on atproto. In theory the Bluesky relay or appview being down should not impact whtwnd (which I believe pulls directly from PDS instances). This is really great! One of our big goals is "Foundation for New Apps". But the ability to build new apps in the same network doesn't mean that any one specific app is decentralized or federated.

So down to brass tacks. Christine uses the following definitions:

Decentralization is the result of a system that diffuses power throughout its structure, so that no node holds particular power at the center.

[Federation] is a technical approach to communication architecture which achieves decentralization by many independent nodes cooperating and communicating to be a unified whole, with no node holding more power than the responsibility or communication of its parts.

Choosing my own fighter, I'm going to pull definitions from Mark Nottingham's excellent and very relevant RFC 9518: Centralization, Decentralization, and Internet Standards (2023):

[...] "centralization" is the state of affairs where a single entity or a small group of them can observe, capture, control, or extract rent from the operation or use of an Internet function exclusively.

[Decentralization is when] "complete reliance upon a single point is not always required" (citing Baran, 1964)

[...] federation, i.e., designing a function in a way that uses independent instances that maintain connectivity and interoperability to provide a single cohesive service.

I think that atproto maybe doesn't meet Christine's definition of decentralization: power is not diffused evenly throughout. There is not a permanent single center, but there may be power imbalances between participants.

On the other hand, I think atproto does meet Mark's basic definitions. Every major infrastructure component can be substituted without undue friction. There are significant design features to prevent capture, control, or extraction of rent of the overall atproto network.

What about federation? I do think that atproto involves independent services collectively communicating to provide a cohesive and unified whole, which both definitions touch on, and meets Mark's low-bar definition. It does involve many hosts, each with multiple accounts, all interoperating, which is a pattern I associate with federation, as a point on a spectrum between peer-to-peer and fully-centralized. On the other hand, atproto does have services like relays, which process messages on behalf of other services, which isn't a super common federation pattern. And there is no message-passing directly between hosts, which is another common federation pattern.

Overall, I think federation isn't the best term for Bluesky to emphasize going forward, though I also don't think it was misleading or factually incorrect to use it to date. An early version of what became atproto actually was peer-to-peer, with data and signing keys on end devices (mobile phones). When that architecture was abandoned and PDS instances were introduced, "federation" was the clearest term to describe the new architecture. But the connotation of "federated" with "message passing" seems to be pretty strong.

What would be a better term? At some point we started using "social web" more, and I think that matches the atproto architecture well. There is some tension around that term because it is used by the W3C Social Web Community Group, and the recently launched Social Web Foundation, both of which are ActivityPub / Fediverse projects.

Acknowledging Challenges

The article makes some accurate points that I want to explicitly acknowledge.

The chat/DMs service is totally centralized. We do want to do E2EE DMs at the protocol level, most likely using open standards (MLS), but it is going to be delicate work (not something we can or should rush). We have been pretty transparent about this, but if folks are confused in the wild then maybe we haven't done enough.

The vast majority of DID PLC accounts do not have independently controlled PLC rotation keys configured. The enabling technical mechanisms are all in place, and external developers are making early progress on tooling already. But it is going to be a bunch of work to make key management truly safe and accessible even for motivated power users.

Blocks on Bluesky are public. This is not what users want, not what we want, and a thorn in our side. There are some ideas floating around (as linked from the article), but nothing yet that we can run with. This is a tension with the structure and values of the protocol, and might persist even with private data features added to the protocol.

The cost of running a full-network, fully archiving relay has increased over time. After recent growth, our out-of-box relay implementation (bigsky) requires on the order of 16 TBytes of fast NVMe disk, and that will grow in proportion to the content in the network. We have plans and paths forward for reducing costs (including Jetstream and other tooling).

The PLC hash truncation situation isn't the greatest. As I remember it, we truncated because we wanted the identifiers to be compact and ergonomic: they end up in URLs and URIs, get passed around as parameters a lot, etc. They are intended for machines, but might need to be compared visually or typed out by humans in certain situations. The truncation length was originally flexible, which made the initial decision less high-stakes, but that led to a security incident, and they are now fixed-length. As a general theme, we probably should have used "domain separation" in more places across the protocol, both for hashing and cryptographic signatures. Some parts of the protocol could evolve gracefully to improve this. PLC is challenging to change, because of existing identifiers, but we may be able to extend and strengthen things "going forward" (new identifiers, and new ops on existing identifiers). Optimistically, formal standards bodies are a good venue for security review and improvements, and a good "milestone" for disruptive changes.
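To illustrate the two ideas in this paragraph, here is a small sketch of a fixed-length truncated hash identifier and of domain-separated hashing. The choice of SHA-256, lowercase base32, and a 24-character truncation are assumptions made for the example, and the input bytes are arbitrary. This is not a reimplementation of how PLC encodes or hashes operations.

```python
import base64
import hashlib


def truncated_id(data: bytes, length: int = 24) -> str:
    """Hash `data`, base32-encode it, and keep a fixed-length prefix.

    Each base32 character carries 5 bits, so 24 characters keep 120 bits
    of the 256-bit digest: more compact, but with a smaller collision margin.
    """
    digest = hashlib.sha256(data).digest()
    encoded = base64.b32encode(digest).decode("ascii").lower().rstrip("=")
    return encoded[:length]


def domain_separated_hash(domain: str, data: bytes) -> bytes:
    """Hash `data` under a context tag so digests from different uses never coincide."""
    tag = domain.encode("utf-8")
    # Length-prefix the tag so that shifting bytes between tag and data
    # cannot produce the same hash input.
    return hashlib.sha256(len(tag).to_bytes(2, "big") + tag + data).digest()
```

Fixing `length` is what removes the ambiguity that a flexible truncation allows. The domain tag ensures that a hash or signature made for one purpose (say, an identity operation) cannot be replayed as if it were made for another.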

Nit Picks

There are some examples in the handle section mentioning handles like alyssa.bsky.app. bsky.app is the domain of the Bluesky web application, but we use *.bsky.social as the suffix for handles on our PDS instances, not *.bsky.app. Honestly, this one is kind of on us for having a sprawling set of domains/TLDs without obvious distinction! Regardless, this would be the one small thing I might recommend updating in the post, to reduce confusion.

The DID PLC section ends: "[...] there is still a problem in that Bluesky will always have control over that user's key, and thus their identity future." I'm not entirely sure what is meant by this, but the design of the rotation key mechanism is that accounts can entirely swap out any and all existing rotation keys. For example, folks who created accounts on the Bluesky PDS instances, then migrated to their own self-hosted instance, no longer have any Bluesky-controlled rotation keys in their identities. There is a 72-hour recovery period in the system, to help mitigate some attacks and accidents. But after that window, following the PLC rules, Bluesky PBC has no control over the identity. Technically, Bluesky PBC is still operating the PLC directory (we are actively exploring how to change this). However, there are already independent developers in the ecosystem who are mirroring and auditing the stream of operations from the directory, so any retroactive manipulation would be observed and called out.
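The timing part of this is simple enough to sketch. The snippet below only models the 72-hour window mentioned above: given the timestamp of an operation that replaced the rotation keys, it computes when the recovery period closes. Which keys may act during the window, and how the directory validates operations, are deliberately not modeled. The function names are made up for this example.

```python
from datetime import datetime, timedelta, timezone

RECOVERY_WINDOW = timedelta(hours=72)


def recovery_deadline(op_time: datetime) -> datetime:
    """Moment after which a key-rotation operation can no longer be contested."""
    return op_time + RECOVERY_WINDOW


def recovery_window_open(op_time: datetime, now: datetime) -> bool:
    """True while previously authorized keys could still contest the operation."""
    return now < recovery_deadline(op_time)


# Example: an account swaps out all rotation keys at noon UTC on 1 November.
rotated_at = datetime(2024, 11, 1, 12, 0, tzinfo=timezone.utc)
deadline = recovery_deadline(rotated_at)  # noon UTC on 4 November
```

Under this simplified model, once `deadline` has passed the old keys have no remaining say over the identity, which is the property the paragraph above describes.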

Trust

In Christine's essay, she graciously states she thinks the Bluesky team has good people with good intentions, and specifically mentions Jay. Christine has been at this longer than most of us, and this carries a lot of weight. Thank you!

I very much think the same is true of the ActivityPub ecosystem, and Christine as an individual. The community is full of amazing and principled people who have achieved pretty incredible things.

The Bluesky team has a different theory of change, prioritized different design goals, and is taking some risks. I still think we are points in the same overall design space, and that respectful dialog is beneficial to all.

There is one aspect of this I feel mixed about: I don't want folks to trust Bluesky, or believe in atproto, just because of the people on the team. Or because of our track record to date. Teams and individuals change over time, and we mean it seriously when we say "the company is a future adversary". The bar we are shooting for is to convince people that atproto is legitimate and useful even if Bluesky and the team adopt the worst of intentions. We have a lot of work to earn that kind of trust in the protocol, but it will be all the more meaningful if the goalposts don't move.

Parting Thoughts

Bluesky is designed for big world public conversations. We started with a specific set of design principles, including "credible exit" and "no single party should moderate the entire network". In other words, preventing permanent centralization of the network in the long run. These principles resulted in an architecture where every major component can be substituted: PDS hosting, relays, and AppViews. Custom feeds and moderation services can simply be reconfigured.

We are pragmatic, and have prioritized functionality and user experiences which are competitive with incumbent centralized platforms. Others might make different design tradeoffs and have different outcomes. There are still plenty of tasks and milestones on the checklist. But we have made consistent progress on our protocol design principles, while continuing to ship product features and support a growing user community at the same time.

The stakes are high, but I think we have a real shot to collectively end the era of centralized social platforms.

@bnewbold.net

2024-11-27T02:50:54.668Z