
Introduction to atproto 1: What is PDS? What Features Does It Have?
-WhiteWind TechTalk-

Last updated on 2024-04-05

Hello, I'm K-NKSM, a developer at WhiteWind who created this blog system! Recently, the microblogging site Bluesky has been getting a lot of attention. It's well-known that Bluesky uses the Authenticated Transfer Protocol (AT Protocol, atproto) behind the scenes. However,

  • The standards page is too abstract and doesn't give a clear picture
  • It's hard to grasp the novelty of atproto
  • There's a desire to create something with atproto but moving forward is difficult without concise sample code or a Getting Started guide

Many people might feel this way. This series aims to demystify atproto. It starts with the basic mechanisms, explained so that even non-engineers can follow, then moves on to setting up an atproto development environment and designing an AppView, using WhiteWind as an example, to help resolve the concerns above.

In this first part, we will discuss the main components of atproto and their features.

Below is a simplified overview of Bluesky's operation, focusing on the essential components for a basic understanding of atproto.

Let's look at the roles of each component.

Bluesky client

This refers to apps that users can operate from smartphones or computer browsers. It sends data like posts or likes to the server based on user operations.

PDS

PDS is cloud storage with an API

PDS stands for Personal Data Server, which saves data like posts or likes based on commands from the client. PDS has a dedicated space for storing each user's data. This space is called a repository (repo).

PDS receives atproto data from the client and saves it to the repository, but PDS itself fundamentally does not understand the data it is asked to store. As long as the data follows a certain format, users can store data they have created themselves, even if it has nothing to do with Bluesky. This is permissible because PDS is not Bluesky. I often describe PDS as "cloud storage with API functionality." With cloud storage, you can store anything you want, whether or not it was created with Google products, and whether it's images or PDFs. In the same way, just as there are various cloud storage options like Google Drive or OneDrive, PDS can be operated by people completely unrelated to Bluesky, and it can store any data the user likes. However, as of April 2024, Bluesky, the developer of atproto, operates most PDS instances.
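As a rough illustration of "cloud storage with an API," the sketch below builds (but does not send) two write requests to a PDS using the `com.atproto.repo.createRecord` endpoint: one for a Bluesky post and one for a record that has nothing to do with Bluesky. The host `pds.example.com`, the token, the account ID `did:example:alice`, and the `com.example.recipe` collection are all placeholders invented for this example.

```python
import json
from urllib.request import Request


def build_create_record_request(pds_host, access_token, repo, collection, record):
    """Build (without sending) a request asking a PDS to store one record."""
    body = {"repo": repo, "collection": collection, "record": record}
    return Request(
        url=f"https://{pds_host}/xrpc/com.atproto.repo.createRecord",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )


# A Bluesky post: the collection name tells readers which kind of record this is.
post = build_create_record_request(
    "pds.example.com", "TOKEN", "did:example:alice",
    "app.bsky.feed.post",
    {"$type": "app.bsky.feed.post", "text": "Hello", "createdAt": "2024-04-05T00:00:00Z"},
)

# A hypothetical non-Bluesky record goes through exactly the same call.
recipe = build_create_record_request(
    "pds.example.com", "TOKEN", "did:example:alice",
    "com.example.recipe",
    {"$type": "com.example.recipe", "title": "Miso soup"},
)
```

The only difference between the two requests is the collection name and the record body, which is the sense in which PDS does not need to understand what it stores.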

All data is public

All data stored on PDS is publicly available worldwide. PDS implementations that require authentication for reading specific data are not prohibited by the standard, but no such PDS implementations exist at the moment.
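Because the data is public, reading it needs no login. A minimal sketch, assuming a PDS at the placeholder host `pds.example.com` and a placeholder account ID `did:example:alice`: the function below only builds the URL for the `com.atproto.repo.listRecords` endpoint, which could then be opened with any HTTP client without an authorization header.

```python
from urllib.parse import urlencode


def list_records_url(pds_host, repo, collection, limit=10):
    """Build the URL that lists public records of one collection in one repo."""
    query = urlencode({"repo": repo, "collection": collection, "limit": limit})
    return f"https://{pds_host}/xrpc/com.atproto.repo.listRecords?{query}"


# Placeholder host and account ID; no token is involved anywhere.
url = list_records_url("pds.example.com", "did:example:alice", "app.bsky.feed.post", limit=5)
print(url)
```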

Relays requests from clients to AppView

PDS also has the role of relaying requests from clients to AppView.
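One way to picture this relaying role is as a routing decision made per request. The toy function below is only an illustration: it assumes that methods under `com.atproto.` concern the repository and are handled by the PDS itself, while everything else (for example a Bluesky timeline request) is passed on to an AppView. The routing rule and the return labels are assumptions made for this sketch, not a description of any real PDS implementation.

```python
def route_request(method_name):
    """Decide where a toy PDS sends a client request, based on the method name."""
    # Assumption for this sketch: repository-level methods stay on the PDS.
    if method_name.startswith("com.atproto."):
        return "handle-on-pds"
    # Everything else needs knowledge of global data, so it goes to an AppView.
    return "forward-to-appview"


write_route = route_request("com.atproto.repo.createRecord")
timeline_route = route_request("app.bsky.feed.getTimeline")
```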

PDS can be switched

atproto provides a mechanism for users to switch their PDS, allowing users to continue using the same account even after switching PDS. This means the account is managed separately from the repository on PDS. This is fundamentally different from, for example, downloading all posts from X/Twitter, deleting the account, and starting over with a new account to repost or refollow. The mechanism for atproto accounts will be explained next time.
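The idea that the account lives separately from the repository can be pictured with a toy lookup table. This is a deliberately simplified model, not the actual atproto account mechanism: the class name, the method names, and the account ID `did:example:alice` are all made up for illustration.

```python
class AccountDirectory:
    """Toy lookup table: stable account ID -> PDS currently hosting its repo."""

    def __init__(self):
        self._pds_of = {}

    def register(self, account_id, pds_host):
        self._pds_of[account_id] = pds_host

    def migrate(self, account_id, new_pds_host):
        # Only the pointer changes; the account ID itself stays the same.
        if account_id not in self._pds_of:
            raise KeyError(f"unknown account: {account_id}")
        self._pds_of[account_id] = new_pds_host

    def resolve(self, account_id):
        return self._pds_of[account_id]


directory = AccountDirectory()
directory.register("did:example:alice", "pds-a.example.com")
before = directory.resolve("did:example:alice")

directory.migrate("did:example:alice", "pds-b.example.com")
after = directory.resolve("did:example:alice")
```

Followers keep referring to the same account ID before and after the move, which is what distinguishes this from deleting an account and starting over elsewhere.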

A more detailed overview of PDS

Relay (firehose, BGS)

Relay is constantly connected to all PDS worldwide. Whenever a repository on a PDS changes, the PDS notifies Relay of what changed. Relay receives that data and streams it to everyone connected to it. In terms of existing internet mechanisms, Relay resembles the crawlers run by search engines such as Google or Bing, which visit all kinds of websites and absorb their data.
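The fan-out behavior described above can be sketched with a small in-memory model. This is a toy, not a real Relay: the class, its method names, and the event fields (`repo`, `collection`, `record`) are invented for illustration, and a real Relay works over network connections rather than Python callbacks.

```python
class ToyRelay:
    """In-memory stand-in for a Relay: takes repo changes in, fans them out."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        # Anything downstream (for example an AppView) registers a callback.
        self._subscribers.append(callback)

    def on_repo_change(self, event):
        # Called when a PDS reports a change; every subscriber gets a copy.
        for callback in self._subscribers:
            callback(event)


relay = ToyRelay()
seen_by_a, seen_by_b = [], []
relay.subscribe(seen_by_a.append)
relay.subscribe(seen_by_b.append)

relay.on_repo_change({
    "repo": "did:example:alice",
    "collection": "app.bsky.feed.post",
    "record": {"text": "Hello"},
})
```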

There can be many Relays worldwide

Just as there are various search services like Google or Bing, there can be many Relays. However, due to their role of connecting to all PDS and distributing data to numerous downstream systems, it's expected that they require significantly powerful infrastructure. It's likely that only organizations with a certain scale of financial resources can operate them.

Anyone can connect to Relay

Relay is a public infrastructure that anyone can freely connect to, allowing access to all atproto data without having to crawl each PDS individually.

AppView

AppView is connected to Relay and watches all atproto data flowing from around the world. AppView understands the content of the data streaming in and processes it to create timelines, notifications, and so on. For example, the Bluesky AppView picks out the data related to Bluesky, lists posts from specific users, and creates timelines from the users being followed. In other words, it provides the features that require knowledge of global atproto data, such as user interactions and the actions of other users. This component is what is referred to as an "atproto service."
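A minimal sketch of what "understanding the data" could look like: the function below takes a stream of change events, keeps only Bluesky posts (collection `app.bsky.feed.post`) written by followed accounts, and orders them newest first. The event shape, the account IDs, and the `com.example.recipe` collection are assumptions made for this example; a real AppView also handles indexing, deletions, notifications, and much more.

```python
def build_timeline(events, following):
    """Keep Bluesky posts from followed accounts, newest first."""
    posts = [
        event for event in events
        if event["collection"] == "app.bsky.feed.post" and event["repo"] in following
    ]
    # ISO 8601 timestamps in the same format sort correctly as plain strings.
    return sorted(posts, key=lambda event: event["record"]["createdAt"], reverse=True)


events = [
    {"repo": "did:example:alice", "collection": "app.bsky.feed.post",
     "record": {"text": "first", "createdAt": "2024-04-05T09:00:00Z"}},
    {"repo": "did:example:bob", "collection": "app.bsky.feed.post",
     "record": {"text": "not followed", "createdAt": "2024-04-05T09:30:00Z"}},
    {"repo": "did:example:alice", "collection": "com.example.recipe",
     "record": {"title": "Miso soup", "createdAt": "2024-04-05T09:45:00Z"}},
    {"repo": "did:example:carol", "collection": "app.bsky.feed.post",
     "record": {"text": "second", "createdAt": "2024-04-05T10:00:00Z"}},
]

timeline = build_timeline(events, following={"did:example:alice", "did:example:carol"})
timeline_texts = [event["record"]["text"] for event in timeline]
```

The non-Bluesky record and the post from an account that is not followed both pass through the same stream but are simply ignored by this particular AppView.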

There can be many AppViews worldwide

Similar to Relay, there can be many AppViews worldwide. If an AppView decides to shadow-ban a specific user or forcefully push unnecessary information, it's possible to switch to a different AppView.

What happens with this setup?

I've explained the basic roles of each component. So, what happens with such a component setup?

Data accumulates on the user's server

As mentioned before, users' data is stored on PDS. PDS is akin to cloud storage, allowing users to freely manipulate their data, essentially acting as "the user's server." Thus, the data of atproto services is placed on the user's server. It won't be arbitrarily deleted or altered by the service operator, nor will it become invisible to the world.

AppView does not manage users' data

Although this is essentially repeating the previous point with different wording, the perspective changes significantly. As a trade-off for users owning and managing their data, service operators do not manage the data for them. In centralized service configurations, user-generated data accumulates on the service's server, so the service operator would pay for:

  • Data storage costs
  • Network bandwidth costs associated with data delivery
  • Backups
  • Availability maintenance

With atproto, because users control their data, the PDS operator bears these costs instead of the service operator. This is a significant advantage from the perspective of service operators.

Competing services can be easily created

Since all AppViews have access to the same data, it's impossible to gain a competitive advantage by monopolizing data. Therefore, anyone (given they have the money and time) can create a social networking service to compete with Bluesky.

Business Model

For the ecosystem to develop sustainably over the long term, a profitable business model for everyone involved is necessary. How will each component make money? The following are my personal speculations (from a business novice).

PDS

If we think of PDS as "cloud storage with an API" that hosts data and responds to queries from Relay and others, one workable model might be to offer a certain amount of storage for free and require a subscription to a professional plan beyond that limit. It might also be interesting to offer something like an "influencer plan," where those who want to distribute content to a large audience or benefit from wide visibility pay for infrastructure costs according to bandwidth and availability. It's also conceivable to have a PDS that understands specific records and inserts ads into its responses to clients. However, considering the variety of records beyond Bluesky and WhiteWind that could exist in the future, this might not scale well.

Relay

It's unclear how Relay would make money. Maybe it could contract with downstream systems to charge beyond a certain traffic volume.

AppView

Since AppView operators often develop a client together with the AppView, they can insert ads on the client side or mix ads into the data they deliver. On the other hand, the structure of record data inevitably becomes public, so anyone can easily develop a client with ad-blocking features. Even so, since the operational and infrastructure costs traditionally borne by the service side are paid by PDS, the amount of advertising needed could be smaller.

Summary

  • In atproto, there are three main components: PDS, Relay, and AppView.
  • PDS serves as the user's server, where data can be freely placed.
  • Relay constantly absorbs data from all PDS.
  • AppView receives data from Relay, processes and interprets it, performing indexing, data transformation, and user interaction, among other tasks.

Next time, we plan to explain the account system in atproto.