These are some informal notes on setting up a full-network atproto Relay, using the bigsky relay software developed by Bluesky. This is the same software we run ourselves at https://bsky.network. The focus here is on the compute resources necessary to replicate the type of full-network, full-featured service that Bluesky currently operates with the size of the network that exists today.
The demo Relay described here is running at relay-ovh.demo.bsky.dev. It handles crawling and re-publishing the full network firehose, with headroom for traffic spikes and growth on a number of dimensions: new accounts (repos), new PDS instances, more content being created (firehose event rate), number of services consuming from the firehose, etc.
Changes
Running through this demo setup turned up some sharp edges and missing configuration knobs in bigsky. For example, trying to backfill the full network with the default configuration resulted in OOM errors on an instance this size. Tweaks and configuration options have been merged to the main branch of bigsky, along with additions to the README.
Network Scaling
How big can this instance scale? Hard to tell exactly. My guess is that it could handle an order of magnitude higher event rate, but will run out of disk before too long (eg, in the next year).
There are a number of possibilities for improving Relay efficiency to make this kind of service cheaper: There are implementation details (like using alternative database engines, or not storing repo data as millions of small files on disk). Data and work could be sharded across multiple machines. Not every Relay needs to crawl and mirror the entire network, and Relays could potentially be simplified to not maintain a full mirror of network content. On the other hand, actually running critical services would have a number of needs not covered here: legal and administrative burdens, monitoring and alerting, etc.
Shopping for a Server
My assumption is that the main thing needed is a relatively large and reasonably fast disk. The Bluesky production Relays currently use about 1 TByte for PostgreSQL and 1 TByte for CAR storage on local disk. The CAR storage filesystem should also be XFS (not ext4), to handle many millions of small files. When shopping for instances I looked for around 2 TByte for PostgreSQL and 2 TByte for CAR storage so that this setup would be realistic even with growth of the network over time. This storage could all be one disk/filesystem or two separate disks/filesystems. Regular SSD would probably work fine, NVMe is nice, especially for backfill.
More RAM always helps (page cache and other caches). Don’t need much CPU; the relay process is highly concurrent, but mostly I/O bound. Do want decent network monthly quota.
Disk is definitely the hard part. Network block storage (eg, AWS EBS) is pretty expensive even from cheaper providers, and usually costs more monthly than an entire bare metal instance with larger disks. Bare metal instances are mostly spinning disk or NVMe, not SSD; I assume that spinning disk isn't realistic for a fast backfill demo.
I ended up selecting an OVH instance for about $150/month:
ADVANCE-2-LE: https://www.ovhcloud.com/en/bare-metal/advance/adv-2/
- 12 vCPU (Intel Xeon-E 2136 - 6c/12t - 3.3 GHz/4.5 GHz)
- 32 GB RAM (32 GB ECC 2666 MHz)
- disks: 2×1.92 TB NVMe
- 1Gbit/s unmetered and guaranteed
- $152/month plus one-time $92 setup fee (no commitment)
That exact config isn’t available now (a week later), but a very similar one is:
ADVANCE-1: https://www.ovhcloud.com/en/bare-metal/advance/adv-1/
- 12 vCPU (AMD EPYC 4244P - 6c/12t - 3.8GHz/5.1GHz)
- 32 GB RAM (32GB DDR5 ECC 5200MHz)
- rootfs disk: 2x NVMe 960GB (RAID)
- data disks: 2x 1.92TB NVMe
- 1Gbit/s unmetered and guaranteed
- $153/month plus one-time $93 setup (no commitment)
In both cases the setup fee is waived with a 6 month commitment, and there are discounts on the monthly rate with longer commitments.
Host Provisioning
Using the OVH web interface, I provisioned the server with Ubuntu 24.04. With the ADVANCE-2-LE host, I specified partitioning to not use RAID. I let the setup wizard use one of the two disks for rootfs, boot, and swap. With all defaults this resulted in ext4. The second disk was not partitioned or configured using the wizard (I got to that later on the server itself).
Configured a DNS A record to point at the IPv4 that OVH gave us.
Logged in to the server and ran commands similar to this:
hostnamectl hostname relay-example.demo.bsky.dev
apt update
apt upgrade
apt install ripgrep fd-find dstat htop iotop iftop pg-activity httpie caddy golang postgresql yarnpkg
# set up yarn command; could also have used nvm
ln -s /usr/bin/yarnpkg /usr/bin/yarn
# punch holes in default firewall for HTTP/S
ufw allow 80/tcp
ufw allow 443/tcp
Ran through partitioning of the second NVMe with XFS. Note that on a real machine you'd want to set up fstab so this mounts automatically on a reboot.
# create a partition
sudo fdisk /dev/nvme1n1
# n (new partition), default (primary), default (1), default (start sector), default (entire disk), w (write)
# create XFS filesystem on that partition
sudo mkfs.xfs /dev/nvme1n1p1
# mount that filesystem to /data
sudo mkdir -p /data
sudo mount /dev/nvme1n1p1 /data
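For the fstab entry mentioned above, here is a minimal sketch. The helper name fstab_line and the mount options (defaults,nofail) are my own assumptions, not something taken from the original setup; referencing the partition by UUID rather than device name is a common way to keep the entry stable if NVMe device ordering changes.

```shell
# Print an fstab line for an XFS data partition, given its UUID and mount point.
# "nofail" lets the machine finish booting even if the data disk is missing
# (an assumption about what you want; drop it if the relay must not start without /data).
fstab_line() {
    uuid="$1"
    mountpoint="$2"
    printf 'UUID=%s %s xfs defaults,nofail 0 0\n' "$uuid" "$mountpoint"
}

# Example usage on the server (commented out; needs root and the real device):
# uuid="$(sudo blkid -s UUID -o value /dev/nvme1n1p1)"
# fstab_line "$uuid" /data | sudo tee -a /etc/fstab
# sudo mount -a   # check that the new entry parses before rebooting
```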
Pull the indigo codebase and build; ran this as the ubuntu user, not root:
# create data directories (the fresh /data mount is owned by root),
# then hand them to the user that will be running the service
sudo mkdir -p /data/bigsky
sudo mkdir -p /data/bigsky/events
sudo chown ubuntu:ubuntu /data/bigsky/
sudo chown ubuntu:ubuntu /data/bigsky/events
# pull source code and build. if you had patches or a working branch, would modify here
cd
git clone https://github.com/bluesky-social/indigo
cd indigo
make build-relay-ui build
Configure PostgreSQL (sudo -u postgres psql); replace CHANGEME with a secure password of your choice:
CREATE DATABASE bgs;
CREATE DATABASE carstore;
CREATE USER bigsky WITH PASSWORD 'CHANGEME';
GRANT ALL PRIVILEGES ON DATABASE bgs TO bigsky;
GRANT ALL PRIVILEGES ON DATABASE carstore TO bigsky;
-- these are needed for newer versions of postgres
\c bgs postgres
GRANT ALL ON SCHEMA public TO bigsky;
\c carstore postgres
GRANT ALL ON SCHEMA public TO bigsky;
Create a config file at ~/indigo/.env:
ENVIRONMENT=production
DATABASE_URL="postgres://bigsky:CHANGEME@localhost:5432/bgs"
CARSTORE_DATABASE_URL="postgres://bigsky:CHANGEME@localhost:5432/carstore"
DATA_DIR=/data/bigsky
RELAY_PERSISTER_DIR=/data/bigsky/events
GOLOG_LOG_LEVEL=info
# or whatever DNS you want to use for handle resolution
RESOLVE_ADDRESS="8.8.8.8:53"
FORCE_DNS_UDP=true
RELAY_COMPACT_INTERVAL=0
RELAY_DEFAULT_REPO_LIMIT=500000
# these were somewhat tuned to this instance size
MAX_CARSTORE_CONNECTIONS=12
MAX_METADB_CONNECTIONS=12
MAX_FETCH_CONCURRENCY=25
RELAY_CONCURRENCY_PER_PDS=20
RELAY_MAX_QUEUE_PER_PDS=200
#RELAY_ADMIN_KEY=CHANGEME
UPDATE: renamed BGS_COMPACT_INTERVAL to RELAY_COMPACT_INTERVAL, and added RELAY_PERSISTER_DIR.
Uncomment RELAY_ADMIN_KEY and set it to a strong random value, and substitute the earlier database password into DATABASE_URL and CARSTORE_DATABASE_URL. You can create a random value with:
openssl rand -base64 30
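Since the template ships with CHANGEME placeholders and a commented-out admin key, a quick sanity check before starting the service can catch a half-edited file. This is only a sketch; the function name check_env_placeholders is hypothetical and not part of the indigo repo.

```shell
# Succeeds only if the env file has no leftover CHANGEME placeholders on active
# (non-comment) lines and has an active, non-empty RELAY_ADMIN_KEY line.
check_env_placeholders() {
    env_file="$1"
    # Ignore commented-out lines, then look for leftover placeholders.
    if grep -v '^[[:space:]]*#' "$env_file" | grep 'CHANGEME' >&2; then
        echo "placeholder values remain in $env_file" >&2
        return 1
    fi
    # The admin key line is commented out in the template, so check it is now active.
    if ! grep -q '^RELAY_ADMIN_KEY=.' "$env_file"; then
        echo "RELAY_ADMIN_KEY is not set in $env_file" >&2
        return 1
    fi
}

# Example usage (commented out):
# check_env_placeholders ~/indigo/.env && echo "env file looks filled in"
```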
Create a system-wide Caddy config at /etc/caddy/Caddyfile. Substitute in your hostname, and comment out any other lines in the file:
relay-example.demo.bsky.dev {
reverse_proxy 127.0.0.1:2470
}
Restart caddy: sudo systemctl restart caddy
Running bigsky and Backfilling
Run the actual service! For example, in a screen session, or a service management tool of your choice:
cd ~/indigo
./bigsky --api-listen 127.0.0.1:2470
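If you would rather use systemd than a screen session, a unit along these lines might work. This is a sketch, not something I ran on the demo host: the unit name, the /home/ubuntu/indigo paths, and the restart policy are all assumptions based on the layout in these notes. It sets the working directory to the checkout so that the service starts from the same place as the manual run above, where the .env file lives.

```shell
# Print a systemd unit for the relay. Paths, user, and unit name are assumptions
# matching the layout used in these notes (ubuntu user, ~/indigo checkout).
relay_unit() {
    cat <<'EOF'
[Unit]
Description=bigsky atproto relay (demo)
After=network-online.target postgresql.service
Wants=network-online.target

[Service]
User=ubuntu
WorkingDirectory=/home/ubuntu/indigo
ExecStart=/home/ubuntu/indigo/bigsky --api-listen 127.0.0.1:2470
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
}

# To install (commented out; needs root):
# relay_unit | sudo tee /etc/systemd/system/bigsky.service
# sudo systemctl daemon-reload
# sudo systemctl enable --now bigsky
# journalctl -u bigsky -f   # follow the logs
```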
Confirm that everything is working by connecting from a laptop using the gosky command (which is in the indigo repo). You won't get events (because the Relay hasn't subscribed to anything yet), but it should connect successfully:
gosky readStream wss://relay-example.demo.bsky.dev
You can also connect to the web management interface at https://relay-example.demo.bsky.dev/dash. This lets you view basic stats per PDS, modify limits, add new PDS instances to crawl, take down individual repos (by DID), block PDS instances by domain suffix, etc.
To start backfills, create a hosts.txt file with PDS hostnames (one per line), then run the initial crawl command. You can do this from a laptop:
cd ~/indigo/cmd/bigsky
export RELAY_ADMIN_KEY=CHANGEMESECRET
export RELAY_HOST=relay-example.demo.bsky.dev
cat hosts.txt | parallel -j1 ./crawl_pds.sh {}
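Hostname lists collected from several sources tend to contain duplicates, URL schemes, and blank lines. Here is a small sketch for normalizing such a list before feeding it to the crawl script. The helper name clean_hosts is hypothetical, and I am assuming the scripts expect one bare hostname per line.

```shell
# Normalize a list of PDS hosts read from stdin: drop comment and blank lines,
# strip any http(s):// scheme and trailing path, lowercase, and de-duplicate.
clean_hosts() {
    grep -v '^[[:space:]]*#' \
        | sed -E -e 's#^[[:space:]]*https?://##' -e 's#/.*$##' -e 's/[[:space:]]+$//' \
        | grep -v '^$' \
        | tr '[:upper:]' '[:lower:]' \
        | sort -u
}

# Example usage (commented out):
# clean_hosts < raw_hosts.txt > hosts.txt
```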
Let that bake for a few hours or overnight. Only accounts with new commits will get backfilled. A 24 hour period is usually around 10% of the network.
Then you can start explicit backfills per-PDS ("resync"). This will pull a complete list of DIDs hosted on the PDS (or at least, which the PDS thinks are still hosted on the PDS; this might not yet handle migrations). Don’t do full PDS backfills for all the big PDS instances at once, or the relay will get overwhelmed (eg, OOM). Instead, do 4-8 at a time, modifying hosts.txt or the head command as needed:
head -n 4 hosts.txt | parallel -j1 ./sync_pds.sh {}
# check progress
head -n 4 hosts.txt | parallel -j1 ./sync_status_pds.sh {}
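To step through hosts.txt in fixed-size batches without editing the file each time, something like the following sketch can stand in for the head command. The function name host_batch is hypothetical; batches are numbered from 1.

```shell
# Print batch number N (1-based) of SIZE lines from a hosts file.
# Defaults to batches of 4, the low end of the 4-8 range suggested above.
host_batch() {
    file="$1"
    n="$2"
    size="${3:-4}"
    start=$(( (n - 1) * size + 1 ))
    end=$(( n * size ))
    sed -n "${start},${end}p" "$file"
}

# Example usage (commented out): resync the second batch of four hosts,
# then check on their progress.
# host_batch hosts.txt 2 4 | parallel -j1 ./sync_pds.sh {}
# host_batch hosts.txt 2 4 | parallel -j1 ./sync_status_pds.sh {}
```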
Smaller self-hosted instances can be backfilled in big batches (eg, hundreds of backfills at the same time).
While running backfill, some new PDS instances will be discovered, even if not crawled specifically, and even with “spidering” disabled. My guess is that accounts which migrate away from our PDS instances are still listed by the original PDS. When bigsky does a backfill, it resolves all the DIDs, and sees a different PDS in the atproto service entry, and adds it to the PDS list.
The entire backfill took a couple days of casual checking in and poking it along.
How does one get a complete list of PDS instances in the network? It would be helpful if Relays had a public endpoint to scroll through all known PDS instances, and indicate if they are active, blocked/suspended, and roughly how many repos there are. For now, you can pull hostnames from public listings like https://blue.mackuba.eu/directory/ and https://bsky-debug.app/. Or scrape the complete DID PLC directory (which is public and enumerable), and extract all PDS service endpoints.
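As a rough sketch of the PLC-scraping option: if you have already saved a dump of PLC operations as compact JSON, one object per line, the PDS hostnames can be pulled out with standard tools. The file name plc_export.jsonl, the helper name extract_pds_hosts, and the exact line layout (a compact "endpoint":"https://..." field inside each operation) are assumptions on my part; check them against real data before relying on this, and use a proper JSON parser if the layout differs.

```shell
# Extract unique PDS hostnames from compact JSON-lines input on stdin.
# ASSUMPTION: each service entry appears as "endpoint":"https://host" with no
# whitespace around the colon. This matches on text, it does not parse JSON.
extract_pds_hosts() {
    grep -oE '"endpoint":"https?://[^"/]+' \
        | sed -E 's#^.*://##' \
        | sort -u
}

# Example usage (commented out; plc_export.jsonl is a hypothetical local dump):
# extract_pds_hosts < plc_export.jsonl > hosts.txt
```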
Rough Performance Stats
Here are some quick/informal system performance snapshots. The backfill period (fetching all previous repo content from the network) is far more resource intensive than steady operation.
I didn't run any compactions manually, and disabled automatic/periodic compactions. These are resource intensive to process, but free up disk and database space. Compactions are a feature specific to bigsky and its data storage system.
UPDATE: you can re-enable compactions by editing the RELAY_COMPACT_INTERVAL environment variable. The default is 4h; it is disabled (set to zero) in the template env file above.
During early phase of backfill:
# dstat
----total-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai stl| read writ| recv send| in out | int csw
32 16 44 5 0| 17M 1410M| 20M 879k| 0 0 | 107k 173k
52 13 29 4 0| 13M 1495M| 18M 765k| 0 0 | 134k 117k
74 13 13 1 0| 20M 623M| 18M 722k| 0 0 | 47k 62k
26 14 52 6 0| 12M 1610M| 14M 613k| 0 0 | 133k 120k
51 16 30 2 0| 19M 928M| 18M 813k| 0 0 | 55k 118k
29 16 47 6 0| 15M 1842M| 13M 587k| 0 0 | 133k 114k
26 14 55 3 0| 16M 1124M| 12M 537k| 0 0 | 69k 165k
24 15 52 7 0| 14M 1600M| 13M 575k| 0 0 | 131k 138k
30 14 51 3 0| 16M 1041M| 18M 786k| 0 0 | 62k 155k
20 14 57 7 0| 13M 1719M|8916k 406k| 0 0 | 137k 121k
# pg_activity (as postgres user)
PostgreSQL 16.3 - relay-ovh - postgres@/var/run/postgresql:5432/postgres - Ref.: 2s -
* Global: 38 minutes uptime, 12.54G dbs size - 14.84M/s growth, 90.60% cache hit ratio
Sessions: 81/100 total, 42 active, 39 idle, 0 idle in txn, 0 idle in txn abrt, 0 waiting
Activity: 4382 tps, 74089 insert/s, 164 update/s, 0 delete/s, 28383 tuples returned/s, 0
* Worker processes: 0/8 total, 0/4 logical workers, 0/8 parallel workers
Other processes & info: 0/3 autovacuum workers, 0/10 wal senders, 0 wal receivers, 0/10
* Mem.: 31.12G total, 756.90M (2.38%) free, 14.14G (45.44%) used, 16.24G (52.19%)
Swap: 512.00M total, 511.00M (99.80%) free, 1.00M (0.20%) used
IO: 155846/s max iops, 2.15K/s - 0/s read, 608.78M/s - 155846/s write
Load average: 8.19 7.33 5.16
I don’t have stats, but at a later phase of backfill, I/O wait was pretty high and disk read/write were more symmetrical around 500MB/sec (should have taken a snapshot of that!), and CPU wait was only single-digit.
After all major backfills, just cruising along at a normal firehose subscription:
# dstat
----total-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai stl| read writ| recv send| in out | int csw
1 1 97 1 0|5794k 8046k| 232k 6851B| 0 0 |4439 7848
1 0 98 0 0|3615k 7399k| 219k 5071B| 0 0 |4062 7163
1 1 98 0 0|4645k 13M| 198k 5184B| 0 0 |4473 7194
1 0 98 0 0|4831k 8142k| 242k 7273B| 0 0 |4174 7581
1 0 98 0 0|3264k 7092k| 178k 4784B| 0 0 |3625 6619
1 1 98 0 0|3564k 7336k| 153k 3394B| 0 0 |3253 5502
1 0 98 0 0|4930k 9139k| 239k 6719B| 0 0 |4119 7364
2 1 97 1 0|6430k 14M| 313k 10k| 0 0 |6372 13k
1 0 98 0 0|3359k 7422k| 172k 5255B| 0 0 |3670 6860
1 0 98 0 0|3929k 10M| 206k 7088B| 0 0 |4036 8954
1 1 98 0 0|3732k 7560k| 212k 6173B| 0 0 |3789 6771
1 1 97 1 0|5694k 8819k| 267k 7758B| 0 0 |4630 8511
1 0 98 0 0|3480k 11M| 175k 4565B| 0 0 |3764 5758
# pg_activity
PostgreSQL 16.3 - relay-ovh - postgres@/var/run/postgresql:5432/postgres - Ref.: 2s -
* Global: 4 days, 22 hours and 23 minutes uptime, 445.38G dbs size - 162.93K/s growth, 79.55% cache hit ratio
Sessions: 22/100 total, 1 active, 21 idle, 0 idle in txn, 0 idle in
Activity: 514 tps, 912 insert/s, 0 update/s, 0 delete/s, 1028 tuples returned/s, 0 temp files, 0B temp size
* Worker processes: 0/8 total, 0/4 logical workers, 0/8 parallel workers
Other processes & info: 0/3 autovacuum workers, 0/10 wal senders, 0
* Mem.: 31.12G total, 676.83M (2.12%) free, 17.36G (55.78%) used, 13.10G (42.09%) buff+cached
Swap: 512.00M total, 608.00K (0.12%) free, 511.40M (99.88%) used
IO: 0/s max iops, 0B/s - 0/s read, 0B/s - 0/s write
Load average: 0.46 0.44 0.38
# df -h
Filesystem Size Used Avail Use% Mounted on
tmpfs 3.2G 1.6M 3.2G 1% /run
efivarfs 192K 37K 151K 20% /sys/firmware/efi/efivars
/dev/nvme0n1p3 1.8T 452G 1.2T 28% /
tmpfs 16G 1.1M 16G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
/dev/nvme0n1p2 974M 182M 725M 21% /boot
/dev/nvme0n1p1 511M 5.2M 506M 2% /boot/efi
/dev/nvme1n1p1 1.8T 722G 1.1T 41% /data
tmpfs 3.2G 12K 3.2G 1% /run/user/1000
# df -i (inodes)
Filesystem Inodes IUsed IFree IUse% Mounted on
tmpfs 4078568 1019 4077549 1% /run
efivarfs 0 0 0 - /sys/firmware/efi/efivars
/dev/nvme0n1p3 117080064 230862 116849202 1% /
tmpfs 4078568 3 4078565 1% /dev/shm
tmpfs 4078568 3 4078565 1% /run/lock
/dev/nvme0n1p2 65536 603 64933 1% /boot
/dev/nvme0n1p1 0 0 0 - /boot/efi
/dev/nvme1n1p1 187537280 25138236 162399044 14% /data
tmpfs 815713 32 815681 1% /run/user/1000
# sudo du -sh /var/lib/postgresql/16/
447G /var/lib/postgresql/16/
# lsblk (for reference)
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 1 0B 0 disk
sr0 11:0 1 1024M 0 rom
nvme1n1 259:0 0 1.7T 0 disk
└─nvme1n1p1 259:7 0 1.7T 0 part /data
nvme0n1 259:1 0 1.7T 0 disk
├─nvme0n1p1 259:2 0 511M 0 part /boot/efi
├─nvme0n1p2 259:3 0 1G 0 part /boot
├─nvme0n1p3 259:4 0 1.7T 0 part /
├─nvme0n1p4 259:5 0 512M 0 part [SWAP]
└─nvme0n1p5 259:6 0 2M 0 part
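The steady-state PostgreSQL snapshot above reports a database growth rate (162.93K/s) and df shows free space on the root filesystem (1.2T), so a back-of-envelope estimate of disk runway is possible. This is only a sketch: it assumes the "K" is KiB, that the instantaneous rate holds constant, and that compactions stay disabled, none of which is guaranteed. The function name days_until_full is hypothetical.

```shell
# Estimate days until a filesystem fills, given a growth rate in KiB/s and
# free space in GiB. Assumes a constant growth rate, which real traffic won't have.
days_until_full() {
    rate_kib_per_sec="$1"
    free_gib="$2"
    awk -v rate="$rate_kib_per_sec" -v free="$free_gib" 'BEGIN {
        gib_per_day = rate * 86400 / (1024 * 1024)
        printf "%.1f\n", free / gib_per_day
    }'
}

# Numbers from the steady-state snapshot: 162.93K/s growth, 1.2T (1228.8 GiB) free on /
days_until_full 162.93 1228.8
```

Treat the result as a rough sanity check rather than a forecast; re-sampling the growth rate over several days would give a better number.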
PostgreSQL table sizes:
bgs=# SELECT
table_name,
pg_size_pretty(table_size) AS table_size,
pg_size_pretty(indexes_size) AS indexes_size,
pg_size_pretty(total_size) AS total_size
FROM (
SELECT
table_name,
pg_table_size(table_name) AS table_size,
pg_indexes_size(table_name) AS indexes_size,
pg_total_relation_size(table_name) AS total_size
FROM (
SELECT ('"' || table_schema || '"."' || table_name || '"') AS table_name
FROM information_schema.tables
WHERE table_schema != 'pg_catalog' AND table_schema != 'information_schema'
) AS all_tables
ORDER BY total_size DESC
) AS pretty_sizes;
table_name | table_size | indexes_size | total_size
-------------------------------+------------+--------------+------------
"public"."repo_event_records" | 18 GB | 541 MB | 18 GB
"public"."actor_infos" | 987 MB | 993 MB | 1980 MB
"public"."users" | 749 MB | 1060 MB | 1809 MB
"public"."pds" | 3752 kB | 32 kB | 3784 kB
"public"."auth_tokens" | 16 kB | 48 kB | 64 kB
"public"."slurp_configs" | 16 kB | 32 kB | 48 kB
"public"."feed_posts" | 8192 bytes | 24 kB | 32 kB
"public"."vote_records" | 8192 bytes | 16 kB | 24 kB
"public"."follow_records" | 8192 bytes | 16 kB | 24 kB
"public"."domain_bans" | 8192 bytes | 16 kB | 24 kB
"public"."repost_records" | 8192 bytes | 8192 bytes | 16 kB
(11 rows)
carstore=# SELECT
table_name,
pg_size_pretty(table_size) AS table_size,
pg_size_pretty(indexes_size) AS indexes_size,
pg_size_pretty(total_size) AS total_size
FROM (
SELECT
table_name,
pg_table_size(table_name) AS table_size,
pg_indexes_size(table_name) AS indexes_size,
pg_total_relation_size(table_name) AS total_size
FROM (
SELECT ('"' || table_schema || '"."' || table_name || '"') AS table_name
FROM information_schema.tables
WHERE table_schema != 'pg_catalog' AND table_schema != 'information_schema'
) AS all_tables
ORDER BY total_size DESC
) AS pretty_sizes;
table_name | table_size | indexes_size | total_size
-----------------------+------------+--------------+------------
"public"."block_refs" | 192 GB | 217 GB | 409 GB
"public"."car_shards" | 4011 MB | 4088 MB | 8098 MB
"public"."stale_refs" | 6245 MB | 576 MB | 6821 MB
(3 rows)