Self-referential/cyclic structure is both necessary and possible in content-addressed protocols, but...

Content-addressing is good. It's a way of giving objects a unique global ID that anyone can verify on their own is the correct ID for that object. This can cut down on a lot of network requests and really increase the long-term resilience of the data and the openness/fairness/cheapness/security of the hosting provider arena.

I want the APC protocol to have content-addressing right away, because another thing it gives us is a way of assigning globally unique IDs locally without having to wait on a dialog with the network. With content-addressing, a user can create an object and start writing edits against it before the network knows anything about it.

Content-addressed structures are immutable, if you change the content, the ID changes too and so it will not be identified as the same object. So how do we have mutation? A mutable AP object's ID is the content-address of the definition of that object. That definition states the initial state of the object, and also states the mutation mechanism. For instance, the definition might say that the object is hosted by a particular AP host (ie, that this host will be trusted to report the whole edit chain and most recent state), but it may also say that any edit must be signed by another Identity who 'owns' the object, who the host is hosting the object on behalf of. It may also constrain the type of the object, and which edits are allowed.

Another limitation is that content-addressed structures necessarily cannot contain their own content-address, nor the content-addresses of any object that contains their content-address. Cyclic/quined objects are physically possible, but computationally infeasible to discover. A graph linked together by content-addresses will never contain a cycle, it will always be a tree. Dealing with this shortcoming is a much trickier problem.

In the process of designing APC, the need for cyclical structures arrived literally immediately from multiple directions:

If you want a distributed type system, which you should — because types/schemas are good and they're the kinds of things you should want immutable ownerless schemas for — you will need to define the intrinsic TypeType, which, of course, is its own type, so has to have a reference to itself in its type field. We could separate type tag from value (which could potentially make sprue encoding much more compact...), in which case this could be handled unproblematically as TypedObjectRef { type: #TypeType, value: #TypeType }, but...
Self-referential types are not rare. A LinkedList must refer to an Option<LinkedList>. A recoverable Identity has to refer to social_recovery_authorities: List<Identity> (though tbh it would be rare to define an Identity with the social_recovery_authorities already decided).
You also need cycles to define even the most intuitive permissioning configurations for shared mutable objects.
- Let's imagine such an object. One of the permission lists is the read permission list. The people on the read list are entitled to subscribe to be notified of future updates to that object. What's the read permission list of the read permission list? Usually it should be the same read permission list. It is its own readlist. And who's on the add new read permission permission list of the reader list? It's the reader list itself again. What about the readlist's write list? (who has permission to remove people from the read list?) Well this time it should probably be the object's write list. But the write list should also be its own add list. And then there's an owner list that has the ability to add and remove writers and readers, but who owns the owner list? Usually it should be the owner list itself, but crucially, it isn't always! We don't want to hard-code any this structure as intrinsic, because sometimes you'd want the owner of the owner list to be a separate oversight group, or sometimes you'd want the lists to not be readable to anyone except owners. We need the data to be able to express these kind of structures, we need all of these reference cycles and knots even just to define the most obvious reasonable intuitive permissioning behavior a shared mutable object would have.
- Though... again, how often would a user define a mutable object with all of these things already configured in the initial state?.. Hmm.

So we're screwed then, aren't we? We could maybe tolerate just using a self hack for the TypeType, or defining the whole thing as an intrinsic, but we'd have to pretty much depart from content-addressing as soon as we have any kind of mutability, yeah? (spoiler: no!)

The simplest workaround is to just let mutable objects be defined with no initial value. A mutable object would be defined using a content-addressed definition of that object. After defining them, their content-ID will then be known, and then we can define our self-referrential initial value and insert that by publishing a mutation. Unfortunately the content-hash is kind of meaningless here. We wouldn't be able to put the initial state in the definition being addressed, nor the permissioning objects (because they're recursive too), so, since we've said very little about the object in its definition, we're completely trusting the mutation system/active hosts to tell us whether the current value they're serving us against this ID is correct. I hate this workaround. Especially for types, which aren't even supposed to be mutable. The prospect of being able to put a detailed prescription of how a mutable object would be managed into its definition object was exciting and with this would basically be the end of that.

So I ruminated on the problem for a bit, and I noticed another way of doing it:

A "knot" is a list of objects, defined together, that can refer to each other using lknot refs.

#a64029 = (knot
   (twin_brother
      (name mario)
      (other_brother (lknot 1)))
   (twin_brother
      (name luigi)
      (other_brother (lknot 0)))
)

the above structure, illustrated

We can then refer to luigi with (knotref #a64029 1) and this will resolve as a pair of objects who refer to each other, immutably, forever.

Some more examples

(knot
   (type
      (name type_type)
      (my_type (lknot 0)))
)

(actor
   (initial_value (bool false) (name multiplayer_toggle))
   (owners (knotref #list_knot 0))
   (writers (knotref #list_knot 1))
   (readers (knotref #list_knot 1)))

#list_knot =
(knot
   (actor
      (initial_value (permission_list
         (name owner_list)
         (permits #mako)
      ))
      (owners (lknot 0))
      ((method remove) (lknot 0))
      ((method add) (lknot 0))
      (subscribers (lknot 1)))
   (actor
      (initial_value (permission_list
         (name reader_writer_list)
         (permits #janice #craig)
      ))
      (owners (lknot 0))
      ((method remove) (lknot 0))
      ((method add) (lknot 1))
      (subscribers (lknot 1)))
)

AP programmers shouldn't have to think about knot encoding most of the time. When a new object is serialized, cycles can be detected automatically by just keeping a hashset of the ptr of each object that's included. When a cycle is detected, affected objects will be cut out of the atomic or sprue-batch encodings and encoded as a knot instead. Their IDs will point to knotrefs rather than raw objects.

But I'm not sure how cleanly we can abstract this away.

For instance, if you serialize the twin brothers starting from luigi instead of from mario, you get a different knot object, because the brothers' order in the knot would be different (getting the same knot from any starting point would require a solution to the graph isomorphism problem). That's a little weird. It's possible it introduces problems with canonicalization.

Building on that, another nasty surprise would be if someone made the mistake of including a graphic relation in one of their types (for instance friends) and then tried to batch encode it immutably. Each time they serialized a friend node, it would unexpectedly produce a large knot object encompassing roughly every node in the graph.

But I'm not sure anyone would ever do this accidentally. When you're building an object graph and the objects don't even have IDs yet, you're going to know it!

And large knots aren't large problems. If people want to only view a small part of an immutable graph, say, a single node, without resolving its friends, we can store the knot list as a merkle search tree and then we'll only have to send a logarithmic number of node addresses to confirm the node's presence in the knot. Note that a knot only has to encompass objects that are both pointing to and pointed from other knot objects. Knot objects can point out to non-knot objects, and of course knotrefs allow pointing in.

We could also make the object constructor method throw if it's called on a cyclic structure without maybe_knotted: true being set. I guess as long as most programmers don't need to make knotted structures then they don't necessarily have to know about that setting either. It's also mildly useful to provide cycle-checking just in case people need that at some point.

And a uplink.publish_batch_reference(List<Obj>)-> List<ObjRef> method could handle all cases cleanly. Any objects in the list that references an object that isn't in the list would be required to either be acyclic or to otherwise already have IDs. The refs returned could be knotrefs or regular refs, the programmer wouldn't need to know.

Knot objects still seem a bit weird to me, so I'd appreciate a look from others before I commit to having them. What do you think? Would this be weird and horrible? But why though?

I think there might not be anything weird here at all. If two objects point to each other, they are necessarily part of the same module, they share a unity. Whoever defined them must know about both of them and can define them together. It seems unproblematic, to me, for such objects to have to share a knot structure.

@makoconstruct.bsky.social

2024-12-11T05:50:35.214Z

mako

Post reaction in Bluesky

Reactions from everyone (0)

mako