Content-addressing is good. It's a way of giving objects a unique global ID that anyone can verify on their own is the correct ID for that object. This can cut down on a lot of network requests and it's a qualitative improvement in the resilience of the data, and the openness/fairness/cheapness/security of the hosting provider arena.
I want the modular types (MT) protocol format/language to have content-addressed IDs right away, because another thing it gives us is a way of assigning globally unique IDs locally without having to wait on a dialog with the network. With content-addressing, a user can create an object and start writing edits against it before the network knows anything about it.
Content-addressed structures are immutable, if you change the content, the ID changes too and so it will not be identified as the same object. So how do we have mutation? A mutable MT object's ID is the content-address of the definition of that object. That definition states the initial state of the object, and also describes the way mutations can be done. For instance, the definition might say that the object is hosted by a particular host (ie, that this host will be trusted to report the whole edit chain and most recent state), but it may also say that any edit must be signed by another Identity who 'owns' the object, who the host is hosting the object on behalf of. It may also constrain the type of the object, and which edits are allowed.
Another limitation is that content-addressed structures necessarily cannot contain their own content-address, nor the content-addresses of any object that contains their content-address. You don't have the content-id until you've already fixed the content. Put the content ID in and and the content will no longer be what it was. A graph linked together by content-addresses will never contain a cycle, it will always be a DAG.
This turns out to be a problem, because it doesn't take much foresight to realize that we need cyclical structures. When designing modular types, the need arrived literally immediately from multiple directions:
- It would be nice to be able to have content-addressed web pages that can link out to pages that also happen to link back to it.
- If you want a distributed type system, which you should — because types/schemas are good and they're the kinds of things you should want immutable ownerless schemas for — you will need to define the intrinsic
TypeType
, which, of course, is its own type, so has to have a reference to itself in its type field. We could separate type tag from value (which could potentially make sprue encoding much more compact...), in which case this could be handled unproblematically asTypedObjectRef { type: #TypeType, value: #TypeType }
, but... - Self-referential types are just really common. A
LinkedList
must refer to anOption<LinkedList>
. A recoverableIdentity
has to refer tosocial_recovery_authorities: List<Identity>
. - The Type type is also an enum type. We can't define our enum types without cycles, because an enum type definition basically consists of a list of its allowed variant types, but the variants inherit (or can be casted to) the parent enum (it's a bit rare in a type system for enum variants/cases to inherit the parent enum, but it's so natural, I don't know why it's rare, I think it shouldn't be.
Some
is anOption
imo! :<). - You also need cycles to define simple permissioning configurations for shared mutable objects.
- There's this very simple paradigm where every object has lists of who's allowed to edit that object (The actual edits might be processed using capabilities, but revocable capabilities are only possible if you're keeping a record of what was issued and to who, and holding editors acountable is much easier if everyone can see the list, so an access list is just a really natural UX), and the lists are just permissioned objects themselves, which have their own permissioned lists.
- Let's imagine such an object. One of its permission lists is the read permission list. The people on the read list are able to subscribe to be notified of future updates to that object. What's the read permission list of the read permission list? Usually it should be the same read permission list. It is its own readlist. And who's on the add new read permission permission list of the reader list? It's the reader list itself again. What about the readlist's write list? (who has permission to remove people from the read list?) Well this time it should probably be the object's write list. But the write list should also be its own add list. And then there's an owner list that has the ability to add and remove writers and readers, but who owns the owner list? Usually it should be the owner list itself, but crucially, it isn't always! None of the self-references here can just be assumed, they could sometimes be otherwise, there are all sorts of permissioning schemes that we could express with this kind of language. Sometimes you'd want the owner of the owner list to be a separate oversight group, or sometimes you'd want the lists to not be readable to anyone except owners. We need the data to be able to express these kind of structures.
So if you think content-addressed object graphs have to be acyclic, you can't do any of this. Well, they basically don't! There's a pretty good solution.
A less good solution would be to have a mutable object that's been defined with no initial value. After defining it, its ID will be known (being the content-address of its definition). We can then define our self-referrential initial value and insert that by publishing a mutation. Unfortunately, this approach makes the content-hash kind of meaningless. We wouldn't be able to put the initial state in the definition being addressed, nor the permissioning objects (because they're cyclic too), so, since we've said very little about the object in its definition, we're completely trusting the mutation system/active hosts to tell us whether the current value they're serving us against this ID is correct. I hate this workaround. Especially for types, which aren't even supposed to be mutable! The prospect of being able to put a detailed prescription of how a mutable object would be managed into its definition object was exciting and with this would basically be the end of that.
So I ruminated on the problem for a bit, and I noticed another way of doing it:
A "burl" is a list of objects, defined together, that can refer to each other using lburl refs.
#a64029 = (burl
(twin_brother
(name mario)
(other_brother (lburl 1)))
(twin_brother
(name luigi)
(other_brother (lburl 0)))
)
[edit: 'knot' was the term I was using when I drew this. burl = knot.]
We can then refer to luigi with (burlref #a64029 1)
and this will resolve as a pair of objects who refer to each other, immutably, forever.
Edit: I've noticed relative paths are a more elegant way of representing "burlrefs" and "lburls". I'll just define a path type. IPLD's doesn't seem to say anything about array indices, and they've explicitly forbidden having cycles. I don't know why they're trying to forbid that. It wont work :] there are admissions in various places that they can't enforce this. I contend that this is because cycles are natural.
So we'll just always refer to objects using paths, rather than root content-addresses, just in case the object is party to a cycle.
Some more examples
(burl
(type
(name type_type)
(my_type (lburl 0)))
)
(actor
(initial_value (bool false) (name multiplayer_toggle))
(owners (burlref #list_burl 0))
(writers (burlref #list_burl 1))
(readers (burlref #list_burl 1)))
#list_burl =
(burl
(actor
(initial_value (permission_list
(name owner_list)
(permits #mako)
))
(owners (lburl 0))
((method remove) (lburl 0))
((method add) (lburl 0))
(subscribers (lburl 1)))
(actor
(initial_value (permission_list
(name reader_writer_list)
(permits #janice #craig)
))
(owners (lburl 0))
((method remove) (lburl 0))
((method add) (lburl 1))
(subscribers (lburl 1)))
)
Modular programmers shouldn't have to think about burl encoding most of the time. When a new object is serialized, cycles can be detected automatically by just keeping a hashset of the ptr of each object that's included. When a cycle is detected, affected objects will be cut out of the atomic or sprue-batch encodings and encoded as a burl instead. Their IDs will point to burlrefs rather than raw objects.
But I'm not sure how cleanly we can abstract this away.
For instance, if you serialize the twin brothers starting from luigi instead of from mario, you get a different burl object, because the brothers' order in the burl would be different (getting the same burl from any starting point would require a solution to the graph isomorphism problem). That's a little weird. It's possible it introduces problems with canonicalization.
Building on that, another nasty surprise would be if someone made the mistake of including a graphic relation in one of their types (for instance friends
) and then tried to batch encode it immutably. Each time they serialized a friend node, it would unexpectedly produce a large burl object encompassing roughly every node in the graph.
But I'm not sure anyone would ever do this accidentally. When you're building an object graph and the objects don't even have IDs yet, you're going to know it!
And large burls aren't large problems. If people want to only view a small part of an immutable graph, say, a single node, without resolving its friends
, we can store the burl list as a merkle search tree and then we'll only have to send a logarithmic number of node addresses to confirm the node's presence in the burl. Note that a burl only has to encompass objects that are both pointing to and pointed from other burl objects. burl objects can point out to non-burl objects, and of course burlrefs allow pointing in.
We could also make the object constructor method throw if it's called on a cyclic structure without maybe_burled: true
being set. I guess as long as most programmers don't need to make burlted structures then they don't necessarily have to know about that setting either. It's also mildly useful to provide cycle-checking just in case people need that at some point.
And a uplink.commitAll(List<Obj>)-> List<ObjRef>
method could handle all cases cleanly. Any objects in the list that references an object that isn't in the list would be required to either be acyclic or to otherwise already have IDs. The refs returned could be burlrefs or regular refs, the programmer wouldn't need to know.
Burl objects still seem a bit weird to me, so I'd appreciate a look from others before I commit to having them. What do you think? Would this be weird and horrible? But why though?
I think there might not be anything weird here at all. If two objects point to each other, they are necessarily part of the same module, they share a unity. Whoever defined them must know about both of them and will have no substantial difficulties defining them together.