edit: whitewind has repeatedly lost sections of this post while I was editing it. I'm done with whitewind (other problems include interfering badly with common browser shortcuts like ctrl+l and ctrl+tab, being slow, and various small things). I'll probably be on hackmd.
This was mostly just a draft/prospective thing though.
? specifically, or perhaps it's just a member value, ie, class Signature { date: DateTime, author: Profile, signing: Hash, signature: RSASig }, and perhaps this implicitly defines a number of type parameters, Signature<date: DateTime = Dynamic, author: Profile = Dynamic...>, which default to a Dynamic variant, meaning they can't be checked in type expressions or allocation.*)
- Where a type extends another (its supertype), it can be used anywhere its supertype is expected, despite having additional components of its own. Extension is especially important in cross-org contexts, where a new type often must provide a fallback type (the supertype) for participants who haven't been updated to recognize the extension yet, or where the extension might never become ubiquitous. Any type system can do this. Atproto schemas currently can't. Leaf sort of can, since components are analogous to supertypes (any component type that's always paired with another could be described as a subtype of the other, though additional language would be needed to invalidate entities that present an extension type without the supertype it extends, or to ensure the inheritance structure stays clear even when the recipient is missing some type definitions). JSON-ld can also sort of do this, since typed properties can express a component inheritance hierarchy. This is a massive L for atproto.
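To make the fallback idea concrete, here's a minimal Rust sketch (all names here are hypothetical, not from any real schema): an extension type embeds its supertype whole, so code written against only the supertype keeps working.

```rust
// Hypothetical base type: every participant understands Post.
#[derive(Debug, PartialEq)]
struct Post {
    text: String,
}

// Hypothetical extension: a Post with poll options attached.
// It carries its supertype whole, so old code has a fallback.
struct PollPost {
    base: Post,
    options: Vec<String>,
}

// The extension exposes its supertype; anything written against
// Post works without knowing PollPost exists.
impl AsRef<Post> for PollPost {
    fn as_ref(&self) -> &Post {
        &self.base
    }
}

// A renderer written before PollPost existed.
fn render(post: &impl AsRef<Post>) -> String {
    post.as_ref().text.clone()
}

fn main() {
    let poll = PollPost {
        base: Post { text: "lunch?".to_string() },
        options: vec!["yes".into(), "no".into()],
    };
    // The old renderer handles the new type via its supertype.
    println!("{}", render(&poll));
}
```

A participant that does recognize `PollPost` can use `options` too; everyone else degrades gracefully to the `Post` view.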
Before them, Capnp did a great job with its schemas, though very few adopted Capnp, for mostly, I think, dumb reasons. (I suspect people saw its read-in-place APIs, which are less ergonomic than more conventional copy APIs, and decided the whole thing was a step down. Having in-place APIs didn't at all preclude also writing a copy API, but I guess no fan of Capnp anticipated that the parsing speed granted by the slightly more complex in-place APIs wouldn't be enough to make them popular.)
But a problem shared by Capnp and ATProto is that their types mutate over time. This forbids using content-addresses for types (Capnp addresses types with UUIDs, atproto with DNS-based URLs). Using the content-address as the ID essentially means that anyone can re-host a type, and anyone they serve it to will know that it really is the original type that was defined against that ID, which means these type definitions will never go missing as long as someone's using them. It means it doesn't matter who authored them; they'll be safe forever to build on.
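The re-hosting property follows directly from how content-addressing works. A sketch (using std's `DefaultHasher` purely as a dependency-free stand-in; a real deployment would use a cryptographic hash like SHA-256 over a canonical encoding):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for a real content hash. The ID is derived entirely
// from the type definition's canonical bytes, not from who
// hosts or authored it.
fn content_address(canonical_bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    canonical_bytes.hash(&mut h);
    h.finish()
}

fn main() {
    // Two hosts serving byte-identical definitions produce the
    // same ID, so anyone can re-host a type verifiably.
    let def_a = b"type Signature { date: DateTime, author: Profile }";
    let def_b = b"type Signature { date: DateTime, author: Profile }";
    assert_eq!(content_address(def_a), content_address(def_b));

    // Any alteration yields a different ID: content-addressed
    // types are immutable, which is exactly the tradeoff the
    // next paragraph deals with.
    let def_c = b"type Signature { date: DateTime }";
    assert_ne!(content_address(def_a), content_address(def_c));
    println!("ok");
}
```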
But content-addressed data can't be altered. If you want to change a type, you can define a new one and define translators for the old one. If you want to add a few fields without replacing the type, you can add another component to the entity that has those fields (intersection types). (In languages that lack intersection types, A & B projects down to AWithB { a: A, b: B }. In rust, you could also impl AsRef<A> and AsRef<B> for AWithB (Deref can only target one type per impl), which would make it basically behave equivalently to an intersection type.) If you use a custom glossary, you can assign ids to these type components that are about as concise as capnp's field numbers. If you don't use your own glossary, each component will be id'd with the content-address of its type.
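Spelling out that projection in Rust (`A`, `B`, and `AWithB` are the placeholder names from above; `AsRef` stands in for `Deref`, which can only be implemented once per type):

```rust
// Hypothetical component types, stand-ins for two
// content-addressed type components.
struct A { name: String }
struct B { likes: u32 }

// The projection of the intersection type A & B into a
// language without intersection types.
struct AWithB { a: A, b: B }

// Each AsRef impl recovers one component, so AWithB can be
// passed to code expecting either A or B.
impl AsRef<A> for AWithB {
    fn as_ref(&self) -> &A { &self.a }
}
impl AsRef<B> for AWithB {
    fn as_ref(&self) -> &B { &self.b }
}

// Consumers written against a single component:
fn greet(x: &impl AsRef<A>) -> String {
    format!("hello, {}", x.as_ref().name)
}
fn popularity(x: &impl AsRef<B>) -> u32 {
    x.as_ref().likes
}

fn main() {
    let ab = AWithB {
        a: A { name: "alice".into() },
        b: B { likes: 3 },
    };
    // One value satisfies both "sides" of the intersection.
    println!("{} ({})", greet(&ab), popularity(&ab));
}
```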
You may want to see the schema language. I haven't defined one yet. I've been writing types raw in the underlying cross-language cbor format. It's important that the schema language that most people write in is distinct from the schema format. Languages need to be able to evolve over time and no language should reign forever, while the format needs to be stable.
Help or fight?
I've been talking like these shortcomings are condemnatory of the other formats, but maybe really we should just take those formats and improve them a little. But I don't want to, I'd love to just start from scratch. I don't think there are real benefits to interoperating with these systems given that they're all just microblogging communities (with some of the most toxic commenters you can get), and their third-party login UX sucks (mastodon requires you to type your username in and afaik is multi-stage; bsky keeps logging me out), so, like, what's the point???
But I believe we should try to collaborate anyway, for spiritual reasons.
So I'm going to talk about how pre-existing systems could be made more flexible.
Features:
- A composable, cross-language type system, format, and meta-protocol. A language of languages for social computing. A way of saying all that can easily be said to computers about what your networked objects are.
- A downcast operation that allows modular objects to be extended by anyone with additional fields without ever breaking other peoples' code.
- An upcast operation/Any type that allows, eg, arbitrary objects to be attached to posts in user-extensible web apps with enough type information for the computer to automatically fetch a viewer/editor that the community recommends for that object in that context.
- Modular types were made for [modular web], but they're intended for general use in all languages and frameworks whenever they need to communicate over the network.
- You don't have to validate type components you don't use.
- Modular types are referenced with content-addresses, which means anyone can create a type, and if others use it, the type definition will stay available forever, while also guaranteeing that components of a type will never have ID or member-name collisions. To achieve this, we've expanded on the standard IPLD format by introducing a way of representing self-referential structures, which types often require. This is also sometimes needed for serialisation of object graphs [show examples] in high-level applications.
- type system features:
- Inheritance
- Parametric types
  - Variance (eg, List<int> subtypes List<num>, while Function<num> subtypes Function<int>)
  - Value inputs, eg: for stating a matrix's dimensions as part of its type.
    - With a variable number of inputs ("variadic").
- Intersection types
- Sum types (discriminated unions/case classes/enums)
- If you're working in a language or style that can't handle that kind of type complexity, the binding generators will get it out of the way: type checks can (and usually need to) be done at runtime, so your language doesn't need to understand the types to benefit here. Indeed, no widely used programming language currently supports all of the features of the type system, but so far this seems unproblematic; binding generators/codegen/macros have always been able to represent the data pretty smoothly in the host language.
- Canonical representation: Well, we have it, but mainly when you turn extensibility off (ie, stripped types), because it's actually impossible to have both canonicity and extensibility! Extensibility requires that a deep equals of upcasted objects will sometimes be allowed to disagree with a ref equals, due to the presence of extension components. There's also an implied requirement that the recipient shouldn't have to validate a type component they're not interested in just to access the components they are, which means those extension components will be able to have any type tag and any order while the entity remains valid as an intersection type of its supertypes.
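The canonicity/extensibility conflict can be shown with a toy model (entities modeled as maps from component type ID to value; names hypothetical): two entities that agree under a known type can still differ as whole objects, so no single canonical form covers both views.

```rust
use std::collections::BTreeMap;

// An entity modeled as a bag of components keyed by type ID
// (a toy stand-in for the real format).
type Entity = BTreeMap<&'static str, &'static str>;

// "Stripped" equality: compare only the components named by a
// known type. Extension components are ignored, since the
// recipient shouldn't have to validate components it isn't
// interested in.
fn stripped_eq(known: &[&str], x: &Entity, y: &Entity) -> bool {
    known.iter().all(|k| x.get(k) == y.get(k))
}

fn main() {
    let base: Entity = [("post/text", "hi")].into();
    let extended: Entity =
        [("post/text", "hi"), ("poll/options", "yes,no")].into();

    // Under the base type, the two entities agree...
    assert!(stripped_eq(&["post/text"], &base, &extended));
    // ...but a deep equals of the full objects disagrees, which
    // is exactly why canonicity only holds for stripped types.
    assert_ne!(base, extended);
    println!("ok");
}
```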