[draft]
Modular Types (MT) was made for Modular Web, but it's intended for general use in any language or framework that needs to communicate typed data, or type-tagged data, over the network.
Types are useful
Types tell the computer:
- how to parse the serialization format
- how to represent the objects to users
- how to structure documentation, and help users find it when they need it
- how to generate bindings for different languages
- how to assign globally unique IDs to type components and interface handles of entities, IDs that will never clash
- how to notice, point out, and explain many classes of mistakes that programmers might make, which is critical for terse code (ie, functional styles, where small syntactic differences can completely change behavior) and whenever code that other code depends on needs to change
Types are also just a basic, necessary part of what needs to be nailed down when humans want to agree on a shared data model; they make up a large part of any protocol specification.
So it makes sense for a meta-protocol to have a good distributed type system: a language for specifying types, and a format for representing types to automated tooling.
Commentary on related projects
A few upcoming meta-protocols have a decentralized process for maintaining datatype descriptions, usually called "schemas". ATProto has its "lexicons". Leaf (note: this is out of date; the protocol is being wracked by the demands of various CRDT sync libraries and I'm not sure any of this currently applies) is planning an entity component approach, which JSON-LD also supports.
But most of today's schema systems lack generics, type parameter bounds, dependent type parameters, or interfaces (or all of these things). Some of them lack extension. These are major shortcomings:
- Generics let users plug different types together to make new types with no friction. I'm not totally sure what they'll use that for, specifically, but as protocol advocates we should support decentralized creativity in any way we can. Concretely, though, having generics eliminates most of the reasons people ever have for being vague about types: in Go before generics, a List was considered to contain just any old thing, but with generics there's rarely a reason to be vague about what's in the list.
- Value parameters (dependent type parameters) are parameters that are values, instances, rather than types. Eg, `Matrix<3, 4>` represents a 3 by 4 matrix, taking naturals as parameters. Another common example is the length of fixed-length arrays. More novel examples of value parameters that would emerge in a web operating system are aspects of file type information: the bitrate of an audio file, or the dimensions of a photo or video, could also be part of the type. [todo, this is kinda backwards] In addition to making type composition a lot more flexible, we can also use this feature to key type components with value parameters. For instance, a gift card Message object could be laden with any number of `Signature<author: Profile>` components; even though, in a sense, these all have the same type, they can still be distinguished by providing a different `author` value for each. (This opens the way to a deeper question of whether any value member of a class should be eligible to become a constant type parameter when it's fixed at typecheck time. Eg, in this case, does `author` need to be explicitly defined as a type parameter, or is it perhaps just a member value, ie, `class Signature { date: DateTime, author: Profile, signing: Hash, signature: RSASig }`, and perhaps this implicitly defines a number of type parameters, `Signature<date: DateTime = Dynamic, author: Profile = Dynamic...>`, which default to a Dynamic variant, meaning they can't be checked in type expressions or allocation.)
- Lacking interfaces is especially bad. Interfaces define member functions, ie, how you can interact with a remote actor, ie, a server. The interface is the first and sometimes last thing a user needs to know about an API.
- By "extension" I mean defining a type that extends a pre-existing type, and having it be recognizable and usable as that pre-existing type by any other party who isn't interested in your extensions. Any conventional programming language's type system can do this. It's especially important in a decentralized context, where you can never guarantee that another party will recognize your extensions. ATProto doesn't have it; their approach to adding more fields is currently not really there. Leaf and JSON-LD sort of have this as a result of having an entity component approach: a subtype is essentially just a type that has one of the supertypes among its components.
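To make the value-parameter idea above concrete, here's a sketch in TypeScript, using literal number and string types to stand in for dependent parameters. The names follow the examples above, but the runtime check is a hypothetical illustration, and `author` is a plain string here rather than a Profile, for brevity:

```typescript
// Sketch: value parameters via TypeScript literal types.
// Rows/Cols are values lifted into the type, like Matrix<3, 4> above.
interface Matrix<Rows extends number, Cols extends number> {
  rows: Rows;
  cols: Cols;
  data: number[]; // length must equal rows * cols
}

function matrix<R extends number, C extends number>(
  rows: R,
  cols: C,
  data: number[],
): Matrix<R, C> {
  // Trustless peers re-check this invariant at runtime on receipt anyway.
  if (data.length !== rows * cols) throw new Error("bad dimensions");
  return { rows, cols, data };
}

// Inferred as Matrix<3, 4>, because 3 and 4 are literal types here.
const m = matrix(3, 4, new Array(12).fill(0));

// Components keyed by a value parameter: one entity can carry many
// Signature components, distinguished by their `author` value.
interface Signature<Author extends string> {
  author: Author;
  signature: string;
}

const signatures: [Signature<"alice">, Signature<"bob">] = [
  { author: "alice", signature: "sig-a" },
  { author: "bob", signature: "sig-b" },
];
```

Note that TypeScript checks the literal parameters statically but cannot relate `data.length` to `rows * cols` at the type level, which is exactly the gap a dependent-type-parameter system would close.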
Before them, capnp did a great job with its schemas, though very few adopted capnp, for what I think were mostly dumb reasons. I suspect confusion about its read-in-place APIs, which are less ergonomic than more conventional copy APIs: having in-place APIs didn't at all preclude writing a copy API as well, but I guess nobody who was a fan of capnp anticipated that the extra parsing speed granted by the slightly more complex in-place APIs wouldn't be enough to make them popular.
Lexicons lack extension/inheritance/entities, making them basically inappropriate for decentralized development. Being able to add fields to an existing type, without the involvement of the type's original author and without risking collisions when other extensions use a field of the same name, is a basic need.
I'm not sure I'd want to go to JSON-ld [todo: research json-ld]
A strong type can project down to a weak type without problems. Dependent types translate down to member variables, const where possible. In languages that lack generics, members of type `<T: Bound>` project down to `Bound`. Runtime type checks have to be done on each end anyway due to trustlessness.
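Here's a sketch of that projection rule in TypeScript. `Num` and the list types are illustrative names, not part of any spec:

```typescript
// Sketch: projecting a bounded type parameter down to its bound.
interface Num { value: number }

// In a language with generics, a list member might be typed <T: Num>:
type NumList<T extends Num> = T[];

// Projected down for a language without generics, T just becomes Num:
type ProjectedNumList = Num[];

// Either way, trustlessness means each end re-validates at runtime:
function isNum(x: unknown): x is Num {
  return (
    typeof x === "object" &&
    x !== null &&
    typeof (x as { value?: unknown }).value === "number"
  );
}

function decodeNumList(raw: unknown): ProjectedNumList {
  if (!Array.isArray(raw) || !raw.every(isNum)) {
    throw new Error("runtime type check failed");
  }
  return raw;
}
```

The generic and projected views accept the same wire data; the projected consumer just loses the static knowledge of which `Num` subtype the list holds.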
Why not use capnp? Because capnp schemas are mutable, which introduces friction (a user must generate a UUID to write a new schema), syntactical noise (fields have to be tagged with numbers), security issues (imagine a situation where a false schema serves an important certificate with some field names swapped around [todo: unsure whether this can happen]), and social coordination issues (what if the capnp repo has financial issues? what if a schema's owner starts abusing their position or otherwise blocking protocol evolution?).
Types don't really need to be able to change over time. If you want to change a type, you can define a new one and define translators for the old one. If you want to add a few fields without replacing the type, you can add another component to the entity that has those fields (intersection types). (In languages that lack intersection types, `A & B` projects down to `AWithB { a: A, b: B }`. In Rust, you could also write `impl AsRef<A> for AWithB` and `impl AsRef<B> for AWithB` (`Deref` permits only a single target type), which would make it behave almost equivalently to an intersection type.) If you use a custom glossary, you can assign IDs to these type components that are about as concise as capnp's field numbers. If you don't use your own glossary, each component will be ID'd with the content-address of its type.
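The `AWithB` projection can be sketched in TypeScript, which conveniently has native intersection types for comparison (all names here are illustrative):

```typescript
// Sketch: projecting an intersection type into a language without them.
interface A { a: string }
interface B { b: number }

// Native intersection (a value that is simultaneously an A and a B):
type AB = A & B;
const native: AB = { a: "x", b: 1 };

// Projection for a host language without intersection types:
// the intersection becomes a wrapper holding each part separately.
interface AWithB { a: A; b: B }

function project(v: AB): AWithB {
  return { a: { a: v.a }, b: { b: v.b } };
}

const projected = project(native);
```

The wrapper costs an extra field access (`projected.a.a` instead of `native.a`), which is what the `Deref`/`AsRef` trick in Rust papers over.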
Using the content-address as the ID means that anyone can re-host a type, and anyone they serve it to will know that it really is the original type that was defined against that ID. These type definitions will never go offline; it doesn't matter who authored them, they'll be safe forever to build a standard upon.
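Here's a minimal sketch of deriving a type's ID from its content. The real encoding is the cross-language CBOR/IPLD format; canonical JSON is a simplified stand-in here, and `typeId` is a hypothetical helper name:

```typescript
// Sketch: content-addressing a type definition (simplified; the real
// format is CBOR-based, not JSON).
import { createHash } from "crypto";

type TypeDef = { name: string; fields: Record<string, string> };

// Sort fields so the same definition always yields the same bytes.
function canonicalize(def: TypeDef): string {
  const fields = Object.fromEntries(
    Object.entries(def.fields).sort(([x], [y]) => x.localeCompare(y)),
  );
  return JSON.stringify({ name: def.name, fields });
}

function typeId(def: TypeDef): string {
  return createHash("sha256").update(canonicalize(def)).digest("hex");
}

// Anyone can re-host this definition; consumers just check that the
// served bytes hash back to the ID they already hold.
const id = typeId({
  name: "Signature",
  fields: { author: "Profile", signature: "RSASig" },
});
```

Because the ID is a function of the definition's bytes, a re-hosted copy either matches the ID or is detectably not the original.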
You may want to see the schema language. I haven't defined one yet; I've been writing types raw in the underlying cross-language CBOR format. It's important that the schema language most people write in be distinct from the schema format: languages need to be able to evolve over time, and no language should reign forever, while the format needs to be stable.
Features:
- A composable, cross-language type system, format, and meta-protocol. A language of languages for social computing. A way of saying all that can easily be said to computers about what your networked objects are.
- A downcast operation that allows modular objects to be extended by anyone with additional fields without ever breaking other peoples' code.
- An upcast operation/Any type that allows, eg, arbitrary objects to be attached to posts in user-extensible web apps with enough type information for the computer to automatically fetch a viewer/editor that the community recommends for that object in that context.
- Modular types are referenced with content-addresses, which means anyone can create a type, and if others use it, the type definition will stay available forever, while also guaranteeing that components of a type will never have ID or member name collisions. To achieve this, we've expanded on the standard IPLD format by introducing a way of representing self-referential structures, which types often require. This is also sometimes needed for serialization of object graphs [show examples] in high level applications.
- type system features:
  - Inheritance
  - Parametric types
    - variance (eg, `List<int>` subtypes `List<num>`, while `Function<num>` subtypes `Function<int>`)
    - non-type inputs ("dependent types"), eg, for stating a matrix's dimensions as part of its type
    - a variable number of inputs ("variadic")
  - Intersection types
  - Sum types (discriminated unions/case classes/enums)
- This type system is quite sophisticated/complex, but if you're working in a language or style that can't handle that kind of type complexity, the binding generators will get it out of the way: type checks can (and usually need to) be done at runtime, so your language doesn't need to understand the types to benefit. Indeed, no widely used programming language currently supports all of the features of this type system, but so far that seems fine; binding generators/codegen/macros have always been able to represent the data pretty smoothly in the host language.
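The downcast/upcast features above can be sketched together in TypeScript. The `Any` shape, the type IDs, and the `viewers` registry are all illustrative names; in practice the viewer lookup would be a networked, community-driven query:

```typescript
// Sketch: an Any-typed attachment carrying a type ID, so a client can
// dispatch to a recommended viewer, and unknown extensions degrade safely.
interface Any { typeId: string; value: unknown }

interface Post { text: string; attachments: Any[] }

// Hypothetical registry of community-recommended viewers for this context.
const viewers = new Map<string, (v: unknown) => string>([
  ["example/poll-v1", (v) => `poll: ${JSON.stringify(v)}`],
]);

function renderAttachment(a: Any): string {
  const viewer = viewers.get(a.typeId);
  // Unrecognized extensions never break rendering; they just fall back.
  return viewer ? viewer(a.value) : `[no viewer for ${a.typeId}]`;
}

const post: Post = {
  text: "which snack?",
  attachments: [
    { typeId: "example/poll-v1", value: { options: ["a", "b"] } },
    { typeId: "example/someones-extension", value: { x: 1 } },
  ],
};

const rendered = post.attachments.map(renderAttachment);
```

This is the downcast guarantee in miniature: code written against `Post` keeps working no matter what attachment types other parties invent.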