The Leaf Protocol is a data format on top of the Willow Protocol. Willow deals with how data is replicated and stored amongst peers, providing access control as well as key-value-based byte storage. On top of Willow, Leaf provides:
- A schema system and serialization format.
- A standard for creating an Entity-Component "Web of Data".
See Introducing the Leaf Protocol for a high-level overview of the ideas and motivation.
ℹ️ Note: This specification is a work in progress and has not yet been updated with the latest ideas. See the Leaf issues for pending modifications.
The three major building blocks of the Leaf Protocol data model are `Entity`s, `Component`s, and `Schema`s.
An `Entity` represents any distinct "thing". This could be a chat message, a blog post, a comment, a feed, a profile, or anything else.
Entities store data by attaching `Component`s to them, and they may also be the target of `Link`s.
Each `Entity` is stored in a Willow `Namespace`, under a specific `Subspace` and `Path`. The `PathComponent`s must follow the rules below.
Note: In this spec we refer to Willow path components as `PathComponent`s, to distinguish them from the `Component`s of `Entity`s.
Each `PathComponent` for an `Entity`, other than the last one, must be Borsh serialized data matching the following format:
```rust
enum PathComponent {
    Null,
    Bool(bool),
    Uint(u64),
    Int(i64),
    String(String),
    Bytes(Vec<u8>),
}
```
Additionally, the last `PathComponent` in the `Path` for an `Entity` must always be empty, i.e. zero bytes.
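The two rules above can be sketched in Rust as follows. This sketch uses only the standard library and hand-encodes Borsh rather than using the `borsh` crate. The names `encode_path_component` and `entity_path` are hypothetical and not part of any published Leaf API. Each returned `Vec<u8>` is one Willow `PathComponent`.

```rust
/// Mirrors the spec's `PathComponent` enum. Borsh writes the variant
/// index (0 = Null ... 5 = Bytes) as a single `u8` tag before any payload.
#[derive(Debug, Clone)]
pub enum PathComponent {
    Null,
    Bool(bool),
    Uint(u64),
    Int(i64),
    String(String),
    Bytes(Vec<u8>),
}

/// Borsh layout for strings and byte vectors: u32 little-endian length, then bytes.
fn write_len_prefixed(out: &mut Vec<u8>, bytes: &[u8]) {
    out.extend_from_slice(&(bytes.len() as u32).to_le_bytes());
    out.extend_from_slice(bytes);
}

/// Borsh-encodes one non-final `PathComponent` of an entity path.
pub fn encode_path_component(c: &PathComponent) -> Vec<u8> {
    let mut out = Vec::new();
    match c {
        PathComponent::Null => out.push(0),
        PathComponent::Bool(b) => out.extend_from_slice(&[1, *b as u8]),
        PathComponent::Uint(n) => {
            out.push(2);
            out.extend_from_slice(&n.to_le_bytes());
        }
        PathComponent::Int(n) => {
            out.push(3);
            out.extend_from_slice(&n.to_le_bytes());
        }
        PathComponent::String(s) => {
            out.push(4);
            write_len_prefixed(&mut out, s.as_bytes());
        }
        PathComponent::Bytes(b) => {
            out.push(5);
            write_len_prefixed(&mut out, b);
        }
    }
    out
}

/// Builds an entity's Willow path: each given component is Borsh-encoded,
/// then the mandatory empty (zero-byte) final component is appended.
pub fn entity_path(components: &[PathComponent]) -> Vec<Vec<u8>> {
    let mut path: Vec<Vec<u8>> = components.iter().map(encode_path_component).collect();
    path.push(Vec::new());
    path
}
```

For example, `entity_path(&[PathComponent::String("Hello".to_string()), PathComponent::Uint(1)])` yields three components, and the last one is empty.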
ℹ️ Explanation: The empty `PathComponent` at the end of each `Path` makes sure that creating an entity will never accidentally trigger prefix pruning and cause other entities to be deleted. See "prefix pruning" in the Willow data model.
This means it is possible to store an entity at `"Hello"/"World"/1/[empty]` and still store an entity at `"Hello"/"World"/[empty]` without overwriting it. The third component of the second path is empty, while the third component of the first path is the encoding of `1`, so neither path is a prefix of the other.
This is incredibly useful for allowing "Feed" entities, or other similar group entities, that describe the purpose of the entities in their sub-paths or add metadata relevant to them.
This also means that each `NamespaceId` + `SubspaceId` pair has one "default" entity: the one with only a single, empty `PathComponent`. It can be used to describe the subspace.
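The pruning argument can be checked with a small sketch of Willow's component-wise prefix relation. The sketch assumes paths are modeled as lists of Borsh-encoded components. The helpers `is_prefix`, `string_component`, and `uint_component` are hypothetical.

```rust
/// A Willow path, modeled as a list of raw (already encoded) path components.
type Path = Vec<Vec<u8>>;

/// Willow-style prefix test: `prefix` is a prefix of `path` when each of its
/// components equals the component at the same position in `path`.
/// Components are compared whole, never byte-by-byte across boundaries.
fn is_prefix(prefix: &Path, path: &Path) -> bool {
    prefix.len() <= path.len() && prefix.iter().zip(path).all(|(a, b)| a == b)
}

/// Borsh encoding of `PathComponent::String(s)`: tag 4, u32 length, bytes.
fn string_component(s: &str) -> Vec<u8> {
    let mut out = vec![4];
    out.extend_from_slice(&(s.len() as u32).to_le_bytes());
    out.extend_from_slice(s.as_bytes());
    out
}

/// Borsh encoding of `PathComponent::Uint(n)`: tag 2, then the u64 little-endian.
fn uint_component(n: u64) -> Vec<u8> {
    let mut out = vec![2];
    out.extend_from_slice(&n.to_le_bytes());
    out
}
```

With the paths from the explanation, `is_prefix(&feed, &item)` is `false`. The feed path's third component is empty, while the item path's third component encodes `1`. If the empty terminator were dropped, the feed path would become a prefix of the item path, and writing the feed entity could prune the item.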
The `Payload` of an entity must be a sorted list of `ComponentId`s.
Note: Since each `ComponentId` is a `PayloadDigest`, the list must be sorted according to the total order of `PayloadDigest`s. The Willow protocol requires that `PayloadDigest`s have a total order.
This sorted list of `ComponentId`s is called an `EntitySnapshot`.
The `PayloadDigest` of the `EntitySnapshot` is called an `EntitySnapshotId`.
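Here is a sketch of building an `EntitySnapshot` payload. The spec does not pin down the byte layout, so two assumptions are made and labeled in the code:
- `ComponentId`s are 32-byte digests, ordered lexicographically by their bytes.
- The sorted list is Borsh-encoded as a `Vec`.

The `EntitySnapshotId` would then be the digest of these bytes under the Willow instantiation's digest function, which is not shown here.

```rust
/// Assumption: `ComponentId`s are 32-byte digests whose total order is
/// plain lexicographic byte order.
type ComponentId = [u8; 32];

/// Builds the payload of an `EntitySnapshot` from the attached components.
/// Assumption: the sorted list is Borsh-encoded as `Vec<ComponentId>`,
/// i.e. a u32 little-endian count followed by each 32-byte ID.
fn entity_snapshot_payload(mut ids: Vec<ComponentId>) -> Vec<u8> {
    ids.sort(); // [u8; 32] implements Ord lexicographically
    let mut out = Vec::with_capacity(4 + ids.len() * 32);
    out.extend_from_slice(&(ids.len() as u32).to_le_bytes());
    for id in &ids {
        out.extend_from_slice(id);
    }
    out
}
```

Because the IDs are sorted first, attaching the same set of components in any order produces the same payload, and therefore the same `EntitySnapshotId`.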
`Component`s are pieces of data that may be attached to an `Entity`. The `PayloadDigest` of a `Component` is called its `ComponentId`.
Components are stored individually in a content-addressable store. This is usually the same store the Willow implementation uses for storing `Entity` `Payload`s.
The data of a `Component` is Borsh serialized data matching the following format:
```rust
enum Component {
    Unencrypted(ComponentData),
    Encrypted {
        algorithm: EncryptionAlgorithmId,
        key_id: [u8; 32],
        encrypted_data: Vec<u8>,
    },
}

struct ComponentData {
    schema: SchemaId,
    data: Vec<u8>,
}
```
In the `ComponentData` struct, `schema` is the `SchemaId` of the `Schema` that describes the `data` field.
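For illustration, here is how the unencrypted case could be Borsh-encoded by hand. The 32-byte `SchemaId` is an assumption; it is whatever `PayloadDigest` the Willow instantiation uses. The function names are hypothetical.

```rust
/// Assumption: a `SchemaId` is a 32-byte `PayloadDigest`.
type SchemaId = [u8; 32];

struct ComponentData {
    schema: SchemaId,
    data: Vec<u8>,
}

/// Borsh-encodes `ComponentData`: the 32 schema bytes, then `data`
/// as a u32 little-endian length followed by the bytes.
fn encode_component_data(c: &ComponentData) -> Vec<u8> {
    let mut out = Vec::with_capacity(32 + 4 + c.data.len());
    out.extend_from_slice(&c.schema);
    out.extend_from_slice(&(c.data.len() as u32).to_le_bytes());
    out.extend_from_slice(&c.data);
    out
}

/// Borsh-encodes `Component::Unencrypted(data)`: variant tag 0, then the struct.
fn encode_unencrypted_component(c: &ComponentData) -> Vec<u8> {
    let mut out = vec![0u8];
    out.extend(encode_component_data(c));
    out
}
```

The bytes in `data` are themselves expected to follow the `format` of the referenced schema. This sketch treats them as opaque.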
Components may be either encrypted or unencrypted. The Leaf Protocol allows any number of `EncryptionAlgorithm`s to be implemented, and it is up to implementations to choose which algorithms to support.
When a component is encrypted, the Borsh serialized `ComponentData` struct is encrypted using the algorithm and stored in the `encrypted_data` field.
The interpretation of `key_id` may differ between encryption algorithms. For example, in asymmetric key algorithms this field may store the public key. An algorithm may choose to put the entire key into the `key_id` field. Alternatively, it may store the key in the content-addressed store and put the key's `PayloadDigest` in the `key_id` field.
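The sketch below shows the encryption flow described above. The Borsh-encoded `ComponentData` is the plaintext, and its ciphertext goes into `encrypted_data`. Some names and choices are assumptions:
- The `Cipher` trait and `XorToy` type are hypothetical stand-ins for whatever `EncryptionAlgorithm`s an implementation supports.
- XOR with a repeating key is used only because it is short. It is not secure.
- `key_id` is passed through as an opaque value.
- IDs are assumed to be 32 bytes.

```rust
/// Assumption: IDs are 32-byte digests.
type SchemaId = [u8; 32];
type EncryptionAlgorithmId = [u8; 32];

/// Hypothetical per-algorithm interface an implementation might expose.
trait Cipher {
    fn id(&self) -> EncryptionAlgorithmId;
    fn encrypt(&self, plaintext: &[u8]) -> Vec<u8>;
}

/// Toy XOR "cipher" used only to show the data flow. It is NOT secure.
/// `secret` must be non-empty.
struct XorToy {
    secret: Vec<u8>,
}

impl Cipher for XorToy {
    fn id(&self) -> EncryptionAlgorithmId {
        [0xEE; 32] // placeholder ID, not a real algorithm digest
    }
    fn encrypt(&self, plaintext: &[u8]) -> Vec<u8> {
        plaintext.iter().zip(self.secret.iter().cycle()).map(|(p, k)| p ^ k).collect()
    }
}

/// Borsh-encodes `ComponentData { schema, data }`.
fn encode_component_data(schema: &SchemaId, data: &[u8]) -> Vec<u8> {
    let mut out = schema.to_vec();
    out.extend_from_slice(&(data.len() as u32).to_le_bytes());
    out.extend_from_slice(data);
    out
}

/// Borsh-encodes `Component::Encrypted { algorithm, key_id, encrypted_data }`
/// (variant tag 1). `encrypted_data` is the ciphertext of the Borsh-encoded
/// `ComponentData`; `key_id` is copied through without interpretation.
fn encode_encrypted_component(cipher: &dyn Cipher, key_id: &[u8; 32], schema: &SchemaId, data: &[u8]) -> Vec<u8> {
    let ciphertext = cipher.encrypt(&encode_component_data(schema, data));
    let mut out = vec![1u8];
    out.extend_from_slice(&cipher.id());
    out.extend_from_slice(key_id);
    out.extend_from_slice(&(ciphertext.len() as u32).to_le_bytes());
    out.extend_from_slice(&ciphertext);
    out
}
```

Note that the schema ID sits inside the ciphertext. A reader without the key therefore cannot even tell which schema an encrypted component uses.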
ℹ️ Explanation: This design allows individual `Component`s of an `Entity` to be encrypted even if other components are not. This could be useful on a user profile, for example, where the user's email address might be encrypted so that the user can share it only with specific users or services.
This does not prevent you from using Willow's own encryption mechanisms to encrypt the entire `Entity` or its `Path`.
A `Schema` is a description of the data in a component. The `PayloadDigest` of a `Schema` is called a `SchemaId`.
The data of a schema is Borsh serialized data matching the following format:
```rust
struct Schema {
    name: String,
    format: BorshSchema,
    specification: EntitySnapshotId,
}
```
The `name` of the schema is a human-readable name, for documentation purposes only.
The `format` is a Borsh serialized `BorshSchema`. This `BorshSchema` may be used to deserialize the `Component`'s `data`.
The `specification` is an `EntitySnapshotId` that represents the human-readable specification describing how the component data is meant to be interpreted.
ℹ️ Explanation (Component Specifications): The `format` in a schema is enough information to deserialize the component data. It does not, however, give humans enough information to understand how the data should be used in an application. For example, two different components might have exactly the same `format`, containing a single `String`, even though one is meant to be an email address and the other a name. The specification distinguishes them from each other and provides guidance on how applications are meant to use the data.
ℹ️ Explanation (Documenting Specifications): `Schema`s, `EncryptionAlgorithm`s, and `KeyResolver`s all use an `EntitySnapshot` to document their `specification`s. This means that the documentation itself is described by the `Component`s in that `EntitySnapshot`.
The simplest form of documentation would be to add a single UTF-8 component to the `EntitySnapshot`, containing a human explanation of the specification. Alternatives could include a `Markdown` component or an `HTML` component. This is intentionally flexible, and may even include WASM modules if useful (see the note on `EncryptionAlgorithm`s).
Each `Component` used to document a specification must have its own `Schema`, with its own specification. You can therefore always follow the chain of specification components and their schemas until you reach a UTF-8 component.
`Schema` `specification`s are `EntitySnapshot`s that, in turn, contain `Component`s with their own `Schema`s, and all of them are linked by digest. Because of this, it is impossible to create a `Schema` that uses itself in its specification documentation. A `SchemaId` is a digest of data that includes the `specification`, so such a schema would have to contain, indirectly, its own digest. In other words, `Schema`s and `specification`s form a Directed Acyclic Graph (DAG).
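Because the graph is acyclic, following specifications always terminates. The sketch below walks from a schema through its specification snapshot's components to their schemas. It uses hypothetical in-memory maps in place of a content-addressed store and assumes 32-byte IDs. The `seen` set only avoids revisiting schemas that are shared by several components.

```rust
use std::collections::{HashMap, HashSet};

/// Assumption: every ID is a 32-byte `PayloadDigest`.
type Id = [u8; 32];

/// Hypothetical in-memory stand-in for a content-addressed store.
struct Store {
    /// SchemaId -> that schema's `specification` (an EntitySnapshotId).
    schema_spec: HashMap<Id, Id>,
    /// EntitySnapshotId -> the ComponentIds in that snapshot.
    snapshots: HashMap<Id, Vec<Id>>,
    /// ComponentId -> the SchemaId of that component.
    component_schema: HashMap<Id, Id>,
}

/// Returns every schema reachable from `start` by following
/// specification -> components -> their schemas, in visiting order.
/// Entries missing from the store simply end that branch of the walk.
fn schemas_in_chain(store: &Store, start: Id) -> Vec<Id> {
    let mut seen = HashSet::new();
    let mut order = Vec::new();
    let mut stack = vec![start];
    while let Some(schema) = stack.pop() {
        if !seen.insert(schema) {
            continue; // already visited via another component
        }
        order.push(schema);
        let Some(spec) = store.schema_spec.get(&schema) else { continue };
        for component in store.snapshots.get(spec).into_iter().flatten() {
            if let Some(next) = store.component_schema.get(component) {
                stack.push(*next);
            }
        }
    }
    order
}
```

Consider a `Name` schema documented by a single UTF-8 component. The walk visits `Name` and then the UTF-8 schema, whose empty specification ends the chain.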
This means that the first schema ever created must have its `specification` set to the empty `EntitySnapshot`. Such a schema is called an unspecified schema.
Every specified schema must eventually, somewhere down the chain of components and their schemas, be documented by an unspecified schema. This is not ideal, so we define one special case of unspecified schema: the UTF-8 Schema.
When all of the following are true of a `Schema`, it describes the `UTF-8` schema:
- The `name` is set to `UTF-8`
- The `specification` is set to the ID of the empty `EntitySnapshot`
- The `format` is set to `BorshSchema::String`
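As a worked example, the bytes of the `UTF-8` schema can be derived from the rules above. The variant index 14 follows from the position of `String` in the `BorshSchema` enum, counting `Null` as 0. The real ID of the empty `EntitySnapshot` depends on the digest function and payload encoding in use. The sketch therefore uses a clearly labeled placeholder and assumes a 32-byte `EntitySnapshotId`. The UTF-8 `SchemaId` would be the digest of the resulting bytes.

```rust
/// Hypothetical placeholder: the real ID of the empty `EntitySnapshot`
/// depends on the digest function and payload encoding in use.
const EMPTY_SNAPSHOT_ID: [u8; 32] = [0; 32];

/// Borsh variant index of `BorshSchema::String` (Null = 0, Bool = 1,
/// U8 = 2, ..., F64 = 13, String = 14).
const BORSH_SCHEMA_STRING: u8 = 14;

/// Borsh-encodes `Schema { name, format, specification }` for a schema
/// whose `format` is a unit variant of `BorshSchema`, given by its tag.
/// Assumption: `EntitySnapshotId` is 32 bytes.
fn encode_schema_with_unit_format(name: &str, format_tag: u8, specification: &[u8; 32]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(name.len() as u32).to_le_bytes()); // String: u32 length
    out.extend_from_slice(name.as_bytes()); // ...then the UTF-8 bytes
    out.push(format_tag); // BorshSchema variant tag
    out.extend_from_slice(specification); // EntitySnapshotId
    out
}

/// The serialized `UTF-8` schema (under the placeholder empty-snapshot ID).
fn utf8_schema_bytes() -> Vec<u8> {
    encode_schema_with_unit_format("UTF-8", BORSH_SCHEMA_STRING, &EMPTY_SNAPSHOT_ID)
}
```

The result is 42 bytes: a 4-byte length, the 5 bytes of `UTF-8`, the 1-byte format tag, and the 32-byte specification ID.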
All data of type `BorshSchema::String` must be valid UTF-8. The `UTF-8` Schema has no specific meaning beyond its own contents, and it is primarily meant for use in the `specification`s of other components.
When any schema other than the UTF-8 Schema has its `specification` set to the empty `EntitySnapshot`, it is called an unspecified schema.
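The three cases can be told apart mechanically. This sketch makes the following assumptions:
- The caller supplies the ID of the empty `EntitySnapshot`, since that ID depends on the digest function.
- `format` is kept as its Borsh bytes, so `BorshSchema::String` is the single byte `14`.
- The `Schema` mirror and the `SchemaKind` enum are hypothetical.

```rust
/// A decoded `Schema`, keeping `format` as its Borsh bytes for brevity.
struct Schema {
    name: String,
    format: Vec<u8>,
    specification: [u8; 32], // assumption: 32-byte EntitySnapshotId
}

#[derive(Debug, PartialEq)]
enum SchemaKind {
    Utf8,
    Unspecified,
    Specified,
}

/// Classifies a schema using the rules above. The caller supplies the ID of
/// the empty `EntitySnapshot`, since it depends on the digest function.
fn classify(schema: &Schema, empty_snapshot_id: &[u8; 32]) -> SchemaKind {
    const BORSH_SCHEMA_STRING: u8 = 14; // tag of BorshSchema::String
    if schema.specification != *empty_snapshot_id {
        SchemaKind::Specified
    } else if schema.name == "UTF-8" && schema.format == [BORSH_SCHEMA_STRING] {
        SchemaKind::Utf8
    } else {
        SchemaKind::Unspecified
    }
}
```

An app might use such a check to warn when it meets an unspecified schema, since the data's meaning cannot be recovered from the schema alone.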
Unspecified schemas are generally discouraged. Although the `format` describes the data layout of components that use the schema, it gives no indication of how apps or humans are meant to interpret that data. This makes unspecified schemas ambiguous: one app may interpret an unspecified schema differently from another.
Still, nothing prevents the creation of unspecified schemas, so they are allowed to exist and be used on `Entity`s.
`BorshSchema`s are used to describe the binary format of `Component` `data`. `BorshSchema`s are themselves serialized with Borsh according to the following format:
```rust
// Borsh encodes the variant index as a u8 tag, so the variant order is significant.
enum BorshSchema {
    Null,
    Bool,
    U8,
    U16,
    U32,
    U64,
    U128,
    I8,
    I16,
    I32,
    I64,
    I128,
    F32,
    F64,
    String,
    Option {
        schema: BorshSchema,
    },
    Array {
        schema: BorshSchema,
        len: u32,
    },
    Struct {
        fields: Vec<(String, BorshSchema)>,
    },
    Enum {
        variants: Vec<(String, BorshSchema)>,
    },
    Vector {
        schema: BorshSchema,
    },
    Map {
        key: BorshSchema,
        value: BorshSchema,
    },
    Set {
        schema: BorshSchema,
    },
    Blob,
    Snapshot,
    Link,
}
```
The `BorshSchema` allows us to represent the Borsh data model so that we can deserialize component data with it. We make a couple of modifications to the normal Borsh data model:
- We remove tuples. Structs are clearer, and because field names live in the schema rather than in the component data, they take up no more space per component.
- We add `Blob`, `Snapshot`, and `Link` types.
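To show how a `BorshSchema` can drive deserialization, the sketch below checks whether some bytes hold exactly one value of a given schema. It follows standard Borsh layouts:
- little-endian integers;
- `u8` tags for enums and `Option`;
- `u32` length prefixes for strings, vectors, maps, and sets.

It also handles the Leaf additions. The following are assumptions:
- `Blob`, `Snapshot`, and `KeyResolverId` values are 32-byte digests.
- `Array` holds `len` elements with no length prefix, as Borsh does for fixed-size arrays.

The sketch does not check Borsh's canonical ordering of map and set entries, and it does not reject float NaN values.

```rust
/// Rust mirror of the spec's `BorshSchema`; recursive fields are boxed.
enum BorshSchema {
    Null, Bool, U8, U16, U32, U64, U128, I8, I16, I32, I64, I128, F32, F64, String,
    Option { schema: Box<BorshSchema> },
    Array { schema: Box<BorshSchema>, len: u32 },
    Struct { fields: Vec<(String, BorshSchema)> },
    Enum { variants: Vec<(String, BorshSchema)> },
    Vector { schema: Box<BorshSchema> },
    Map { key: Box<BorshSchema>, value: Box<BorshSchema> },
    Set { schema: Box<BorshSchema> },
    Blob, Snapshot, Link,
}
type S = BorshSchema;

/// Splits `n` bytes off the front of `input`; None if too few remain.
fn take<'a>(input: &mut &'a [u8], n: usize) -> Option<&'a [u8]> {
    let slice: &'a [u8] = *input;
    let (head, tail) = (slice.get(..n)?, slice.get(n..)?);
    *input = tail;
    Some(head)
}
fn read_u8(input: &mut &[u8]) -> Option<u8> { take(input, 1).map(|b| b[0]) }
fn read_u32(input: &mut &[u8]) -> Option<u32> {
    take(input, 4).map(|b| u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}
/// Skips a u32-length-prefixed byte string (Borsh `Vec<u8>` layout).
fn skip_bytes(input: &mut &[u8]) -> Option<()> {
    let n = read_u32(input)? as usize;
    take(input, n).map(|_| ())
}

/// Consumes one value of type `schema` from `input`; None if malformed.
fn skip(schema: &BorshSchema, input: &mut &[u8]) -> Option<()> {
    match schema {
        S::Null => {}
        S::Bool => if read_u8(input)? > 1 { return None; },
        S::U8 | S::I8 => { take(input, 1)?; }
        S::U16 | S::I16 => { take(input, 2)?; }
        S::U32 | S::I32 | S::F32 => { take(input, 4)?; }
        S::U64 | S::I64 | S::F64 => { take(input, 8)?; }
        S::U128 | S::I128 => { take(input, 16)?; }
        S::String => {
            let n = read_u32(input)? as usize;
            std::str::from_utf8(take(input, n)?).ok()?; // must be valid UTF-8
        }
        S::Option { schema } => match read_u8(input)? {
            0 => {}
            1 => skip(schema, input)?,
            _ => return None,
        },
        S::Array { schema, len } => for _ in 0..*len { skip(schema, input)?; },
        S::Struct { fields } => for (_, field) in fields { skip(field, input)?; },
        S::Enum { variants } => {
            let tag = read_u8(input)? as usize;
            skip(&variants.get(tag)?.1, input)?;
        }
        S::Vector { schema } | S::Set { schema } => for _ in 0..read_u32(input)? { skip(schema, input)?; },
        S::Map { key, value } => for _ in 0..read_u32(input)? { skip(key, input)?; skip(value, input)?; },
        S::Blob | S::Snapshot => { take(input, 32)?; } // assumed 32-byte digests
        S::Link => skip_link(input)?,
    }
    Some(())
}

/// `Link`: two `KeyResolverKind`s, a `Vec<PathComponent>`, an optional snapshot.
/// Assumption: `KeyResolverId` and `EntitySnapshotId` are 32 bytes.
fn skip_link(input: &mut &[u8]) -> Option<()> {
    for _ in 0..2 {
        match read_u8(input)? {
            0 => { take(input, 32)?; }                     // Inline([u8; 32])
            1 => { take(input, 32)?; skip_bytes(input)?; } // Custom { id, data }
            _ => return None,
        }
    }
    for _ in 0..read_u32(input)? {
        match read_u8(input)? {
            0 => {}                          // Null
            1 => { take(input, 1)?; }        // Bool
            2 | 3 => { take(input, 8)?; }    // Uint, Int
            4 | 5 => { skip_bytes(input)?; } // String, Bytes
            _ => return None,
        }
    }
    match read_u8(input)? {
        0 => Some(()),
        1 => take(input, 32).map(|_| ()),
        _ => None,
    }
}

/// True if `data` holds exactly one value of type `schema` and nothing else.
fn conforms(schema: &BorshSchema, data: &[u8]) -> bool {
    let mut input = data;
    skip(schema, &mut input).is_some() && input.is_empty()
}
```

A full deserializer would build values instead of skipping bytes, but it would walk the schema in exactly the same way.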
A `BorshSchema::Blob` is serialized/deserialized as a `PayloadDigest`.
A blob allows you to separate large binary data from the other data in a component. For example, an `Image` component might describe the `mime_type` and the `size` of an image and store the `data` of the image as a `Blob`. When you read the component, you can then download the image metadata without downloading the entire image.
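Here is a sketch of the hypothetical `Image` component's data. It assumes that `size` is the image's length in bytes and that a `Blob` is stored as a 32-byte `PayloadDigest`. Only the digest of the image bytes goes into the component, so the component stays small however large the image is.

```rust
/// Hypothetical `Image` component data, following the example above.
/// Assumptions: `size` is the image's byte length, and a `Blob` is a
/// 32-byte `PayloadDigest` of the image bytes.
struct Image {
    mime_type: String,
    size: u64,
    data: [u8; 32], // BorshSchema::Blob: only the digest, not the image bytes
}

/// Borsh-encodes the struct fields in declaration order.
fn encode_image(img: &Image) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(img.mime_type.len() as u32).to_le_bytes());
    out.extend_from_slice(img.mime_type.as_bytes());
    out.extend_from_slice(&img.size.to_le_bytes());
    out.extend_from_slice(&img.data);
    out
}
```

For a `"image/png"` image of 5,000,000 bytes, the encoded component data is 4 + 9 + 8 + 32 = 53 bytes. The image itself is fetched separately by its digest.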
A `BorshSchema::Snapshot` is serialized/deserialized as an `EntitySnapshotId`.
A snapshot is similar in purpose to a `Link`, but without a path. This may be useful for things like edit-history components. There, older versions of the entity are no longer stored at any entity path, but their snapshots are stored in a component on the new version of the entity.
A `BorshSchema::Link` is serialized/deserialized using Borsh with the following structure:
```rust
struct Link {
    namespace: KeyResolverKind,
    subspace: KeyResolverKind,
    path: Vec<PathComponent>,
    snapshot: Option<EntitySnapshotId>,
}

enum KeyResolverKind {
    Inline([u8; 32]),
    Custom {
        id: KeyResolverId,
        data: Vec<u8>,
    },
}
```
A `Link` is a reference to an `Entity`. Links allow us to build expressive graphs with our entities.
The `path` in the `Link` specifies the path to the entity within the `namespace` and `subspace`.
The optional `snapshot` allows you to put the `EntitySnapshotId` in the link. Even if the entity is later changed, moved, or deleted, you can still load its data as it was when the link was made.
The `namespace` and `subspace` specify the `KeyResolverKind` used to look up the public keys that identify the spaces. The simplest `KeyResolverKind` is the `Inline` variant, which lets you hard-code the key. The `Custom` variant allows you to use any `KeyResolver`, with `data` as the input to that `KeyResolver`.
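Here is a sketch of Borsh-encoding a `Link` whose namespace and subspace are both `Inline`. It assumes 32-byte keys and IDs. The types mirror the spec, but the encoder functions are hypothetical.

```rust
/// Assumption: keys and IDs are 32-byte values.
type Key = [u8; 32];
type Id = [u8; 32];

enum PathComponent { Null, Bool(bool), Uint(u64), Int(i64), String(String), Bytes(Vec<u8>) }

enum KeyResolverKind {
    Inline(Key),
    Custom { id: Id, data: Vec<u8> },
}

struct Link {
    namespace: KeyResolverKind,
    subspace: KeyResolverKind,
    path: Vec<PathComponent>,
    snapshot: Option<Id>,
}

/// Borsh `Vec<u8>`/`String` layout: u32 little-endian length, then bytes.
fn put_bytes(out: &mut Vec<u8>, b: &[u8]) {
    out.extend_from_slice(&(b.len() as u32).to_le_bytes());
    out.extend_from_slice(b);
}

fn put_path_component(out: &mut Vec<u8>, c: &PathComponent) {
    match c {
        PathComponent::Null => out.push(0),
        PathComponent::Bool(b) => out.extend_from_slice(&[1, *b as u8]),
        PathComponent::Uint(n) => { out.push(2); out.extend_from_slice(&n.to_le_bytes()); }
        PathComponent::Int(n) => { out.push(3); out.extend_from_slice(&n.to_le_bytes()); }
        PathComponent::String(s) => { out.push(4); put_bytes(out, s.as_bytes()); }
        PathComponent::Bytes(b) => { out.push(5); put_bytes(out, b); }
    }
}

fn put_key_resolver_kind(out: &mut Vec<u8>, k: &KeyResolverKind) {
    match k {
        KeyResolverKind::Inline(key) => { out.push(0); out.extend_from_slice(key); }
        KeyResolverKind::Custom { id, data } => { out.push(1); out.extend_from_slice(id); put_bytes(out, data); }
    }
}

/// Encodes the fields in declaration order, as Borsh does for structs.
fn encode_link(link: &Link) -> Vec<u8> {
    let mut out = Vec::new();
    put_key_resolver_kind(&mut out, &link.namespace);
    put_key_resolver_kind(&mut out, &link.subspace);
    out.extend_from_slice(&(link.path.len() as u32).to_le_bytes());
    for c in &link.path {
        put_path_component(&mut out, c);
    }
    match &link.snapshot {
        None => out.push(0),
        Some(id) => { out.push(1); out.extend_from_slice(id); }
    }
    out
}
```

As a byte count: a link to `"posts"/42` with two inline keys and no snapshot takes 33 + 33 bytes for the keys, 4 for the path length, 10 + 9 for the two components, and 1 for `None`. That is 90 bytes in total.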
A `KeyResolver` is a specification that describes a way to resolve some data to a `NamespaceId` or a `SubspaceId`. Key resolvers give `Link`s a level of indirection: instead of hard-coding a key, a link can look it up, possibly using DNS or other mechanisms such as DIDs.
The `PayloadDigest` of a `KeyResolver` is called a `KeyResolverId`.
Key resolvers contain a `specification`, similar to a `Schema`, that documents how to implement the key resolver. Each app must decide which key resolvers to implement.
`KeyResolver`s are stored in a content-addressed store, and their data is Borsh serialized data matching the following format:
```rust
struct KeyResolver {
    name: String,
    specification: EntitySnapshotId,
}
```
The `name` of the `KeyResolver` is a human-readable name for documentation purposes.
The `specification` is the ID of an `EntitySnapshot` documenting the key resolution process. The specification is usually human documentation that preferably includes all of the information necessary to implement the key resolver.
An `EncryptionAlgorithm` is a specification describing how data may be encrypted and decrypted.
The `PayloadDigest` of an `EncryptionAlgorithm` is called an `EncryptionAlgorithmId`.
The data of an `EncryptionAlgorithm` is Borsh serialized data matching the following format:
```rust
struct EncryptionAlgorithm {
    name: String,
    specification: EntitySnapshotId,
}
```
The `name` is a human-readable name for documentation purposes.
The `specification` is the ID of an `EntitySnapshot` that documents the encryption algorithm. The specification is usually human documentation that preferably includes all of the information necessary to encrypt and decrypt data with the algorithm.
Note: The `specification` for an `EncryptionAlgorithm` should always include human documentation describing the algorithm. It might also contain additional machine-readable `Component`s, such as a WASM module that can actually perform the encryption and decryption. If a standardized interface for encryption modules were developed, clients might be able to download and execute compatible encryption modules automatically.
Such a standard may develop independently on top of the Leaf Protocol. Details on how this might be done are out of scope for this specification.
The Leaf Protocol specifies a data format on top of Willow storage, and not much else. All of the Willow features such as the Meadowcap capability system can work with Leaf seamlessly.
The goal of Leaf is simply to provide a more expressive format for storing rich data that can be incrementally understood by different applications.