Persistence versioning and migration
As an application changes over time, so does the data it needs to persist on both server and client. With each update, the application has to ensure that data persisted by a previous version is migrated so that the next version can use it seamlessly. A robust data versioning and migration process is therefore necessary to keep application updates from causing data integrity issues that would compromise the user experience.
This document defines a data versioning scheme that allows the application to migrate persisted data gracefully from one application version to the next. For this purpose, two separate kinds of changes are considered: changes to the overall database schema and changes to the format of serialized data.
Application data on both client and server is stored in SQL-based relational databases. Here, data is organized in tables, where each table corresponds to one Rust struct. The table has a number of columns, where each column typically represents one field of the struct.
In the course of an update, a struct can change by having a field added, removed, and/or changed. Similarly, structs can be removed entirely or new structs can be added.
To allow our migration process to account for each such change, both the code that interacts with the database and the database schema itself are versioned. The current version of the database schema is recorded in a dedicated table and updated at the end of each execution of the migration process. The current version of the code is hardcoded and changed whenever a struct that is persisted in the database is changed. For each change of the code’s version, a migration procedure has to be implemented that updates the affected tables in accordance with the changes in the code.
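The relationship between the hardcoded code version and the migration procedures can be sketched as follows. This is a minimal illustration, not the application's actual implementation: `CODE_SCHEMA_VERSION`, `Migration`, `MIGRATIONS`, the helper functions, and all SQL statements and table names are hypothetical.

```rust
/// Hardcoded schema version expected by this build (hypothetical value).
const CODE_SCHEMA_VERSION: u32 = 3;

/// A migration that brings the schema from `to_version - 1` to `to_version`.
struct Migration {
    to_version: u32,
    /// Hypothetical SQL; table and column names are made up for illustration.
    sql: &'static str,
}

/// One entry per bump of `CODE_SCHEMA_VERSION`, in ascending order.
const MIGRATIONS: &[Migration] = &[
    Migration {
        to_version: 1,
        sql: "CREATE TABLE credential (identity TEXT PRIMARY KEY, public_key BLOB NOT NULL)",
    },
    Migration {
        to_version: 2,
        sql: "ALTER TABLE credential ADD COLUMN signature_scheme TEXT",
    },
    Migration {
        to_version: 3,
        sql: "CREATE TABLE contact (identity TEXT PRIMARY KEY)",
    },
];

/// Returns true if there is exactly one migration for every version from 1
/// up to `CODE_SCHEMA_VERSION`, listed in ascending order.
fn migrations_cover_code_version() -> bool {
    MIGRATIONS.len() as u32 == CODE_SCHEMA_VERSION
        && MIGRATIONS
            .iter()
            .zip(1u32..)
            .all(|(migration, expected)| migration.to_version == expected)
}

/// Migrations that still have to run for a database at `db_version`.
fn pending_migrations(db_version: u32) -> &'static [Migration] {
    let already_applied = (db_version as usize).min(MIGRATIONS.len());
    &MIGRATIONS[already_applied..]
}
```

A check like `migrations_cover_code_version` could run in a unit test so that bumping the code version without adding a matching migration is caught before release.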
Upon startup, the application queries the database to detect whether the version of its schema differs from the version of the code the application is running:
- If the version is the same, no migration has to be applied.
- If the version of the database is higher than that of the application, the application will terminate, prompting the user to update the application.
- If the version of the database is lower than that of the application, the application will initiate the migration procedure to update the database to the next version. After the migration, the application will perform this check again.
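The three cases above can be sketched as a loop that repeats the check after every migration step. The names here (`Database`, `StartupError`, `migrate_one_step`, `ensure_schema_is_current`) and the version number are assumptions made for illustration; the `Database` struct is an in-memory stand-in that only tracks the recorded schema version.

```rust
use std::cmp::Ordering;

/// Schema version that this build of the application expects (hypothetical value).
const CODE_SCHEMA_VERSION: u32 = 4;

/// Stand-in for the real database: only tracks the recorded schema version.
struct Database {
    schema_version: u32,
}

#[derive(Debug, PartialEq)]
enum StartupError {
    /// The database was written by a newer version of the application.
    ApplicationOutdated { db_version: u32, code_version: u32 },
}

/// Migrates the database by exactly one version and records the new version.
fn migrate_one_step(db: &mut Database) {
    // A real implementation would apply the schema changes for
    // `schema_version -> schema_version + 1` here before updating the version.
    db.schema_version += 1;
}

/// Repeats the startup check until the database matches the code version.
/// Returns the number of migration steps that were applied.
fn ensure_schema_is_current(db: &mut Database) -> Result<u32, StartupError> {
    let mut steps = 0;
    loop {
        match db.schema_version.cmp(&CODE_SCHEMA_VERSION) {
            // Same version: nothing (more) to do.
            Ordering::Equal => return Ok(steps),
            // Database is newer than the code: the user has to update.
            Ordering::Greater => {
                return Err(StartupError::ApplicationOutdated {
                    db_version: db.schema_version,
                    code_version: CODE_SCHEMA_VERSION,
                })
            }
            // Database is older: migrate one version, then check again.
            Ordering::Less => {
                migrate_one_step(db);
                steps += 1;
            }
        }
    }
}
```

Migrating one version at a time keeps each migration procedure small, since it only needs to know the schema of the version directly before it.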
This form of database schema migration is industry standard and is supported by a variety of tools and libraries at different abstraction levels. For the server application (using Postgres), basic libraries such as sqlx and more complex ORM-style libraries such as Diesel and SeaORM are available. For the client, libraries such as refinery and rusqlite_migration offer the same feature.
A user installs the application at version 3. Upon first startup, the DB is initialized with that same version. After using the application for a while, the user updates the application to version 4. On the first startup after the update, the application checks the DB version and discovers that it is lower than that of the client. The application thus starts the migration process, which modifies the DB schema to align it with the changes in the application and increments the DB version number. After running the migration procedure, the application checks the DB version again and finds that it now matches.
Not all pieces of data are represented as a column in a database table. For example, some data is encrypted before it is stored in the database, thus hiding any internals of the data. In some of these cases, data can be deserialized, migrated and then re-serialized. However, this is not possible for data encrypted at rest: several migration routines may be executed before the corresponding data is next decrypted, and only at that point can it be migrated to the format that the current version of the application expects.
To account for such cases, serialized data (encrypted or not) is versioned separately from the database schema and the non-serialized data within the DB. Instead of using a global version that applies to all of the data in the table, serialized structs are versioned individually.
More concretely, before a piece of data is serialized, its version is added so that it is serialized and stored alongside the actual data. After deserialization, the data can be migrated step by step to the newest version, independent of the schema of the database that the serialized data was read from.
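As a rough sketch of this idea, the version can be stored as a one-byte prefix in front of the payload. Everything here is an assumption made for illustration: the toy payload is a comma-separated string, the fields added in versions 2 and 3 are invented, and `store`, `load` and `migrate_one_step` are hypothetical names.

```rust
/// Newest version of the serialized format (hypothetical value).
const CURRENT_VERSION: u8 = 3;

/// Serializes a payload in the current format, prefixed with its version.
fn store(payload: &str) -> Vec<u8> {
    let mut bytes = vec![CURRENT_VERSION];
    bytes.extend_from_slice(payload.as_bytes());
    bytes
}

/// Migrates a payload by exactly one version.
/// Returns `None` if no migration from `version` is known.
fn migrate_one_step(version: u8, payload: String) -> Option<(u8, String)> {
    match version {
        // v1 -> v2: append a default value for the field added in v2.
        1 => Some((2, format!("{payload},default-scheme"))),
        // v2 -> v3: append a default value for the field added in v3.
        2 => Some((3, format!("{payload},0"))),
        _ => None,
    }
}

/// Reads the stored version, then migrates step by step to the current one.
fn load(bytes: &[u8]) -> Option<String> {
    let (&stored_version, raw_payload) = bytes.split_first()?;
    let mut version = stored_version;
    let mut payload = String::from_utf8(raw_payload.to_vec()).ok()?;
    while version < CURRENT_VERSION {
        let (next_version, next_payload) = migrate_one_step(version, payload)?;
        version = next_version;
        payload = next_payload;
    }
    // Data written by a newer version of the application is rejected.
    (version == CURRENT_VERSION).then_some(payload)
}
```

Because the version travels with each blob, `load` does not need to consult the database schema version at all, which is what allows data to skip several schema migrations and still be upgraded when it is next read.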
Rust itself provides the necessary tooling to implement a simple versioning scheme of serialized structs. For example, different struct versions can be collected in an enum, where the enum variant will be encoded as part of the serialization procedure.
Say there is a struct Credential, which is serialized and stored in the DB as a blob.
use serde::{Deserialize, Serialize};
#[derive(Serialize, Deserialize)]
struct Credential {
    identity: String,
    public_key: PublicKey,
}
The credential is not serialized directly. Instead, the application serializes an enum, which has a variant for each credential version. For a little extra robustness, we introduce a trait to mark enums that can be serialized for storage.
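use serde::de::DeserializeOwned;
#[derive(Serialize, Deserialize)]
enum StorableCredential {
    CredentialV1(Credential),
}
// Marker trait for enums that can be serialized for storage. The error type
// depends on the serialization format; `serde::Error` is only a placeholder.
trait Storable: Serialize + DeserializeOwned + Sized {
    fn serialize(&self) -> Result<Vec<u8>, serde::Error> { ... }
    fn deserialize(bytes: &[u8]) -> Result<Self, serde::Error> { ... }
}
impl Storable for StorableCredential {}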
If a StorableCredential is deserialized, the application first has to turn it into a Credential before it can access any of the fields.
impl From<StorableCredential> for Credential {
    fn from(storable_credential: StorableCredential) -> Credential {
        match storable_credential {
            // Only one version exists so far, so no migration is needed.
            StorableCredential::CredentialV1(credential) => credential,
        }
    }
}
If we now change the Credential struct, for example because we want to introduce a new field signature_scheme, we rename the old struct, introduce a new struct, and extend the enum.
#[derive(Serialize, Deserialize)]
struct CredentialV1 {
    identity: String,
    public_key: PublicKey,
}
#[derive(Serialize, Deserialize)]
struct Credential {
    identity: String,
    signature_scheme: SignatureScheme,
    public_key: PublicKey,
}
#[derive(Serialize, Deserialize)]
enum StorableCredential {
    CredentialV1(CredentialV1),
    CredentialV2(Credential),
}
Now, whenever a StorableCredential is deserialized, the application has to account for the possibility of the deserialized credential being an old CredentialV1 by amending the From implementation. If that is the case, it has to turn it into a Credential by adding the new field with a default value.
impl From<StorableCredential> for Credential {
    fn from(storable_credential: StorableCredential) -> Credential {
        match storable_credential {
            // Old format: fill the new field with its default value.
            StorableCredential::CredentialV1(credential_v1) => Credential {
                identity: credential_v1.identity,
                signature_scheme: SignatureScheme::default(),
                public_key: credential_v1.public_key,
            },
            // Current format: use as is.
            StorableCredential::CredentialV2(credential) => credential,
        }
    }
}
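Once more than two versions exist, one possible way to keep the From implementation manageable is to convert only between adjacent versions and chain those conversions. The sketch below assumes a hypothetical third version that adds an `expires_at` field; `CredentialV2`, the `Ed25519` variant, and the use of `Vec<u8>` in place of `PublicKey` are likewise stand-ins chosen for illustration. The serde derives are omitted so that the example compiles without external crates.

```rust
/// Stand-in for the real type; the variant name is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
enum SignatureScheme {
    #[default]
    Ed25519,
}

struct CredentialV1 {
    identity: String,
    public_key: Vec<u8>,
}

struct CredentialV2 {
    identity: String,
    signature_scheme: SignatureScheme,
    public_key: Vec<u8>,
}

/// Hypothetical third version that adds an optional expiry timestamp.
#[derive(Debug, PartialEq)]
struct Credential {
    identity: String,
    signature_scheme: SignatureScheme,
    public_key: Vec<u8>,
    expires_at: Option<u64>,
}

enum StorableCredential {
    CredentialV1(CredentialV1),
    CredentialV2(CredentialV2),
    CredentialV3(Credential),
}

// Each conversion only knows about the version directly before it.
impl From<CredentialV1> for CredentialV2 {
    fn from(v1: CredentialV1) -> Self {
        CredentialV2 {
            identity: v1.identity,
            signature_scheme: SignatureScheme::default(),
            public_key: v1.public_key,
        }
    }
}

impl From<CredentialV2> for Credential {
    fn from(v2: CredentialV2) -> Self {
        Credential {
            identity: v2.identity,
            signature_scheme: v2.signature_scheme,
            public_key: v2.public_key,
            expires_at: None,
        }
    }
}

impl From<StorableCredential> for Credential {
    fn from(storable: StorableCredential) -> Self {
        match storable {
            // V1 is migrated to V2 first, then to the current version.
            StorableCredential::CredentialV1(v1) => CredentialV2::from(v1).into(),
            StorableCredential::CredentialV2(v2) => v2.into(),
            StorableCredential::CredentialV3(credential) => credential,
        }
    }
}
```

With this layout, adding a version means writing one new struct, one new adjacent conversion, and one new match arm, while the existing conversions stay untouched.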