Persistence versioning and migration

As an application changes over time, so does the data it needs to persist, both on the server and on the client. With each update, the application has to ensure that data persisted by a previous version is migrated such that it can be used seamlessly by the new version. A robust data versioning and migration process is therefore necessary so that application updates do not compromise the user experience through data integrity issues.

This document defines a data versioning scheme that allows the application to migrate persisted data gracefully from one application version to the next. For this purpose, two separate kinds of changes are considered: changes to the overall database schema and changes to the format of serialized data.

Database schema versioning

Application data on both client and server is stored in SQL-based relational databases. Here, data is organized in tables, where each table corresponds to one Rust struct. The table has a number of columns, where each column typically represents one field of the struct.

In the course of an update, a struct can change by having a field added, removed, and/or changed. Similarly, structs can be removed entirely or new structs can be added.

To allow our migration process to account for each such change, both the code that interacts with the database and the database schema itself are versioned. The current version of the database schema is recorded in a dedicated table and updated at the end of each execution of the migration process. The current version of the code is hardcoded and changed whenever a struct that is persisted in the database is changed. For each change of the code’s version, a migration procedure has to be implemented that updates the affected tables in accordance with the changes in the code.

Upon startup, the application queries the database to detect whether the version of its schema differs from the version of the code the application is running.

If the version is the same, no migration has to be applied.
If the version of the database is higher than that of the application, the application will terminate, prompting the user to update the application.
If the version of the database is lower than that of the application, the application will initiate the migration procedure to update the database to the next version. After the migration, the application will perform this check again.
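
The three cases above can be sketched as a small loop. This is only an illustration under stated assumptions: the Db struct stands in for the real database and its version table, and the names CODE_SCHEMA_VERSION, StartupError, migrate_one_step and ensure_schema_up_to_date are hypothetical, not part of the actual code base.

```rust
use std::cmp::Ordering;

/// Schema version this build of the application expects (hypothetical value).
const CODE_SCHEMA_VERSION: u32 = 4;

/// Stand-in for the database: only the recorded schema version is modelled.
#[derive(Debug)]
struct Db {
    schema_version: u32,
    /// Log of applied migration steps, purely for illustration.
    applied: Vec<String>,
}

#[derive(Debug, PartialEq)]
enum StartupError {
    /// The database was written by a newer version of the application.
    DatabaseTooNew { db: u32, code: u32 },
    /// No migration step is registered for this schema version.
    MissingMigration { from: u32 },
}

/// Applies the single step `from -> from + 1` and then records the new version.
fn migrate_one_step(db: &mut Db) -> Result<(), StartupError> {
    let from = db.schema_version;
    match from {
        // A real step would execute SQL here (ALTER TABLE, data copies, ...).
        1 | 2 | 3 => db.applied.push(format!("v{} -> v{}", from, from + 1)),
        _ => return Err(StartupError::MissingMigration { from }),
    }
    // The recorded version is updated at the end of the migration.
    db.schema_version = from + 1;
    Ok(())
}

/// Runs the startup check and repeats it after every migration step.
fn ensure_schema_up_to_date(db: &mut Db) -> Result<(), StartupError> {
    loop {
        match db.schema_version.cmp(&CODE_SCHEMA_VERSION) {
            // Same version: nothing to do.
            Ordering::Equal => return Ok(()),
            // Database is newer: the caller should ask the user to update.
            Ordering::Greater => {
                return Err(StartupError::DatabaseTooNew {
                    db: db.schema_version,
                    code: CODE_SCHEMA_VERSION,
                })
            }
            // Database is older: migrate one version, then check again.
            Ordering::Less => migrate_one_step(db)?,
        }
    }
}
```

Calling ensure_schema_up_to_date on a database at version 3 applies exactly one step and leaves it at version 4, while a database at version 5 yields the DatabaseTooNew error without being modified.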

Tooling

This form of database schema migration is industry standard and supported by a variety of tools and libraries at different abstraction levels. For the server application (using Postgres), basic libraries such as sqlx and more complex ORM-style libraries such as Diesel and SeaORM are available. For the client, libraries such as refinery and rusqlite_migration offer the same feature.

Example

A user installs the application at version 3. Upon first startup, the DB is initialized with that same version. After using the application for a while, the user updates the application to version 4. On the first startup after the update, the application checks the DB version and discovers that it is lower than that of the application. The application thus starts the migration process, which modifies the DB schema to align it with the changes in the application and increments the DB version number. After running the migration procedure, the application checks the DB version again and finds that it now matches.

Versioning of serialized (encrypted) data

Not all pieces of data are represented as a column in a database table. For example, some data is encrypted before it is stored in the database, thus hiding any internals of the data. In some of these cases, data can be deserialized, migrated and then re-serialized. However, this is not possible for data encrypted at rest: several migration routines may be executed before the corresponding data is next decrypted, at which point it has to be migrated properly before it can be used by the current version of the application.

To account for such cases, serialized data (encrypted or not) is versioned separately from the database schema and the non-serialized data within the DB. Instead of using a global version that applies to all of the data in the table, serialized structs are versioned individually.

More concretely, before a piece of data is serialized, its version is attached to it, so that the version is serialized and stored alongside the actual data. After deserialization, the data can be migrated step-by-step to the newest version, independent of the schema of the database that the serialized data was read from.
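
As a minimal sketch of this idea, the following uses a hand-rolled byte format in which the first byte carries the version. The layouts, the Record struct and all function names are assumptions made up for illustration; the actual application would rely on its serialization library instead.

```rust
/// Hypothetical payload layouts, one per data version:
/// v1: the identity as UTF-8 bytes.
/// v2: one byte for the signature scheme, followed by the identity.
const CURRENT_VERSION: u8 = 2;

#[derive(Debug, PartialEq)]
struct Record {
    scheme: u8,
    identity: String,
}

/// Serializes the record and prepends the version of the layout used.
fn to_versioned_bytes(record: &Record) -> Vec<u8> {
    let mut bytes = vec![CURRENT_VERSION, record.scheme];
    bytes.extend_from_slice(record.identity.as_bytes());
    bytes
}

/// Rewrites a v1 payload as a v2 payload by inserting a default scheme (0).
fn migrate_v1_to_v2(payload: Vec<u8>) -> Vec<u8> {
    let mut migrated = vec![0u8];
    migrated.extend(payload);
    migrated
}

/// Reads the stored version, migrates step by step, then parses the result.
fn from_versioned_bytes(bytes: &[u8]) -> Result<Record, String> {
    let (&stored_version, payload) = bytes.split_first().ok_or("empty input")?;
    let mut version = stored_version;
    let mut payload = payload.to_vec();
    // Apply one migration per step until the payload has the current layout.
    while version < CURRENT_VERSION {
        payload = match version {
            1 => migrate_v1_to_v2(payload),
            other => return Err(format!("no migration from version {}", other)),
        };
        version += 1;
    }
    if version > CURRENT_VERSION {
        return Err(format!("data version {} is newer than supported", version));
    }
    // At this point the payload is guaranteed to use the current (v2) layout.
    let (&scheme, identity) = payload.split_first().ok_or("truncated payload")?;
    let identity = String::from_utf8(identity.to_vec()).map_err(|e| e.to_string())?;
    Ok(Record { scheme, identity })
}
```

Because the version travels with each blob, the migration happens at the moment the blob is read, regardless of how many schema migrations have run in the meantime.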

Tooling

Rust itself provides the necessary tooling to implement a simple versioning scheme of serialized structs. For example, different struct versions can be collected in an enum, where the enum variant will be encoded as part of the serialization procedure.

Example

Say there is a struct Credential, which is serialized and stored in the DB as a blob.

#[derive(Serialize, Deserialize)]
struct Credential {
  identity: String,
  public_key: PublicKey,
}

The credential is not serialized directly. Instead, the application serializes an enum, which has a variant for each credential version. For a little extra robustness, we introduce a trait to mark enums that can be serialized for storage.

#[derive(Serialize, Deserialize)]
enum StorableCredential {
  CredentialV1(Credential)
}

trait Storable: Serialize + DeserializeOwned + Sized {
  fn serialize(&self) -> Result<Vec<u8>, serde::Error> { ... }
  
  fn deserialize(bytes: &[u8]) -> Result<Self, serde::Error> { ... }
}

impl Storable for StorableCredential {}

If a StorableCredential is deserialized, the application first has to turn it into a Credential before it can access any of the fields.

impl From<StorableCredential> for Credential {
  fn from(storable_credential: StorableCredential) -> Credential {
    match storable_credential {
      // Only one version exists so far, so no migration is needed.
      StorableCredential::CredentialV1(credential) => credential,
    }
  }
}

If we now change the Credential struct, for example, because we want to introduce a new field signature_scheme, we rename the old struct, introduce a new struct and extend the enum.

#[derive(Serialize, Deserialize)]
struct CredentialV1 {
  identity: String,
  public_key: PublicKey,
}

#[derive(Serialize, Deserialize)]
struct Credential {
  identity: String,
  signature_scheme: SignatureScheme,
  public_key: PublicKey,
}

#[derive(Serialize, Deserialize)]
enum StorableCredential {
  CredentialV1(CredentialV1),
  CredentialV2(Credential)
}

Now, whenever a StorableCredential is deserialized, the application has to account for the possibility that it contains an old CredentialV1. To do so, we amend the From implementation: an old credential is turned into a Credential by adding the new field with a default value.

impl From<StorableCredential> for Credential {
  fn from(storable_credential: StorableCredential) -> Credential {
    match storable_credential {
      // Old version: migrate by filling the new field with its default.
      StorableCredential::CredentialV1(credential_v1) => Credential {
        identity: credential_v1.identity,
        signature_scheme: SignatureScheme::default(),
        public_key: credential_v1.public_key,
      },
      // Current version: nothing to migrate.
      StorableCredential::CredentialV2(credential) => credential,
    }
  }
}
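
To check the behaviour of this conversion in isolation, the example can be reduced to a self-contained sketch without serde. PublicKey and SignatureScheme below are stand-in definitions (assumptions, since their real definitions are not shown here), and the serde derives are left out so that the snippet compiles with the standard library alone.

```rust
// Stand-in types so the sketch compiles on its own (not the real definitions).
#[derive(Debug, Clone, PartialEq)]
struct PublicKey(Vec<u8>);

#[derive(Debug, Clone, PartialEq)]
enum SignatureScheme {
    Ed25519,
}

impl Default for SignatureScheme {
    // Hypothetical choice of default for credentials created before the field existed.
    fn default() -> Self {
        SignatureScheme::Ed25519
    }
}

#[derive(Debug, Clone, PartialEq)]
struct CredentialV1 {
    identity: String,
    public_key: PublicKey,
}

#[derive(Debug, Clone, PartialEq)]
struct Credential {
    identity: String,
    signature_scheme: SignatureScheme,
    public_key: PublicKey,
}

enum StorableCredential {
    CredentialV1(CredentialV1),
    CredentialV2(Credential),
}

impl From<StorableCredential> for Credential {
    fn from(storable_credential: StorableCredential) -> Credential {
        match storable_credential {
            // Old version: carry over existing fields, default the new one.
            StorableCredential::CredentialV1(v1) => Credential {
                identity: v1.identity,
                signature_scheme: SignatureScheme::default(),
                public_key: v1.public_key,
            },
            // Current version: returned unchanged.
            StorableCredential::CredentialV2(credential) => credential,
        }
    }
}
```

Converting a StorableCredential::CredentialV1 with Credential::from keeps identity and public_key and sets signature_scheme to the default, while a CredentialV2 passes through unchanged. Since the match is exhaustive, adding a future CredentialV3 variant will not compile until the conversion handles it.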
