vault-quota: update design doc with some validate ideas #534

Merged
merged 1 commit on Sep 12, 2023
50 changes: 46 additions & 4 deletions vault-quota/Design.md
The design below only takes into account incremental propagation of space used
by stored files. It is not complete/verified until we also come up with a validation
algorithm that can detect and fix discrepancies in a live `vault`.

## DataNode size algorithm:
This is an event watcher that gets Artifact events (after a PUT) and initiates the
propagation of sizes (space used).
```
track progress using HarvestState (source: `db:{bucket range}`, name: TBD)
incremental query for new artifacts in lastModified order
for each new Artifact:
update HarvestState
commit txn
```
Optimization: The above sequence does the first step of propagation from DataNode to
parent ContainerNode so the maximum work can be done in parallel using bucket ranges
(smaller than 0-f). It also means the propagation below only has to consider
ContainerNode.delta since DataNode(s) never have a delta.
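
A rough Java sketch of the above for one bucket range; all DAO and transaction names
(harvestStateDAO, artifactDAO.newIterator, nodeDAO.lockByStorageID, lockParent, txn) are
placeholders rather than the actual API, and the loop body fills in steps that are
collapsed in this diff view:
```java
// Sketch only: DAO/transaction names are placeholders, not the real API.
void propagateDataNodeSizes(String bucketRange) {
    // track progress using HarvestState (source: db:{bucket range}, name: TBD)
    HarvestState state = harvestStateDAO.get("db:" + bucketRange, harvestName);

    // incremental query for new artifacts in lastModified order
    Iterator<Artifact> iter = artifactDAO.newIterator(bucketRange, state.curLastModified);
    while (iter.hasNext()) {
        Artifact artifact = iter.next();
        txn.startTransaction();
        try {
            // lock the DataNode that references this artifact and apply the new size
            DataNode node = nodeDAO.lockByStorageID(artifact.getURI());
            if (node != null) {
                long delta = artifact.getContentLength() - node.size;
                if (delta != 0) {
                    node.size = artifact.getContentLength();
                    nodeDAO.put(node);
                    // first step of propagation: accumulate the delta on the parent ContainerNode
                    ContainerNode parent = nodeDAO.lockParent(node);
                    parent.delta += delta;
                    nodeDAO.put(parent);
                }
            }
            // update HarvestState and commit
            state.curLastModified = artifact.getLastModified();
            harvestStateDAO.put(state);
            txn.commitTransaction();
        } catch (RuntimeException ex) {
            txn.rollbackTransaction();
            throw ex;
        }
    }
}
```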

## ContainerNode size propagation algorithm:
```
query for ContainerNode with non-zero delta
for each ContainerNode:
```

Container size propagation will be implemented as a single sequence (thread). We could add
something to the vospace.Node table to support subdividing work and enable multiple threads,
but there is nothing there right now.
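
The body of the propagation loop is collapsed in this diff; one plausible single-threaded
shape, again using placeholder DAO names, is to fold each container's delta into its own
size and push the same delta one level up:
```java
// Sketch only: the real loop body is not shown here and the DAO names are placeholders.
void propagateContainerDeltas() {
    Iterator<ContainerNode> iter = nodeDAO.containerNodesWithNonZeroDelta();
    while (iter.hasNext()) {
        ContainerNode node = iter.next();
        txn.startTransaction();
        try {
            node = nodeDAO.lock(node);          // re-read the current delta under lock
            long delta = node.delta;
            if (delta != 0) {
                node.size += delta;             // apply the pending change to this container
                node.delta = 0;
                nodeDAO.put(node);
                ContainerNode parent = nodeDAO.lockParent(node);
                if (parent != null) {
                    parent.delta += delta;      // propagate one level up; a later pass picks it up
                    nodeDAO.put(parent);
                }
            }
            txn.commitTransaction();
        } catch (RuntimeException ex) {
            txn.rollbackTransaction();
            throw ex;
        }
    }
}
```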

## validation

### DataNode vs Artifact discrepancies
These can be validated in parallel by multiple threads; subdivide the work by bucket.

```
discrepancy 1: Artifact exists but DataNode does not
explanation: DataNode created, transfer negotiated, DataNode removed, transfer executed
evidence: check for DeletedNodeEvent
action: remove artifact, create DeletedArtifactEvent
else: ??

discrepancy 2: DataNode exists but Artifact does not
explanation: DataNode created, Artifact never (successfully) put
evidence: dataNode.size == 0
action: none

discrepancy 3: DataNode exists but Artifact does not
explanation: deleted or lost Artifact
evidence: DataNode.size != 0 (deleted vs lost: DeletedArtifactEvent exists)
action: fix DataNode.size

discrepancy 4: DataNode.size != Artifact.contentLength
explanation: pending/missed Artifact event
action: fix DataNode and propagate delta to parent ContainerNode (same as incremental)
```
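
As a sketch of how one (Artifact, DataNode) pair could be dispatched against these four
cases; the DAO/event names and the size-fix helper are placeholders, and the unresolved
`??` branch is left open:
```java
// Sketch only: DAO and event classes are placeholders; fixSizeAndPropagate stands in for
// "fix DataNode.size and propagate the delta to the parent ContainerNode".
void validatePair(Artifact artifact, DataNode node) {
    if (artifact != null && node == null) {
        // discrepancy 1: node created, transfer negotiated, node removed, transfer executed
        if (deletedNodeEventDAO.existsFor(artifact.getURI())) {     // evidence: DeletedNodeEvent
            artifactDAO.delete(artifact.getID());
            deletedArtifactEventDAO.put(new DeletedArtifactEvent(artifact.getID()));
        } else {
            // no DeletedNodeEvent: unresolved case (??)
        }
    } else if (artifact == null && node != null) {
        if (node.size == 0) {
            // discrepancy 2: artifact never (successfully) put -> no action
        } else {
            // discrepancy 3: deleted or lost artifact (DeletedArtifactEvent distinguishes which)
            fixSizeAndPropagate(node, 0L);
        }
    } else if (artifact != null && node.size != artifact.getContentLength()) {
        // discrepancy 4: pending/missed event -> fix and propagate, same as incremental
        fixSizeAndPropagate(node, artifact.getContentLength());
    }
}
```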

Getting all the pairs could be accomplished with a single query on inventory.Artifact full
outer join vospace.Node. The more generic approach would be to do a merge join of two
iterators:

    Iterator<Artifact> aiter = artifactDAO.iterator(vaultNamespace, bucket);
    Iterator<DataNode> niter = nodeDAO.iterator(vaultNamespace, bucket);

The more generic dual-iterator approach could be made to work if the inventory and vospace
content are in different PG databases or servers - TBD.
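
A sketch of that merge join, assuming both iterators return rows ordered by the same key
within a bucket (here the Artifact URI vs the storageID referenced by the DataNode; the
real ordering key and DAO signatures are TBD), reusing the `validatePair` sketch above:
```java
// Sketch only: assumes both iterators are ordered by the same key within a bucket.
void validateBucket(Namespace vaultNamespace, String bucket) {
    Iterator<Artifact> aiter = artifactDAO.iterator(vaultNamespace, bucket);
    Iterator<DataNode> niter = nodeDAO.iterator(vaultNamespace, bucket);

    Artifact a = aiter.hasNext() ? aiter.next() : null;
    DataNode n = niter.hasNext() ? niter.next() : null;
    while (a != null || n != null) {
        int cmp;
        if (a == null) {
            cmp = 1;                                    // only DataNodes remain
        } else if (n == null) {
            cmp = -1;                                   // only Artifacts remain
        } else {
            cmp = a.getURI().compareTo(n.storageID);    // shared ordering key
        }

        if (cmp == 0) {
            validatePair(a, n);                         // matched pair: discrepancy 4 check
            a = aiter.hasNext() ? aiter.next() : null;
            n = niter.hasNext() ? niter.next() : null;
        } else if (cmp < 0) {
            validatePair(a, null);                      // discrepancy 1
            a = aiter.hasNext() ? aiter.next() : null;
        } else {
            validatePair(null, n);                      // discrepancy 2 or 3
            n = niter.hasNext() ? niter.next() : null;
        }
    }
}
```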

## database changes required
note: all field and column names TBD
* add `size` and `delta` fields to ContainerNode (transient)
* add `size` field to DataNode (transient)
* add `size` to the `vospace.Node` table
* add `delta` to the `vospace.Node` table
* add `storageBucket` to `vospace.Node` table (validation)
* incremental sync query/iterator (ArtifactDAO?)
* lookup DataNode by storageID (ArtifactDAO?)
* indices to support new queries
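
For illustration only, the new node fields from the list above might end up looking like
the sketch below; the names are TBD per the note and the classes stand in for the existing
vospace Node classes:
```java
// Sketch only: the design marks these fields as transient on the node classes, backed by
// new size/delta columns in the vospace.Node table; exact names and types are TBD.
class ContainerNode {          // stands in for the existing ContainerNode class
    Long size;                 // space used by everything under this container
    Long delta;                // accumulated change not yet applied to size / propagated up
}

class DataNode {               // stands in for the existing DataNode class
    Long size;                 // mirrors Artifact.contentLength of the stored file
}
```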