Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read traversal and COW #22

Open
gallir opened this issue Mar 3, 2018 · 1 comment
Open

Read traversal and COW #22

gallir opened this issue Mar 3, 2018 · 1 comment

Comments

@gallir
Copy link

gallir commented Mar 3, 2018

My case: I have always two clones, one for updating and the other for read traversal, every time a "transaction" is done I clone again the tree to force the next update to copy the nodes.

The problem with cow: even for this case it may occur that the reader traverses a subtree that's sill being modified (i.e. and incomplete). The cause is that mutableFor() changes t.root immediately and the changes are applied to the new root. The same happens for children derived from mutableChildren() that changes the also n.children[i].

The solution is to delay the change to the original pointer (root or children) until all changes in the subtree are finished.

I prepared a first patch wit the idea, only for t.root: master...gallir:cow

If it's correct and you agree I can prepare a patch for children and items if needed.

@gallir gallir changed the title Read traversal Read traversal and COW Mar 3, 2018
@gallir
Copy link
Author

gallir commented Mar 3, 2018

More info about the proposal.

I'm actually using https://github.com/tidwall/btree fork and implemented the entire idea on it:

  1. The equivalent to what I proposed above: gallir/tidwall.btree.old@d659c20

  2. Replaced mutableChildren() for insert and remove: gallir/tidwall.btree.old@1cae1dd

  3. Replaced mutableChildren in growChildAndRemove(): gallir/tidwall.btree.old@ac1d4ab

  4. Replaced mutableChildren in maybeSplitChild(): gallir/tidwall.btree.old@56f9d35

I'm testing it with real data and update from our production system (at dotw.com) and getting a lot less misses during concurrent traversals.

Although during the process I learnt that most of the time the nodes were and are not being reused with the exception of the merge node due to the check of the cow pointer if you do a clone after ever serie of updates. To improve it you have to avoid cloning and by doing it the current mechanism for freeing and reusing don't make too much sense for cases where updates are atomic/mutually exclusive.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant