-
Notifications
You must be signed in to change notification settings - Fork 47
/
README
120 lines (92 loc) · 4.61 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
Betree: a small, simple implementation of a B^e-tree, as described in
the September 2015 ;login: article,
"An Introduction to B^e-trees and Write-Optimization"
by Michael A. Bender, Martin Farach-Colton, William Jannen, Rob
Johnson, Bradley C. Kuszmaul, Donald E. Porter, Jun Yuan, and Yang
Zhan
Code by Rob Johnson <[email protected]>
A B^-e-tree is an on-disk data structure with an interface similar to
a B-tree. It stores a mapping from keys to values, supporting
inserts, queries, deletes, updates, and efficient iteration. The key
features of a B^e-tree are extremely I/O-efficient insertions,
updates, and iteration, with query performance comparable to a B-tree.
See the above-referenced article for more details.
This distribution includes
- the B^e-tree implementation
- a test program that checks correctness and demonstrates
how to use the B^e-tree implementation
BUILDING AND RUNNING THE TEST PROGRAM
-------------------------------------
To build, run
$ make
$ mkdir tmpdir
$ ./test -d tmpdir
The test takes about a minute to run and should print "Test PASSED".
The test performs a random sequence of operations on a betree and on
an STL map, verifying that it always gets the same result from each
data structure. If it ever finds a discrepancy, it will abort with an
assertion failure, and will likely leave some files in tmpdir. A
successful run should leave tmpdir empty.
The code has been tested on a Debian 8.2 Linux installation with
- g++ 4.9.2
- GNU make 4.0
- libstdc++ 6.0.20
- libc 2.19
If you have trouble compiling or running the test on other systems,
please submit bug reports to [email protected]. Patches are
definitely appreciated.
GUIDE TO THE CODE
-----------------
test.cpp: The main test program. Demonstrates how to construct and
use a betree.
betree.hpp: The core of the betree implementation. This class handles
flushing messages down the tree, splitting nodes,
performing inserts, queries, etc, and provides an iterator
for scanning key/value pairs in the tree.
The betree is written almost completely as an in-memory
data structure. All I/O is handled transparently by
swap_space.
swap_space.{cpp,hpp}: Swaps objects to/from disk. Maintains a cache
of in-memory objects. When the cache becomes
too large the least-recently-used object is
written to disk and removed from the cache.
Automatically loads the object back into memory
when it is referenced. Garbage collects objects
that are no longer referenced by any other
object. Tracks when objects are modified in
memory so that it knows to write them back to
disk next time they get evicted.
backing_store.{cpp,hpp}: This defines a generic interface used by
swap_space to manage on-disk space. It
supports allocating and deallocating on-disk
space. The file also defines a simple
implementation of the interface that stores
one object per file on disk.
INTERESTING PROJECTS AND TODOS
------------------------------
- Implement logging, transactions, and MVCC. If this can be done in a
way that does not touch the internals of betree, that would be extra
cool.
- Implement range upsert messages (and range deletes). One way to
approach this might be to replace the currently-used std::map for
betree::node::elements with a boost interval map.
- Implement efficient garbage collection of nodes that contain only
keys that are covered by a range delete message.
- Implement "sub-nodes". Sub-nodes are written to disk contiguously
as part of their parent node, but can be deserialized individually.
This would enable the tree to write a node out to contiguous disk
space, enabling fast range queries over the node. Point queries,
however, would be able to deserialize only the sub-node needed to
answer the query, saving disk bandwidth.
- Modify system to track sizes in bytes instead of nodes, keys,
values, etc.
- Add multi-threading support.
- Implement checkpointing and saving/loading of the betree.
- Use boost serialization. The main challenge that I see is that the
deserialization code needs a context for the deserialization, but
boost serialization does not provide this.
- Implement compression and partial eviction (i.e. "evict" a node by
compressing it but keeping it in memory)
- Implement a backing_store that manages space in a single file.
- Reinsert results computed from several upserts into the top of the
tree so we don't have to recompute them.