Question on iteration #28

Open · jvsteiner opened this issue Nov 11, 2018 · 8 comments

Comments

jvsteiner commented Nov 11, 2018

I'm using btree to provide an in-memory key-value store as a non-persistent substitute for other key-value databases in a caching application. The interface I need to implement with this package requires that I provide an iterator which exposes Next() and Prev() methods.

I have done this using the AscendGreaterOrEqual and DescendLessOrEqual methods as follows: I store a pointer to the current item and use it as the reference to ascend or descend from when Next() or Prev() is called, respectively.

While I believe that iteration within the provided method calls is internally O(1), in my case the lookup of the current item is always O(log N). Since there is no other obvious O(1) way (to me, at least) to restart iteration from a given Item and move forward or backward one position in the manner I need, which is common in most key-value store interfaces, I was wondering whether the authors think it would be useful to expose another way to control iteration in a step-by-step, O(1) manner that keeps track of the current item.
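
(For concreteness, here is a rough sketch of the approach described above, not code from the issue; the cursor type and its methods are made-up names, and end-of-iteration handling is omitted. Each step re-seeks from the remembered item with AscendGreaterOrEqual / DescendLessOrEqual, which is where the O(log N) per-step cost comes from.)

```go
package main

import (
	"fmt"

	"github.com/google/btree"
)

// cursor remembers the current item and re-seeks from it on every step,
// so each Next/Prev pays an O(log N) search even though the in-order
// move inside the callback is O(1).
type cursor struct {
	tree *btree.BTree
	cur  btree.Item
}

// Next moves to the first item strictly greater than the current one.
// (Handling of an exhausted cursor is omitted in this sketch.)
func (c *cursor) Next() btree.Item {
	var next btree.Item
	c.tree.AscendGreaterOrEqual(c.cur, func(i btree.Item) bool {
		if c.cur.Less(i) { // skip the current item itself
			next = i
			return false // stop at the first strictly greater item
		}
		return true
	})
	c.cur = next
	return next
}

// Prev moves to the last item strictly less than the current one.
func (c *cursor) Prev() btree.Item {
	var prev btree.Item
	c.tree.DescendLessOrEqual(c.cur, func(i btree.Item) bool {
		if i.Less(c.cur) { // skip the current item itself
			prev = i
			return false
		}
		return true
	})
	c.cur = prev
	return prev
}

func main() {
	t := btree.New(2)
	for i := 1; i <= 5; i++ {
		t.ReplaceOrInsert(btree.Int(i))
	}
	c := &cursor{tree: t, cur: btree.Int(2)}
	fmt.Println(c.Next(), c.Next()) // 3 4
	fmt.Println(c.Prev())           // 3
}
```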

gconnell reopened this Nov 12, 2018
@gconnell (Collaborator)

Oops, sorry, fatfingered a response. Trying again...

You appear correct in all regards :)

While iterating, a move from one object to the next within a single iteration call is O(1), but the search for the item at which to start the iteration is O(log n). So a repeated (search + move) costs O(log n) per search, or O(n log n) assuming n searches into the btree.

You're right that we currently don't have a more robust iterator that knows its position and allows increment/decrement to move forward/back within the tree. I expect your use-case is not unique, however, so it seems like the addition of such an interface would probably be useful.
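
(Purely as an illustration of what such an interface might look like, not an existing or planned google/btree API; the names below are made up.)

```go
// Hypothetical shape for a position-aware iterator: it remembers where it is
// in the tree, so stepping forward or backward could be O(1) amortized
// rather than requiring an O(log N) re-seek on every step.
type Iterator interface {
	Next() (btree.Item, bool) // advance; false once the iterator is exhausted
	Prev() (btree.Item, bool) // step back; false once before the first item
	Item() btree.Item         // item at the current position
}
```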

One note: currently we provide no guarantees about iterating during insertion... if you write to a btree (insert/delete), you invalidate all iterators currently iterating on that tree. I don't see that changing any time soon.

I'd be interested in more information about your use-case: it appears to me that a call to Next() followed by a call to Prev() could be avoided by just caching the original value before calling Next(), since you're guaranteed to get back the same thing you started with. Are you calling Next() multiple times, then Prev() once? Or calling each an arbitrary number of times?


cznic commented Nov 12, 2018


jvsteiner commented Nov 12, 2018

@cznic I am indeed considering it, thanks!

@gconnell - In my use case, I don't think Next() followed by Prev() is ever required. In a given transaction (which I implement myself) there would be multiple Next() or Prev() calls, but never both in the same operation. Iteration might need to stop based on the values retrieved, so I can't accomplish this with a defined range.

Additionally, I have a different use case where I call Max() or Min() repeatedly, until some condition that depends on the value retrieved is satisfied. (I remove the original maximum value in case I need to move to the next one; I use the tree very much like a stack when getting values, but I need fast inserts in sorted order.) It would be good if those calls were O(1) as well; however, I suspect it's not possible to make the removals O(1), since sometimes the tree needs to be rebalanced.
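
(A minimal sketch of that second use-case as described, using Max() and DeleteMax() from google/btree; the stop condition is made up for illustration.)

```go
package main

import (
	"fmt"

	"github.com/google/btree"
)

func main() {
	t := btree.New(2)
	for i := 1; i <= 10; i++ {
		t.ReplaceOrInsert(btree.Int(i)) // fast inserts in sorted order
	}
	// Hypothetical stop condition that depends on the retrieved value.
	done := func(i btree.Item) bool { return int(i.(btree.Int)) <= 7 }
	for t.Len() > 0 {
		max := t.Max() // O(log N): walks the rightmost path
		if done(max) {
			break
		}
		t.DeleteMax() // removal may rebalance, so it is also O(log N)
	}
	fmt.Println(t.Len(), t.Max()) // 7 7
}
```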

I get that this data structure is not concurrent, and that my iterator doesn't get a snapshot of the internal state or anything like that. That would be nice, but in my use case it's not a priority. I wonder, though, whether it would be possible to use Clone to accomplish something like this...
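
(Sketch of the Clone idea, relying on Clone() being the lazy copy-on-write clone documented in google/btree: after cloning, writes to either tree don't affect the other, so iterating the clone behaves like iterating a snapshot.)

```go
package main

import (
	"fmt"

	"github.com/google/btree"
)

func main() {
	t := btree.New(2)
	for i := 1; i <= 3; i++ {
		t.ReplaceOrInsert(btree.Int(i))
	}

	snapshot := t.Clone()           // lazy copy-on-write clone; cheap until either tree is written
	t.ReplaceOrInsert(btree.Int(4)) // further writes go to t only

	snapshot.Ascend(func(i btree.Item) bool { // iterate the frozen view
		fmt.Println(i) // 1, 2, 3
		return true
	})
	fmt.Println(t.Len(), snapshot.Len()) // 4 3
}
```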


jvsteiner commented Nov 12, 2018

@cznic - I ran the benchmarks for btree, and it's much faster for inserts, although iteration is indeed relatively more performant.

goos: darwin
goarch: amd64
pkg: github.com/google/btree
BenchmarkInsert-4                      	 3000000	       386 ns/op
BenchmarkSeek-4                        	 5000000	       349 ns/op
BenchmarkDeleteInsert-4                	 2000000	       796 ns/op
BenchmarkDeleteInsertCloneOnce-4       	 2000000	       804 ns/op
BenchmarkDeleteInsertCloneEachTime-4   	  500000	      2861 ns/op
BenchmarkDelete-4                      	 3000000	       436 ns/op
BenchmarkGet-4                         	 5000000	       355 ns/op
BenchmarkGetCloneEachTime-4            	 3000000	       473 ns/op
BenchmarkAscend-4                      	   10000	    151680 ns/op
BenchmarkDescend-4                     	   10000	    104672 ns/op
BenchmarkAscendRange-4                 	   10000	    187702 ns/op
BenchmarkDescendRange-4                	    5000	    255128 ns/op
BenchmarkAscendGreaterOrEqual-4        	   10000	    119307 ns/op
BenchmarkDescendLessOrEqual-4          	   10000	    187690 ns/op
BenchmarkDeleteAndRestore/CopyBigFreeList-4         	     100	  11307591 ns/op	  274526 B/op	      14 allocs/op
BenchmarkDeleteAndRestore/Copy-4                    	     100	  11955815 ns/op	  866424 B/op	    1162 allocs/op
BenchmarkDeleteAndRestore/ClearBigFreelist-4        	     200	   6535154 ns/op	    1509 B/op	       5 allocs/op
BenchmarkDeleteAndRestore/Clear-4                   	     200	   6735721 ns/op	  543568 B/op	    1051 allocs/op
PASS
ok  	github.com/google/btree	38.072s


cznic commented Nov 12, 2018

I ran the benchmarks for btree, and it's much faster for inserts ...

I don't see here which benchmarks and results are the basis for the "much faster" comparison. Note that the 1e3, 1e4, etc. suffixes of the modernc.org/b benchmarks are the number of items set, got, deleted, etc. So to get the time per item one has to divide the time/op value by 10^3, 10^4, etc.
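
(To illustrate with made-up numbers: if a modernc.org/b benchmark with a 1e3 suffix reported 350,000 ns/op, that works out to 350,000 / 1,000 = 350 ns per item, and it is that per-item figure which should be compared against the per-op numbers in the google/btree run above.)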

@jvsteiner (Author)

I see - indeed that was not obvious from the published benchmarks, at least to me.


jvsteiner commented Nov 12, 2018

@cznic may I ask where I can obtain the code for inspection? The URLs you provided only lead to the documentation, which, AFAICT, does not contain instructions on how to obtain the actual code. The import statements there, when you try to resolve them in the usual fashion, redirect to godocs.

Update

go get worked on that import path, although it seems a bit unusual. Can you provide some information about the organization behind this project?


cznic commented Nov 12, 2018

There's no organization behind it beyond me purchasing a domain to make the packages at github.com/cznic available independently of the hosting site, as I've started the process of abandoning GitHub after it was bought by its current owner.
See also https://golang.org/cmd/go/#hdr-Remote_import_paths
