hugedict

hugedict provides a drop-in replacement for dictionary objects that are too big to fit in memory. hugedict's dictionary-like objects implement typing.Mapping and typing.MutableMapping interfaces using key-value databases (e.g., RocksDB) as the underlying storage. Moreover, they are friendly with Python's multiprocessing.

Documentation

Installation

From PyPI (using pre-built binaries):

pip install hugedict

To compile the source, run: maturin build -r inside the project directory. You need Rust, Maturin, CMake and CLang (to build Rust-RocksDB).

Features

Create a mutable mapping backed by RocksDB

from functools import partial
from hugedict.prelude import RocksDBDict, RocksDBOptions

# replace [str, str] for the types of keys and values you want
# as well as deser_key, deser_value, ser_value
mapping: MutableMapping[str, str] = RocksDBDict(
    path=dbpath,  # path (str) to db file
    options=RocksDBOptions(create_if_missing=create_if_missing),  # whether to create database if missing, check other options
    deser_key=partial(str, encoding="utf-8"),  # decode the key from memoryview
    deser_value=partial(str, encoding="utf-8"),  # decode the value from memoryview
    ser_value=str.encode,  # encode the value to bytes
    readonly=False,  # open database in read only mode
    secondary_mode=False,  # open database in secondary mode
    secondary_path=None,  # when secondary_mode is True, it's a string pointing to a directory for storing data required to operate in secondary mode
)

Load huge data from files into RocksDB in parallel: from hugedict.prelude import rocksdb_load. This function creates SST files in parallel, ingests into the db and (optionally) compacts them.
Cache a function when doing parallel processing

from hugedict.prelude import Parallel

pp = Parallel()

@pp.cache_func("/tmp/test.db")
def heavy_computing(seconds: float):
    time.sleep(seconds)
    return seconds * 2


output = pp.map(heavy_computing, [0.5, 1, 0.7, 0.3, 0.6], n_processes=3)

Create dictionary backed by Sqlite: hugedict.sqlite.SqliteDict
Chain multiple dictionaries: hugedict.chained_mapping.ChainedMapping
Cache a dictionary so previously accessed keys are stored in memory: hugedict.cachedict.CacheDict or called hugedict.types.HugeMapping.cache

Name		Name	Last commit message	Last commit date
Latest commit History 143 Commits
.github/workflows		.github/workflows
benchmarks		benchmarks
containers/manylinux2014_x86_64		containers/manylinux2014_x86_64
docs		docs
hugedict		hugedict
scripts		scripts
src		src
tests		tests
.gitattributes		.gitattributes
.gitignore		.gitignore
.readthedocs.yaml		.readthedocs.yaml
CHANGELOG.md		CHANGELOG.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
pytest.ini		pytest.ini

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

hugedict

Installation

Features

About

Releases

Packages

Languages

License

binh-vu/hugedict

Folders and files

Latest commit

History

Repository files navigation

hugedict

Installation

Features

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages