A very simple vector embedding database. You can think of it as a hash table that lets you find items similar to the one you're searching for.
I'm a database enthusiast, and this is a for-fun learning project that could still be used in production ;).
P.S.: I like to reinvent the wheel in my free time, because it is my free time!
I'm using the {key => value} model:
- key should be a unique value that represents the item.
- value should be the vector itself (a list of floats).
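To make the model concrete, here is a minimal in-memory sketch in Go that also models the `bucket` grouping used by the request bodies. It is an illustration only, not vecdb's storage layer (vecdb persists data through its configured driver, such as bolt); `Store`, `NewStore`, `Put`, and `Get` are hypothetical names.

```go
package main

import "fmt"

// Store is a hypothetical in-memory illustration of the {key => value} model:
// bucket name -> item key -> vector (a list of floats).
type Store struct {
	buckets map[string]map[string][]float64
}

// NewStore returns an empty store.
func NewStore() *Store {
	return &Store{buckets: make(map[string]map[string][]float64)}
}

// Put stores (or overwrites) the vector for key inside bucket.
// Keys are unique per bucket, so writing the same key twice replaces the old vector.
func (s *Store) Put(bucket, key string, vector []float64) {
	b, ok := s.buckets[bucket]
	if !ok {
		b = make(map[string][]float64)
		s.buckets[bucket] = b
	}
	// Copy the slice so later changes by the caller do not mutate stored data.
	v := make([]float64, len(vector))
	copy(v, vector)
	b[key] = v
}

// Get returns the vector stored under key in bucket, if any.
// Indexing a missing bucket yields a nil map, which safely reports "not found".
func (s *Store) Get(bucket, key string) ([]float64, bool) {
	v, ok := s.buckets[bucket][key]
	return v, ok
}

func main() {
	s := NewStore()
	s.Put("products", "product-id-1", []float64{0.1, 0.2, 0.3})
	v, ok := s.Get("products", "product-id-1")
	fmt.Println(v, ok) // [0.1 0.2 0.3] true
}
```

Unlike a plain hash table, a vector store also needs a "find similar" operation on top of this lookup; the search endpoints below provide that.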
By default, vecdb looks for config.yml in the current working directory, but you can override this by passing your own file path with the --config /path/to/config.yml flag.
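The lookup rule above can be sketched with Go's standard `flag` package, which accepts both `-config` and `--config`. This is an assumption about how such a flag could be handled, not vecdb's actual CLI code; `configPath` is a hypothetical helper.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// configPath is a hypothetical helper: it returns the value of --config,
// falling back to "config.yml" (a relative path, so it resolves against
// the current working directory).
func configPath(args []string) (string, error) {
	fs := flag.NewFlagSet("vecdb", flag.ContinueOnError)
	path := fs.String("config", "config.yml", "path to the config file")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *path, nil
}

func main() {
	path, err := configPath(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println("using config:", path)
}
```

Run with no arguments, it reports config.yml; run with --config /path/to/config.yml, it reports that path instead.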
Sample config.yml:
# http server related configs
server:
  # the address to listen on, in the form '[host]:port'
  listen: "0.0.0.0:3000"

# storage related configs
store:
  # the driver you want to use
  # currently vecdb supports "bolt", which is based on boltdb, the in-process embedded database
  driver: "bolt"
  # the arguments required by the driver
  # bolt requires a key called `database` that points to the path where you want to store the data
  args:
    database: "./vec.db"

# embeddings related configs
embedder:
  # whether to enable the embedder and all endpoints that use it
  enabled: true
  # the driver you want to use; currently vecdb supports gemini
  driver: gemini
  # the arguments required by the driver
  # currently the gemini driver requires `api_key` and `text_embedding_model`
  args:
    # by default, vecdb replaces anything between ${..} with the value of the matching environment variable
    api_key: "${GEMINI_API_KEY}"
    text_embedding_model: "text-embedding-004"
- Raw Vectors Layer (low-level)
  - Send a VectorWriteRequest to POST /v1/vectors/write when you have a vector and want to store it.
  - Send a VectorSearchRequest to POST /v1/vectors/search when you have a vector and want to list the keys/ids of all similar vectors, ordered by cosine similarity in descending order.
- Embedding Layer (optional)
  - Send a TextEmbeddingWriteRequest to POST /v1/embeddings/text/write when you have text and want vecdb to build and store the vector for you using the configured embedder (currently gemini).
  - Send a TextEmbeddingSearchRequest to POST /v1/embeddings/text/search when you have text and want vecdb to build a vector and search for the keys of similar vectors, ordered by cosine similarity in descending order.
VectorWriteRequest:
{
  "bucket": "BUCKET_NAME", // consider it a collection or a table
  "key": "product-id-1", // should be unique and represent a valid value in your main data store (e.g. the row id in your MySQL/Postgres/etc. database)
  "vector": [1.929292, 0.3848484, -1.9383838383, ... ] // the vector you want to store
}
VectorSearchRequest:
{
  "bucket": "BUCKET_NAME", // consider it a collection or a table
  "vector": [1.929292, 0.3848484, -1.9383838383, ... ], // the query vector; results are ordered by cosine similarity in descending order
  "min_cosine_similarity": 0.0, // the higher the threshold, the fewer results you get
  "max_result_count": 10 // maximum number of results to return (vecdb orders by cosine similarity first, then applies the limit)
}
TextEmbeddingWriteRequest (available only if you set embedder.enabled to true):
{
  "bucket": "BUCKET_NAME", // consider it a collection or a table
  "key": "product-id-1", // should be unique and represent a valid value in your main data store (e.g. the row id in your MySQL/Postgres/etc. database)
  "content": "This is some text representing the product" // converted to a vector using the configured embedder
}
TextEmbeddingSearchRequest (available only if you set embedder.enabled to true):
{
  "bucket": "BUCKET_NAME", // consider it a collection or a table
  "content": "A Product Text", // converted to a query vector; results are ordered by cosine similarity in descending order
  "min_cosine_similarity": 0.0, // the higher the threshold, the fewer results you get
  "max_result_count": 10 // maximum number of results to return (vecdb orders by cosine similarity first, then applies the limit)
}