Implement wildduck's storage architecture for efficiency and scalability #291
Comments
My current idea for a distributed/scalable deployment is putting go-imap-sql on top of CockroachDB, with message blobs stored in some object storage (e.g. S3). Attachment deduplication may be worth exploring, though.
I agree. WD probably gets most of its gains from attachment deduplication rather than from the specific storage backend. Deduplication can easily be done by storing attachment hashes, and may even bring a performance improvement, as you would often not need to send a file to storage at all. Deleting a message with attachments would only delete the file and its hash if it's the last message pointing to it. I'm not very familiar with the codebase, but I do have Go experience, so I can help as soon as I find some bandwidth.
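The hash-plus-reference-count idea described above can be sketched roughly as follows. This is a minimal in-memory illustration, not code from go-imap-sql or WildDuck: `DedupStore`, `Put`, and `Release` are hypothetical names, and SHA-256 is an assumed choice of hash. A real backend would keep the reference counts in the database and the bytes in external storage.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"sync"
)

// DedupStore is a hypothetical in-memory stand-in for an attachment store
// that keeps one copy of each distinct attachment, keyed by content hash.
type DedupStore struct {
	mu    sync.Mutex
	blobs map[string][]byte // content hash -> attachment bytes
	refs  map[string]int    // content hash -> number of messages pointing at it
}

// NewDedupStore returns an empty store.
func NewDedupStore() *DedupStore {
	return &DedupStore{blobs: map[string][]byte{}, refs: map[string]int{}}
}

// Put registers one more reference to the attachment and returns its key.
// written is true only when the bytes were not already present, i.e. when
// a real backend would actually have to upload the file.
func (s *DedupStore) Put(data []byte) (key string, written bool) {
	sum := sha256.Sum256(data)
	key = hex.EncodeToString(sum[:])

	s.mu.Lock()
	defer s.mu.Unlock()
	if s.refs[key] == 0 {
		s.blobs[key] = append([]byte(nil), data...) // store a private copy
		written = true
	}
	s.refs[key]++
	return key, written
}

// Release drops one reference. The blob and its hash entry are removed only
// when the last message pointing at them is deleted; deleted reports that.
func (s *DedupStore) Release(key string) (deleted bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	n, ok := s.refs[key]
	if !ok {
		return false
	}
	if n <= 1 {
		delete(s.refs, key)
		delete(s.blobs, key)
		return true
	}
	s.refs[key] = n - 1
	return false
}
```

Storing the same attachment twice yields the same key and skips the second write; the blob disappears only after both references are released.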
I second the That also enables S3-compatible storage and can easily be self-hosted with MinIO.
I do not want the maintenance burden of a separate server/machine/etc., whether that is WildDuck, Maildir, S3, or CockroachDB. I would appreciate the ability to store my mail in the same database as the metadata (e.g. PostgreSQL). Maybe not in the same table as the metadata, but still. This would make consistent backups trivial, and advanced search, filtering, and analysis much easier. The same applies to attachments: it would make things like deduplication trivial.
Early versions of the imapsql backend stored message contents as a blob in the same table as the metadata. That turned out to be a performance problem. Message contents are now stored in an abstracted "external storage", and the only implementation currently available is a filesystem directory. It is definitely possible to add an implementation that just stores blobs in table rows. This should not cause performance problems if the table is separate from the metadata.
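A table-backed blob store of the kind described above might look roughly like this. It is only a sketch under assumptions: `BlobStore` and `TableBlobStore` are hypothetical and are not the actual external-storage interface of go-imap-sql, the `message_blobs` table and its columns are made up for illustration, and the `$1` placeholders and `BYTEA` type assume PostgreSQL.

```go
package main

import (
	"context"
	"database/sql"
)

// blobSchema keeps message bodies in their own table, separate from the
// metadata tables, so metadata queries never touch the large rows.
// Table and column names are illustrative assumptions.
const blobSchema = `CREATE TABLE IF NOT EXISTS message_blobs (
	blob_key TEXT PRIMARY KEY,
	body     BYTEA NOT NULL
)`

// BlobStore is a hypothetical minimal storage abstraction.
type BlobStore interface {
	Put(ctx context.Context, key string, body []byte) error
	Get(ctx context.Context, key string) ([]byte, error)
	Delete(ctx context.Context, key string) error
}

// TableBlobStore stores each blob as one row of message_blobs.
type TableBlobStore struct {
	DB *sql.DB // opened by the caller with a PostgreSQL driver
}

// Compile-time check that TableBlobStore satisfies BlobStore.
var _ BlobStore = (*TableBlobStore)(nil)

// Put inserts the blob under the given key.
func (s *TableBlobStore) Put(ctx context.Context, key string, body []byte) error {
	_, err := s.DB.ExecContext(ctx,
		`INSERT INTO message_blobs (blob_key, body) VALUES ($1, $2)`, key, body)
	return err
}

// Get returns the blob, or sql.ErrNoRows if the key is unknown.
func (s *TableBlobStore) Get(ctx context.Context, key string) ([]byte, error) {
	var body []byte
	err := s.DB.QueryRowContext(ctx,
		`SELECT body FROM message_blobs WHERE blob_key = $1`, key).Scan(&body)
	return body, err
}

// Delete removes the blob row; deleting a missing key is not an error.
func (s *TableBlobStore) Delete(ctx context.Context, key string) error {
	_, err := s.DB.ExecContext(ctx,
		`DELETE FROM message_blobs WHERE blob_key = $1`, key)
	return err
}
```

Because the blobs live in the same database as the metadata, a single database dump would cover both, which is the backup property asked for earlier in the thread.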
Use case
What problem are you trying to solve?
Maildir is less space efficient and less scalable than a clustered database as a mail store.
Note alternatives you considered and why they are not useful.
I've tried using Maildir over an S3 backend, but performance can be an issue.
Your idea for a solution
Compress messages, deduplicate attachments and store in a clustered database like MongoDB.
How would your solution work in general?
WildDuck stores messages and attachments in MongoDB. It compresses data and deduplicates attachments, greatly reducing storage requirements and allowing us to scale our deployments easily. I currently use it in production and it works great.