Support for protocols other than SSH #1070
Well, we have quite some plans with borg, but our own WebDAV backend is not a priority for the core devs, for sure (I've heard that WebDAV is a rather shitty protocol, btw). That said, you could use any kind of tool that either mounts WebDAV as a filesystem (as you tried, assuming the FS emulation is good enough), or first do a "local" backup and then additionally push that to your WebDAV server using separate tools. There might be other backends some day, but that will likely first require bigger internal changes (see also the already existing ticket about S3 support).
The thing with WebDAV vs. SSH is that with SSH you can run a copy of borg on the server, and then the two instances talk to each other using SSH as the data pipe. This is not possible with WebDAV: no matter how you cut it, WebDAV means full file access without involving borg at all, which is universally bad for backups. I imagine whoever holds your backups is not super keen on letting you run random commands on their hosts, and that's why you get WebDAV (or others might get FTP and the like), i.e. a full-filesystem-access protocol.
WebDAV is not something that's implementable at a reasonable cost, due to its sloppy spec and many, many sloppy implementations.
Alright, thanks all for your answers. I have Borg running in combination with WDFS (an ages-old FUSE FS for WebDAV) now, which seems to work fine. Only the performance is much lower than what I had with Duplicity: I now get 20 Mb/s max on a 100 Mb/s connection, while Duplicity used to max out that speed. I don't know whether WDFS or Borg is the bottleneck, of course, although I suspect the former. On another machine (Debian) I'm trying Borg with davfs2 now, but here I'm running into some problems. When I try to initialise a Borg repository on a davfs2 mount, Borg crashes (see attachment for log).
I don't see that as "universally bad" at all. Backup software like Borg and Duplicity doesn't require any software on the backend, which is actually a big pro, because it means I can (theoretically) use it with any backend I want. Cloud providers don't let you run code on their backend. So IMO it is actually a very good thing that Borg doesn't require software on the backend but can still do incremental-forever backups. rdiff-backup, for example, can also do incremental forever, but it requires software on the backend (and can't do encryption, but that's an unrelated issue), so I simply can't use it.
So how does Duplicity do this, then? Duplicity supports a whole shitload of storage backends (which is great), including WebDAV. This means you don't have to use some FUSE filesystem, which is a good thing.
Well, I think there's a misunderstanding here on your part. It's like saying "borg can write directly to a local FS, so it does not require any software on the backend to be effective (and to support ext4/ZFS/whatever)".
Borg absolutely does require software on the backend (the other borg instance); otherwise it's just reduced to local filesystem access, be it provided by a local kernel driver (native FS, FUSE) or a layer in the app itself (an LD_PRELOADed library, or all accesses going through a translation layer, which is what I suspect Duplicity does). When it is said that borg supports SSH out of the box, what is meant is: via SSH we can start another borg instance on the other end and then communicate with it over the SSH pipe. In fact, we recommend using an SSH forced command to ensure only borg can be started for a particular key, so nobody has unrestricted access to the repo except via the borg protocol.
A first step towards dumb storage might be a storage format that requires less logic.
Proper cloud storage support will require a completely different storage design, since the environment is completely different. Repository+LoggedIO produce strong guarantees on consistency and integrity (somewhat dependent on file system and hardware), which isn't really possible with anything that doesn't give guarantees similar to a POSIX-y file system. Which is why various failures have been observed with things like DAV or FTP FUSE filesystems. I think this has been discussed in the S3 thread already, but I'll recap anyway. A possible design could revolve around hash-packs (i.e. blobs of chunks identified by a hash-over-ordered-IDs) to avoid the inevitable failures of trying to do segment-counting in the cloud. Those need indexing, and versioned objects (= the manifest) need another storage strategy. Compaction needs different strategies. Concurrent accesses need different strategies. Locking is per se not possible in the storage. And things like append-only are not possible, etc. It would just be an entirely different Repository implementation that will be much more complex. There is really nothing stopping that from happening (just saying).
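The hash-pack idea above can be sketched roughly as follows. This is a hypothetical illustration in Python, not borg code; all function names and the choice of SHA-256 are made up for the example:

```python
import hashlib

def pack_name(chunk_ids):
    """Derive a pack's object name from the ordered list of chunk IDs it holds.

    The name changes whenever membership or order changes, so a pack is
    effectively immutable: repacking produces a new object instead of
    mutating one in place, which dumb storage cannot do atomically.
    """
    h = hashlib.sha256()
    for cid in chunk_ids:          # order matters: hash-over-ordered-IDs
        h.update(cid)
    return h.hexdigest()

def build_pack(chunks):
    """chunks: list of (chunk_id: bytes, data: bytes). Returns (name, blob, index).

    index maps chunk_id -> (offset, length) inside the blob, so a later
    ranged GET could fetch one chunk without downloading the whole pack.
    """
    blob, index, offset = b"", {}, 0
    for cid, data in chunks:
        index[cid] = (offset, len(data))
        blob += data
        offset += len(data)
    return pack_name([cid for cid, _ in chunks]), blob, index
```

The point of the content-derived name is that a dumb store only ever sees whole-object PUTs of immutable blobs, sidestepping the consistency problems of in-place segment rewrites.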
There are two ways to go about that: a) direct chunk upload, great for S3, moot for DAV/sshfs …
S3 PUTs cost money as well, IIRC about 0.5 cents per 1000. Storing just a couple million [small] chunks will cost much more in PUTs than the actual storage does. E: 100 GB, 10 million chunks = 3 USD/mo for storage, but the upload cost 50 USD. A hard drive is [much] cheaper in that scenario.
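The arithmetic behind those numbers, spelled out (the prices are the approximate figures quoted in the comment, not current AWS pricing):

```python
# Approximate prices as quoted above, in USD -- not current AWS rates.
PUT_PRICE_PER_1000 = 0.005    # 0.5 cents per 1000 PUT requests
STORAGE_PER_GB_MONTH = 0.03   # rounded so that 100 GB is about 3 USD/mo

chunks = 10_000_000           # 10 million small chunks, one PUT each
gigabytes = 100

upload_cost = chunks / 1000 * PUT_PRICE_PER_1000   # one-time, ~50 USD
monthly_storage = gigabytes * STORAGE_PER_GB_MONTH # recurring, ~3 USD/mo

print(upload_cost, monthly_storage)
```

So with many small chunks, the one-time request cost dwarfs more than a year of storage fees, which is why packing chunks into larger objects matters on request-priced stores.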
Maybe I'm misunderstanding you, but Borg doesn't require Borg to be installed on the backend, right? Hence why I can use FUSE for WebDAV support. It does support it, though; I quote from the README:
I understand why this is better from a security point of view, but for most off-site backups this is simply not possible.
Borg needs to be on the server in the case of SSH, since an RPC protocol is used, not a file-locking protocol.
It's required in case you are using ssh://.... repos. It's not required if you use sshfs through fuse. |
Thanks, I understand now. I missed the part about sshfs. I thought it was also possible to use ssh:// repos without having Borg installed on the backend, but that is not the case. |
rclone supports most of the traditional cloud services including Amazon Cloud Drive. They are also open source. Perhaps some "borrowing" of code could work here? |
It's in Go, I don't see any sane way to do that.
I'm looking for other protocols as well; especially plain old HTTP or HTTPS would be nice.
Support for more widespread protocols, such as sftp or webdav, would be helpful. My use case would be a remote repository accessible only through sftp. Update: I tried rclone which I discovered thanks to @DavidCWGA, and found this alternative solution:
Seeing the comments from @enkore, I suspect this solution offers fewer guarantees w.r.t. consistency and integrity for the remote backup, although I don't know enough about borg to know what I'm really losing.
sshfs uses sftp, so: use sshfs? It will be slower than borg-ssh-borg. |
TBH, I would love to see more backends supported by borg, but for situations where you are able to provide SSH access I would always prefer that. Using SSH with …
Thanks @dimejo, this satisfies my use case: in this situation I have complete control of the remote server, so I can install borg and edit the SSH configuration. FYI, here's what I'm doing in my sshd_config:
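(The commenter's actual snippet is missing here. As an illustration only, not their real configuration, a forced-command setup along the lines recommended earlier in the thread typically looks like the following; user names, paths, and the key are placeholders:)

```text
# Option A: forced command per key, in ~/.ssh/authorized_keys on the server
command="borg serve --restrict-to-path /srv/backups/client1",restrict ssh-ed25519 AAAA... backup@client1

# Option B: force the command for a dedicated backup user, in sshd_config
Match User borgbackup
    ForceCommand borg serve --restrict-to-path /srv/backups
```

Either way, the key (or user) can then only ever start `borg serve`, so the repository is reachable exclusively through the borg protocol.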
Hello.
I haven't used rclone yet, but isn't the default usage pattern to rclone a local directory to a remote storage (not using FUSE)? And if that local directory is your borg repo, you will have a remote clone of it afterwards (see the FAQ about copying/cloning borg repos). Considering rclone is written in Go, I imagine it might be easier to integrate into other software written in Go, like restic, than into Python software.
It is, and it's advised not to use FUSE due to issues with the FS cache (in FUSE/kernel). This is why, if borg talked directly to rclone (bypassing FUSE and the FS layer), it should be more stable.
Re Go/Python: yes and no.
Hi! Rclone author here. I had a request for better borg integration with rclone... I implemented rclone serve restic to act as a special-purpose gateway for restic. restic then uses an HTTP protocol to speak to rclone over a pipe to access all the backends rclone supports. I was wondering how difficult something like that would be for borg? I see borg has …
That would be a killer feature to have, instantly opening up all the clouds supported by rclone for direct access by borg-backup users. Borg developers, it would be very much appreciated (by myself and tons of others) if you could help @ncw with this. Thanks in advance!
@ncw have a look into remote.py; this is where the client/server code is located. It is basically doing remote procedure calls over the stdin/stdout/stderr channels of ssh. The code there is more or less stable, and the RPC interface is designed to be extensible.
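For illustration only, the dispatch pattern such an RPC layer follows might look like the sketch below. Borg's actual remote.py speaks a binary protocol over ssh's stdio, not JSON lines, and these command names are invented:

```python
import json

# Hypothetical command table; the real remote.py exposes repository
# operations (open, get, put, ...) rather than toy commands like these.
COMMANDS = {
    "ping": lambda args: "pong",
    "add":  lambda args: args["a"] + args["b"],
}

def serve(infile, outfile):
    """Read one JSON request per line, write one JSON response per line.

    In the ssh setup discussed above, infile/outfile would be the
    stdin/stdout ends of the ssh pipe connecting client and server.
    """
    for line in infile:
        req = json.loads(line)
        try:
            result = COMMANDS[req["cmd"]](req.get("args", {}))
            resp = {"id": req["id"], "result": result}
        except Exception as exc:
            resp = {"id": req["id"], "error": str(exc)}
        outfile.write(json.dumps(resp) + "\n")
        outfile.flush()
```

The key property is that the server only ever executes whitelisted commands from the table, which is what makes the forced-command setup safe: the remote side never gets arbitrary shell access.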
Sorry to (again) revive this thread; I guess some never stop trying ;) I'm tempted, like others before me, to consider plugging in a "bucket-based" cloud backend (like …). Based on this comment by @enkore in this thread, I was wondering if it is not possible to mitigate the consequences/requirements by narrowing the use case, and I was hoping to get some advice/pointers on whether it could be feasible. As I understand from … But what if one were more blunt and stored these files (chunks and index) with commands like …
Then the granularity of consistency would be the size of each uploaded chunk? I.e., if my backup is interrupted, I would need to do two things to get the remote repo into a consistent state:
One could reduce the risk by making uploads "more atomic": upload to a temp name, then rename. (This mechanism also appears in ….) At the risk of appearing too naive with this strategy, I'd love to hear some opinions on its rough feasibility. (As a side note: I'm currently successfully running daily backups to a remote NAS over a slow link using an ….)
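The upload-temp-then-rename trick can be sketched for a local filesystem like this. It's a minimal illustration, not borg code; on object stores the "rename" would instead be a server-side copy/move of the temp key:

```python
import os
import tempfile

def atomic_write(path, data):
    """Write data to path so readers never observe a partial file.

    POSIX rename() within one directory is atomic, so a reader sees
    either the old content or the new content, never a torn write.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".upload-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # push data to disk before the rename
        os.replace(tmp, path)      # atomic swap into the final name
    except BaseException:
        os.unlink(tmp)             # never leave half-written temp files
        raise
```

The temp file lives in the same directory as the target, because rename is only atomic within a single filesystem.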
borg2: other backends (using other protocols) can now be implemented much more easily than before, since #8332 was merged.
I had a look at an example borgstore backend, the posixfs backend. That interface looks relatively easy to implement, and it would be straightforward to make a backend for rclone. This could either use the rclone binary directly or (probably preferred) start an API server on a unix socket or stdin/stdout. Is that of interest? I note that there is an sftp backend now which would work directly with …. I note also @ThomasWaldmann's pull request to use the restic server, which would also work with …. So maybe a dedicated rclone backend isn't needed? Though I think it could be more seamless if it starts and stops any rclone components itself, rather than the user having to start them. Thoughts?
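To give a feel for how narrow such a backend interface is, here is a toy in-memory sketch. The method names are illustrative only, not borgstore's actual API; see the posixfs backend mentioned above for the real interface:

```python
class MemoryBackend:
    """Toy key/value backend: roughly the shape of operations a 'dumb
    storage' backend must provide. Method names are hypothetical."""

    def __init__(self):
        self._objects = {}

    def store(self, name, value):
        """PUT: write a whole immutable object under a key."""
        self._objects[name] = bytes(value)

    def load(self, name):
        """GET: read a whole object back; raises KeyError if missing."""
        return self._objects[name]

    def delete(self, name):
        """DELETE: remove an object."""
        del self._objects[name]

    def list(self, prefix=""):
        """LIST: enumerate keys under a prefix, sorted."""
        return sorted(k for k in self._objects if k.startswith(prefix))
```

Note there is no seek, append, or in-place update, which is exactly why this maps onto rclone's cloud backends and why anything needing POSIX semantics does not.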
@ncw Yes, rclone as a borgstore backend is very interesting; if that works well, I guess we would not need to reinvent the wheel (wheel = talking to misc cloud storage providers). sftp: that already works, but is slow, see borgbackup/borgstore#44; it could maybe be less slow for a low-latency connection on localhost, though. borgstore REST client backend PR: there are some differences, so the restic server or the rclone restic-like server don't work "as is", but would need some (easy?) adaptations.
In general this would be great, I think, since it adds multiple backends at once. One caveat: I tried … In other words, …
@ThomasWaldmann I agree 100% with your comments on the sftp protocol and paramiko; I've had lots of experience with both! I have made a first attempt at an rclone backend here: borgbackup/borgstore#46
I'm currently using Duplicity for my off-site backups which works quite well apart from one thing: it lacks incremental forever backups.
Borg does have this feature.
However, my backup backend only supports WebDAV. In Duplicity I can directly use a WebDAV backend but Borg only seems to support SSH. I could mount the WebDAV server locally, but davfs2 isn't available on FreeBSD (the platform of my fileserver). There is an analog called wdfs, but it hasn't been updated since 2007. Besides, I've always been told that mounting (off-site) backups locally isn't a good idea because that makes it more likely that your backups can be overwritten in some way.
So, if I want to use Borg instead of Duplicity, I'd have two options, neither of which is ideal:
So I was curious whether support for other protocols, like WebDAV, is a planned feature.