Using rsync for backups, because it's not shiny and new

mesa@piefed.social · 5 months ago

Using rsync for backups, because it's not shiny and new

mesa@piefed.social · 5 months ago

Ive personally used rsync for backups for about…15 years or so? Its worked out great. An awesome video going over all the basics and what you can do with it.

Eager Eagle@lemmy.world · edit-2 5 months ago

It works fine, my issue with that it’s just not efficient. If you want a “time travel” feature, your only option is to duplicate data. Differential backups, compression, and encryption for off-site ones is where other tools shine.

suicidaleggroll@lemmy.world · edit-2 5 months ago

If you want a “time travel” feature, your only option is to duplicate data.

Not true. Look at the --link-dest flag. Encryption, sure, rsync can’t do that, but incremental backups work fine and compression is better handled at the filesystem level anyway IMO.

Eager Eagle@lemmy.world · edit-2 5 months ago

Isn’t that creating hardlinks between source and dest? Hard links only work on the same drive. And I’m not sure how that gives you “time travel”, as in, browsing snapshots or file states at the different times you ran rsync.

Edit: ah the hard link is between dest and the link-dest argument, makes more sense.

I wouldn’t bundle fs and backup compression in the same bucket, because they have vastly different reqs. Backup compression doesn’t need to be optimized for fast decompression.

bandwidthcrisis@lemmy.world · 5 months ago

I have it add a backup suffix based on the date. It moves changed and deleted files to another directory adding the date to the filename.

It can also do hard-link copied so that you can have multiple full directory trees to avoid all that duplication.

No file deltas or compression, but it does mean that you can access the backups directly.

koala@programming.dev · 5 months ago

Thanks! I was not aware of these options, along with what other poster mentioned about --link-dest. These do turn rsync into a backup program, which is something the root article should explain!

(Both are limited in some aspects to other backup software, but they might still be a simpler but effective solution. And sometimes simple is best!)

confusedpuppy@lemmy.dbzer0.com · edit-2 5 months ago

I use rsync for many of the reasons covered in the video. It’s widely available and has a long history. To me that feels important because it’s had time to become stable and reliable. Using Linux is a hobby for me so my needs are quite low. It’s nice to have a tool that just works.

I use it for all my backups and moving my backups to off network locations as well as file/folder transfers on my own network.

I even made my own tool (https://codeberg.org/taters/rTransfer) to simplify all my rsync commands into readable files because rsync commands can get quite long and overwhelming. It’s especially useful chaining multiple rsync commands together to run under a single command.

I’ve tried other backup and syncing programs and I’ve had bad experiences with all of them. Other backup programs have failed to restore my system. Syncing programs constantly stop working and I got tired of always troubleshooting. Rsync when set up properly has given me a lot less headaches.

Eldritch@piefed.world · 5 months ago

And I generally enjoy Veronica’s presentation. Knowledgable and simple.

mesa@piefed.social · 5 months ago

Her https://tinkerbetter.tube/w/ffhBwuXDg7ZuPPFcqR93Bd made me learn a new way of looking at data. There was some tricks I havent done before. She has such good videos.

Eldritch@piefed.world · 5 months ago

Yep, I found her through YouTube. Her and action retro’s content is always great.with some Adrian black on the side.

PortNull@lemmy.dbzer0.com · edit-2 5 months ago

Maybe I am missing something but how does it handle snapshots?

I use rsync all the time but only for moving data around effectively. But not for backups as it doesn’t (AFAIK) hanld snapshots

Eager Eagle@lemmy.world · edit-2 5 months ago

yeah, it doesn’t, it’s just for file transfer. It’s only useful if transferring files somewhere else counts as a backup for you.

To me, the file transfer is just a small component of a backup tool.

eleijeep@piefed.social · 5 months ago

You get incremental backups (snapshots) by using

--link-dest=DIR         hardlink to files in DIR when unchanged

To use this you pass in the previous snapshot location as DIR and use a new destination directory for the current snapshot. This creates hard links in the new snapshot to the files which were unchanged from the previous snapshot, so only the new files are transferred, and there is no duplication of data on disk (for whole-file matches).

This does of course require that all of the snapshots exist in the same filesystem, since you cannot hard-link across filesystems.

koala@programming.dev · 5 months ago

Ah, I didn’t know of this. This should be in the linked article! Because it’s one of the ways to turn rsync into a real backup! (I didn’t know this flag- I thought this was the main point of rdiff-backup.)

1984@lemmy.today · edit-2 5 months ago

I never thought of it as slow. More like very reliable. I dont need my data to move fast, I need it to be copied with 100% reliability.

Victor@lemmy.world · 5 months ago

But if you’re working with Linux you’re going to need to know it.

Nope. I never have needed to know it. I only ever used it because I was either curious to know how to use it or because it was more convenient than other solutions. But scp is basically just as convenient.

sugar_in_your_tea@sh.itjust.works · 5 months ago

It doesn’t do diffs, so it’s really bad if there’s a lot of duplicate data.

Victor@lemmy.world · edit-2 5 months ago

If you want to use it for backups, there are other solutions, so you still don’t need to use it or know it. You can use something else. That’s my only point. 🤷‍♂️

And “really bad” is all relative. If you are only backing up your home drive with documents or whatever, copying a few unnecessary gigabytes over a LAN connection isn’t too bad at all. But scp isn’t what you should be using for backups anyway. I only used rsync for file transfer…

sugar_in_your_tea@sh.itjust.works · 5 months ago

I use rsync for all kinda of things:

deploying static files to a public webserver (blog or whatever)
backups - scheduled systems/cron task w/ SSH key
copying stuff from a USB drive

I only really use scp if the system doesn’t already have rsync.

Victor@lemmy.world · 5 months ago

Alright. But you don’t need to know rsync. That’s my only point. 👍👍

sugar_in_your_tea@sh.itjust.works · 5 months ago

Sure, but you should probably be aware of what it is and what it does. It’s incredibly common and will be referenced in a ton of documentation for Linux server stuff.

Victor@lemmy.world · 5 months ago

You won’t need to unless you run a server in that case. 👍 But the only condition here was “working with Linux”.

Like I said, I’ve been using Linux at home and for work for over a decade, maybe 15+ years, never once did I need to use rsync or know what it is.

That being said, it was convenient when I used it, but never did I need it.

sugar_in_your_tea@sh.itjust.works · 5 months ago

This is the self-hosted community, so that’s the context I was assuming.

dum_lion@feddit.uk · 5 months ago

Y’all don’t seem to know about rsbackup, which is a terrible shame for you.

sugar_in_your_tea@sh.itjust.works · 5 months ago

Yeah it’s slow

What’s slow about async? If you have a reasonably fast CPU and are merely syncing differences, it’s pretty quick.

pathief@lemmy.world · 5 months ago

It’s single thread, one file at a time.

sugar_in_your_tea@sh.itjust.works · 5 months ago

That would only matter if it’s lots of small files, right? And after the initial sync, you’d have very few files, no?

Rsync is designed for incremental syncs, which is exactly what you want in a backup solution. If your multithreaded alternative doesn’t do a diff, rsync will win on larger data sets that don’t have rapid changes.

dohpaz42@lemmy.world · 5 months ago

Here’s how I approach old and slow:

Older software is mature and battle tested. It’s been around long enough that the developers should know what they’re doing, and have built a strong community for help and support.
Slow is okay when it comes to accuracy. Would I love to back up my gigabytes (peanuts compared to some of you folks out there with data centers in your attics) in seconds? Yes. But more importantly, I’d rather have my data be valid for if I ever need to do any kind of restore. And I’ve been around the block enough times in my career to see many useless backups.

Landless2029@lemmy.world · 5 months ago

I need a breakdown like this for Rclone. I’ve got 1TB of OneDrive free and nothing to do with it.

I’d love to setup a home server and backup some stuff to it.

tomkatt@lemmy.world · 5 months ago

Rsync is great. I’ve been using it to back up my book library from my local Calibre collection to my NAS for years, it’s absurdly simple and convenient. Plus, -ruv lets me ignore unchanged files and backup recursively, and if I clean up locally and need that replicated, just need to add —delete.

quick_snail@feddit.nl · 5 months ago

It’s slow?!?

HereIAm@lemmy.world · 5 months ago

Compared to something multi threaded, yes. But there are obviously a number of bottlenecks that might diminish the gains of a multi threaded program.

Tja@programming.dev · 5 months ago

With xargs everything is multithreaded.

okamiueru@lemmy.world · 5 months ago

That part threw me off. Last time i used it, I did incremental backups of a 500 gig disk once a week or so, and it took 20 seconds max.

Biscuit@ani.social · 5 months ago

Yes but imagine… 18 seconds.

NuXCOM_90Percent@lemmy.zip · 5 months ago

I would generally argue that rsync is not a backup solution. But it is one of the best transfer/archiving solutions.

Yes, it is INCREDIBLY powerful and is often 90% of what people actually want/need. But to be an actual backup solution you still need infrastructure around that. Bare minimum is a crontab. But if you are actually backing something up (not just copying it to a local directory) then you need some logging/retry logic on top of that.

At which point you are building your own borg, as it were. Which, to be clear, is a great thing to do. But… backups are incredibly important and it is very much important to understand what a backup actually needs to be.

non_burglar@lemmy.world · 5 months ago

I use rsync and a pruning script in crontab on my NFS mounts. I’ve tested it numerous times breaking containers and restoring them from backup. It works great for me at home because I don’t need anything older than 4 monthly, 4 weekly, and 7 daily backups.

However, in my job I prefer something like bacula. The extra features and granularity of restore options makes a world of difference when someone calls because they deleted prod files.

Colloidal@programming.dev · 5 months ago

Borg gang represent!

tal@olio.cafe · edit-2 5 months ago

I would generally argue that rsync is not a backup solution.

Yeah, if you want to use rsync specifically for backups, you’re probably better-off using something like rdiff-backup, which makes use of rsync to generate backups and store them efficiently, and drive it from something like backupninja, which will run the task periodically and notify you if it fails.

rsync: one-way synchronization

unison: bidirectional synchronization

git: synchronization of text files with good interactive merging.

rdiff-backup: rsync-based backups. I used to use this and moved to restic, as the backupninja target for rdiff-backup has kind of fallen into disrepair.

That doesn’t mean “don’t use rsync”. I mean, rsync’s a fine tool. It’s just…not really a backup program on its own.

neidu3@sh.itjust.works · 5 months ago

+1 for rfiff-backup. Been usinit for 20 years or so, and I love it.

melfie@lemy.lol · 5 months ago

Having a synced copy elsewhere is not an adequate backup and snapshots are pretty important. I recently had RAM go bad and my most recent backups had corrupt data, but having previous snapshots saved the day.

melfie@lemy.lol · 5 months ago

Don’t understand the downvotes. This is the type of lesson people have learned from losing data and no sense in learning it the hard way yourself.

tomenzgg@midwest.social · 5 months ago

How would you pin down something like this? If it happened to me, I expect I just wouldn’t understand what’s going on.

melfie@lemy.lol · edit-2 5 months ago

I originally thought it was one of my drives in my RAID1 array that was failing, but I noticed copying data was yielding btrfs corruption errors on both drives that could not be fixed with a scrub and I was also getting btrfs corruption errors on the root volume as well. I figured it would be quite an odd coincidence if my main SSD and 2 hard disks all went bad and I happened upon an article talking about how corrupt data can also occur if the RAM is bad. I also ran SMART tests and everything came back with a clean bill of health. So, I installed and booted into Memtester86+ and it immediately started showing errors on the single 16Gi stick I was using. I happened to have a spare stick that was a different brand, and that one passed the memory test with flying colors. After that, all the corruption errors went away and everything has been working perfectly ever since.

I will also say that legacy file systems like ext4 with no checksums wouldn’t even complain about corrupt data. I originally had ext4 on my main drive and at one point thought my OS install went bad, so I reinstalled with btrfs on top of LUKS and saw I was getting corruption errors on the main drive at that point, so it occurred to me that 3 different drives could not have possibly had a hardware failure and something else must be going on. I was also previously using ext4 and mdadm for my RAID1 and migrated it to btrfs a while back. I was previously noticing as far back as a year ago that certain installers, etc. that previously worked no longer worked, which happened infrequently and didn’t really register with me as a potential hardware problem at the time, but I think the RAM was actually progressively going bad for quite a while. btrfs with regular scrubs would’ve made it abundantly clear much sooner that I had files getting corrupted and that something was wrong.

So, I’m quite convinced at this point that RAID is not a backup, even with the abilities of btrfs to self-heal, and simply copying data elsewhere is not a backup, because something like bad RAM in both cases can destroy data during the copying process, whereas older snapshots in the cloud will survive such a hardware failure. Older data backed up that wasn’t coped with faulty RAM may be fine as well, but you’re taking a chance that a recent update may overwrite good data with bad data. I was previously using Rclone for most backups while testing Restic with daily, weekly, and monthly snapshots for a small subset of important data the last few months. After finding some data that was only recoverable in a previous Restic snapshot, I’ve since switched to using Restic exclusively for anything important enough for cloud backups. I was mainly concerned about the space requirements of keeping historical snapshots, and I’m still working on tweaking retention policies and taking separate snapshots of different directories with different retention policies according risk tolerance for each directory I’m backing up. For some things, I think even btrfs local snapshots would suffice with the understanding that it’s to reduce recovery time, but isn’t really a backup . However, any irreplaceable data really needs monthly Restic snapshots in the cloud. I suppose if don’t have something like btrfs scrubs to alert you that you have a problem, even snapshots from months ago may have an unnoticed problem.

koala@programming.dev · 5 months ago

Beware rdiff-backup. It certainly does turn rsync (not a backup program) into a backup program.

However, I used rdiff-backup in the past and it can be a bit problematic. If I remember correctly, every “snapshot” you keep in rdiff-backup uses as many inodes as the thing you are backing up. (Because every “file” in the snapshot is either a file or a hard link to an identical version of that file in another snapshot.) So this can be a problem if you store many snapshots of many files.

But it does make rsync a backup solution; a snapshot or a redundant copy is very useful, but it’s not a backup.

(OTOH, rsync is still wonderful for large transfers.)

tal@olio.cafe · 5 months ago

Because every “file” in the snapshot is either a file or a hard link to an identical version of that file in another snapshot.) So this can be a problem if you store many snapshots of many files.

I think that you may be thinking of rsnapshot rather than rdiff-backup which has that behavior; both use rsync.

But I’m not sure why you’d be concerned about this behavior.

Are you worried about inode exhaustion on the destination filesystem?

koala@programming.dev · 5 months ago

Huh, I think you’re right.

Before discovering ZFS, my previous backup solution was rdiff-backup. I have memories of it being problematic for me, but I may be wrong in my remembering of why it caused problems.

vext01@lemmy.sdf.org · 5 months ago

I used to use rsnapshot, which is a thin wrapper around rsync to make it incremental, but moved to restic and never looked back. Much easier and encrypted by default.

ominous ocelot@leminal.space · edit-2 5 months ago

rsnapshot is a script for the purpose of repeatedly creating deduplicated copies (hardlinks) for one or more directories. You can chose how many hourly, daily, weekly,… copies you’d like to keep and it removes outdated copies automatically. It wraps rsync and ssh (public key auth) which need to be configured before.

SayCyberOnceMore@feddit.uk · 5 months ago

Hardlinks need to be on the same filesystem, don’t they? I don’t see how that would work with a remote backup…?

suicidaleggroll@lemmy.world · 5 months ago

The hard links aren’t between the source and backup, they’re between Friday’s backup and Saturday’s backup

MangoPenguin@lemmy.blahaj.zone · 5 months ago

Surely restic or borg would be better for backups?

Rsync can send files and not delete stuff, but there’s no versioning or retention settings.

sugar_in_your_tea@sh.itjust.works · edit-2 5 months ago

For versioning/retention, just use snapshots in whatever filesystem you’re using (you are using a proper filesystem like ZFS or BTRFS, right?).

MangoPenguin@lemmy.blahaj.zone · 5 months ago

How does that get sent over rsync though? Wouldn’t you need snapshots on the remote destination server?

Why not just use a backup utility instead?

sugar_in_your_tea@sh.itjust.works · 5 months ago

Yes, async copies files to the remote server, the remote server takes regular snapshots.

Why not just use a backup utility instead?

What is that utility providing that snapshots + rsync doesn’t. If rsync + snapshots is sufficient, why overcomplicate it with a backup utility?

MangoPenguin@lemmy.blahaj.zone · 5 months ago

The main things that come to mind are you have to test/monitor 2 seperate actions instead of 1, and restores of single files could be more difficult since you need to login to the backup server, restore the file from a snapshot, then also copy that file back to your PC.

chronicledmonocle@lemmy.world · 5 months ago

If you add --delete-before, it absolutely can delete stuff.

MangoPenguin@lemmy.blahaj.zone · 5 months ago

Yeah but then it’s not really a good backup!

srjd7cpsmjvja3cyae@lemmy.ml · 5 months ago

Duplicity also works really well for backups and you can encrypt them with your own openpgp key.

https://duplicity.gitlab.io/

Wispy2891@lemmy.world · 5 months ago

I tried to use it via tailscale but it disconnects very easily - is to be expected?

offspec@lemmy.world · 5 months ago

I would not expect rsync to have frequent disconnects, no.