Coda File System

Re: problems with truncated backups

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Wed, 7 Jul 2004 14:29:54 -0400
On Tue, Jul 06, 2004 at 08:09:12PM -0700, Steve Simitzis wrote:
> on second thought, i've given up on the coda backup system. no offense
> to the smart people who make coda a great system to use, but the backup
> system is too fragile to be trusted for production applications. nothing
> worries me more than the possibility of "write-only" backups.

Actually, once we hit more than 100 replicated volumes on our machines,
the whole manual scheduling of full vs. incremental dumps across backup
tapes just became too complicated. So we've been using Amanda for backups :)

It did need a couple of tweaks. The Amanda client side was easy enough
with a couple of wrapper scripts, but the server refused to parse any
dumplists that referred to backup-type 'Coda'. As far as the server is
concerned, it doesn't really matter what the actual dump type is, and
simply removing a single hardcoded test made everything work.
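To make the scheduling problem concrete, here is a minimal sketch of the kind of staggered full/incremental rotation that otherwise has to be maintained by hand. It is not how Amanda plans dumps (Amanda's planner also balances by estimated dump size). `plan_run`, the volume names and the cycle length are all hypothetical. Each volume gets exactly one full dump per cycle, and the full dumps are spread evenly over the runs instead of all landing on the same tape.

```python
"""Hypothetical round-robin staggering of full vs. incremental dumps.

Not Amanda's planner; just an illustration of the rotation one would
otherwise have to track manually for 100+ volumes.
"""
from typing import Dict, List


def plan_run(volumes: List[str], run: int, cycle: int) -> Dict[str, str]:
    """Return 'full' or 'incremental' for each volume on a given run.

    Volume i gets its full dump on runs where (run + i) % cycle == 0, so
    every volume is fully dumped once per cycle and the full dumps are
    distributed evenly across the runs of the cycle.
    """
    if cycle < 1:
        raise ValueError("cycle must be >= 1")
    return {
        vol: "full" if (run + i) % cycle == 0 else "incremental"
        for i, vol in enumerate(volumes)
    }


if __name__ == "__main__":
    vols = [f"vol{n:03d}" for n in range(7)]  # hypothetical volume names
    for run in range(3):
        plan = plan_run(vols, run, cycle=7)
        fulls = [v for v, kind in plan.items() if kind == "full"]
        print(f"run {run}: full dumps for {fulls}")
```

With 100 volumes and a 7-run cycle, each run would do 14 or 15 full dumps and incrementals for the rest.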

> after a couple of years of success using rdiff-backup for source code
> and general system backups, i just started using it for my coda
> backups. rdiff-backup reads from the mounted coda filesystem and
> writes the backup to cheap disk. it maintains a live mirror with diffs
> going back as far as you want. it's an excellent tool.

There are advantages and disadvantages to such an approach:

- All files and directories have to be fetched by the client every time
  a backup is run. The server already has all the data and can
  efficiently do an incremental dump, which limits the data it sends to
  recently changed files and directories.

- Fetching a complete volume (voldump) is more efficient than fetching
  all individual files/directories.

- It is not easy to exploit parallelism when there are multiple servers.
  Any server-side backup scheme can easily pull the data from all the
  servers at the same time.

- If there is a conflict, a whole subtree will not be backed up. The
  server-side scheme captures complete volume replicas, so nothing can
  ever get overlooked. The disadvantage of server-side backup is that it
  grabs all replicas and therefore ends up with several identical copies
  of every file that is not in conflict.

- Client-side backup will take a long time, during which the underlying
  filesystem can change, so it might miss parts of the tree if files or
  directories are added, removed, or renamed while it runs. Server-side
  backups are an exact snapshot of a volume at the time the backup
  started.
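The incremental-dump and parallelism points can be illustrated with a small in-memory model. This is only a sketch: `Server`, `dump_incremental` and `backup_all` are hypothetical names, the "volume" is just a map of paths to modification times, and a real setup would drive the Coda server's own dump tooling rather than this code. Each server selects only the entries changed since the last backup, and all servers are dumped concurrently.

```python
"""Sketch: server-side incremental dumps pulled from several servers at once.

Everything here is hypothetical and in-memory; it models the idea, not
any actual Coda interface.
"""
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Server:
    name: str
    # path -> modification time in seconds; stands in for a volume replica
    files: Dict[str, float] = field(default_factory=dict)


def dump_incremental(server: Server, since: float) -> Tuple[str, List[str]]:
    """Return only the entries changed after `since` (an incremental dump).

    The server already knows every mtime, so unchanged entries are never
    sent, unlike a client that must fetch the whole tree to compare it.
    """
    changed = sorted(p for p, mtime in server.files.items() if mtime > since)
    return server.name, changed


def backup_all(servers: List[Server], since: float) -> Dict[str, List[str]]:
    """Run one incremental dump per server concurrently, keyed by server name."""
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        return dict(pool.map(lambda s: dump_incremental(s, since), servers))


if __name__ == "__main__":
    servers = [
        Server("srv1", {"/a": 10, "/b": 50}),
        Server("srv2", {"/c": 60, "/d": 5}),
    ]
    print(backup_all(servers, since=20))
```

Run as a script, this prints `{'srv1': ['/b'], 'srv2': ['/c']}`: only the entries modified after time 20 are dumped, and both servers are handled at the same time.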

But if it works, great. The most important property of a backup is
whether you can reliably recover from it. Volume dumps do have drawbacks
there: restoring them requires a server, and there are issues with the
size of the codadump files. The good news is that Satya wrote a working
codadump -> tar converter, which will make recovering data from Coda
volume dumps a lot easier.

Jan
Received on 2004-07-07 14:34:01