Coda File System

Re: Coda reliability question

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Mon, 10 Dec 2001 11:16:41 -0500
On Mon, Dec 10, 2001 at 04:47:08PM +0100, Torbjörn Lindh wrote:
> In June 2000 I installed coda on a number of our clients and set up a
> coda server (a Linux). It all seemed to work rather nicely until I
> tested the server by simply pulling the power cord.
> 
> After reboot and a lot of fsck it came up but the coda server just
> would not restart.

There are many reasons why the server might not want to restart. If your
system clock is fast (or the CMOS clock is slow), time will jump back
when the machine reboots, and RVM will refuse to commit a transaction
that occurred 'in the past', as all transactions are supposed to be
serialized.
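
To make the clock-skew case concrete, here is a minimal sketch of a log
that rejects commits stamped earlier than the previous one. It is not
RVM's code; the class and method names (SerializedLog, commit,
seconds_until_commit_allowed) are made up for illustration, and it only
models the behaviour described above.

```python
class ClockWentBackwards(Exception):
    """Raised when a commit is stamped earlier than the previous commit."""


class SerializedLog:
    """Toy transaction log (hypothetical, not RVM) with ordered timestamps."""

    def __init__(self, last_commit_time=0.0):
        # Timestamp of the newest record that survived the crash.
        self.last_commit_time = last_commit_time
        self.records = []

    def commit(self, record, now):
        # A commit 'in the past' would break the serial order, so refuse it.
        if now < self.last_commit_time:
            raise ClockWentBackwards(
                "clock is %.1fs behind the last commit"
                % (self.last_commit_time - now)
            )
        self.records.append((now, record))
        self.last_commit_time = now

    def seconds_until_commit_allowed(self, now):
        # How long one would have to wait for the clock to catch up.
        return max(0.0, self.last_commit_time - now)
```

With a last commit at t=100 and a clock that reads 90 after reboot,
commit() raises and seconds_until_commit_allowed(90.0) reports the ten
seconds one would have to wait (or bump the clock by).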

Another reason could be that although the metadata is committed to disk,
Linux writes the container files asynchronously in the background. Fsck
fixes up the minor discrepancies (i.e. does its best to cover up that
something might have been lost). But Coda hates silently losing things
and asserts as soon as the metadata in RVM doesn't match the on-disk
data. The original systems on which Coda was developed (Mach and NetBSD)
have synchronous filesystems and typically have such discrepancies only
for a short amount of time; with Linux there is at least a 30 second
window. Things might be better with a journalling filesystem (reiserfs
or ext3), but I haven't tried.
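
For illustration only, this is roughly what closing that window looks
like from an application on a POSIX system: force the file data and the
directory entry to disk before treating the write as done. This is a
generic sketch, not what the Coda server does; write_durably and the
".tmp" suffix are hypothetical names.

```python
import os


def write_durably(path, data):
    """Write data (bytes) to path and return only after it has been synced."""
    tmp = path + ".tmp"  # hypothetical temporary name
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        offset = 0
        while offset < len(data):
            # os.write may write fewer bytes than requested.
            offset += os.write(fd, data[offset:])
        os.fsync(fd)  # flush file contents instead of waiting for writeback
    finally:
        os.close(fd)
    os.replace(tmp, path)  # atomic rename over the final name
    # Sync the directory too, so the rename itself is on disk (POSIX only).
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The cost is one or two synchronous disk flushes per file, which is the
trade-off a synchronous filesystem makes for every write.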

Then I've occasionally seen the RVM log being corrupted. As servers
typically truncate the log pretty often (I believe after we've acked any
operation to the client), there commonly isn't that much in there and it
is possible to use rvmutl to 'reset/clear' the log. This is pretty much
what fsck is doing: simply discarding whatever seems suspicious and
falling back to a hopefully consistent previous state.

> And as far as I could tell I would have to restore the backup in that
> kind of situation.

Very likely. If it is just clock skew, simply wait (or bump the clock);
if it is the RVM log, try to clear it out; if it is the container files,
hack the volume salvage code to be less strict about consistency and
instead have it try to fix the metadata to paper over the fact that we
just lost file data.
 
> So I simply gave up on coda since this is something it just has to
> handle if we are to use it.
> 
> Is this crash sensitivity still there? Server replication is of course
> nice, but it won't help if there is a power outage, since all the
> servers will be lost. (UPS:s are nice, but I am not comfortable with
> them as the last line of defence.)

Assuming that not all UPSes fail simultaneously: as long as one server
manages to shut down cleanly (and flush all pending operations to disk),
it is possible to reconstruct the other replicas from the surviving one.
When a server is lost, it can be reinstalled from scratch (i.e. empty
RVM and /vicepa partitions). Then the volume replicas that were lost
have to be recreated with 'volutil create_rep'. The original volume list
can be recovered from /vice/vol/remote on the SCM, but can also be
pieced together from the information in the VRList and other files that
are replicated on all servers in /vice/db.

Then clients will see the missing objects on these empty replicas as a
server-server conflict and they trigger resolution which copies the data
back into the replica. So a simple 'cfs strong;ls -lR /coda/volume' will
trigger an on-line rebuild of the replicas.
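
If you would rather script the walk than run ls, a rough equivalent of
the 'ls -lR' half is sketched below: it stats every object under a
directory and collects the paths that failed. The function name
stat_tree is made up, and this assumes that stat-ing each object is
enough to make the client notice the missing replica data, as ls -lR
does; 'cfs strong' still has to be run separately.

```python
import os


def stat_tree(root):
    """lstat every entry under root; return (entries_visited, failures)."""
    visited = 0
    failures = []

    def on_error(err):
        # os.walk reports directories it could not list through this hook.
        failures.append((err.filename, err.strerror))

    for dirpath, dirnames, filenames in os.walk(root, onerror=on_error):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            try:
                os.lstat(full)  # same per-entry call that 'ls -l' relies on
                visited += 1
            except OSError as err:
                failures.append((full, err.strerror))
    return visited, failures
```

Calling stat_tree("/coda/volume") and looking at the failures list gives
a quick idea of which objects could not be reached on the first pass.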

Jan
Received on 2001-12-10 11:16:50