Coda File System

Re: codasrv crashes, won't come back up, production server down :(

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Fri, 13 Aug 2004 14:10:16 -0400
On Fri, Aug 13, 2004 at 06:02:27AM -0700, Steve Simitzis wrote:
> while trying to run backups, codasrv crashed, yet again. now i'm
> unable to bring it back, and i fear that data loss may be the result
> of trying to run backups.
> 
> when i try to restart coda, it fails:
> 
> Assertion failed: size == s, file "/usr/src/redhat/BUILD/coda-6.0.6/coda-src/resolution/recov_vollog.cc", line 386

I have never seen this trigger before. It is a consistency check on the
resolution log: after walking the list, we check whether the actual space
used by the log matches the space we've accounted for in RVM.
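
As a rough illustration of that kind of check, here is a minimal sketch.
The type and function names are made up for the example and are not the
actual code in recov_vollog.cc; it only shows the idea of comparing a
stored total against what a walk over the records adds up to.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; these are not the real Coda data structures.
struct LogRecord {
    std::size_t size;  // bytes occupied by this record
};

struct VolumeLog {
    std::vector<LogRecord> records;  // the log entries themselves
    std::size_t accounted_size = 0;  // running total stored alongside the log
};

// Walk the log and add up the space the records actually occupy.
std::size_t actual_log_size(const VolumeLog &log) {
    std::size_t s = 0;
    for (const LogRecord &r : log.records)
        s += r.size;
    return s;
}

// True when the stored total matches what the walk found.
bool log_is_consistent(const VolumeLog &log) {
    return log.accounted_size == actual_log_size(log);
}
```

If the stored total and the walked total ever disagree, a check like
this fails, which is the situation the assertion above is reporting.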

The size is normally updated in the same transaction in which we add a
new log record, so it is unusual for these values to be off. What is also
somewhat unusual is that there is a resolution log with anything in it
for a singly-replicated volume, because in that case there is nothing to
resolve against.

Now for a way to recover from this...


There is a global 'AllowResolution' flag that can turn off all
resolution-related code (including the SalvageLogs checks). It should
be possible to start the server this way. But since this disables some
essential code, no data should be written while it is off! There is
likely some RVM corruption, hopefully limited to only this volume, but
once the server is up it should be possible to fetch all the data from
this volume.

/etc/coda/server.conf
    resolution=0

I don't really know why this flag is there; globally turning off
resolution looks a bit dangerous to me.

Jan
Received on 2004-08-13 14:12:13