Coda File System

Replacing a lost server

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Mon, 27 Sep 1999 15:34:46 -0400
On Mon, Sep 27, 1999 at 04:48:36PM +0200, Ivan Popov wrote:
> Hello!
> 
> I can't answer the following question myself and hope
> somebody here can!
> 
> If there are two or more servers
> carrying the same read-write replicated volumes,
> and if one of the servers temporarily goes down,
> that's no problem. Reintegration happens automagically.
> 
> But what if one of the servers goes down because of a hardware fail
> and I need to reinstall it from scratch?
> 
> Can I avoid using (tape) backups of the lost replica?
> In other words, can I recreate a replica from other replicas?

Hi Ivan,

Yes, it is possible to use (runt-)resolution to repopulate a freshly
initialized server. This is a reasonably involved process, partly manual
and partly done by Coda.

The manual part is about recreating the lost volumes on the new server.

First we need to get some vital information about which volumes existed
on the dead server. There are several places where this can be found,
such as /vice/vol/VolumeList on the crashed server, or
/vice/vol/remote/<deadserver>.list on the SCM. The VolumeList file is
rewritten when a server is restarted, so restarting a reinitialized
server will make us lose the information there. The remote/.. file on
the SCM is lost when new volumes are created. So keep a leash on any
loose-running sysadmins while recovering the following information.

This file looks something like:

P/vicepa Hdeadserver.coda.cs.cmu.edu T1fb3e6 F44b18
P/vicepb Hdeadserver.coda.cs.cmu.edu T10df2b F10cede
Bvmm:source.4.7.1.0.backup Ic9000100 Hc9 P/vicepa m0 M0 U30d0 Wc90000ff C35ecc3ec D35ecc3ec B0 A0
Wvmm:source.4.6.5.0 Ic9000102 Hc9 P/vicepa m0 M0 U3060 Wc9000102 C35ed88b6 D35ed88b6 B36c8fe12 A1dbd
Wvmm:root.0 Ic9000082 Hc9 P/vicepa m0 M0 U8 Wc9000082 C35d1e3b0 D35d1e3b0 B36c902cf A151
Bvmm:u.rvb.0.backup Ic9000103 Hc9 P/vicepa m0 M0 U37d8 Wc9000069 C35f0b977 D35f0b977 B0 A0
Bvmm:source.4.6.5.0.backup Ic9000104 Hc9 P/vicepa m0 M0 U3060 Wc9000102 C35ff3847 D35ff3847 B0 A0
....

We don't need to restore the backup volumes; they are automatically
recloned during the next backup run, and the partition information is
recreated when the server is set up. So we are only interested in the
lines starting with `W'. (grep '^W' ...)

The first field is the volume name; when the createvol_rep script was
used to create this volume, it is the name of the replicated volume plus
an index number ('0' in this case). The second field (I<volume
identifier>) is also important. The P/vicepa field tells us which
partition held the volume; I don't believe it is necessary to have the
volume on an identical partition on the reinitialized server.
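Putting this together, here is a small sketch of pulling the three fields
we need (volume name, underlying volume id, partition) out of the
read-write entries. The sample lines from the VolumeList above are piped
in directly; in practice you would read the saved copy of the file
instead.

```shell
# Filter the read-write entries (lines starting with W) and strip the
# one-letter field prefixes: $1 is W<name>, $2 is I<volid>, $4 is P<partition>.
grep '^W' <<'EOF' | awk '{ print substr($1,2), substr($2,2), substr($4,2) }'
Bvmm:source.4.7.1.0.backup Ic9000100 Hc9 P/vicepa m0 M0 U30d0 Wc90000ff C35ecc3ec D35ecc3ec B0 A0
Wvmm:source.4.6.5.0 Ic9000102 Hc9 P/vicepa m0 M0 U3060 Wc9000102 C35ed88b6 D35ed88b6 B36c8fe12 A1dbd
Wvmm:root.0 Ic9000082 Hc9 P/vicepa m0 M0 U8 Wc9000082 C35d1e3b0 D35d1e3b0 B36c902cf A151
EOF
```

This prints one line per read-write volume, e.g.
"vmm:source.4.6.5.0 c9000102 /vicepa".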

Then we have to get just one more bit of information from
/vice/vol/VRList, which is available on any server. This file looks
something like:

vmm:source.4.7.1 7F0004CC 3 c90000ff dd0000c1 c70000b5 0 0 0 0 0 E0000203
vmm:source.4.6.5 7F0004CE 3 c9000102 dd0000c4 c70000b8 0 0 0 0 0 E0000203
vmm:source.4.6.6 7F0004D0 3 c9000106 dd0000c7 c70000bb 0 0 0 0 0 E0000203
vmm:s.tmp 7F0004D1 3 c9000108 dd0000c9 c70000bd 0 0 0 0 0 E0000203 

The first field is the replicated volume name (this time without the
index number). The second field is the replicated volume id, which is
the information we were still missing. (As a consistency check, one of
the subsequent volume ids should correspond to the volume id we found
earlier for this volume.)
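That consistency check can be sketched in POSIX shell like this; the
VRList sample lines from above are fed in via a here-document rather
than reading /vice/vol/VRList itself.

```shell
# Strip the index from the underlying volume name to get the replicated
# name, then look it up: field 2 is the replicated volume id, field 3 the
# number of replicas, and the replica ids follow from field 4 on.
name=vmm:source.4.6.5.0
volid=c9000102
repname=${name%.*}          # -> vmm:source.4.6.5
awk -v n="$repname" -v id="$volid" '
    $1 == n {
        print "replicated id: " $2
        for (i = 4; i < 4 + $3; i++)
            if ($i == id) print "consistency check OK"
    }' <<'EOF'
vmm:source.4.6.5 7F0004CE 3 c9000102 dd0000c4 c70000b8 0 0 0 0 0 E0000203
vmm:source.4.6.6 7F0004D0 3 c9000106 dd0000c7 c70000bb 0 0 0 0 0 E0000203
EOF
```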

Now we have recovered the following information: the underlying volume
name (the one with the index number), the replicated volume id, the
underlying volume id, and the partition. Now we can safely reinitialize
or set up a brand new server.

After the server has come up, it will start logging a lot of errors
about volume lookups failing. But don't worry, we'll have that fixed
soon...

On the new server, for each volume run:

'volutil create_rep <partition> <volumename> <replicated volumeid> <underlying volumeid>'
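For instance, with the values recovered above for vmm:source.4.6.5.0, a
sketch that only generates the command for review (run the output on the
new server once it looks right) could be:

```shell
# Generate the create_rep commands from the recovered
# (name, replicated id, underlying id, partition) tuples.
# Printed for review here rather than executed; the single data
# line matches the sample volume from the listings above.
while read -r name repid volid part; do
    echo "volutil create_rep $part $name $repid $volid"
done <<'EOF'
vmm:source.4.6.5.0 7F0004CE c9000102 /vicepa
EOF
```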

And that was the hard and manual part. We have reconstructed the basic
structure of the server, now we have to resolve the missing contents.

On any client, run 'cfs strong' and 'cfs cs' (checkservers). This tells
the client to try and stay strongly connected, and makes sure that we
have a connection to the newly initialized server.

Then for each recreated volume, go to the root of that volume in the
/coda namespace, and do either 'volmunge -a `pwd`' or 'ls -lR'...
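As a sketch (the mount points under /coda are made up for illustration;
substitute the real roots of your volumes), one could loop over the
recreated volumes like this, shown as a dry run that only prints the
commands instead of walking the trees:

```shell
# Dry run of the resolution step: print the command that would walk each
# recreated volume root. Drop the echo to actually trigger resolution;
# the paths here are hypothetical examples, not real Coda mount points.
for root in /coda/vmm/source/4.6.5 /coda/vmm/root; do
    echo "cd $root && ls -lR"
done
```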

And hold your breath while the magic happens. Hmm, maybe not, it often
takes quite some time to resolve a whole server :)

> Another aspect is how could I add servers, extending the set of replicas
> (i.e. had 2 servers, volumes replicated x 2, now I want to have the same
> volumes replicated x 3 on three servers. create + copy + remount?)

Theoretically it should be possible; in practice nobody has done that,
and therefore it is unlikely to work. Your best bet is creating new
volumes and copying over the data. You'll have to figure out some way to
copy the ACLs too.

Jan
Received on 1999-09-27 15:36:53