Coda File System

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Mon, 14 Jan 2002 19:31:33 -0500

On Mon, Jan 14, 2002 at 11:58:49PM +0100, Ivan Popov wrote:
> One important thing would be a "don't" list - I suspect my problems might
> be due to some basic mistake like using wrong libdb version, that I have
> not thought about. (btw, what are the symptoms of incompatible libdb?)

I wish I had a "don't" list, but there are just so many things that seem
to go wrong when setting up replicated servers. Many (or most) of these
problems stem from the way the binary VRDB and VLDB files store
information, and from the information that the ViceGetVolumeInfo RPC2
call returns to the clients (more on these later). Together these make
us very sensitive to oddities in DNS lookups and routing setups.

> Time skew between servers? Is two minutes too much? I run ntp but during
> the installation ntp may be out of service...

We try to be pretty resilient as far as time skew is concerned. The
updateclnt daemon, which runs on all servers, contacts the updatesrv
process on the SCM and checks for files with a different timestamp. Any
such file is assumed to have been updated, so the new version is
fetched and the timestamp on the receiving side is then forced to match
that of the sending side. In case of problems you can always check
whether the timestamps and file sizes in /vice/db are the same on all
machines.
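A minimal sketch of that manual check in Python, assuming you can run it on each server and compare the results by hand. The helper names snapshot_db and diff_snapshots are hypothetical; this is not how updateclnt itself talks to updatesrv, it only compares file size and modification time the way the paragraph above suggests.

```python
import os
from typing import Dict, List, Tuple

# Hypothetical snapshot format: file name -> (size in bytes, mtime in whole seconds).
Snapshot = Dict[str, Tuple[int, int]]


def snapshot_db(path: str = "/vice/db") -> Snapshot:
    """Record (size, mtime) for every regular file directly inside `path`."""
    snap: Snapshot = {}
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if os.path.isfile(full):
            st = os.stat(full)
            snap[name] = (st.st_size, int(st.st_mtime))
    return snap


def diff_snapshots(scm: Snapshot, other: Snapshot) -> List[str]:
    """Files whose size or timestamp differ, or that exist on only one side."""
    names = set(scm) | set(other)
    return sorted(n for n in names if scm.get(n) != other.get(n))
```

Run snapshot_db() on the SCM and on a non-SCM server, then pass both results to diff_snapshots(); an empty list means /vice/db looks in sync by this measure.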

> It is no problem to set up a new non-scm machine, even create volumes on
> it (shared with the old server, too), as long as I don't add an entry
> for it in the "servers" file.

How did you manage to create volumes on the non-scm machine if it isn't
in the servers file? I can only see corruption resulting from that.
> 
> I can't use the new volumes - they become just dangling symlinks. No
> surprise, as "servers" does not contain what may be the very important
> reference?..
> (but what is the purpose of "servers"? All server hostnames are present
> in the VSGDB anyway?)

Nothing inside the codasrv process stores ip-addresses directly; it uses
the 'serverid' instead. The servers file is simply the mapping from
serverid to servername. To make it even more difficult, DNS lookups are
blocking and would block the whole process (all LWP threads), so we
actually compare everything by ip-address. The servernames in the
'servers' file are only looked up during codasrv startup, and the
in-memory representation of the servers file holds just the serverid
and the _first_ ip-address returned by gethostbyname. This is why Coda
servers have problems on multihomed hosts that use multiple ip-addresses
on different interfaces: we simply don't handle more than one address.
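To make the first-address-only behavior concrete, here is a Python sketch, assuming each line of the servers file is a hostname followed by a decimal serverid. load_servers and serverid_for_address are hypothetical helpers, not codasrv functions; Python's socket.gethostbyname also returns just one IPv4 address, which makes it a convenient stand-in. Handling of failed lookups is left out here.

```python
import socket
from typing import Callable, Dict, Optional


def load_servers(text: str,
                 resolve: Callable[[str], str] = socket.gethostbyname) -> Dict[int, str]:
    """Hypothetical: map serverid -> one IPv4 address, resolved once at startup."""
    table: Dict[int, str] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, sid = fields[0], int(fields[1])
        # gethostbyname() yields a single address; any others the host has are lost.
        table[sid] = resolve(name)
    return table


def serverid_for_address(table: Dict[int, str], addr: str) -> Optional[int]:
    """After startup everything is matched by address, never by name."""
    for sid, known in table.items():
        if known == addr:
            return sid
    return None
```

In this sketch, traffic from a second interface of a multihomed server carries an address that is not in the table, so serverid_for_address finds nothing.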

Similarly, the VLDB and VRDB files (volume location and replication
information) store serverids, not hostnames or ip-addresses. When
handling the ViceGetVolumeInfo call, the server pulls the requested
information out of the VLDB or VRDB file and replaces the serverids
with the ip-addresses it resolved during server startup. Here we are
also unable to return a list of addresses for a server, even if we
could store one. The correct thing to return would have been the
servername, because then the client could resolve the name itself and
get the right ip-address for reaching the server from the client's
perspective, not the server's perspective.
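The difference between what the server returns and what would have been better can be sketched as follows. This is an illustration only: the volume entry is reduced to a plain list of serverids, and the function names, hostnames and addresses are hypothetical.

```python
from typing import Dict, List


def replica_addresses(serverids: List[int], addr_by_id: Dict[int, str]) -> List[str]:
    """Current behavior: serverids are swapped for addresses the *server* resolved at startup."""
    return [addr_by_id[sid] for sid in serverids]


def replica_names(serverids: List[int], name_by_id: Dict[int, str]) -> List[str]:
    """Alternative: hand back names so each client resolves them from its own point of view."""
    return [name_by_id[sid] for sid in serverids]


if __name__ == "__main__":
    # Hypothetical setup: the SCM sees the servers on internal addresses,
    # while a remote client's DNS gives externally reachable ones.
    addr_by_id = {1: "10.0.0.1", 2: "10.0.0.2"}
    name_by_id = {1: "scm.example.org", 2: "fs2.example.org"}
    client_dns = {"scm.example.org": "192.0.2.1", "fs2.example.org": "192.0.2.2"}
    print(replica_addresses([1, 2], addr_by_id))  # ['10.0.0.1', '10.0.0.2']
    print([client_dns[n] for n in replica_names([1, 2], name_by_id)])  # ['192.0.2.1', '192.0.2.2']
```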

As the VRDB and VLDB files are created by the SCM, this all depends
greatly on whether the SCM managed to correctly resolve every server in
the servers file to an ip-address. Any failed lookup results in a
0.0.0.0 address (I believe Shafeeq added a test to 5.3.17 that blocks a
server from starting up when that happens).

When two servers share the same serverid in the "servers" file, both
'replicas' point to the same machine after the volume is created (even
though the replicas were actually created on different machines).
Similarly, when the same server is listed twice with different ids,
some parts of the code might find the first serverid while others find
the second, leading to considerable confusion about volume location
(I've seen both cases happen).

I'm not entirely sure what happens when a server is not found in the
servers file, but the resulting serverid will either be that of the last
server in the servers file, or be pretty much undefined.
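Both mistakes are easy to catch before restarting anything. Below is a hypothetical checker in Python, assuming the same hostname-then-serverid line format for the servers file; hostnames are compared case-insensitively because DNS names are.

```python
from collections import defaultdict
from typing import Dict, List


def servers_file_problems(text: str) -> List[str]:
    """Hypothetical: report serverids shared by several names, and names listed under several ids."""
    by_id: Dict[int, List[str]] = defaultdict(list)
    by_name: Dict[str, List[int]] = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, sid = fields[0], int(fields[1])
        by_id[sid].append(name)
        by_name[name.lower()].append(sid)
    problems: List[str] = []
    for sid, names in sorted(by_id.items()):
        if len(names) > 1:
            problems.append(f"serverid {sid} used by {', '.join(names)}")
    for name, ids in sorted(by_name.items()):
        if len(ids) > 1:
            problems.append(f"{name} listed with ids {', '.join(map(str, ids))}")
    return problems
```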

So right after installing the non-scm RPM, the steps would be something
like this:
- Create the entry in the servers file. Make sure that the id is unique
  and is not 0, 127 or 255 (0 itself; 127, the prefix for replicated
  volumes; or 255, which is (unsigned char)-1).
- Create an entry in the VSGDB for both the SCM and the non-scm server,
  of course with a unique VSG number (e.g. E0000101 SCM non-scm).
- Possibly create a vicetab entry for the non-scm's /vicepa, etc. (if
  you want to use a centralized vicetab), or else make sure that vicetab
  is _not_ listed in /vice/db/files, to prevent updateclnt from
  "updating" this file.
- Restart the SCM codasrv.

- Run the vice-setup script on the non-scm and go through the setup.
- Run updateclnt -h `cat /vice/db/scm`
- Check whether the files in /vice/db are in sync with those on the SCM.
- Start the non-scm codasrv process.

- Try to create a replicated volume.
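The servers and VSGDB steps can be sanity-checked with a small sketch like the one below. It assumes serverids fit in one byte, which is what the (unsigned char)-1 remark suggests, and that a VSGDB line is the hex VSG number followed by the member hostnames, as in the E0000101 example. The function names and hostnames are hypothetical.

```python
from typing import Iterable, List, Set

RESERVED_IDS = {0, 127, 255}  # 0, the replicated-volume prefix, and (unsigned char)-1


def check_new_serverid(new_id: int, existing_ids: Iterable[int]) -> List[str]:
    """Hypothetical: list the reasons a proposed serverid is unusable (empty list = OK)."""
    errors: List[str] = []
    if not 0 <= new_id <= 255:
        errors.append(f"{new_id} does not fit in an unsigned char")
    if new_id in RESERVED_IDS:
        errors.append(f"{new_id} is one of the reserved values 0, 127, 255")
    if new_id in set(existing_ids):
        errors.append(f"{new_id} is already in use")
    return errors


def vsgdb_entry(vsg: int, hosts: List[str], existing_vsgs: Set[int]) -> str:
    """Hypothetical: format a VSGDB line, refusing a VSG number that is already taken."""
    if vsg in existing_vsgs:
        raise ValueError(f"VSG number {vsg:08X} already in VSGDB")
    return f"{vsg:08X} " + " ".join(hosts)
```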

Technically the only difference between the SCM and non-SCM servers is
that the SCM runs the updatesrv and rpc2portmap processes, and all the
other servers use those to make local copies of the files in /vice/db.
So adding users and creating volumes should be done on the SCM;
otherwise no one will see the new information (and it will be lost as
soon as updateclnt syncs in an updated version of the
VRDB/VLDB/auth2.pw/*.pdb files).

> If I try to follow documentation (like adding the new server to servers
> file on scm), the old server stops working with VolumeValidation failed
> (and on the client a cruel child killing its parent)...

Venus is unable to get the root object of the /coda tree, so the mount
operation fails, and the forked-off process that made the blocking mount
syscall kills its parent, because without a mounted /coda there is not
much point in living any longer.

> About the documentation - somewhere it talks about populating vicetab on
> scm with new servers' entries - no way! It just makes scm (and the non-scm
> party as well) complain about /vicepa being double-used...

Interesting. The first field on each vicetab line identifies the server
that the line applies to. So if you have:

SCM 	/vicepa ftree width=64,depth=3
non-SCM	/vicepa ftree width=64,depth=3

and you see a complaint like that, then both of your servers seem to be
resolving to the same ip-address (again, the comparison here is done by
ip-address, not by name).
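A sketch of that name-to-address comparison, assuming the vicetab layout shown above (host, partition, type, parameters). my_partitions and duplicated are hypothetical helpers, not Coda code; they show how two names resolving to one address make the same /vicepa appear twice for one machine.

```python
import socket
from collections import Counter
from typing import Callable, List


def my_partitions(vicetab: str, my_addr: str,
                  resolve: Callable[[str], str] = socket.gethostbyname) -> List[str]:
    """Hypothetical: partitions whose host field resolves to this machine's address."""
    parts: List[str] = []
    for line in vicetab.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        host, partition = fields[0], fields[1]
        if resolve(host) == my_addr:  # compared by address, not by name
            parts.append(partition)
    return parts


def duplicated(parts: List[str]) -> List[str]:
    """Partitions claimed more than once, i.e. the 'double-used' complaint."""
    return sorted(p for p, n in Counter(parts).items() if n > 1)
```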

> Two "unusual" things about my setup:
>  - kerberos-based authentication only
>  - /vice on the second server is not in /vice but in /somewhere/else/vice,
>    perfectly legitimate (and it works well as an independent server)
> To the best of my knowledge those shouldn't make any difference.

Agreed, these shouldn't be causing any problems.

Jan

Re: add servers to a Coda cell