Coda File System

Coda + high availability

From: Jan Harkes
Date: Thu, 19 Jul 2001 09:02:16 -0400
The email became quite long and touched various subjects, so I've
decided to split the reply. 

On Thu, Jul 19, 2001 at 01:43:58AM +0200, Cain Ransbottyn wrote:
> webserver, /packages/qmail will be mounted on the fileserver.) If one of the
> servers fails, another server can do failover by bringing up an ip:alias of
> the 'dead' client and coda-mount the /packages/*binary*.

IP takeover could be bad for Coda. The server might not realize that
the client has 'died', and would then hold callbacks for a different set
of files compared to what the client now behind that address has.

A possible solution is to make sure that the masquerade option is set
to 1 in /etc/coda/venus.conf. That way venus will bind to an arbitrary
(kernel-assigned) port on the client. I'm not sure whether the kernel
gives out sequential port numbers or random ones, but the idea is to
reduce the chance that both clients use the same local port for talking
to the server.
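
A quick way to check the setting before relying on it could look like
the sketch below. It assumes venus.conf uses plain key=value lines with
'#' comments; the function name masquerade_enabled is made up for this
example and is not part of Coda.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a venus.conf-style file sets masquerade to 1.
# Assumes "key=value" lines and '#' comments; the real file syntax may differ.
masquerade_enabled() {
    conf="$1"
    # Take the last uncommented "masquerade=" line and strip blanks and quotes.
    value=$(grep -E '^[[:space:]]*masquerade[[:space:]]*=' "$conf" \
        | tail -n 1 | cut -d= -f2 | tr -d ' \t"')
    [ "$value" = "1" ]
}

# Example use (path taken from the text above):
#   masquerade_enabled /etc/coda/venus.conf || echo "masquerade is not enabled"
```

If the check fails, the option would have to be set by hand and venus
restarted for it to take effect.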

> If the fileserver fails (network or hardware probs), will it be easily to
> automagically umount the /web on the 'previous coda fileserver' and make a
> symlink from /local/web on the client to /web on the client. This is
> scripted and should all be done in a couple of seconds? We didn't succeed to
> do this with 'regular NFS'. Will this umounting work with coda ? Will we be
> able to umount the /web in let's say 2 seconds after the hardware crash ? (I
> assume we first have to auto-kill the applications using the current mount).
> Is this a good way of thinking or are we driving ourself nuts ?

The umount is blocked by the Linux VFS as long as there are 'live'
inodes. So you either have to kill all applications, or...

    # ln -s /coda/web /web
    *failure detected*

    # rm -f /web
    # ln -s /local/web /web
    # killall -9 venus
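
The same steps can be wrapped in a small function so that a monitoring
script can call it once a failure has been detected. This is only a
sketch: the name fail_over and its arguments are invented for this
example, and how the failure gets detected is left out entirely. The
optional trailing command replaces 'killall -9 venus', which is mostly
useful for a dry run.

```shell
#!/bin/sh
# Hypothetical wrapper around the manual steps above.
# Usage: fail_over LINK LOCAL_COPY [KILL_COMMAND...]
fail_over() {
    link="$1"
    local_copy="$2"
    shift 2
    # Repoint the symlink at the local copy (same rm + ln sequence as above).
    rm -f "$link" || return 1
    ln -s "$local_copy" "$link" || return 1
    # Kill venus so that operations stuck in /coda return instead of hanging.
    if [ "$#" -gt 0 ]; then
        "$@"
    else
        killall -9 venus
    fi
}

# Example use:
#   fail_over /web /local/web
```

Note that between the rm and the ln there is a short window in which
/web does not exist at all.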

When the venus process dies, all pending and subsequent operations in
/coda will fail with EIO (I/O error). Typically the processes will
either fail or try again, in which case they'll follow the new symlink
and use the local copy.
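
For a script that should survive the switch, 'try again' could be as
simple as the sketch below. The function read_with_retry is made up for
this example. The point is that every attempt re-opens the file by
path, so a retry made after the symlink has been repointed ends up at
the local copy.

```shell
#!/bin/sh
# Hypothetical helper: print a file, retrying when the read fails.
# Usage: read_with_retry PATH [ATTEMPTS]   (ATTEMPTS defaults to 3)
read_with_retry() {
    path="$1"
    attempts="${2:-3}"
    i=1
    while :; do
        # Re-open by path on every attempt so the symlink is resolved again.
        if data=$(cat "$path" 2>/dev/null); then
            printf '%s\n' "$data"
            return 0
        fi
        if [ "$i" -ge "$attempts" ]; then
            return 1
        fi
        i=$((i + 1))
        sleep 1
    done
}

# Example use:
#   read_with_retry /web/index.html
```

A long-running process that keeps a file descriptor open into /coda
will not recover this way. It has to close and re-open the file itself.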

Received on 2001-07-19 09:03:46