Coda File System

Re: Coda 6.07 | Replication Problems

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Thu, 28 Oct 2004 17:21:26 -0400
On Thu, Oct 28, 2004 at 01:40:13PM -0700, redirecting decoy wrote:
> Once I had m1 and m2 setup, I created a Root volume
> using the following command:
> 
> createvol_rep CodaRoot m1/vicepa m2/vicepa
> 
> The above command created CodaRoot.0 on m1 and
> CodaRoot.1 on m2.  So in theory I should have a
> replicating Root Volume, correct?

Only if /vice/db/ROOTVOLUME contains the name of the replicated volume.
i.e.
    $ cat /vice/db/ROOTVOLUME
    CodaRoot

> The output of "cfs whereis /coda/m1" and "cfs whereis
> /coda/m2"
> tells me that the files reside on m1 and m2. So it
> looks like it's working so far.
> 
> Then I did a "venus-setup m1,m2 500000" on m1
> and "venus-setup m2,m1 500000 on m2", and started
> venus on both machines.

??? Passing m1,m2 doesn't do much. The first argument to venus-setup is
only used as the default realm name when someone runs clog without
specifying a realm. Venus itself never even reads the value of that
variable.

If you currently look at /coda/m1, you're not just seeing the files
stored on m1; the client will automatically look at m2 as well. That
happens because any lookup for CodaRoot returns the locations of both
CodaRoot.0 and CodaRoot.1. The same happens when you look at /coda/m2.

The problem here is that in both cases you are really only using a
single server for authentication and volume location queries, so if m1
goes down the /coda/m1 tree becomes mostly unusable.

At this point you really want to have a realm that groups several
servers together. Ideally this would be done with DNS SRV records, but
you can also use the /etc/coda/realms file on the client. Add a line
like "myrealm m1 m2" and /coda/myrealm will now show the same CodaRoot
volume, but this time we aren't dependent on any single server for
authentication or volume location queries.
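As a minimal sketch of that realms entry: the snippet below writes the line to a scratch file instead of the live configuration. The scratch path `./realms.example` and the `REALMS_FILE` variable are assumptions made for illustration; on a real client the line would go into /etc/coda/realms.

```shell
# Hypothetical scratch copy; on a real client the file is /etc/coda/realms.
REALMS_FILE="${REALMS_FILE:-./realms.example}"

# One realm per line: the realm name followed by the servers that form it.
printf '%s\n' 'myrealm m1 m2' > "$REALMS_FILE"

# Show what would be installed.
cat "$REALMS_FILE"
```

With this entry in place, the grouped realm would appear under /coda/myrealm as described above.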

The problem that you are seeing is that the server isn't aware that the
client is looking at the same set of files from different contexts, so
it only sends back a single callback message and the client only
invalidates the local cache copy for one realm. As far as the client is
concerned, /coda/m1, /coda/m2, /coda/<ip-addr>, etc. are all completely
different realms. So if it gets a message that /coda/m1/test has been
modified, it doesn't realize that this also affects /coda/m2/test,
/coda/<ip-addr>/test etc.

(It also means that if you get tokens for /coda/m1, you don't
automatically have tokens for /coda/m2.)

> 1) How do I force changes made to a file/dir to
> propagate to every client without restarting venus on
> every client?

They propagate fine, you're just not (yet) supposed to look at the same
realm through all possible names it has from a single client. If you had
only looked at /coda/m1 on m1, and /coda/m2 on m2, or grouped the
servers in a common realm name (/coda/myrealm), the server callbacks
would have properly invalidated the parent directory in which the new
files were created.

> 2) Is it possible to mount a volume in the /coda
> directory instead of /coda/m1/'volume'?  I would like
> to be able to do this, since it seems that a change on
> m1:/coda/m1 does not propagate to m1:/coda/m2 unless I

No, not possible. Again, the data propagated just fine, it is the server
that failed to send a callback to invalidate the cached /coda/m2
directory. You can use 'cfs getfid /coda/m1 /coda/m2' and see the
different version vectors. Then try 'cfs flushobject /coda/mX' to
force the stale data out of the cache and run getfid again.
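The getfid / flushobject / getfid sequence can be wrapped in a small helper. This is only a sketch: the function name `refresh_stale_alias` and its two arguments are hypothetical, and it uses nothing beyond the two cfs subcommands named above. It is meant to be run on a Coda client.

```shell
# Sketch: compare one directory as seen through two names for the same realm,
# flush the copy that looks stale, then compare again.
refresh_stale_alias() {
    fresh="$1"   # path through the name whose cache was invalidated
    stale="$2"   # path through the other name for the same realm

    cfs getfid "$fresh" "$stale"   # version vectors are expected to differ
    cfs flushobject "$stale"       # force the stale object out of the cache
    cfs getfid "$fresh" "$stale"   # run getfid again to compare
}
```

For the setup in this thread it would be called as `refresh_stale_alias /coda/m1 /coda/m2`.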

> 3) How do I delete a replicated volume? Or any volume
> in general.  I've tried the purgevol and purgevol_rep
> commands without success. The programs appear to work,
> but "volutil getvolumelist" still shows the volume I
> wanted to delete. Say I wanted to delete my root
> volume; CodaRoot.0 on m1 and CodaRoot.1 on m2. How
> would I do that quickly and easily ?

purgevol_rep works for me, don't know why it wouldn't. Maybe you didn't
notice the...

    $ purgevol_rep CodaRoot
    Only testing, use 'purgevol_rep --kill CodaRoot' to really purge the volume
    ...
    Don't forget we were only testing
    use 'purgevol_rep --kill CodaRoot' to really purge the volume
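A cautious way to script this is to keep the dry run as the default and require an explicit second argument before passing --kill. The wrapper name `purge_replicated` and its `kill` argument are hypothetical. It assumes purgevol_rep and volutil are run where they normally would be on the servers, and it uses only the invocations that appear in this thread.

```shell
# Sketch: dry-run by default; purge for real only when asked explicitly.
purge_replicated() {
    vol="$1"
    mode="${2:-test}"   # hypothetical switch: "kill" means really purge

    if [ "$mode" = "kill" ]; then
        purgevol_rep --kill "$vol"   # really purge the volume
    else
        purgevol_rep "$vol"          # only testing, nothing is removed
    fi

    # List the volumes afterwards to check whether the volume is still there.
    volutil getvolumelist
}
```

`purge_replicated CodaRoot` only tests, while `purge_replicated CodaRoot kill` performs the purge.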

> and other times tells me that the Connection state is
> "Connected".  How do I keep it connected or tell it to
> reconnect?  None of the commands that I have tried
> appear to make any difference.

Probably because the server only has a callback connection for one of
the two realm contexts; as a result, only one will consider itself
'connected'. When the client rebinds the disconnected context it will
probably consider it up for a while (as the bind succeeded), but later
on notice that the callback connection is not there and switch back to
disconnected.

Jan
Received on 2004-10-28 17:25:39