Coda File System

Re: Coda 6.07 | Replication Problems

From: redirecting decoy <redirectingdecoy_at_yahoo.com>
Date: Fri, 29 Oct 2004 13:58:52 -0700 (PDT)
Still having problems with Coda.  I now have all 8
machines acting as clients, with 2 of them also acting as
replication servers.  It's really annoying, because it
works... just not that well.  Aside from the venus
problems I'm still having, I now have a problem
with disconnected volumes.  I can see the volume from
all the clients, and it works for a little while, but
then when I do a "cfs lv /coda/machines/whatever", it
tells me it's disconnected.  One minute I can see a
directory listing, the next minute I get a
"Connection timed out".  Oddly enough, once in a while,
if I run "clog user" again, I can see the directory
again; this isn't consistent, though.  It seems pretty
random: sometimes it works on some machines, sometimes
it doesn't.
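One way to make the "pretty random" pattern concrete is to log, with timestamps, exactly when the volume stops answering, and then line those times up against SrvErr on m1 and m2. A minimal sketch, assuming only that `cfs lv` (short for `cfs listvol`) either exits non-zero or prints the word "disconnected" when the volume is unreachable, as described above. The function names `probe_volume` and `watch_volume` are hypothetical, not Coda tools.

```shell
#!/bin/sh
# Sketch: probe a Coda volume at intervals and log when it drops out,
# so the timestamps can be matched against SrvErr on m1 and m2.
# Assumption: `cfs lv` (cfs listvol) either exits non-zero or prints the
# word "disconnected" when the volume is unreachable. The function
# names are hypothetical, not Coda tools.

probe_volume() {
    # $1: a path inside the volume, e.g. /coda/machines/whatever
    out=$(cfs lv "$1" 2>&1)
    rc=$?
    if [ "$rc" -ne 0 ] || printf '%s\n' "$out" | grep -qi 'disconnected'; then
        result=FAILED
    else
        result=ok
    fi
    printf '%s %s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$1" "$result"
}

watch_volume() {
    # $1: path, $2: number of probes, $3: seconds to wait between probes
    n=0
    while [ "$n" -lt "$2" ]; do
        probe_volume "$1"
        n=$((n + 1))
        if [ "$n" -lt "$2" ]; then
            sleep "$3"
        fi
    done
}
```

For instance, `watch_volume /coda/machines/whatever 720 5 >> /tmp/coda-probe.log` probes every 5 seconds for an hour (720 x 5 s = 3600 s); running it on a "good" and a "bad" client at the same time shows whether the drop-outs coincide.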

I get 1 error in my SrvErr log:
could not open key 2 file: No such file or directory

I don't know what's causing that.  I didn't have any of
these problems before I tried to get replication
working.

I think I'm having a problem with the servers as well.
When I do a "cfs cs" on some of the nodes where Coda
doesn't work correctly, it tells me "m1 and m2 are
still down", which can't be right, because the node
sitting right next to it can see the servers just fine.
I don't think this is a firewall problem, as all my
nodes have identical settings, and this was not a
problem before the upgrade and replication.
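Because the node next to an affected client reaches m1 and m2 fine, one cheap thing to rule out before suspecting the network is name resolution on the failing clients (for example a stale /etc/hosts entry, or one that maps a server name to a loopback address). The sketch below only prints how the local machine resolves each name; the idea is to run it on a working node and a failing node and compare. It assumes a glibc-style `getent`, and `check_server` and `is_loopback` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: show how this client resolves each Coda server name, so the
# output can be compared between a working and a failing node.
# Assumes `getent hosts` is available (glibc systems); the helper
# names are hypothetical.

is_loopback() {
    # Succeeds if $1 is an IPv4 127.x.x.x address or the IPv6 ::1.
    case "$1" in
        127.*|::1) return 0 ;;
        *) return 1 ;;
    esac
}

check_server() {
    # Take the first address getent reports for the name in $1.
    addr=$(getent hosts "$1" | awk '{ print $1; exit }')
    if [ -z "$addr" ]; then
        echo "$1: does not resolve"
    elif is_loopback "$addr"; then
        echo "$1: $addr (loopback, probably wrong for a remote server)"
    else
        echo "$1: $addr"
    fi
}

# Check every server name given on the command line.
for host in "$@"; do
    check_server "$host"
done
```

For example, `sh check_servers.sh m1 m2` (the file name is arbitrary) on both nodes, alongside `cfs cs` on the failing one. If the addresses match, resolution can be crossed off the list.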

I also have a question about server failover:
If I have 2 servers, one SCM and one not, and I pull
the plug on either one, then all the data should go to
the replicated volume on the other server, correct?
Right now, I have a realm set up as "machines", which
points to m1 and m2.  So to my understanding venus
will always try m1 first, then m2.  Is this correct?  My
problem is that if I unplug m1, it takes a really
long time for any client to decide to use m2.  Is
there a way to change this behavior so that I can
speed things up?  And if m1 comes back up, will
it automatically be updated with changes from m2, or
will all my clients start using m1 again and ignore
any updates?  I can't seem to get my clients stable
enough for use, so I can't really test this.

Any ideas on improving Coda performance as well?  Right
now the codasrv process takes up between 50% and 90% of
memory on both servers.  I have a 25M RVM log file and a
315M RVM data file.  Is there anything I can do to make
it more efficient?
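To put the memory numbers in perspective, a quick back-of-the-envelope check helps. RVM (recoverable virtual memory) works by mapping its data segment into the process's address space, so, assuming codasrv maps the entire 315M data file, that file alone is a fixed share of RAM before anything else is counted. A sketch; `rvm_share` is a hypothetical helper that reads total RAM from Linux's /proc/meminfo unless a value is passed in, and it deliberately ignores the RVM log and any other overhead.

```shell
#!/bin/sh
# Sketch: estimate what fraction of RAM an RVM data segment occupies,
# assuming the whole segment is mapped into codasrv's memory.
# rvm_share is a hypothetical helper, not a Coda tool.

rvm_share() {
    # $1: RVM data size in MB; $2 (optional): total RAM in MB.
    # Without $2, read MemTotal (reported in kB) from /proc/meminfo (Linux).
    data_mb=$1
    if [ -n "$2" ]; then
        ram_mb=$2
    else
        ram_mb=$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)
    fi
    if [ -z "$ram_mb" ] || [ "$ram_mb" -eq 0 ]; then
        echo "unknown RAM size" >&2
        return 1
    fi
    # Integer percentage, rounded down.
    echo $(( data_mb * 100 / ram_mb ))
}
```

`rvm_share 315 512` prints 61: on a server with 512 MB of RAM (a made-up figure, since the actual amount isn't given above), the data segment alone would be roughly 61% of memory. Called with just the size, e.g. `rvm_share 315`, it uses the local machine's RAM instead.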

Any help/ideas would be greatly appreciated. Thanks.

-RD

--- redirecting decoy <redirectingdecoy_at_yahoo.com> wrote:

> OK, so now I am using a realms file that looks like this:
> /usr/local/coda/etc/realms:
> mymachines        m1,m2
> 
> Then I ran "venus-setup mymachines 500000".  So now I
> can go to /coda/mymachines.  This works fine.
> 
> However, I am still having problems with venus and
> propagation.  I created a new replicated volume called
> "Coda.Storage" and mounted the volume using
> "cfs mkm /coda/mymachines/storage".  For the moment, I am
> only using m1 and m2 as the clients as well.  I copied
> around 90 MB of data into my new volume from m1.  But
> when I look at m2, some of the files were copied, but
> some were not.  I don't understand why.  I receive no
> error messages in the log files.  Propagation does
> work, but seems very shaky.
> 
> Also, venus is still giving me problems.  Once I stop
> venus and try to restart it using just "venus &", I
> get the following message:
> -------------------------------------------------------
> 10:46:34 Coda Venus, version 6.0.7
> 10:46:34 /usr/coda/LOG size is 13338112 bytes
> 10:46:35 /usr/coda/DATA size is 53348068 bytes
> 10:46:35 Loading RVM data
> 10:46:35 Last init was Fri Oct 29 09:45:44 2004
> 10:46:35 Last shutdown was clean
> 10:46:35 Starting RealmDB scan
> 10:46:35        Found 3 realms
> 10:46:35 starting VDB scan
> 10:46:35 Fatal Signal (11); pid 31813 becoming a zombie...
> 10:46:35 You may use gdb to attach to 31813
> -------------------------------------------------------
> 
> In order to get it working again, I have to run
> "venus -init &".  Doesn't that destroy any changes that I
> made to the filesystem?
> 
> Output of "venus -init &":
> -------------------------------------------------------
> 10:47:16 Coda Venus, version 6.0.7
> 10:47:16 /usr/coda/LOG size is 13337017 bytes
> 10:47:16 /usr/coda/DATA size is 53348068 bytes
> 10:47:16 Initializing RVM data...
> 10:47:16 ...done
> 10:47:16 Loading RVM data
> 10:47:16 Starting RealmDB scan
> 10:47:16        Found 1 realms
> 10:47:16 starting VDB scan
> 10:47:16        0 volume replicas
> 10:47:16        0 replicated volumes
> 10:47:16        0 CML entries allocated
> 10:47:16        0 CML entries on free-list
> 10:47:19 starting FSDB scan (20833, 500000) (25, 75, 4)
> 10:47:19        0 cache files in table (0 blocks)
> 10:47:19        20833 cache files on free-list
> 10:47:22 starting HDB scan
> 10:47:22        0 hdb entries in table
> 10:47:22        0 hdb entries on free-list
> 10:47:22 Initial LRDB allocation
> 10:47:22 Mounting root volume...
> 10:47:22 Venus starting...
> 10:47:22 /coda now mounted.
> -------------------------------------------------------
> 
> After I do that, I shut down venus and restart it using
> just "venus &", which finally starts to work correctly.
> For a while, anyway.
> 
> Output of the working "venus &":
> -------------------------------------------------------
> 10:47:40 Coda Venus, version 6.0.7
> 10:47:40 /usr/coda/LOG size is 13338112 bytes
> 10:47:40 /usr/coda/DATA size is 53348068 bytes
> 10:47:40 Loading RVM data
> 10:47:40 Last init was Fri Oct 29 10:47:16 2004
> 10:47:40 Last shutdown was clean
> 10:47:40 Starting RealmDB scan
> 10:47:40        Found 1 realms
> 10:47:40 starting VDB scan
> 10:47:40        2 volume replicas
> 10:47:40        0 replicated volumes
> 10:47:40        0 CML entries allocated
> 10:47:40        0 CML entries on free-list
> 10:47:40 starting FSDB scan (20833, 500000) (25, 75, 4)
> 10:47:40        1 cache files in table (0 blocks)
> 10:47:40        20832 cache files on free-list
> 10:47:40 starting HDB scan
> 10:47:40        0 hdb entries in table
> 10:47:40        0 hdb entries on free-list
> 10:47:40 Mounting root volume...
> 10:47:40 Venus starting...
> 10:47:40 /coda now mounted.
> -------------------------------------------------------
> Any ideas?
> 
> Thanks again
> 
> -RD
> 
> 
> 
> 
> --- Jan Harkes <jaharkes_at_cs.cmu.edu> wrote:
> 
> > On Thu, Oct 28, 2004 at 01:40:13PM -0700, redirecting decoy wrote:
> > > Once I had m1 and m2 set up, I created a root volume
> > > using the following command:
> > > 
> > > createvol_rep CodaRoot m1/vicepa m2/vicepa
> > > 
> > > The above command created CodaRoot.0 on m1 and
> > > CodaRoot.1 on m2.  So in theory I should have a
> > > replicated root volume, correct?
> > 
> > Only if /vice/db/ROOTVOLUME contains the name of the
> > replicated volume, i.e.
> >     $ cat /vice/db/ROOTVOLUME
> >     CodaRoot
> > 
> > > The output of "cfs whereis /coda/m1" and "cfs whereis /coda/m2"
> > > tells me that the files reside on m1 and m2.  So it
> > > looks like it's working so far.
> > > 
> > > Then I did a "venus-setup m1,m2 500000" on m1
> > > and a "venus-setup m2,m1 500000" on m2, and started
> > > venus on both machines.
> > 
> > ??? m1,m2? That doesn't do much. The first argument to venus-setup is
> > only used as the default realm name we use when someone uses clog
> > without specifying a realm. Venus doesn't ever even read the value of
> > that variable.
> > 
> > If you currently look at /coda/m1, you're not really just seeing the
> > files stored on m1; the client will automatically look at m2 as well. That
> > happens because any lookup for CodaRoot returns the location of both
> > 'CodaRoot.0' and 'CodaRoot.1'. The same happens when you look at /coda/m2.
> > 
> > The problem here is that in both cases you are really only using a
> > single server for authentication and volume location queries, so if m1
> > goes down the /coda/m1 tree becomes mostly unusable.
> > 
> > At this point you really want to have a realm that groups several
> > servers together. Ideally this would be done with DNS SRV records, but
> > you can also use the /etc/coda/realms file on the client. Add a line
> > like "myrealm m1 m2" and /coda/myrealm will now show the same CodaRoot
> > volume, but this time we aren't dependent on any single server for
> > authentication or volume location queries.
> > 
> > The problem that you are seeing is that the server isn't aware that the
> > client is looking at the same set of files from different contexts, so
> > it only sends back a single callback message and the client only
> > invalidates the local cache copy for one realm. As far as the client is
> > concerned, /coda/m1, /coda/m2, /coda/<ip-addr>, etc.
> 
=== message truncated ===


Received on 2004-10-29 17:04:13