Coda File System

Re: unable to access volumes since the setup of another CODA server (non SCM)

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Mon, 7 Apr 2008 12:44:54 -0400
On Mon, Apr 07, 2008 at 06:09:15PM +0200, Pierre LEBRECH wrote:
> Before, I had one SCM and 2 CODA clients. Then I added one non-SCM
> CODA server. Since then, the clients come up successfully but I cannot
> access my volumes anymore. The clients tell me: "resource temporarily
> unavailable".
> 
> I don't know what to do.
> I restarted venus many times but it didn't help.

Simply restarting often doesn't help much because clients use
persistent memory. If something is incorrectly cached in RVM, it will
still be there after a restart.

A clean restart would involve wiping all persistently stored state. This
does mean that any local changes that have not reintegrated will be lost.
Reinitializing a client involves either creating an empty file named
"/var/lib/coda/cache/INIT" and then restarting the client, or shutting
down the client and starting it by hand by running "venus -init" as root.
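As a rough illustration of the first option, the Python sketch below only
creates the empty INIT marker; restarting the client (as root) is still up
to you. The helper name request_venus_reinit is hypothetical, and the cache
path is the default mentioned above, so adjust it if your venus cache lives
elsewhere. As noted, reinitializing discards any local changes that have
not reintegrated.

```python
from pathlib import Path

# Default location from the message above; this is an assumption for
# non-default installs, so pass a different directory if needed.
DEFAULT_CACHE_DIR = Path("/var/lib/coda/cache")


def request_venus_reinit(cache_dir=DEFAULT_CACHE_DIR):
    """Create the empty INIT marker so the next venus start reinitializes.

    Hypothetical helper: it only creates the file and does not restart
    the client. Must be run with enough privileges to write cache_dir.
    """
    marker = Path(cache_dir) / "INIT"
    # touch() creates an empty file; if the marker already exists, only
    # its timestamp changes and its (empty) contents are kept.
    marker.touch(exist_ok=True)
    return marker
```

After running it, restart the client as you normally would; the marker
approach avoids having to start venus by hand with "venus -init".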

One thing I'm curious about: how did you add the non-SCM server? Adding
a server to the realm doesn't actually do anything by itself, because all
original volumes were created to exist only on the SCM replica. So to get
replication you must have either created completely new volumes and then
replaced the old realm root with one of the new replicated volumes, or
twiddled with volutil create_rep and edited the various files under
/vice/db to add a new replication site to the existing volumes.

Creating new volumes from scratch would be the most reliable way, except
that clients do tend to keep information about the old volumes around,
so I have had to reinitialize (some) clients after changing the root
volume to a different one.

> Oh, interesting: I've just stopped the non-SCM CODA server and then I
> was able to access my volumes from my 2 coda clients.
> Then I started my non-SCM CODA server again, and afterwards accessing
> the volumes was impossible (resource temporarily unavailable).
> 
> So, what is the problem with my second CODA server? If I stop it, I
> can access my volumes.

It could be that a volume replica is missing on the second server, or
that the second server is identifying itself incorrectly so that clients
get confused. Look at the server log in /vice/srv/SrvLog; it may report
problems finding volume replicas.
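If the log is long, a small filter can help narrow it down. The Python
sketch below is only a generic keyword search, not a Coda tool: the helper
name find_log_lines is hypothetical, and the default keywords are a guess,
since the exact wording of SrvLog messages is not given here. Adjust them
to whatever your log actually says.

```python
from pathlib import Path


def find_log_lines(log_path, keywords=("volume", "replica")):
    """Return (line_number, text) pairs for lines mentioning any keyword.

    Matching is case-insensitive. The default keywords are assumptions,
    not known SrvLog message text.
    """
    wanted = [k.lower() for k in keywords]
    hits = []
    # errors="replace" keeps the scan going if the log has stray bytes.
    with Path(log_path).open(errors="replace") as log:
        for number, line in enumerate(log, start=1):
            lowered = line.lower()
            if any(k in lowered for k in wanted):
                hits.append((number, line.rstrip("\n")))
    return hits
```

For example, find_log_lines("/vice/srv/SrvLog") on the second server would
list candidate lines to read more closely.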

> Conflict? If yes, with what? My volume? Why? How can I solve my problem?

A conflict should normally just turn the conflicting file or directory
into a dangling symlink, since we need to be able to access something in
order to repair it. A "resource temporarily unavailable" error indicates
that we cannot get any answer to a request.
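Since conflicts normally show up as dangling symlinks, one rough way to
spot them is to walk a directory tree and list symlinks whose targets do
not resolve. The Python sketch below is a generic dangling-symlink finder
under that assumption, not a Coda utility; the helper name
find_dangling_symlinks is hypothetical, and walking a large volume may be
slow.

```python
import os


def find_dangling_symlinks(root):
    """Walk root and return sorted paths of symlinks that do not resolve."""
    dangling = []
    # os.walk does not follow symlinked directories by default; a link to a
    # directory is listed in dirnames and a broken link in filenames, so
    # both lists are checked.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() is true for the link itself; exists() follows it and
            # is false when the target is missing.
            if os.path.islink(path) and not os.path.exists(path):
                dangling.append(path)
    return sorted(dangling)
```

A plain "resource temporarily unavailable" error, by contrast, would not
produce such entries, which is why it points at the servers rather than at
a conflict.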

In some cases we cannot turn the conflicting object into a dangling
symlink because of an active reference in the kernel (a file is open and
being read, or some process is keeping a directory pinned). However, if
you are successfully restarting your client, then we are unmounting, and
such references wouldn't survive a remount either way.

Jan
Received on 2008-04-07 12:46:39