Coda File System

Re: volume chimes has unrepaired local subtree(s)

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Tue, 22 Jan 2002 11:50:50 -0500
On Tue, Jan 22, 2002 at 11:19:50AM +0100, Ivan Popov wrote:
> On Mon, 21 Jan 2002, Tom Carroll wrote:
> 
> > Upon a beginrepair, the following is stated
> >
> > Could not allocate new repvol: Object not in conflict
> > beginrepair failed
> >
> > Any ideas how to resolve the conflict?

The "allocate new repvol" error sounds familiar. This is a result of one
of the replicas being inaccessible. Either one of the servers is down,
or there is a server/server conflict hidden underneath the local/global
conflict. The current conflict expansion code in the clients handles
both situations very differently and as a result we cannot repair these
kinds of conflicts from a single client.

Run 'cfs br' (beginrepair) on the object in conflict. It should then
expand into a directory. Do an 'ls -l' on that directory and it should
show

    global -> #volumename
    local/

if the client is disconnected from the servers. In this case, check the
servers and network connections and run 'cfs cs' (checkservers) to force
a server probe, which should typically bring the global volume back.

Otherwise it should show

    global -> @7fxxxxx.xxxxx.xxxxx
    local/

if there is an underlying server-server conflict. The only way to fix
this is to go to the same location on another client and fix the
server-server conflict first. After that the reintegration should just
start again, although you might get a local-global conflict if the
server-server repair had to make any changes to the replicas.
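
The two listings above can be told apart mechanically by looking at where
the 'global' link points. The following is only a rough sketch of that
check in Python, not anything that ships with Coda: the function names and
the returned labels are made up for illustration, and it assumes 'global'
shows up as a symlink whose target starts with '#' (volume name) or '@'
(object identifier), as in the 'ls -l' output described here.

```python
import os


def classify_global_target(target: str) -> str:
    """Classify the link text of 'global' in an expanded conflict.

    The labels are hypothetical names chosen for this sketch.
    """
    if target.startswith("#"):
        # global -> #volumename: client is disconnected from the servers
        return "disconnected"
    if target.startswith("@"):
        # global -> @7f...: a server-server conflict is hidden underneath
        return "server-server conflict"
    return "unknown"


def inspect_expanded_conflict(path: str) -> str:
    """Read the 'global' entry of an expanded conflict directory."""
    link = os.path.join(path, "global")
    if not os.path.islink(link):
        # Not one of the two cases described above
        return "unknown"
    # os.readlink returns the raw link text without following it
    return classify_global_target(os.readlink(link))
```

A result of "disconnected" would suggest checking the servers and running
'cfs cs'; "server-server conflict" would suggest repairing from another
client first.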

'cfs er' (endrepair) will collapse the directory back into the conflict
link. You have to collapse before repair can detect that there actually
is a conflict.
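
Put together, the inspection is: expand, list, optionally probe the
servers, then collapse. As a hedged illustration, the sketch below only
builds that command sequence as a list and does not run anything. The
helper name and the example path are hypothetical; the commands are the
ones named in this message.

```python
from typing import List


def repair_inspection_plan(obj: str, probe_servers: bool = False) -> List[List[str]]:
    """Return the commands for inspecting a conflict, in order (dry run)."""
    plan = [
        ["cfs", "br", obj],   # beginrepair: expand the conflict into a directory
        ["ls", "-l", obj],    # look at where 'global' points
    ]
    if probe_servers:
        plan.append(["cfs", "cs"])  # force a server probe
    # endrepair: collapse the directory back into the conflict link
    plan.append(["cfs", "er", obj])
    return plan


if __name__ == "__main__":
    # Hypothetical object path, for illustration only
    for cmd in repair_inspection_plan("/coda/example/conflict", probe_servers=True):
        print(" ".join(cmd))
```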

> I have got a similar effect by:
> 
> 1. setting tight quota on a volume
> 2. updating a file (becomes a conflict, file inaccessible)
> 3. beginrepair, preservelocal (well I ignore the quota restriction here
>    once more...), endrepair, commit -> hangs forever
>    interrupt causes Segmentation violation death
> 4. File becomes visible again but not modifiable or sometimes
>    stays a dangling link or (broken?) dir - do not recall exactly.
> 5. repair is impossible as "Object not in conflict" ...
> 
> Is there any general way to solve "not existing" conflicts?
> I could not get rid of such objects without sacrificing the whole
> client cache.

Interesting, we don't use quotas, and I believe they are still left
over from the AFS2 days. They are definitely not well tested. It could
very well be that the repair marked both objects as not-in-conflict and
synced the versionvectors so that clients don't (really) notice there is
something wrong. There is a magic undocumented cfs call that might help
here. It should not be used lightly, but can get you out of a situation
where a conflict was marked as cleaned up. It's 'cfs markincon' and it
fiddles with the versionvector of an object, so it is really dangerous
stuff.

Jan
Received on 2002-01-22 11:51:07