Coda File System

Re: "Unknown error: 198" during re-integration

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Sat, 5 Jul 2003 15:29:31 -0400
On Sat, Jul 05, 2003 at 05:17:58PM +0200, Peter Schüller wrote:
> While testing disconnected operation I have run into a new problem. When 
> issuing a reconnect followed by a checkservers after having made 
> modifications in disconnected mode, venus often ("often"? see below)
> fails with errors such as:
> 
> 15:27:55 to /usr/coda/spool/1000/coda.root@_coda.tar
> 15:27:55 and /usr/coda/spool/1000/coda.root@_coda.cml
> 15:27:55 Reintegrate: coda.root, 100/544 records, result = Unknown error: 198
...
> This happens on both my Linux 2.4.20 system with Coda 5.3.20, and on my 
> FreeBSD 4.8 STABLE (cvsup:ed today and built today) with the same version of 

Yeah, that is fixed with the 6.0 servers. It only happens when you have
singly replicated volumes, and all of our volumes happen to have 2 or 3
replicas, so I never hit the problem. Basically the version vector test
is too restrictive, and the reintegration aborts because it sees
conflicting updates (in fact the updates it just reintegrated from the
same client moments ago).

> Coda. On Linux it's not even possible to restart Venus because /coda is never 
> unmounted, so I have to reboot each time. On FreeBSD I can just restart 

We don't automatically unmount on Linux because that fails pretty much
all the time (i.e. as long as any process has its cwd in Coda or an open
file reference).

But after killing venus you can always try to unmount it by hand. If
that fails, search with lsof for processes that still hold a reference
to /coda (lsof | grep /coda), kill those, and retry the umount.
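
The manual cleanup described above could be scripted roughly as follows.
This is only a sketch in Python, not part of Coda: the function names
pids_referencing and try_umount are made up for illustration, it assumes
lsof's default column layout (PID in the second column), and it uses the
same plain substring match as "lsof | grep /coda".

```python
import subprocess

MOUNT_POINT = "/coda"  # mount point named in the message


def pids_referencing(lsof_output, mount_point=MOUNT_POINT):
    """Return sorted PIDs from lsof lines that mention mount_point.

    Mirrors `lsof | grep /coda`: a plain substring match, so it can
    over-match unrelated paths that merely contain the same text.
    """
    pids = set()
    for line in lsof_output.splitlines():
        fields = line.split()
        # Assumption: default lsof output, where the PID is column two.
        if mount_point in line and len(fields) > 1 and fields[1].isdigit():
            pids.add(int(fields[1]))
    return sorted(pids)


def try_umount(mount_point=MOUNT_POINT):
    """Try the unmount once; return (unmounted, holder_pids)."""
    if subprocess.run(["umount", mount_point]).returncode == 0:
        return True, []
    # Unmount failed: list the processes that still reference the mount.
    listing = subprocess.run(["lsof"], capture_output=True, text=True).stdout
    return False, pids_referencing(listing, mount_point)
```

Calling try_umount() after venus has been killed either reports success
or returns the PIDs to kill before retrying. Killing them is left to the
operator on purpose, since some of those processes may hold unsaved work.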

> 15:38:21 volume coda.root has unrepaired local subtree(s), skip checkpointing 
> CML!
> 15:43:22 Reintegrate coda.root pending tokens for uid = 1000

Correct, a reintegration conflict was detected that has not yet been
resolved by the user. And since the client was restarted, you no longer
have tokens, which explains the second message: we can't reintegrate
until the user has obtained authentication tokens.

> Note that when it breaks it will flush a few hundred changes successfully 
> (looking at the venus log and the output of cfs listvol /coda), but then get 
> stuck somewhere with the above errors.

That sounds like the singly replicated volume reintegration problem. The
first reintegration succeeds, but the second one fails because the
directory version vectors on the server are not what the client expected
(due to the modifications made by the first reintegration).
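
A toy model of that failure, for illustration only. This is not Coda's
server code: strict_vv_check and reintegrate are hypothetical names, the
version vector is reduced to a list with a single counter (one replica),
and the record counts are arbitrary. It only shows how an exact-match
test rejects the second batch after the first one has bumped the
server's vector.

```python
def strict_vv_check(server_vv, expected_vv):
    """Accept an update only if the server's vector equals what the client expects."""
    return server_vv == expected_vv


def reintegrate(server_vv, batches):
    """Apply batches of logged records in order; return one result per batch.

    server_vv: list holding one update counter (single replica).
    batches:   list of (expected_vv, n_records) pairs. All records were
               logged while disconnected, so each batch carries the same
               expected vector.
    """
    results = []
    for expected_vv, n_records in batches:
        if strict_vv_check(server_vv, expected_vv):
            # Accepting the batch modifies the directory, so its counter moves on.
            server_vv = [count + 1 for count in server_vv]
            results.append(f"ok ({n_records} records)")
        else:
            # The mismatch was caused by this same client's previous batch.
            results.append("conflict")
    return results


print(reintegrate([1], [([1], 100), ([1], 100)]))
# prints ['ok (100 records)', 'conflict']
```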

Jan
Received on 2003-07-05 15:32:08