Coda File System

Re: Problems with replication on two servers.

From: <u+codalist-wk5r_at_chalmers.se>
Date: Fri, 24 Apr 2009 11:26:53 +0200

Hello Jan,

On Thu, Apr 23, 2009 at 04:52:48PM -0400, Jan Harkes wrote:
> > 18:25:10 GetVolObj: Volume (1000002) already write locked
> > 18:25:10 RS_LockAndFetch: Error 11 during GetVolObj for 1000002.1.1
> > 18:25:46 LockQueue Manager: found entry for volume 0x1000002
> 
> The volume xxx already write locked sounds very ominous, but it is
> really just a debugging message added to help debug Rune's issues.
>
> He is running a non-replicated server, so his testing never hit the
> resolution case, and either way it doesn't seem to have solved his
> issues, so I'll probably revert this change. Especially as now there is
> no queueing on these locks so readers are in some cases not able to
> obtain the lock.

To minimize confusion: I get this message in a replicated scenario
quite regularly, and then "forever". The only way out of this situation
is to restart the server with the runaway lock. It is not the same
as the harmless messages on the single server.

I did not complain loudly because I see this on a pair of servers where
one has a slow connection which can potentially be flooded (say, by
sftp's inflexible resend policy) and become unreliable. You said it is
an unsupported configuration :)

The error I observe on the clients, though, is
"Resource temporarily unavailable", not a dangling link.
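
That client message would be consistent with the "Error 11" in the
server log quoted above, assuming that number is a plain errno value:
on Linux, errno 11 is EAGAIN, whose standard text is "Resource
temporarily unavailable". A minimal Python sketch to check the mapping
on a given host (describe_errno is a hypothetical helper name, and
11 == EAGAIN is a Linux assumption; other platforms number it
differently):

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message text) for a numeric errno on this platform."""
    # errno.errorcode maps numeric values to their symbolic names.
    name = errno.errorcode.get(code, "unknown")
    # os.strerror returns the platform's message text for the code.
    return name, os.strerror(code)


if __name__ == "__main__":
    # 11 is taken from the "Error 11 during GetVolObj" log line (assumed to be an errno).
    print(describe_errno(11))
    # For comparison: whatever EAGAIN is numbered on this host.
    print(errno.EAGAIN, describe_errno(errno.EAGAIN))
```

If the two lines agree, the server-side error and the client-side
message are most likely the same condition seen from both ends.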

I have a realm with a volume in this state right now.
It is not going to recover on its own, nor could I use repair.

[So this has nothing to do with our non-replicated server, which
apparently does not deadlock, fully conforming to your expectations.
It still has the "unexpected delays" issue, though, which may look
like a deadlock.]

Regards,
Rune
Received on 2009-04-24 05:29:23