Coda File System

Re: spontaneous local/global conflict, and how things got worse

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Tue, 26 Aug 2003 10:55:15 -0400
Btw. the address you are sending from is not subscribed, so I've been
forwarding your emails to the list. All non-subscriber emails are
'moderated' and you wouldn't guess how much spam I caught that way.

On Mon, Aug 25, 2003 at 12:51:02PM -0400, Matthias Drochner wrote:
> Now I tried a "cvs update" in the tree, and suddenly got an
> error message: "cannot make directory CVS in .: File exists"
> The "CVS" directory had turned into a symlink "@7f...".
>
> There was no other client, so that shouldn't happen as I understand it...

Coda 5.3.20 has a bug with singly replicated volumes: reintegrated
directory operations conflict with those that were reintegrated
previously.

So if my CML contains 'create A/foo ; create A/bar' and they are
reintegrated at the same time it works fine, but if they are sent in
separate reintegration attempts they conflict with each other because
the directory version vector for the second operation doesn't match
anymore. This was fixed in 6.0.

> What is going on here - did I do something wrong or is that a bug?

That is a bug; I'm not sure whether it is still around or not. The
assertion triggers because we have either

 - an object that is marked as locally modified and not reintegrated, but
   we didn't find an entry in the modification log for this object, or
 - an object that is not marked, but we found a CML entry for it.

It shouldn't be too hard to change this so that the dirty flag is set
according to whether we've got a log entry. However, these cases really
shouldn't happen, which is why it is an assert.
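As a rough sketch of what that change would look like, in Python rather
than the actual venus C++: the CachedObject class, the list standing in
for the CML, and both helper functions are hypothetical names, and the
real check in fso0.cc may well look different.

```python
from dataclasses import dataclass


@dataclass
class CachedObject:
    name: str
    dirty: bool          # "locally modified and not reintegrated" marker


def is_consistent(obj, cml):
    """The invariant behind the assert: dirty exactly when the CML has an entry."""
    return obj.dirty == (obj.name in cml)


def resync_dirty_flag(obj, cml):
    """The alternative: derive the flag from the log instead of asserting."""
    obj.dirty = obj.name in cml
    return obj


cml = ["A/foo"]                                        # pending log entries
marked_without_entry = CachedObject("A/bar", dirty=True)
entry_without_mark = CachedObject("A/foo", dirty=False)

for obj in (marked_without_entry, entry_without_mark):
    print(obj.name, "consistent before:", is_consistent(obj, cml))
    resync_dirty_flag(obj, cml)
    print(obj.name, "consistent after:", is_consistent(obj, cml))
```

Resyncing would hide the symptom, not explain how the flag and the log
got out of step in the first place.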

The removal of the CML entry and the clearing of the dirty flag should,
as far as I know, happen within the same RVM transaction. Very strange.

> 17:29:25 Reintegrate: coda.root, 45/45 records, result = Unknown error: 198

So some operation within these 45 records is on an inconsistent object.

> 17:29:30 Reintegrate: coda.root, 37/45 records, result = SUCCESS

And from this I guess that would be operation #38.

> [restart venus]
> 17:58:23 Reintegrate: coda.root, 2/7 records, result = SUCCESS

Did we just lose a CML entry??? 45 - 37 = 8, and according to this there
are only 7. I wonder what happened to the problematic operation #38.
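The bookkeeping can be checked mechanically from the two quoted log
lines. This small Python sketch assumes "n/m records" means that n of m
pending records were reintegrated, in log order, which is how I am
reading it here; the regular expression and the parse() helper are made
up for this example.

```python
import re

# Matches lines such as "... Reintegrate: coda.root, 37/45 records, result = SUCCESS"
LINE = re.compile(r"Reintegrate: (\S+), (\d+)/(\d+) records, result = (.+)$")


def parse(line):
    """Return (volume, records_done, records_total, result) for one log line."""
    m = LINE.search(line)
    if m is None:
        raise ValueError("not a Reintegrate line: " + line)
    volume, done, total, result = m.groups()
    return volume, int(done), int(total), result


before_restart = "17:29:30 Reintegrate: coda.root, 37/45 records, result = SUCCESS"
after_restart = "17:58:23 Reintegrate: coda.root, 2/7 records, result = SUCCESS"

_, done, total, _ = parse(before_restart)
_, _, observed_left, _ = parse(after_restart)

expected_left = total - done          # records that should still be pending
first_unsent = done + 1               # 1-based index of the suspect record
missing = expected_left - observed_left

print("expected pending:", expected_left)
print("observed pending:", observed_left)
print("missing records:", missing)
print("first record not reintegrated: #%d" % first_unsent)
```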

> [restart venus]
> 18:03:07 fatal error -- Assertion failed: file "fso0.cc", line 282

I guess we just found it...

Jan
Received on 2003-08-26 10:57:02