Coda File System

Re: Unresolvable Conflicts

From: Martin Emrich <emme_at_emmes-world.de>
Date: Fri, 23 Jul 2004 05:20:20 -0400

Hi!

On Wednesday 21 July 2004 21:06, Jan Harkes wrote:

> On Wed, Jul 21, 2004 at 02:12:08PM -0400, Martin Emrich wrote:

> Well, it is a reintegration conflict, but since the file doesn't exist
> on the server we can't expand it correctly. So repair fails and even
> forgets to re-collapse the expanded tree.
>
> This is a combination of several problems. First of all the conflict
> probably should have been on the parent directory. I don't know why it
> didn't mark that one instead, maybe it still had active references.
> Those can be caused by things like an application or shell keeping the
> directory cache entry pinned and we can't turn the thing into a dangling
> link.

IMHO, there should not have been any conflict at all, as even the parent
directory never existed on the server or on another client. In fact, by the
time the error appeared, I had used this volume from only one client, my
notebook.

> The second problem is that repair is very paranoid and refuses to do
> anything if it can't reach all copies. That really shouldn't be
> necessary in all cases, if a server is dead or unreachable it would
> still be useful to perform a partial repair, even though the conflict
> would come back as soon as the missing server returns.

All my repair attempts happened at home, while strongly connected to the
server.

> > Unless I resolve this, this volume won't be reintegrated (I already have
> > accumulated 957 CMLs ;-)
>
> cd out of the parent directory, then do a cfs er baghira-engine to
> collapse the tree and hopefully flush the cached data from the kernel.

Didn't work. Maybe I should note that this problem has persisted for several
weeks now (my first mail never reached codalist), so the client as well as
the server have already been rebooted several times.

> Then 'ls -l' and hopefully the parent will show up as a conflict. 

Sorry, no effect. Isn't there an undocumented "cfs 
rip-directory-out-of-existence <directory>" command? I could even delete the 
volume "packaging" as it contains only build directories for debian packages, 
which can easily be restored. But I'm afraid that this error could hit other 
more important volumes.

> > On another volume, I had this strange behaviour today:
> >
> > martin_at_gwaihir:/coda/darkzone/organizer$ ls -l
> > ls: kcal-remote: Die Wartezeit für die Verbindung ist abgelaufen
> > ls: knotes: Die Wartezeit für die Verbindung ist abgelaufen
> > insgesamt 4
> > drwxr-xr-x    2 martin   nogroup      2048 2004-06-09 07:55 adressen
> > drwxr-xr-x    2 martin   nogroup      2048 2004-07-01 19:52 kalender
> >
> > (German for "Connection timed out"). I already restarted the server
> > components and the client, nothing happens. This volume stays
> > disconnected, too.
>

Strangely, this error has disappeared since I wrote the original mail. Maybe
there was some timeout running or so...

> Well, that could be the other reason why the global replica isn't
> accessible. Maybe the server is unreachable for some reason. Do you have
> multiple network addresses for the server? Are there any firewalls
> between the client and the server? Does 'cfs cs' (checkservers) help?

I was at home behind the firewall, and cfs cs worked. Never mind, now it
works again.

>
> > What can I do (except backing up everything from another client, removing
> > the two volumes and making new ones) ?
>
> You can copy the tarball from /usr/coda/spool/<userid>/volumename.tar
> which should contain all the changed files which are in the CML, it
> doesn't have symlink, rename or remove operations though.
>
> Then a 'cfs purgeml' will flush all the pending operations from the
> local cache at which point it should be possible to bring the volume
> back into connected state.

That sounds pretty hardcore. Maybe I'll give it a try after my exams...
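
As a rough sketch of the two steps quoted above (copy the spool tarball,
then purge the CML), something like the following might do. The function
names, the dry-run switch and all argument values are made up for
illustration, and passing a path inside the volume to 'cfs purgeml' is an
assumption, not something confirmed in this thread. By default it only
prints what it would run.

```shell
#!/bin/sh
# Sketch only: save the CML tarball, then purge the pending operations.
# The spool path layout comes from the advice quoted above; everything
# else (function names, arguments, DRY_RUN) is a hypothetical helper.

# Print the command when DRY_RUN is 1 (the default), otherwise run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Arguments: user id, volume name, a path inside the volume, backup dir.
rescue_cml() {
    user_id=$1
    volume=$2
    volume_path=$3
    backup_dir=$4

    # Tarball with the changed files from the CML (it has no symlink,
    # rename or remove operations, so it is not a complete record).
    tarball="/usr/coda/spool/$user_id/$volume.tar"

    # Keep a copy before anything is discarded.
    run cp "$tarball" "$backup_dir/"

    # Assumption: purgeml is given a path inside the affected volume.
    run cfs purgeml "$volume_path"
}
```

Calling rescue_cml with a user id, volume name, volume path and backup
directory prints the two commands; setting DRY_RUN=0 would actually run
them, which discards every pending operation for that volume, so the copied
tarball should be checked first.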

Ciao

Martin
Received on 2004-07-23 11:18:25