Coda File System

Re: "No such file or directory" on connected volume

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Mon, 2 Jul 2001 15:03:30 -0400
On Sun, Jul 01, 2001 at 09:53:25AM +0200, Ivan Popov wrote:
> The problem occurred after several files: a .c file fails to be opened.
> Tested on two machines, a 486DX66 and a Pentium-166, with a 200 MB cache in
> both cases. The volume switched to disconnected mode by itself, but the
> missing file is still "missing" (though visible in the directory listing)
> even after a manual reconnect.

The elusive 'lost store' bug. I'm still unsure how or why it hits, but
during reintegration files are sometimes considered 'reintegrated' even
when the server has never seen the corresponding CML (client modify log)
entry.

One possible scenario is a retried CML operation, where a store was
optimized away between the original attempt and the retry. When the
storeid returned by the server is not found in the log, the cancellation
code in the client treats all CML entries as reintegrated and drops them.
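To make the suspected failure concrete, here is a toy model in Python. It is a sketch, not Coda's code. Everything in it is hypothetical: the `CmlEntry` record, the `log_store` helper that optimizes away an older store to the same file, the assumption that a found storeid means "everything up to here is reintegrated", and the retry sequence itself. Only the not-found behaviour, where every entry is dropped, comes from the description above.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CmlEntry:
    storeid: int
    path: str

def log_store(cml: List[CmlEntry], storeid: int, path: str) -> None:
    """Hypothetical CML optimization: a new store to a file cancels any older store to it."""
    cml[:] = [e for e in cml if e.path != path]
    cml.append(CmlEntry(storeid, path))

def drop_reintegrated(cml: List[CmlEntry], returned_storeid: int,
                      safe: bool = False) -> List[CmlEntry]:
    """Return the entries still pending after the server reports `returned_storeid`."""
    for i, e in enumerate(cml):
        if e.storeid == returned_storeid:
            return cml[i + 1:]  # assumed: everything up to this entry was reintegrated
    # The storeid is no longer in the log.
    if safe:
        return list(cml)        # conservative variant: keep everything and retry
    return []                   # described behaviour: all entries dropped as reintegrated

cml: List[CmlEntry] = []
log_store(cml, 1, "foo.c")
log_store(cml, 2, "bar.c")
# Hypothetical: attempt 1 ships only store 1; the server applies it but the reply is lost.
# Before the retry, foo.c is written again, so store 1 is optimized away.
log_store(cml, 3, "foo.c")
# On the retry the server reports storeid 1 as the last entry it has seen.
print([e.storeid for e in drop_reintegrated(cml, 1)])             # [] : bar.c and the new foo.c are lost
print([e.storeid for e in drop_reintegrated(cml, 1, safe=True)])  # [2, 3]
```

The conservative variant avoids losing stores, but it may resend entries the server already applied, so a real fix would presumably also need the server side to detect or tolerate such duplicates.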

> Neither did I before I stressed Coda with the gcc self-compile.
> It surely does depend on processor load (race condition?).

Not really. However, Coda uses a userspace thread package that requires
explicit yields. With larger caches and slower CPUs, the time between
yields can become longer than a couple of seconds. RPC2 messages can then
get lost (or the reply is sent too late) because the RPC2 socket listener
isn't picking up incoming packets quickly enough.
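A rough simulation of why CPU speed and cache size matter with non-preemptive threads: each task is a Python generator that only gives up the CPU when it yields, so a long cache walk delays the listener until its next yield. The task names, per-entry costs, cache size in entries, and the 2-second deadline are all made-up numbers for illustration; they are not RPC2's real timeouts or Coda's real scan costs.

```python
from typing import Dict, Generator

# Each yield reports the simulated CPU time (microseconds) used since the previous yield.
Task = Generator[int, None, None]

DEADLINE_US = 2_000_000  # hypothetical: a request unanswered for this long counts as lost

def cache_scan(entries: int, us_per_entry: int, yield_every: int) -> Task:
    """Hypothetical CPU-bound job that walks cache entries, yielding only every `yield_every` entries."""
    done = 0
    while done < entries:
        batch = min(yield_every, entries - done)
        done += batch
        yield batch * us_per_entry

def listener(polls: int) -> Task:
    """Stand-in for the RPC2 socket listener: drain the socket (1 ms), then yield."""
    for _ in range(polls):
        yield 1_000

def worst_listener_gap(tasks: Dict[str, Task]) -> int:
    """Run tasks round-robin without preemption on a simulated clock and
    return the longest interval (us) between two consecutive listener runs."""
    clock = last_poll = worst = 0
    alive = dict(tasks)
    while "listener" in alive:
        for name in list(alive):
            try:
                used = next(alive[name])
            except StopIteration:
                del alive[name]
                continue
            if name == "listener":
                worst = max(worst, clock - last_poll)
                last_poll = clock
            clock += used
    return worst

for label, us_per_entry in (("slow CPU", 500), ("fast CPU", 50)):
    gap = worst_listener_gap({
        "scan": cache_scan(20_000, us_per_entry, yield_every=10_000),
        "listener": listener(polls=5),
    })
    print(f"{label}: worst gap {gap / 1e6:.3f} s, late replies: {gap > DEADLINE_US}")
```

In this model the slow CPU leaves the listener idle for about 5 seconds, past the made-up deadline, while the fast CPU stays around half a second. Passing `yield_every=1_000` brings the slow case back to about half a second as well, which is the usual remedy in a cooperative thread package: yield more often inside long loops.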

Jan
Received on 2001-07-02 15:03:37