Coda File System

Re: A couple of read/write questions

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Wed, 21 Mar 2007 15:22:10 -0400
On Wed, Mar 21, 2007 at 03:24:36PM +0100, Davor Ocelic wrote:
> 1) I set up a coda Client (CVS) and cfs lv /coda/realm/ . It finds the
> LAN server and reports WriteDisconnected state.
> 
> I cd into the realm, clog as admin and mkdir test. The process hangs forever,
> and cfs lv reports that there is one CML entry (about 200 bytes) waiting for
> re-integration, and nothing more happens.

Was your local userid root? The only related fix I can think of went in
after 6.9.0, but it has been in CVS for a while and you are already
running a CVS version.

    http://www.coda.cs.cmu.edu/trac/changeset/2f783b94a9852f1d36caaa937055f76b20ecd1ce

> it is created. I try copying two files to it, and the task finishes
> quickly. One does get copied, while for the other it prints
> 'Connection timed out'.
> 
> Immediately after I try the copy again, both files work, and all subsequent
> operations (creating, reading, deleting a thousand files) had no more
> problems.
> 
> What debugging options could I enable to exactly find out what happened
> in the beginning?

The first thing I try is to write a (shell) script that reliably
reproduces the problem: something that sets up some state on the
servers, reinits the client, and runs a couple of operations. That's
probably 90% of finding the bug, because from that point I can increase
the venus or codasrv debug level, set various breakpoints with gdb, etc.
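
As a rough illustration of the "couple of operations" part of such a
script, here is a minimal sketch in Python rather than shell. It only
replays the mkdir and the copies from the report and records how long
each step took and which errno it failed with, if any. The names
run_step and reproduce are made up for this example, and setting up
server state and reinitializing the client are deliberately left out.

```python
import errno
import os
import shutil
import sys
import time


def run_step(name, func, *args):
    """Run one filesystem operation; return (name, errno name or None, seconds)."""
    start = time.monotonic()
    try:
        func(*args)
        err = None
    except OSError as exc:
        # Translate the numeric errno into its symbolic name, e.g. ETIMEDOUT.
        err = errno.errorcode.get(exc.errno, str(exc.errno))
    return name, err, time.monotonic() - start


def reproduce(target_dir, sources):
    """Create target_dir, then copy each source file into it, recording every result."""
    results = [run_step("mkdir " + target_dir, os.mkdir, target_dir)]
    for src in sources:
        results.append(run_step("copy " + src, shutil.copy, src, target_dir))
    return results


# Hypothetical usage: python repro.py TARGET_DIR FILE [FILE ...]
if __name__ == "__main__" and len(sys.argv) > 2:
    for name, err, secs in reproduce(sys.argv[1], sys.argv[2:]):
        print(f"{secs:8.3f}s  {err or 'ok':<10}  {name}")
```

Running this against a freshly initialized client shows which step hangs
or times out, and that is the step worth tracing at a higher debug level.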

> 2) After using 'vi' to edit a file yesterday, I see two Vim swap files somehow
> got created (.file1.swp and .file1.swo). When I do 'ls' and 'rm' on them:
> 
> # ls -al | less
> ?--------- ? ? ? ? ?  .file1.swo
> ?--------- ? ? ? ? ?  .file1.swp
> ...... (and the rest of OK files)... 

Actually 'ls' does this when stat(2) fails. Strace will probably tell
you that the call is returning either ETIMEDOUT or ENOENT.
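
If strace is not at hand, the same information can be collected with a
few lines of Python. This is only a sketch: stat_errno and stat_errors
are names invented for this example, and it uses lstat() so that
symlinks are examined themselves rather than followed.

```python
import errno
import os


def stat_errno(path):
    """Return the symbolic errno lstat() fails with for path, or None on success."""
    try:
        os.lstat(path)
        return None
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))


def stat_errors(directory):
    """Map every name listed in directory to the result of stat_errno()."""
    return {name: stat_errno(os.path.join(directory, name))
            for name in os.listdir(directory)}
```

Any entry that maps to something other than None is one that 'ls -l'
would print with question marks.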

> # ls .file1*sw*
> ls: .file1.swp: No such file or directory
> -rw------- 1 root nogroup 0 Mar 21 15:10 .file1.swo

Interesting, ENOENT for the .swp file, but the other suddenly worked.

> # rm .file1*sw*
> rm: cannot remove .file1.swo: No such file or directory
> rm: cannot remove .file1.swp: No such file or directory

And now we get ENOENT for both, which is unusual since we just got the
attributes for the .swo file on the previous ls.

In any case, the directory contents are cached and contain entries for
both .file1.swo and .file1.swp, but maybe our cached copy is stale and
the files no longer exist on the server, so our getattr (stat) requests
fail with ENOENT. In some cases a reference count goes wrong, there is a
conflict, or we have pending changes that have not yet been reintegrated,
and the client is unable to fetch the correct directory contents.
Sometimes it is a conflict on the directory that we are in, and the
kernel isn't allowing venus to turn an active directory into a symlink.
In that case just doing "cd .. ; ls" will reveal the conflict.
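
To check a directory for such entries without reading the 'ls' output by
eye, something like the following could be used. It is a sketch that
assumes a conflict shows up as a symlink whose target does not resolve,
and dangling_symlinks is a name made up for this example.

```python
import os


def dangling_symlinks(directory):
    """Return the sorted names in directory that are symlinks with no resolvable target."""
    found = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # islink() looks at the link itself; exists() follows it and is
        # False when the target is missing.
        if os.path.islink(path) and not os.path.exists(path):
            found.append(name)
    return sorted(found)
```

Calling it on the parent directory is the scripted equivalent of the
"cd .. ; ls" check.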

Jan
Received on 2007-03-21 15:23:59