Coda File System

still losing with unresolvable conflicts, and Coda/IPsec HOWTO

From: Greg Troxel <gdt_at_ir.bbn.com>
Date: Tue, 17 Sep 2002 14:19:18 -0400
I'm still having serious Coda usability trouble, and I'm wondering if
someone could replicate my setup or parts thereof.

  Server: NetBSD 1.5.3/i386 (SCM and singly-replicated volume),
	  P6-200, 256 MB RAM
	  2002-06-03 cvs Coda
  Client: FreeBSD 4.6/i386
	  2002-09-05 cvs Coda
	  P6-200, 128 MB RAM

  IPsec between client and server (10 minute SA, racoon, certificates,
  ESP with AES-128 and HMAC-SHA1)

  client has
    mapprivate=1
    masquerade=1

  client security policy database
    # Coda masquerading
    spdadd 0.0.0.0/0[any] 0.0.0.0/0[370] udp
	    -P out ipsec esp/transport//require ;
    spdadd 0.0.0.0/0[370] 0.0.0.0/0[any] udp
	    -P in ipsec esp/transport//require ;
    spdadd 0.0.0.0/0[any] 0.0.0.0/0[2432] udp
	    -P out ipsec esp/transport//require ;
    spdadd 0.0.0.0/0[2432] 0.0.0.0/0[any] udp
	    -P in ipsec esp/transport//require ;

  client runs cfs (see FreeBSD /usr/ports/net/cfs), with ciphertext
  in /coda/home/gdt/secret, mounted on /crypt/gdt

  files in cfs under RCS, edited with emacs



To replicate the trouble, edit smallish files (400 bytes is enough)
that are kept under RCS with emacs vc mode.  Start with the file
checked in, and ^X^Q to check out, then edit, save, and ^X^Q to check
back in.  This does lots of operations very quickly, and can overwhelm
the connection to the server, particularly if the server is busy and
not that fast to start with.  (A cvs update of the NetBSD source tree
was in progress, and the server has only one disk.)
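
For anyone trying to replicate this without emacs, the check-out / edit /
check-in cycle can be approximated with a small script.  This is only a
sketch: it assumes the standard RCS tools (co, ci) are on the PATH and
that the file is already checked in, and it does not reproduce the exact
sequence of operations emacs vc mode issues.  The function names and the
default cycle count are made up for illustration.

```python
import subprocess


def cycle_commands(path):
    """Return the two RCS commands for one check-out / check-in cycle."""
    return [
        ["co", "-q", "-l", path],                   # check out with a lock
        ["ci", "-q", "-u", "-mstress edit", path],  # check in, keep a read-only copy
    ]


def run_cycles(path, cycles=20):
    """Repeatedly check out, append one line, and check back in.

    Assumes `path` is already under RCS (an initial check-in has been done).
    """
    checkout, checkin = cycle_commands(path)
    for i in range(cycles):
        subprocess.run(checkout, check=True)
        # ci reverts an unchanged file, so make a real edit each time.
        with open(path, "a") as f:
            f.write(f"edit {i}\n")
        subprocess.run(checkin, check=True)
```

Calling run_cycles() on a small RCS-controlled file inside the cfs mount,
while the server is busy, should generate a similar burst of small
operations.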

Sometimes, this provokes 'connection unreachable' to the server.
When this happens, the client goes disconnected, and in the most
recent case, I had a CML of 16 entries:

(encrypted file names replaced with path-x)

  Store   /coda/home/gdt/secret/path-a/path-b/path-c1 (length = 399)
write original file, I think

  Create  /coda/home/gdt/secret/path-a/path-b/path-c2
  Chmod   /coda/home/gdt/secret/path-a/path-b/path-c2 (mode = 644)

  Remove  /coda/home/gdt/secret/path-a/path-b/path-c3
  Rename  /coda/home/gdt/secret/path-a/path-b/path-c2 (to: /coda/home/gdt/secret/path-a/path-b/path-c3)
  Remove  /coda/home/gdt/secret/path-a/path-b/.pvect_path-c3
  Symlink /coda/home/gdt/secret/path-a/path-b/.pvect_path-c3 (--> e9c222dc)
The .pvect symlink to nowhere stores the IV for the file.
  Store   /coda/home/gdt/secret/path-a/path-b/path-c3 (length = 401)
It seems odd that the remove of c3 comes before the rename and store; I
would expect rcs to create a new file and do an atomic rename at the
end to avoid losing the ,v file.

  Create  /coda/home/gdt/secret/path-a/path-b/path-c4
  Chmod   /coda/home/gdt/secret/path-a/path-b/path-c4 (mode = 444)
  Remove  /coda/home/gdt/secret/path-a/path-b/path-c5
  Rename  /coda/home/gdt/secret/path-a/path-b/path-c4 (to: /coda/home/gdt/secret-bbn/path-a/path-b/path-c5)
again there is the remove/rename out of order.

  Remove  /coda/home/gdt/secret/path-a/path-b/.pvect_path-c5
  Symlink /coda/home/gdt/secret/path-a/path-b/.pvect_path-c5 (--> 32e325ac)
  Chmod   /coda/home/gdt/secret/path-a/path-b/path-c3 (mode = 444)
  Store   /coda/home/gdt/secret/path-a/path-b/path-c5 (length = 1260)
writing the new RCS file.
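
The "new file first, atomic rename last" ordering mentioned in the notes
on the CML above can be sketched as follows.  This is a generic
illustration of the technique on a POSIX file system, not a description
of what rcs, cfs, or Venus actually do; atomic_write is a made-up helper
name.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` so that the old contents survive a failure.

    The new contents go to a temporary file in the same directory, and only
    a final rename makes them visible under the target name.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before renaming
        os.replace(tmp, path)     # rename over the target; no prior remove
    except BaseException:
        # Clean up the temporary file; the original file is still intact.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```

With this ordering there is no window in which the target name is
missing or zero length, because the target is never removed first.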


So, I did 'cfs cs' and got a response, and was now in a conflict
state.  Running repair, I did checklocal and found no conflict.  (Not
surprising, since nothing had changed on the server, since no other
clients were active.)  I did preservelocal, which succeeded.  I then
did 15 more preservelocals without doing checklocal, and then typed
'end'.  repair hung for a while (a minute?) and then exited with an
ioctl error.  Venus had then unmounted /coda and was wedged.

Starting venus again, I got an assert in fso0.cc, line 282 (version
4.54 - I am up to date).
Then, since I needed to get work done, I ran 'venus -init'.


So, it seems that there is a problem where, when a server times out, a
conflict is flagged even though there is no actual conflict.  The client
should have been able to just reintegrate the changes on reconnecting.
I suspect there is some problem where the server has received and
acted on a request but the client does not get the ack and thus
considers the request still outstanding.  Presumably this is the Store
above, since that was the first item on the CML.
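
The suspected failure mode can be shown with a toy model.  This is an
assumption-laden sketch and not Coda's actual protocol: it uses a single
integer version per object, and ToyServer and replay_after_lost_ack are
invented names.  It only shows how a store that the server applied, but
whose reply was lost, looks like a conflict when the client replays it.

```python
class ToyServer:
    """Holds one object with an integer version (a stand-in for real versioning)."""

    def __init__(self):
        self.version = 0
        self.data = b""

    def store(self, base_version, data):
        # Accept the update only if the client saw the current version.
        if base_version != self.version:
            return "conflict"
        self.data = data
        self.version += 1
        return "ok"


def replay_after_lost_ack():
    """Apply a store, pretend the reply was lost, then replay the same store."""
    server = ToyServer()
    base = server.version                       # version the client last saw
    first = server.store(base, b"new contents")  # server applies the update
    # The reply never reaches the client, so the entry stays in its log and
    # is replayed on reconnection with the old base version.
    second = server.store(base, b"new contents")
    return first, second
```

In this model the replay is rejected even though no other client touched
the object; telling such a replay apart from a real conflict would need
the server to recognize that it has already applied that exact update.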

There is a second bug, which is that this repair situation ended in
error rather than successfully causing reintegration.

On reinit, I found that the ,v file was 0 length, and thus was lost.
I decided it wasn't worth pulling it from backups, since I didn't
really need the history of this file (I'm a compulsive rcs user for
things not in cvs....).


I don't know if this will get better when the realms branch gets
merged and the repair code uses it, or how far away that is - this
problem is pretty serious in terms of making Coda awkward for doing
real work.
Received on 2002-09-17 14:21:39