Coda File System

problem reading files from coda when disconnected

From: <shivers_at_ccs.neu.edu>
Date: Tue, 01 May 2007 12:50:29 -0400
I am trying out the new coda release, without success. Here's a description
of my failed effort.

- create a vice setup on a debian x86 box. The box, lambda.csail.mit.edu,
  lives in a real machine room with a fat pipe to the net.

- create a venus setup on an ubuntu x86_64 box with
  + 10GB of cache
  + big DATA & LOG:
    % ls -l DATA LOG 
    -rw------- 1 root root 1201865464 2007-05-01 12:07 DATA
    -rw------- 1 root root  300468736 2007-05-01 12:09 LOG

  This box lives in my office at Northeastern, about 2 miles away from the
  server over at MIT. It also has a university-grade net connection. It has
  4GB of RAM and 10GB of (encrypted) swap space on 3 spindles.

  Here is the /etc/venus.conf:
    realm="lambda.csail.mit.edu"

    # 10 Gb of local file caching
    cacheblocks=10000000

    errorlog="/var/log/coda/venus.err"
    logfile="/var/log/coda/venus.log"

    rvm_log="/var/lib/coda/LOG"
    rvm_data="/var/lib/coda/DATA"
    cachedir="/var/lib/coda/cache"
    checkpointdir="/var/lib/coda/spool"

    pid_file="/var/run/coda-client.pid"
    run_control_file="/var/run/coda-client.ctrl"
    marinersocket="/var/run/coda-client.mariner"
    mapprivate=1

  DATA, LOG and the cache tree all live in my ext3 /var file system; no raw
  partitions of any kind are being used here.

- copy 2.6GB (26k files) of my home dir into my coda file system, using
    cp -pr <stuff> /coda/lambda.csail.mit.edu/user/shivers/.

  This runs for a while, then completes with no problem.
  I can poke around in the coda dir from the shell using cd, ls & more
  without any trouble.

  Meanwhile, codacon is scrolling writeback messages like crazy. Eventually,
  everything is copied back to the server, cfs lv shows no pending CML
  entries, and a du -sk of the vice directory shows that it has all 2.6GB
  of the bits.

- hoard it all with
    hoard add /coda/lambda.csail.mit.edu d+

  Note that my 2.6GB hoard should fit entirely within my 10GB cache.

  This runs for a while, then terminates successfully.

- Test it by walking the whole tree & reading every file, using find(1):
    find /coda/lambda.csail.mit.edu/user/shivers -type f -exec /tmp/eat {} \; -print
  where /tmp/eat is a simple shell script that cats its args to /dev/null:
    #!/bin/sh -
    exec cat "$@" > /dev/null

  This runs & completes successfully. Codacon shows no server->client file 
  transfer during this. So far, so good! Now for the trouble.

- Test it again by saying "cfs disconnect" and then redoing the find
  tree-walk to read the whole subdir a second time.

  This runs along fine for a while, with a silent codacon, then codacon
  suddenly outputs

    ValidateAttrsPlusSHA CVS(4.7f000000.baf.28f8) [0] ( 11:56:43 )
    Probe ( 11:57:21 )

  and the find tree walk hangs. After a minute or two, codacon says
    unreachable lambda.csail.mit.edu ( 11:58:10 )
  and the find walk resumes with the following output:

    find: ./research/mrlc/mrlc/spim/CVS: Permission denied
    ./research/mrlc/mrlc/spim/mips-syscall.h
    ./research/mrlc/mrlc/spim/endian.c
    ./research/mrlc/mrlc/spim/buttons.h
    ./research/mrlc/mrlc/spim/mips-syscall.o
    ./research/mrlc/mrlc/spim/display-utils.c
    find: ./research/mrlc/mrlc/confpaper: Permission denied
    find: ./research/mrlc/mrlc/paper: Permission denied
    find: ./research/mrlc/mrlc/CVS: Permission denied
    find: ./research/mrlc/mrlc/source: Permission denied
    find: ./research/mrlc/mrlc/harness: Permission denied
    find: ./research/mrlc/mrlc/paper1: Permission denied
    ./research/mrlc/okasaki-msg
    .
    .
    .
  and "Permission denied" messages like these are scattered throughout the
  rest of the find transcript.

- Then I go poke around in the file system. I now have trouble accessing
  the problem directories. For example:

    % ls -ld research/mrlc/mrlc/spim/CVS
    drwxr-xr-x 1 shivers nogroup 2048 2007-04-30 16:04 research/mrlc/mrlc/spim/CVS
    % ls research/mrlc/mrlc/spim/CVS
    ls: research/mrlc/mrlc/spim/CVS: Permission denied
    % cfs la research/mrlc/mrlc/spim/CVS
    research/mrlc/mrlc/spim/CVS: Connection timed out
    %

  Timed out? Hey, I hoarded the file and *disconnected*. Why is venus
  even trying to connect at all?

  Here is what cfs lv says while I'm in the disconnected state:
    % cfs lv /coda/lambda.csail.mit.edu/
      Status of volume 7f000000 (2130706432) named "/"
      Volume type is Replicated
      Connection State is Unreachable
      Reintegration age: 0 sec, time 5.000 sec

- Then I reconnect with
    cfs reconnect
  Now I can see the problem directories with no trouble. I redo the find
  tree-walk a third time and it completes with no problems.
  + codacon shows *no* server->client file motion. 
  + my network load meter shows only minor traffic.
  So I have reason to believe the whole tree walk ran entirely out of the
  cache.
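
The hoard step expects the 2.6GB tree to fit in the cache, and that is easy
to check numerically. The sketch below is not part of Coda. The helpers
venus_cacheblocks and hoard_share are hypothetical, and the sketch assumes
venus counts cacheblocks in 1 KB blocks, which matches the "10 Gb" comment in
the venus.conf above. It compares raw sizes only and says nothing about any
other cache limits venus may enforce.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Coda.
# ASSUMPTION: venus counts cacheblocks in 1 KB blocks.

# Print the cacheblocks value from the first uncommented
# "cacheblocks=" line of a venus.conf-style file.
venus_cacheblocks() {
    sed -n 's/^[[:space:]]*cacheblocks=["]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# hoard_share SIZE_KB CACHEBLOCKS
# SIZE_KB is e.g. the first field of `du -sk DIR`. Prints the share of the
# cache the hoard would use and the cache size in decimal GB.
# Exits 0 if the hoard fits, 1 if it does not.
hoard_share() {
    awk -v s="$1" -v c="$2" 'BEGIN {
        s += 0; c += 0                     # force numeric comparison
        printf "%.0f%% of %.2f GB cache\n", 100 * s / c, c * 1024 / 1e9
        if (s > c) exit 1
    }'
}
```

For example, with the paths used above:

    hoard_share "$(du -sk /coda/lambda.csail.mit.edu/user/shivers | cut -f1)" \
        "$(venus_cacheblocks /etc/venus.conf)"

With cacheblocks=10000000, the reported cache size is 10.24 GB (decimal).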
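
A variant of the tree walk prints only the files that could *not* be read.
That makes it easier to compare a connected run with a disconnected one.
This is a minimal sketch that assumes a POSIX find and sh, and read_tree is a
hypothetical name. find treats `-exec ... \;` as true only when the command
exits with status 0, so negating it keeps just the failures. Directories that
find itself cannot enter (like the "Permission denied" lines above) are still
reported on stderr, and they make find exit non-zero.

```shell
#!/bin/sh
# Hypothetical helper: read every regular file under DIR and print the path
# of each one that could not be read. Directories that find cannot enter
# are reported by find on stderr and make the function return non-zero.
read_tree() {
    # `-exec ... \;` succeeds only if cat exits 0, so `!` keeps the failures.
    find "$1" -type f ! -exec sh -c 'cat "$1" > /dev/null 2>&1' sh {} \; -print
}
```

For example:

    cfs disconnect
    read_tree /coda/lambda.csail.mit.edu/user/shivers > /tmp/unreadable.txt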
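
When scripting the disconnected test, you can check the volume's state before
starting the walk by filtering the cfs lv output. This sketch relies only on
the "Connection State is ..." line in the output shown above, and that format
may differ between Coda releases. cfs_conn_state is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the "Connection State is ..." line
# from `cfs lv` output read on stdin. Assumes the output format shown in the
# transcript above.
cfs_conn_state() {
    sed -n 's/^[[:space:]]*Connection State is[[:space:]]*//p'
}
```

For example:

    cfs lv /coda/lambda.csail.mit.edu/ | cfs_conn_state

Given the cfs lv output from the disconnected session above, this prints
Unreachable.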

I'm mystified. By the way, I don't think it's because I'm running on an
x86_64 client. I have seen similar problems this week with my plain x86
notebook as the client.
    -Olin
Received on 2007-05-01 13:56:58