Coda File System

Coda connectivity lossage

From: <shivers_at_cc.gatech.edu>
Date: Tue, 20 Jul 2004 15:15:29 -0400

I am trying to set up a coda filesys. My experience is that when the
system works, it's very nice. This is, however, rare. Mostly the system
acts in flaky & sensitive ways that require constant intervention. Can
anyone advise me?

I am running all the latest stuff: release 6.0.6-1 on both client & server.
Server & clients all run on linux boxes. The server has 1Gb of RVM, in a file,
not a partition, due to hints I've seen in the docs & on the mailing list
about paging, mmaping, etc. (1Gb of RVM is an undocumented option handled by
the vice-setup scripts.) The RVM log is 25Mb, on a raw partition. The files
live in a 400Gb ext3 filesys on /vicepa. For the initial trial, I started with
my personal music collection -- 9Gb of mp3 & flac files. The mp3 files are
roughly 1-5 Mb each; the flac files are 5-30Mb each. So: a small number of big
files.

The server is sitting in a real machine room, with real network connectivity:
gigabit ethernet to a lan, and a real pipe to the internet from there.

I set up several clients:
  CM: Client "CM" is a linux box sitting behind a standard home cable modem.
  It has constant connectivity ranging up to 1Mb/s.

  LOCAL: Client LOCAL is on the server; no network in the picture.

  WAN: Client WAN is a linux box sitting on an ethernet in my office at 
  Georgia Tech -- no cable-modem between it & the Net.

I mention the cable-modemness of the client connection, because Jan has posted
earlier saying that the asymmetry of cable-modem bandwidth confuses coda's
bandwidth measurements -- it assumes incoming bandwidth equals outgoing.

-------------------------------------------------------------------------------
Failure 1:

The first thing I did was copy 9Gb of music files into my coda fs on the LOCAL
client -- that is, the files were copied from a local ext3 filesys into a coda
filesys *on the machine where the coda server runs*. This worked, though it
was a little weird to see "red zone -- stalling blah blah" messages on such
net-less operation.

I was able to access these files from client CM & WAN in onesies & twosies
with no problem. Then I tried, on client WAN (that is, the client that
communicates with the server over a long-distance Internet connection, but
doesn't have a wimpy cable modem connecting it to the Net):

    find . -type f -exec md5sum {} \;

At first, it ran like a champ. Then it didn't. Here's the tail of the
recursive md5sum walk:
    4c507b84f2191ad0c9e8921e0f543ac7  ./affection/cd.db
    152c03eacd67ba3f28462abcacd85453  ./affection/track08.cdda.flac
    8cd0cf505aa05c7bebbac9fa94560289  ./affection/track08.cdda.mp3
    841ab15fa7bde3e019c06f7b0394351d  ./affection/audio_02.inf
    9e6b48f2d7fe648f83bec2a221a8e5d8  ./affection/track11.cdda.flac
    ae7ca0a4a359a8f92d9a079a7cc8e364  ./affection/audio_12.inf
    0ccaf063cbf89f7345955257d96134ad  ./affection/cdp-q
    4c1d9cc34e790c8fb52975a9973ce10d  ./affection/track13.cdda.mp3
    4f9377b72053579bcb80e59bb5ad610e  ./affection/audio_10.inf
    44e6c8a3596b59f589d934c624141e8d  ./affection/track07.cdda.flac
    9bea35fbbb5f679a8de559bdfd37bf6c  ./affection/track10.cdda.mp3
    4b0fb02ca5944804cc403b6ff1f3797a  ./affection/audio_01.inf
    md5sum: ./affection/track05.cdda.flac: Connection timed out
    find: ./affection/audio_08.inf: Connection timed out
    find: ./affection/track05.cdda.mp3: Connection timed out
    find: ./affection/audio_11.inf: Connection timed out
    find: ./rampal1: Connection timed out
    find: ./rampal2: Connection timed out
    find: ./sleepbeauty+toyshop: Connection timed out
    find: ./porter-on-mind: Connection timed out
    find: ./th-md5s: Connection timed out
    find: ./mozart-horn-concerti3: Connection timed out
    find: ./th: Connection timed out
    find: ./algreen: Connection timed out
    find: ./bush-story: Connection timed out
    find: ./mozart-wind-concerti: Connection timed out
    find: ./oconor-piano: Connection timed out
    find: ./eagles-hits: Connection timed out
    find: ./anything-goes-yoyo: Connection timed out
    find: ./beeth-piano1: Connection timed out

Again, note that this lossage occurred on a system with no cable modem, and
presumably symmetric bandwidth to the server.
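
For anyone trying to reproduce this, a variant of the walk that logs
checksums and failures to separate files makes it easier to see exactly where
the timeouts begin. This is only a sketch: walk_md5 and the two log file
arguments are hypothetical names, it assumes md5sum is installed, and it does
not handle file names containing newlines.

```shell
#!/bin/sh
# Hypothetical helper: checksum every regular file under a root directory,
# writing successful checksums to one log and failing paths to another.
walk_md5() {
    root=$1
    ok=$2
    failed=$3
    # Start with empty logs.
    : > "$ok"
    : > "$failed"
    find "$root" -type f -print | while IFS= read -r f; do
        # md5sum exits non-zero when it cannot read the file, e.g. on
        # "Connection timed out".
        if sum=$(md5sum "$f" 2>/dev/null); then
            printf '%s\n' "$sum" >> "$ok"
        else
            printf '%s\n' "$f" >> "$failed"
        fi
    done
}
```

Calling it as "walk_md5 . /tmp/ok.log /tmp/failed.log" from the coda dir keeps
the logs outside the tree being walked. Errors from find itself (the timed-out
directories above) still go to stderr.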

-------------------------------------------------------------------------------
Failure 2:

On client CM -- the system connected to the Net via a home cable-modem --
I attempted to copy a really large dir of media (about 200Gb) into my coda
filesys:

    cp -prv . /coda/lambda.csail.mit.edu/shivers/music-npx

It managed to copy about 15 files, then codacon began blatting out 

    Red zone, stalling writer ( 00:33:35 )

messages, and then the client went write-disconnected.

  % cfs lv ~/c
  Status of volume 0x7f000000 (2130706432) named "coda:root"
  Volume type is ReadWrite
  Connection State is WriteDisconnected
  Minimum quota is 0, maximum quota is unlimited
  Current blocks used are 10324696
  The partition has 371998976 blocks available out of 382693232
  Write-back is VIOC_STATUSWB: Invalid argument

Note the weirdo final line -- "VIOC_STATUSWB: Invalid argument"? What's that? 
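
As a sanity check on the block counts in that listing, here is a small sketch
that converts them to GiB. It assumes cfs reports 1 KiB blocks, which the
output itself does not say; blocks_to_gib is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: convert a block count to GiB, ASSUMING 1 KiB blocks
# (1 GiB = 1048576 KiB). The block size is an assumption, not something the
# cfs lv output states.
blocks_to_gib() {
    awk -v b="$1" 'BEGIN { printf "%.1f\n", b / 1048576 }'
}

blocks_to_gib 10324696     # "Current blocks used"
blocks_to_gib 382693232    # total blocks in the partition
```

Under that assumption the used figure comes out just under 10 GiB, in line
with the 9Gb music collection, and the partition total is roughly in line
with the 400Gb filesys on /vicepa.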

During this time, the net connection was completely solid. I mean, I might
have gotten less than my nominal 1Mb/sec, but the connection was always there.

So the real-world operation of coda here is that if you start writing a lot of
data, you disconnect, and then your writes just fail. So you can't ever count
on some operation actually working; it could very easily fail mid-stream.

-------------------------------------------------------------------------------
Failure 3:

On client CM -- the one connected via cable-modem -- I also did a

    find . -type f -exec md5sum {} \;

in the coda dir holding the 9Gb of music. It won for a couple of files, then
began to barf out msgs like this:

    md5sum: ./thelonious/track03.cdda.flac: Connection timed out
    find: ./thelonious/track03.cdda.mp3: Connection timed out
    find: ./thelonious/audio_17.inf: Connection timed out
    find: ./thelonious/track07.cdda.mp3: Connection timed out
    .
    .
    .

-------------------------------------------------------------------------------

What I find, in general, is that I cannot rely on file ops completing. Apps
that access my coda files sometimes win and sometimes seem to drive the
system into disconnected state, and then I must go through a
    cfs wr
    cfs cs
    cfs lv .
dance to reconnect. This happens when I am on a client with a completely
stable connection to the ethernet. We are not talking phone lines here.
This essentially renders coda unusable.
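
The dance above can be wrapped in a small script. This is a sketch only: it
runs the same three cfs commands in the same order and prints whatever
follows "Connection State is" in the cfs lv output. The coda_reconnect name
and the CFS variable (which lets the cfs binary be swapped for a stub) are
hypothetical, and the "Connection State is ..." line format is taken from the
listing under Failure 2.

```shell
#!/bin/sh
# CFS names the cfs binary; overridable so the sketch can be tried against
# a stub (this variable is an assumption of the sketch, not part of coda).
CFS=${CFS:-cfs}

# Hypothetical helper: rerun the reconnect sequence and print the resulting
# connection state for the given directory (default: current directory).
coda_reconnect() {
    dir=${1:-.}
    "$CFS" wr || return 1
    "$CFS" cs || return 1
    # Keep only the state from a line of the form
    # "  Connection State is WriteDisconnected".
    "$CFS" lv "$dir" | sed -n 's/^ *Connection State is //p'
}
```

Running "coda_reconnect ." after a failure prints the state cfs lv reports,
so it is at least obvious whether the client is still WriteDisconnected.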

I tried jacking up the timeout & retry values on the client and server to
see if that would help. Maybe it did, some. But I am still definitely losing.

I also tried doing a
    cfs strong
I don't have a super-clear idea of how this would affect my operation --
the one-line description in the cfs doc is that it prevents the system
from ever going into weak-connectivity mode, but that doesn't mean it would
prevent the system from going write-disconnected. In any event, when I do
this, my client becomes more or less coda-catatonic.

Some questions:
1. Am I doing something wrong?

2. Do other people lose in this way? / Are other people winning?
   I do not see similar reports on this mailing list. Is it that no one
   is hammering on their servers with big files? Is it that no one is
   connecting via cable modems? I don't have a good feeling for how many
   people are really using coda and in what configurations.

3. Is coda not ready for really big repositories (800Gb filesys, 1Gb rvm
   metadata)?

4. Any advice at all?

I'm a little dismayed to be losing at such a simple stage of usage. I'm
not having problems with reintegration conflicts or any of the real voodoo.
I'm getting hosed just reading & writing files *while connected*.

BTW, I'm also surprised that coda is having problems with asymmetric network
connections like cable modems in 2004. The lion's share of mobile connections
these days at private residences is through connections of this sort.
    -Olin
Received on 2004-07-20 15:48:17