Coda File System

Re: Yellow zone, slowing down writer

From: Jason A. Pattie <>
Date: Thu, 19 Feb 2004 14:37:25 -0600

Jason A. Pattie wrote:
| Jan Harkes wrote:
| | It means that your client is write-disconnected and performing
| | operations faster than it can reintegrate them with the server. If we didn't
| | slow down the writes, your client would end up using up the rest of the
| | reintegration log quite quickly and most likely crash. If your client is
| | not reintegrating at all (possibly caused by a conflict), it will at some
| | point reach the 'red zone' and block any mutating operations. This is
| | pretty fatal in itself, because each blocked operation will use up a
| | worker thread and there are only about 20 of those.
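
[The yellow/red zone back-pressure described above could be sketched roughly
as follows. This is purely illustrative, not Coda's actual code; the
thresholds and the delay formula are invented for the example.]

```shell
yellow=60; red=90   # percent of reintegration (CML) log capacity; invented values

# Decide what to do with a mutating operation given current log usage:
# below the yellow zone, let it through; in the yellow zone, slow it down
# (more as the log fills); in the red zone, block it outright.
write_delay() {   # usage: write_delay <used_percent>
    fill=$1
    if [ "$fill" -ge "$red" ]; then
        echo block
    elif [ "$fill" -ge "$yellow" ]; then
        # delay grows from 0 to 10 (tenths of a second) across the yellow zone
        echo "delay $(( (fill - yellow) * 10 / (red - yellow) ))"
    else
        echo ok
    fi
}

write_delay 50   # -> ok
write_delay 75   # -> delay 5
write_delay 95   # -> block
```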
| |
| |
| |>I did an 'cfs strong' command, but that didn't change anything.  Is it
| |>possible that the cache is getting full (only 400MB) and needs to purge
| |>least recently used entries or something and so it's taking lots of
| |
| |
| | I've said this many times before: there is no such thing as guaranteed
| | connected operation in Coda. If anything goes wrong during a write/store
| | operation, the client will silently switch to write-disconnected
| | operation (logging state). If the server is slow to respond, we switch to
| | a logging state. And conversely, when the client can't be reached by the
| | server, the server triggers the disconnect and we are likely to switch to
| | a logging state.
| |
| | The only thing that cfs strong does is prevent the client from listening
| | to the often incorrect 'bandwidth estimates' from the RPC2 communication
| | layer, so that transitions only happen in error cases and not based on
| | incorrect estimates. In fact, if you were already write-disconnected
| | before calling cfs strong, the client will never discover that the
| | network actually has good bandwidth and will never transition to the
| | connected state.
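
[For reference, the commands involved might look like the sketch below. The
volume path is a made-up placeholder, and the snippet guards against cfs not
being installed; consult the cfs manual for the exact semantics on your
version.]

```shell
VOL=/coda/example.org/dir   # hypothetical volume path; substitute your own

if command -v cfs >/dev/null 2>&1; then
    cfs strong          # stop acting on RPC2 bandwidth estimates
    cfs checkservers    # probe the servers' availability
    cfs listvol "$VOL"  # report the volume's connection/reintegration state
    RESULT=ran
else
    echo "cfs not found; this requires a running Coda client"
    RESULT=skipped
fi
```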
| So where do I begin the process of troubleshooting what went wrong?  I'm
| currently in the Red state.  The scenario is that I'm on the actual file
| server (scm) transferring files from one location on another partition
| to the /coda/<realm>/<dir> location using tar.  I.e., the file server is
| running both codasrv and coda-client (venus).  Is this a bad thing?
| Should I only connect from a remote venus instance (i.e., my laptop,
| etc.)?  I would think this would be a scenario that never gets
| "disconnected".  I also don't see how anything could "go wrong" in this
| scenario, especially since I'm copying data that shouldn't have changed,
| but I can delete all the data in the directory and start the copy over.
| Technically, I did have a previous copy of the data in the directory
| before I had created users and groups that matched those of the original
| directory.  My assumption was that I could reissue the tar command and
| replace all the files with the correct ownership and permissions.  If
| this is not the case, then I can easily delete all files and start the
| copy over again.

I ended up shutting down coda-client (venus).  I was not able to start
it up again (I think it actually crashed and gdb hung forever), so I did
a 'venus -init'.  I deleted all the data in the directory and started
the copy over again.  This time, when I saw that it was checkpointing
and reintegrating changes, with some succeeding and others failing, I
typed Ctrl-Z on the tar to pause it while Coda caught up.
Then when I saw EndDataWalk, I fg'd the process and let it continue.
This seemed to have much better results (although I had to wait quite
some time for it to catch up each time I paused), and the tar process
was finally able to finish.  I paused it about 2 or 3 times to wait for
it to catch up.  Now I'm diff'ing the original directory with the coda
copy.  Hopefully, all the data will be there.
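
[The manual Ctrl-Z/fg cycle described above could also be scripted: run the
copy in the background and alternate SIGSTOP/SIGCONT so the client gets quiet
periods to reintegrate. The stand-in workload and the short 2s/1s duty cycle
are placeholders; a real run would use the tar pipeline and much longer
intervals, ideally resuming only after EndDataWalk shows up in the venus log.]

```shell
# Stand-in workload; in practice this would be something like:
#   tar -C /src -cf - . | tar -C /coda/<realm>/<dir> -xf -
( i=0; while [ $i -lt 3 ]; do echo "chunk $i"; sleep 1; i=$((i+1)); done ) &
COPY_PID=$!

while kill -0 "$COPY_PID" 2>/dev/null; do
    sleep 2                              # let the copy run for a while
    kill -STOP "$COPY_PID" 2>/dev/null   # pause it, as Ctrl-Z would
    sleep 1                              # reintegration catches up here
    kill -CONT "$COPY_PID" 2>/dev/null   # resume it, as fg would
done
wait "$COPY_PID" 2>/dev/null
echo "copy finished"
```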


--
Jason A. Pattie
Xperience, Inc. (


Received on 2004-02-19 15:43:41