Coda File System

Re: increasing realms found and other client weirdness

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Fri, 11 Mar 2005 10:25:05 -0500
On Fri, Mar 11, 2005 at 09:59:00AM -0300, Gabriel B. wrote:
> yesterday i let the client "uploading" some data to the server. today,
> it was hanging there. I canceled the cp command, checked the tokens.
> everything seemed fine

It sounds like your client got disconnected from the server
(reintegration conflict maybe?) and it ran out of log space to keep
track of the changes that still need to be reintegrated with the server.

When that happens, venus blocks all write operations. Reads can still go
through as long as there are worker threads available, but there are only
20 worker threads and every blocked write 'consumes' one, so things can
grind to a halt pretty quickly.
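
To make the effect concrete, here is a toy model of it. This is only a
sketch and not venus code: the class and method names are made up for
illustration, and the only number taken from the real client is the 20
worker threads.

```python
# Toy model of blocked writes using up worker threads; not actual venus code.
# WORKER_COUNT comes from the message above, every other name is hypothetical.
WORKER_COUNT = 20


class ToyVenus:
    def __init__(self, workers=WORKER_COUNT, log_full=False):
        self.free_workers = workers  # workers not stuck on a blocked write
        self.log_full = log_full     # True once the modify log has no space left

    def request(self, kind):
        """Handle a 'read' or 'write'; return 'done', 'blocked' or 'no worker'."""
        if self.free_workers == 0:
            # Every worker is stuck on a blocked write, so nothing gets served.
            return "no worker"
        if kind == "write" and self.log_full:
            # The write cannot be logged, so it blocks and keeps its worker.
            self.free_workers -= 1
            return "blocked"
        return "done"


client = ToyVenus(log_full=True)
first_read = client.request("read")                        # still served
writes = [client.request("write") for _ in range(WORKER_COUNT)]
later_read = client.request("read")                        # no worker left
```

In this model the first read succeeds, but after 20 blocked writes even
reads can no longer be serviced, which matches a client that looks hung.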

> then, i restarted venus thinking it's some kind of cache. and started
> it again.

Venus has a persistent cache that survives restarts (and reboots). The
only way to clear the venus cache is to run 'venus -init' or to create a
file named /usr/coda/venus.cache/INIT. On Debian that would be
/var/lib/coda/cache/INIT, since the various paths there are set to comply
with the Filesystem Hierarchy Standard.
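
If you want to script the second option, something like the following
should do. It is a minimal sketch: the helper name is made up, it assumes
an empty INIT file is enough, and you would need to run it with enough
privileges to write to the cache directory before starting venus.

```python
from pathlib import Path


def request_cache_init(cache_dir):
    """Create the INIT marker file inside the given venus cache directory.

    Hypothetical helper; cache_dir would be /usr/coda/venus.cache, or
    /var/lib/coda/cache on Debian, as mentioned above.
    """
    marker = Path(cache_dir) / "INIT"
    marker.touch()  # creates an empty file, leaves an existing one in place
    return marker


# Example call (commented out so nothing is touched by accident):
# request_cache_init("/usr/coda/venus.cache")
```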

> Well, i have only realm. one SCM, one line in /etc/coda/realms.
> But venus said it found 3. And i recall it saying 2 the startup before this.

There are 2 realms that are being used internally. One realm is the
'localhost' realm, and it is used to hold the various realm mountpoints
that are visible in the /coda directory. The other realm is repair
related: for local/global repair we have to move locally modified files
out of the way so that we can show the version that is currently on the
server. So the locally modified files are relocated to the 'Repair'
realm until the conflict is resolved. Together with your own realm that
adds up to the 3 that venus reported.

> I was copying about 7Gb of small files to a coda server that has about
> 20Gb of free space in /vicepa, 20M of rvm log, and 130M of rvm data.
> all of them are files since i can't repartition the server without
> knowing that we will use coda for sure.

I'm pretty sure that files are better nowadays. I've been using them
exclusively now that we moved all our servers to new hardware.

> btw, right after starting the server i send the SIGWINCH signal. but

SIGWINCH? Ah, I see, never actually used that myself. I just run
'volutil setdebug 10'.

> it generate nothing in the logs. On the other hand, my venus.log is
> more than 90M!
> 
> the lines right before the hang in venus.err are
> 05:01:30 Checkpointing p.www
> 05:01:30 to /usr/coda/spool/0/camboinha.servers_p.www.tar
> 05:01:30 and /usr/coda/spool/0/camboinha.servers_p.www.cml

That's a pretty good indication that your client is disconnected from
the server for some reason, as it is checkpointing a copy of the pending
changes (safety feature just in case your client dies, any lost changes
should be in that tarball).
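
To see what is sitting in such a checkpoint, you can list the archive
without unpacking it. A minimal sketch, assuming the .tar file is an
ordinary tar archive that Python's tarfile module can read; the function
name is made up for illustration.

```python
import tarfile


def list_checkpoint(tar_path):
    """Return (name, size) pairs for every entry in a checkpoint tarball."""
    with tarfile.open(tar_path) as archive:
        return [(member.name, member.size) for member in archive.getmembers()]


# Example call with the path from the log above (commented out, since the
# file only exists on the affected client):
# for name, size in list_checkpoint(
#         "/usr/coda/spool/0/camboinha.servers_p.www.tar"):
#     print(size, name)
```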

> 09:07:58 DispatchWorker: signal received (seq = 2417081)
> 09:21:46 DispatchWorker: signal received (seq = 2417109)

And this is a user hitting '^C'. It doesn't do much for venus though,
since there is no code that can unwind the worker thread that is
handling the 'aborted' request, as it might have grabbed a couple of
locks. So venus ends up (eventually) completing the aborted operation,
at which point the kernel will spit out some error message about an
unexpected reply.

Jan
Received on 2005-03-11 10:26:46