Coda File System

Re: venus randomly dies

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Thu, 1 May 2003 09:48:54 -0400

On Thu, May 01, 2003 at 04:26:59AM -0700, Steve Simitzis wrote:
> lately, i've watched venus randomly die on one of my clients. it seems
> to take place in the middle of the night, when it's getting used the
> least. i'll restart venus, and it will continue to run along without
> any problems. i'm running venus with maxclients set to 100, fwiw.

Simply increasing the number of worker threads is not a cure-all. It
leaves the system unbalanced.

There is a limit of nine RPC2 connections per user, so you now have 100
threads competing for 9 RPC2 connections. Most of them will end up
waiting for access to the network, and there is a good chance that some
threads will be starved. I'm not sure the wakeup mechanism is fair; it
probably just wakes up everyone, and the first thread that gets
scheduled grabs the connection. All the other threads go back to
WAITING.
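
To make the imbalance concrete, here is a minimal Python sketch of many
workers sharing a few connections behind a wake-everyone condition
variable. This is not Venus code: the function name simulate, the
counters, and the 1 ms fake RPC are all made up for illustration, and it
assumes the wake-all behaviour guessed at above.

```python
import threading
import time


def simulate(workers=100, connections=9, calls_per_worker=3):
    """Model `workers` threads sharing `connections` slots with wake-all."""
    cond = threading.Condition()
    free = connections
    stats = {"calls": 0, "wakeups": 0, "back_to_waiting": 0}

    def worker():
        nonlocal free
        for _ in range(calls_per_worker):
            with cond:
                while free == 0:
                    cond.wait()                  # the "WAITING" state
                    stats["wakeups"] += 1        # the "WAIT OVER" event
                    if free == 0:
                        # Someone else grabbed the slot first.
                        stats["back_to_waiting"] += 1
                free -= 1
            time.sleep(0.001)                    # pretend to do an RPC
            with cond:
                free += 1
                stats["calls"] += 1
                cond.notify_all()                # wake everyone, not just one

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats


if __name__ == "__main__":
    print(simulate())
```

The exact counts vary from run to run, but comparing wakeups against
back_to_waiting shows how much of the wakeup traffic is wasted when the
waiters far outnumber the connections.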

> just before it dies, it spews about 30,000 lines of "WAITING" and
> "WAIT OVER" in a matter of a minute or two.

So we probably have ~90 threads in the WAITING state, all of which are
woken up whenever an RPC operation finishes, and ~89 of them go right
back to WAITING.
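
As a rough sanity check on the log volume, the arithmetic can be
sketched in Python. It assumes that each woken thread logs one "WAIT
OVER" line, that each thread returning to the queue logs one "WAITING"
line, and that one connection frees up per wakeup. None of that is
verified against the Venus source, and the function names are
hypothetical.

```python
def lines_per_completion(waiters: int) -> int:
    """Log lines produced when one RPC finishes and all waiters wake up."""
    woken = waiters        # every waiter logs "WAIT OVER"
    losers = waiters - 1   # all but the winner log "WAITING" again
    return woken + losers


def completions_needed(total_lines: int, waiters: int) -> float:
    """How many RPC completions would account for `total_lines` of log."""
    return total_lines / lines_per_completion(waiters)


print(lines_per_completion(90))                # 179
print(round(completions_needed(30_000, 90)))   # 168
```

Under those assumptions, under 200 finished RPCs would be enough to
produce the 30,000 lines reported.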

> any suggestions about what the problem could be? it seems to happen
> roughly once each week or so. packet loss has been suggested as a

The tcpdump indicated that there is some packet loss. It is not large
in absolute terms, somewhere between 0.1% and 1%, but that is quite
considerable for a reliable switched link. Of course, the loss could be
occurring on the server side, which might not be grabbing incoming data
quickly enough.

> suspect otherwise. also, this apparent suicide seems to take place in
> the middle of the night, when the traffic is otherwise minimal.

Actually....

> [ H(06) : 0657 : 04:03:10 ] HDBDaemon just woke up

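That timestamp is right around when the nightly updatedb cron job
typically runs, and it would walk all of /coda. Just add '/coda' to the
list of PRUNEPATHS in /etc/updatedb.conf.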
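
For anyone who wants to script that change, here is a small Python
sketch. It assumes the file has the usual double-quoted, space-separated
line of the form PRUNEPATHS="/tmp /usr/tmp ..."; check your own
updatedb.conf first. The function name add_prune_path is made up for
illustration, and it only transforms text. Reading and writing the real
file is left to you.

```python
import re

# Matches a line such as: PRUNEPATHS="/tmp /usr/tmp"
_PRUNEPATHS = re.compile(r'^([ \t]*PRUNEPATHS[ \t]*=[ \t]*")([^"]*)(")',
                         re.MULTILINE)


def add_prune_path(conf_text: str, path: str = "/coda") -> str:
    """Return conf_text with `path` present in the PRUNEPATHS list."""

    def repl(match):
        paths = match.group(2).split()
        if path not in paths:          # keep the edit idempotent
            paths.append(path)
        return match.group(1) + " ".join(paths) + match.group(3)

    new_text, count = _PRUNEPATHS.subn(repl, conf_text, count=1)
    if count == 0:
        # No PRUNEPATHS line found: append one (an assumption about
        # what you would want in that case).
        sep = "" if conf_text == "" or conf_text.endswith("\n") else "\n"
        new_text = f'{conf_text}{sep}PRUNEPATHS="{path}"\n'
    return new_text


if __name__ == "__main__":
    sample = 'PRUNEFS="nfs proc"\nPRUNEPATHS="/tmp /usr/tmp"\n'
    print(add_prune_path(sample), end="")
```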

Jan
Received on 2003-05-01 09:52:01