Coda File System

Re: Codasrv crash - Reason?

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Tue, 7 Sep 2004 16:32:06 -0400
On Tue, Sep 07, 2004 at 04:26:35PM +0200, Markus Wiesecke wrote:
> I am running a codaserver which I have inherited from a colleague leaving
> the group. Thus I am not very experienced with this stuff.
> From time to time, the codasrv crashes, and I do not see any reason for
> it, for example tonight. The CodaSrv process was still running, but no
> longer answering queries.
> The log file from that time reads as follows (sorry if I am pasting too
> much, but I do not want to miss the point):

I don't really see any indication that it crashed. The fact that you
could shut it down with the init script means that it was still
accepting connections, as we use 'volutil shutdown', which sends a
shutdown RPC command and not a signal.

I'm not entirely sure why it is getting NAK messages from the clients
and why there is no indication that clients are trying to rebind to the
server.

> 18:37:22 Callback failed RPC2_NAKED (F) for ws 129.70.139.98:32811
> 18:37:23 Callback failed RPC2_NAKED (F) for ws 129.70.139.45:2430
> 18:57:24 Callback failed RPC2_NAKED (F) for ws 129.70.139.61:32771
> 00:01:25 Callback failed RPC2_NAKED (F) for ws 129.70.139.44:2430
> 00:01:27 Callback failed RPC2_NAKED (F) for ws 129.70.139.164:2430
> 00:01:27 Callback failed RPC2_NAKED (F) for ws 129.70.139.75:2430
> 00:01:27 Callback failed RPC2_NAKED (F) for ws 129.70.139.166:2430

> Can you see any reason for the crash? The IP 129.70.138.34 belongs to a
> laptop, which I suppose was shut down at the time the RPC2_DEAD
> appeared in the logs. Could this be a reason for a crash?

RPC2_DEAD simply means that we were unable to reach the client, so that
would coincide with it being a laptop that is taken offline. The NAK
messages indicate that the clients don't think the connection is
active anymore, so they are more worrisome. It almost looks like the
server is still able to talk to the clients, but the clients are unable
to connect back.
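
To see which clients are affected and whether the failures are NAKs or
dead connections, the callback lines can be tallied per error code and
client. The following is only a rough sketch in Python: it assumes the
log lines look exactly like the ones quoted above, and the function name
and regular expression are made up for illustration, not part of Coda.

```python
import re
from collections import Counter, defaultdict

# Assumed line format, taken from the log excerpt quoted in this thread:
#   "18:37:22 Callback failed RPC2_NAKED (F) for ws 129.70.139.98:32811"
# Other codes (e.g. RPC2_DEAD) are assumed to use the same layout.
CALLBACK_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}:\d{2}) Callback failed (?P<code>RPC2_\w+) "
    r"\(\w\) for ws (?P<host>[\d.]+):(?P<port>\d+)"
)


def tally_callback_failures(lines):
    """Return {error_code: Counter({client_ip: count})} for matching lines."""
    tally = defaultdict(Counter)
    for line in lines:
        match = CALLBACK_RE.search(line)
        if match:  # lines that do not match the assumed format are skipped
            tally[match.group("code")][match.group("host")] += 1
    return tally


if __name__ == "__main__":
    # Sample input copied from the quoted log excerpt.
    sample = [
        "18:37:22 Callback failed RPC2_NAKED (F) for ws 129.70.139.98:32811",
        "18:37:23 Callback failed RPC2_NAKED (F) for ws 129.70.139.45:2430",
        "00:01:25 Callback failed RPC2_NAKED (F) for ws 129.70.139.44:2430",
    ]
    for code, hosts in tally_callback_failures(sample).items():
        for host, count in hosts.most_common():
            print(code, host, count)
```

If every client shows up under RPC2_NAKED at roughly the same time, that
would point at the server side rather than at individual clients.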

Since none of the network related state is stored persistently, a
restart should fix most if not all of the problems. I just wonder what
the problem is.

Is there anything in /vice/srv/SrvErr?

Jan
Received on 2004-09-07 16:33:07