Coda File System

Re: Venus timeout variable?

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Fri, 6 Apr 2001 15:59:30 -0400
On Fri, Apr 06, 2001 at 06:56:07PM +0200, Steffen Schaefer wrote:
> Hi,
> my question: If the system gets disconnected, a lot of time passes
> before venus recognizes that the network is down. Is it possible to change
> the timeout value, or is this an RPC2 specific parameter?

Yes, it is possible to change the parameter.
No, it is not advisable to do so due to the way RPC2 works.

Whenever the client sends an rpc2-request, it might get lost (UDP is
unreliable). Most requests can be serviced reasonably quickly, so instead
of sending an ACK when the request is received, the server just starts
processing the request and the rpc2-reply is the implicit ack.

So when a request (or reply) is lost, the client times out and
retransmits the request. If the server has already sent a reply
(i.e. the reply was lost), it will simply repeat the reply. If the server
never saw the request before (i.e. the request was lost), it will
process it now.
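
The two server-side cases above can be sketched roughly as follows. This
is only a toy model in Python, not RPC2 code: the Server class, the
(client, seq) key and the handler are all made up for illustration, and
real RPC2 bookkeeping is certainly more involved.

```python
# Hypothetical model of the two server-side cases; not actual RPC2 code.

class Server:
    def __init__(self, handler):
        self.handler = handler      # function that computes a reply from a payload
        self.replies = {}           # (client, seq) -> reply already sent
        self.executions = 0         # how many times the handler really ran

    def receive(self, client, seq, payload):
        key = (client, seq)
        if key in self.replies:
            # Retransmission of a request we already answered: the reply
            # was lost, so repeat it without redoing the work.
            return self.replies[key]
        # First time we see this request: process it. The reply doubles
        # as the acknowledgement, there is no separate ACK.
        self.executions += 1
        reply = self.handler(payload)
        self.replies[key] = reply
        return reply


server = Server(lambda payload: payload.upper())
first = server.receive("client-a", 1, "getattr")
again = server.receive("client-a", 1, "getattr")   # retransmit after a lost reply
```

In this sketch the second call returns the stored reply and the handler
runs only once, which is the behaviour described for a lost reply.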

Now there is a third case which is tricky. Not all requests can be dealt
with quickly so the server might still be working on the request. In
this case the server reports RPC2_BUSY, which for the client is an
intermediate ACK, and it will wait for a full timeout period before
retrying (as the reply could have been lost).
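
Seen from the client, the three cases might look something like the loop
below. Again a sketch under assumptions: the timeout value, the retry
limit, and the choice to reset the "silent" counter whenever a BUSY
arrives are mine, chosen to keep the example small, and are not taken
from the RPC2 sources.

```python
# Hypothetical client retry loop; names and values are illustrative only.

BUSY = "BUSY"       # stands in for the RPC2_BUSY intermediate ack
LOST = None         # stands in for "nothing arrived before the timer expired"

def call(send, timeout=2.0, max_silent_retries=3):
    """Return (reply, seconds_waited); reply is None if we give up.

    `send` models one transmission followed by one wait of up to
    `timeout` seconds; it returns a real reply, BUSY, or LOST.
    """
    waited = 0.0
    silent = 0
    while silent < max_silent_retries:
        answer = send()
        if answer is LOST:
            # Request or reply was lost: a full timeout elapsed, retransmit.
            waited += timeout
            silent += 1
        elif answer == BUSY:
            # Server is alive and still working. Treat this as an
            # intermediate ack, then wait a full timeout before retrying.
            waited += timeout
            silent = 0
        else:
            return answer, waited
    return None, waited


script = iter([LOST, BUSY, BUSY, "done"])
reply, waited = call(lambda: next(script))      # server answers eventually
dead_reply, dead_waited = call(lambda: LOST)    # server never answers
```

The point of the sketch is that a BUSY keeps the call alive, while
repeated silence is the only thing that makes the client give up.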

Handling the incoming messages and sending back RPC2_BUSY is in itself
a simple operation. However, Coda uses cooperative threads with no
preemption, so the thread that is doing the long computation has to
yield periodically so that the rpc2_socketlistener can grab the
incoming messages and reply with RPC2_BUSY.

Sometimes the server doesn't yield often enough to avoid network
timeouts. In some cases this is due to excessive looping or to blocking
library/system calls. It also depends on network latency and on how
soon the client gives up while waiting for the RPC2_BUSY reply.
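
To illustrate why the yield frequency matters, here is a toy
round-robin scheduler built on Python generators. It is not how Coda's
threads are implemented; worker, listener and the abstract "time units"
are hypothetical and only show how long a listener is starved when
another cooperative thread does not yield.

```python
# Toy scheduler for cooperative (non-preemptive) threads, using generators.
# All names are hypothetical; time is counted in abstract work units.

def worker(total_steps, yield_every, clock):
    """Long computation that gives up control every `yield_every` steps."""
    for step in range(1, total_steps + 1):
        clock[0] += 1                 # one unit of work takes one time unit
        if step % yield_every == 0:
            yield                     # cooperative yield point

def listener(clock, gaps):
    """Stands in for the socket listener: records how long it was starved."""
    last_run = clock[0]
    while True:
        gaps.append(clock[0] - last_run)
        last_run = clock[0]
        yield

def longest_listener_gap(total_steps, yield_every):
    clock, gaps = [0], []
    work = worker(total_steps, yield_every, clock)
    listen = listener(clock, gaps)
    next(listen)                      # listener runs once at time 0
    for _ in work:                    # worker runs until its next yield
        next(listen)                  # listener only runs when the worker yields
    next(listen)                      # worker finished; listener runs again
    return max(gaps)


polite_gap = longest_listener_gap(100, 10)      # yields every 10 steps
greedy_gap = longest_listener_gap(100, 1000)    # never reaches a yield point
```

If the client in this model gave up after, say, 30 time units without
any answer, the first worker would be fine and the second would cause a
spurious disconnection, even though the server is healthy.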

So when you change something like the client's rpc2 disconnect timer,
it will actually introduce problems: clients switch to disconnected
operation for no apparent reason, for instance during server-server
resolution or client-server reintegration, while breaking callbacks, or
possibly even during the setup of a new incoming rpc2 connection.

In general a 30 second timeout seemed to give the best balance between
giving up soon enough, and not giving up too soon.

Have you ever checked how long it takes for an idle TCP connection to
die? Especially when there are no ICMP errors, which is common when a
network cable is pulled or breaks, or in wireless networks when you
walk out of radio range.

Jan
Received on 2001-04-06 16:04:21