Coda File System

Re: Coda 5.4.19 slow hoarding

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Fri, 24 May 2002 17:17:45 -0400
On Tue, May 14, 2002 at 06:25:53PM +0200, Steffen Neumann wrote:
> We upgraded to 5.4.19, and now
> Coda seems to be hoarding very slowly.
> 
> Client cache is 1G. venus.log shows 
> a suspicious W-A-I-T sequence in its logs. 
> 
> What gives? It seemed to be smoother a while ago.

Haven't really noticed any particular slowness myself, but I did notice
something in your logs.

> [ H(06) : 0020 : 18:20:09 ] WAITING(CommQueue) pri = 2, for 1 at pri 3
> [ H(06) : 0020 : 18:20:09 ] WAIT OVER, elapsed = 2.5
> [ H(06) : 0020 : 18:20:09 ] WAITING(CommQueue) pri = 2, for 1 at pri 3
> [ H(06) : 0020 : 18:20:09 ] WAIT OVER, elapsed = 0.2
> [ H(06) : 0020 : 18:20:21 ] WAITING(CommQueue) pri = 2, for 1 at pri 3
> [ H(06) : 0020 : 18:20:21 ] WAIT OVER, elapsed = 0.5

These waits are 'pretty normal'. Threads that are talking to weakly
connected volumes defer to 'higher priority threads'. The hoard daemon
is a lower priority thread, so it yields whenever anything else tries to
talk to any server.

The code is assuming that the weak connectivity is caused by a network
bottleneck at the client, and tries to avoid starting any new hoard
related RPC calls when there are other RPCs still active. Here we see
the hoard daemon being 'blocked' for 2.5, 0.2, and 0.5 seconds. All
these little waits will probably add up significantly.
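As a minimal sketch (not Coda's actual implementation, which lives in venus), the deferral logic described above can be modeled as a queue that tracks in-flight RPCs per priority, with the low-priority hoard daemon holding off while anything runs at a higher priority; the class and method names here are illustrative only:

```python
# Illustrative model of the WAITING(CommQueue)/WAIT OVER pattern:
# the hoard daemon (low priority) defers starting new RPCs while
# any higher-priority RPC is still in flight.

class CommQueue:
    """Tracks in-flight RPCs per priority level."""

    def __init__(self):
        self.in_flight = {}  # priority -> count of active RPCs

    def start_rpc(self, pri):
        self.in_flight[pri] = self.in_flight.get(pri, 0) + 1

    def finish_rpc(self, pri):
        self.in_flight[pri] -= 1

    def must_wait(self, pri):
        # "WAITING(CommQueue) pri = 2, for 1 at pri 3" corresponds to
        # one RPC active at a higher priority (3) than the caller (2).
        return any(n > 0 for p, n in self.in_flight.items() if p > pri)


q = CommQueue()
q.start_rpc(3)            # a foreground thread starts a server RPC
print(q.must_wait(2))     # hoard daemon at pri 2 must wait -> True
q.finish_rpc(3)
print(q.must_wait(2))     # queue drained, hoard may proceed -> False
```

Each log pair (WAITING followed by WAIT OVER) is one such deferral; at 0.2 to 2.5 seconds apiece they accumulate quickly across a large hoard walk.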

> [ W(20) : 0000 : 18:20:25 ] *** Long Running (Multi)GetVolumeStatus: code = -2001, elapsed = 1382.3 ***
> [ W(20) : 0000 : 18:20:35 ] *** Long Running (Multi)GetVolumeStatus: code = -2001, elapsed = 1566.7 ***
> [ W(20) : 0000 : 18:20:45 ] *** Long Running (Multi)GetVolumeStatus: code = -2001, elapsed = 1108.9 ***

These are interesting: every 10 seconds there is a 'cfs lv' call that
goes to the server and takes more than a second. Such an RPC call
should be pretty fast (it's about 400 ms over my PPP link from home).
It looks like you have >1000 ms ping round-trip times between your
client and your server, which would explain a lot of the pauses you see.

But as far as I know there is nothing in venus that triggers this
GetVolumeStatus call. Are you running 'watch cfs lv' or something that
tries to keep track of the number of CML entries? There is a '-local'
flag that avoids the server RPC if you are only interested in local
volume information such as connection state and CML entries.
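If the periodic GetVolumeStatus traffic is coming from something like 'watch cfs lv', switching to the local query avoids the server round-trip entirely; the mountpoint below is only an example path:

```shell
# Query only locally known volume state (connection state, CML entries)
# without contacting the server; the path is an example mountpoint.
cfs lv -local /coda/myvolume
```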

Jan
Received on 2002-05-24 17:17:51