Coda File System

venus problems over slow links (bw estimates and reintegration)

From: Greg Troxel <gdt_at_ir.bbn.com>
Date: Sat, 23 Jun 2001 12:24:54 -0400
With recent Coda CVS on a fairly recent FreeBSD RELENG_4:

The notebook is at home on Ethernet, with a 28.8 PPP link (both ends
FreeBSD) to the Ethernet with the Coda server.  Things have generally
been working OK.

I put a 100k-ish tgz in Coda and untarred it.  I think I was
write-disconnected at the time.

I noticed that I was going disconnected during reintegration.  After
'cfs cs', venus would go back to write-disconnected (wd) and try to
reintegrate.  The bandwidth estimates got much bigger than the 2500
that might be justifiable, and the reintegration timed out.  This
happened repeatedly.

I then munged the bw estimation code in venus to clamp at 2000, since
I didn't want to stay disconnected all weekend.  venus then
reintegrated fine.  I didn't try stopping/starting venus; it seemed
like my clamp didn't get hit though.
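
A minimal sketch of that kind of clamp, in Python rather than venus's
own source language.  The names clamp_bw, clamp_was_hit and BW_CEILING
are hypothetical and do not come from the venus code; only the value
2000 is taken from the message above.

```python
# Hypothetical illustration; not the actual venus bandwidth code.
BW_CEILING = 2000  # clamp value from the message, same units as the estimate


def clamp_bw(estimate, ceiling=BW_CEILING):
    """Return the estimate, but never more than the ceiling."""
    return min(estimate, ceiling)


def clamp_was_hit(estimate, ceiling=BW_CEILING):
    """True if clamping changed the value; handy for a log line."""
    return estimate > ceiling
```

Logging whenever clamp_was_hit() is true would be one way to tell
whether the clamp actually took effect during a reintegration.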

tcpdump showed huge blasts of packets (10 in a row) going out to the server.

So, it seems like RPC2 needs selective acks and TCP-friendly
congestion control :-)  (<duck>Either that or actual TCP.</duck>)

I also saw some 2980 byte packets while running 'ls -l .'.  These got
fragmented, but it worked anyway.

While I'm rambling, it would be really nice if hoarding could be
configured to run very slowly.  I wonder how well the bw estimation
code really works, and whether being able to configure 'if < 10000,
assume 2000' might be helpful.  Hoarding could then use only 10% of
the available bw.  Another thought is to have venus record the bw
estimate at failure, divide it by two, and refuse to go above that
until 10 minutes have passed or one CML has been handled, whichever
comes later.
Another thought is to limit rpc2 to one outstanding packet instead of
going disconnected, and stay in that mode for a while.
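
The first few ideas above could be sketched roughly as follows.  This
is Python for illustration only, not venus code: BwCeiling,
effective_bw, hoard_budget and all parameter names are hypothetical,
and the numbers (10000, 2000, 10%, 10 minutes, divide by two) are the
ones suggested in the message.

```python
import time


def effective_bw(estimate, threshold=10000, assumed=2000):
    """'if < 10000, assume 2000': distrust low estimates, use a fixed value."""
    return assumed if estimate < threshold else estimate


def hoard_budget(bw, percent=10):
    """Share of the available bandwidth that hoarding may use."""
    return bw * percent / 100


class BwCeiling:
    """Cap the estimate after a reintegration failure.

    The cap is half the estimate at failure time.  It is lifted only
    once HOLD_SECONDS have passed AND one CML has been handled, i.e.
    whichever of the two comes later.
    """

    HOLD_SECONDS = 600  # 10 minutes

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.ceiling = None
        self.failed_at = None
        self.cml_done = False

    def on_failure(self, estimate):
        # Record half of the estimate that was in effect when we failed.
        self.ceiling = estimate / 2
        self.failed_at = self.clock()
        self.cml_done = False

    def on_cml_handled(self):
        self.cml_done = True

    def apply(self, estimate):
        if self.ceiling is None:
            return estimate
        held_long_enough = self.clock() - self.failed_at >= self.HOLD_SECONDS
        if held_long_enough and self.cml_done:
            self.ceiling = None  # both conditions met: lift the cap
            return estimate
        return min(estimate, self.ceiling)
```

The clock is passed in so the hold period can be exercised without
waiting ten real minutes.  The "one outstanding packet" idea is not
covered here, since it would live inside rpc2 rather than in the
estimator.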
Received on 2001-06-23 12:24:58