Coda File System

Re: coda client hangs

From: Ivan Popov <pin_at_medic.chalmers.se>
Date: Tue, 24 May 2005 18:31:52 +0200
Hi Patrick,

On Tue, May 24, 2005 at 09:43:32AM -0600, Patrick Walsh wrote:
> > I did not observe that before. Was that the clients doing updates or those just
> > accessing Coda readonly?
> 
> 	The clients doing updates.

That at least gives us a starting point for finding the cause.
Another conclusion is that as long as you do not perform updates on
"serving" clients, your "readonly" service should be relatively stable.

> > in server response time and possibly disconnect. It might lead to unexpected
> > conflicts, but I did not see crashes or hangs because of that.
> 
> 	There shouldn't be conflicts since each machine writes to its own
> directory.  We haven't seen any conflicts in quite a while.  I haven't
> noticed disconnects either...

That's good.
I have never seen a similar pattern... :-/

> > You may want to try the modular one, see if it helps and otherwise motivate me
> > badly to fix it :)
> 
> 	The modular one?  I just searched through the coda source tree and I
> only found one clog.c.  Where is the alternative version?  How is it
> different?  Or am I misunderstanding something?

It is not in the Coda sources, as it has not been merged.
I reimplemented clog in 2003 and it is available as a patch.
You may look at

/coda/konvalo.org/sw/pm/1/TOP/c/coda/V/cvs20050303/L/1/BUILD/patch.modular-clog

and also at the corresponding

/coda/konvalo.org/sw/pm/1/TOP/c/coda/V/cvs20050303/L/1/NOTES

I have been using the code since then and have not found any evident
problems; it should be at least as reliable as the old one, and quite a bit
more capable. For convenience, you may want to set up an "authentication
options announcement" service on your servers, or just supply all relevant
information on the command line in your cron scripts. Be sure to list
all of your tokenservers (authentication servers): clog tries to talk
to all of them and uses the first one that responds, which provides a level
of failover, though not against a server that crashes in the middle of the
transaction.

See

http://www.coda.cs.cmu.edu/maillists/codalist/codalist-2004/6824.html
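As an illustration of the "first server that responds" failover described above, here is a minimal Python sketch. It is not taken from clog: the function name `first_responding`, the `query` callback and the server names are hypothetical, and it assumes the servers are contacted in parallel (the description above does not say whether clog asks them in parallel or in turn).

```python
import concurrent.futures as cf
from typing import Callable, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def first_responding(servers: Sequence[str],
                     query: Callable[[str], T],
                     timeout: float = 10.0) -> Optional[Tuple[str, T]]:
    """Hypothetical sketch: ask every server concurrently and return
    (server, reply) from the first one that answers without raising,
    or None if no server answers within `timeout` seconds."""
    if not servers:
        return None
    pool = cf.ThreadPoolExecutor(max_workers=len(servers))
    try:
        futures = {pool.submit(query, s): s for s in servers}
        try:
            # Results arrive in completion order, not list order.
            for fut in cf.as_completed(futures, timeout=timeout):
                if fut.exception() is None:
                    return futures[fut], fut.result()
                # A failed server is skipped; keep waiting for the others.
        except cf.TimeoutError:
            pass
        return None
    finally:
        # Do not block on slower servers once we have an answer.
        pool.shutdown(wait=False)
```

Once a server has been chosen, this sketch does not retry elsewhere, so a server that fails mid-transaction is not covered, which matches the caveat above.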

> > I would not just restart venus after a crash, but reinit instead - as rvm
> > state is probably corrupted and you can expect another crash or other
> > weird behaviour.
> 
> 	Hmmm.  OK.  Should I just do this as standard practice in my startup
> script?  What's the negative?  I lose my cache and startup times are
> slower?

Startup is in fact faster, as venus does not have to analyse an existing cache.

Opening files is slower until they are in the cache.
The servers and the network are carrying a heavier load while populating
the cache.

> 	Yes, that would certainly work in our setup although it isn't pretty at
> all and I wouldn't want the servers rebooting all the time.

It is ugly, of course, and I would not reboot the servers: it may
cause side effects such as coinciding updates, leading to either reintegration
from clients or resolution between the servers.
Both may potentially add to your troubles.

So if your updating clients happen to reside on servers, I'd possibly
move them out, as it may solve several problems at once...

> > In an extreme, your company might consider contributing to the development,
> 
> 	I wish we had the spare manpower...

I rather meant investing part of the money that would otherwise go to a
commercial product... There are probably people who would happily hack on Coda
instead of doing less exciting things for a living? :)

> 	By the way, I've done an analysis of all programs that keep files open
> on coda and have made sure that this no longer happens (at least in the

A good move.

> idle case).  I'm waiting anxiously to see if we get more signal 11's...

Me too.

Regards,
--
Ivan
Received on 2005-05-24 12:33:16