Coda File System

Re: Red zone, stalling writer

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Thu, 22 Sep 2005 13:27:37 -0400
On Thu, Sep 15, 2005 at 12:00:31PM +1200, Jeremy Bowen wrote:
> I've had a few repeats of an issue while testing coda lately.
> I'm getting into a situation where the following message is repeated every 
> second:
> eg.
> Red zone, stalling writer ( 11:36:41 )
> 
> Having a quick look at the code (coda-src/venus/vproc.cc line 572), this could 
> be due to the last case regarding a dirty cache.
> 
> 	redzone = !free_fsos ||
> 		  free_mles <= MaxWorkers ||
> 		  free_blocks <= (CacheBlocks >> 4); /* ~94% cache dirty */
> 
> I'm adding and deleting a large number of temporary files in order to
> run some benchmarks on the filesystem so I guess this could cause the
> cache to become exhausted. 
> 
> Once I get into this situation, it doesn't seem easy to get out of without 
> killing the client and server :-(
> Does this sound normal ? Is there anything I could do to mitigate the effects 
> of this ?

If you get the 'red zone' messages, you're already past the 'yellow
zone' ones, where every write was delayed by a couple of seconds.

The problem seems to be that your client is not pushing updates back to
the server and is running out of local storage space to store the dirty
state. The slowing down/stalling is done in the hope that this will
allow the background reintegration thread to push some of the dirty
state back to the servers.

However, in some cases reintegration will not proceed. This happens
when the servers are unavailable, or when the servers have detected a
conflict and are blocking reintegration. Without the yellow/red zone
logic, which slows down or even stalls further local mutations, your
Coda client would have crashed. But the stall is really only a symptom
of the underlying problem: reintegration is not making progress fast
enough to get rid of the local dirty state.

Of course you could trigger the problem if the cache is relatively small
compared to the files you are creating. If you have a 500MB cache and
try to store a 600MB file, it will block everything until this file has
been pushed back to the servers. After that, the client will refuse to
fetch the file because it won't fit in the local cache.

Jan
Received on 2005-09-22 13:28:39