Coda File System

Re: Adding a replicating server

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Thu, 21 Apr 2005 17:11:19 -0400
On Thu, Apr 21, 2005 at 01:37:45PM -0600, Patrick Walsh wrote:
> > btw. turning the resolution log back on is done with,
> > 
> >     volutil -h $oldserver setlogparms $replicaid reson 4
> 
> 	In another post I saw you had this:
> 
> volutil -h $scm setlogparms $replicaid reson 4 logsize 8192
> 
> 	Do I not need to specify a logsize anymore?

I think that if you do not specify the logsize, it just turns resolution
back on with whatever logsize the volume had when it was initially
created.

If you do want to set the logsize, you have to make sure that the newly
created replica is set to the same size. I suspect that the resolution
code might expect that all replicas are able to track the same amount of
history (but I could be wrong on that).
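
So if you do change the logsize, something like the following (with
$server1..$server3 and $replica1..$replica3 standing in for your servers
and their replica ids) should keep all replicas in agreement,

    volutil -h $server1 setlogparms $replica1 reson 4 logsize 8192
    volutil -h $server2 setlogparms $replica2 reson 4 logsize 8192
    volutil -h $server3 setlogparms $replica3 reson 4 logsize 8192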

> > The only solution right now is to either flush every object that
> > exists in the resized volume which purges the stale volume information
> > from the client cache, or to reinitialize your clients. So for me with
> > my single testvolume the following worked,
> 
> 	If I understand correctly, flushing the cache can cause bad things to
> happen if you have writes that haven't been committed.  Would this be
> considered a safe process for flushing a volume:

Uncommitted writes combined with resizing is something that I don't
really want to think about just yet; right now it is probably asking for
trouble. There isn't a way to completely flush a volume with pending
operations (or with open files/directories), since any of those cases
keeps objects pinned in the cache, and as long as there are objects
holding a reference count on a volume we keep the volume structures
around.
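
If you do want to go the flushing route, the sketch below (using my
single testvolume, mounted at /coda/testvolume) first checks the volume
state and then drops its cached objects; only do the flush once nothing
is open or waiting for reintegration in that volume,

    cfs listvol /coda/testvolume
    cfs flushvolume /coda/testvolume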

That is why the other change I just committed works better: when we
trigger the client to refetch volume information (cfs checkvolumes), it
compares the returned replica list to the cached data and corrects the
internal structures that represent the volumes and replicas. Any such
change also disconnects all MultiRPC2 connections to the old server
group, and the next operation creates a new connection to the new
group. The code to do all this was written a while ago, but it was never
reached because we exited as soon as we discovered we already had the
volume information cached. I'm pleasantly surprised that it actually
worked the first time.
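
So with that change, after a volume is resized a client that already
has it cached only needs a,

    cfs checkvolumes

and venus picks up the new replica list and reconnects to the new
server group on the next operation.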

The only thing still needed is for the server to inform clients that
they should recheck the volume information, so that already connected
clients can discover the new replica.

I have to do some more tests. If clients pick up the change after a
disconnection, I can have the server tear down all incoming rpc2
connections, and the resulting reconnect would do the trick. A more
subtle solution would be to use the volume callback message as a
trigger to revalidate volume information; that doesn't cause a full
disconnect and would only affect clients that actually use the volume.

Jan
Received on 2005-04-21 17:12:28