Coda File System

Re: Coda deployment

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Thu, 12 Jun 2003 13:14:19 -0400
On Thu, Jun 12, 2003 at 06:31:50PM +0200, Dick Kniep wrote:
> On Thu, 2003-06-12 at 17:09, Jan Harkes wrote:
> > On Thu, Jun 12, 2003 at 01:22:03PM +0200, Dick Kniep wrote:
> > > So, to avoid this kind of problem, could I disconnect the servers and
> > > regularly reconnect them to allow for the replication (every hour or
> > > so), or do I get into another mess? The result will of course be that
> > > I will have some conflicts if people start working on a document on
> > > both sides, but I think that is a lesser evil, and it will only occur
> > > seldom.
> > 
> > You'll be fixing up conflicts the rest of your life and hate me :(
> 
> I don't understand this. Certainly if we synchronize every 45 minutes,
> it would be impossible for the users to work on both sides? So, synching
> conflicts would occur only sporadically?

Not really. Servers don't really know anything about replication; they
are mostly independent and unaware of each other. Replication is a
meta-thing that is only really observed by a client. So when a client
writes to a file, it writes individual copies of that file to all
(available) servers; the same goes for directory operations. Each server
simply bumps its own entry in the version vector. After the individual
operations are completed, the client informs all replicas that were
involved where the operation succeeded. At this point each server bumps
the parts of the version vector that belong to the other servers that
have a current copy.
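
A rough sketch of that two-step update, in Python rather than anything
resembling the real server code. The class, method, and file names here
are all made up for illustration, and the model ignores everything
except the version-vector bookkeeping.

```python
# Hypothetical model of the update sequence described above; not Coda's code.

class Server:
    def __init__(self, name, all_names):
        self.name = name
        self.all_names = list(all_names)
        # One version vector per object: {server_name: update_count}
        self.vectors = {}

    def _vv(self, obj):
        return self.vectors.setdefault(obj, {n: 0 for n in self.all_names})

    def store(self, obj):
        """Step 1: apply the write and bump only this server's own slot."""
        self._vv(obj)[self.name] += 1

    def finalize(self, obj, succeeded):
        """Step 2: bump the slots of the other servers that took the write."""
        vv = self._vv(obj)
        for other in succeeded:
            if other != self.name:
                vv[other] += 1


def client_write(obj, servers, available):
    """Write obj to every available server, then tell them who succeeded."""
    targets = [s for s in servers if s.name in available]
    for s in targets:
        s.store(obj)
    succeeded = [s.name for s in targets]
    for s in targets:
        s.finalize(obj, succeeded)
    return succeeded


names = ["A", "B"]
a, b = Server("A", names), Server("B", names)
client_write("report.doc", [a, b], available={"A", "B"})  # both reachable
client_write("report.doc", [a, b], available={"A"})       # B is disconnected
print(a.vectors["report.doc"], b.vectors["report.doc"])
```

After the second write only server A has advanced, so the two servers
hold different vectors for the same object, and neither of them has
noticed.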

That is really all that a server does. Now, when some client tries to
access the updated file or directory, it first gets status information
from all (available) servers and compares this. If the version vectors
are identical, we know everything is consistent and will fetch the data
from the server with the best connectivity. If they are not, the client
tells the servers with different versions to 'resolve the conflict',
which is not efficient and can validly fail for many reasons (file
ownership, access permissions, objects moved between directories) and in
some cases fails because of bugs in the implementation of the resolution
process.
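
The client-side decision above could be sketched roughly like this. The
function names and the latency table are invented for the example, and
the newer/concurrent classification is the usual version-vector
convention rather than something spelled out in this message.

```python
# Hypothetical sketch of the client's check; names are illustrative only.

def compare(vv_a, vv_b):
    """Classify two version vectors (a missing slot counts as 0)."""
    keys = set(vv_a) | set(vv_b)
    a_ge = all(vv_a.get(k, 0) >= vv_b.get(k, 0) for k in keys)
    b_ge = all(vv_b.get(k, 0) >= vv_a.get(k, 0) for k in keys)
    if a_ge and b_ge:
        return "identical"
    if a_ge:
        return "first is newer"
    if b_ge:
        return "second is newer"
    return "concurrent"


def plan_access(status, latency):
    """status: {server: version_vector}, latency: {server: seconds}."""
    vectors = list(status.values())
    if all(compare(vectors[0], v) == "identical" for v in vectors[1:]):
        # Everything is consistent: fetch from the best-connected server.
        return ("fetch", min(status, key=lambda s: latency[s]))
    # Any difference at all means the servers have to run resolution.
    return ("resolve", sorted(status))


latency = {"A": 0.20, "B": 0.01}
consistent = {"A": {"A": 1, "B": 1}, "B": {"A": 1, "B": 1}}
diverged = {"A": {"A": 2, "B": 1}, "B": {"A": 1, "B": 2}}
print(plan_access(consistent, latency))
print(plan_access(diverged, latency))
```

In the diverged case both sites wrote while apart, so neither vector
dominates the other; that is the kind of conflict resolution may not be
able to fix on its own.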

Now if a server disappears and then reappears, the client has to
reestablish connectivity. It will use a quick volume version check to
see if anything has changed in a volume, and if that fails it will fall
back on reestablishing callbacks for the individual files in the volume
in batches of 50 objects with ValidateAttrs. Any object that fails
validation will have to be examined later by getattr. For hoarded (aka
sticky) objects this happens during the next hoard walk, which occurs
about once every 10 minutes; for all other objects it only happens
when a user actually tries to access the file.
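
As a sketch of the reconnection steps above: the batch size of 50 is the
one mentioned in the text, but the function, its arguments, and the
stand-in for the ValidateAttrs call are assumptions made up for this
example, not the client's actual interfaces.

```python
# Hypothetical sketch of revalidation after a server reappears.

BATCH_SIZE = 50  # objects per ValidateAttrs call, as described above


def revalidate(cached, volume_unchanged, validate_attrs):
    """Return the cached objects that still need a later getattr.

    cached           -- list of cached object ids for one volume
    volume_unchanged -- result of the quick volume version check
    validate_attrs   -- callable(batch) -> set of ids that are still valid
    """
    if volume_unchanged:
        return []  # one cheap check covered the whole volume
    suspect = []
    for start in range(0, len(cached), BATCH_SIZE):
        batch = cached[start:start + BATCH_SIZE]
        still_valid = validate_attrs(batch)
        suspect.extend(obj for obj in batch if obj not in still_valid)
    return suspect


# Stand-in for the server side: two objects changed while we were away.
changed = {"obj3", "obj77"}
calls = []

def fake_validate_attrs(batch):
    calls.append(len(batch))
    return {obj for obj in batch if obj not in changed}


cached = [f"obj{i}" for i in range(120)]
suspect = revalidate(cached, False, fake_validate_attrs)
print(suspect, calls)
```

With 120 cached objects the fallback path costs three round trips per
client per volume, before any of the suspect objects has actually been
looked at.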

So if you stay disconnected for 45 minutes, and then reconnect for 15
minutes, the volume checks will probably fail. The ValidateAttrs will
single out anything without a problem and the hoard walk should trigger
resolution for hoarded objects. The rest of the files, however, will not
get 'resynchronized' unless some user actually tries to access them, and
by that time you're likely already disconnected again.
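
To make that concrete, here is a toy model of one reconnect window. It
is a simplification under stated assumptions: hoard walks are treated as
strictly periodic, so any window at least as long as the walk interval
contains one, and the function and object names are invented.

```python
# Hypothetical toy model of which stale objects get looked at in one window.

def rechecked_in_window(suspect, hoarded, accessed,
                        window_min=15, hoard_walk_every_min=10):
    """Return the suspect objects that are examined before disconnecting."""
    suspect = set(suspect)
    # Non-hoarded objects are only examined when a user touches them.
    rechecked = suspect & set(accessed)
    # Assumption: a periodic hoard walk fits into a long enough window.
    if window_min >= hoard_walk_every_min:
        rechecked |= suspect & set(hoarded)
    return rechecked


suspect = {"a.doc", "b.doc", "c.doc", "d.doc"}
hoarded = {"a.doc"}
accessed = {"b.doc"}
done = rechecked_in_window(suspect, hoarded, accessed)
print(sorted(done), sorted(suspect - done))
```

Here c.doc and d.doc stay out of sync across the window, and they keep
drifting for as long as nobody happens to open them while the link is
up.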

> Furthermore, if the speed of connection is below 50 KB (which it most
> certainly is!) then Coda switches to weak network connectivity, which

So it would already get bogged down by the fact that every Coda client
will try to get attributes for files from both locations, and when the
server is brought back, all clients will try to reestablish callbacks to
every cached object. And anything that is different and does require
resolution will load the link even more, probably causing various
clients to temporarily go disconnected (this happens as soon as a server
response takes more than 15 seconds to arrive). When those clients
recover they will start bashing the link again, trying to revalidate all
cached objects.

What you want is some sort of a 2nd class replica/proxy server at one
site that operates like a trusted middleman for all local clients that
want to access remote data. It can cache files to avoid unnecessary data
traffic across the slow link and will forward callbacks for the local
clients. It might even be responsible for feeding updates back to the
master. However, Coda does not provide this functionality, and I don't
know of anything else that does. There is a chance that InterMezzo can
do it as every client supposedly works like a server.

Jan
Received on 2003-06-12 13:17:02