Coda File System

Re: Come-and-go system with data replication

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Tue, 28 Oct 2003 12:14:04 -0500
On Tue, Oct 28, 2003 at 08:49:31AM -0000, josef.schwarz_at_bt.com wrote:
> > No, you have to have the two servers started and running in order for
> > them to replicate all data....
> > When one is down for a while, everything works correctly because there
> > is still one up that can handle the work...
> > And when the down server comes back up, it will synchronize and get all
> > new data automatically...
> 
> No, but the point is that I want it to be more flexible. The system
> should not depend on the absent server coming back; another
> server should take over its position.
> So that's not possible with Coda, is it?

That is perfectly possible.

One thing is that although Coda clients are built with weak and
intermittent connectivity in mind and can switch IP addresses and such,
the servers really should not arbitrarily switch addresses and should be
'available' as much as possible.

Some of the reasons are implementation issues, i.e. how servers announce
themselves to clients and how they handle conflict resolution. The other
reason is that clients have only limited cache space and, because of
that, only need a finite log for operations that were performed during
disconnection.

But servers have to deal with possibly hundreds or thousands of clients
that might be gone for several weeks (or get reinitialized and never
return). So the server can't really keep all that much state around. And
because we use a similar operation log (the resolution log) to resolve
conflicts between servers we would need an infinite log to be able to
deal with long disconnections.

These resolution logs are only truncated when we know that all replicas
are in sync. So having 3 servers, but only 2 available at any given time
won't help either.
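
To make the truncation rule concrete, here is a minimal Python sketch of
a log that is only trimmed once every replica has acknowledged an entry.
It is a toy model under my own assumptions, not the actual resolution
code; ResolutionLog, ack() and the replica names are made up for
illustration.

```python
class ResolutionLog:
    """Toy model: one operation log shared by a fixed set of replicas."""

    def __init__(self, replicas):
        self.entries = []                       # operations not yet known to be everywhere
        self.acked = {r: 0 for r in replicas}   # per replica: how many operations it has applied
        self.truncated = 0                      # how many operations were already dropped

    def append(self, op):
        self.entries.append(op)

    def ack(self, replica, upto):
        # The replica reports that it has applied the first `upto` operations.
        self.acked[replica] = max(self.acked[replica], upto)
        self._truncate()

    def _truncate(self):
        # Only entries that *every* replica has applied are safe to drop.
        safe = min(self.acked.values())
        drop = safe - self.truncated
        if drop > 0:
            del self.entries[:drop]
            self.truncated = safe
```

With three replicas of which one never acknowledges anything, min() stays
at zero and the log keeps growing, no matter how well the other two are
in sync.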

> > - STOP thinking in IP... you have been going about this in such a
> > complicated way for a couple of days, working around things by
> > modifying conf files / commands!! I really think you are working the
> > wrong way... Try to make it work correctly before modifying things
> > and going down dead ends!
> > (there is enough to do!!!)
> 
> Well, IP addresses rather than hostnames... It really seems that the
> set-up with IP addresses was not a good idea. And that's one thing I
> don't understand - how one can design a system which cannot handle IP
> addresses as well; it should not be that much effort: just
> check whether it's an IP address and, when it is one, skip the
> gethostbyname()?

The setup with IP addresses works just fine; in fact both clients and
servers internally only know about each other as a single IPv4 address.
However, relying on bare IP addresses is a problem: it makes it
impossible to deal with things like multihomed hosts, or to do failover
when a machine fails.

Coda actually attempts to deal with these things by mapping the 'realm'
to a group of servers that will be queried for volume information. But
if you don't give it enough information to work with and tell it the
name or IP address of just a single server, then the client won't be
able to handle some failover cases.

> > 'cfs lv /coda/172.16.1.1' and 'cfs lv /coda/172.16.3.1'
> > 
> > That means you have two clusters !!!!!
> 
> Sorry, I don't know what you mean. I am not working with clusters. The
> reason that the two clients seem to be in different subnets is that
> the underlying vpn infrastructure requires it.
>
> But this really should not matter, since the appropriate files are
> adapted, and it is basically working.

I'll try to explain....

Before 6.0.0 Coda clients had to be configured to talk to a group of
'root servers' for a specific installation. This group is a proper
subset of all available servers.

When the client wants to locate some volume it can ask any of the
root servers. If a server is unreachable it will simply try the next one
until it either succeeds or runs out of accessible servers. The volume
location information describes which servers are really carrying the
various replicas of a volume.
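
The lookup loop described above can be sketched roughly as follows. This
is an illustration only: locate_volume() and the query callable are
hypothetical stand-ins, not functions from the Coda client.

```python
def locate_volume(volume, rootservers, query):
    """Ask each root server in turn and return the first successful answer.

    `query(server, volume)` is a hypothetical callable that returns the
    volume location information or raises ConnectionError when the server
    cannot be reached.
    """
    for server in rootservers:
        try:
            return query(server, volume)
        except ConnectionError:
            continue  # unreachable, try the next root server
    raise ConnectionError(f"no root server reachable for volume {volume!r}")
```

With only one entry in rootservers there is nothing to fall back to, so a
single unreachable server makes the whole lookup fail.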

With 6.0 the list of 'rootservers' is pulled out of DNS. The realm name
is used in a DNS query and the results are interpreted as all the hosts
that are able to respond to volume location queries. So you could use
several IN SRV records, or CNAME, or even IN A so that we can map the
realm name into a set of ip-addresses that will be used for volume
location information. Now if you can't modify DNS data, there is an
/etc/hosts like solution available in /etc/coda/realms.

The /etc/coda/realms file of course has the same problems as /etc/hosts:
different clients might have different entries, and you don't get a
globally consistent view.
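
As a sketch of the realm-to-servers mapping, the Python below parses
realms-style text and picks the root servers for a realm. I am assuming a
line format of a realm name followed by its servers, with '#' starting a
comment; the DNS step is left out, and parse_realms(), rootservers_for()
and the realm name 'example.org' are made up for illustration.

```python
def parse_realms(text):
    """Parse realms-style text into {realm: [servers]} (assumed line format)."""
    realms = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        realm, *servers = line.split()
        realms[realm] = servers
    return realms


def rootservers_for(realm, realms):
    # Assumption: with no entry for the realm (and no DNS data, which this
    # sketch skips), the realm name itself is the only candidate server.
    return realms.get(realm) or [realm]


EXAMPLE = """
# hypothetical entry: one realm name, two root servers
example.org 172.16.1.1 172.16.3.1
"""
```

Here rootservers_for('example.org', parse_realms(EXAMPLE)) gives the
client two servers to try, while a bare '172.16.1.1' used as the realm
name yields just that one address.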

So what happens in your situation when you talk to /coda/172.16.1.1 is
that you are telling the Coda client that there is only a single server
usable to answer volume location queries for a realm named '172.16.1.1'.
Then when you access /coda/172.16.3.1, you tell the client that there is
a realm named '172.16.3.1' that also has only a single responsible
volume location server.

There is no way for the client to know that both of these are really the
same. In a way you've even explicitly told the client that they are in
fact different by using different 'realm names'. So the client dutifully
binds to both realms, and if you fetch an object in one realm it will
need to be refetched when you access it in the other. And when either
server goes down you completely lose access to uncached volumes through
the path related to that server.
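
A small Python sketch of that double bookkeeping, assuming a cache keyed
by realm name plus object path. ClientCache and its fetch callable are
hypothetical and only model the behaviour described above.

```python
class ClientCache:
    """Toy client cache: entries are never shared between realms."""

    def __init__(self, fetch):
        self.fetch = fetch    # callable(realm, path) -> data, stands in for a server fetch
        self.objects = {}     # keyed by (realm, path)
        self.fetches = 0      # how many times we had to go to a server

    def read(self, realm, path):
        key = (realm, path)
        if key not in self.objects:
            self.objects[key] = self.fetch(realm, path)
            self.fetches += 1
        return self.objects[key]
```

Reading the same path once through '172.16.1.1' and once through
'172.16.3.1' costs two fetches and leaves two independent copies in the
cache.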

Coda servers do get a bit confused by this, because they track clients
by IP address. So the server just sees this one client fetching the same
files several times, which it doesn't really mind all that much. But
when something changes it will only send one callback message to the
client, and that is exactly where the problems start.

Because the client really approaches the server from 2 individual
instances, if you modify something under /coda/172.16.1.1/ the server
should really be sending a callback to the identical (but modified)
object in /coda/172.16.3.1/. If something is changed on the server by
another client, the server should be sending callbacks to both instances
instead of just one, etc. So both instances on the client will quickly
diverge and then when you try to write to the realm that hasn't seen any
callbacks, the server will reject the operation because it is performed
on stale data and the client gives up and declares a conflict.
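
The divergence can be modelled with a few lines of Python. This is a toy
under stated assumptions, not how the server is implemented: ToyServer
and RealmInstance are made-up names, the server keeps exactly one
callback target per client IP address, and I assume the most recent
fetch from that address is the one that gets the callback.

```python
class RealmInstance:
    """One of the client's two views of the same server."""

    def __init__(self, realm):
        self.realm = realm
        self.cached_version = None   # None means "nothing valid cached"


class ToyServer:
    def __init__(self):
        self.version = 1
        self.callback_target = {}    # one entry per client IP address

    def fetch(self, ip, instance):
        self.callback_target[ip] = instance    # assumption: last fetch wins
        instance.cached_version = self.version

    def change(self):
        """Another client modifies the object: one callback per IP address."""
        self.version += 1
        for instance in self.callback_target.values():
            instance.cached_version = None     # callback invalidates the cached copy
        self.callback_target.clear()

    def store(self, instance):
        if instance.cached_version != self.version:
            return "conflict"                  # write was based on stale data
        self.version += 1                      # toy: skips notifying other holders
        instance.cached_version = self.version
        return "ok"
```

If both instances fetch from the same IP address and the object then
changes, only one of them is invalidated. The other still believes its
old copy is current, and its next store() comes back as a conflict.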

Jan
Received on 2003-10-28 12:19:57