Coda File System

Re: Heartbeat Failover

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Sun, 13 Jul 2003 17:50:02 -0400
On Sat, Jul 12, 2003 at 12:44:05PM -0700, Tim Hasson wrote:
> What if you had a 2nd 100mbps NIC in each of the SCM and the NON-SCM, 
> dedicated to CODA rep, with their own private network (say 192.168.0.1, and 
> 192.168.0.2) linked by a crossover cable and /etc/hosts would look like this 
> in both machines:
> 192.168.0.1      SCM
> 192.168.0.2      NonSCM
> and you setup Coda on both machines using those priv hostnames.

Then when a client asks for the location of a volume, it will get the
answer 'oh, it's replicated at 192.168.0.1 and 192.168.0.2', which is
pretty useless for clients that are not on your private backbone
network.
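
To make that concrete, here is a minimal Python sketch (not Coda code) that
checks whether the addresses handed out in a volume location reply are in
private ranges. The function name and the `replica_addresses` list are made
up for the example; only the two addresses come from the /etc/hosts quoted
above.

```python
import ipaddress

def unreachable_replicas(replica_addresses):
    """Return the addresses that lie in private ranges.

    Clients outside the private network cannot route to these.
    """
    return [a for a in replica_addresses
            if ipaddress.ip_address(a).is_private]

# Hypothetical volume location reply built from the private hostnames.
replica_addresses = ["192.168.0.1", "192.168.0.2"]
print(unreachable_replicas(replica_addresses))
```

Both replicas end up in the returned list, so a client on the public side
has no server it can actually contact.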

> 1. LVS-DR (Direct Routing) for load balancing and failover to the real
> servers (the coda servers) using 1 or more LVS Directors (Linux only
> for now)

Bad idea. In contrast with stateless NFS servers, Coda servers do keep
state around. Any (authenticated) connection is bound to a specific
server. When you redirect the packets to another Coda server, it will
respond with a NAK and the client will have to reconnect and revalidate
its local cache with the new server.
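
A toy model of why redirecting packets fails, assuming nothing more than
"each server only knows the connections it set up itself". The `Server`
class, its methods and the request string are invented for illustration and
do not correspond to the real RPC2 interfaces.

```python
class Server:
    """Toy server that keeps per-connection state, unlike stateless NFS."""

    def __init__(self, name):
        self.name = name
        self.connections = set()   # clients that authenticated with *this* server

    def connect(self, client_id):
        self.connections.add(client_id)

    def handle(self, client_id, request):
        # A packet for a connection this server never set up is refused.
        if client_id not in self.connections:
            return "NAK"
        return f"{self.name}: ok {request}"

a, b = Server("A"), Server("B")
a.connect("client1")
print(a.handle("client1", "getattr"))   # served by the bound server
print(b.handle("client1", "getattr"))   # load balancer redirected the packet
```

The second call is what the director would cause: server B has no state for
the connection, so the client has to start over with B.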

> 2. IP-Takover, if you wanted 1 coda server to take over the other's ip
> if goes down by sharing a VirtualIP (LVS stuff) on the lo interface.
> The machine taking over an ip will watch if the other machine goes
> back up and change its IP back to the original.

No need. A client always sends all requests to all available replicas in
parallel; a dead server simply causes a 15 second 'stall' while we make
sure the request wasn't simply dropped by the network. At that point,
the server is marked dead and we perform an occasional background probe
to check whether the server came back.
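
Roughly, the bookkeeping looks like the following Python sketch. It is a
simplification under stated assumptions, not the client's actual code: the
`ReplicaClient` class is hypothetical, servers are plain callables, and a
timeout is simulated by raising `TimeoutError` instead of really waiting
15 seconds.

```python
from concurrent.futures import ThreadPoolExecutor

class ReplicaClient:
    def __init__(self, servers):
        self.servers = servers     # name -> callable(request), hypothetical
        self.dead = set()          # servers currently marked dead

    def _call(self, name, request):
        try:
            return name, self.servers[name](request)
        except TimeoutError:
            return name, None      # None stands for "no reply in time"

    def request(self, request):
        """Send to every replica not marked dead, in parallel."""
        alive = [n for n in self.servers if n not in self.dead]
        with ThreadPoolExecutor(max_workers=max(1, len(alive))) as pool:
            results = list(pool.map(lambda n: self._call(n, request), alive))
        replies = {}
        for name, reply in results:
            if reply is None:
                self.dead.add(name)        # timed out: mark dead
            else:
                replies[name] = reply
        return replies

    def probe(self):
        """Background probe: revive servers that answer again."""
        for name in list(self.dead):
            _, reply = self._call(name, "probe")
            if reply is not None:
                self.dead.discard(name)

state = {"up": False}

def healthy(request):
    return "ok"

def crashed(request):
    if not state["up"]:
        raise TimeoutError
    return "ok"

client = ReplicaClient({"server1": healthy, "server2": crashed})
print(client.request("fetch"), client.dead)
state["up"] = True                 # server2 comes back
client.probe()
print(client.request("fetch"), client.dead)
```

The surviving replica keeps answering throughout, which is why taking over
the dead server's IP address buys nothing.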

> CODA of course, runs as root, so if there was a security hole in any
> of its listening daemon's, that could result in a remote root, which
> is not good. So my question is, has anyone at the least audited coda's
> services? Is it possible to firewall coda's ports and exclusively
> allow the trusted clients IP's?

There are not many reasons why all daemons are running as root. The
biggest reason was probably that it was simpler.

There are currently no known buffer overflows.

> Also, what happens if the SCM crashes?

Nothing serious. The only reason the SCM is special is that it holds the
read/write copy of the server-side databases (volume replication/location
and authentication).

If the SCM is dead and you want to add/remove a volume or user, you can
simply go to the other servers and set a new scm server name in
/vice/db/scm, start the rpc2portmap and updatesrv daemons on the new scm
and restart the updateclnt daemons on all other machines.

> Having setup venus on the scm like: venus-setup localhost 1000000
> and on the non-scm like: venus-setup scmhostname 1000000

Why not set the clients up like,

    venus-setup scmhostname,nonscmhostname 100000

But even the way you have it, when the SCM has crashed the client will
use the already cached rootvolume name. And if the rootvolume is
replicated it won't even need to have the actual volume data cached.

Considering that it is possible to start a Coda client without any network
connectivity to the servers, a crashed SCM really isn't a big deal.

> What if files changed on the non-SCM and the SCM later comes online, is the 
> integration seemless or do I have to manually reintegrate the changes?

If the volumes were replicated, clients will detect a version difference
between both servers and trigger server-server resolution.
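
As a rough illustration of "detect a version difference", assume each
replica of an object carries a vector of per-server update counts. The
Python sketch below compares two such vectors; the `compare` function and
the example tuples are hypothetical, and the real resolution protocol does
considerably more than this.

```python
def compare(vv_a, vv_b):
    """Compare two version vectors of equal length."""
    a_ge = all(x >= y for x, y in zip(vv_a, vv_b))
    b_ge = all(y >= x for x, y in zip(vv_a, vv_b))
    if a_ge and b_ge:
        return "equal"          # replicas agree, nothing to do
    if a_ge:
        return "a_dominates"    # a saw every update b saw, plus more
    if b_ge:
        return "b_dominates"
    return "conflict"           # each side has updates the other lacks

# The SCM was down while the other server took one more update.
scm_copy, nonscm_copy = (2, 1), (2, 2)
print(compare(scm_copy, nonscm_copy))
```

Anything other than "equal" is the cue for resolution. In the dominating
case the newer copy can simply be propagated, while a "conflict" is the
case that may need repair.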

> Which brings up the question, what is "realms"? is there documentation
> on it anywhere? I read something on the list about /coda/realm being
> the root volume instead of the older /coda. If so, what are the
> advantages over the old style?

A client doesn't need to access any server or have any cached
information to start up and mount /coda. There is also no client
configuration for rootservers, or authservers, just cache size. It is
also impossible to get a conflict on the /coda/ directory, which was
quite hard to repair because the conflict obscured the special file
that user applications (cfs/repair) need to talk to the Coda client.

Only when we access something like /coda/testserver.coda.cs.cmu.edu,
we use DNS to find the servers, get the name of the rootvolume, volume
location, and connect to the servers. I no longer need to reinitialize
my client to look at the testserver, main coda or a local experimental
setup. As my email is delivered in Coda, this made life a lot easier: I
don't have to redirect incoming email to a local temporary file while
I'm testing some new code on the alpha servers because my client can see
and access both realms simultaneously.
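
The naming side of this is simple enough to sketch in Python: the first
component under /coda is the realm, and everything after it lives inside
that realm. The `split_realm` helper and the `usr/jan` path are made up for
the example, and the DNS lookup itself is not attempted here.

```python
from pathlib import PurePosixPath

def split_realm(path, mount="/coda"):
    """Split a path under the mount point into (realm, path inside realm).

    Raises ValueError if the path is not under the mount point.
    """
    parts = PurePosixPath(path).relative_to(mount).parts
    if not parts:
        return None, ""            # /coda itself: no realm selected yet
    return parts[0], "/".join(parts[1:])

print(split_realm("/coda/testserver.coda.cs.cmu.edu/usr/jan"))
print(split_realm("/coda"))
```

Only when the realm part is non-empty does anything have to be looked up,
which is why mounting /coda needs no servers at all.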

> I was able to finally create a 1GB RVM partition (and 30M Log) on both 
> machines (thanks to Jan for notes on how to fit it), using the following 
> options with rdsinit:
...
> On a P3 800Mhz, 256M RAM, 1.4GB SWAP, the fileserver startup time was about 
> 2min - 2.5mins, which I think isn't very bad. Is that Ok?

How much is the system pushed into swap at that point? If private mmap
is enabled, the server should be mostly paging parts of RVM, only dirty
pages are pushed to swap. If you have close to a GB of swap usage this
was a 'slow' startup :) 2 minutes sounds short for a server that has to
push a Gig to swap, but FreeBSD's MM system is pretty fast so it is
possible.

>   starting address of rvm: 0x70000000 (1879048192)

Why did you move the start address up from 0x50000000? Was there a
conflict with shared libraries or something?

Jan
Received on 2003-07-13 17:53:10