Coda File System

Re: Building a coda "appliance"

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Sat, 30 Jun 2007 00:37:20 -0400
On Wed, Jun 27, 2007 at 10:36:35AM -0700, Yan Seiner wrote:
> I'm trying to build a pair of coda "appliances" - basically embedded 
> boxes with a VPN and coda server/client, each acting as a samba server 
> to its network.  The goal is to have two identical replicas of the same 
> data.
> 
> One side would be the server, the other would be the client.  Otherwise 
> the boxes would be identical.
> 
> I've got coda built and installed, and now I'm trying to map out my 
> approach.
> 
> The hardware consists of a 200 MHz ARM CPU with 32 MB of RAM.  The data 
> consists of approximately 300 GB of CAD files.
> 
> Is this enough RAM?  Can the RVM metadata be kept in a swap partition or 
> do I need physical RAM for it?

Sounds like your hardware is in the same ballpark as the Linksys NSLU I
have at home. I guess it 'could' run a server, but I really haven't
tried.

The metadata is VM backed, so having swap space is definitely useful.
The server doesn't really care about physical RAM, except that swapping
will slow it down, which in turn could cause the client to switch to
disconnected or weakly connected operation even over a well-connected
network.

We use a private mmap of the RVM data file, so in low memory situations,
clean in-memory pages are simply discarded and paged back in. Dirty
pages are written to swap.
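
To illustrate what a private mapping means here, below is a minimal
sketch in Python. It is not Coda or RVM code; the temporary file and
the function name are made up for the example. Python's
mmap.ACCESS_COPY corresponds to a private (copy-on-write) mapping on
POSIX systems: writes stay in memory and never reach the backing file,
which is why dirty pages can only go to swap.

```python
import mmap
import os
import tempfile


def private_mapping_demo():
    """Write through a private mapping and show the file is untouched."""
    fd, path = tempfile.mkstemp()  # hypothetical stand-in for a data file
    try:
        os.write(fd, b"A" * mmap.PAGESIZE)
        # ACCESS_COPY gives a copy-on-write mapping (MAP_PRIVATE on POSIX).
        with mmap.mmap(fd, mmap.PAGESIZE, access=mmap.ACCESS_COPY) as m:
            m[0:5] = b"dirty"          # dirties the in-memory page only
            in_memory = bytes(m[0:5])
        os.lseek(fd, 0, os.SEEK_SET)
        on_disk = os.read(fd, 5)       # file contents are unchanged
        return in_memory, on_disk
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling private_mapping_demo() returns the modified bytes as seen
through the mapping alongside the unmodified bytes read back from the
file.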

A problem with your setup is that the box that runs the server will also
have to run a client in order to provide local access. But that will
mean that the metadata is cached by both the server and the client, and
possibly some of it in the kernel and in the samba daemon as well. And I
think that would really get a bit tight with only 32MB of memory.

Another problem is that clients connected to the samba daemon won't be
able to repair conflicts, so a conflict is pretty much deadly in such a
setup (and in Coda's optimistic model conflicts are unavoidable).

Also, unlike samba and nfsd daemons, the Coda servers are stateful: they
remember which clients fetched a copy of which objects and send
callbacks if any of the files change. Every callback requires a bit of
allocated memory; with many files x many clients it does add up, but in
your case you'd only have two, maybe three, clients.
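
As a rough illustration of why this state grows with files x clients,
here is a toy model in Python. It is only a sketch of the idea; the
class and method names are invented and it does not reflect the actual
server data structures. Each fetch records one entry, and an update
tells us which other clients would need a callback.

```python
from collections import defaultdict


class CallbackTable:
    """Toy model: which clients hold a cached copy of which object."""

    def __init__(self):
        self._holders = defaultdict(set)

    def fetch(self, client, obj):
        # The server remembers that this client now caches obj.
        self._holders[obj].add(client)

    def update(self, writer, obj):
        # Everyone else holding obj must be notified. In this toy model
        # all recorded entries for obj are dropped afterwards.
        holders = self._holders.pop(obj, set())
        return sorted(holders - {writer})

    def entries(self):
        # One entry per (object, client) pair, hence files x clients.
        return sum(len(clients) for clients in self._holders.values())
```

With two or three clients the number of entries stays close to the
number of cached objects, which is the point made above.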

Our old Coda deployment ran on reasonably modest hardware. The Coda
testserver was a Pentium 90 with 64MB of memory and it didn't really
have much trouble, although it did have swap, rvm log, rvm data and
the file data (/vicepa) on separate spindles (4 SCSI drives).
The main server group used to consist of something like PII 200MHz with
128MB of memory, but again we spread swap, rvm log, rvm data and file
data across different disks.

> Also, how should I structure this so that all the data is available to 
> both sides, even in the event of a VPN failure?  (These boxes would be 
> pretty much on opposite sides of the globe, so I can't really be sure 
> the VPN will be available 100%.)  This means that the client would have 
> to actively hoard all the data?  Is that practical?  Or should I use a 
> different approach?

It really depends on how many file objects you are talking about. If
each file is 1GB, then we're just talking about ~300 files and I don't
see any problem hoarding everything.

If each file is ~4KB then I don't think it is feasible (at the moment).
The client won't be able to keep all the metadata in memory and it will
basically bring the device to a virtual halt in a swap frenzy about
every 10 minutes during the hoard walk.
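
To put numbers on the two cases, here is a small back-of-the-envelope
sketch in Python. The helper names are made up, and I'm assuming binary
units (1GB = 1024^3 bytes) and that all 32MB of RAM could go to
metadata, which is of course far too generous.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2
KIB = 1024


def object_count(total_bytes, avg_file_bytes):
    """Approximate number of file objects for a given average file size."""
    return total_bytes // avg_file_bytes


def ram_per_object(ram_bytes, objects):
    """Bytes of RAM available per object if all RAM held metadata."""
    return ram_bytes / objects


big_files = object_count(300 * GIB, 1 * GIB)     # 300 objects
small_files = object_count(300 * GIB, 4 * KIB)   # 78643200 objects
```

With ~4KB files, ram_per_object(32 * MIB, small_files) comes out at
less than half a byte per object, so the metadata cannot possibly stay
resident. With 1GB files there is over 100KB of RAM per object.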

Have you considered a setup that periodically mirrors or syncs both
sites with something like unison or rsync? I just think that if your
clients are going to be using a stateless filesystem to access the data
on the appliances, they would just suffer from the drawbacks of Coda's
weaker consistency model (no file locking, files becoming inaccessible
due to conflicts) without really benefitting from Coda's features
(persistent local disk cache, fast access to cached file data, writeback
logging and log optimizations, directory ACLs for access control).
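
A periodic sync could be as simple as the sketch below, run from cron
or a similar scheduler. This is only an assumption about how one might
wire it up: the function names, the host and the paths are
hypothetical. The rsync options used are -a (archive), -z (compress)
and --update (skip files that are newer on the receiving side). Note
that this is one-way and does not propagate deletions; running it in
both directions gives only a crude merge, whereas unison is designed
for two-way synchronization.

```python
import subprocess


def rsync_command(src_dir, remote_host, dest_dir):
    """Build a one-way rsync invocation (host and paths are hypothetical)."""
    src = src_dir.rstrip("/") + "/"    # trailing slash: copy the contents
    dest = remote_host + ":" + dest_dir.rstrip("/") + "/"
    return ["rsync", "-az", "--update", src, dest]


def mirror(src_dir, remote_host, dest_dir):
    """Run the sync and return rsync's exit status (0 means success)."""
    return subprocess.run(rsync_command(src_dir, remote_host, dest_dir)).returncode
```

For example, mirror("/srv/cad", "siteb.example", "/srv/cad") would push
local changes to the other site over whatever transport rsync is
configured to use.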

Jan
Received on 2007-06-30 00:40:16