Coda File System

metadata, logs and files

From: Peter J. Braam <braam_at_cs.cmu.edu>
Date: Mon, 9 Feb 1998 11:30:37 -0500 (EST)
OK, what is RVM? What are the RVM data and the RVM log? Where are the files?

Let's only talk about the server first.

RVM stands for recoverable virtual memory.  The RVM data file is memory
mapped into VM when the server starts up.  The data file holds all
metadata and directory contents on the server.  Typically this data is
written to a raw partition.  A file, e.g. /rvm/DATA, could be used too,
but it is much slower (see below).
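To make "mapped into VM" concrete, here is a minimal Python sketch that maps a whole data file copy-on-write with the standard mmap module.  It only illustrates the idea of a memory image of the data file.  It is not RVM's own loading code, which may work differently.  Changes go to the in-memory image and never reach the file.  The function name map_data_file is hypothetical.

```python
import mmap

def map_data_file(path: str) -> mmap.mmap:
    """Map a whole (non-empty) data file into memory copy-on-write.

    ACCESS_COPY gives a private image: writes change memory only and are
    never written back to the file.  (Illustrative helper, not RVM code.)
    """
    with open(path, "rb") as f:
        # The mapping stays valid after the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)
```

Any persistence has to come from somewhere other than the mapping itself.  In RVM, that is the log, which the transaction rules below rely on.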

The regular files are stored in a "partition" (a directory declared in
/vice/db/vicetab) in a tree structure, and each one is referenced by a
number.  Ideally we would reference files directly by inode number.  Note
that the layout of the files stored on the server does not mirror the
naming and directory structure the clients see: the names and directory
contents live in RVM.  Typically the files are stored under /vicepa.
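As a sketch of "a tree structure, referenced by a number", the hypothetical helper below turns a file number into a path by splitting it into fixed-width digits.  The depth, fan-out and hex directory names are assumptions made for illustration, not necessarily the layout Coda uses.  The point is that the path encodes only the number.  Nothing of the client-visible name or directory appears in it.

```python
import posixpath

def container_path(root: str, file_no: int, depth: int = 3, fanout: int = 256) -> str:
    """Map a file number to a path in a fixed-depth tree (hypothetical scheme).

    The number is written as `depth` digits in base `fanout`, most significant
    first.  Every digit but the last names a directory; the last names the file.
    """
    if not 0 <= file_no < fanout ** depth:
        raise ValueError("file number does not fit this tree shape")
    digits = []
    for _ in range(depth):
        file_no, digit = divmod(file_no, fanout)  # peel off the lowest digit
        digits.append(format(digit, "x"))
    return posixpath.join(root, *reversed(digits))
```

For example, container_path("/vicepa", 0x01020a) gives /vicepa/1/2/a.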

Updates to metadata are made through a transaction system that is part of
RVM.  The contents of the VM image of the RVM data file are changed only
after a call to rvm_begin_transaction.  Before an address range is
modified, Coda calls rvm_set_range(tid, addr, len) to copy out the old
values, so that the range can be restored if the transaction aborts.  When
the transaction commits, the RVM data on disk is not updated directly.
Instead, an RVM LOG record containing the new values of all the ranges set
during the transaction is written to disk.  This log record is FLUSHED to
disk, which explains why it is important to put the RVM LOG on a raw
partition on its own disk:

 - the writes are sequential and append-only (until the log wraps
   around), so giving the log a disk of its own avoids arm movements;
 - using a raw partition eliminates the arm movements caused by updating
   file metadata such as the mtime of the LOG.
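The transaction rules above can be modeled with a small in-memory Python sketch.  ToyRVM and its methods are hypothetical names.  They only mimic the roles of beginning a transaction, declaring a range about to change, and committing; this is not the real RVM library or its API.  Declaring a range saves its old values, and an abort restores them.  A commit appends one log record holding the new values of every declared range and counts one forced log write.  Applying the log to the data file itself (log truncation, in RVM terms) is left out.

```python
class ToyRVM:
    """Hypothetical in-memory model of RVM-style transactions (not the RVM API).

    `image` stands for the RVM data file mapped into memory, `log` for the
    on-disk RVM LOG, and `forces` counts simulated flushes of that log.
    """

    def __init__(self, size: int):
        self.image = bytearray(size)  # in-memory image of the data segment
        self.log = []                 # one list of (addr, new_bytes) per commit
        self.forces = 0               # how many times the log was "flushed"
        self._undo = None             # (addr, old_bytes) saved by set_range
        self._ranges = None           # (addr, length) declared in this transaction

    def _require_active(self):
        if self._ranges is None:
            raise RuntimeError("no active transaction")

    def begin_transaction(self):
        if self._ranges is not None:
            raise RuntimeError("transaction already active")
        self._undo, self._ranges = [], []

    def set_range(self, addr: int, length: int):
        # Declare [addr, addr + length) as about to change; copy out old values.
        self._require_active()
        self._undo.append((addr, bytes(self.image[addr:addr + length])))
        self._ranges.append((addr, length))

    def write(self, addr: int, data: bytes):
        # Change the in-memory image, but only inside a declared range.
        self._require_active()
        end = addr + len(data)
        if not any(a <= addr and end <= a + n for a, n in self._ranges):
            raise ValueError("write outside any range declared with set_range")
        self.image[addr:end] = data

    def abort_transaction(self):
        # Restore old values newest-first so the oldest copy wins on overlap.
        self._require_active()
        for addr, old in reversed(self._undo):
            self.image[addr:addr + len(old)] = old
        self._undo = self._ranges = None

    def end_transaction(self):
        # Commit: append one log record with the NEW values of every declared
        # range and count one simulated forced write of the log.
        self._require_active()
        self.log.append([(a, bytes(self.image[a:a + n])) for a, n in self._ranges])
        self.forces += 1
        self._undo = self._ranges = None
```

Because every commit is a single append followed by a force, a log on a dedicated disk sees purely sequential writes.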

There is another file that suffers from heavy syncing: an auxiliary
database file in the /vicepa directory named FTREEDB.  Putting it on a
large partition is bad, since Linux fsync syncs the entire partition, not
just the file.  I intend to eliminate this database entirely, so reworking
it to use RVM is probably not worth the hassle.

The client uses a similar scheme.  Here the LOG and DATA files are in
/usr/coda/{LOG,DATA}.  There is one very big difference: the transactions
in the client are no-flush transactions.  This means that transactions
are batched and the LOG is appended to only every few minutes or so.  The
strong consistency guarantees are supplied by the server, which uses flush
transactions.
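Here is a minimal Python sketch of client-style no-flush commits.  It assumes a fixed 300-second flush interval; the text only says "every few minutes or so".  NoFlushLog is a hypothetical name, not Venus code.  Records committed since the last flush sit in memory and would be lost in a crash.  The server, which forces its log on every commit, is what provides the durable copy.

```python
import time

class NoFlushLog:
    """Sketch of batched no-flush commits (illustrative, not Venus code).

    Commit records are buffered in memory and written to the log in one
    batch once `interval` seconds have passed since the last flush, or when
    flush() is called explicitly.
    """

    def __init__(self, interval: float = 300.0, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self.pending = []   # committed but not yet on disk (lost on a crash)
        self.on_disk = []   # records that have reached the log
        self.forces = 0     # number of batched log writes performed
        self._last = clock()

    def commit(self, record):
        self.pending.append(record)
        if self.clock() - self._last >= self.interval:
            self.flush()

    def flush(self):
        # One log write covers every record committed since the last flush.
        if self.pending:
            self.on_disk.extend(self.pending)
            self.pending.clear()
            self.forces += 1
        self._last = self.clock()
```

The clock parameter exists so the behavior can be checked with a fake clock instead of waiting minutes.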

Write-back caching (you can already invoke it in a primitive form with cfs
writedisconnect) will let the client make a very large number of
modifications locally and then reintegrate them as one large server
transaction.  We expect much better performance from this: for example,
thousands of creates could be done in one server mega-transaction.
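The expected gain can be illustrated by counting forced log writes on the server.  For simplicity, the Python sketch below assumes that each server transaction costs exactly one log force.  ToyServer and WriteBackClient are hypothetical.  They ignore everything else reintegration involves, such as checking for conflicting updates.  Doing 1000 creates one at a time costs 1000 forces, while reintegrating the same 1000 creates as one transaction costs one.

```python
from dataclasses import dataclass, field

@dataclass
class ToyServer:
    """Server stand-in: every committed transaction forces the log once."""
    files: set = field(default_factory=set)
    log_forces: int = 0

    def create(self, name: str):
        # Connected mode: one flush transaction per operation.
        self.files.add(name)
        self.log_forces += 1

    def reintegrate(self, ops):
        # Write-back mode: apply a whole batch as a single flush transaction.
        staged = set(self.files)
        for op, name in ops:
            if op != "create":
                raise ValueError(f"unsupported op {op!r}")
            staged.add(name)
        self.files = staged  # all-or-nothing: commit only if every op applied
        self.log_forces += 1


@dataclass
class WriteBackClient:
    """Client stand-in that records updates locally and ships them in one batch."""
    server: ToyServer
    pending: list = field(default_factory=list)

    def create(self, name: str):
        self.pending.append(("create", name))

    def reintegrate(self):
        self.server.reintegrate(self.pending)
        self.pending = []
```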

I hope this is helpful!


- Peter -

On Mon, 9 Feb 1998, Steven N. Hirsch wrote:

> 
> On Sun, 8 Feb 1998, Peter J. Braam wrote:
> 
> > Steve, 
> > 
> > What's in your /usr/coda/etc/vstab?
> > 
> > Are you using a LOG file or a raw log partition, same for data on the
> > server?  What is the size of that data file?
> > 
> > Things are slow (and a solution is in the making) but what you tell us is
> > terrible.
> > 
> > - Peter -
> 
> 
> As I commented in my posting, the log and metadata are on their own
> partitions.  Which data file are you asking about above?  I admit to being
> confused, as I apparently have log information in one partition, file
> metadata in another partition, and the actual files themselves (??) under
> /usr/src/vicepa.  Perhaps an overall definition of terms would be
> appropriate?
> 
> The /usr/coda/etc/vstab file reads thusly:
> 
> /coda:/dev/cfs0:cy.steve.net:/usr/coda/venus.cache:20000:1
> 
> Let me know what other information you may need.
> 
> Steve
> 
> > 
> > On Sun, 8 Feb 1998, Steven N. Hirsch wrote:
> > 
> > > Under Intel Linux, I've yet to get through my basic networking test:
> > > 
> > > - Copy linux source tree to coda volume
> > > - Do 'make mrproper'
> > > - Do 'make xconfig'
> > > (etc.)
> > > 
> > > Copying the source to the server takes almost an hour (and, yes, I do have
> > > the metadata and log on their own partitions).  This is incredibly slow.
>         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> > > Make mrproper fails on the first pass because it still thinks that
> > > directories it has just cleaned of files are busy.  
> > > 
> > > If I clean by hand, and attempt xconfig it fails with random corrupted
> > > files.
> > > 
> > > Smaller applications will build (most of the time), but something's broken
> > > somewhere.  Any ideas where to look?
> > > 
> > > Steve
> > > 
Received on 1998-02-09 11:32:19