Coda File System

Re: Bottlenecks in Coda

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Mon, 29 Dec 2003 15:06:08 -0500
On Thu, Dec 25, 2003 at 06:45:06PM +0000, Joerg Sommer wrote:
> firstly, I wish all a merry christmas.

And a Happy New Year :)

> http://www.inter-mezzo.org/docs/bottlenecks.pdf Does anyone know this
> paper? What's about the things mentioned in it? Are any improvements
> done for this? Or are any improvements planed?

Of course I know about this paper. Peter ran these tests while he was at
CMU, in the office across the hall from mine :)

The numbers are right, but I do not necessarily agree with all the
premises or conclusions. First of all, we're clearly not competing with
local disk filesystems. Those do not have to deal with replication or
centralized storage, and some have weaker consistency constraints.
Comparing write performance with ext2, which is asynchronous, is
especially unfair. Ext2 doesn't care when bits hit the disk, because
people can only look at the data through the ext2 filesystem. When I
write a file in Coda, the underlying filesystem (i.e. on the servers)
has to be consistent across all replicas. We also don't have an
extensive fsck that can fix up various inconsistencies on the servers
after a power failure, so we only perform synchronous operations on the
servers.

But it is nice to know where the 'theoretical limits' of performance
are. I don't expect to ever be faster than the underlying filesystem we
use for our cached data. Most of these tests were on 2.0 (or early 2.2)
kernels. One of the main culprits for the bottleneck, the context
switch, has improved tremendously over time. Several factors contribute
to this: the basic kernel tick got bumped from 100Hz to around 1000Hz,
and processors have become faster by leaps and bounds. There is also
more memory available for caching.

Some changes in the kernel module have helped as well. When a file is
opened, instead of passing the (device, inode) number pair, we pass an
open file descriptor for the container file. This is only a small
change, but when we fetched a file from the server we used to write the
data to the container file, fsync the bits to disk and then pass the
device/inode numbers down. Now we can hand over the still-open file
descriptor, and in most cases we can be reading the still in-memory
data before the bits have actually hit the disk.
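
To make the difference concrete, here is a rough user-space sketch of the
two schemes. It is not the Coda kernel module interface: the function
names are made up for illustration, reopening by path stands in for the
(device, inode) lookup, and os.dup() stands in for handing the open
descriptor to whoever reads the file.

```python
import os
import tempfile


def fetch_then_reopen(path, data):
    """Older scheme: write the container file, fsync, then reopen it."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # block until the bits have hit the disk
    # Reopening by path stands in for the (device, inode) lookup.
    with open(path, "rb") as f:
        return f.read()


def fetch_then_hand_off(path, data):
    """Newer scheme: keep the descriptor open and pass it along, no fsync."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        handed = os.dup(fd)  # hypothetical stand-in for the hand-off
    finally:
        os.close(fd)
    # The reader sees the data from the page cache, whether or not it
    # has reached the disk yet.
    with os.fdopen(handed, "rb") as reader:
        reader.seek(0)
        return reader.read()


if __name__ == "__main__":
    container = os.path.join(tempfile.mkdtemp(), "container")
    print(fetch_then_reopen(container, b"fetched from server"))
    print(fetch_then_hand_off(container, b"fetched from server"))
```

Both functions return the same bytes. The only difference is that the
second one never waits for fsync before the reader can start.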

The numbers seem to indicate that the basic 'ls -lR <dir>' test was a
recursive ls on the Coda source tree. I probably don't have the exact
same tree in /coda, but I'm getting...

    client: P4 3.06GHz, 1GB RAM, udma5 IDE, 100Base-T network
    OS:     Linux-2.6.0-test7
    server: Identical to Peter's test server
    test:   ls -lR on coda-4.6.0 (516 directories, 1548 files)
    cold cache: 3.09s
    warm cache: 0.05s

The numbers for local disk (ext2) are:
    cold cache: 3.57s
    warm cache: 0.03s

We're pretty much on par in the cold-cache case. I unmounted and
remounted the local filesystem to get rid of any cached data in the ext2
cache, and used cfs flush for Coda. The slight speed difference (in this
case Coda being faster) can be attributed to the fact that the disk
caches on the server were in fact still warm and we probably never even
had to touch the disk on the server.
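
For anyone who wants to repeat this kind of measurement, a minimal timing
harness could look like the sketch below. It is not the script used for
the numbers above. The helper names are invented, and the flush command
is passed in by the caller because it differs per filesystem (cfs flush
for Coda, unmount and remount for a local disk).

```python
import subprocess
import sys
import time


def time_command(cmd, flush_cmd=None):
    """Return the wall-clock seconds taken by cmd.

    If flush_cmd is given, it runs first (untimed) to empty the cache.
    """
    if flush_cmd is not None:
        subprocess.run(flush_cmd, check=True, stdout=subprocess.DEVNULL)
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def cold_and_warm(cmd, flush_cmd):
    """Time cmd once right after a flush (cold), then again (warm)."""
    cold = time_command(cmd, flush_cmd)
    warm = time_command(cmd)
    return cold, warm


if __name__ == "__main__":
    # Harmless placeholder commands so the sketch runs anywhere.
    noop = [sys.executable, "-c", "pass"]
    cold, warm = cold_and_warm(noop, noop)
    print(f"cold: {cold:.3f}s  warm: {warm:.3f}s")
```

In a real run, cmd would be the recursive ls on the test tree and
flush_cmd whatever empties the cache for the filesystem under test.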

Peter's numbers were:
    client: PII 266MHz, 80MB RAM, IDE, probably 100Base-T network
    OS:     Linux-2.1/2.2?
    test:   ls -lR on unknown tree (~300 directories, ~1500 files)
    cold cache: 0m9.710s
    warm cache: 0m0.500s

So currently we are about 3 times faster on a cold cache and are in fact
approaching the speed of ext2 pulling the same data off the local disk.
This is nice because the network, server hardware and server OS haven't
changed over the past years. Viotti is basically still running the same
RedHat 5 release modulo some security fixes. So on the server side, any
improvements are attributable to better Coda server code.
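
The cold-cache ratios can be checked directly from the timings quoted
above. The variable names are just labels for those numbers.

```python
# Cold-cache timings in seconds, copied from the listings above.
PETER_COLD_S = 9.710  # Peter's run
CODA_COLD_S = 3.09    # current Coda client
EXT2_COLD_S = 3.57    # local ext2 on the same client

speedup_vs_peter = PETER_COLD_S / CODA_COLD_S
coda_vs_ext2 = CODA_COLD_S / EXT2_COLD_S

print(f"cold-cache speedup over Peter's run: {speedup_vs_peter:.2f}x")
print(f"Coda time as a fraction of ext2 time: {coda_vs_ext2:.2f}")
```

This gives roughly 3.1x for the speedup, and Coda at roughly 0.87 of the
ext2 time.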

We're about 10 times faster for the warm cache case, which is not such a
big surprise because my current CPU is approximately 10x faster. On the
other hand, my client cache is probably about 10x larger as well, which
slows down my client significantly because it has more objects to
manage and there are still codepaths that are O(n^2), comparing every
cached file to all other files. We're probably still about twice as slow
as ext2, but 0.02 seconds isn't such a big deal.
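
To see why a 10x larger cache hurts an O(n^2) codepath so much, here is a
small sketch that counts all-pairs comparisons. The cache sizes are
made-up round numbers, not the real object counts on either client, and
the functions are illustrations rather than anything from the Coda
client.

```python
from itertools import combinations


def pairwise_comparisons(n):
    """Number of distinct pairs among n cached objects: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def count_by_enumeration(objects):
    """Count the same thing the slow way, by visiting every pair."""
    return sum(1 for _ in combinations(objects, 2))


# Hypothetical cache sizes, chosen only to show the 10x step.
small_cache, large_cache = 1_000, 10_000
growth = pairwise_comparisons(large_cache) / pairwise_comparisons(small_cache)

print(f"{small_cache} objects: {pairwise_comparisons(small_cache)} comparisons")
print(f"{large_cache} objects: {pairwise_comparisons(large_cache)} comparisons")
print(f"10x more objects means about {growth:.0f}x more comparisons")
```

So a CPU that is 10x faster does not fully make up for a cache that is
10x larger wherever such a codepath is hit.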

Best of all, we got all this by simply cleaning up code instead of
trying to optimize :)

Jan
Received on 2003-12-29 15:13:58