Coda File System

Re: Is Coda Right For Me.

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Wed, 16 Oct 2002 16:56:33 -0400
On Wed, Oct 16, 2002 at 04:07:45PM -0400, Kevin Atkinson wrote:
> > Actually mp3v1 tags are typically at the end of a file, so you'd have to
> > wait for the whole file anyways. 
> 
> Um no, you can seek to the end of the file and only read the end.

Not if the fileserver only gives you the data as a stream, which is more
efficient for the server as it reduces head seeks on the disks and
protocol overhead on the network.

> > Same thing with my mp3 car radio (riocar/empeg player) It stores a
> > database of all the tags in a separate file, so that it doesn't have to
> > scan through every file on the disk to show a playlist, or do searches.
> 
> This approach has its advantages such as not having to seek as much but 
> for a general purpose file system reading the first couple of bytes of a 
> file is generally not a problem.

It does become a problem when you're talking about reading about 500
bytes out of each of 10000 files vs reading a single 5MB file. It's the
difference between 30000+ disk seeks and only a handful. The problem is
that most files are several MB, and the tags are at the end. So one seek
gets us the inode, one seek the indirect block, and one seek the data
block, and then we only need to read one disk block while the kernel
actually grabs a full 4KB page. And this is assuming we don't have to
read the directory or the double or triple indirect blocks.

And userspace is blocked all this time, because the kernel can hardly
predict which file is going to be accessed next, so you get the full 
seek time (~20ms?) hit for accessing each separate file.
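
To put rough numbers on that, here is a back-of-the-envelope sketch in
Python. The 20ms seek time and the three seeks per file come from the
reasoning above; the 20MB/s sequential throughput and the count of 5
seeks for the single database file are assumptions picked only for
illustration.

```python
# Back-of-the-envelope estimate, not a measurement.
SEEK_TIME_S = 0.020                     # ~20 ms per seek (the guess above)
SEEKS_PER_FILE = 3                      # inode + indirect block + data block
NUM_FILES = 10_000
DB_SIZE_BYTES = 5 * 1000 * 1000         # one 5 MB tag database
SEQ_THROUGHPUT_BPS = 20 * 1000 * 1000   # ASSUMPTION: 20 MB/s sequential read
DB_SEEKS = 5                            # ASSUMPTION: "a handful" of seeks


def scattered_read_seconds(num_files=NUM_FILES,
                           seeks_per_file=SEEKS_PER_FILE,
                           seek_time=SEEK_TIME_S):
    """Time spent seeking when every file's tag is read separately."""
    return num_files * seeks_per_file * seek_time


def database_read_seconds(size=DB_SIZE_BYTES,
                          throughput=SEQ_THROUGHPUT_BPS,
                          seeks=DB_SEEKS,
                          seek_time=SEEK_TIME_S):
    """Time to stream one database file: a few seeks plus the transfer."""
    return seeks * seek_time + size / throughput


if __name__ == "__main__":
    print("scattered tags: %.1f s" % scattered_read_seconds())
    print("tag database:   %.2f s" % database_read_seconds())
```

With these assumed figures the scattered reads spend on the order of ten
minutes just seeking, while the single file is read in well under a
second. The exact ratio depends entirely on the assumed disk numbers.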

With the separate tag database, the kernel can simply stream through the
file and do readahead in the background, while userspace is going
through the data it just got back from the last read call. So you
typically do not even have to wait for actual disk IO to complete; the
data is already there waiting in the page cache.
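
The two access patterns can be sketched in Python as follows. This is
only an illustration: it assumes ID3v1-style tags (the last 128 bytes of
the file, starting with "TAG"), and the flat database of fixed-size
records is a hypothetical format, not the one the empeg player uses. The
function names are made up for this example.

```python
import os

TAG_SIZE = 128  # ID3v1: last 128 bytes of the file, starting with b"TAG"


def read_tail_tag(path):
    """Seek to the end of one file and read only its trailing tag."""
    if os.path.getsize(path) < TAG_SIZE:
        return None
    with open(path, "rb") as f:
        f.seek(-TAG_SIZE, os.SEEK_END)
        tag = f.read(TAG_SIZE)
    return tag if tag.startswith(b"TAG") else None


def build_tag_db(paths, db_path):
    """One-time scan: copy every trailing tag into a single flat file.

    Files without a tag get an all-zero placeholder record so that
    record N always belongs to paths[N].
    """
    with open(db_path, "wb") as db:
        for path in paths:
            tag = read_tail_tag(path)
            db.write(tag if tag is not None else b"\0" * TAG_SIZE)


def stream_tag_db(db_path):
    """Read the records back sequentially, one fixed-size record at a time."""
    with open(db_path, "rb") as db:
        while True:
            record = db.read(TAG_SIZE)
            if len(record) < TAG_SIZE:
                break
            yield record
```

Calling read_tail_tag() on every file is the seek-heavy pattern; a
playlist or search would instead iterate over stream_tag_db(), which is
a plain sequential read that the kernel can prefetch.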

> > If you're only working with blocks of a couple of seconds, split the
> > huge 1Gig files in blocks of a couple of seconds, problem solved.
> 
> Um that is a lot easier said than done.  Once again this is a workaround 
> due to the limits of the Coda file system which I do not think should be 
> necessary.

You'd be throwing the baby out with the bathwater. You simply cannot get
rid of the session semantics without at the same time losing the
functionality that Coda provides.

If you want a replicated system with UNIX semantics, you need to start
from the ground up. It requires strict locking across all nodes that
share the same data and a lot of inter-node communication, which in turn
requires a fast and reliable network. Coda is at the complete opposite
end of the spectrum: it works pretty decently in the face of slow and
unreliable connections, and it uses optimistic update strategies without
any locking, combined with passive detection of conflicts.
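
As a toy illustration of "optimistic updates plus passive conflict
detection", here is a minimal Python sketch based on version vectors.
It is not Coda's actual code; the Replica class, its methods, and the
state names are all invented for this example, and the real mechanism
is considerably more involved.

```python
def dominates(a, b):
    """True if version vector a has seen every update recorded in b."""
    return all(a.get(name, 0) >= count for name, count in b.items())


def compare(mine, theirs):
    """Classify two version vectors relative to each other."""
    mine_ge = dominates(mine, theirs)
    theirs_ge = dominates(theirs, mine)
    if mine_ge and theirs_ge:
        return "equal"
    if mine_ge:
        return "ahead"
    if theirs_ge:
        return "behind"
    return "conflict"  # both sides updated independently


class Replica:
    """Hypothetical replica holding one object and its version vector."""

    def __init__(self, name):
        self.name = name
        self.data = None
        self.vv = {}

    def write(self, data):
        # Optimistic: no lock is taken and no other replica is contacted.
        self.data = data
        self.vv[self.name] = self.vv.get(self.name, 0) + 1

    def sync_from(self, other):
        # Passive detection: a conflict only shows up when replicas compare.
        state = compare(self.vv, other.vv)
        if state == "behind":
            self.data = other.data
            self.vv = dict(other.vv)
        return state
```

A write never blocks on the network. Two replicas that were updated
while out of contact with each other get "conflict" back from
sync_from() the next time they talk, and the data is left untouched
for something else to resolve.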

Jan
Received on 2002-10-16 17:01:37