Coda File System

Re: maximum cache size in venus

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Mon, 10 May 2004 14:57:51 -0400
On Mon, May 10, 2004 at 01:16:31AM -0700, Steve Simitzis wrote:
> are there any practical limits to the size of venus's cache? i've been
> running caches in the GB range, and i seem to have problems as soon as
> df -i /coda hits 94%. (it also seems impossible to go above 94% for
> reasons that are unknown to me.)

I don't know. My cache is typically on the order of 200-250MB.
> 
> for example, with a 4GB cache, i've been getting these errors:
> 
> [ W(74) : 0000 : 08:34:20 ] fsdb::FreeFsoCount: counts disagree (174762 - 163840 != 10921)
> [ W(74) : 0000 : 08:34:20 ] fsdb::FreeFsoCount: counts disagree (174762 - 163840 != 10921)

That does seem like a correct message (174762 - 163840 = 10922 and not
10921) but I don't know why it happened. Venus is telling us that it has
a maximum of 174762 files and 163840 are considered used, but there are
only 10921 on the freelist.

So there is one object that is neither considered in-use nor found on
the freelist. It is probably this 'missing object' that triggers the
crash.
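
To make the arithmetic behind that warning explicit, here is a small
sketch in Python. It is not venus code; the function name is made up
for illustration, and the three numbers are simply the ones from the
log line quoted above.

```python
# Numbers taken from the "counts disagree" warning quoted above.
max_files = 174762    # maximum number of cache file objects
in_use = 163840       # objects considered used
on_freelist = 10921   # objects actually found on the freelist


def unaccounted(max_files, in_use, on_freelist):
    """Return how many objects are neither in use nor on the freelist.

    Hypothetical helper: it only restates the subtraction from the
    warning message and is not part of venus.
    """
    expected_free = max_files - in_use
    return expected_free - on_freelist


# A result of 0 would mean the counts agree; here one object is missing.
print(unaccounted(max_files, in_use, on_freelist))
```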

> if i bring the cache down to anywhere between 1.9GB - 3GB, i start
> seeing these errors, in addition to the others above:
> 
> [ W(34) : 0000 : 00:53:52 ] fsobj::StatusEq: (5a684608.7f000008.93726.504e9), VVs differ

I believe this is somewhat normal; it simply means that while we were
fetching a file, we noticed that it had changed in the meantime, so we
update the status to reflect this.

We count cache-sizes in 'blocks' so overall the size could be 2^31 *
1024 (or 512) bytes, but I'm sure there are places where we actually
work with this number in bytes, in which case >2GB would overflow a
signed 32-bit integer.
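
As a rough illustration of that suspicion (not taken from the venus
source), the sketch below converts a block count to bytes and then
forces the result into a signed 32-bit value. The 1024-byte block size,
the 32-bit signed counter, and both helper names are assumptions made
for the example.

```python
BLOCK_SIZE = 1024   # assumed bytes per cache block (it could also be 512)
GIB = 1024 ** 3


def to_int32(value):
    """Wrap an arbitrary integer the way a signed 32-bit counter would."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def cache_bytes_as_int32(cache_blocks, block_size=BLOCK_SIZE):
    """Hypothetical helper: byte size of the cache, stored in 32 bits."""
    return to_int32(cache_blocks * block_size)


for gib in (1, 3, 4):
    blocks = gib * GIB // BLOCK_SIZE
    # The block count itself fits easily; only the byte count misbehaves.
    print(gib, "GiB:", blocks, "blocks ->", cache_bytes_as_int32(blocks), "bytes")
```

Under these assumptions a 1GB cache is represented correctly, while a
3GB cache comes out negative and a 4GB cache comes out as zero.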

> venus crashed within an hour of hitting 94% on the 3GB venus. i haven't
> been able to crash the 1.9GB venus yet, though i'm still keeping my eye
> on it. (i purposely chose the number 1.9GB because it's less than 2GB.)
> 
> also, as soon as i hit that magic 94% mark, i start seeing lots of these,
> whenever the cache is in the GB range:
> 
> [ W(34) : 0000 : 01:05:01 ] fsobj::StatusEq: (5a684608.7f000003.1.1), LinkCount 10 != 266

Hmm, that first number is the local number, the second is what the
server reported. Locally we use an unsigned char, which maxes out at
255, which matches the datatype that the kernel is using. But the
client-server protocol uses an RPC2_Integer, which clearly can be
larger.

266 is precisely 256 + 10, which would match a wraparound on the link
count field. But... we don't have cross-directory hardlinks, and I find
it hard to believe that you would have so many links to one object in a
single directory.
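
The wraparound can be sketched in a few lines of Python. This is only
an illustration of storing a larger integer in an 8-bit unsigned field;
the helper name is invented and nothing here is copied from venus or
the kernel module.

```python
def as_unsigned_char(value):
    """Keep only the low 8 bits, as an unsigned char field would."""
    return value & 0xFF


server_link_count = 266   # value reported by the server in the log line
local_link_count = as_unsigned_char(server_link_count)

# The local copy no longer matches what the server sent.
print(local_link_count, "!=", server_link_count)
```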

Jan
Received on 2004-05-10 15:05:14