Coda File System

Re: Dir entry XX, should be in hash bucket X

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Wed, 1 Sep 2004 16:58:27 -0400
On Wed, Sep 01, 2004 at 02:18:18PM -0500, Troy Benjegerdes wrote:
> I saw the following in the venus logs.. is this an error on the server?

Not sure, it could be corruption during transfer.

> Dir entry 0x539c78a8, name Various Artists-ΑΊΖΊΣ.m3u, should be in hash
					     ^^^^^
					     corrupt name??

> bucket 41, but is in 87
> DIR: 0x539c6f08,  LENGTH: 4096
> 
> HASH TABLE:
> (0 52) (1 89) (2 40) (6 27) (7 87) (9 91) (13 42) (15 73) (17 62) (24
> 16) (26 28) (29 41) (30 45) (31 70) (32 85) (36 56) (40 60) (43 46) (46
> 13) (47 75) (58 21) (68 29) (76 23) (78 67) (79 43) (87 77) (88 58) (98
> 50) (102 44) (105 79) (106 81) (108 48) (111 39) (112 31) (116 24) (117
> 37) (121 36)

I just looked at the source to see what this 'hash table' debug print is
about. It looks like we iterate through all the buckets and print the
bucket number and the number of entries in that slot whenever the bucket
is non-empty. First off, the hash algorithm looks very poor: most
buckets are empty, and the ones that are non-empty seem to have a lot
of entries.

In any case, it looks like the hash is computed from the name:

    hash = 0
    for letter in name:
        hash = hash * 173 + letter
    bucket = hash % 128

My guess is that the name somehow got corrupted, not sure if it is only
on the client, or whether the server copy is bad as well. The end result
is that the 'new' name hashes to a different value. The original name
probably hashed nicely to bucket 87.

What I find strange is that 5 bytes seem to be bad, not the 4 bytes
that are typical when some code dumps an integer in the wrong place.

The hex values appear to be 0xC1 0xBA 0xC6 0xBA 0xD3, all with the high
bit set. If I clear the high bits I get "A:F:S"; does that ring a bell?

I just copied the hash function into a little test program and...

    $ ./hash "Various Artists-ΑΊΖΊΣ.m3u"
    87
    $ ./hash "Various Artists-A:F:S.m3u"
    41

So it does in fact look like it is in the right bucket for the funny name.
Maybe the name isn't corrupt, but for some reason the hash function doesn't
get it right on your machine (char is a 7-bit datatype?).

Jan

/* Coda directory hash function */
/* compile with gcc -o hash hash.c */
#include <stdio.h>
#define NHASH 128
int DIR_Hash (const char *string)
{
    char tc;
    int hval, tval;

    hval = 0;
    while( (tc=(*string++)) )
        {hval *= 173;
        hval  += tc;
        }
    tval = hval & (NHASH-1);
    if (tval == 0) return tval;
    else if (hval < 0) tval = NHASH-tval;
    return tval;
}

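int main(int argc, char **argv)
{
    if (argc < 2) {   /* argv[1] is NULL without an argument */
        fprintf(stderr, "usage: %s name\n", argv[0]);
        return 1;
    }
    printf("%d\n", DIR_Hash(argv[1]));
    return 0;
}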
Received on 2004-09-01 17:03:26