Coda File System

Re: coda endian problems??

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Wed, 14 Jul 2004 16:50:02 -0400
On Wed, Jul 14, 2004 at 12:49:20PM -0500, Troy Benjegerdes wrote:
> > The last backtrace led me to believe some RPC2 thing wasn't endian-safe,
> > but this makes me not so sure. Any ideas?

Actually, the problem there was that we passed a (struct ViceFid) instead
of a (struct ViceFid *) into the RPC2 layer. The trouble with MultiRPC
calls is that the arguments we pass down are not type-checked.
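To illustrate why the missing type-checking bites, here is a minimal sketch (not the actual RPC2/MultiRPC code) of a variadic C function that expects struct ViceFid * arguments. The three 32-bit fields of the structure and the function name sum_vnodes are simplifications assumed for illustration only.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>

/* Simplified stand-in for Coda's struct ViceFid (assumption: three
 * 32-bit fields; the real definition lives in the Coda headers). */
struct ViceFid {
    uint32_t Volume;
    uint32_t Vnode;
    uint32_t Unique;
};

/* Hypothetical variadic function in the style of a MultiRPC call:
 * every argument after 'nargs' is expected to be a struct ViceFid *.
 * Because those parameters are declared as '...', the compiler cannot
 * check that callers really pass pointers. */
uint32_t sum_vnodes(int nargs, ...)
{
    va_list ap;
    uint32_t total = 0;

    assert(nargs >= 0);
    va_start(ap, nargs);
    for (int i = 0; i < nargs; i++) {
        /* Reads the next argument as a pointer, whatever was passed. */
        struct ViceFid *fid = va_arg(ap, struct ViceFid *);
        total += fid->Vnode;
    }
    va_end(ap);
    return total;
}
```

Calling sum_vnodes(2, &a, &b) is correct. Calling sum_vnodes(1, a), passing the structure itself, still compiles because nothing checks the types behind '...', but va_arg then reinterprets the structure's bytes as a pointer. That is undefined behavior and typically ends in a crash far away from the faulty call site.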

> > (gdb) bt
> > #0  0x4017c116 in sigsuspend () from /lib/tls/libc.so.6
> > #1  0x080c8738 in SigChoke (sig=354089116) at sighand.cc:241
> > #2  <signal handler called>
> > #3  0x0807509e in fsobj::Open (this=0x50a36d08, writep=0, execp=0,truncp=0,
> >     cp=0x151b1e18, uid=1000) at fso.h:686

This has been reported by others; I don't think I have found the actual
cause of this crash yet.

> I just got another one of these after doing a
> 'cunlog; ctokens'

I just tried it, and it doesn't do anything for me. It doesn't even seem
to remove my current tokens, which kind of surprises me.

> Also, if I kill venus and restart it with 'venus -init', I wind up with
> /coda mounted twice. Should we have a check in venus to unmount and
> remount coda? Or should the kernel module do something different? I am
> using the coda.ko module that's in 2.6.7 debian kernel packages.

The kernel no longer complains about mounting on top of an already
active mountpoint, but it still prevents us from unmounting a
filesystem with active references. So venus often cannot unmount the
old tree, especially if it just crashed while handling an upcall (which
implies at least one active reference).
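A check like the one suggested above could start by counting how often /coda appears in the mount table. The sketch below uses glibc's setmntent/getmntent/endmntent; the helper name count_mounts is hypothetical, the table path is a parameter so it can point at /proc/mounts or a test file, and it assumes the filesystem type is reported as "coda".

```c
#include <assert.h>
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count entries in a mount table (for example
 * /proc/mounts) whose mount point is 'dir' and whose filesystem type
 * is 'fstype'. Returns -1 if the table cannot be opened. */
int count_mounts(const char *table, const char *dir, const char *fstype)
{
    FILE *f;
    struct mntent *ent;
    int n = 0;

    assert(table != NULL && dir != NULL && fstype != NULL);
    f = setmntent(table, "r");
    if (f == NULL)
        return -1;
    while ((ent = getmntent(f)) != NULL) {
        if (strcmp(ent->mnt_dir, dir) == 0 &&
            strcmp(ent->mnt_type, fstype) == 0)
            n++;
    }
    endmntent(f);
    return n;
}
```

A result greater than 1 from count_mounts("/proc/mounts", "/coda", "coda") would reveal the double mount. Detecting it does not make it fixable, though: as described above, umount(2) on the stale tree would typically fail with EBUSY while the kernel still holds references into it.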

> Is it reasonable to have a new venus reconnect to an already-mounted
> /coda? Or are we just out of luck if venus dies?

The problem with reattaching is that venus doesn't know which files are
still considered open. So we don't know when we can safely throw things
out of the cache, or when to write data back to the servers. Maybe it is
possible to make the kernel kill/disable all active references, or to
pass a list of currently open files back to userspace when we reconnect.

That does seem like a lot of effort for something that (hopefully)
doesn't really happen all that often.
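If the kernel did pass back a list of currently open files on reconnect, venus could use it roughly as follows: rebuild per-object open counts from that list and refuse to evict any cache entry that is still referenced. This is a hypothetical sketch; the simplified struct ViceFid, struct cache_entry, reattach_open_counts and can_evict are illustrative assumptions rather than venus data structures, and the kernel interface that would supply the list is likewise assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a Coda file identifier (assumption). */
struct ViceFid {
    uint32_t Volume;
    uint32_t Vnode;
    uint32_t Unique;
};

/* Hypothetical cache entry: how many kernel references keep it open. */
struct cache_entry {
    struct ViceFid fid;
    int open_count;
};

static int fid_equal(const struct ViceFid *a, const struct ViceFid *b)
{
    return a->Volume == b->Volume && a->Vnode == b->Vnode &&
           a->Unique == b->Unique;
}

/* Rebuild open counts from the fids the kernel reports as open (one list
 * element per open reference). Returns how many reported fids were not
 * found in the cache, i.e. references venus could not honor. */
int reattach_open_counts(struct cache_entry *cache, size_t ncache,
                         const struct ViceFid *open_fids, size_t nopen)
{
    int missing = 0;

    for (size_t i = 0; i < ncache; i++)
        cache[i].open_count = 0;
    for (size_t j = 0; j < nopen; j++) {
        size_t i;
        for (i = 0; i < ncache; i++) {
            if (fid_equal(&cache[i].fid, &open_fids[j])) {
                cache[i].open_count++;
                break;
            }
        }
        if (i == ncache)
            missing++;
    }
    return missing;
}

/* An entry may only be thrown out of the cache when nothing holds it open. */
int can_evict(const struct cache_entry *e)
{
    assert(e->open_count >= 0);
    return e->open_count == 0;
}
```

A linear scan keeps the sketch short; a real cache would look fids up in a hash table. The sketch only covers the eviction half of the problem; deciding when to write data back to the servers would need similar bookkeeping.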

Jan
Received on 2004-07-14 16:51:29