Coda File System

Re: a "compatibility" feature which achieves the opposite of its goals :-/

From: Ivan Popov <pin_at_medic.chalmers.se>
Date: Thu, 2 Feb 2006 11:55:47 +0100

Hi Jan,

it is inspiring that you are aware of the problems.

On Thu, Feb 02, 2006 at 12:18:20AM -0500, Jan Harkes wrote:

> The kernel couldn't be
> allowed to cache inode information, which pretty much kills performance.
> The directory cache entries can't exist without valid inodes, so even
> lookups ended up always going back to venus. There was also a race when
> several local users are doing a near-simultaneous lookup, because the
> lookups are uncached they take long enough to get the second one waiting
> for the first one to complete and it then skips the revalidation/upcall 
> (normal filesystems don't have different stat/lookup results based
> on the calling user) and returns the wrong information for the second
> user.

Oh no! Yet again, assumptions about filesystem behaviour are hardcoded
into the kernel... I wonder whether we face similar problems on non-Linux
systems as well.

One possibility would be to let a fetched file belong to the local uid
that caused the fetch, and later possibly to the local uid that updated the
file. Of course it _should_NOT_[*] mean anything for the access semantics,
but at least it would placate the needlessly paranoid programs.
Hopefully it is also more compatible with the Linux kernel internals.

[*] No process should be able to go directly into the cache and modify
a container file, even if the file happens to belong to its uid.
Protecting the whole cache from all non-root uids should do the trick.
It would be best if the owner information did not reside in container file
inodes at all.

> > more consistent with global filesystem usage, namely stat() to return
> > the local uid of the calling process as the file "owner".
> 
> But why would the calling process be the correct userid. In fact
> anything that tries to use chown will probably want to set the ownership
> to something that is explicitly _not_ the calling user, and as a result
> still fail as horribly as it does now.

In my particular case chown() comes into the picture when the application
detects that a file has the wrong owner. It presumably just assumes that
it is running setuid root and tries to chown() the file to the "right owner",
which is in fact the caller's uid. So the proposed change in stat() would
eliminate the attempt to chown() to the calling uid. Even otherwise, chown()
could become harmless if the kernel returns success on a no-change chown().
A test on Linux 2.4 shows that a no-op chown("file",<sameuid>,<samegid>)
does indeed succeed for a non-root user on their own files.
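
The no-op chown() test can be repeated with a few lines of Python. This is
only a sketch, assuming a POSIX system where os.chown() is available; the
helper names noop_chown_succeeds() and demo() are made up for illustration,
and the result on a given kernel or filesystem is whatever that system
decides.

```python
import os
import tempfile


def noop_chown_succeeds(path):
    """Return True if chown() to the file's current uid/gid is permitted."""
    st = os.stat(path)
    try:
        # Ask for exactly the ownership the file already has.
        os.chown(path, st.st_uid, st.st_gid)
    except PermissionError:
        return False
    return True


def demo():
    # Create a scratch file owned by the calling user, test it, clean up.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        return noop_chown_succeeds(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print("no-op chown permitted:", demo())
```

Run as an ordinary user on a local filesystem, this exercises the same
case as the Linux 2.4 test mentioned above.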

> As far as I know, the AFS way was to use a modified ls binary which uses
> an ioctl to map the filesystem identities to usernames. i.e. instead of
> relying on the /etc/passwd, NIS, or LDAP database lookups, ask venus,
> which asks the server, which performs a lookup in the pts database.

It is a cosmetic feature which alas does not help "compatibility".

AFS exposes the server-side "uid" on the client, then applies workarounds
to partially undo the harm. Unfortunately, this kind of workaround is not
going to help Coda: we cannot require replacing "ls" on the clients with
a Coda-specific version, as it would certainly collide with some OS-specific
ones, some AFS-specific ones :) and possibly other specific ones...

Then again, my particular problems do not stem from "ls" producing unsuitable
output. They arise at the level of the stat() syscall interface.

> It is similar to how Coda shows the ACLs, the getacl RPC returns the
> usernames as strings because there is no simple mapping to local
> userids.

The output format of "ls" is by design Unix-centric so Coda does it right
by letting a _different_ application do the job.

> Since Coda doesn't really use identities, maybe everything should be
> mapped to a fixed local identity (coda/coda), but allow chown by anyone,
> but not actually forward this change to the servers. So the setting will
> only survive for as long as the object is locally cached, which
> hopefully will be long enough to avoid problems with applications that
> expect UNIX semantics.

An elegant idea, but I expect that the applications which automatically try
to fix the ownership are a small fraction of those which just check and
bail out.
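
The two kinds of application behaviour can be sketched as follows. This is
an illustration only, assuming a POSIX system; owner_check() and its return
strings are hypothetical and not taken from any real program. With
try_fix=False it models the "check and bail out" case, with try_fix=True
the "try to fix the ownership" case.

```python
import os
import tempfile


def owner_check(path, expected_uid, try_fix=False):
    """Return "ok", "fixed" or "refused" for an ownership check on path."""
    if os.stat(path).st_uid == expected_uid:
        return "ok"
    if try_fix:
        try:
            # -1 leaves the group unchanged.
            os.chown(path, expected_uid, -1)
            return "fixed"
        except OSError:
            pass
    # The common case: the program gives up on a "wrong" owner.
    return "refused"


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        print(owner_check(path, os.stat(path).st_uid))
    finally:
        os.remove(path)
```

A local-only chown() helps only the second kind of program; the first kind
never calls chown() at all and depends entirely on what stat() reports.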

> Not having to update owner/group/mode attributes would reduce write
> traffic and as a result there is less potential for conflicts.

That is in fact a nice side effect of not storing the local client's "owner ids"
on the server.
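
To make the suggestion quoted above concrete (a fixed local identity, chown
allowed for anyone but never forwarded), here is a toy in-memory model. It
is not Coda or venus code; the class ToyClientCache, its methods, and the
path and user names are all invented for the sketch.

```python
from dataclasses import dataclass, field

# Hypothetical fixed local identity reported for every object.
FIXED_OWNER = ("coda", "coda")


@dataclass
class ToyClientCache:
    overrides: dict = field(default_factory=dict)
    server_writes: int = 0

    def stat_owner(self, path):
        # Report the local override if there is one, else the fixed identity.
        return self.overrides.get(path, FIXED_OWNER)

    def chown(self, path, user, group):
        # Recorded locally only; deliberately not counted as a server write.
        self.overrides[path] = (user, group)

    def store(self, path):
        # A data write is the only thing that reaches the server here.
        self.server_writes += 1

    def evict(self, path):
        # Once the object leaves the cache, the local override is lost.
        self.overrides.pop(path, None)


if __name__ == "__main__":
    cache = ToyClientCache()
    cache.chown("/coda/example/file", "ivan", "users")
    print(cache.stat_owner("/coda/example/file"), cache.server_writes)
```

In this model a chown() changes what stat_owner() reports without adding
to server_writes, and the setting disappears on eviction.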

Note that the "file owner" is traditionally used in a Unix system as a means
of attribution, say for tracing misbehaviour - though even in Unix
the identity that changed the file contents is not necessarily the "owner"!

I think Coda should [continue to?] store the identity of the last writer
to a file, and show it via something like "cfs ls". That is a piece of
metainformation missing in Unix, but definitely useful. Of course, the identity
displayed should be a string with a meaning in the corresponding realm.

(A thought: cross-realm mountpoints make it impossible to see from the path
which realm a file belongs to, so the realm should be indicated as well...)

I hope this annoying issue can be fixed soon.
Coda is just beginning to show its potential, so let's clear the way for it.

Thanks for the great work.

Regards,
--
Ivan
Received on 2006-02-02 05:57:11