Coda File System

Re: Recurring conflicts occur when moving new files to CODA

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Wed, 5 May 2010 11:48:37 -0400
On Wed, May 05, 2010 at 07:25:04AM +0200, Wolfgang.Liebich_at_siemens-enterprise.com wrote:
> When I move some files into a directory on my coda volume, I get a
> local/global conflict (I only have one server for this volume), and
> venus.log says
> 
> [ W(17) : 0000 : 07:08:39 ] ClientModifyLog::MarkFailedMLE: failed reintegrating: chown <filename>

Interesting. Well, chown is a 'privileged' operation; I think as far as
Coda is concerned only members of the System:Administrators group are
allowed to use it. But you are probably copying files as root, or with
an application that just calls chown and figures that, when it is not
running as root, the call will fail with an error it can ignore, making
it a no-op.

But with Coda it works a bit differently, because a client cannot be
sure if your Coda identity is a member of the System:Administrators
group. So it dutifully logs the chown operation and acts as if it
succeeded, but then when you reintegrate the server rejects the
operation causing a conflict, because your client's state is now
based on the false assumption that chown worked.
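
A toy Python model of the sequence described above may make it easier to
follow. It is only a sketch: ToyClient, ToyServer and every method name
are hypothetical and are not taken from Coda's sources. The one thing it
assumes is what the paragraph says, namely that the client logs chown
optimistically and only the server can check group membership.

```python
# Toy model only: all names are hypothetical, not Coda's real code.
class ToyServer:
    def __init__(self, admins):
        self.admins = set(admins)
        self.owners = {}

    def apply(self, user, op, path, arg):
        # Only the server knows who is in the administrators group.
        if op == "chown":
            if user not in self.admins:
                raise PermissionError(f"chown {path}")
            self.owners[path] = arg
        elif op == "store":
            self.owners.setdefault(path, user)


class ToyClient:
    def __init__(self, user):
        self.user = user
        self.log = []            # stand-in for the client modify log
        self.local_owners = {}   # what the client believes

    def store(self, path):
        self.log.append(("store", path, None))
        self.local_owners[path] = self.user

    def chown(self, path, new_owner):
        # Log the operation and report success without asking the server.
        self.log.append(("chown", path, new_owner))
        self.local_owners[path] = new_owner
        return 0

    def reintegrate(self, server):
        # Replay the log in order; stop at the first rejected record.
        while self.log:
            op, path, arg = self.log[0]
            try:
                server.apply(self.user, op, path, arg)
            except PermissionError:
                return "conflict"
            self.log.pop(0)
        return "ok"
```

After a store followed by a chown from a non-admin user, reintegrate()
returns "conflict": the client still believes the owner changed, the
server never applied it, and the rejected chown sits at the head of the
log.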

There are some other odd things around ownership. Files are initially
'owned' by the local userid, and then after reintegration the owner is
changed to Coda's internal 'Coda userid' value. This confuses some
applications (OpenOffice.org) because they believe they cannot write to
the file if the uid doesn't match, instead of using something like
access(2) or simply trying to open the file O_RDWR, both of which would
work.
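
To illustrate the difference between the uid comparison and the two
checks that would work, here is a small Python sketch. The function
names are made up for this example; os.access and os.open are the
standard library wrappers around access(2) and open(2).

```python
import os


def writable_by_uid_guess(path):
    # Fragile: assumes that only the owning uid may write to the file.
    return os.stat(path).st_uid == os.getuid()


def writable_by_access(path):
    # Ask the system, as access(2) with W_OK does.
    return os.access(path, os.W_OK)


def writable_by_open(path):
    # Simply try to open the file read-write and see whether it works.
    try:
        fd = os.open(path, os.O_RDWR)
    except OSError:
        return False
    os.close(fd)
    return True
```

On a file whose reported owner differs from the local uid, the first
function returns False even when the other two would succeed.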

So maybe in the long run it would be in Coda's best interest to
start _completely_ ignoring user id values. We already mostly ignore
modebits: only owner 'r--' bits are interpreted as overriding the
directory's write ACL, and we explicitly strip setuid bits in both the
client->server and server->client directions. Not replacing the uid with
some internal Coda uid, which has no relation to anything happening on
the clients, when a file is created or written to seems to make sense to
me. Then again, because clients don't have synchronized /etc/passwd
files and pretty much everyone's local UID on their own machine is 1000,
almost every file is probably going to end up with uid 1000. This could
lead to some end-user confusion.

What to do with chown is probably a slightly harder decision. One option
would be to always return permission denied on the client, but some
applications may actually check the return code and fail badly. Another
option is to report success, but not actually change the uid, which I
think will break rpm and dpkg. The third option is to just let anyone
chown any file (that they have write permission for), because if they
control their local machine and have write ACL rights they could have
written to the file with whatever uid they please, effectively changing
the uid to whatever they want if we aren't replacing it with the
internal Coda userid anymore.
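
The three options can be put side by side in a short Python sketch. The
policy names and the function are invented for illustration, and the use
of EPERM as the error code is an assumption (it is what chown(2)
conventionally reports); none of this is existing Coda behaviour.

```python
import errno


def chown_policy(policy, has_write_acl, current_uid, requested_uid):
    """Return (error code or 0, uid the file ends up with)."""
    if policy == "deny":
        # Option 1: always refuse on the client.
        return errno.EPERM, current_uid
    if policy == "fake_success":
        # Option 2: report success but leave the uid untouched.
        return 0, current_uid
    if policy == "allow_writers":
        # Option 3: anyone with write ACL rights may change the uid.
        if has_write_acl:
            return 0, requested_uid
        return errno.EPERM, current_uid
    raise ValueError(f"unknown policy: {policy}")
```

Option 2 is the case where a caller that checks the result afterwards
sees a success code but an unchanged uid.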

This is something I need to ponder over a little longer.

> The "console" log says
> 
> 07:08:39 Reintegrate: users:liebichw, 2/3 records, result = Permission denied
> 
> When I try to do a repair, I get the error message "pathname not
> leftmost".

In this case, a local-global reintegration conflict that failed on a
permission denied error, you probably just want to run 'cfs discardlocal
/path/to/volume'. It will drop only the first entry in the modification
log, which should be the rejected chown operation.

> When I call "cfs beginrepair" and then "cfs endrepair", I kill venus.
> 
> Last log message in console:
> 
> Assertion failed: 0, file "fso1.cc", line 1388
> 
> WTF ????
> Dazed and confused,

Turning a file/directory into a dangling symlink and then into an
expanded directory with the original file as a child really messes with
some of the internal parent/child linkage. When repair ends all the
temporary files and directories are removed, but the original object
isn't immediately linked back into the directory hierarchy. There are
various reasons for this: it may have been discarded or moved during
repair, and some of the necessary information got lost during the
expansion process.

The client relies on path traversal from the root to the conflicting
object to reconstruct the right linkage, but in some cases, when there
is an active reference to an in-kernel directory cache entry, we don't
actually get to see every single directory lookup. So when you tried to
expand again we did not yet have the conflict linked back into the tree,
and we failed to find the parent directory.

Jan
Received on 2010-05-05 11:48:55