Coda File System

Re: Doing the repair limbo

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Mon, 9 Dec 2002 11:09:15 -0500
On Mon, Dec 09, 2002 at 10:38:21AM +0100, Steffen Neumann wrote:
> I just did a repair session (again on non-important data)
> and found a few things. Coda is some CVS Version after 5.4.19
> 
> First repair/beginrepair laptop_liste failed with an error message 
> 
> 	Gaspra:/coda/vol/ai/share # ls -l 
> 	ls: bin: Connection timed out
> 	ls: doc: Connection timed out

Disconnected from the server?

> 	repair > beginrepair laptop_liste
> 	Too few directory entries
> 	Could not allocate replica list
> 
> Even though it created the mixed view laptop_liste/local and
> laptop_liste/global. The point I had missed was that I didn't have the
> *correct* token for the user that had caused the conflict. I'd like to
> see some more elaborate checking of user/tokens and intuitive error
> messages.

Yeah, the error messages suck, but the repair tool really doesn't know
that it is a token issue, it is probably just getting ETIMEDOUT, or
EPERM type errors.

> Second I have a conflict for a file that never existed:
> 
> 	10:21:11 Local inconsistent object at /coda/vol/ai/share/laptop_lows.c, please check!
> 
> This looks like either a problem with the print output,
> where ows.c overwrote ...iste/ or laptop_liste had been 
> overwritten earlier, which is probably worse.

Maybe it is just a simple problem in the PathWalk code that is used to
create the path. I believe it reconstructs the path to the object from
the tail back to the beginning, when it hits the 'top of the tree' it
inserts '\0/coda/'. Maybe it exited earlier on an unexpected error and
never finished the expansion. I haven't looked at this in a while, so I
could be wrong.
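
To make the tail-to-head idea concrete, here is a minimal sketch of that
kind of path reconstruction. It is not the actual PathWalk code; the
struct node type, the build_path function and the "/coda" prefix argument
are all made up for illustration. The buffer is filled from the end toward
the front, so the finished path starts wherever the returned pointer says
it does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical object record: a name plus a link to the parent directory. */
struct node {
    const char *name;
    const struct node *parent;   /* NULL at the top of the tree */
};

/*
 * Build the path of n inside buf[0..len), writing from the end of the
 * buffer toward the front. Returns a pointer to the first character of
 * the finished path, or NULL if the buffer is too small.
 */
static char *build_path(const struct node *n, char *buf, size_t len,
                        const char *prefix)
{
    char *p;
    size_t k;

    assert(buf != NULL && prefix != NULL);
    if (len == 0)
        return NULL;

    p = buf + len;
    *--p = '\0';                       /* terminator goes in first */

    for (; n != NULL; n = n->parent) { /* walk from the tail upward */
        k = strlen(n->name);
        if ((size_t)(p - buf) < k + 1)
            return NULL;               /* early exit: path is incomplete */
        p -= k;
        memcpy(p, n->name, k);
        *--p = '/';
    }

    k = strlen(prefix);                /* top of the tree: add the prefix */
    if ((size_t)(p - buf) < k)
        return NULL;
    p -= k;
    memcpy(p, prefix, k);
    return p;
}
```

On the early-exit branch only the tail components have been written, and
whatever was in the front of the buffer before is still there. A caller
that ignored the error and printed the buffer anyway could show a path
with missing or stale pieces, which is the sort of failure I am guessing
at above.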

> 	Gaspra:/coda/vol/ai/share/laptop_liste # cfs lv .
> 	  Status of volume 0x7f000004 (2130706436) named "coda.vol.ai"
> 	  Volume type is ReadWrite
> ==>	  Connection State is Connected
> 	  Minimum quota is 0, maximum quota is unlimited
> 	  Current blocks used are 156644
> 	  The partition has 1723716 blocks available out of 4787924
> 	  Write-back is disabled
> ==>	  *** There are pending conflicts in this volume ***

You are right that this is a very unusual state, in fact we shouldn't be
able to enter Connected state while there are pending CML entries, and
you shouldn't have conflicts when everything is reintegrated. Can you do
'cfs ck' to snapshot a checkpoint file? Maybe it will give an error
(i.e. there is no CML). Or it will actually manage to write the .cml and
.tar files; it would be interesting to know where that ows.c file really
is located. And the log might show something.

Maybe it just 'forgot' to clear the conflict marker and everything has
reintegrated just fine. Force write-disconnected mode, create a new
empty file (touch foo), and reconnect?

Jan
Received on 2002-12-09 11:11:43