Coda File System

Re: The system crashed...

From: Lionix <lio_at_absium.com>
Date: Tue, 09 Dec 2003 02:39:09 +0100
Hi all!

>But in fact things have been improving steadily. I haven't really had
>many problems lately.
>

As the volume log was the origin of the trouble, I bet everything will work
fine for me once I change the size parameter....

>That sure looks like it,
>

Good news! I can still understand C! :-)
I hope that one day I'll be able to write a few lines of Coda code...

>Now one question is, is this a replicated volume, or only a single
>replica?
>

I only work with replicated volumes...

>Hmm, the server was asked to create files, but the client never actually
>stored data into them. Possibly not-yet reintegrated.
>

Well, I should stop venus until I get everything OK on the server side....
It's hard to think straight when you are tired: I did not stop venus because all
clients were in a disconnected state!
When the server starts up, venus will certainly start reintegrating pending
operations....

>>    
>>
>You could create an entry for this volume in /vice/srv/skipsalvage, or
>was it /vice/vol/skipsalvage... Any case the content would be
>
>1
>0x1000013
>
>The 1 indicates that one volume id will follow, and then the volumeid
>that should be ignored during salvage. This should bring your server up
>with everything but this one problematic volume.
>  
>

Very good tip !!! Thanks....
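
For reference, the file format described in the quoted tip (a count line, then the volume ids) can be sketched in a few lines of Python. This is only an illustration: the helper name `skipsalvage_content` is hypothetical, and putting each id on its own line when there are several is an assumption extrapolated from the single-id example. The sketch deliberately does not write to disk, because the quoted message itself is unsure whether the file lives in /vice/srv or /vice/vol.

```python
def skipsalvage_content(volume_ids):
    """Build the text of a skipsalvage file: a count line, then one id per line.

    Hypothetical helper; the one-id-per-line layout for several ids is an
    assumption based on the single-id example in the quoted message.
    """
    # Accept ids either as hex strings ("0x1000013") or as integers.
    ids = [int(v, 16) if isinstance(v, str) else int(v) for v in volume_ids]
    # First line: how many ids follow; then each id in 0x... form.
    lines = [str(len(ids))] + [hex(i) for i in ids]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the content for the problematic replica from this thread.
    print(skipsalvage_content(["0x1000013"]), end="")
```

The printed text can then be saved by hand to whichever skipsalvage location the server actually reads.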

>Hmm, old server maybe still running or something? killall -9 codasrv and
>retry.
>

Wooooops.... Seems I was really out....

> If you manage to get this one up and running without a problem
>and the 0x1000013 volume is part of a replicated volume you can get
>everything back up and running without too much hassle.
>

As the trouble impacts multiple volumes (?), it certainly won't be fully
fixed today...
Luckily, the project is not in production yet....
"Yes, I know I'm late, and no, I don't know when I'll be ready... Hopefully
before 2004... Rome wasn't built in a day!"
( sigh )

>You would have to delete the corrupt replica (0x1000013), and then
>recreate it as an empty volume. If everything is done right,
>server-server resolution will simply copy everything back from the
>surviving replica.
>  
>

So the idea is: try to get at least one replica up for each volume,
skip salvage for the corrupted replicas,
and use the valid ones to rebuild the others....

It's certainly less brutal than the approach I was about to take....

>You need to know... Replicated volumeid for 0x1000013, something like
>0x7f0000XX. The unique volumename, this depends a bit on what the other
>one happens to be named as. Let's say your replicated volume is 'volume',
>then one replica will be 'volume.0' and the other 'volume.1'. So you
>have to check (volutil info?) what the name of the surviving replica
>is. Finally you need to know/decide where this volume should be stored
>(vicepa)
>  
>

No trouble, identifying and purging volume replicas is an aspect of Coda
I have already had to deal with...
I generally use volutil getvolumelist.... Or the powerful pair
bldvldb.sh - BigVolumeList on the SCM! :-)
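
The naming rule from the quoted message ('volume' has replicas 'volume.0' and 'volume.1') can be sketched as a small Python check that derives the name to give the recreated replica from the surviving one. The function name `missing_replica_name` is hypothetical, and the sketch assumes exactly two replicas with the suffixes .0 and .1 as in the quoted example; real names should still be confirmed on the server.

```python
def missing_replica_name(replicated_name, surviving_replica):
    """Return the name of the replica to recreate, given the surviving one.

    Hypothetical helper; assumes a two-replica volume whose replicas are
    named '<volume>.0' and '<volume>.1', as in the quoted example.
    """
    prefix = replicated_name + "."
    if not surviving_replica.startswith(prefix):
        raise ValueError("replica name does not belong to this volume")
    suffix = surviving_replica[len(prefix):]
    if suffix not in ("0", "1"):
        raise ValueError("unexpected replica suffix: " + suffix)
    # The missing replica carries the other of the two suffixes.
    return prefix + ("1" if suffix == "0" else "0")


if __name__ == "__main__":
    print(missing_replica_name("volume", "volume.0"))
```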

Now that I'm looking at this file, I see there must be a bug in my volume
creation script: before each VSG decay change, something goes wrong, and
I end up with some alphanumeric volume ids... I did not check before! Wooops
again...
This typically means that I used the same VSG for two or more createvol_rep
commands!
As the servers have never complained about it, I pushed the work "backup volume /
new volume creation / volume restore / mount-point modification / old
volume purge" lower on my to-do list....
Should I reconsider this, since it could hurt Coda's behaviour?

Thanks for your help and all these explanations...

-- 
Lionix
FS-Realm (newbee?) Administrator
Hundreds hours of work but so powerful !
And still have to work a lot !
Received on 2003-12-08 20:48:30