Coda File System

The system crashed...

From: Lionix <lio_at_absium.com>
Date: Fri, 05 Dec 2003 03:30:05 +0100
 Hi all, Hi Jan

I did not think I would be writing again so soon....
I hope you will forgive me for disturbing you.
( I also hope I'm not annoying all of you ! )

I'm currently using rsync to transfer data from NFS to Coda....
Browsing the ML, I know that it's not the best idea :
http://www.coda.cs.cmu.edu/maillists/codalist/codalist-2003/5122.html

The last change to codaproc2.cc was 9 months ago, so it's in the 6.0.x builds.
And I can't consider spamming the ML a "politically correct" solution.
But I admit I've been writing a lot these last weeks... :o)

So I started playing with the cache size parameter, keeping an eye on the load 
average....
 Resolving problems as best I could...  practicing and trying to improve...

For the first time I got a server-side problem during rsync.

====SrvErr====
No waiters, dropped incoming sftp packet
No waiters, dropped incoming sftp packet
[....]
No waiters, dropped incoming sftp packet
Assertion failed: l, file "recov_vollog.cc", line 309
EXITING! Bye!
==========

OK ! Some trouble in recovering the volume log....
Let me guess, and correct me if I'm wrong...
Something like: this function returns a pointer into the volume log, after 
it tried to grow it ( grow the index ? ) because it wants to insert a new 
transaction log record ?
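
To make that guess concrete, here is a toy model in Python. It is not taken from recov_vollog.cc; the class, its method names and the slot limits are all invented for illustration. It only shows the general pattern of "look for a free slot, try to grow the index once, and let the caller assert that a slot came back":

```python
class ToyVolumeLog:
    """Toy stand-in for a per-volume log; NOT the real recov_vol_log class."""

    def __init__(self, initial_slots=4, max_slots=8):
        self.max_slots = max_slots            # hypothetical hard limit
        self.slots = [None] * initial_slots   # None marks a free slot

    def _grow(self):
        """Double the index, capped at max_slots. Returns True if it grew."""
        if len(self.slots) >= self.max_slots:
            return False
        new_size = min(len(self.slots) * 2, self.max_slots)
        self.slots.extend([None] * (new_size - len(self.slots)))
        return True

    def alloc_record(self):
        """Return the index of a free slot, growing once if needed, else None."""
        for _attempt in range(2):
            for i, entry in enumerate(self.slots):
                if entry is None:
                    return i
            if not self._grow():
                break
        return None

    def append(self, record):
        slot = self.alloc_record()
        # Similar in spirit to "Assertion failed: l": the caller assumes
        # allocation always succeeds and aborts when it does not.
        assert slot is not None, "log full and cannot grow"
        self.slots[slot] = record
        return slot


log = ToyVolumeLog(initial_slots=2, max_slots=4)
for n in range(4):
    log.append(f"txn-{n}")      # fills the log, growing it once on the way
try:
    log.append("txn-4")         # no free slot and no room left to grow
    overflowed = False
except AssertionError:
    overflowed = True
```

If the real server behaves anything like this, a burst of reintegration traffic could fill the log faster than it is allowed to grow, which would fit with the assertion firing in the middle of an rsync.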

====SrvLog ends with:
23:17:06 GetAttrPlusSHA: Computing SHA 1000013.188b2.da5a, disk.inode=7233
23:17:06 GetAttrPlusSHA: Computing SHA 1000013.188b6.da5b, disk.inode=7234
23:17:06 PutReintegrateObjects: stale directory fid 0x7f000014.212b.696c, num 0, max 50
23:17:06 PutReintegrateObjects: stale directory fid 0x7f000014.2133.696e, num 1, max 50
23:17:06 PutReintegrateObjects: stale directory fid 0x7f000014.2137.696f, num 2, max 50

Venus was reintegrating and trying to log transactions...
I did not change the resolution log settings on the replicated volume, as venus had 
never complained about a lack of space during my first rsyncs.... I'm still 
using the default parameter... It was the next parameter I was about to play 
with; now that I understand what it is, I'll perhaps be obliged to start 
there....

I tried to restart codasrv on one of the two servers....
 Perhaps a bad idea....

======SrvErr
Assertion failed: size == s, file "recov_vollog.cc", line 386
EXITING! Bye!

Uh oh....  The SalvageLog function... I don't understand much of it, but I see 
something walking the volume log to check whether space could be freed; otherwise 
it tries to increase the log size, no ?
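
As a rough picture of what such a salvage pass might do, here is a small Python sketch. It is a toy and not Coda's SalvageLog: the function name, its arguments and the data layout (one record per block, a list of block numbers plus an allocation bitmap) are assumptions made up for the example. It reports blocks that are marked allocated but not referenced by any record, and checks that the number of records found matches the number stored in the header:

```python
def salvage_log(index, bitmap, recorded_size):
    """Toy salvage pass over a log index; NOT Coda's SalvageLog.

    index: block numbers referenced by live log records (hypothetical layout)
    bitmap: list of bools, True if the block is marked allocated
    recorded_size: number of live records the log header claims
    Returns the blocks that are allocated but unreferenced.
    """
    referenced = set(index)
    freeable = [b for b, allocated in enumerate(bitmap)
                if allocated and b not in referenced]
    counted = len(index)
    # Similar in spirit to "Assertion failed: size == s": the count found
    # by walking the log must match the count stored in the header.
    assert counted == recorded_size, (
        f"log inconsistent: counted {counted}, header says {recorded_size}")
    return freeable


# Consistent log: three records, six allocated blocks.
freeable = salvage_log(index=[0, 1, 2], bitmap=[True] * 6, recorded_size=3)

# Inconsistent log: the header claims one more record than the walk finds.
try:
    salvage_log(index=[0, 1, 2], bitmap=[True] * 6, recorded_size=4)
    consistent = True
except AssertionError:
    consistent = False
```

In this picture, a crash in the middle of adding a record could leave the header and the index disagreeing, and every later salvage would then trip over the same check. Whether that is what actually happened here is only a guess.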

==== having a look at SrvLog :
00:43:13 Force salvage of all volumes on this partition
00:43:13 Scanning inodes in directory /vicepa...
00:43:14 SFS: There are some volumes without any inodes in them
00:43:14 SFS:No Inode summary for volume 0x1000001; skipping full salvage
00:43:14 SalvageFileSys: Therefore only resetting inUse flag
00:43:14 Entering DCC(0x1000002)
00:43:14 DCC: Salvaging Logs for volume 0x1000002
00:43:14 done:  57 files/dirs,  62 blocks
00:43:14 SFS:No Inode summary for volume 0x1000003; skipping full salvage
00:43:14 SalvageFileSys: Therefore only resetting inUse flag
00:43:14 SalvageIndex:  Vnode 0x2ae has no inodeNumber
00:43:14 SalvageIndex: Creating an empty object for it
00:43:14 SalvageIndex:  Vnode 0x2b2 has no inodeNumber
[...]
00:43:14 DCC: Salvaging Logs for volume 0x1000010
00:43:14 done:  239 files/dirs, 1841 blocks
00:43:14 Entering DCC(0x1000011)
00:43:14 DCC: Salvaging Logs for volume 0x1000011
00:43:14 done:  133 files/dirs, 4940 blocks
00:43:14 Entering DCC(0x1000012)
00:43:14 DCC: Salvaging Logs for volume 0x1000012
00:43:14 recov_vol_log::SalvageLog: Block 3 could be freed
00:43:14 recov_vol_log::SalvageLog: Block 4 could be freed
00:43:14 recov_vol_log::SalvageLog: Block 5 could be freed
00:43:14 done:  1002 files/dirs,        238783 blocks
00:43:14 Entering DCC(0x1000013)
00:43:22 DCC: Salvaging Logs for volume 0x1000013

Reading this, it seems I wasn't syncing just one volume....
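
To find out which volume the salvager was working on when the server died, the SrvLog can be scanned mechanically. The Python sketch below relies only on the line formats visible in the excerpt above ("DCC: Salvaging Logs for volume ..." followed by a "done:" line); the function name is made up, and other SrvLog versions may print these lines differently:

```python
import re

SALVAGE = re.compile(r"DCC: Salvaging Logs for volume (0x[0-9a-fA-F]+)")
DONE = re.compile(r"done:\s+\d+ files/dirs,\s+\d+ blocks")


def unfinished_volumes(lines):
    """Return volume ids whose log salvage started but never printed 'done:'."""
    pending = None
    unfinished = []
    for line in lines:
        match = SALVAGE.search(line)
        if match:
            if pending is not None:
                # A new salvage began before the previous one finished.
                unfinished.append(pending)
            pending = match.group(1)
        elif DONE.search(line) and pending is not None:
            pending = None
    if pending is not None:
        unfinished.append(pending)
    return unfinished


# Tail of the SrvLog excerpt quoted in this message.
sample = """\
00:43:14 Entering DCC(0x1000012)
00:43:14 DCC: Salvaging Logs for volume 0x1000012
00:43:14 recov_vol_log::SalvageLog: Block 3 could be freed
00:43:14 done:  1002 files/dirs,        238783 blocks
00:43:14 Entering DCC(0x1000013)
00:43:22 DCC: Salvaging Logs for volume 0x1000013
""".splitlines()

result = unfinished_volumes(sample)
```

On the excerpt above, result contains only 0x1000013, the volume whose log salvage never reached a "done:" line before the assertion in SrvErr.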

Another restart failed with the same SrvErr message.... And the same 
pattern in SrvLog.
Should I keep trying until it succeeds in starting :- ?

I then went to the other server and restarted it, hoping to put it in the same 
state as its "brother", but it still refuses to start, without any errors..
It freezes at:
02:55:33 Main thread just did a RVM_SET_THREAD_DATA
02:55:33 Setting Rvm Truncate threshhold to 5.

I think one way to solve the problem would be something like 
truncating / reinitialising the RVM log on both servers.
And then reinitialising venus... Any uncommitted / unfinished transactions would be 
lost....

Any comments would be appreciated.
Thanks

Lionix
FS-Realm (newbee?) Administrator
Hundreds hours of work but so powerful !
And still need to work
Received on 2003-12-04 21:31:57