Coda File System

Re: venus randomly dies

From: Steve Simitzis <steve_at_saturn5.com>
Date: Sat, 3 May 2003 14:07:52 -0700
On 05/02/03, Jan Harkes <jaharkes_at_cs.cmu.edu> wrote: 

> I saw 04:0 something and immediately thought 'updatedb run'. It is a
> pretty decent exercise for Coda's cache replacement code. And clearly
> there are several problems still hiding there somewhere.

i'm not sure if this is helpful or not to you, but the same problem
happened again (venus dies, 30,000 lines of log spew) - except the cause
was somewhat different than updatedb trolling the filesystem.

at 11am every morning, our site puts up fresh content. typically, our
users will rush to go to check it out as soon as it comes out. since
no one has actually viewed the content, it's likely that the files will
not be in venus's cache at 11am. so venus suddenly has to handle this
rush, plus all the usual file requests it has to deal with.

i'm not sure if this is a different case of the same problem, or just
the same exact problem. either way, i thought this problem report
would be helpful. unlike the updatedb fix, we can't disable releasing
new content. :)

right at 11am, i found this in my logs:

[ W(59) : 0000 : 11:00:01 ] WAITING((0x7f000001.0x933.0x10bfa)): level = RD, rea
ders = 0, writers = 1
[ W(59) : 0000 : 11:00:01 ] WAIT OVER, elapsed = 314.2
[ W(59) : 0000 : 11:00:05 ] fsobj::StatusEq: ((0x7f000001.0x925.0x10bf3)), LinkC
ount 89 != 345

[ T(01) : 38759 : 11:00:12 ] BeginRvmFlush (1, 699836, S)
[ T(01) : 38759 : 11:00:12 ] EndRvmFlush

[ T(01) : 38760 : 11:00:17 ] BeginRvmTruncate (232, 701704, S)
[ T(01) : 38760 : 11:00:17 ] EndRvmTruncate

[ W(59) : 0000 : 11:00:40 ] Cachefile::SetLength 2048
[ W(59) : 0000 : 11:00:40 ] Cachefile::SetValidData 56705
[ W(59) : 0000 : 11:00:41 ] Cachefile::SetValidData 58946
[ W(59) : 0000 : 11:00:41 ] Cachefile::SetValidData 62356
[ W(59) : 0000 : 11:00:41 ] Cachefile::SetValidData 58310
[ W(59) : 0000 : 11:00:41 ] Cachefile::SetValidData 67581
[ W(59) : 0000 : 11:00:41 ] Cachefile::SetValidData 69145
[ W(59) : 0000 : 11:00:41 ] Cachefile::SetValidData 65433
(etc etc)

...

note that all of the file sizes in the above part of the log
correspond to the content that was released today.

...

and then just as before:

[ H(06) : 0657 : 11:02:55 ] HDBDaemon just woke up
[ H(06) : 0657 : 11:02:56 ] DataWalk:  Restarting Iterator!!!!  Reset availabili
ty status information.
[ H(06) : 0657 : 11:02:56 ] Tally for vuid=0:
[ H(06) : 0657 : 11:02:56 ] BeginRvmFlush (1, 292, F)
[ H(06) : 0657 : 11:02:56 ] EndRvmFlush
[ H(06) : 0657 : 11:02:56 ] Tally for vuid=0:

[ H(06) : 0658 : 11:02:56 ] HDBDaemon about to sleep on hdbdaemon_sync

...

[ V(04) : 39441 : 11:15:02 ] WAITING(SRVRQ):

[ W(59) : 0000 : 11:15:02 ] WAITING(SRVRQ):

[ W(57) : 0000 : 11:15:02 ] WAITING(SRVRQ):

[ V(04) : 39441 : 11:15:02 ] WAIT OVER, elapsed = 13.3
[ V(04) : 39441 : 11:15:02 ] WAITING((0x7f000001.0x10808.0xf2c5)): level = RD, r
eaders = 0, writers = 1
[ W(57) : 0000 : 11:15:02 ] WAIT OVER, elapsed = 44.2


repeat the last bit 30,000 times


[ W(53) : 0000 : 11:16:21 ] WAIT OVER, elapsed = 2.0

[ W(51) : 0000 : 11:16:21 ] WAIT OVER, elapsed = 2.1

[ W(52) : 0000 : 11:16:21 ] WAIT OVER, elapsed = 2.1
[ W(52) : 0000 : 11:16:21 ] *****  FATAL SIGNAL (6) *****


-- 

steve simitzis : /sim' - i - jees/
          pala : saturn5 productions
 www.steve.org : 415.282.9979
  hath the daemon spawn no fire?
Received on 2003-05-03 17:12:54