Coda File System

yet another venus crash

From: Steve Simitzis <steve_at_saturn5.com>
Date: Wed, 21 May 2003 18:27:39 -0700
today, venus died on both of my coda clients, within minutes of each
other. i was able to get a backtrace out of gdb on each host, along with
the output from the console. also, on each host, venus.log was filled
with hundreds of "WAIT OVER" / "WAITING" line pairs, ending with a fatal
error.

my setup right now consists of two venus clients and two
replicated servers, but the venus clients only know about one of the
two servers. i'm not sure if that's related. (fwiw, once i get backups
working, it's likely that i'll stop running the second replicated
server. one server is enough to handle our traffic, so it's not worth
the tradeoff of dealing with server conflicts.)

** host 1 **

17:35:09 Reintegrate: sg.media.members, 2/2 records, result = SUCCESS
venus: multi1.c:144: RPC2_MultiRPC: Assertion `(context = get_multi_con(HowMany)) != ((void *)0)' failed.
17:36:08 Fatal Signal (6); pid 19237 becoming a zombie...
17:36:08 You may use gdb to attach to 19237

(gdb) where
#0  0x420292e5 in sigsuspend () from /lib/i686/libc.so.6
#1  0x080aa9cd in strcpy ()
#2  <signal handler called>
#3  0x42029241 in kill () from /lib/i686/libc.so.6
#4  0x4202902a in raise () from /lib/i686/libc.so.6
#5  0x4202a7d2 in abort () from /lib/i686/libc.so.6
#6  0x42022ddb in __assert_fail () from /lib/i686/libc.so.6
#7  0x40069864 in RPC2_MultiRPC (HowMany=8, ConnHandleList=0x81590b4, 
    RCList=0x81590f4, MCast=0x0, Request=0x813ae60, SDescList=0x0, 
    UnpackMulti=0x4006c410 <MRPC_UnpackMulti>, ArgInfo=0x15545458, 
    BreathOfLife=0x0) at multi1.c:144
#8  0x4006bbcc in MRPC_MakeMulti (ServerOp=41, ArgTypes=0x80f6dc0, HowMany=8, 
    CIDList=0x81590b4, RCList=0x81590f4, MCast=0x0, HandleResult=0, 
    Timeout=0x0) at multi2.c:327
#9  0x0809e98b in strcpy ()
#10 0x0808a3dc in strcpy ()
#11 0x080a154d in strcpy ()
#12 0x080a521d in strcpy ()
#13 0x080a9c22 in strcpy ()
#14 0x080a0b46 in strcpy ()
#15 0x40098f56 in Create_Process_Part2 () at lwp.c:796


** host 2 **

16:31:09 Reintegrate: sg.media.members, 2/2 records, result = SUCCESS
venus: multi1.c:144: RPC2_MultiRPC: Assertion `(context = get_multi_con(HowMany)) != ((void *)0)' failed.
16:31:09 Fatal Signal (6); pid 7215 becoming a zombie...
16:31:09 You may use gdb to attach to 7215

(gdb) where
#0  0x420292e5 in sigsuspend () from /lib/i686/libc.so.6
#1  0x080aa9cd in strcpy ()
#2  <signal handler called>
#3  0x42029241 in kill () from /lib/i686/libc.so.6
#4  0x4202902a in raise () from /lib/i686/libc.so.6
#5  0x4202a7d2 in abort () from /lib/i686/libc.so.6
#6  0x42022ddb in __assert_fail () from /lib/i686/libc.so.6
#7  0x4005a864 in RPC2_MultiRPC (HowMany=8, ConnHandleList=0x819cbb4,
    RCList=0x819cbf4, MCast=0x0, Request=0x811ad48, SDescList=0x0,
    UnpackMulti=0x4005d410 <MRPC_UnpackMulti>, ArgInfo=0x151d36d8,
    BreathOfLife=0x0) at multi1.c:144
#8  0x4005cbcc in MRPC_MakeMulti (ServerOp=38, ArgTypes=0x80f6c00, HowMany=8,
    CIDList=0x819cbb4, RCList=0x819cbf4, MCast=0x0, HandleResult=0,
    Timeout=0x0) at multi2.c:327
#9  0x0805cf21 in strcpy ()
#10 0x0806c9c5 in strcpy ()
#11 0x080a59b8 in strcpy ()
#12 0x080a9479 in strcpy ()
#13 0x080a0b46 in strcpy ()
#14 0x40089f56 in Create_Process_Part2 () at lwp.c:796
(gdb) 


-- 

steve simitzis : /sim' - i - jees/
          pala : saturn5 productions
 www.steve.org : 415.282.9979
  hath the daemon spawn no fire?
Received on 2003-05-21 21:29:48