Coda File System

Re: volutil rpc2 errors

From: Ryan M. Lefever <lefever_at_crhc.uiuc.edu>
Date: Tue, 15 May 2001 19:13:02 -0500 (CDT)
On Tue, 15 May 2001, Jan Harkes wrote:

> On Mon, May 14, 2001 at 10:18:31PM -0500, Ryan M. Lefever wrote:
> > Hi,
> > 
> > I am trying to fix some RPC2 problems that I have when using volutil.
> > 
> > When I do a "volutil setdebug", the following happens no matter whether I
> > do it locally or remotely, or to the SCM or a non-SCM. Also, a
> > /vice/srv/CRASH file is created.
> > 
> > --
> > [root_at_nsx srv]# startserver -d 1000
> > [root_at_nsx srv]# volutil setdebug 100
> > V_BindToServer: binding to host nsx.crhc.uiuc.edu
> > VolSetDebug failed with RPC2_DEAD (F)
> ...
> > --
> > 
> > The SrvErr file reads:
> > 
> > --
> > could not open key 2 file: No such file or directory
> > Assertion failed: 0, file "srv.cc", line 336
> > EXITING! Bye!
> > --
> 
> This is a generic assertion point where we always end up when a SIGSEGV
> is received. If you create the file /vice/srv/ZOMBIFY, the server should
> end up in an infinite loop at this point. Then you can easily attach gdb
> and get a stacktrace.
> 
>     # gdb /usr/sbin/codasrv `pidof codasrv`
>     (gdb) bt
> 
> The trace will be a bit funny, because the actual point where the
> segfault was triggered won't show up. The stack is clobbered by the
> signal handler. However, the function that called the function that
> crashed will show up and from the line number it is possible to figure
> out at least which function had a problem.
> 
> It will probably be something like,
> 
>     #1 coda_assert function where we are waiting
>     #2 sigsegv handler
>     #3 ???
>     #4 function before the segv was received.
>     x/x/volutil/vol_setdebug.cc:666
> 
> 

I tried this method and got the following:

--
(gdb) bt
#0  0x40184c61 in __libc_nanosleep () from /lib/libc.so.6
#1  0x40184bed in __sleep (seconds=1)
    at ../sysdeps/unix/sysv/linux/sleep.c:82
#2  0x80c34a7 in coda_assert (pred=0x80c48e7 "0", file=0x80c48e0 "srv.cc",
    line=336) at coda_assert.c:45
#3  0x804be04 in zombie (sig=11) at srv.cc:336
#4  0x40111c68 in __restore ()
    at ../sysdeps/unix/sysv/linux/i386/sigaction.c:127
#5  0x40138986 in _IO_vfprintf (s=0x401e1ce0,
    format=0x4005bed5 "[%s]%s: \"%s\", line %d:    ", ap=0x151a0f00)
    at vfprintf.c:1029
#6  0x40141047 in fprintf (stream=0x401e1ce0,
    format=0x4005bed5 "[%s]%s: \"%s\", line %d:    ") at fprintf.c:32
#7  0x40047200 in RPC2_SendResponse (ConnHandle=505527757, Reply=0x8165ad0)
    at rpc2a.c:154
#8  0x8084958 in volUtil_ExecuteRequest (_cid=505527757, _reqbuffer=0x0,
    _bd=0x0) at volutil.server.c:1808
#9  0x8065ccc in VolUtilLWP (myindex=0xbffff8d0) at volutil.cc:135
#10 0x400829be in Create_Process_Part2 () at lwp.c:795
--

> The other (and perhaps easier) way to debug this is by running codasrv
> under the control of gdb at the time the segfault happens. That way the
> stacktrace shows up a lot nicer.
> 
>     # gdb /usr/sbin/codasrv `pidof codasrv`
>     (gdb) continue
>     /* trigger the volutil setdebug crash */
>     SEGV received
>     (gdb) bt
>     #1 culprit function
>     file.cc:line

I tried this method, and the backtrace gave the following:

--
Program received signal SIGSEGV, Segmentation fault.
0x401e1d88 in main_arena () from /lib/libc.so.6
(gdb) bt
#0  0x401e1d88 in main_arena () from /lib/libc.so.6
#1  0x3f3e002b in ?? ()
#2  0x40138986 in _IO_vfprintf (s=0x401e1ce0,
    format=0x4005bed5 "[%s]%s: \"%s\", line %d:    ", ap=0x151a0f00)
    at vfprintf.c:1029
#3  0x40141047 in fprintf (stream=0x401e1ce0,
    format=0x4005bed5 "[%s]%s: \"%s\", line %d:    ") at fprintf.c:32
#4  0x40047200 in RPC2_SendResponse (ConnHandle=1052104457, Reply=0x8165ad0)
    at rpc2a.c:154
#5  0x8084958 in volUtil_ExecuteRequest (_cid=1052104457, _reqbuffer=0x0,
    _bd=0x0) at volutil.server.c:1808
#6  0x8065ccc in VolUtilLWP (myindex=0xbffff8d0) at volutil.cc:135
#7  0x400829be in Create_Process_Part2 () at lwp.c:795
--
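
For anyone repeating the two methods above, the steps can be scripted
rather than typed. This is only a rough sketch: it assumes a gdb that
supports the -batch and -ex options, that an empty ZOMBIFY file is enough,
and the paths quoted in this thread. The helper names zombify_cmd and
gdb_cmd are made up for illustration and are not part of Coda. The helpers
only print command lines, so nothing touches the server until you run them.

```shell
#!/bin/sh
# Sketch only. Paths are the ones quoted in this thread; override if needed.
SRV_BIN=${SRV_BIN:-/usr/sbin/codasrv}
ZOMBIFY=${ZOMBIFY:-/vice/srv/ZOMBIFY}

# Method 1: print the command that creates the ZOMBIFY file, so the server
# loops after a fatal signal and a debugger can be attached afterwards.
# (Assumption: an empty file is sufficient.)
zombify_cmd() {
    printf 'touch %s\n' "$ZOMBIFY"
}

# Print a gdb command line for process id $1.
#   mode "post": server is already stuck in its loop, just dump the stack.
#   mode "live": attach first, continue, dump the stack when a signal stops it.
gdb_cmd() {
    pid=$1
    mode=${2:-post}
    case $mode in
        post) printf 'gdb -batch -ex bt %s %s\n' "$SRV_BIN" "$pid" ;;
        live) printf 'gdb -batch -ex continue -ex bt %s %s\n' "$SRV_BIN" "$pid" ;;
        *)    echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}
```

For the second method one would run something like
eval "$(gdb_cmd "$(pidof codasrv)" live)" in one terminal and then trigger
"volutil setdebug" from another.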

Since I didn't write any of the Coda code, it's kind of hard for me to
debug.  Jan, does this help you any?

Thanks,
Ryan
Received on 2001-05-15 20:13:06