Coda File System

detection of sins: RVM/RDS debugging

From: Peter J. Braam <braam_at_cs.cmu.edu>
Date: Thu, 2 Apr 1998 20:55:20 -0500 (EST)
This is a message about improvements I have made to the RVM handling, with
a view to increasing our debugging capability dramatically. 

Introduction:
=============

Coda manages persistent metadata using an RVM-based heap allocation
mechanism, which combines transactions and persistence with the usual
"malloc/free" metaphor.  After initializing RVM you can persistently
allocate on the RDS heap as follows:

rvmlib_begin_transaction(restore _or_ no_restore)

p = rvmlib_rec_malloc(sizeof(vnode))

copy stuff to *p

rvmlib_end_transaction(flush _or_ no flush)

The vnode is now persistently in the RVM segment.  Of course one should
also store "p" somewhere in order to find the vnode again upon restart,
but I am keeping it simple.

Problem:
========

If you free twice in RVM or commit other sins, the RVM package will likely
assert only in rvmlib_end_transaction.  How do you trace your sins back to
where they were committed?

Solution:
=========

I have hacked for almost two days to get our RVM under control -- I found
it next to impossible to track memory problems before, but now I think I
have got a handle on it. 

Let me remind you that rvmlib_rec_free does a _fake_ deallocation; only
upon committing the transaction is the memory actually released by
rds_do_free.  The problem this causes is that a wrong guard on the region
is detected only much later: not when rvmlib_rec_free is called, but
sometimes dozens of procedures and 5 files later.
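
To illustrate why the failure shows up so late, here is a toy model of
deferred deallocation.  It is only a sketch and not the RDS code: the
names toy_malloc, toy_free and toy_commit are made up, and the "heap" is
just an array of flags.  The free call merely queues a request; the
double free is noticed only when the commit step processes the queue.

```c
#include <stddef.h>

#define MAX_BLOCKS  8
#define MAX_PENDING 16

/* Toy heap: each block is represented only by an "allocated" flag. */
static int allocated[MAX_BLOCKS];
/* Free requests queued by the current "transaction". */
static int pending[MAX_PENDING];
static int npending;

/* Allocate the first unused block; returns its index, or -1 if full. */
static int toy_malloc(void)
{
    for (int i = 0; i < MAX_BLOCKS; i++) {
        if (!allocated[i]) {
            allocated[i] = 1;
            return i;
        }
    }
    return -1;
}

/* "Fake" free: only records the request, so a double free passes here. */
static int toy_free(int block)
{
    if (block < 0 || block >= MAX_BLOCKS || npending == MAX_PENDING)
        return -1;
    pending[npending++] = block;
    return 0;
}

/*
 * Commit: perform the queued frees for real.  Returns the position in
 * the queue of the first request that hit an already-free block, or -1
 * if every request was valid.
 */
static int toy_commit(void)
{
    int bad = -1;
    for (int i = 0; i < npending; i++) {
        int b = pending[i];
        if (!allocated[b]) {
            if (bad < 0)
                bad = i;
            continue;
        }
        allocated[b] = 0;
    }
    npending = 0;
    return bad;
}
```

In this model, freeing the same block twice succeeds both times at the
call site, and toy_commit reports the second request as bad.  By then
the caller that made the request is long gone, which is the situation
described above.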

I did the following to get past these problems.  rvmlib_rec_{malloc,free}
are now macros which, when tracing is enabled (start Venus with
"venus -rdstrace"), print out the file and line number from which they
were invoked.
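
For readers who have not seen the technique, a minimal sketch of such
call-site tracing macros follows.  This is not the Coda source: it wraps
plain malloc/free rather than the RDS allocator, and the names
TRACED_MALLOC, TRACED_FREE, trace_enabled and trace_lines are
hypothetical.  Only the idea of capturing __FILE__ and __LINE__ in a
macro is the same.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical run-time switch standing in for "venus -rdstrace". */
static int trace_enabled = 0;
/* Number of trace lines written so far (for illustration only). */
static int trace_lines = 0;

/* Allocate and, if tracing is on, log the address, size and call site. */
static void *traced_malloc(size_t size, const char *file, int line)
{
    void *p = malloc(size);
    if (trace_enabled) {
        fprintf(stderr, "rdstrace: rec_malloc addr %p size %zu file %s line %d\n",
                p, size, file, line);
        trace_lines++;
    }
    return p;
}

/* Log the call site (if tracing is on) before releasing the memory. */
static void traced_free(void *p, const char *file, int line)
{
    if (trace_enabled) {
        fprintf(stderr, "rdstrace: rec_free addr %p file %s line %d\n",
                p, file, line);
        trace_lines++;
    }
    free(p);
}

/* The macros expand at the call site, so __FILE__/__LINE__ name the caller. */
#define TRACED_MALLOC(size) traced_malloc((size), __FILE__, __LINE__)
#define TRACED_FREE(p)      traced_free((p), __FILE__, __LINE__)
```

Because the macro is expanded where it is used, each log entry names the
caller's file and line rather than the allocator's, which is what makes
the trace useful for hunting down a bad free.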

Here is an example, the one which "locates" my bug.  Running

grep rdstrace /usr/coda/venus.cache/venus.log

produces output with entries like these:
rdstrace: rec_malloc addr 2115aa8c size 20 file
           /home/braam/ss-dir/coda-src/venus/fso_dir.cc line 175
.....
rdstrace: rec_free addr 2115aa8c file
           /home/braam/ss-dir/coda-src/ndir/dir.c line 1271
rdstrace: rec_free addr 2115aa8c file
           /home/braam/ss-dir/coda-src/venus/fso1.cc line 2084
....


Finally, when the transaction ends, rds_do_free also prints out what it is
doing:
rdstrace: start do_free
rdstrace: addr 0x2115cfcc size 40
rdstrace: addr 0x2115cf8c size 40
rdstrace: addr 0x2115cf4c size 40
rdstrace: addr 0x2115cf0c size 40
rdstrace: addr 0x2115cecc size 40
rdstrace: addr 0x2115ce8c size 40
rdstrace: addr 0x2115cb4c size 40
rdstrace: addr 0x2115cb0c size 40
rdstrace: addr 0x2115ca8c size 40
rdstrace: addr 0x2115ca4c size 40
rdstrace: addr 0x2115c9cc size 40
rdstrace: addr 0x2115c98c size 40
rdstrace: addr 0x2115c94c size 40
rdstrace: addr 0x2115c90c size 40
rdstrace: addr 0x2115cacc size 40
rdstrace: addr 0x2115b6cc size 40
rdstrace: addr 0x2115b38c size 40
rdstrace: addr 0x2115a24c size 840
rdstrace: addr 0x2115aa8c size 40
.. CRASH .. (assertion in rds_do_free)

Clearly we crashed when rds_do_free tried to free 0x2115aa8c for the second
time: the trace shows rec_free called on that address at both
dir.c line 1271 and fso1.cc line 2084.

This locates exactly where I committed my double free sins.
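
Spotting the repeated address by eye gets tedious in a long log, so here
is a rough sketch of a scanner for it.  It assumes each trace entry
arrives on a single line beginning "rdstrace: rec_malloc addr <hex>" or
"rdstrace: rec_free addr <hex>", as in the entries above; the names
free_tracker and track_line are made up for this example and nothing
like it ships with Coda.

```c
#include <stdio.h>

#define MAX_FREED 1024

/* Addresses that have been freed and not allocated again since. */
struct free_tracker {
    unsigned long freed[MAX_FREED];
    int nfreed;
};

/*
 * Feed one trace line to the tracker.  Returns 1 if the line is a
 * rec_free of an address that is already in the freed set (a double
 * free), 0 otherwise.  Lines that are not malloc/free entries are
 * ignored.
 */
static int track_line(struct free_tracker *t, const char *line)
{
    unsigned long addr;

    if (sscanf(line, "rdstrace: rec_malloc addr %lx", &addr) == 1) {
        /* The address is live again: forget any earlier free of it. */
        for (int i = 0; i < t->nfreed; i++) {
            if (t->freed[i] == addr) {
                t->freed[i] = t->freed[--t->nfreed];
                break;
            }
        }
        return 0;
    }

    if (sscanf(line, "rdstrace: rec_free addr %lx", &addr) == 1) {
        for (int i = 0; i < t->nfreed; i++) {
            if (t->freed[i] == addr)
                return 1;               /* freed twice without a malloc */
        }
        if (t->nfreed < MAX_FREED)
            t->freed[t->nfreed++] = addr;
    }
    return 0;
}
```

To use it, read the grep output line by line (with fgets, say), pass
each line to track_line, and print the line whenever it returns 1.  The
fixed-size table and linear search are deliberate simplifications; a
long-running trace would want a hash table.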

Part of this code was built by David Steere, but I don't think he got it
into this sort of shape.  Let's hope that we will find lots of bugs now!

- Peter -
Received on 1998-04-02 20:58:54