Coda Project Traces and DFSTrace


This web page describes two things: the Coda Project traces and the DFSTrace trace-gathering system. From February 1991 to March 1993, the Coda Project collected traces of all system call activity on 33 different machines. This was done with the DFSTrace system, originally developed by Lily Mummert and M. Satyanarayanan as part of the Coda Project at Carnegie Mellon University. In the spring of 1998, Tom M. Kroeger became interested in using these traces and in re-implementing DFSTrace under Linux. With the support of Prof. M. Satyanarayanan, he spent the fall of 1998 reorganizing and processing the trace data to make it available for general use. Additionally, with the support of the USENIX Association, he and his adviser, Prof. Darrell D. E. Long, have hired Ben Gertzfield to do this re-implementation at the Concurrent Systems Laboratory at the University of California at Santa Cruz.

These pages are mirrored at two locations:

http://www.coda.cs.cmu.edu/DFSTrace as well as http://csl.cse.ucsc.edu/projects/DFSTrace



The Coda Project Traces

The Coda Project traces are currently stored on a set of 38 CDs; six such sets were made. To read and analyze the traces, we created an extensive library that handles the varying trace formats. The DFSTrace reading library is included on each CD, along with the summary results from each of the analysis programs.

Library Documentation   Download a copy of the Library

The complete details of the data within the traces are available from the library. We suggest starting with DFSTrace/src/tracelib/tracelib.h. The library has been tested on the following systems: Linux, FreeBSD, SunOS 5.5, and Digital Unix 4.0.

The primary reference for DFSTrace and these traces, which also gives further details, is:
Lily B. Mummert, M. Satyanarayanan: Long Term Distributed File Reference Tracing: Implementation and Experience. Software: Practice and Experience, 26(6): 705-736 (1996). Also available as Technical Report CMU-CS-94-213 (pdf), (ps)

It is strongly recommended that anyone intending to use these traces for research read this paper in detail.

Getting the Trace Data

Making the entire 24 GB of data available over the Internet is difficult at this point. To provide a more extensive set of trace data than the samples enclosed with the trace library, we have made a set of traces from four different machines available on-line, each covering a period of one month.

In addition to the actual trace data, an index CD contains a copy of the summary information for each of the 38 data CDs. We have also made this summary information available on-line.

If, after examining these samples, you need the entire set of 38 CDs, please contact Ethan Miller directly (elm@cs.ucsc.edu) to arrange to borrow the CD collection temporarily or to get a password for accessing the CDs online.


DFSTrace under Linux

This work is still in progress. We have developed a loadable kernel module that allows the tracing of all system calls; at present the module reports information only via printk. We are extending the module to create a device, /dev/DFSTrace, and to send its output to that device in a binary format. Once this work is complete, the code for the module will be made publicly available.
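As a rough illustration of how a user-space tool might consume fixed-size binary records from a device such as /dev/DFSTrace, here is a minimal Python sketch. The record layout (seconds, microseconds, process ID, system call number, padding) is entirely hypothetical and was invented for this example; the module's real output format is not specified on this page, and the names RECORD, TraceRecord, and read_records are likewise assumptions.

```python
import io
import struct
from typing import BinaryIO, Iterator, NamedTuple

# HYPOTHETICAL record layout, invented for illustration only:
# little-endian u32 seconds, u32 microseconds, u32 pid,
# u16 system call number, u16 padding (16 bytes in total).
RECORD = struct.Struct("<IIIHH")


class TraceRecord(NamedTuple):
    seconds: int
    microseconds: int
    pid: int
    syscall: int


def read_records(stream: BinaryIO) -> Iterator[TraceRecord]:
    """Yield records until the stream ends; a short trailing chunk is an error."""
    while True:
        chunk = stream.read(RECORD.size)
        if not chunk:
            return  # clean end of stream
        if len(chunk) < RECORD.size:
            raise ValueError("truncated record")
        seconds, microseconds, pid, syscall, _pad = RECORD.unpack(chunk)
        yield TraceRecord(seconds, microseconds, pid, syscall)


if __name__ == "__main__":
    # Demonstrate on an in-memory stream rather than a real device.
    sample = RECORD.pack(100, 250, 42, 5, 0)
    for rec in read_records(io.BytesIO(sample)):
        print(rec)
```

Under the same assumptions, the function could be pointed at the device by opening it in binary mode, for example with open("/dev/DFSTrace", "rb"), and iterating over read_records on the resulting file object.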

Research using the Coda Project Traces

The DFSTrace system was originally developed by Lily Mummert and Jay Kistler as a part of Prof. M. Satyanarayanan's Coda Project. These traces were used to provide insight for the design of the Coda File System.

Tom M. Kroeger used these traces in his research into predictive caching. This work was done at the Concurrent Systems Laboratory at the University of California at Santa Cruz, under the guidance of his adviser, Prof. Darrell D. E. Long.

Other Filesystem Traces

The following is a list of other filesystem traces that we are aware of (updates to this list are very welcome; please send them to tmk@cs.ucsc.edu):