wiki:CodaHOWTO/Introduction

The contents of this page may be old in certain places. Take it with a grain of salt.

What is Coda?

Coda is a distributed file system: it makes files available to a collection of client computers as part of their directory tree, while the authoritative copy of the file data is ultimately maintained on servers. Coda has some features that make it stand out. It supports disconnected operation, i.e. full access to a cached section of the file space during voluntary or involuntary network or server outages, and it automatically reintegrates the changes made on disconnected clients when they reconnect. Furthermore, Coda has read-write, failover server replication, meaning that data is stored on and fetched from any of a group of servers, and Coda continues to operate when only a subset of all servers is available. If differences between servers arise due to network partitions, Coda resolves them automatically to the maximum extent possible and aids users in repairing what cannot be resolved automatically.

Coda is organized very differently from NFS and Windows/Samba shares. It does have many similarities to AFS and DCE/DFS.

Getting clued in with the Coda terminology

A single name space
All of Coda appears under a single directory /coda on the client (or under a single drive under Windows). Unlike NFS and Samba, Coda does not have separate exports or shares that are mounted individually. Under /coda you can see the realm names; each realm contains file data and the "mount points" of the volumes (aka file sets) exported by the servers in that Coda realm. Coda finds servers automatically, so all a client needs to know is the name(s) of the realm(s) it wants to access.
A Coda realm
is a group of servers sharing one set of configuration databases. A realm can consist of a single server or up to hundreds of servers. One server is designated as the SCM, the System Control Machine. It is distinguished by being the only server modifying the configuration databases shared by all servers, and propagating such changes to other servers.
Coda volumes
File servers group the files in volumes. A volume is typically much smaller than a partition and much larger than a directory. Volumes have a root and contain a directory tree with files. Each volume is "Coda mounted" somewhere under /coda and forms a subtree of the /coda tree. Volumes can contain mount points of other volumes. A volume mount point is not a Unix mount point or Windows drive; there is only one Windows drive or Unix mount point for Coda. A Coda mount point contains enough information for the client to find the server(s) which store the files in the volume. The group of servers serving a volume is called the Volume Storage Group of the volume.
Volume Mount points
One volume is special: the root volume, which Coda mounts on /coda/<realmname>. Other volumes are grafted into the /coda tree using cfs mkmount. This command installs a volume mount point in the Coda directory tree; its effect is similar to mkdir mountpoint ; mount device mountpoint under Unix. cfs mkmount takes two arguments: the name of the mount point and the name of the volume to be mounted. Coda mount points are persistent objects, unlike Unix mount points, which need reinstating after a reboot.
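As a rough illustration of how mount points stitch volumes into one tree, here is a toy Python model. It is only a sketch and not Coda code: the Volume class, the resolve function and the volume names (root, users, users.alice) are hypothetical, and the mkmount method merely mimics the two arguments that cfs mkmount takes.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    name: str
    # Maps a path inside this volume to the volume mounted at that path.
    mounts: dict = field(default_factory=dict)

    def mkmount(self, mountpoint, volume):
        """Toy counterpart of `cfs mkmount <mountpoint> <volume>`."""
        self.mounts[mountpoint] = volume


def resolve(root, path):
    """Return (volume name, path inside that volume) for a realm-relative path."""
    volume, inside = root, []
    for part in path.strip("/").split("/"):
        inside.append(part)
        target = volume.mounts.get("/".join(inside))
        if target is not None:
            # Crossing a mount point: continue at the root of the mounted volume.
            volume, inside = target, []
    return volume.name, "/".join(inside)


# Hypothetical realm layout: the root volume holds a mount point "usr",
# and the volume mounted there holds a mount point "alice".
root = Volume("root")
users = Volume("users")
alice = Volume("users.alice")
root.mkmount("usr", users)
users.mkmount("alice", alice)

print(resolve(root, "usr/alice/notes.txt"))  # ('users.alice', 'notes.txt')
print(resolve(root, "README"))               # ('root', 'README')
```

Because the mount table lives inside the volumes themselves in this model, nothing has to be re-created after a restart, which mirrors the persistence of Coda mount points described above.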
Data storage
Unlike NFS and Samba, the servers do not store and export volumes as directories in the local disk file system. Coda needs much more meta data to support server replication and disconnected operation, and it has complex recovery which is hard to do within a local disk file system. Coda servers store files identified by a number, typically all under a directory /vicepa. The meta data (owners, access control lists, version vectors) and directory contents are stored in an RVM data file.
RVM
stands for Recoverable Virtual Memory. RVM is a transaction-based library that makes part of the virtual address space of a process persistent on disk and commits changes to this memory atomically to persistent storage. Coda uses RVM to manage its meta data. This data is stored in an RVM data file which is mapped into memory upon startup. Modifications are made in VM and also written to the RVM log file upon committing a transaction. The log file contains committed data that has not yet been incorporated into the data file on disk.
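The interplay between the data file and the log file can be sketched with a toy Python class. This is an illustration of the general idea only, not the RVM API: the class name, its methods and the JSON file format are all assumptions made for the example. It also simplifies in two ways: it buffers changes until commit instead of modifying mapped memory in place, and it ignores partially written log records.

```python
import json
import os


class ToyRecoverableStore:
    """Toy model of a data file plus a log of committed changes (not RVM itself)."""

    def __init__(self, data_path, log_path):
        self.data_path, self.log_path = data_path, log_path
        self.memory = {}
        self.pending = {}
        # Startup: load the data file, then replay committed log records
        # that have not yet been folded into it.
        if os.path.exists(data_path):
            with open(data_path) as f:
                self.memory = json.load(f)
        if os.path.exists(log_path):
            with open(log_path) as f:
                for line in f:
                    self.memory.update(json.loads(line))

    def set(self, key, value):
        # Changes inside a transaction stay pending until commit.
        self.pending[key] = value

    def commit(self):
        # A commit appends the whole transaction to the log as one record
        # and forces it to disk before the change becomes visible.
        with open(self.log_path, "a") as f:
            f.write(json.dumps(self.pending) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.memory.update(self.pending)
        self.pending = {}

    def truncate_log(self):
        # Fold all committed data into the data file, then empty the log.
        tmp = self.data_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.memory, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.data_path)
        open(self.log_path, "w").close()
```

If the process stops after commit but before truncate_log, reopening the store with the same two paths replays the log and recovers the committed state; changes that were set but never committed are lost, as expected for a transaction that did not complete.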
Client data
is stored somewhat similarly: meta data is kept in RVM and cached files are stored by number under ..../venus.cache. The cache on a client is persistent and contains copies of files on the server. It gives the client quicker access to data and allows access to files when the client is not connected to the server.
Validation
When Coda detects that a server is reachable again, it validates cached data before using it, to make sure the cached data is the latest version of the file. To do so, Coda compares the version stamps cached with each object against the version stamps held by the server.
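The comparison can be pictured with a few lines of Python. This is a minimal sketch under stated assumptions: version stamps are treated as opaque values that are only compared for equality, and the function name and the dictionaries are hypothetical. How Coda actually represents and exchanges version stamps is not covered here.

```python
def validate_cache(cached_stamps, server_stamps):
    """Split cached object ids into (still valid, stale).

    Both arguments map an object id to its version stamp. An object is
    considered valid only if the server holds exactly the same stamp.
    """
    valid, stale = [], []
    for obj, stamp in cached_stamps.items():
        if obj in server_stamps and server_stamps[obj] == stamp:
            valid.append(obj)
        else:
            # Changed or removed on the server: must be refetched or dropped.
            stale.append(obj)
    return valid, stale


cached = {"a": 3, "b": 7, "c": 1}
server = {"a": 3, "b": 8}
print(validate_cache(cached, server))  # (['a'], ['b', 'c'])
```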
Authentication
Coda manages authentication and authorization through a token. As with a Windows share (although the details are very different), Coda requires users to log in. During the login process, the client acquires a session key, or token, in exchange for a correct password. The token is associated with a user identity; at present this Coda identity is the numerical Coda uid (which is purely internal to Coda) of the Coda user performing the login.
Protection
To grant permissions, the cache manager and servers take the identity associated with the token and match it against the privileges granted to that identity in access control lists (ACLs). If no token is present, anonymous access is assumed, for which permissions are again granted through the access control lists, using the System:AnyUser identity.
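The lookup described under Protection can be sketched as follows. This is a simplified model, not Coda's implementation: the function names are hypothetical, identities are shown as plain strings rather than numerical Coda uids, the right names "read" and "write" are placeholders, and group membership is deliberately left out.

```python
ANYUSER = "System:AnyUser"


def rights_for(acl, token_identity=None):
    """Return the set of rights an ACL grants to the caller.

    With a token, the identity bound to the token is looked up.
    Without a token, the caller is treated as System:AnyUser.
    """
    identity = token_identity if token_identity is not None else ANYUSER
    return acl.get(identity, set())


def is_allowed(acl, right, token_identity=None):
    return right in rights_for(acl, token_identity)


# Hypothetical ACL on one directory.
acl = {"alice": {"read", "write"}, ANYUSER: {"read"}}

print(is_allowed(acl, "write", token_identity="alice"))  # True
print(is_allowed(acl, "write"))                          # False
print(is_allowed(acl, "read"))                           # True
```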

Organization of the client

The kernel module and the cache manager

As with most file systems, a computer that uses the Coda file system needs kernel support to access Coda files. Coda's kernel support is minimal and works in conjunction with the user space cache manager, Venus. User requests enter the kernel, which will either reply directly or ask Venus to assist in servicing the request.

Typically the kernel code is in a kernel module, which is either loaded at boot time or dynamically loaded when Venus is started. Venus will even mount the Coda file system on /coda.

Utilities

To manipulate ACLs, the cache, volume mount points and possibly the network behavior of a Coda client, a variety of small utilities is provided. The most important one is the cfs command.

There is also a clog program to authenticate to the Coda authentication server. The codacon program allows the client host superuser to monitor the operation of the cache manager, and the cmon program gives summary information about a list of servers.

Organization of the server

The main program is the Coda file server, codasrv. It is responsible for doing all file operations, as well as volume location service.

The Coda authentication server auth2 handles requests from clog for tokens, and changes of password from au and cpasswd. Only the auth2 process on the SCM will modify the password database.

All servers in a Coda realm share the configuration databases (usually in /vice/db) and retrieve them from the SCM when changes have occurred. The updateclnt program is responsible for retrieving such changes, and it polls the updatesrv on the SCM to see if anything has changed.

Utilities

On the server there are utilities for volume creation and management. These utilities consist of shell scripts and the volutil command. There is also pdbtool which can be used to manipulate the user and group databases.

Last modified on Jan 21, 2010, 4:47:23 AM