Coda File System

Ubik and PTS issues

From: Jim Doyle <jrd_at_bu.edu>
Date: Sun, 15 Feb 1998 22:37:25 -0500 (EST)
Mention was made of using Ubik to replicate Protection database entries
across multiple sites.

I'd suggest that we not re-implement Ubik, and instead use a slightly
different protocol for replication. The problem with Ubik is that it makes
it difficult to store PTS, Backup and VLDB data in anything other than
record-oriented, DBM-style data stores.

AFS and DFS sites have a lot of their administrative data trapped inside
fast, reliable, but unwieldy databases whose not-so-simple APIs are hard
to program against when automating cell operations. Much of the data stored
in the Protection Server, Backup Server and Fileset Location databases
would benefit from being stored in an SQL server or in browseable LDAP
directories. Ubik makes such a solution difficult to engineer.

I think a better solution would be to use D2PC (distributed two-phase commit).
To do this, each database site would export a set of methods on the
database (through an RPC interface). In addition, each database site
would implement a local transaction manager. In the case of BSD DB 2.0,
we already have this; in the case of an SQL server, we can issue a
"BEGIN TRANSACTION" statement to the SQL backend; in other cases, we can
rely on an implementation of logging. In this scenario, the "Sync Site",
which is elected by the Ubik algorithm, would act as the transaction
coordinator for each participating replication site.
That is, the sync site would generate a new global transaction ID (trid)
for each operation. The sync site would then contact each participating
replication site and ask it to begin a new transaction. At that point,
all insert, update and delete operations that are fanned out to each site
are covered by the trid. The sync site can then ask all participating
sites to prepare, and tell them to commit only if every site votes yes,
or to abort otherwise. If anything along the way poops its diaper,
everyone rolls back.
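The coordinator/participant exchange described above can be sketched roughly as follows. This is a minimal in-memory illustration, not a proposed implementation: `Participant`, `Coordinator`, `fail_prepare` and the `(op, key, value)` tuples are hypothetical names, each site's "local transaction manager" is just a staging dict, and the prepare vote does not force anything to stable storage as a real site would.

```python
import itertools


class Participant:
    """A replication site with a hypothetical in-memory transaction manager.

    Operations issued under a trid are staged and applied only on commit.
    """

    def __init__(self, name, fail_prepare=False):
        self.name = name
        self.data = {}            # committed state: key -> value
        self.pending = {}         # trid -> staged list of (op, key, value)
        self.fail_prepare = fail_prepare  # lets us simulate a "no" vote

    def begin(self, trid):
        self.pending[trid] = []

    def apply(self, trid, op, key, value=None):
        if op not in ("insert", "update", "delete"):
            raise ValueError(f"unknown operation {op!r}")
        self.pending[trid].append((op, key, value))

    def prepare(self, trid):
        # Phase 1 vote. A real site would force its log to stable storage
        # before voting yes; this sketch only reports the simulated outcome.
        return trid in self.pending and not self.fail_prepare

    def commit(self, trid):
        for op, key, value in self.pending.pop(trid):
            if op == "delete":
                self.data.pop(key, None)
            else:  # insert or update
                self.data[key] = value

    def abort(self, trid):
        # Discard staged work; safe even if begin() never ran for this trid.
        self.pending.pop(trid, None)


class Coordinator:
    """The sync site's role: hand out trids and drive both phases."""

    def __init__(self, sites):
        self.sites = sites
        self._next_trid = itertools.count(1)

    def run(self, ops):
        trid = next(self._next_trid)     # new global transaction ID
        try:
            for site in self.sites:
                site.begin(trid)
            for op, key, value in ops:   # fan each operation out to every site
                for site in self.sites:
                    site.apply(trid, op, key, value)
            votes = [site.prepare(trid) for site in self.sites]  # phase 1
        except Exception:
            votes = [False]              # any error along the way counts as "no"
        if all(votes):                   # phase 2: unanimous yes -> commit
            for site in self.sites:
                site.commit(trid)
            return True
        for site in self.sites:          # otherwise everyone rolls back
            site.abort(trid)
        return False
```

For example, `Coordinator([Participant("a"), Participant("b")]).run([("insert", "jrd", 1001)])` returns `True` and leaves both sites' `data` equal to `{"jrd": 1001}`; if any site votes no or raises, no site's committed data changes.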

Finally, in the case of recovery, a replication site that joins a quorum
can be recovered by simply comparing database version numbers and then
replaying the sync site's log to the recovering site over LSNs [k,k'],
where k-1 was the last-known version number at the crashed site and k'
is the sync site's latest LSN.
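A minimal sketch of that catch-up step, assuming each log record produces exactly one new database version (so LSN k is the record that moves the database from version k-1 to k) and using a plain dict as the site's database. `LogRecord`, `apply_record` and `recover` are hypothetical names, and falling back to a full database copy when the log no longer covers the gap is an added assumption, not part of the proposal.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    lsn: int            # assumed: one record per version, so LSN == resulting version
    op: str             # "insert", "update" or "delete"
    key: str
    value: object = None


def apply_record(db, rec):
    """Redo one logged operation against a site's in-memory database."""
    if rec.op == "delete":
        db.pop(rec.key, None)
    else:  # insert or update
        db[rec.key] = rec.value


def recover(site_db, site_version, sync_log, sync_version):
    """Replay the sync site's log over LSNs [k, k'] to a recovering site.

    site_version is k-1, the last version the crashed site knew about;
    sync_version is k', the sync site's current version.
    Returns the recovering site's new version.
    """
    if site_version >= sync_version:
        return site_version                # already current; nothing to replay
    k = site_version + 1
    needed = sorted((r for r in sync_log if k <= r.lsn <= sync_version),
                    key=lambda r: r.lsn)
    if [r.lsn for r in needed] != list(range(k, sync_version + 1)):
        # The log no longer covers the gap (e.g. it was truncated).
        raise ValueError("log does not cover [k, k']; a full copy is needed")
    for rec in needed:
        apply_record(site_db, rec)
    return sync_version
```

Because only the records after the crashed site's last version are replayed, the cost of recovery is proportional to how far behind the site fell rather than to the size of the database.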

The distinction is this: Ubik exposes an interface to raw storage over
the network, hiding the details of transactions and recovery. The
D2PC approach abstracts the problem to transactions that cover method
invocations on objects.

I don't think it would be hard to build a mini TP monitor to do this.
It would be a fun and exciting project for someone who wants to learn more
about distributed transactions. In addition, I know that Margo and Keith
would be very interested in a mini two-phase commit implementation for DB 2.0.

-- Jim

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Jim Doyle                         Boston University   Information Technology
Systems Analyst/Programmer        email: jrd_at_bu.edu   Distributed Systems
						      tel. (617)-353-8248
-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-++--+-+-+-+-+-+-
Received on 1998-02-15 22:39:25