Coda File System

Re: Ubik and PTS issues

From: Mark G. Hayden <Mark_G_Hayden_at_ibm.net>
Date: Sun, 15 Feb 1998 23:52:45 -0800

Michael Callahan wrote:
> 
> This is a good summary of the kind of primitives Ensemble provides.  One
> thing, however, to bear in mind: Ensemble is written as a framework in
> Objective Caml, a dialect of ML from INRIA in France.  So one has to think
> through the implementation consequences of incorporating it (which are
> more significant than those raised by starting to exploit a C library).

Ensemble is written in Ocaml, but it compiles as a C++ library, which
means programmers can use the C++ API without being aware that ML is
used underneath.  The same holds for performance: the Ensemble protocols
add as little as 8 bytes of header and 7 usecs of latency to the cost
of an unreliable IP multicast.  This makes it the lowest-overhead
reliable group communication system currently available.
Applications can, of course, also be written in ML.

> Michael
> 
> On Mon, 16 Feb 1998, Brian Bartholomew wrote:
> 
> > > What does "distributed simultaneous group broadcast code" mean?  It
> > > sounds like it means "multicast".  I assume, though, it provides
> > > some facilities beyond simple IP multicast.
> >
> > Oops, sorry, one more time in English.  Mark and others, feel free to
> > jump in and correct me.  Ensemble is based on IP multicast.  However,
> > under the covers it is a multi-phase commit protocol.  It gives you
> > primitives like the following:
> >
> >       Message A is guaranteed to be delivered to every recipient, or
> >       none of them.  You know which happened.
> >
> >       All recipients can find out who all the other recipients are,
> >       with no race conditions relative to sent messages.
> >
> >       Messages A, B, and C are guaranteed to be delivered in ABC
> >       order, no matter how the network reorders or drops them.
> >
> >       It is multicast, so applications that want to be multicast are
> >       efficient.  Wouldn't you like to have redundant Coda servers
> >       on your LAN, in a self-correcting/voting arrangement?  Or
> >       distributed not across the LAN, but around the world?

Ensemble supports group broadcast protocols as described above.
If IP multicast is available at all the endpoints in a group,
then the protocols use it for broadcasts.  Otherwise, they emulate
each broadcast with point-to-point messages.  So IP multicast is
supported but not required.
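
To make the fallback concrete, here is a rough sketch in Python.  It is
not Ensemble code or its API: Endpoint, has_ip_multicast, and
plan_broadcast are names invented for illustration, and the sketch only
plans which datagrams would be sent rather than touching the network.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical group member; not an Ensemble type."""
    addr: str
    has_ip_multicast: bool


def plan_broadcast(members: List[Endpoint], group_addr: str,
                   payload: bytes) -> List[Tuple[str, bytes]]:
    """Return the (destination, payload) datagrams for one broadcast."""
    if members and all(m.has_ip_multicast for m in members):
        # Every endpoint can receive multicast: one datagram to the
        # group address reaches the whole group.
        return [(group_addr, payload)]
    # At least one endpoint cannot: send one point-to-point copy
    # to each member instead.
    return [(m.addr, payload) for m in members]
```

With two multicast-capable members the plan is a single datagram to the
group address; adding one member without multicast turns it into one
unicast copy per member.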

> >       (I believe this is true) Recipients listen to the protocol
> >       details of a sender.  If the sender makes a mistake (because
> >       that machine is crashing), one or more of the recipients takes
> >       over, declares the sender to be busted, kicks it out of the
> >       group, and finishes the protocol to a consistent state.

This is correct.
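
As a very simplified illustration of that recovery step (again a Python
sketch under my own assumptions, not Ensemble's actual protocol): the
survivors drop the failed sender from the membership, pool whatever
messages any of them received from it, and all end up with the same
set.  The function name and the use of integer sequence numbers are
hypothetical.

```python
from typing import Dict, List, Set, Tuple


def exclude_failed_sender(
    view: List[str],
    failed: str,
    received: Dict[str, Set[int]],
) -> Tuple[List[str], Dict[str, Set[int]]]:
    """Remove `failed` from the view and reconcile its messages.

    view:     current group members
    failed:   the member the others have declared faulty
    received: for each member, the sequence numbers of the messages
              it got from `failed`
    """
    # New membership: everyone except the failed sender.
    survivors = [m for m in view if m != failed]

    # Pool every message that at least one survivor received.
    seen: Set[int] = set()
    for m in survivors:
        seen |= received.get(m, set())

    # Survivors forward what others are missing, so each one ends
    # with the same set of messages from the failed sender.
    reconciled = {m: set(seen) for m in survivors}
    return survivors, reconciled
```

For example, if "a" crashes after "b" received its messages 1 and 2 but
"c" received only message 1, the new view is ["b", "c"] and both hold
messages 1 and 2.  A real implementation also has to agree on who is
faulty and handle further failures during the recovery itself, which
this sketch ignores.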

--Mark
Received on 1998-02-16 02:59:29