Coda File System

Re: request for comments - proper addressing of servers in a realm

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Mon, 28 Jul 2014 08:44:12 -0400
On Sun, Jul 27, 2014 at 05:22:26PM +0200, u-codalist-z149_at_aetey.se wrote:
> On Sun, Jul 27, 2014 at 09:33:21AM -0400, Jan Harkes wrote:
> > > This means that our usage of the SRV records does not and can not
> > > follow the rfc2782 literally because the assumptions in the rfc are
> > > not applicable.
> > 
> > Actually they are because we only need one volume location server and it
> > will tell us who the other servers are.
> 
> This is disputable as we can profit from our ability to talk to
> several servers in parallel and not sequentially as the rfc assumes.
> 
> (IOW yes we _could_ follow rfc's expectations but that would be harmful)
> 
> Don't we take and use all servers from the SRV reply?

We take all the servers and contact them, as far as I know in priority
order, until one answers. When we get to the end of the list we rerun
the SRV query and refresh the list.
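
Roughly like this, if you want it spelled out; just a sketch using
libresolv, not the actual code in venus, and the _codasrv._udp label
and realm name are only assumed examples here:

    /* Sketch of an RFC 2782-style SRV lookup with libresolv (link with
     * -lresolv); not the code venus actually runs, and the service label
     * is assumed to be _codasrv._udp here. */
    #include <stdio.h>
    #include <string.h>
    #include <resolv.h>
    #include <arpa/nameser.h>

    struct srv {
        char target[NS_MAXDNAME];
        unsigned prio, weight, port;
    };

    static int get_srv_records(const char *realm, struct srv *out, int max)
    {
        unsigned char answer[NS_PACKETSZ];
        char name[NS_MAXDNAME];
        ns_msg msg;
        ns_rr rr;
        int i, n = 0, len;

        snprintf(name, sizeof(name), "_codasrv._udp.%s", realm);
        len = res_query(name, ns_c_in, ns_t_srv, answer, sizeof(answer));
        if (len < 0 || ns_initparse(answer, len, &msg) < 0)
            return -1;

        for (i = 0; i < ns_msg_count(msg, ns_s_an) && n < max; i++) {
            if (ns_parserr(&msg, ns_s_an, i, &rr) < 0)
                continue;
            if (ns_rr_type(rr) != ns_t_srv)
                continue;
            const unsigned char *rd = ns_rr_rdata(rr);
            out[n].prio   = ns_get16(rd);
            out[n].weight = ns_get16(rd + 2);
            out[n].port   = ns_get16(rd + 4);
            if (dn_expand(ns_msg_base(msg), ns_msg_end(msg), rd + 6,
                          out[n].target, sizeof(out[n].target)) < 0)
                continue;
            n++;
        }
        /* the caller sorts by prio (and weight) and tries the targets
         * in that order until one answers */
        return n;
    }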

> > I already made some steps towards handling ipv6 that are integrated in
> > the codebase. The stickiest problem were the client/server RPC2 messages
> > specifically the 'ViceGetVolumeInfo' ones, I added a new RPC
> > (ViceGetVolumeLocation) a long time ago that returns a string, which is
> > expected to be a fully qualified hostname that can be resolved to the
> > proper ip-addresses by the client.
> 
> I see it as an explicit extra indirection layer which Coda server would
> have to know while the layer does not bring any useful semantics per se.

Actually the server would have to know less. Currently the server
resolves each name it finds in /vice/db/servers to a single ipv4
address and returns those. But this is wrong, because it resolves the
names from the viewpoint of the server, which is why we have issues
with the loopback address etc.

A server has several valid addresses: on IPv4 you clearly have 127.0.0.1
and whatever addresses are bound to the network interfaces. But there
may even be addresses the server itself doesn't know about, for
instance if it is behind a NAT firewall.

But using DNS the FQDN hostname can be resolved to a list of valid
addresses from the point of view of the client. If the client is on the
same machine, 127.0.0.1 isn't a bad address to use; if it is on the
private network behind the firewall you'd want the internal network ip;
and outside of the firewall you need the public address. Addressing is
complex and DNS is clearly already set up to handle a lot of these
issues.
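
Which is pretty much what getaddrinfo already gives us on the client
side; a rough sketch (hostname and port are placeholders):

    /* Resolve a server's FQDN from the client's point of view and hand
     * back the whole address list; hostname/port are placeholders. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <netdb.h>

    static struct addrinfo *resolve_server(const char *fqdn, const char *port)
    {
        struct addrinfo hints, *result;

        memset(&hints, 0, sizeof(hints));
        hints.ai_family   = AF_UNSPEC;  /* A as well as AAAA records */
        hints.ai_socktype = SOCK_DGRAM; /* RPC2 runs over UDP */

        int err = getaddrinfo(fqdn, port, &hints, &result);
        if (err) {
            fprintf(stderr, "%s: %s\n", fqdn, gai_strerror(err));
            return NULL;
        }
        /* The returned list already reflects where the client sits:
         * loopback on the same machine, the private address behind the
         * NAT, or the public one outside. The caller walks ai_next,
         * keeps the first address that answers, and frees the list
         * with freeaddrinfo(). */
        return result;
    }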

> IOW: your intention is to translate an internal server id to a dns name
> and give this name to the client (and still have the port number managed

We should already have the id->name mapping in /vice/db/servers, so it
isn't a difficult translation. Port numbers are tricky, but I think they
should become part of the hostname in /vice/db/servers.
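
Something along these lines; purely hypothetical, this is not what the
file looks like today, just an illustration of folding the port into
the name:

    /* Hypothetical: split an optional ":port" suffix off a server name
     * as it might appear in /vice/db/servers, e.g.
     * "vice1.example.org:2432". Falls back to the well-known 2432/udp
     * when no port is given. (Assumes hostnames, not IPv6 literals.) */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static void split_hostport(const char *entry, char *host,
                               size_t hostlen, unsigned short *port)
    {
        const char *colon = strrchr(entry, ':');

        *port = 2432; /* codasrv default */
        if (colon && colon != entry) {
            size_t n = (size_t)(colon - entry);
            if (n >= hostlen)
                n = hostlen - 1;
            memcpy(host, entry, n);
            host[n] = '\0';
            *port = (unsigned short)atoi(colon + 1);
        } else {
            snprintf(host, hostlen, "%s", entry);
        }
    }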

> outside of DNS) while I propose to achieve the same result
>  - without involving knowledge of this intermediate dns name for the
>    Coda server (it is fully encapsulated inside the corresponding
>    SRV record)

Coda servers already have a need for those names: for one, to identify
themselves, and also to identify other replicas during resolution.

>  - without handling of port numbers by the Coda server (which even reminds
>    me of the original choice to handle ip-numbers inside Coda instead
>    of dns)

A server has to start at a well-known port. If you want to allow it to
be different from 2432/udp you still need to specify it in
/vice/db/servers so that the server can figure out who it is supposed to
be.

> > I don't
> > think introducing an alternate namespace based on what is really sort of
> > an internal artifact is a good idea.
> 
> The namespace (numeric server ids) is not new, but I propose to make it
> officially visible to the clients - as it in practice already is.
> 
> This artifact makes the change easier (gracefully preserves compatibility
> without any server side changes) but it can be easily promoted to first
> class citizen if we add an RPC, like the existing one which lists ipv4
> addresses but fill in the corresponding structure with numeric server
> ids. This also works with arbitrarily large server ids when we decide to
> introduce them.

No, because then you get into trouble with the volume-ids. The numeric
server id is an internal value that leaked; it should not be promoted to
a first class citizen.

> > But it also introduces problems,
> > we can only ever have 253 servers in a Coda realm, there are magic
> > server identifiers, 0, 127 and 255 that have to be avoided, etc.
> 
> The size of a single byte is certainly too low but the changes which I
> suggest do not make a volume id format change harder than it is otherwise.

The 32-bit VolumeID is everywhere; changing that is far worse than
changing the few places where there are ipv4 addresses in the RPC2
messages.
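
To make the 253 limit concrete: it comes from the server id being packed
into a single byte of that 32-bit volume id, with 0, 127 and 255
reserved. Assuming the top byte, purely as an illustration:

    /* Illustration only: with the server id packed into the top byte of
     * the 32-bit volume id, three values (0x00, 0x7f, 0xff) are reserved
     * and 253 server ids remain. Top-byte placement is assumed here. */
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t volid    = 0x7f000002;           /* example volume id */
        uint8_t  serverid = (volid >> 24) & 0xff; /* 0x7f: replicated */
        uint32_t lowbits  = volid & 0x00ffffff;

        printf("volume 0x%08x -> server id %u, low bits 0x%06x\n",
               volid, serverid, lowbits);
        return 0;
    }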

> On the other side this is actually not a problem. This means only that
> you can not use /coda/something.*cmu.edu* via DNS to access your data.
> 
> There are lots of cheap DNS services. I observe your group is actually
> using an .org domain - may be in fact you already have a suitable DNS
> service there?

Using a .org domain doesn't make it resolve coda.cs.cmu.edu correctly.

> > > An inconvenience related to DNS is its synchronous nature. I intend in
> > > the beginning to accept that the client will stall during DNS lookups
> > > (which is expected to happen rarely), this can be later replaced by an
> > > asynchronous layer.
> > 
> > That would probably make the callback break woes worse.
> 
> Oh callbacks are a separate issue, they are both inconsistent
> (as there is no guarantee that they actually trigger, only a hope)
> and harmful (creating stalls worse than DNS).
> 
> IOW I am looking forward to eliminating them. It is you who told me that
> such an effort was already made once (and alas not finished/merged, like
> many other subprojects).

You cannot actually get rid of callbacks in Coda unless you implement
something like leases. The cache consistency relies on them. The
unmerged project simply replaced the many individual callbacks with a
server-side log. Clients still get a callback when the first 'new' entry
is added to the log, and then we can avoid sending further callbacks
until the client is back in sync by reading the log. So instead of
tracking callback state for each object and client we only need to
track it for each (volume-)log and client.
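
In rough strokes, and only as an illustration of the bookkeeping (not
the actual unmerged code), the state shrinks to something like:

    /* Illustration only, not the unmerged implementation: instead of
     * one callback promise per (object, client) pair, the server keeps
     * one append-only log per volume and one position per
     * (volume, client). */
    #include <stdint.h>

    struct volume_log_entry {
        uint64_t seq;       /* monotonically increasing sequence number */
        uint32_t volumeid;
        uint32_t vnode;     /* which object in the volume changed */
        /* ... enough version information to revalidate the object ... */
    };

    struct client_volume_state {
        uint32_t volumeid;
        uint64_t synced_upto;   /* highest seq this client has fetched */
        int      callback_sent; /* set when the first newer entry is
                                 * announced; cleared, and callbacks
                                 * resume, once the client has read the
                                 * log up to the current tail */
    };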

There is more overhead for clients that only cache a few objects per
volume, because they end up with callbacks/log updates for every changed
object in the volume.

> > > - possibility to freely move Coda servers both from one ip-number
> > >   to another (allows among others servers with dynamic ip addresses)
> > >   and from one f.q.d.n to another (also desirable in practice)
> > 
> > This has little to do with how the addresses are resolved and more with
> > how the client does not track when it should revalidate the address.
> 
> This is what I mean, DNS incorporates the means to handle in-/re-validation.

But it doesn't expose that when you use the normal gethostbyname-type
interfaces. Your application either has to rely on libc doing the
caching right, or on there being a nearby caching resolver, or it has to
implement all the messy bits of DNS lookups itself.
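
With getaddrinfo for instance there is no TTL in the result at all, so
about the best a client can do without dragging in its own resolver is
to throw the cached addresses away and look the name up again whenever
the server stops answering; a sketch (names and the trigger are just
examples):

    /* getaddrinfo() hides the DNS TTL, so this sketch simply re-resolves
     * the name whenever the cached addresses stop working (for example
     * after an RPC2 timeout). Structure and names are examples only. */
    #include <string.h>
    #include <sys/socket.h>
    #include <netdb.h>

    struct server_addr {
        char             name[256]; /* FQDN from the volume location info */
        struct addrinfo *addrs;     /* cached result, NULL when stale */
    };

    static struct addrinfo *server_addresses(struct server_addr *s, int stale)
    {
        struct addrinfo hints;

        if (stale && s->addrs) {
            freeaddrinfo(s->addrs);
            s->addrs = NULL;
        }
        if (!s->addrs) {
            memset(&hints, 0, sizeof(hints));
            hints.ai_family   = AF_UNSPEC;
            hints.ai_socktype = SOCK_DGRAM;
            if (getaddrinfo(s->name, "2432", &hints, &s->addrs) != 0)
                s->addrs = NULL;
        }
        return s->addrs;
    }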

Jan
Received on 2014-07-28 08:44:21