Coda File System

From: <u-codalist-z149_at_aetey.se>
Date: Sun, 27 Jul 2014 17:22:26 +0200

Hello Jan,

thanks for the feedback.

On Sun, Jul 27, 2014 at 09:33:21AM -0400, Jan Harkes wrote:
> > This means that our usage of the SRV records does not and can not
> > follow the rfc2782 literally because the assumptions in the rfc are
> > not applicable.
> 
> Actually they are because we only need one volume location server and it
> will tell us who the other servers are.

This is disputable as we can profit from our ability to talk to
several servers in parallel and not sequentially as the rfc assumes.

(IOW yes we _could_ follow rfc's expectations but that would be harmful)

Don't we take and use all servers from the SRV reply?
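
To illustrate what "in parallel and not sequentially" could mean for a
client, here is a minimal Python sketch. It is not how the Coda client is
implemented; `first_responder` and the `probe` callable are hypothetical
names, and the probe stands in for whatever "is this server answering"
check a client would use.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, Optional


def first_responder(targets: Iterable[str],
                    probe: Callable[[str], bool],
                    max_workers: int = 8) -> Optional[str]:
    """Probe every target concurrently and return the first that answers.

    RFC 2782 describes trying targets one after another in priority order;
    this sketch contacts all of them at once instead.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each pending probe back to the target it belongs to.
        futures = {pool.submit(probe, target): target for target in targets}
        for future in as_completed(futures):
            try:
                if future.result():
                    return futures[future]
            except OSError:
                # An unreachable server is simply skipped.
                continue
    return None
```

With a sequential walk the total delay is the sum of the timeouts of the
dead servers that come first; with the parallel variant it is bounded by
the fastest live one.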

> I already made some steps towards handling ipv6 that are integrated in
> the codebase. The stickiest problem were the client/server RPC2 messages
> specifically the 'ViceGetVolumeInfo' ones, I added a new RPC
> (ViceGetVolumeLocation) a long time ago that returns a string, which is
> expected to be a fully qualified hostname that can be resolved to the
> proper ip-addresses by the client.

I see it as an explicit extra indirection layer which the Coda server would
have to know about, while the layer does not bring any useful semantics per se.

IOW: your intention is to translate an internal server id to a dns name
and give this name to the client (and still have the port number managed
outside of DNS) while I propose to achieve the same result
 - without involving knowledge of this intermediate dns name for the
   Coda server (it is fully encapsulated inside the corresponding
   SRV record)
 - without handling of port numbers by the Coda server (which even reminds
   me of the original choice to handle ip-numbers inside Coda instead
   of dns)
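
A minimal Python sketch of the client-side lookup proposed here. The record
naming scheme (`_codasrv-<id>._udp.<realm>`), the 32-bit volume id layout
with the server id in the top byte, and all function names are assumptions
made for illustration, not existing Coda conventions. The SRV answer is
passed in as plain tuples so that no particular DNS library is implied.

```python
from typing import List, NamedTuple, Tuple


class Endpoint(NamedTuple):
    host: str
    port: int


def server_id_of(volume_id: int) -> int:
    # Assumption: 32-bit volume id, server id in the most significant byte.
    return (volume_id >> 24) & 0xFF


def srv_query_name(server_id: int, realm: str) -> str:
    # Hypothetical naming scheme: one SRV record set per numeric server id.
    if not 0 <= server_id <= 0xFF:
        raise ValueError("server id must fit in one byte")
    return f"_codasrv-{server_id}._udp.{realm.rstrip('.')}"


def endpoints_from_srv(answer: List[Tuple[int, int, int, str]]) -> List[Endpoint]:
    # Each SRV answer is (priority, weight, port, target) as in RFC 2782.
    # Host and port both come from DNS; the Coda server supplies neither.
    ordered = sorted(answer, key=lambda rr: rr[0])
    return [Endpoint(target.rstrip("."), port) for _, _, port, target in ordered]
```

The only thing the file server hands out in this scheme is the numeric
server id it already has to know.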

> One server can validly have more than one address, and mapping from
> server name to the actual address should be done through DNS.

Agreed. But there is no reason to involve the VLDB/Coda server in this
indirection. The server knows and has to know a server id, that's enough.

> I don't
> think introducing an alternate namespace based on what is really sort of
> an internal artifact is a good idea.

The namespace (numeric server ids) is not new, but I propose to make it
officially visible to the clients - as it in practice already is.

This artifact makes the change easier (gracefully preserves compatibility
without any server side changes) but it can be easily promoted to first
class citizen if we add an RPC, like the existing one which lists ipv4
addresses but fill in the corresponding structure with numeric server
ids. This also works with arbitrarily large server ids when we decide to
introduce them.

> AFAIK, the server identifier in the volume identifier is really only
> used on the servers to avoid a slightly more expensive hashtable (or
> volume location server) lookup to figure which server should handle that
> volume, to simplify volume identification for administrators (I know
> every volume on server X starts with 'c') and to avoid conflicting
> volume numbers on different servers.

Well, you listed quite nice properties, I'd like to keep them.

> But it also introduces problems,
> we can only ever have 253 servers in a Coda realm, there are magic
> server identifiers, 0, 127 and 255 that have to be avoided, etc.

A single byte is certainly too small, but the changes which I suggest
do not make a volume id format change any harder than it otherwise is.

> Our domain doesn't even support SRV records, which is why we use
> /etc/coda/realms. And no, we do not have all servers listed there

_That_ I would perceive as a problem. Maintaining client-side realm
databases simply does not scale, it becomes too expensive to access
additional Coda realms. Imagine if web browsers made you enter the
ip-numbers of every site you want to access into /etc/hosts.

On the other hand this is actually not a problem. It only means that
you can not use /coda/something.*cmu.edu* via DNS to access your data.

There are lots of cheap DNS services. I observe your group is actually
using an .org domain - maybe you in fact already have a suitable DNS
service there?

> either, and having to push out a new Coda client to everyone just
> because we retire/replace hardware is not an option.

I guess you would gain a lot by moving to DNS.

> And even if CMU's DNS would support SRV records, pushing an update is a
> non-trivial step because it is managed by the facilities networking
> group and coordinating a planned server replacement would involve
> reducing record expiration time, and then in a short timewindow
> replacing both the server and the SRV record. Unplanned would probably
> be worse.

This sounds like another reason to register a separate domain name
and then freely choose a sensible DNS provider for the domain.

> Placing everything in a single DNS SRV record is asking to be abused for
> DDoS amplification attacks too, although there probably are not enough
> Coda realms to put a dent in anyones network.

As I suggested, you do not have to put a lot in a single record, at the
price of multiple queries. One can even make the queries highly parallel
(using a range of reserved priority numbers to indicate the additional
records to be consulted), i.e. without creating enormous delays even for
realms with tens of thousands of servers.

(1 record of 4k can easily refer to about 100 further records, so
a double RTT is sufficient to fetch data about 10000 servers)

You don't have to fetch everything at once, just the chunks
of up to 100 servers containing the desired ones.
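
The arithmetic behind this estimate, as a small Python sketch. The two
constants are the rough figures from the paragraph above; assigning servers
to chunks by `server_id // 100` is a hypothetical layout chosen only to make
the example concrete.

```python
from typing import Iterable, List

REFS_PER_RECORD = 100    # further records one ~4k index record can refer to
SERVERS_PER_CHUNK = 100  # servers described by one chunk record


def reachable_servers(round_trips: int) -> int:
    """Servers describable when the last round trip fetches chunk records
    and every earlier round trip fetches index records."""
    if round_trips < 1:
        return 0
    return SERVERS_PER_CHUNK * REFS_PER_RECORD ** (round_trips - 1)


def chunks_to_fetch(server_ids: Iterable[int]) -> List[int]:
    # Only the chunks that contain the wanted servers need to be queried.
    return sorted({sid // SERVERS_PER_CHUNK for sid in server_ids})
```

For example, `reachable_servers(2)` gives 10000, and a client interested in
servers 5, 42 and 150 needs only chunks 0 and 1.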

> > An inconvenience related to DNS is its synchronous nature. I intend in
> > the beginning to accept that the client will stall during DNS lookups
> > (which is expected to happen rarely), this can be later replaced by an
> > asynchronous layer.
> 
> That would probably make the callback break woes worse.

Oh, callbacks are a separate issue: they are both inconsistent
(as there is no guarantee that they actually trigger, only a hope)
and harmful (creating stalls worse than DNS).

IOW I am looking forward to eliminating them. It is you who told me that
such an effort was already made once (and alas not finished/merged, like
many other subprojects).

> > - possibility to freely move Coda servers both from one ip-number
> >   to another (allows among others servers with dynamic ip addresses)
> >   and from one f.q.d.n to another (also desirable in practice)
> 
> This has little to do with how the addresses are resolved and more with
> how the client does not track when it should revalidate the address.

This is what I mean, DNS incorporates the means to handle in-/re-validation.
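
A minimal Python sketch of what TTL-driven revalidation looks like on the
client side. `TtlCache` and the `resolve` callable are hypothetical names;
the resolver is assumed to return the addresses together with the TTL of
the DNS answer, and the clock is injectable so the behaviour can be tested.

```python
import time
from typing import Callable, Dict, List, Tuple

# A resolver returns (addresses, ttl_in_seconds) for a name.
Resolver = Callable[[str], Tuple[List[str], float]]


class TtlCache:
    def __init__(self, resolve: Resolver,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve
        self._clock = clock
        # name -> (addresses, expiry time on the given clock)
        self._entries: Dict[str, Tuple[List[str], float]] = {}

    def lookup(self, name: str) -> List[str]:
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]            # still valid, no DNS traffic
        addresses, ttl = self._resolve(name)
        self._entries[name] = (addresses, now + ttl)
        return addresses                # refreshed after expiry
```

A server that moves to a new address is picked up by clients at the latest
when the TTL of the old answer runs out.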

> > - possibility to put multiple Coda-servers (also for multiple
> >   realms) behind a single NAT, using different port numbers
> >   to distinguish between them
> 
> Forgot to mention, the returned string from ViceGetVolumeLocation is
> expected to be 'server:port'.

See above: supplying a port number is only halfway from supplying a
numerical ip address, and implies that knowledge of the network-related
setup is being put into the file server, which otherwise can be fully
agnostic of such details.

Thanks again Jan, I appreciate your comments even when your view differs
from mine (this is a useful case :)

Regards,
Rune

Re: request for comments - proper addressing of servers in a realm