Coda File System

Re: Continuing reintegration problems

From: Jan Harkes <jaharkes_at_cs.cmu.edu>
Date: Thu, 24 Aug 2000 16:54:55 -0400
On Thu, Aug 24, 2000 at 04:04:00PM -0400, root wrote:
> I am still having reintegration problems.
> Here is part of /usr/var/log/messages (Redhat 6.2) after rebooting.
> Notice the warning about the "weird fid".

Those weird fids appear during a repair session when the local and
global directories contain different files with the same inode number.
The VFS layers in the Linux kernel generally don't like colliding inode
numbers, and collisions can happen even in normal operation because we
are mapping 96-bit file identifiers into the 32-bit inode space.
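
To illustrate why such collisions are unavoidable: any function that
squeezes 96 bits into 32 has to send many different fids to the same
inode number. The sketch below uses a made-up XOR fold; fold_fid is a
hypothetical name and this is not the mapping Venus actually uses. It
assumes a fid made of three 32-bit fields (volume, vnode, unique).

```python
# Hypothetical illustration only: this is NOT the mapping used by Venus.
MASK32 = 0xFFFFFFFF


def fold_fid(volume: int, vnode: int, unique: int) -> int:
    """Fold a 96-bit (volume, vnode, unique) fid into 32 bits by XOR."""
    return (volume ^ vnode ^ unique) & MASK32


# Two different fids in the same volume that land on the same 32-bit value.
fid_a = (0x7F000004, 0x00000001, 0x00000002)
fid_b = (0x7F000004, 0x00000002, 0x00000001)

if __name__ == "__main__":
    print(hex(fold_fid(*fid_a)), hex(fold_fid(*fid_b)))
```

Whatever folding function is chosen, distinct objects can end up
sharing an inode number; only how often it happens changes.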

During repair we are pretty much guaranteed to see these kinds of
collisions, because the multiple replicas of the same object are exposed
simultaneously.

> Here is a hunk from the "codacon" command.
> This seems to suggest that the client is disconnecting and 
> reconnecting to the server.  They are on an isolated 100Mbs network
> and should never have had any problems communicating during this period.

Did you use clog? Reauthentication drops any existing connections.

Alternatively, do you have a broken network setup? Venus tries to figure
out its own IP address by doing gethostbyname(gethostname()) and sends
this to the server so that the server knows where callbacks/backfetches
should go. For some stupid reason, RedHat 6.2 associates the machine's
own host.domain name with 127.0.0.1 in /etc/hosts, and of course the
server can't reach the client using that address. As soon as a callback
is broken, or an sftp transfer is done, the server drops all the
connections to the client.
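
A quick way to check for this on the client is to repeat the same
lookup and see whether it yields a loopback address. This is only a
sketch using Python's standard socket module; is_loopback and
self_address are names made up for the example, and it checks the
lookup only, not what Venus actually sent.

```python
import ipaddress
import socket


def is_loopback(addr: str) -> bool:
    """True if addr is a loopback address such as 127.0.0.1."""
    return ipaddress.ip_address(addr).is_loopback


def self_address() -> str:
    """Repeat the lookup described above: gethostbyname(gethostname())."""
    return socket.gethostbyname(socket.gethostname())


if __name__ == "__main__":
    try:
        addr = self_address()
    except OSError as exc:
        print("hostname does not resolve:", exc)
    else:
        if is_loopback(addr):
            print(addr, "is a loopback address; a server cannot reach the client there")
        else:
            print(addr, "is not a loopback address")
```

If this reports a loopback address, the entry for the machine's own
name in /etc/hosts is the first thing to look at.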

This effect is also clearly visible when the client and server are
separated by a (masquerading) firewall.

Also, the fact that you have an isolated 100Mbs network doesn't matter
much if, for instance, the server reacts slowly (e.g. when the server is
swapping or has heavy CPU usage). The rpc2 layer looks at the time it
took for the server to respond to a message, and if responses are slow
the client switches to weakly-connected mode to take some of the load
off the server.

> connection::bandwidth oak 1239157 1650165 2469135 ( 15:48:08 )

Well, the disconnections are not related to the responsiveness of the
server. The estimated bandwidth is around 1.65MB per second, which is far
above the weak threshold of 50KB/s.
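
The same comparison can be done mechanically on the codacon output. The
sketch below assumes that the three numbers on a connection::bandwidth
line are low / estimate / high values in bytes per second (the middle
one matching the 1.65MB/s figure above) and that 50KB/s means
50 * 1024 bytes per second. parse_bandwidth and is_weak are
hypothetical helper names.

```python
# Assumption: 50KB/s is taken as 50 * 1024 bytes per second.
WEAK_THRESHOLD = 50 * 1024


def parse_bandwidth(line: str):
    """Return (server, low, estimate, high) from a codacon bandwidth line.

    Assumption: the three numbers are low/estimate/high in bytes per second.
    """
    fields = line.lstrip("> ").split()
    if len(fields) < 5 or fields[0] != "connection::bandwidth":
        raise ValueError("not a bandwidth line: %r" % line)
    server = fields[1]
    low, estimate, high = (int(f) for f in fields[2:5])
    return server, low, estimate, high


def is_weak(estimate: int, threshold: int = WEAK_THRESHOLD) -> bool:
    """True if the estimate falls below the weak-connectivity threshold."""
    return estimate < threshold


if __name__ == "__main__":
    line = "> connection::bandwidth oak 1239157 1650165 2469135 ( 15:48:08 )"
    server, low, estimate, high = parse_bandwidth(line)
    print(server, estimate, "weak" if is_weak(estimate) else "strong")
```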

> setattr .history ( 15:48:14 )
> Store .history [2] ( 15:48:14 )
> setattr .history ( 15:48:16 )
> Store .history [2] ( 15:48:16 )
> connection::bandwidth oak 1536098 1834862 2277904 ( 15:48:20 )
> DisconnectFS (0xe0000100) ( 15:48:23 )
> NewConnectFS oak ( 15:48:23 )

Is any reason for the disconnection given in /usr/coda/etc/venus.log?
This actually looks a lot like the server just NAK'ed the connection,
i.e. the server side of the rpc2 connection was disconnected.

> How do you tell if all the volumes are currently 
> reintegrated and everything is normal?  Using "cfs lv" I get
> 
> cfs lv .
> Status of volume 0x7f000004 (2130706436) named "user.doug"
> Volume type is ReadWrite
> Connection State is Connected
		      ^ Would be WriteDisconnected or Disconnected
> Minimum quota is 0, maximum quota is unlimited
> Current blocks used are 515215
> The partition has 64119 blocks available out of 280633
> Write-back is disabled
  There are 0 CML entries pending for reintegration

  ^ there would also be an additional line like that

> Which looks ok, but there still are files in the /usr/coda/spool/<pid> dirs:
> 
> [root_at_Oak spool]# ls 502
> user.karen@_coda_users_karen.cml  user.karen@_coda_users_karen.tar

These files are kept around; they are merely copies of the CML, which is
stored in RVM. When the CML is empty the automatic snapshots stop, and
thus these files will never become empty.

> Is there any way to say something like "cfs lv *" or "cfs lv all"?

Not yet. You're actually the second person to mention this, and I agree
it would be useful.
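
In the meantime, a small wrapper could run "cfs lv" on a list of paths
and pick out the two lines that matter. This is a rough sketch, not
part of Coda: parse_lv, looks_reintegrated and lv are made-up names,
and the patterns are based only on the output quoted above. It assumes
the CML line is not always printed, so a missing line is reported as
None rather than as zero.

```python
import re
import subprocess
import sys


def parse_lv(output: str) -> dict:
    """Pull the connection state and pending CML count out of `cfs lv` text."""
    state = re.search(r"Connection State is (\w+)", output)
    cml = re.search(r"There are (\d+) CML entries pending", output)
    return {
        "state": state.group(1) if state else None,
        # None means the CML line was not printed at all.
        "cml_pending": int(cml.group(1)) if cml else None,
    }


def looks_reintegrated(info: dict) -> bool:
    """Connected, and no pending CML entries reported."""
    return info["state"] == "Connected" and info["cml_pending"] in (None, 0)


def lv(path: str) -> dict:
    """Run `cfs lv <path>` and parse the result."""
    result = subprocess.run(
        ["cfs", "lv", path], capture_output=True, text=True, check=True
    )
    return parse_lv(result.stdout)


if __name__ == "__main__":
    # Usage: pass one path per volume to check on the command line.
    for path in sys.argv[1:]:
        info = lv(path)
        print(path, info, "ok" if looks_reintegrated(info) else "check")
```

The paths still have to be supplied by hand, one per volume, since the
sketch does not try to discover which volumes exist.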

Jan
Received on 2000-08-24 16:57:27