Coda File System

R: R: Backup problems

From: Andrea Cerrito <cerrito_at_centromultimediale.it>
Date: Wed, 13 Jun 2001 17:47:00 +0200
Again, thanks for your very quick reply.

> > Ouch!
> > Is there a way to just dump the volume to the holding area without the
> > need for tape? I think that's the "done-by-hand" way, isn't it?
>
> Oh, sorry. The backup program that you are trying to get working only
> does the dump to the holding area. It's the backup.sh script that calls
> tape.pl next to spool everything to tape.

Great. I'll use the backup program to do the trick.

> > Ok. I've created an empty /test directory.
> > But:
> >
> > 1) Do I have to put a directory that doesn't really exist in vicetab?
> > 2) If I use backup -t /vice/db/dumplist /test, all my backups go into
> > it instead of /extra0! What's the point of setting a dir in vicetab?
>
> Are you sure they aren't symlinks?...

Yes, I'm sure: they aren't symlinks.

> This could be related to the fact
> that the backup program didn't recognize the non-fully qualified machine
> name (apu-a) as itself.
>
> 11:44:55 Warning: the hostname of this server (apu-a.mgt.int) is not listed in /vice/db/servers
>
> Although it does seem to find the backup partition,
>
> 11:44:55 Partition /extra0: 8305048K available (minfree=5%), 8264976K free.

The strange thing is that some parts of Coda accept apu-a as the hostname and
others don't. :)
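
To see quickly which form of the name is actually listed, a small check like the one below could help. It is only a sketch: it assumes each line of /vice/db/servers starts with the hostname followed by whitespace and a server id, and the sample contents are made up for illustration, not copied from my server.

```python
def hostname_listed(servers_text, fqdn):
    """Report which form of a host name appears in a servers file.

    Assumption: every non-empty line begins with a hostname, followed
    by whitespace and a server id. Returns "fqdn", "short" or None.
    """
    listed = set()
    for line in servers_text.splitlines():
        fields = line.split()
        if fields:
            listed.add(fields[0])
    short = fqdn.split(".", 1)[0]
    if fqdn in listed:
        return "fqdn"
    if short in listed:
        return "short"
    return None


# Hypothetical file contents, for illustration only.
sample = "apu-a\t\t1\napu-b\t\t21\n"
print(hostname_listed(sample, "apu-a.mgt.int"))
```

If this reports "short" while the server calls itself apu-a.mgt.int, that would match the warning quoted above.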

>
> > So, it will be correct to set in /vice/db/vicetab
> >
> > apu-a   /imaginarydir backup
> >
> > and to launch
> >
> > backup -t /vice/db/dumplist /extra0
> > ???
>
> Yes that would be incorrect, we actually have in vicetab
>
> dvorak    /backup1   backup
> dvorak    /backup2   backup
> dvorak    /backup3   backup
> dvorak    /backup4   backup
> dvorak    /backup5   backup
> dvorak    /backup6   backup
>
> The backup command is started with backup -t 135 /vice/db/dumplist
> /backup. The volume dumps are then distributed across these 6 backup
> partitions.
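
To check which partitions a given host offers for dumps, the vicetab lines quoted above can be filtered with a few lines of Python. This is a sketch that assumes the three whitespace-separated columns shown in this thread (host, directory, type); the function name is mine, and this is not how the backup program itself reads the file.

```python
def backup_partitions(vicetab_text, host):
    """Return the directories marked as type "backup" for one host.

    Assumption: each entry is "host  directory  type", as in the
    examples quoted in this thread.
    """
    partitions = []
    for line in vicetab_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == host and fields[2] == "backup":
            partitions.append(fields[1])
    return partitions


vicetab = """\
dvorak    /backup1   backup
dvorak    /backup2   backup
apu-a     /extra0    backup
"""
print(backup_partitions(vicetab, "dvorak"))
print(backup_partitions(vicetab, "apu-a"))
```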

Ok. So my vicetab now says "apu-a   /extra0 backup", and I made a /backup dir.
Now:

[root@apu-a /]# backup -t 135 /vice/db/dumplist /backup

Date: Wed 06/13/2001

17:12:10 Warning: the hostname of this server (apu-a.mgt.int) is not listed in /vice/db/servers
Partition /codafs/mounts/server1: inodes in use: 0, total: 262144.
17:12:10 Partition /codafs/mounts/server1: 5688568K available (minfree=5%), 5550388K free.
17:12:10 Partition /extra0: 8305048K available (minfree=5%), 8264984K free.
17:12:10 VLDBLookup: VLDB_size unset. Calling VCheckVLDB()
17:12:10 7f000003: cloning
17:12:10        01000002->01000003
17:12:10        15000002->15000003
17:12:10 Dumping 7f000003.1000002 to /extra0/13Jun2001/apu-a-7f000003.1000002 ...
17:12:10 VolDump (12abdb4a) failed on 1000003 with Unknown RPC2 return code 200

17:12:10 Dumping 7f000003.15000002 to /extra0/13Jun2001/apu-b-7f000003.15000002 ...
17:12:10 VolDump (177610c4) failed on 15000003 with Unknown RPC2 return code 200

17:12:10 Dump of volume 7f000003 failed!

17:12:10 MarkAncient of 7f000003 failed!

17:12:10
17:12:10 Attempting to retry any failed operations.

And so on. But look at this line:
17:12:10 Dumping 7f000003.1000002 to /extra0/13Jun2001/apu-a-7f000003.1000002 ...

So backup says it is dumping into /extra0/13Jun2001... but that's not the
whole story: it is creating things in both /backup and /extra0!

[root@apu-a 13Jun2001]# pwd
/backup/13Jun2001
[root@apu-a 13Jun2001]# ls -la
total 16
drwx------   4 root     root         4096 Jun 13 17:12 ./
drwxr-xr-x   3 root     root         4096 Jun 13 17:12 ../
drwxr-xr-x   2 root     root         4096 Jun 13 17:12 apu-a/
drwxr-xr-x   2 root     root         4096 Jun 13 17:12 apu-b/
[root@apu-a 13Jun2001]# cd /extra0/13Jun2001
[root@apu-a 13Jun2001]# pwd
/extra0/13Jun2001
[root@apu-a 13Jun2001]# ls -al
total 8
drwxr-xr-x   2 root     root         4096 Jun 13 17:15 ./
drwxr-xr-x   4 root     root         4096 Jun 13 17:15 ../
[root@apu-a 13Jun2001]#

What's happening? In /backup/13Jun2001 I can see the two per-server
directories, while /extra0/13Jun2001 is empty.

> > This is what I get doing it by hand (I think you forgot volutil -h server
> > unlock volume-replica id, didn't you?).
>
> No, that is implicitly done when the backup command completes. We could
> probably even do the locking automatically when the backup is started.

Ah.
Ok.

> > [root@apu-a /]# volutil -h apu-a dump 1000003 /extra0/codafs.rootvol.backup
> > V_BindToServer: binding to host apu-a
> >
> > VolDump failed with Unknown RPC2 return code 200
>
> Try 0x1000003, in some places the strtol has an explicit base 16
> specified, but in other places it relies on the 0x prefix to figure out
> that the number is hexadecimal. I'm not sure why this error is returned,
> maybe there is something in the SrvLog.

Same error.

[root@apu-a /]# volutil -h apu-a dump 0x1000003 /extra0/codafs.rootvol.backup
V_BindToServer: binding to host apu-a

VolDump failed with Unknown RPC2 return code 200
[root@apu-a /]#
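
As an aside, the base issue Jan describes is easy to see in a small Python sketch. Python's int() stands in for C's strtol only as an analogy here (base 16 always reads hexadecimal, base 0 infers the base from the prefix); the function name is hypothetical and this is not Coda code.

```python
def parse_volume_id(text, base):
    """Parse a volume id the way a strtol-style call with this base would.

    base=16 reads the digits as hexadecimal with or without "0x";
    base=0 relies on the prefix, so bare digits are read as decimal.
    """
    return int(text, base)


for text in ("1000003", "0x1000003"):
    explicit = parse_volume_id(text, 16)
    inferred = parse_volume_id(text, 0)
    print(text, hex(explicit), hex(inferred), explicit == inferred)
```

Without the prefix the two readings disagree, which is why trying 0x1000003 was worth a shot even though it turned out not to be the cause here.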

This is what I found in the SrvLog:

17:29:22 NewDump: file /vice/backup/7f000003.1000002.newlist volnum 7f000003 id 1000003 parent 1000002
17:29:22 S_VolDumpHeader: Couldn't open VVlistfile.
17:29:22 S_VolNewDump: Can't close binding RPC2_SUCCESS
17:29:22 S_VolNewDump:  volume dump failed with status = 200

But I don't have a /vice/backup dir! Hmm...
[root@apu-a /vice]# mkdir backup
[root@apu-a /vice]# volutil -h apu-a dump 0x1000003 /extra0/codafs.rootvol.backup
V_BindToServer: binding to host apu-a
.
VolDump completed, 2932 bytes dumped

:) What's wrong with my installation???
Anyway, I think I found my problem.
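
Since the whole failure came down to a missing directory, a tiny pre-flight check before running backup might save someone else the same hunt. A minimal sketch follows; the list of required paths is simply what my setup turned out to need (/vice/backup, the /backup holding directory and the /extra0 partition), not an official list.

```python
import os


def missing_dirs(paths):
    """Return the paths from the list that are not existing directories."""
    return [p for p in paths if not os.path.isdir(p)]


if __name__ == "__main__":
    # Paths taken from this thread; adjust for your own installation.
    required = ["/vice/backup", "/backup", "/extra0"]
    for path in missing_dirs(required):
        print("missing directory:", path)
```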

[root@apu-a /]# backup -t 135 /vice/db/dumplist /backup

Date: Wed 06/13/2001

17:33:02 Warning: the hostname of this server (apu-a.mgt.int) is not listed in /vice/db/servers
Partition /codafs/mounts/server1: inodes in use: 0, total: 262144.
17:33:02 Partition /codafs/mounts/server1: 5688568K available (minfree=5%), 5550356K free.
17:33:02 Partition /extra0: 8305048K available (minfree=5%), 8264984K free.
17:33:02 VLDBLookup: VLDB_size unset. Calling VCheckVLDB()
17:33:02 7f000003: cloning
17:33:02        01000002->01000003
17:33:02        15000002->15000003
17:33:02 Dumping 7f000003.1000002 to /extra0/13Jun2001/apu-a-7f000003.1000002 ...
17:33:02                Transferred 368 bytes

17:33:02 Dumping 7f000003.15000002 to /extra0/13Jun2001/apu-b-7f000003.15000002 ...
17:33:02                Transferred 2932 bytes

:))
Success! Now all is working great!!!

[root@apu-a apu-b]# pwd
/backup/13Jun2001/apu-b
[root@apu-a apu-b]# ls -al
total 8
drwxr-xr-x   2 root     root         4096 Jun 13 17:33 ./
drwx------   4 root     root         4096 Jun 13 17:33 ../
lrwxrwxrwx   1 root     root           41 Jun 13 17:33 7f000003.15000002 -> /extra0/13Jun2001/apu-b-7f000003.15000002*
[root@apu-a apu-b]#
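
So the layout seems to be: the directory given on the command line holds per-server directories full of symlinks, and the real dump files live on the backup partition from vicetab. Under that reading, a quick way to spot links whose dump file is missing could look like the sketch below (the function name is mine, and the layout is inferred only from the listing above).

```python
import os


def dangling_links(day_dir):
    """Return symlinks under day_dir whose targets do not exist.

    os.walk reports symlinks that do not resolve to a directory among
    the file names, so broken links show up there as well.
    """
    bad = []
    for root, _dirs, files in os.walk(day_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path) and not os.path.exists(path):
                bad.append(path)
    return sorted(bad)


if __name__ == "__main__":
    for link in dangling_links("/backup/13Jun2001"):
        print("dangling:", link, "->", os.readlink(link))
```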

> > Last: during tests I found two dirs, /test/13Jun2001/apu-a and
> > /test/13Jun2001/apu-b. Does a backup make two copies of the same volume?
>
> Each replica is dumped because Coda uses optimistic replication and
> there is no guarantee that both replicas are totally in sync at all
> times. When there is a slight difference due to, for instance, a temporary
> disconnection, a client will detect it when it accesses the object and
> trigger the servers to resolve the differences.

Thanks for the info.

So: since I have two servers, can I make a backup on the SCM one day and
another backup on the second server the day after?
And: to delete old snapshots, is it sufficient to remove /backup/OLDDAY and
/extra0/OLDDAY?

I think a good solution may be to add crontab entries like

0 0 * * * remove old backups
30 0 * * * backup -t 135 /vice/db/dumplist /backup

Right?
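
For the "remove old backups" step, something along these lines could pick out the dated directories to delete. It is a sketch that assumes the directories are always named like 13Jun2001 (day, English month abbreviation, year; I have only seen this one example, so whether single-digit days are zero-padded is a guess and both forms are accepted). It only lists candidates; the actual removal, and whether removing the directories is all that is needed, is left open.

```python
import datetime
import re

MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}
NAME_RE = re.compile(r"^(\d{1,2})([A-Z][a-z]{2})(\d{4})$")


def dump_date(name):
    """Parse a directory name like '13Jun2001'; return None if it does not fit."""
    match = NAME_RE.match(name)
    if not match or match.group(2) not in MONTHS:
        return None
    day, month, year = int(match.group(1)), MONTHS[match.group(2)], int(match.group(3))
    try:
        return datetime.date(year, month, day)
    except ValueError:  # e.g. 31Feb2001
        return None


def expired(names, today, keep_days):
    """Return the names whose date is more than keep_days before today."""
    cutoff = today - datetime.timedelta(days=keep_days)
    old = []
    for name in names:
        date = dump_date(name)
        if date is not None and date < cutoff:
            old.append(name)
    return old


names = ["13Jun2001", "01Jun2001", "lost+found", "31Feb2001"]
print(expired(names, datetime.date(2001, 6, 14), keep_days=7))
```

Anything that does not parse as a date is left alone, so stray directories on the partition are never listed for removal.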
Anyway, thanks Jan: you rock :)
---
Cordiali saluti / Best regards
Andrea Cerrito
^^^^^^^^^^^^^^
Net.Admin @ Centro MultiMediale di Terni S.p.A.
P.zzale Bosco 3A
05100 Terni IT
Tel. +39 744 5441330
Fax. +39 744 5441372
Received on 2001-06-13 11:47:40