File System level backups with LVM snapshots

From Proxmox VE
Revision as of 16:55, 10 September 2012 by Resoli (talk | contribs)

Introduction

The general idea is to combine an external tool capable of filesystem-level incremental backups (rsync driven by BackupPC in this document) with the ability to take snapshots of the LVM-based storage of virtual machines.

The fundamental constraints of this solution are:

  • Do not fundamentally change the configuration of a host under BackupPC.
  • Preserve easy interactive restore directly on the host.

Basically, when a backup is requested via an ssh connection, the target host does not execute the rsync command directly; it intercepts it and runs a script ("forced command") which:

  1. Prepares backup operations (for instance, saves ACLs in the case of a Windows host).
  2. Stops or suspends services that can make important changes to the filesystem, or that require logical consistency on the filesystem (e.g. databases).
  3. Triggers a snapshot of its own storage on the PVE host it is running on.
  4. Reverts the machine to its normal operating state.
  5. Redirects the original rsync command towards the PVE host and the snapshot.
    • The redirected rsync runs on PVE: mount the filesystem, optionally save MBR and PBS, save NTFS metadata for Windows hosts, run rsync.
  6. Triggers snapshot removal on PVE.

During an interactive restore, instead, the rsync process runs directly on the host.
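The backup sequence above can be sketched as a dry run. This is only an illustration under stated assumptions: every function name below is hypothetical (none comes from the actual scripts), and each step merely prints what it would do.

```bash
#!/bin/bash
# Dry-run sketch of the forced-command backup sequence.
# All function names are hypothetical; each step only prints its intent.

prepare_backup()  { echo "1. prepare: save ACLs for $1"; }
stop_services()   { echo "2. stop or suspend services"; }
create_snapshot() { echo "3. ask PVE to snapshot the storage behind $1"; }
start_services()  { echo "4. revert to normal operating state"; }
redirect_rsync()  { echo "5. run rsync on PVE against the snapshot of $1"; }
remove_snapshot() { echo "6. ask PVE to remove the snapshot of $1"; }

run_backup() {
    local share="$1" rc
    prepare_backup "$share"
    stop_services
    create_snapshot "$share"
    rc=$?
    start_services                 # services come back even if the snapshot failed
    [ "$rc" -eq 0 ] || return "$rc"
    redirect_rsync "$share"
    rc=$?
    remove_snapshot "$share"       # clean up the snapshot whatever rsync returned
    return "$rc"
}
```

Calling `run_backup /cygdrive/c/` prints the six steps in order. Services are restarted right after the snapshot is taken, so downtime is limited to the snapshot creation rather than the whole transfer.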

Backuppc-snap-schema.png

The following paragraphs show the detailed configuration steps for a Windows host.

Requirements

  • Local user "backup", member of Administrators and Backup Operators

Tools needed on target Windows host

Procedure

Create "backup" user

  • Add "backup" user to "Administrators" and "Backup Operators" groups.
  • Connect to the host as "backup" user.
  • If you have quotas enabled on some disk, check that the "backup" entry is set to "no limit" (interactively restored files are initially owned by this user).

Install Cygwin as backup user

  • Create C:\cygwin folder
    • Copy the c:\cygwin\cygwin-data folder from another server (or install from the network if this is the first host configured).
    • Copy the Cygwin installer "Setup.exe" locally and run it as the backup user.
      • Install for all users.
      • Leave the default setup (c:\cygwin) for the Cygwin root folder.
      • Set the local folder as repository and use c:\cygwin\cygwin-data as source.
    • Add the following packages:
      • openssh
      • rsync (NOTE: install 3.0.7; 3.0.8 is problematic.)
      • libiconv
      • libiconv2
      • subversion
      • vim
    • Proceed, accepting the creation of Desktop and Start menu shortcuts.
    • Enter the bash shell using the Desktop icon; wait for the default settings to be created for the "backup" user; exit the bash shell.

cyg_server user setup

NOTE: Skip this step for Windows XP hosts; in that case sshd will run with system account privileges.

  • Reconnect to the host with a Domain Administrator account; enter bash, and run:
mkpasswd -l -d intra | grep cyg_server >> /etc/passwd

This adds an entry for the domain user cyg_server to the Cygwin /etc/passwd file; the ssh daemon will run under this user account.

  • Add cyg_server to the local Administrators group.
  • NOTE: It's important to check that cyg_server is listed as Domain Administrator in /etc/passwd, and that the same user is a local Administrator, before proceeding with the following steps.

ssh service setup

  • Reconnect as the local "backup" user.
  • Run the "ssh-host-config" script from bash; the answers to give to its requests ("*** Query:" lines) are shown in the following transcript.
$ ssh-host-config

*** Query: Overwrite existing /etc/ssh_config file? (yes/no)  yes

*** Info: Creating default /etc/ssh_config file

*** Query: Overwrite existing /etc/sshd_config file? (yes/no) yes

*** Info: Creating default /etc/sshd_config file
*** Info: Privilege separation is set to yes by default since OpenSSH 3.3.
*** Info: However, this requires a non-privileged account called 'sshd'.
*** Info: For more info on privilege separation read /usr/share/doc/openssh/README.privsep.


*** Query: Should privilege separation be used? (yes/no) yes

*** Info: Note that creating a new user requires that the current account have
*** Info: Administrator privileges.  Should this script attempt to create a

*** Query:new local account 'sshd'? yes

*** Info: Updating /etc/sshd_config file


*** Warning: The following functions require administrator privileges!

*** Query: Do you want to install sshd as a service?
*** Query: (Say "no" if it is already installed as a service) (yes/no) yes 

*** Query: Enter the value of CYGWIN for the daemon: [] ntsec

*** Info: On Windows Server 2003, Windows Vista, and above, the
*** Info: SYSTEM account cannot setuid to other users -- a capability
*** Info: sshd requires.  You need to have or to create a privileged
*** Info: account.  This script will help you do so.

*** Info: You appear to be running Windows XP 64bit, Windows 2003 Server,
*** Info: or later.  On these systems, it's not possible to use the LocalSystem
*** Info: account for services that can change the user id without an
*** Info: explicit password (such as passwordless logins [e.g. public key
*** Info: authentication] via sshd).

*** Info: If you want to enable that functionality, it's required to create
*** Info: a new account with special privileges (unless a similar account
*** Info: already exists). This account is then used to run these special
*** Info: servers.

*** Info: Note that creating a new user requires that the current account
*** Info: have Administrator privileges itself.

*** Info: This script plans to use 'cyg_server'.
*** Info: 'cyg_server' will only be used by registered services.


*** Query: Do you want to use a different name? (yes/no) no

*** Info: Please enter a password for new user cyg_server.  Please be sure
*** Info: that this password matches the password rules given on your system.
*** Info: Entering no password will exit the configuration.

*** Query: Please enter the password:
*** Query: Reenter:

*** Info: Also keep in mind that the user 'cyg_server' needs read permissions
*** Info: on all users' relevant files for the services running as 'cyg_server'.

*** Info: In particular, for the sshd server all users' .ssh/authorized_keys
*** Info: files must have appropriate permissions to allow public key
*** Info: authentication. (Re-)running ssh-user-config for each user will set
*** Info: these permissions correctly. [Similar restrictions apply, for
*** Info: instance, for .rhosts files if the rshd server is running, etc].

NOTE: In some cases (probably if you forgot to add cyg_server to the local Administrators), warnings like the following may appear:

*** Warning: cyg_server is in /etc/passwd, but the local
*** Warning: machine's SAM does not know about cyg_server.
*** Warning: Perhaps cyg_server is a pre-existing domain account.
*** Warning: Continuing, but check if this is ok.

In that case, verify cyg_server permissions as shown at the end of this document.

  • Start the ssh service:
net start sshd

ssh client setup

  • Copy the ssh backup key from another backup host:
scp <hostname>:/home/backup/id_rsa_backup /home/backup/

All virtual machines connect to PVE hosts with the same key (generic access with this key is filtered with a forced command on PVE).

"Forced command" setup

Edit ~/.ssh/authorized_keys, adding a forced command for backup connections:

command="/home/backup/backup-restore" ssh-rsa AAAAB3Nza...

where AAAAB3Nza... is the remote backuppc public key.
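When a forced command is configured, sshd runs it in place of whatever the client asked for and passes the client's request in the SSH_ORIGINAL_COMMAND environment variable. The sketch below shows how a script in that position could decide what to do. It is hypothetical and is not the actual backup-restore script; only the request names (acl-backup, acl-restore, rsync-backup, rsync-restore) and the "Rejected" reply are taken from this page.

```bash
#!/bin/bash
# Hypothetical forced-command dispatcher; NOT the real backup-restore script.

# Map the command requested by the client to an action name.
classify_command() {
    case "$1" in
        acl-backup\ *)              echo "acl-backup" ;;
        acl-restore\ *)             echo "acl-restore" ;;
        */rsync-backup\ --server*)  echo "rsync-backup" ;;
        */rsync-restore\ --server*) echo "rsync-restore" ;;
        *)                          echo "reject" ;;
    esac
}

dispatch() {
    local action
    # sshd sets SSH_ORIGINAL_COMMAND; it is empty for a plain interactive login.
    action=$(classify_command "${SSH_ORIGINAL_COMMAND:-}")
    if [ "$action" = "reject" ]; then
        echo "Rejected"
        return 1
    fi
    echo "would run: $action"    # a real script would act on the request here
}
```

With this shape, a generic `ssh backup@host` session carries no recognised request and is refused, while the BackupPC commands configured later are let through.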

host scripts

Install the VM-side scripts in /home/backup.

PVE scripts

  • Install the PVE-side scripts in a directory of your choice (e.g. /opt/snap).
  • Configure the forced command inside the PVE "authorized_keys" file:
command="/opt/snap/snap-backup-restore" ssh-rsa AAAAB3NzaC1...

where AAAAB3NzaC1... is the public counterpart of the id_rsa_backup key configured on the virtual machines.

NOTE: Take care that /opt/snap/snap-backup-restore is the correct location of the snapshot master script.

  • On PVE you will also need a recent version of ntfs-3g (the Debian Squeeze package is currently too old) and the attrib package (the Squeeze version is OK).

Utilities

  • Copy the executable SetACL.exe to a directory of your choice (scripts are configured with C:\App\Scripts\bin\ by default).
    • It is the Open Source tool that takes care of backup and interactive restore of Windows ACLs.
  • Copy the executable sync.exe to a directory of your choice (scripts are configured with C:\App\Scripts\bin\ by default).
    • N.B.: sync.exe is NOT Open Source software, and you need to run it once from the GUI to acknowledge the EULA.

Host scripts configuration

host.conf file

  • Edit the /home/backup/host.conf backup configuration file. (You can rename and edit host.conf.tpl.)

The file contains one uncommented line listing the hostnames of the PVE nodes in the cluster, and several lines, which must be left commented, one per disk (partition) subject to backup:

  • Example (for a hypothetical host named host-to-backup):
PVE_LIST="lxsrv10 lxsrv11 lxsrv12"
#VMID DISK PART FSTYPE RELATIVE_MP STORAGE MBR_SAVE SNAP_SIZE(MB) MP_PREFIX
#101 1 1 ntfs /cygdrive/c/ lvmstorage mbryes 30%FREE
#101 1 1 ntfs /cygdrive/d/ lvmstorage mbrno 1000

Where:

  • PVE_LIST is the list of DNS hostnames of the nodes of the PVE cluster that host-to-backup runs in.
  • VMID is the virtual machine ID inside the PVE cluster.
  • DISK is the disk number (inside the PVE VM configuration 101.conf).
  • PART is the one-based partition number in DISK.
  • FSTYPE is the filesystem type; it is the value that will be used as the "-t" flag of the "mount" command when mounting the snapshot filesystem on PVE.
  • RELATIVE_MP is the mount point relative to the directory on PVE ( /tmp/<VMID>/ ) reserved for snapshots.
  • STORAGE is the storage name in PVE for DISK.
  • MBR_SAVE tells whether to save the Master Boot Record of the disk (mbryes) or not (mbrno).
  • SNAP_SIZE(MB) is the snapshot size as a percentage of the free space in the Volume Group that DISK is part of, e.g. 30%FREE. It is also possible to set a fixed size in megabytes.
  • MP_PREFIX is an optional mount point prefix; useful on Linux hosts for making mount points independent.

NOTE: the trailing slash in RELATIVE_MP is important.
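To make the field layout concrete, here is a minimal sketch of how a script could look up the entry for a share and turn SNAP_SIZE into lvcreate size arguments. The function names are hypothetical and the real scripts may parse the file differently; the only facts relied on are the column order shown above and lvcreate's `-l` (extents, accepts `%FREE`) and `-L` (absolute size) options.

```bash
#!/bin/bash
# Hypothetical helpers; the actual host scripts may work differently.

# Print the host.conf entry (file $1) whose RELATIVE_MP equals $2.
find_entry() {
    # Keep only commented lines that start with a digit (this skips the
    # header and PVE_LIST), drop the '#', then match on the 5th field.
    sed -n 's/^#\([0-9]\)/\1/p' "$1" | awk -v mp="$2" '$5 == mp { print; exit }'
}

# Translate SNAP_SIZE into lvcreate arguments:
#   30%FREE -> -l 30%FREE   (percentage of free extents)
#   1000    -> -L 1000M     (fixed size in megabytes)
snap_size_args() {
    case "$1" in
        *%*) echo "-l $1" ;;
        *)   echo "-L ${1}M" ;;
    esac
}
```

For the example file above, `find_entry host.conf /cygdrive/d/` prints the second disk line, while `find_entry host.conf /cygdrive/d` prints nothing: an exact match on RELATIVE_MP is one way the trailing slash ends up mattering.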

Setup of optional commands to run before and after the snapshot

  • Create the tasks file by copying the template:
cp tasks.tpl tasks

then insert your commands inside the tasks_before() and tasks_after() functions.

If you want to stop a Lotus Domino service before the snapshot and start it after, for example:

#!/bin/bash

tasks_before() {
  echo "Stopping Lotus Domino..."
  net stop "Lotus Domino Server (LotusDominodata)"
}

tasks_after() {
  echo "Starting Lotus Domino..."
  net start "Lotus Domino Server (LotusDominodata)"
}
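How these hooks get called is not shown on this page. The following is one plausible sketch, assuming the tasks file has already been sourced and that hooks run only when a NO_TASKS variable is "false"; `run_hook` and `with_tasks` are hypothetical names, and the default chosen here is an assumption too.

```bash
#!/bin/bash
# Hypothetical caller for the hooks; the real snap.sh may differ.
# Assumes the tasks file was sourced beforehand, e.g.:  . ./tasks

NO_TASKS=${NO_TASKS:-false}      # assumed default for this sketch only

# Run a hook only if tasks are enabled and the function is defined.
run_hook() {
    [ "$NO_TASKS" = "false" ] || return 0
    if command -v "$1" >/dev/null 2>&1; then
        "$1"
    fi
}

# Wrap a command (e.g. snapshot creation) between the two hooks.
with_tasks() {
    local rc
    run_hook tasks_before
    "$@"
    rc=$?
    run_hook tasks_after         # restart services even if the command failed
    return "$rc"
}
```

Running tasks_after regardless of the wrapped command's exit status keeps a failed snapshot from leaving services stopped.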

Snapshot testing

NOTE: to also test the optional commands before and after the snapshot, set the variable NO_TASKS to false inside snap.sh.

  • Change to the backup scripts root:
cd /home/backup
  • create a snapshot for /cygdrive/c/ drive:
./snap.sh create /cygdrive/c/
  • remove it:
./snap.sh remove /cygdrive/c/

Test acl backup from remote BackupPC host

  • Connect to BackupPC host and run:
# su - backuppc

which assumes the backuppc user identity.

  • Try to connect to the target host, answering yes to the question about adding it to known_hosts. A generic ssh session should then be rejected:
ssh backup@host-to-backup
Rejected
Connection to host-to-backup closed.

This is the expected result.

  • Test the ACL backup function:
ssh -q -x -l backup <host name> acl-backup "<share name>"


For example, for /cygdrive/c/ on host-to-backup:

ssh -q -x -l backup host-to-backup acl-backup "/cygdrive/c"

Check that the .acl.bak and .acl.bak.err files are correctly created; the content of .acl.bak.err is "0" if the ACLs were saved successfully.

NOTE: In some cases it is necessary to modify the Cygwin /etc/bash.bashrc file on the target host, moving the following lines:
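The check can be scripted. The helper below is a hypothetical sketch: it takes the directory holding the two files as an argument, since where exactly they are written is not stated here, and it ignores whitespace so a Windows line ending after the "0" does not cause a false failure.

```bash
#!/bin/bash
# Hypothetical helper: succeed only if both ACL files exist in directory $1
# and .acl.bak.err contains exactly "0" (ignoring whitespace and CR/LF).
acl_backup_ok() {
    local dir="$1"
    [ -f "$dir/.acl.bak" ] || return 1
    [ -f "$dir/.acl.bak.err" ] || return 1
    [ "$(tr -d '[:space:]' < "$dir/.acl.bak.err")" = "0" ]
}
```

For example, `acl_backup_ok /cygdrive/c && echo "ACL backup OK"` would be run on the target host, assuming the files land in the share root.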

# If not running interactively, don't do anything
[[ "$-" != *i* ]] && return

to the beginning of the file, just before:

# Check that we haven't already been sourced.
([[ -z ${CYG_SYS_BASHRC} ]] && CYG_SYS_BASHRC="1") || return

This avoids polluting standard output when rsync is redirected, which would otherwise cause the rsync operation to hang.

A symptom is the presence of lines like the following in the BackupPC host log, after the backup job has been killed manually:

Executing DumpPreShareCmd: /usr/bin/ssh -q -x -l backup host-to-backup acl-backup "/cygdrive/d"
incr backup started back to 2011-11-15 10:39:20 (backup #0) for directory /cygdrive/d
Running: /usr/bin/ssh -q -x -l backup host-to-backup /usr/bin/rsync-backup --server --sender --numeric-ids --perms --owner --group -D --links   --hard-links --times --block-size=2048 --recursive --one-file-system . /cygdrive/d/
Xfer PIDs are now 19432
Got remote protocol 1668572463
Fatal error (bad version): /etc/bash.bashrc: line 13:  5028 Aborted                 ( [[ -z ${CYG_SYS_BASHRC} ]] && CYG_SYS_BASHRC="1" )
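The "remote protocol" value in this log is itself a clue. Assuming the client reads the protocol version as a little-endian 32-bit integer (as rsync's wire format does), the number is simply the first four bytes of whatever the remote side printed. A small sketch that decodes it (`decode_protocol` is a name made up for this example):

```bash
#!/bin/bash
# Decode a 32-bit number into its four bytes, least significant byte first.
decode_protocol() {
    local n="$1" out="" i byte
    for i in 0 1 2 3; do
        byte=$(( (n >> (8 * i)) & 255 ))
        # Convert the byte value to a character via an octal escape.
        out="$out$(printf "\\$(printf '%03o' "$byte")")"
    done
    printf '%s\n' "$out"
}

decode_protocol 1668572463   # prints: /etc
```

The bytes spell "/etc", the start of the "/etc/bash.bashrc: line 13" message shown in the fatal error, which is consistent with shell startup output leaking into the rsync stream.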

BackupPC host configuration

  • Connect to the BackupPC host.
  • Add the target DNS host name in the "Edit Hosts" section, indicating the authorized users.
  • Go to "Hosts Summary"; the new machine will be listed under "Hosts without backups". Select the link of the machine.
  • Select "Edit Config" for the machine. NOTE: Take care not to select by mistake the global "Edit Config" in the "Server" section below.

Xfer

  • Set in "RsyncShareName" the paths to save ("shares" in BackupPC terminology). For Windows machines use Cygwin syntax: paths always start with /cygdrive/<disk letter>, e.g. /cygdrive/c.
  • Set exclusions ("BackupFilesExclude"): the share name is the prefix, followed by the relative paths to exclude. For Windows hosts, for example, under /cygdrive/c you may want to exclude pagefile.sys.
  • Set "RsyncClientCmd"
    • Change the default ssh user from root to backup.
    • Add the "-backup" suffix to "$rsyncPath", so that it reads "$rsyncPath-backup". The final configuration will be:
$sshPath -q -x -l backup $host $rsyncPath-backup $argList+
  • Set "RsyncClientRestoreCmd"
    • Change the default ssh user from root to backup.
    • Add the "-restore" suffix to "$rsyncPath", so that it reads "$rsyncPath-restore". The final configuration will be:
$sshPath -q -x -l backup $host $rsyncPath-restore  $argList+

Backup Settings

  • Insert in "DumpPreShareCmd" the following command, which saves ACLs before every backup (it is the same command tested earlier):
$sshPath -q -x -l backup $host acl-backup "$share"
  • Insert in "RestorePostUserCmd" the following command which sets ACL after every restore:
$sshPath -q -x -l backup $host acl-restore "$share" "$pathHdrSrc" "$pathHdrDest" "$fileList"

Configure the remaining settings as usual.




Verify cyg_server permissions

cyg_server user should have the following permissions:

SeTcbPrivilege
SeCreateTokenPrivilege
SeAssignPrimaryTokenPrivilege
SeServiceLogonRight

To verify, run:

$ editrights -u cyg_server -l

In case of missing permissions, add them:

editrights -a SeTcbPrivilege -u cyg_server
editrights -a SeCreateTokenPrivilege -u cyg_server
editrights -a SeAssignPrimaryTokenPrivilege -u cyg_server
editrights -a SeServiceLogonRight -u cyg_server
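Checking and fixing can be combined so that only the missing rights are added. This is a sketch under one assumption: that `editrights -u cyg_server -l` prints each right as a separate word in its output. The function names are hypothetical; the editrights invocations are the ones shown above.

```bash
#!/bin/bash
REQUIRED_RIGHTS="SeTcbPrivilege SeCreateTokenPrivilege SeAssignPrimaryTokenPrivilege SeServiceLogonRight"

# Print the required rights that do not appear in the listing passed as $1.
missing_rights() {
    local right
    for right in $REQUIRED_RIGHTS; do
        printf '%s\n' "$1" | grep -q -w "$right" || echo "$right"
    done
}

# Add whatever is missing for user $1 (this calls the real editrights tool).
fix_rights() {
    local current right
    current=$(editrights -u "$1" -l) || return 1
    for right in $(missing_rights "$current"); do
        editrights -a "$right" -u "$1"
    done
}
```

Running `fix_rights cyg_server` from a Cygwin bash shell with administrator privileges would then add only the rights that are absent; nothing is executed by merely defining the functions.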