PVE2ClusterTestBed


This is a test bed I put together quickly in order to upgrade my operational procedures. Please tell me if there are any mistakes. After that, I will add explanations and publish it. One question: can this test bed procedure be applied without changes to PVE 3.x?


initial setup

  • two nodes (A and B), with one admin network and one DRBD network configured
  • drbd configured
  • one KVM VM (ID 100) installed on LVM over DRBD, with no virtual CD-ROM inserted
  • VM 100 live migration OK between the two nodes
  • backup of /etc/pve in /root/backup/pve
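
The backup in the last item can be taken with a plain recursive copy. The following is only a sketch, assuming the cluster file system is mounted at /etc/pve; the function name backup_pve_config is illustrative and is not a PVE tool.

```shell
#!/bin/sh
# backup_pve_config SRC DEST
# Copy the contents of SRC (normally /etc/pve) into DEST
# (in this test bed: /root/backup/pve).
# Hypothetical helper, not part of Proxmox VE.
backup_pve_config() {
    src=$1
    dest=$2
    if [ ! -d "$src" ]; then
        echo "source directory $src not found" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    # "$src/." copies the directory contents (dot files included) into $dest
    cp -r "$src/." "$dest/"
}
```

On a node it would be called as `backup_pve_config /etc/pve /root/backup/pve`, before any cluster configuration is deleted.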

delete cluster configuration, on both nodes :

service pvestatd stop
service pvedaemon stop
service pve-cluster stop
umount /etc/pve
service cman stop
rm /etc/cluster/cluster.conf
rm -rf /var/lib/pve-cluster/*

wait 20 seconds if the server has just restarted

service pve-cluster start
service cman start
service pvestatd start
service pvedaemon start
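
The 20 second wait can be scripted instead of timed by hand. This sketch assumes that "just restarted" means an uptime below 20 seconds as reported by /proc/uptime; that reading, and the function name remaining_wait, are assumptions of this example.

```shell
#!/bin/sh
# remaining_wait UPTIME_SECONDS [MIN_UPTIME]
# Print how many whole seconds are still missing before the uptime
# reaches MIN_UPTIME (default 20). Hypothetical helper.
remaining_wait() {
    uptime_s=${1%%.*}      # drop the fractional part, e.g. 5.73 -> 5
    min=${2:-20}
    if [ "$uptime_s" -ge "$min" ]; then
        echo 0
    else
        echo $((min - uptime_s))
    fi
}
```

Between the stop and start commands it could be used as `sleep "$(remaining_wait "$(cut -d' ' -f1 /proc/uptime)")"`.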

put the nodes in a cluster

node A :

pvecm create <cluster name>

if backup of previous /etc/pve configuration exists :

 cd /root/backup/pve
 cp storage.cfg /etc/pve/
 cp nodes/<source node name>/qemu-server/100.conf /etc/pve/nodes/<node A name>/qemu-server/

else, configure DRBD storage, and install VM 100.

Log into PVE WEB interface, verify storage, start VM 100; VM 100 should now be started.

node B :

pvecm add <ip of node A>

Log into PVE WEB interface, verify storage, live migrate VM 100, live migrate back to node A.

node A power failure, VM restart on node B, service OK

Remove the power plug from node A (or hold the power button for more than 4 seconds)

node B :

pvecm nodes #shows node A as status "X"
pvecm expected 1
mv /etc/pve/nodes/<node A name>/qemu-server/*.conf /etc/pve/nodes/<node B name>/qemu-server/
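
The mv step above can be wrapped in a small function that checks both directories first and reports what it moved. This is a sketch only; move_vm_configs is a hypothetical helper, and the root directory is a parameter so it can be tried on a scratch copy before touching /etc/pve.

```shell
#!/bin/sh
# move_vm_configs PVE_ROOT FROM_NODE TO_NODE
# Move every qemu-server/*.conf from FROM_NODE to TO_NODE under PVE_ROOT.
# Hypothetical helper, not part of Proxmox VE.
move_vm_configs() {
    src="$1/nodes/$2/qemu-server"
    dst="$1/nodes/$3/qemu-server"
    if [ ! -d "$src" ] || [ ! -d "$dst" ]; then
        echo "missing directory: $src or $dst" >&2
        return 1
    fi
    for conf in "$src"/*.conf; do
        [ -e "$conf" ] || continue     # the glob matched nothing
        mv "$conf" "$dst/" || return 1
        echo "moved $(basename "$conf")"
    done
    return 0
}
```

On node B, after `pvecm expected 1`, the call would be `move_vm_configs /etc/pve <node A name> <node B name>`.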

Log into PVE WEB interface, start VM 100.

node A reinstallation

Install node A as in the initial setup, without cluster configuration. To do this quickly on a test platform :

  • stop node B (halt)
  • start node A
  • follow "delete cluster configuration, on both nodes", but just on node A
  • regenerate ssh keys on node A :
/bin/rm /etc/ssh/ssh_host_* && dpkg-reconfigure openssh-server
  • start node B

node A :

pvecm add <ip of node B>

if you get the error "unable to copy ssh ID", do the next step and retry; otherwise skip it

if "unable to copy ssh ID" - node B :

pvecm expected 1

verification

on node A and node B, verify that the status is "M" for both nodes :

pvecm nodes
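
The status check can also be done by a script. The sketch below assumes that `pvecm nodes` prints one line per node, with the node ID in the first column and the status ("M" or "X") in the second, after a header line; check this against the real output on your version. The function name all_nodes_are_members is illustrative.

```shell
#!/bin/sh
# all_nodes_are_members
# Read "pvecm nodes" output on stdin. Return 0 only if at least one node
# line was seen and every node line has status "M" in column 2.
# Assumed layout: "<id> <status> <inc> <joined> <name>". Hypothetical helper.
all_nodes_are_members() {
    awk '
        BEGIN { seen = 0; bad = 0 }
        # node lines start with a numeric ID; the header line does not
        $1 ~ /^[0-9]+$/ { seen++; if ($2 != "M") bad++ }
        END { exit (seen > 0 && bad == 0) ? 0 : 1 }
    '
}
```

Usage on either node: `pvecm nodes | all_nodes_are_members && echo "all nodes are members"`.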

verify you can log into PVE WEB interface of the two nodes, verify you can live migrate