= Proxmox VE 4.x Cluster =<br />
== Introduction ==<br />
The Proxmox VE 4.x (and all versions above) cluster enables central management of multiple physical servers. A Proxmox VE cluster consists of several nodes (up to 32 physical nodes, possibly more, depending on network latency).<br />
<br />
== Main features ==<br />
*Centralized web management, including secure console<br />
*Support for multiple authentication sources (e.g. local, MS ADS, LDAP, ...)<br />
*Role based permission management for all objects (VM´s, storages, nodes, etc.)<br />
*Creates multi-master clusters<br />
*[[Proxmox Cluster file system (pmxcfs)]]: Database-driven file system for storing configuration files, replicated in real-time on all nodes using corosync (maximal size 30 MB)<br />
*Migration of Virtual Machines between physical hosts<br />
*Cluster-wide logging<br />
*RESTful web API<br />
*Self-fencing as the out-of-the-box method (power- or network-fencing is also possible).<br />
*Fast deployment<br />
*Cluster-wide Firewall<br />
*Linux Container migration<br />
<br />
== Requirements ==<br />
'''NOTE: It is not possible to mix Proxmox VE 3.x and earlier with Proxmox VE 4.0 cluster'''<br />
*All nodes must be in the same network, as corosync uses IP multicast to communicate between nodes (see also [http://www.corosync.org Corosync Cluster Engine]). Note: on some switches IP multicast is not enabled by default and must be enabled manually first. See [[multicast notes]] for more information about multicast.<br />
*Date and time have to be synchronized.<br />
*SSH tunnel on port 22 between nodes is used.<br />
*If you are interested in High Availability too, for reliable quorum you must have at least 3 active nodes at all times (all nodes should have the same version).<br />
*If shared storage is used, a dedicated NIC for the storage traffic is needed.<br />
<br />
== Proxmox VE Cluster ==<br />
First, install Proxmox VE on all nodes, see [[Installation]]. Make sure that each Proxmox VE node is installed with the final hostname and IP configuration. Changing the hostname and IP is not possible after cluster creation.<br />
<br />
Currently, cluster creation has to be done on the console; you can log in to the Proxmox VE node via ssh. <br />
<br />
All settings can be done via "pvecm", the [https://pve.proxmox.com/pve2-api-doc/man/pvecm.1.html Proxmox VE Cluster manager toolkit].<br />
<br />
=== Create the cluster ===<br />
Log in via ssh to the first Proxmox VE node. Use a unique name for your cluster; this name cannot be changed later.<br />
<br />
'''Create:'''<br />
<pre>hp1# pvecm create YOUR-CLUSTER-NAME</pre> <br />
To check the state of cluster: <br />
<pre>hp1# pvecm status</pre><br />
<br />
=== Adding nodes to the Cluster ===<br />
Log in via ssh to the '''other''' Proxmox VE nodes. Please note that these nodes must not hold any VMs yet. (Otherwise you will get conflicts with identical VMIDs; as a workaround, use vzdump to back them up and restore them to a different VMID after the cluster configuration is done.)<br />
<br />
'''WARNING: Adding a node to the cluster will delete its current /etc/pve/storage.cfg. If you have VMs stored on the node, be prepared to add back your storage locations if necessary. Even though the storage locations disappear from the GUI, your data is still there.'''<br />
<br />
'''Add the current node to the cluster:''' <br />
<pre>hp2# pvecm add IP-ADDRESS-CLUSTER</pre><br />
<br />
For IP-ADDRESS-CLUSTER use an IP from an existing cluster node.<br />
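For example, assuming the first node hp1 is reachable at 192.168.15.91 (the address it shows in the status output below; adjust to your own network), the command would be:<br />
<pre>hp2# pvecm add 192.168.15.91</pre><br />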
<br />
To check the state of cluster: <br />
<pre>hp2# pvecm status</pre><br />
<br />
'''Display the state of cluster:''' <br />
<pre><br />
hp2# pvecm status<br />
Quorum information<br />
------------------<br />
Date: Mon Apr 20 12:30:13 2015<br />
Quorum provider: corosync_votequorum<br />
Nodes: 4<br />
Node ID: 0x00000001<br />
Ring ID: 1928<br />
Quorate: Yes<br />
<br />
Votequorum information<br />
----------------------<br />
Expected votes: 4<br />
Highest expected: 4<br />
Total votes: 4<br />
Quorum: 2 <br />
Flags: Quorate <br />
<br />
Membership information<br />
----------------------<br />
Nodeid Votes Name<br />
0x00000001 1 192.168.15.91<br />
0x00000002 1 192.168.15.92 (local)<br />
0x00000003 1 192.168.15.93<br />
0x00000004 1 192.168.15.94<br />
</pre><br />
<br />
'''Display the nodes of cluster:'''<br />
<pre>hp2# pvecm nodes<br />
<br />
Membership information<br />
----------------------<br />
Nodeid Votes Name<br />
1 1 hp1<br />
2 1 hp2 (local)<br />
3 1 hp3<br />
4 1 hp4<br />
</pre><br />
<br />
=== Remove a cluster node ===<br />
<br />
<b>Read the procedure carefully before proceeding, as it may not be what you want or need.</b><br />
<br />
Move all virtual machines off the node; use the [[Central Web-based Management]] to migrate or delete all VMs. Make sure you have no local backups you want to keep, or save them elsewhere first. <br />
<br />
Log in to one remaining node via ssh. Issue the pvecm status command to identify the node ID of the node you want to remove: <br />
<pre>hp1# pvecm status<br />
<br />
Quorum information<br />
------------------<br />
Date: Mon Apr 20 12:30:13 2015<br />
Quorum provider: corosync_votequorum<br />
Nodes: 4<br />
Node ID: 0x00000001<br />
Ring ID: 1928<br />
Quorate: Yes<br />
<br />
Votequorum information<br />
----------------------<br />
Expected votes: 4<br />
Highest expected: 4<br />
Total votes: 4<br />
Quorum: 2 <br />
Flags: Quorate <br />
<br />
Membership information<br />
----------------------<br />
Nodeid Votes Name<br />
0x00000001 1 192.168.15.91 (local)<br />
0x00000002 1 192.168.15.92<br />
0x00000003 1 192.168.15.93<br />
0x00000004 1 192.168.15.94<br />
</pre> <br />
<br />
'''ATTENTION: at this point you must power off the node to be removed and make sure that it will not power on again (in the network) as it is.'''<br />
<pre>hp1# pvecm nodes<br />
<br />
Membership information<br />
----------------------<br />
Nodeid Votes Name<br />
1 1 hp1 (local)<br />
2 1 hp2<br />
3 1 hp3 <br />
</pre><br />
<br />
<br />
Log in to one remaining node via ssh. Issue the delete command (here deleting node hp2): <br />
<pre>hp1# pvecm delnode hp2</pre> <br />
If the operation succeeds, no output is returned. Check the cluster state again with 'pvecm status' (or the node list with 'pvecm nodes'); you should see something like:<br />
<br />
<pre>hp1# pvecm status<br />
Quorum information<br />
------------------<br />
Date: Mon Apr 20 12:44:28 2015<br />
Quorum provider: corosync_votequorum<br />
Nodes: 3<br />
Node ID: 0x00000001<br />
Ring ID: 1992<br />
Quorate: Yes<br />
<br />
Votequorum information<br />
----------------------<br />
Expected votes: 3<br />
Highest expected: 3<br />
Total votes: 3<br />
Quorum: 3 <br />
Flags: Quorate <br />
<br />
Membership information<br />
----------------------<br />
Nodeid Votes Name<br />
0x00000001 1 192.168.15.90 (local)<br />
0x00000002 1 192.168.15.91 <br />
0x00000003 1 192.168.15.92<br />
</pre> <br />
<br />
ATTENTION: as said above, it is very important to power off the node '''before''' removal, and make sure that it will not power on again (in the network) as it is. <br />
<br />
If you power on the node as it is, your cluster will be screwed up and it could be difficult to restore a clean cluster state. <br />
<br />
If for whatever reason you want this server to join the same cluster again, you have to <br />
* reinstall pve on it from scratch<br />
* reinstall it as a '''new node'''<br />
* and then join it, as said in the previous section. <br />
<br />
Referring to the above example, you can create a new '''hp5''' node, which will then join the cluster.<br />
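A minimal sketch of that join, run on the freshly installed hp5 and assuming 192.168.15.90 is the address of an existing cluster node (as in the status output above):<br />
<pre>hp5# pvecm add 192.168.15.90</pre><br />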
<br />
=== Re-installing a cluster node ===<br />
'''Prepare the node for re-install'''<br />
<br />
Remove all virtual machines from the node by either transferring them to other nodes or creating a backup on external storage and deleting them from the node.<br />
<br />
Stop the following services:<br />
<pre>systemctl stop pvestatd.service<br />
systemctl stop pvedaemon.service<br />
systemctl stop pve-cluster.service<br />
</pre><br />
<br />
<br />
'''Backup node and cluster configuration'''<br />
<br /><br />
Backup /var/lib/pve-cluster/<br />
<pre>tar -czf /root/pve-cluster-backup.tar.gz /var/lib/pve-cluster<br />
</pre><br />
<br />
Backup /root/.ssh/; there are two symlinks here to the shared pve config, authorized_keys and authorized_keys.orig. Don't worry about these two yet, as they're stored in /var/lib/pve-cluster/.<br />
<pre>tar -czf /root/ssh-backup.tar.gz /root/.ssh<br />
</pre><br />
<br />
Backup /etc/corosync/<br />
<pre>tar -czf /root/corosync-backup.tar.gz /etc/corosync<br />
</pre><br />
<br />
Backup /etc/hosts<br />
<pre>cp /etc/hosts /root/<br />
</pre><br />
<br />
Backup /etc/network/interfaces<br />
<pre>cp /etc/network/interfaces /root/<br />
</pre><br />
<br />
If applicable, do not forget the settings related to iSCSI and multipath (/etc/iscsi/initiatorname.iscsi, /etc/iscsi/iscsid.conf and /etc/multipath.conf are files that you will want to keep for future reference, to help with the configuration of the new installation).<br />
<br />
Copy the backup files to your client machine via SCP (or a pen drive), for example as sketched after the file list below.<br />
<br /><br />
List of files to copy:<br />
* /root/pve-cluster-backup.tar.gz<br />
* /root/ssh-backup.tar.gz<br />
* /root/corosync-backup.tar.gz<br />
* /root/hosts<br />
* /root/interfaces<br />
<br /><br />
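A sketch of pulling these files onto a client machine with scp; NODE-IP is a placeholder for the node's address:<br />
<pre># run on the client machine, not on the node
scp root@NODE-IP:/root/pve-cluster-backup.tar.gz .
scp root@NODE-IP:/root/ssh-backup.tar.gz .
scp root@NODE-IP:/root/corosync-backup.tar.gz .
scp root@NODE-IP:/root/hosts .
scp root@NODE-IP:/root/interfaces .
</pre><br />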
<br />
'''Re-install the node'''<br />
<br /><br />
*Shutdown the server<br />
*If you are using several Ethernet interfaces with LACP or some kind of load balancing, you should configure the first switch port in single mode (no LACP) so that the standard network configuration in Proxmox can connect to the network.<br />
*Re-install. Make sure the hostname is the same as it was before you continue.<br />
*Activate license again if you have any.<br />
*Install updates to get the same patch level as the other nodes, for example as sketched below.<br />
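A sketch of the usual update commands (assuming the package repositories are already configured on the new installation):<br />
<pre>apt-get update
apt-get dist-upgrade
</pre><br />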
<br />
<br />
'''Restore node and cluster configuration'''<br />
<br /><br />
Copy the config files to the folder /root via SCP or from the pen drive.<br />
<br /><br />
Restore /etc/hosts<br />
<pre>cp /root/hosts /etc/hosts <br />
</pre><br />
<br />
Restore /etc/network/interfaces<br />
<pre>cp /root/interfaces /etc/network/interfaces <br />
</pre><br />
<br />
'''IMPORTANT'''<br />
<br /><br />
Make sure that you have the right switch configuration in case you're using vlans, specific port assignments or LACP!<br /><br />
If you are using OVS (OpenvSwitch) you have to install the package before reboot.<br />
<pre>apt-get install openvswitch-switch<br />
</pre><br />
Reboot server<br />
<br /><br />
<br />
Stop the following services:<br />
<pre>systemctl stop pvestatd.service<br />
systemctl stop pvedaemon.service<br />
systemctl stop pve-cluster.service<br />
</pre><br />
<br />
Restore the files in /root/.ssh/<br />
<pre>cd / ; tar -xzf /root/ssh-backup.tar.gz<br />
</pre><br />
<br />
Replace /var/lib/pve-cluster/ with your backup copy<br />
<pre>rm -rf /var/lib/pve-cluster<br />
cd / ; tar -xzf /root/pve-cluster-backup.tar.gz<br />
</pre><br />
<br />
Replace /etc/corosync/ with your backup copy<br />
<pre>rm -rf /etc/corosync<br />
cd / ; tar -xzf /root/corosync-backup.tar.gz<br />
</pre><br />
<br />
Start pve-cluster <br />
<pre>systemctl start pve-cluster.service<br />
</pre><br />
<br />
Restore the two ssh symlinks:<br />
<pre>ln -sf /etc/pve/priv/authorized_keys /root/.ssh/authorized_keys<br />
ln -sf /etc/pve/priv/authorized_keys /root/.ssh/authorized_keys.orig<br />
</pre><br />
<br />
Start the rest of the services:<br />
<pre>systemctl start pvestatd.service<br />
systemctl start pvedaemon.service<br />
</pre><br />
<br />
Re-add the node to the cluster to update the keys and then update the certs<br />
<pre>pvecm add xxx.xxx.xxx.xxx -force<br />
pvecm updatecerts<br />
</pre><br />
<br />
'''Accept ssh keys (again) from other nodes'''<br />
This may not be required, but in some cases it is needed to make everything work without errors.<br />
<br /><br />
Additionally, you'll need to establish an ssh connection from every other cluster node to the re-installed node in order to accept the new host key.<br />
<br /><br />
If you have several subnets configured on your nodes, make sure that you're accessing the correct IP via ssh.<br />
<pre>ssh xxx.xxx.xxx.xxx<br />
........ yes<br />
</pre><br />
<br />
== Working with the Proxmox VE Cluster ==<br />
Now you can start creating virtual machines on your cluster nodes by using the [[Central Web-based Management]] on any node.<br />
<br />
== Troubleshooting ==<br />
=== General ===<br />
*Date and time have to be synchronized (check "ntpdc -p")<br />
*Check /etc/hosts for the actual IP address of each system<br />
<br />
=== Cluster Network ===<br />
[[Separate Cluster Network]]<br />
<br />
== Video Tutorials ==<br />
* [http://www.youtube.com/user/ProxmoxVE Proxmox VE Youtube channel]<br />
<br />
[[Category: HOWTO]]
<br />
= Upgrade from 3.x to 4.0 =<br />
== Introduction ==<br />
<br />
Proxmox VE 4.0 introduces major new features, therefore the upgrade must be carefully planned and tested. Depending on your existing configuration, several manual steps are required, including some downtime. NEVER start the upgrade process without a valid backup and without testing the same in a test lab setup.<br />
<br />
Major changes for V4.0:<br />
*OpenVZ is removed, a conversion via backup/restore to LXC is needed <br />
*New corosync version, therefore clusters have to be re-established<br />
*New HA manager (replacing RGmanager, involving a complete HA re-configuration)<br />
<br />
If you run a customized installation and/or you installed additional packages, for example for distributed storage like Ceph or sheepdog, DRBD or any other third party packages, you need to make sure that you also upgrade these packages for Debian Jessie. <br />
<br />
V4.0 supports only the new '''DRBD9 which is not backwards compatible with the 8.x version''' and is considered only a technology preview.<br />
<br />
Generally speaking, there are two possibilities to move from 3.x to 4.0:<br />
<br />
*In-place upgrade via apt, step by step <br />
*New installation on new hardware (and restore VMs from backup) - the safest way.<br />
<br />
In both cases, empty the browser cache after the upgrade and reload the GUI page, otherwise you may see a lot of glitches.<br />
<br />
== In-place upgrade ==<br />
<br />
In-place upgrades are done with apt, so make sure that you are familiar with apt before you start here.<br />
<br />
=== Preconditions ===<br />
<br />
* upgraded to latest V3.4 version<br />
* reliable access to all configured storages<br />
* healthy cluster<br />
* no VM or CT running (note: VM live migration from 3.4 to 4.0 node or vice versa NOT possible)<br />
* valid backup of all OpenVZ containers (needed for the conversion to LXC)<br />
* valid backup of all VM (only needed if something goes wrong)<br />
* Correct repository configuration (accessible both wheezy and jessie)<br />
* at least 1GB free disk space at root mount point<br />
<br />
=== Actions Step by Step ===<br />
<br />
Everything has to be done on each Proxmox node's command line (via console or ssh; preferably via console in order to exclude interrupted ssh connections); some of the steps are optional. If a whole cluster is to be upgraded, keep a note of the cluster name and the HA configuration (failover domains, fencing, etc.), since these have to be re-created after the upgrade via the new web GUI. Again, make sure that you have a valid backup of all CTs and VMs before you start.<br />
<br />
'''Tip''': ''It is advisable to perform a dry-run of the upgrade first. Install the PVE 3.4 ISO on testing hardware, then upgrade this installation to the latest minor version of PVE 3.4 using the test repo (see [[Package repositories]]) then copy/create relevant configurations to the test machine to replicate your production setup as closely as possible.''<br />
<br />
==== Remove Proxmox VE 3.x packages in order to avoid dependency errors ====<br />
<br />
First make sure that your current installation is "clean" and fully up to date; run<br />
<br />
apt-get update && apt-get dist-upgrade<br />
<br />
Then start the removal:<br />
<br />
apt-get remove proxmox-ve-2.6.32 pve-manager corosync-pve openais-pve redhat-cluster-pve pve-cluster pve-firmware <br />
<br />
Adapt repository locations and update the apt database, point all to jessie, e.g.:<br />
<br />
sed -i 's/wheezy/jessie/g' /etc/apt/sources.list<br />
sed -i 's/wheezy/jessie/g' /etc/apt/sources.list.d/pve-enterprise.list<br />
apt-get update<br />
'''If there is a backports line then remove it.'''<br />
Currently, ''pve-manager'' and ''ceph-common'' have unmet dependencies with regards to package versions in the jessie ''backports'' repo.<br />
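A quick sketch to check whether any backports entries are still configured (no output means none were found):<br />
<pre>grep -r backports /etc/apt/sources.list /etc/apt/sources.list.d/
</pre><br />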
<br />
In case Ceph server is used: Ceph repositories for jessie can be found at http://download.ceph.com, therefore /etc/apt/sources.list.d/ceph.list will contain e.g.:<br />
<br />
deb http://download.ceph.com/debian-hammer jessie main<br />
<br />
<br />
You also need to add the Ceph repository key to apt; for details, check the documentation on ceph.com.<br />
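A sketch of adding the release key (the key URL below is the one commonly documented by the Ceph project; verify it against the current Ceph documentation):<br />
<pre>wget -q -O- 'https://download.ceph.com/keys/release.asc' | apt-key add -
</pre><br />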
<br />
==== Install the new kernel ====<br />
<br />
First check what the current new kernel version is: <br />
<br />
apt-cache search pve-kernel | sort<br />
<br />
- at the moment (February 2016) it is 4.2.8-1 - and install it:<br />
<br />
apt-get install pve-kernel-4.2.8-1-pve pve-firmware<br />
<br />
==== Upgrade the basic system to Debian Jessie ====<br />
<br />
This action will take some time; depending on the system's performance, it can take up to 60 minutes or even more. If you run on SSDs, the dist-upgrade can be finished in 5 minutes.<br />
<br />
apt-get dist-upgrade<br />
<br />
Reboot the system in order to activate the new kernel.<br />
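After the reboot, you can verify that the new kernel is actually running:<br />
<pre>uname -r
# should print the 4.2.x pve kernel installed above, e.g. 4.2.8-1-pve
</pre><br />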
<br />
==== Install Proxmox VE 4.0 ====<br />
Finally, install the new Proxmox VE 4.0 packages with one single command:<br />
apt-get install proxmox-ve<br />
<br />
Then you should purge configuration files from packages which are no longer needed (assuming you already saved your OpenVZ containers)<br />
<br />
dpkg --purge vzctl<br />
dpkg --purge redhat-cluster-pve<br />
<br />
'''Remove the old kernel''' (not a must, but recommended). The kernel version has to be adapted to the currently installed one, and there can be more than one old kernel; use dpkg --list | grep pve-kernel to find any 2.6.* kernels to remove:<br />
<br />
apt-get remove pve-kernel-2.6.*<br />
<br />
Finally, reboot and test if all is working as expected.<br />
<br />
==== Optional: OpenVZ conversion ====<br />
<br />
Convert the previously backed up containers to LXC, following the HowTo on [[Convert OpenVZ to LXC]]<br />
<br />
You can also remove the obsolete OpenVZ container data from your local storage.<br />
<br />
rm -f /etc/pve/openvz/<ct-id>.conf<br />
rm -R <storage-path>/private/*<br />
<br />
==== Cluster upgrade ====<br />
'''It is not possible to mix Proxmox VE 3.x and earlier with Proxmox VE 4.0 cluster '''<br />
<br />
Due to the new corosync 2.x, the cluster has to be re-established. Please use the same cluster name.<br />
<br />
* at the first node<br />
<br />
pvecm create <clustername><br />
<br />
* at all other nodes:<br />
<br />
pvecm add <first-node´s-IP> -force<br />
<br />
The HA configuration (fail-over, fencing, etc.) has to be re-configured manually; this is now supported via the web GUI, see [[High Availability Cluster 4.x]].<br />
<br />
After upgrading the last node remove the V3.x cluster data:<br />
<br />
rm /etc/pve/cluster.conf<br />
<br />
=== Troubleshooting ===<br />
<br />
* Failing upgrade to latest Proxmox VE 3.x or removal of old packages:<br />
<br />
Make sure that the original repository configuration (for wheezy) is correct. The change to "jessie" repositories has to be done '''after''' the removal of old Proxmox VE.<br />
<br />
In case Ceph is used: note that the repository URL has recently changed to http://download.ceph.com/<br />
<br />
* Failing upgrade to "jessie"<br />
<br />
Make sure that the repository configuration for jessie is correct.<br />
<br />
If there was a network failure and the upgrade was only made partially, try to repair the situation with <br />
<br />
apt-get -fy install<br />
<br />
* Unable to boot due to grub failure<br />
<br />
See [[Recover_From_Grub_Failure]]<br />
<br />
=== External links ===<br />
<br />
*[https://www.debian.org/releases/jessie/amd64/release-notes/ Release Notes for Debian 8.0 (jessie), 64-bit PC]<br />
<br />
== New installation ==<br />
<br />
* Backup all VMs and containers to external media (see [[Backup and Restore]])<br />
* Backup all files in /etc (see the sketch after this list). You will need various files in /etc/pve, as well as /etc/passwd, /etc/network/interfaces, /etc/resolv.conf and others, depending on what has been changed from the defaults.<br />
* Install Proxmox VE from ISO (this will wipe all data on the existing host)<br />
* Rebuild the cluster if you had any<br />
* Restore the file /etc/pve/storage.cfg (this will re-map and make available any external media you used for backup) <br />
* Restore firewall configs /etc/pve/firewall/ and /etc/pve/nodes/<node>/host.fw (if relevant)<br />
* Restore full VMs from Backups (see [[Backup and Restore]])<br />
* Restore/Convert containers (see [[Convert OpenVZ to LXC]])<br />
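A minimal sketch of saving /etc before the reinstall, as mentioned in the second step above (backup-host and the target path are placeholders; any external medium works as well):<br />
<pre>tar -czf /root/etc-backup.tar.gz /etc
scp /root/etc-backup.tar.gz root@backup-host:/path/to/safe/location/
</pre><br />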
<br />
[[Category:HOWTO]] [[Category:Installation]]
<br />
= Recover From Grub Failure =<br />
During the upgrade from 3.x to 4.x, I found myself without a working grub and unable to boot.<br />
I attempted to use the proxmox 4.1 install disk, but found it had a bug where the prompt would not accept input.<br />
<br />
You'll need an ISO for a 64-bit version of Ubuntu; I used 14.04 LTS.<br />
<br />
Boot Ubuntu off the ISO. We do not want to install Ubuntu, just run it live off the ISO/DVD.<br />
<br />
First we need to activate LVM and mount the root partition that is inside the LVM container.<br />
*sudo vgscan<br />
*sudo vgchange -ay<br />
<br />
Mount all the filesystems that are already there so we can upgrade/install grub. Your paths may vary depending on your drive configuration.<br />
*sudo mkdir /media/USB<br />
*sudo mount /dev/pve/root /media/USB/<br />
*sudo mount /dev/sda1 /media/USB/boot<br />
*sudo mount -t proc proc /media/USB/proc<br />
*sudo mount -t sysfs sys /media/USB/sys<br />
*sudo mount -o bind /dev /media/USB/dev<br />
<br />
Chroot into your proxmox install.<br />
*chroot /media/USB<br />
<br />
Then update the grub configuration and install grub to the boot device.<br />
*update-grub<br />
*grub-install /dev/sda<br />
<br />
If there are no error messages, you should be able to reboot now.<br />
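A sketch of cleanly leaving the chroot and unmounting everything before rebooting (using the mount points from above):<br />
<pre>exit
sudo umount /media/USB/dev /media/USB/sys /media/USB/proc /media/USB/boot
sudo umount /media/USB
sudo reboot
</pre><br />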
<br />
Credit: https://www.nerdoncoffee.com/operating-systems/re-install-grub-on-proxmox/
<br />
= Multicast notes =<br />
== Introduction ==<br />
<br />
Multicast allows a single transmission to be delivered to multiple servers at the same time. <br />
<br />
This is the basis for cluster communications in Proxmox VE 2.0 and higher, which uses corosync and cman, and would apply to any other solution which utilizes those clustering tools.<br />
<br />
If multicast does not work in your network infrastructure, you should fix it so that it does. If all else fails, use unicast instead, but beware of the node count limitations with unicast. <br />
<br />
=== IGMP snooping ===<br />
<br />
IGMP snooping prevents flooding multicast traffic to all ports in the broadcast domain by only allowing traffic destined for ports which have solicited such traffic. IGMP snooping is a feature offered by most major switch manufacturers and is often enabled by default on switches. In order for a switch to properly snoop the IGMP traffic, there must be an IGMP querier on the network. If no querier is present, IGMP snooping will actively prevent ALL IGMP/Multicast traffic from being delivered!<br />
<br />
If IGMP snooping is disabled, all multicast traffic will be delivered to all ports, which may add unnecessary load and potentially allow a denial-of-service attack.<br />
<br />
=== IGMP querier ===<br />
<br />
An IGMP querier is a multicast router that generates IGMP queries. IGMP snooping relies on these queries, which are unconditionally forwarded to all ports; the replies from the destination ports are what build the internal tables in the switch, allowing it to know which traffic to forward.<br />
<br />
An IGMP querier can be enabled on your router, your switch, or even on Linux bridges.<br />
<br />
== Configuring IGMP/Multicast ==<br />
<br />
=== Ensuring IGMP Snooping and Querier are enabled on your network (recommended) ===<br />
<br />
==== Juniper - JunOS ====<br />
<br />
Juniper EX switches, by default, enable IGMP snooping on all vlans as can be seen by this config snippet:<br />
<nowiki><br />
[edit protocols]<br />
user@switch# show igmp-snooping<br />
vlan all;<br />
</nowiki><br />
<br />
However, the IGMP querier is not enabled by default. If you are already using RVIs (Routed Virtual Interfaces) on your switch, you can enable IGMPv2 on the interface, which enables the querier. However, most administrators do not use RVIs in all vlans on their switches, so the querier should instead be configured on the router. The config setting below is the same on Juniper EX switches using RVIs as it is on Juniper SRX service gateways/routers, and effectively enables the IGMP querier on the specified interface/vlan. Note that you must set this on all vlans which require multicast:<br />
<nowiki><br />
set protocols igmp $iface version 2<br />
</nowiki><br />
<br />
==== Cisco ====<br />
<br />
==== Brocade ====<br />
<br />
==== Linux: Enabling Multicast querier on bridges ====<br />
If your router or switch does not support enabling a multicast querier, and you are using a classic linux bridge (not Open vSwitch), then you can enable the multicast querier on the Linux bridge by adding this statement to your /etc/network/interfaces bridge configuration:<br />
<nowiki><br />
post-up ( echo 1 > /sys/devices/virtual/net/$IFACE/bridge/multicast_querier )<br />
</nowiki><br />
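For example, a sketch of a complete bridge stanza in /etc/network/interfaces with the querier enabled (the bridge name, addresses and port are assumptions for illustration; keep your existing values):<br />
<nowiki>
auto vmbr0
iface vmbr0 inet static
        address 192.168.15.91
        netmask 255.255.255.0
        gateway 192.168.15.1
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0
        # enable the multicast querier on this bridge
        post-up ( echo 1 > /sys/devices/virtual/net/$IFACE/bridge/multicast_querier )
</nowiki><br />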
<br />
=== Disabling IGMP Snooping (not recommended) ===<br />
<br />
==== Juniper - JunOS ====<br />
<nowiki><br />
set protocols igmp-snooping vlan all disable<br />
</nowiki><br />
<br />
==== Cisco Managed Switches ====<br />
<nowiki><br />
# conf t<br />
# no ip igmp snooping<br />
</nowiki><br />
<br />
==== Netgear Managed Switches ====<br />
<br />
The following are screenshots of the settings needed to get multicast working on our Netgear 7300 series switches. For more information see http://documentation.netgear.com/gs700at/enu/202-10360-01/GS700AT%20Series%20UG-06-18.html <br />
<br />
<br> [[Image:Multicast-netgear-1.png]] <br />
<br />
[[Image:Multicast-netgear-2.png]] <br />
<br />
[[Image:Multicast-netgear-3.png]]<br />
<br />
[[File:NetGear-multicast-save-and-apply.png]]<br />
<br />
== Multicast with Infiniband ==<br />
<br />
IP over Infiniband (IPoIB) supports multicast, but multicast traffic is limited to 2043 bytes when using connected mode, even if you set a larger MTU on the IPoIB interface. <br />
<br />
Corosync has a setting, netmtu, that defaults to 1500 making it compatible with connected mode Infiniband. <br />
<br />
=== Changing netmtu ===<br />
<br />
Changing the netmtu can increase throughput. '''The following information is untested.''' <br />
<br />
Edit the /etc/pve/cluster.conf file and add the section: <source lang="xml"><br />
<totem netmtu="2043" /><br />
</source> <br />
<br />
<br> <source lang="xml"><br />
<?xml version="1.0"?><br />
<cluster name="clustername" config_version="2"><br />
<totem netmtu="2043" /><br />
<cman keyfile="/var/lib/pve-cluster/corosync.authkey"><br />
</cman><br />
<br />
<clusternodes><br />
<clusternode name="node1" votes="1" nodeid="1"/><br />
<clusternode name="node2" votes="1" nodeid="2"/><br />
<clusternode name="node3" votes="1" nodeid="3"/></clusternodes><br />
<br />
</cluster><br />
</source> <br />
<br />
<br><br />
<br />
== Testing multicast ==<br />
<br />
Note: not all hosting companies allow multicast traffic. <br />
<br />
First, check your cluster multicast address:<br />
<br />
#pvecm status|grep "Multicast addresses"<br />
Multicast addresses: 239.192.221.35 <br />
<br />
=== Using omping ===<br />
Install on all nodes<br />
<br />
aptitude install omping<br />
<br />
start omping on all nodes with the following command and check the output, e.g:<br />
<br />
omping -m yourmulticastadress node1 node2 node3<br />
<br />
<br />
*Note: to find the multicast address, run this:<br />
pvecm status | grep Multicast<br />
<br />
== Troubleshooting ==<br />
<br />
=== cman & iptables ===<br />
<br />
In case ''cman'' crashes with ''cpg_send_message failed: 9'', add these rules to your rule set:<br />
<nowiki><br />
iptables -A INPUT -m addrtype --dst-type MULTICAST -j ACCEPT<br />
iptables -A INPUT -p udp -m state --state NEW -m multiport --dports 5404,5405 -j ACCEPT<br />
</nowiki><br />
<br />
=== Use unicast instead of multicast (if all else fails) ===<br />
<br />
Unicast is a technology for sending messages to a single network destination. In corosync, unicast is implemented as UDP-unicast (UDPU). Due to increased network traffic (compared to multicast) the number of supported nodes is limited; do not use it with more than 4 cluster nodes. <br />
<br />
* just create the cluster as usual (pvecm create ...) <br />
* follow this howto to create a cluster.conf.new [[Fencing#General_HowTo_for_editing_the_cluster.conf]]<br />
* add the new '''transport="udpu"''' in /etc/pve/cluster.conf.new (don't forget to increment the version number)<br />
<br />
<source lang="xml"><cman keyfile="/var/lib/pve-cluster/corosync.authkey" transport="udpu"/></source><br />
<br />
* activate via GUI<br />
* add all nodes you want to join in /etc/hosts and reboot<br />
* before you add a node, make sure you add all other nodes in /etc/hosts
<br />
= Infiniband =<br />
=Introduction=<br />
Infiniband can be used with DRBD to speed up replication; this article covers setting up IP over Infiniband (IPoIB). <br />
<br />
==Subnet Manager==<br />
Infiniband requires a subnet manager to function.<br />
Many Infiniband switches have a built in subnet manager that can be enabled.<br />
When using multiple switches you can enable a subnet manager on all of them for redundancy.<br />
<br />
If your switch does not have a subnet manager, or if you are not using a switch then you need to run a subnet manager on your node(s).<br />
The opensm package in Debian Squeeze and up should be sufficient if you need a subnet manager.<br />
<br />
<br />
==Sockets Direct Protocol (SDP)==<br />
SDP can be used with a preload library to speed up TCP/IP communications over Infiniband.<br />
DRBD supports SDP and offers some performance gains.<br />
<br />
The Linux Kernel does not include the SDP module.<br />
If you want to use SDP you need to install OFED.<br />
Thus far I have been unable to get OFED to compile for Proxmox 2.0.<br />
<br />
=IPoIB=<br />
IP over Infiniband allows sending IP packets over the Infiniband fabric.<br />
<br />
==Proxmox 1.X Prerequisites==<br />
Debian Lenny network scripts do not work well with Infiniband interfaces.<br />
This can be corrected by installing the following packages from Debian squeeze:<br />
<pre>ifenslave-2.6_1.1.0-17_amd64.deb<br />
net-tools_1.60-23_amd64.deb<br />
ifupdown_0.6.10_amd64.deb</pre><br />
<br />
==Proxmox 2.0==<br />
Nothing special is needed with Proxmox 2.0; everything seems to work out of the box.<br />
<br />
AFAIK this is needed [ rob f 2013-07-13 ]. 2013-08-02: we have a subnet manager running on the IB switch, so we uninstalled it. '''TBD: is this needed under some circumstances?'''<br />
aptitude install opensm<br />
<br />
==Proxmox 3.x==<br />
See directions for 2.0. Nothing has changed in 3 that warrants noting here.<br />
<br />
==Create IPoIB Interface== <br />
<br />
===Bonding===<br />
It is not possible to bond Infiniband to increase throughput.<br />
If you want to use bonding for redundancy, create a bonding interface.<br />
<br />
/etc/modprobe.d/aliases-bond.conf<br />
<pre>alias bond0 bonding<br />
options bond0 mode=1 miimon=100 downdelay=200 updelay=200 max_bonds=2</pre><br />
<br />
<br />
Infiniband interfaces are named ib0, ib1, etc.<br />
Edit /etc/network/interfaces<br />
<br />
<pre>auto bond0<br />
iface bond0 inet static<br />
address 192.168.1.1<br />
netmask 255.255.255.0<br />
slaves ib0 ib1<br />
bond_miimon 100<br />
bond_mode active-backup<br />
pre-up modprobe ib_ipoib<br />
pre-up echo connected > /sys/class/net/ib0/mode<br />
pre-up echo connected > /sys/class/net/ib1/mode<br />
pre-up modprobe bond0<br />
mtu 65520 </pre><br />
<br />
To bring up the interface:<br />
<pre>ifup bond0</pre><br />
<br />
===Without Bonding===<br />
Edit /etc/network/interfaces<br />
<br />
<pre>auto ib0<br />
iface ib0 inet static<br />
address 192.168.1.1<br />
netmask 255.255.255.0<br />
pre-up modprobe ib_ipoib<br />
pre-up echo connected > /sys/class/net/ib0/mode<br />
mtu 65520 </pre><br />
<br />
To bring up the interface:<br />
<pre>ifup ib0</pre><br />
<br />
==TCP/IP Tuning==<br />
These settings performed best on my servers; your mileage may vary.<br />
<br />
edit /etc/sysctl.conf<br />
<pre>#Infiniband Tuning<br />
net.ipv4.tcp_mem=1280000 1280000 1280000<br />
net.ipv4.tcp_wmem = 32768 131072 1280000<br />
net.ipv4.tcp_rmem = 32768 131072 1280000<br />
net.core.rmem_max=16777216<br />
net.core.wmem_max=16777216<br />
net.core.rmem_default=16777216<br />
net.core.wmem_default=16777216<br />
net.core.optmem_max=1524288<br />
net.ipv4.tcp_sack=0<br />
net.ipv4.tcp_timestamps=0</pre><br />
<br />
To apply the changes now:<br />
<pre>sysctl -p</pre><br />
<br />
<br />
==iperf speed tests==<br />
''This is the first time I've used iperf; there are probably better options to use for the iperf command.''<br />
<br />
On the systems to test, install:<br />
aptitude install iperf<br />
On one system, run it as the server. In this example it is using IP 10.0.99.8:<br />
<pre><br />
iperf -s<br />
------------------------------------------------------------<br />
Server listening on TCP port 5001<br />
TCP window size: 128 KByte (default)<br />
------------------------------------------------------------<br />
</pre><br />
On a client:<br />
<pre><br />
# iperf -c 10.0.99.8<br />
------------------------------------------------------------<br />
Client connecting to 10.0.99.8, TCP port 5001<br />
TCP window size: 646 KByte (default)<br />
------------------------------------------------------------<br />
[ 3] local 10.0.99.30 port 38629 connected with 10.0.99.8 port 5001<br />
[ ID] Interval Transfer Bandwidth<br />
[ 3] 0.0-10.0 sec 8.98 GBytes 7.71 Gbits/sec<br />
</pre><br />
<br />
==I want to see the infiniband interface exposed in my VMs - can I do that?==<br />
The short answer is no.<br />
The long answer is that you can use manual routes to pass traffic through the virtio interface to your IB card. This will inflict a potentially enormous penalty on the transfer rates but it does work. <br />
<br />
In the long run, what is needed is a special KVM driver, similar to the Virtio driver, that would allow the IB card to be abstracted and presented to all your VMs as a separate device. <br />
<br />
==Using IB for cluster networking==<br />
IB can be used for cluster communications. <br />
Edit /etc/hosts and change the host names/IPs to your IB network, for example as sketched below. <br />
Reboot each host, and make sure ssh can connect to all hosts from each host over IB. <br />
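A sketch of what the IB entries in /etc/hosts could look like, assuming the 192.168.1.0/24 addressing used earlier for ib0 and three nodes (hostnames are placeholders):<br />
<pre>192.168.1.1   node1
192.168.1.2   node2
192.168.1.3   node3
</pre><br />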
=admin=<br />
Install the following package and check the documentation shipped with it and at http://pkg-ofed.alioth.debian.org/howto/infiniband-howto-4.html<br />
<br />
apt-get install infiniband-diags<br />
<pre><br />
aptitude show infiniband-diags<br />
<br />
Package: infiniband-diags <br />
New: yes<br />
State: installed<br />
Automatically installed: no<br />
Version: 1.4.4-20090314-1.2<br />
Priority: extra<br />
Section: net<br />
Maintainer: OFED and Debian Developement and Discussion <pkg-ofed-devel@lists.alioth.debian.org><br />
Architecture: amd64<br />
Uncompressed Size: 472 k<br />
Depends: libc6 (>= 2.3), libibcommon1, libibmad1, libibumad1, libopensm2, perl<br />
Description: InfiniBand diagnostic programs<br />
InfiniBand is a switched fabric communications link used in high-performance computing and enterprise data centers. Its<br />
features include high throughput, low latency, quality of service and failover, and it is designed to be scalable. <br />
<br />
This package provides diagnostic programs and scripts needed to diagnose an InfiniBand subnet.<br />
Homepage: http://www.openfabrics.org<br />
</pre><br />
<br />
* ibstat: in the following output we had a cable that was not fully connected to the IB card<br />
<pre><br />
# ibstat<br />
CA 'mthca0'<br />
CA type: MT25208<br />
Number of ports: 2<br />
Firmware version: 5.3.0<br />
Hardware version: a0<br />
Node GUID: 0x0002c90200277c9c<br />
System image GUID: 0x0002c90200277c9f<br />
Port 1:<br />
State: Active<br />
Physical state: LinkUp<br />
Rate: 10<br />
Base lid: 18<br />
LMC: 0<br />
SM lid: 3<br />
Capability mask: 0x02510a68<br />
Port GUID: 0x0002c90200277c9d<br />
Port 2:<br />
State: Down<br />
Physical state: Polling<br />
Rate: 10<br />
Base lid: 0<br />
LMC: 0<br />
SM lid: 0<br />
Capability mask: 0x02510a68<br />
Port GUID: 0x0002c90200277c9e<br />
</pre><br />
<br />
[[Category: HOWTO]]</div>Termhttps://pve.proxmox.com/mediawiki/index.php?title=Infiniband&diff=6587Infiniband2014-07-14T13:15:40Z<p>Term: </p>
<hr />
<div>=Introduction=<br />
Infiniband can be used with DRBD to speed up replication, this article covers setting up IP over Infiniband(IPoIB) <br />
<br />
==Subnet Manager==<br />
Infiniband requires a subnet manager to function.<br />
Many Infiniband switches have a built in subnet manager that can be enabled.<br />
When using multiple switches you can enable a subnet manager on all of them for redundancy.<br />
<br />
If your switch does not have a subnet manager, or if you are not using a switch then you need to run a subnet manager on your node(s).<br />
opensm package in Debian Squeeze and up should be suffecient if you need a subnet manager.<br />
<br />
<br />
==Sockets Direct Protocol (SDP)==<br />
SDP can be used with a preload library to speed up TCP/IP communications over Infiniband.<br />
DRBD supports SDP and offers some performance gains.<br />
<br />
The Linux Kernel does not include the SDP module.<br />
If you want to use SDP you need to install OFED.<br />
Thus far I have been unable to get OFED to compile for Proxmox 2.0.<br />
<br />
=IPoIB=<br />
IP over Infiniband allows sending IP packets over the Infiniband fabric.<br />
<br />
==Proxmox 1.X Prerequisites==<br />
Debian Lenny network scripts do not work well with Infiniband interfaces.<br />
This can be corrected by installing the following packages from Debian squeeze:<br />
<pre>ifenslave-2.6_1.1.0-17_amd64.deb<br />
net-tools_1.60-23_amd64.deb<br />
ifupdown_0.6.10_amd64.deb</pre><br />
<br />
==Proxmox 2.0==<br />
Nothing special is needed with Proxmox 2.0, everything seems to work out of the box.<br />
<br />
AFAIK this is needed [ rob f 2013-07-13 ]. 2013-08-02 we have subnet manager running on IB switch, so we uninstalled. '''TBD: is this needed under some circumstances?'''<br />
aptitude install opensm<br />
<br />
==Proxmox 3.x==<br />
See directions for 2.0. Nothing has changed in 3 that warrants noting here.<br />
<br />
==Create IPoIB Interface== <br />
<br />
===Bonding===<br />
It is not possible to bond Infiniband to increase throughput<br />
If you want to use bonding for redundancy create a bonding interface.<br />
<br />
/etc/modprobe.d/aliases-bond.conf<br />
<pre>alias bond0 bonding<br />
options bond0 mode=1 miimon=100 downdelay=200 updelay=200 max_bonds=2</pre><br />
<br />
<br />
Infiniband interfaces are named ib0,ib1, etc.<br />
Edit /etc/network/interfaces<br />
<br />
<pre>auto bond0<br />
iface bond0 inet static<br />
address 192.168.1.1<br />
netmask 255.255.255.0<br />
slaves ib0 ib1<br />
bond_miimon 100<br />
bond_mode active-backup<br />
pre-up modprobe ib_ipoib<br />
pre-up echo connected > /sys/class/net/ib0/mode<br />
pre-up echo connected > /sys/class/net/ib1/mode<br />
pre-up modprobe bond0<br />
mtu 65520 </pre><br />
<br />
To bring up the interface:<br />
<pre>ifup bond0</pre><br />
<br />
===Without Bonding===<br />
Edit /etc/network/interfaces<br />
<br />
<pre>auto ib0<br />
iface ib0 inet static<br />
address 192.168.1.1<br />
netmask 255.255.255.0<br />
pre-up modprobe ib_ipoib<br />
pre-up echo connected > /sys/class/net/ib0/mode<br />
mtu 65520 </pre><br />
<br />
To bring up the interface:<br />
<pre>ifup ib0</pre><br />
<br />
==TCP/IP Tuning==<br />
These settings performed best on my servers, your mileage may vary.<br />
<br />
edit /etc/sysctl.conf<br />
<pre># Infiniband tuning<br />
net.ipv4.tcp_mem = 1280000 1280000 1280000<br />
net.ipv4.tcp_wmem = 32768 131072 1280000<br />
net.ipv4.tcp_rmem = 32768 131072 1280000<br />
net.core.rmem_max = 16777216<br />
net.core.wmem_max = 16777216<br />
net.core.rmem_default = 16777216<br />
net.core.wmem_default = 16777216<br />
net.core.optmem_max = 1524288<br />
net.ipv4.tcp_sack = 0<br />
net.ipv4.tcp_timestamps = 0</pre><br />
<br />
To apply the changes now:<br />
<pre>sysctl -p</pre><br />
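To confirm the new values are active, query a few of the keys set above (a quick check):<br />
<pre>sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem net.core.rmem_max</pre><br />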
<br />
<br />
==iperf speed tests==<br />
''This is the first time I've used iperf; there are probably better options to use for the iperf command.''<br />
<br />
On the systems to be tested, install iperf:<br />
 aptitude install iperf<br />
On one system, run iperf as the server. In this example the server has IP 10.0.99.8:<br />
<pre><br />
iperf -s<br />
------------------------------------------------------------<br />
Server listening on TCP port 5001<br />
TCP window size: 128 KByte (default)<br />
------------------------------------------------------------<br />
</pre><br />
on a client.<br />
<pre><br />
# iperf -c 10.0.99.8<br />
------------------------------------------------------------<br />
Client connecting to 10.0.99.8, TCP port 5001<br />
TCP window size: 646 KByte (default)<br />
------------------------------------------------------------<br />
[ 3] local 10.0.99.30 port 38629 connected with 10.0.99.8 port 5001<br />
[ ID] Interval Transfer Bandwidth<br />
[ 3] 0.0-10.0 sec 8.98 GBytes 7.71 Gbits/sec<br />
</pre><br />
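As noted above, other iperf options may give a better picture; for example, a longer run with several parallel streams against the same server (standard iperf client flags):<br />
<pre># 30 second test with 4 parallel TCP streams<br />
iperf -c 10.0.99.8 -t 30 -P 4</pre><br />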
<br />
==I want to see the infiniband interface exposed in my VMs - can I do that?==<br />
The short answer is no.<br />
The long answer is that you can use manual routes to pass traffic through the virtio interface to your IB card. This will inflict a potentially enormous penalty on the transfer rates, but it does work. <br />
<br />
In the long run, what is needed is a special KVM driver, similar to the Virtio driver, that would allow the IB card to be abstracted and presented to all your VMs as a separate device. <br />
<br />
==Using IB for cluster networking==<br />
IB can be used for cluster communications. <br />
Edit /etc/hosts and change the host names/IPs to your IB network. <br />
Reboot each host, and make sure ssh can connect to all hosts from each host over IB. <br />
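A sketch of what the /etc/hosts entries might look like, reusing the IPoIB subnet from the examples above; the hostnames and the additional addresses are illustrative only:<br />
<pre># cluster hostnames resolved via the IPoIB network (example values)<br />
192.168.1.1   node1<br />
192.168.1.2   node2<br />
192.168.1.3   node3</pre><br />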
<br />
=Admin=<br />
Install the infiniband-diags package and check the documentation shipped with it and at http://pkg-ofed.alioth.debian.org/howto/infiniband-howto-4.html<br />
<br />
apt-get install infiniband-diags<br />
<pre><br />
aptitude show infiniband-diags<br />
<br />
Package: infiniband-diags <br />
New: yes<br />
State: installed<br />
Automatically installed: no<br />
Version: 1.4.4-20090314-1.2<br />
Priority: extra<br />
Section: net<br />
Maintainer: OFED and Debian Developement and Discussion <pkg-ofed-devel@lists.alioth.debian.org><br />
Architecture: amd64<br />
Uncompressed Size: 472 k<br />
Depends: libc6 (>= 2.3), libibcommon1, libibmad1, libibumad1, libopensm2, perl<br />
Description: InfiniBand diagnostic programs<br />
InfiniBand is a switched fabric communications link used in high-performance computing and enterprise data centers. Its<br />
features include high throughput, low latency, quality of service and failover, and it is designed to be scalable. <br />
<br />
This package provides diagnostic programs and scripts needed to diagnose an InfiniBand subnet.<br />
Homepage: http://www.openfabrics.org<br />
</pre><br />
<br />
* ibstat: in the following output a cable was not fully connected to the IB card, so Port 2 shows as Down.<br />
<pre><br />
# ibstat<br />
CA 'mthca0'<br />
CA type: MT25208<br />
Number of ports: 2<br />
Firmware version: 5.3.0<br />
Hardware version: a0<br />
Node GUID: 0x0002c90200277c9c<br />
System image GUID: 0x0002c90200277c9f<br />
Port 1:<br />
State: Active<br />
Physical state: LinkUp<br />
Rate: 10<br />
Base lid: 18<br />
LMC: 0<br />
SM lid: 3<br />
Capability mask: 0x02510a68<br />
Port GUID: 0x0002c90200277c9d<br />
Port 2:<br />
State: Down<br />
Physical state: Polling<br />
Rate: 10<br />
Base lid: 0<br />
LMC: 0<br />
SM lid: 0<br />
Capability mask: 0x02510a68<br />
Port GUID: 0x0002c90200277c9e<br />
</pre><br />
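Beyond ibstat, the infiniband-diags package ships a few other useful fabric checks (exact tool names may vary slightly between package versions):<br />
<pre>ibhosts          # list host channel adapters on the fabric<br />
ibswitches       # list switches on the fabric<br />
ibnetdiscover    # dump the whole fabric topology<br />
perfquery        # port counters, useful for spotting link errors</pre><br />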
<br />
[[Category: HOWTO]]</div>Termhttps://pve.proxmox.com/mediawiki/index.php?title=PCI_Passthrough&diff=6088PCI Passthrough2013-11-18T16:17:04Z<p>Term: </p>
<hr />
<div>To enable PCI passthrough, you need to configure the following, depending on whether your CPU is Intel or AMD: <br />
<br />
== INTEL ==<br />
<br />
Edit: <source lang="bash"><br />
# vi /etc/default/grub<br />
</source> Change: <source lang="bash"><br />
GRUB_CMDLINE_LINUX_DEFAULT="quiet" <br />
</source> To: <source lang="bash"><br />
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on" <br />
</source> Then: <source lang="bash"><br />
# update-grub<br />
# reboot<br />
</source> <br />
<br />
<br> <br />
<br />
Then run "dmesg | grep -e DMAR -e IOMMU" from the command line. &nbsp;If there is no output, then something is wrong. <br />
<br />
== AMD ==<br />
<br />
Edit: <br />
<br />
<source lang="bash"><br />
# vi /etc/default/grub<br />
</source> <br />
<br />
Change: <source lang="bash"><br />
GRUB_CMDLINE_LINUX_DEFAULT="quiet" <br />
</source> To: <source lang="bash"><br />
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on" <br />
</source> Then: <source lang="bash"><br />
# update-grub<br />
# echo "options kvm allow_unsafe_assigned_interrupts=1" > /etc/modprobe.d/kvm_iommu_map_guest.conf <br />
# reboot<br />
</source> <br />
<br />
<br />
<br />
== Determine your PCI card address, and configure your VM ==<br />
<br />
<br />
<br />
Locate your card using "lspci". The address should be in the form: 04:00.0<br />
<br />
Manually edit the VM configuration file. It is located at /etc/pve/nodes/proxmox3/qemu-server/vmnumber.conf (replace proxmox3 with your node name and vmnumber with your VM ID).<br />
<br />
Add this line to the end of the file: "hostpci0: 04:00.0"<br />
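Putting the two steps together, a minimal sketch (the address 04:00.0 and the file path are the examples used above; adjust both to your system):<br />
<source lang="bash"><br />
# find the device address<br />
lspci<br />
# append the passthrough line to the VM config<br />
echo "hostpci0: 04:00.0" >> /etc/pve/nodes/proxmox3/qemu-server/vmnumber.conf<br />
</source><br />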
<br />
<br />
<br />
== Verify Operation ==<br />
<br />
<br />
<br />
Start the VM from the UI.<br />
<br />
Enter the qm monitor: "qm monitor vmnumber"<br />
<br />
Verify that your card is listed here: "info pci"<br />
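A rough sketch of the verification session (the prompt and exact listing will vary):<br />
<source lang="bash"><br />
qm monitor vmnumber<br />
# at the monitor prompt:<br />
#   info pci      <- the passed-through device should appear in the list<br />
#   quit<br />
</source><br />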
<br />
Then install the drivers in your guest OS.<br />
<br />
<br />
<br />
NOTE: Card support might be limited to 2 or 3 devices.<br />
<br />
NOTE: This process will remove the card from the Proxmox host OS.<br />
<br />
Editorial Note: Using PCI passthrough to present drives direct to a ZFS (FreeNAS, Openfiler, OmniOS) virtual machine is dangerous on many levels and is not recommended for production use. Specific FreeNAS warnings can be found here: http://forums.freenas.org/threads/absolutely-must-virtualize-freenas-a-guide-to-not-completely-losing-your-data.12714/ <br />
<br />
[[Category:HOWTO]]</div>Term