VMA: Difference between revisions

Revision as of 07:54, 5 December 2013

PVE has a new format for vm backups (since 2.3): VMA

Since release 2.3, PVE uses a new format for its VM backups: .vma, which replaces the previous .tar format.

Like the old .tar archives, .vma archives are stored compressed, now with LZO (.lzo); earlier releases used gzip (.gz).

General information about backup and restore is at http://pve.proxmox.com/wiki/Backup_and_Restore. Other pages cover related aspects; you can find them with a wiki search for the word 'backup' (http://pve.proxmox.com/wiki/Special:Search?search=backup).

Below you will find information about the new .vma format.

The reasons for the switch from tar

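PVE now supports various storage models (some of them still experimental), such as lvm, sheepdog, ceph, local, nfs and iscsi, and several image formats, such as raw, qcow and vmdk.

See, among other wiki pages:

http://pve.proxmox.com/wiki/Storage_Model
http://pve.proxmox.com/wiki/Storage:_Sheepdog
http://pve.proxmox.com/wiki/Storage:_Ceph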

The main reason for developing a new format for VM backups was to have a single format that works efficiently everywhere. Snapshots in particular can be taken in a number of ways depending on the storage model, and this exposed drawbacks of the standard .tar format. The .vma format tries to address them and to provide one efficient backup behavior.

There is a very detailed explanation of those reasons, currently only in the proxmox git (https://git.proxmox.com/?p=qemu.git;a=blob_plain;f=docs/backup.txt;hb=backup), but here is a summary:

Most VM backup solutions use some kind of snapshot to get a consistent VM view at a specific point in time (e.g. LVM snapshots, qcow2 snapshots, qemu livebackup), but these can involve considerable overhead, in different ways.

Some storage types/formats support internal snapshots using some kind of reference counting (rados, sheepdog, dm-thin, qcow2). It would be possible to use that for backups, but for now we want to be storage-independent.

A more efficient method that avoids any unnecessary step is needed. To make that work, the backup archive must be able to store image data 'out of order', which is not possible with traditional archive formats like tar.

The new method/format allows for very good performance, works on any storage type and image format, does not need temporary storage, and is a simple archive format that can store sparse files efficiently.

The aim, then, is a good, simple, efficient and consistent backup behavior designed specifically for VM backups, unlike the traditional file and folder tools.

The VMA format specification

The format details can be found here: https://git.proxmox.com/?p=pve-qemu-kvm.git;a=blob;f=vma_spec.txt

Virtual Machine Archive format (VMA)

This format contains a header which includes the VM configuration as binary blobs, and a list of devices (dev_id, name).

The actual VM image data is stored inside extents. An extent contains up to 64 clusters, and starts with a 512 byte header containing additional information for those clusters.

We use a cluster size of 65536, and use 8 bytes for each cluster in the header to store the following information:

1 byte dev_id (to identify the drive)
1 byte not used (reserved)
2 bytes zero indicator (mark zero regions (16x4096))
4 bytes cluster number

We only store non-zero blocks (each block is 4096 bytes).

Each archive is marked with a uuid. The archive header and all extent headers include that uuid and an MD5 checksum (over the header data).
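
As an illustration of the 8-byte per-cluster entry described above, here is a minimal Python sketch. It is not taken from the VMA sources: it assumes the fields appear in the order listed, that multi-byte values are big-endian, and that bit i of the zero indicator refers to the i-th 4096-byte block of the cluster. Whether a set bit means "zero" or "stored" is not stated here, so the sketch only reports which bits are set; check the format specification before relying on any of this. The name decode_cluster_entry is hypothetical.

```python
import struct

CLUSTER_SIZE = 65536                               # bytes per cluster (from the text)
BLOCK_SIZE = 4096                                  # bytes per block (from the text)
BLOCKS_PER_CLUSTER = CLUSTER_SIZE // BLOCK_SIZE    # 16, hence a 2-byte indicator

# Assumed layout: dev_id (1), reserved (1), zero indicator (2), cluster number (4).
# Big-endian byte order is an assumption, not something this page states.
ENTRY = struct.Struct(">BBHI")


def decode_cluster_entry(raw):
    """Decode one 8-byte cluster entry into a plain dictionary."""
    dev_id, _reserved, indicator, cluster_number = ENTRY.unpack(raw)
    # List the block indexes whose bit is set in the 16-bit indicator.
    flagged = [i for i in range(BLOCKS_PER_CLUSTER) if indicator & (1 << i)]
    return {
        "dev_id": dev_id,
        "cluster_number": cluster_number,
        # Where this cluster belongs in the drive image, regardless of where
        # it happens to sit in the archive (data may be stored out of order).
        "image_offset": cluster_number * CLUSTER_SIZE,
        "flagged_blocks": flagged,
    }


if __name__ == "__main__":
    sample = ENTRY.pack(1, 0, 0b101, 3)
    print(decode_cluster_entry(sample))
```

Because every entry carries its own cluster number, a reader can place each cluster at the right offset in the target image even when clusters arrive out of order.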


Command line utility

~# vma
usage: vma command [command options]

vma list <filename>
vma create <filename> [-c config] <archive> pathname ...
vma extract <filename> [-v] [-r <fifo>] <targetdir>
vma verify <filename> [-v]
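
For scripting, the usage lines above can be turned into argument lists. The following Python sketch only builds the command line for the list, verify and extract subcommands exactly as shown in the usage text; the helper name vma_argv and the example file names are hypothetical, and nothing here is part of PVE itself.

```python
import shlex


def vma_argv(command, filename, *, verbose=False, fifo=None, targetdir=None):
    """Build an argument list for 'vma list', 'vma verify' or 'vma extract'."""
    if command not in ("list", "verify", "extract"):
        raise ValueError(f"unsupported command: {command}")
    argv = ["vma", command, filename]
    # -v is shown in the usage text for verify and extract only.
    if verbose and command in ("verify", "extract"):
        argv.append("-v")
    if command == "extract":
        if targetdir is None:
            raise ValueError("extract needs a target directory")
        if fifo is not None:
            argv += ["-r", fifo]
        argv.append(targetdir)
    return argv


if __name__ == "__main__":
    # Hypothetical file and directory names, for illustration only.
    argv = vma_argv("extract", "backup.vma", verbose=True, targetdir="/tmp/restore")
    print(shlex.join(argv))
    # To actually run it: subprocess.run(argv, check=True)
```

Passing a list of arguments rather than a single shell string avoids quoting problems with unusual file names.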

Things to be aware of

Well-known archive formats like the old tar.gz allowed (mainly) Linux users to take advantage of a number of existing tools, such as rdiff for off-site incremental backups. With the new format, be aware that each vma backup file is unique and has no similarity to the file from the previous backup, so the full file always needs to be moved to the remote location. rdiff can no longer easily spot the differences between two backups of the same VM, and will therefore produce a very large "incremental" file that is not really incremental. See this post for more information and an example: http://forum.proxmox.com/threads/13475-Proxmox-2-3-new-backup-methode-vma-not-rdiff-friendly
