[pve-devel] [PATCH 1/4] RFC: Efficient VM backup for qemu

Alexandre DERUMIER aderumier at odiso.com
Tue Nov 13 16:31:33 CET 2012


some first notes:

>>+Another approach to back up VM images is to create a new qcow2 image 
>>+which uses the old image as its base. During backup, writes are redirected 
>>+to the new image, so the old image represents a 'snapshot'. After 
>>+backup, data needs to be copied back from the new image into the old 
>>+one (commit). So a simple write during backup triggers the following 
>>+steps: 
>>+ 
>>+1.) write new data to new image (VM write) 
>>+2.) read data from old image (backup) 
>>+3.) write data from old image into tar file (backup) 
>>+ 
>>+4.) read data from new image (commit) 
>>+5.) write data to old image (commit) 
>>+ 
>>+This is in fact the same overhead as before. Other tools like qemu 
>>+livebackup produce similar overhead (2 reads, 3 writes). 
>>+ 

This is not true for all storage snapshots.
With rbd, sheepdog or nexenta for example ("internal" snapshots), you don't need to do steps 4) and 5) (merging the snapshot back into the base image):
deleting the snapshot only deletes references, so it's fast.
But it is true for lvm or qcow2 (external snapshots).

>>
>>+To be more efficient, we simply need to avoid unnecessary steps. The 
>>+following steps are always required: 
>>+ 
>>+1.) read old data before it gets overwritten 
>>+2.) write that data into the backup archive 
>>+3.) write new data (VM write) 
>>+ 
>>+As you can see, this involves only one read, and two writes. 
>>+ 
>>+To make that work, our backup archive needs to be able to store image 
>>+data 'out of order'. It is important to notice that this will not work 
>>+with traditional archive formats like tar. 
>>+ 
>>+During backup we simply intercept writes, then read existing data and 
>>+store that directly into the archive. After that we can continue the 
>>+write. 
>>+ 

So this is some kind of mirroring? Is it very different from the new qemu 1.3 live disk mirroring?
Is there any impact with a high VM write load? (Could the backup never finish because of too many writes?)

One disadvantage of this:

What happens if your backup storage is slow (or slower than your VM storage)?
Can't it impact write speed during the backup? (With a snapshot, we can back up slowly; new writes only go to the VM storage.)
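
To make sure I understand the write interception, here is how I picture it in plain C (just my own sketch, not code from the patch; the 64k cluster size, the bitmap and all names are my guesses):

#include <stdint.h>
#include <unistd.h>

#define CLUSTER_SIZE 65536    /* guessed copy-before-write granularity */

typedef struct BackupJob {
    int vm_fd;                /* the VM image */
    int archive_fd;           /* the backup archive (append only) */
    uint8_t *saved_bitmap;    /* one bit per cluster: old data already archived? */
} BackupJob;

/* called before a guest write to [offset, offset+len) is allowed to proceed */
static int backup_before_write(BackupJob *job, uint64_t offset, uint64_t len)
{
    uint64_t first = offset / CLUSTER_SIZE;
    uint64_t last = (offset + len - 1) / CLUSTER_SIZE;
    uint8_t buf[CLUSTER_SIZE];

    for (uint64_t c = first; c <= last; c++) {
        if (job->saved_bitmap[c / 8] & (1 << (c % 8)))
            continue;                                  /* already saved once */
        /* 1.) read the old data before it gets overwritten */
        if (pread(job->vm_fd, buf, CLUSTER_SIZE, c * CLUSTER_SIZE) != CLUSTER_SIZE)
            return -1;
        /* 2.) append it to the archive, tagged with its cluster number,
         *     so the archive can store clusters 'out of order' */
        if (write(job->archive_fd, &c, sizeof(c)) != sizeof(c) ||
            write(job->archive_fd, buf, CLUSTER_SIZE) != CLUSTER_SIZE)
            return -1;
        job->saved_bitmap[c / 8] |= 1 << (c % 8);
    }
    /* 3.) the guest write itself can now be executed */
    return 0;
}

If that is roughly right, a guest write to a not-yet-saved cluster has to wait for the read and for the archive write, which is why I ask about slow backup storage above.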



I'm beginning to read the C code. (damn, Dietmar, you are good ;)

----- Original Message ----- 

From: "Dietmar Maurer" <dietmar at proxmox.com> 
To: pve-devel at pve.proxmox.com 
Sent: Tuesday, November 13, 2012 14:07:09 
Subject: [pve-devel] [PATCH 1/4] RFC: Efficient VM backup for qemu 


Signed-off-by: Dietmar Maurer <dietmar at proxmox.com> 
--- 
docs/backup-rfc.txt | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++ 
1 files changed, 96 insertions(+), 0 deletions(-) 
create mode 100644 docs/backup-rfc.txt 

diff --git a/docs/backup-rfc.txt b/docs/backup-rfc.txt 
new file mode 100644 
index 0000000..8a32c38 
--- /dev/null 
+++ b/docs/backup-rfc.txt 
@@ -0,0 +1,96 @@ 
+RFC: Efficient VM backup for qemu 
+ 
+=Requirements= 
+ 
+* Backup to a single archive file 
+* Backup needs to contain all data to restore VM (full backup) 
+* Do not depend on storage type or image format 
+* Avoid use of temporary storage 
+* Store sparse images efficiently 
+ 
+=Introduction= 
+ 
+Most VM backup solutions use some kind of snapshot to get a consistent 
+VM view at a specific point in time. For example, we previously used 
+LVM to create a snapshot of all used VM images, which are then copied 
+into a tar file. 
+ 
+That basically means that any data written during backup involves 
+considerable overhead. For LVM we get the following steps: 
+ 
+1.) read original data (VM write) 
+2.) write original data into snapshot (VM write) 
+3.) write new data (VM write) 
+4.) read data from snapshot (backup) 
+5.) write data from snapshot into tar file (backup) 
+ 
+Another approach to back up VM images is to create a new qcow2 image 
+which uses the old image as its base. During backup, writes are redirected 
+to the new image, so the old image represents a 'snapshot'. After 
+backup, data needs to be copied back from the new image into the old 
+one (commit). So a simple write during backup triggers the following 
+steps: 
+ 
+1.) write new data to new image (VM write) 
+2.) read data from old image (backup) 
+3.) write data from old image into tar file (backup) 
+ 
+4.) read data from new image (commit) 
+5.) write data to old image (commit) 
+ 
+This is in fact the same overhead as before. Other tools like qemu 
+livebackup produce similar overhead (2 reads, 3 writes). 
+ 
+=Make it more efficient= 
+ 
+To be more efficient, we simply need to avoid unnecessary steps. The 
+following steps are always required: 
+ 
+1.) read old data before it gets overwritten 
+2.) write that data into the backup archive 
+3.) write new data (VM write) 
+ 
+As you can see, this involves only one read, and two writes. 
+ 
+To make that work, our backup archive needs to be able to store image 
+data 'out of order'. It is important to notice that this will not work 
+with traditional archive formats like tar. 
+ 
+During backup we simply intercept writes, then read existing data and 
+store that directly into the archive. After that we can continue the 
+write. 
+ 
+==Advantages== 
+ 
+* very good performance (1 read, 2 writes) 
+* works on any storage type and image format. 
+* avoid usage of temporary storage 
+* we can define a new and simple archive format, which is able to 
+ store sparse files efficiently. 
+ 
+Note: Storing sparse files is a mess with existing archive 
+formats. For example, tar requires information about holes at the 
+beginning of the archive. 
+ 
+==Disadvantages== 
+ 
+* we need to define a new archive format 
+ 
+Note: Most existing archive formats are optimized to store small files 
+including file attributes. We simply do not need that for VM archives. 
+ 
+* archive contains data 'out of order' 
+ 
+If you want to access image data in sequential order, you need to 
+re-order archive data. It would be possible to do that on the fly, 
+using temporary files. 
+ 
+Fortunately, a normal restore/extract works perfectly with 'out of 
+order' data, because the target files are seekable. 
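
For illustration only (this is not part of the patch; the record layout of 'cluster number + data' is just an assumption, since the format is not defined here), a restore loop could be as simple as:

#include <stdint.h>
#include <unistd.h>

#define CLUSTER_SIZE 65536           /* assumed archive granularity */

/* Read (cluster number, cluster data) records in whatever order they were
 * archived and place each one at its original offset in the target image. */
static int restore_image(int archive_fd, int target_fd)
{
    uint64_t cluster;
    uint8_t buf[CLUSTER_SIZE];

    while (read(archive_fd, &cluster, sizeof(cluster)) == sizeof(cluster)) {
        if (read(archive_fd, buf, CLUSTER_SIZE) != CLUSTER_SIZE)
            return -1;
        /* the target file is seekable, so 'out of order' records are fine */
        if (pwrite(target_fd, buf, CLUSTER_SIZE,
                   (off_t)(cluster * CLUSTER_SIZE)) != CLUSTER_SIZE)
            return -1;
    }
    return 0;
}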
+ 
+=Archive format requirements= 
+ 
+The basic requirement for such a new format is that we can store image 
+data 'out of order'. It is also very likely that we have less than 256 
+drives/images per VM. We also want to be able to store VM 
+configuration files. 
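
As a purely hypothetical illustration of these requirements (the RFC does not define the format yet), a per-extent record header could look like this:

#include <stdint.h>

typedef struct ArchiveExtentHeader {
    uint8_t  dev_id;     /* drive/image index, fits because < 256 per VM */
    uint8_t  flags;      /* e.g. mark an all-zero extent for sparse storage */
    uint16_t reserved;
    uint32_t size;       /* payload bytes following this header (0 if sparse) */
    uint64_t offset;     /* byte offset inside the image, which is what
                            allows storing extents 'out of order' */
} ArchiveExtentHeader;

VM configuration files could be stored through the same mechanism, for example by reserving one dev_id value for configuration blobs.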
-- 
1.7.2.5 

_______________________________________________ 
pve-devel mailing list 
pve-devel at pve.proxmox.com 
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel 


