This week I had several conversations with colleagues and suppliers regarding the quiescing options of vSphere based VMs for backup and replication purposes. During these talks I noticed that it is more complex than most would think (including myself). This gave me the idea to write a blog post about the matter, because I think I’m not the only one.
This post describes the background, dependencies and options for quiesced (consistent) VM backups and replications. I will not go into the on-demand snapshot creation option in the vSphere Client.
In this post, I used some Veeam screenshots because of their excellent documentation and my familiarity with the product. Other vendor solutions can also be used and work in a similar way.
So what is quiescing? To be on the same page here, quiescing is the process of preparing a VM for backup or replication in a consistent way. The VMware KB 1015180 article “Understanding VM snapshots in ESXi” describes it as: “Quiescing a file system is a process of bringing the on-disk data of a physical or virtual computer into a state suitable for backups. This process might include such operations as flushing dirty buffers from the operating system’s in-memory cache to disk, or other higher-level application-specific task”.
Now that we know that quiescing essentially provides consistency for vSphere VM backups and replication, it’s important to know that consistency comes in 3 types, across 2 categories:
- Crash consistent
- File-system consistent
- Application consistent (with or without log pruning)
Crash consistent is the most basic and fastest way. It guarantees that all data that already has been written to the vDisk will be part of the backup or replication. This mode works regardless of the OS type, use of VMware Tools and use of snapshots.
Crash consistency is basically the state you would get when you unplug or power off a host or VM.
File-system consistent quiescing is often used due to its broad OS support. The method prevents file-system (FS) corruption by writing pending data and FS changes to disk prior to creating the snapshot, which is even more important when using journaling file-systems.
FS quiescing requires a compatible OS and installed VMware Tools.
Application consistency builds upon FS quiescing and is mostly used for applications running on Windows OSes that use VSS. It ensures that applications have written their pending changes to disk beforehand and have performed the tasks needed to bring the app into a consistent state. For example, Oracle writes checkpoints to its important files so that they have uniform SCNs. In general, in-memory data and pending I/O are flushed to disk in the correct transactional order.
For a Windows VM the quiescence process is controlled by the application’s VSS writer, which performs the needed actions before the quiesced snapshot is taken. Check your backup / replication vendor’s list of compatible applications before using the feature. The interaction between VSS and the backup or replication software that starts the quiescing process uses the Guest OS operations feature of the VMware Tools (formerly the VIX API), which is part of the vSphere Web Services SDK that is built into vCenter.
Some backup vendors implement it differently and try to reach the VM over the network (L2 / L3) first, before using the Guest OS operations feature. In a service provider situation, the latter would probably be the way to go.
In the VIX documentation, the purpose was described as:
The VIX API helps you write scripts to automate virtual machine operations and manipulate files within guest operating systems. VIX programs run on different systems and support management of vSphere, Workstation, Player, and Fusion. Bindings are provided for C, Perl, and COM (Visual Basic, VBscript, C#).
For Linux and non-VSS compatible Windows applications, quiescing is performed in a different way. In this case, scripts (freeze and thaw) have to be put in a certain location and need to be configured in the backup or replication application. These scripts perform the required quiescing operations in the application.
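The freeze and thaw scripts described above can be sketched as a single hook that dispatches on a phase argument. This is a minimal sketch, not a vendor-supplied script: it assumes the hook is called with `freeze` or `thaw` as its only argument, and the `myapp-ctl` commands are hypothetical placeholders for whatever your application offers to flush and suspend its writes. The required script location and calling convention differ per product, so check your vendor’s documentation.

```python
#!/usr/bin/env python3
"""Hypothetical freeze/thaw hook for an application without a VSS writer."""
import subprocess
import sys

# Assumption: "myapp-ctl" is a placeholder CLI, not a real tool.
COMMANDS = {
    "freeze": [["myapp-ctl", "flush"], ["myapp-ctl", "suspend-writes"]],
    "thaw": [["myapp-ctl", "resume-writes"]],
}


def commands_for(phase):
    """Return the commands for a phase, or raise ValueError if it is unknown."""
    try:
        return COMMANDS[phase]
    except KeyError:
        raise ValueError(f"unknown phase: {phase!r}") from None


def run_phase(phase, runner=subprocess.run):
    """Run every command of the phase in order and stop at the first failure."""
    for cmd in commands_for(phase):
        runner(cmd, check=True)


def main(argv, runner=subprocess.run):
    """Return an exit code: 0 on success, 1 on failure, 2 on wrong usage."""
    if len(argv) != 2:
        print("usage: quiesce_hook.py freeze|thaw", file=sys.stderr)
        return 2
    try:
        run_phase(argv[1], runner)
    except (ValueError, subprocess.CalledProcessError, OSError) as exc:
        # A non-zero exit code signals the caller that quiescing failed.
        print(f"quiesce hook failed: {exc}", file=sys.stderr)
        return 1
    return 0
```

To use it as an actual script, end the file with `sys.exit(main(sys.argv))`. Returning a non-zero exit code on failure matters, because it lets the calling software decide whether to abort or to fall back to a crash consistent snapshot.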
App consistent mode requires a compatible OS, installed VMware Tools and a compatible (mostly VSS based) application.
Quiescing impact on app recovery
Application consistent (VSS based) VM backups or replications can happen in 2 ways. The first is described in the previous section. In that case the application is in a consistent state before it is backed-up or replicated. Nothing more and nothing less.
This way the application still needs to use some form of built-in or external tooling to back up the archivelogs (Oracle) or transaction logs (SQL Server) to be able to perform Point-In-Time (PIT) restores, and to truncate these logs to prevent the disks from filling up with ever growing log files.
If your backup / replication vendor software supports it (and most do), it can use a set of credentials to log into the application. Using the Guest OS operations feature it gives you a couple of extra options.
I would advise to always talk this through with the application owner, because it impacts the way the application is backed up and the way recovery will be performed. In some cases temporary backup files can be stored on the VM’s C: drive, which could impact the availability of the application.
In this example, the GUI options for Veeam B&R are shown. It works quite similar in others.
Truncate logs enabled
Use the “Truncate logs” option when the database (a SQL database in this example) is in bulk-logged or full logging mode and additional DB backup tools are not used. The DB logs are then truncated, which prevents the disks from filling up. As a result, the DB can only be restored to the time of the backup. No PIT restore is possible.
Truncate logs disabled
Use the “Do not truncate logs” option when the database is in simple logging mode, or when you use other tools to perform this task while the databases are in bulk-logged or full logging mode. For example, SQL has solid built-in tools to perform its own backups. This way you can use PIT restores, even up to a specific second (requires full logging).
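The choice between the two options can be summarized in a small decision helper. This is only a sketch of the reasoning above and not how Veeam or any other product decides; the function name and the returned strings are made up for illustration.

```python
def log_handling(recovery_model, external_log_backups):
    """Suggest a log handling option for a SQL database.

    recovery_model: "simple", "bulk-logged" or "full"
    external_log_backups: True if another tool (for example the native
    SQL backup tooling) already backs up and truncates the logs.
    """
    model = recovery_model.lower()
    if model not in {"simple", "bulk-logged", "full"}:
        raise ValueError(f"unknown recovery model: {recovery_model!r}")
    if model == "simple" or external_log_backups:
        # Either there is no log chain to manage, or another tool owns it.
        return "do not truncate logs"
    # Without truncation the logs would keep growing until the disk is full.
    return "truncate logs"


print(log_handling("full", external_log_backups=False))
print(log_handling("full", external_log_backups=True))
```

The second call reflects the PIT restore scenario: the log chain stays intact for the tool that owns it.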
I will not describe all possible options, since that’s outside of the scope of this post, but probably you get the point by now.
Quiescing impact on performance
At the start of the post, I described the quiescing methods available, but did not go into the impact on VM performance. This is important because there is a big difference between quiescing based on snapshots and quiescing based on I/O filters.
Depending on the VM’s I/O load and the quiescing method used, performance degradation may occur when using snapshots for backup or replication purposes. This impact is most noticeable during the snapshot deletion phase, when all changes in the snapshot files need to be written back to the base vDisks.
When you have compatible gear and backup / replication software, the performance impact can be lowered significantly by using SAN integrations. In this case the VM snapshot is deleted directly after the SAN snapshot is taken, which reduces the snapshot delta, whereas the snapshot would normally only be deleted after all VM data has been processed.
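To get a feel for why the snapshot lifetime matters, the snapshot delta can be roughly estimated as the write rate multiplied by the time the snapshot stays open. The sketch below is a back-of-the-envelope model with made-up numbers. It assumes a constant write rate and ignores blocks that are overwritten more than once, so treat the result as a rough estimate only.

```python
def snapshot_delta_gib(write_mib_per_s, snapshot_open_s):
    """Rough size of the snapshot delta in GiB for a constant write rate."""
    if write_mib_per_s < 0 or snapshot_open_s < 0:
        raise ValueError("inputs must be non-negative")
    return write_mib_per_s * snapshot_open_s / 1024


# Hypothetical VM that writes 20 MiB/s.
# Regular backup: the snapshot stays open during a 60 minute backup run.
regular = snapshot_delta_gib(20, 60 * 60)
# SAN integration: the VM snapshot is removed after about 30 seconds.
san = snapshot_delta_gib(20, 30)

print(f"regular: {regular:.1f} GiB, SAN integrated: {san:.1f} GiB")
```

With these example numbers the delta that has to be written back shrinks from roughly 70 GiB to well under 1 GiB, which is why the snapshot deletion phase becomes much less noticeable.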
Snapshot based backup and replication both use the vSphere Storage APIs (previously known as VADP), which are used for centralized, efficient, off-host, LAN-free backup of VMs.
I/O Filter based
I/O filter based quiescing is available by using the vSphere API for I/O filtering (VAIO) framework, which was introduced with vSphere 6.0 U1 back in 2015. Another option is to use the vSCSI filter, which is used by vSphere Replication and Cloud Director Availability. Both are designed for low-impact replication (and caching in the case of VAIO).
Both the VAIO and vSCSI I/O filters are used for replication, not backups. When used for non-quiesced (crash consistent) replications, there is no performance impact, since snapshots are not used.
If used with quiesced replication, some performance impact could be noticed because FS or App quiescing will take place inside the VM. In the case of the vSCSI filter, a snapshot is briefly used.
Both solutions “capture” the changes in the VM’s vDisk since the last replication and send them to the remote site within the timeframe of the defined RPO, as can be seen in the layered approach below.
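Whether a given RPO is realistic depends on how much data changes per replication cycle and how much bandwidth is available to send it. A minimal sketch of that arithmetic, using hypothetical numbers and ignoring compression, protocol overhead and the time spent on quiescing itself:

```python
def fits_in_rpo(changed_gib, link_mbit_per_s, rpo_minutes):
    """Return (fits, transfer_minutes) for one replication cycle."""
    if link_mbit_per_s <= 0:
        raise ValueError("link speed must be positive")
    # GiB -> MiB -> Mbit, using binary units throughout for simplicity.
    changed_mbit = changed_gib * 1024 * 8
    transfer_minutes = changed_mbit / link_mbit_per_s / 60
    return transfer_minutes <= rpo_minutes, transfer_minutes


# Hypothetical example: 5 GiB of changes, 100 Mbit/s link, 15 minute RPO.
fits, minutes = fits_in_rpo(5, 100, 15)
print(f"fits: {fits}, transfer takes about {minutes:.1f} minutes")
```

If the transfer time approaches the RPO, either the RPO or the available bandwidth needs to be reconsidered.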
Also check out the post on this topic on vmarena.com
VMware Tools come in different forms and are an essential component for quiesced backup and replication. Tools are available in a couple of packaging formats. Depending on the chosen packaging format, it can have a major impact on day 2 operations. Therefore the options are mentioned in this post, which hopefully helps in choosing the right format for your environment.
An equally important reason to install Tools is that the guest OS disk timeout is raised, which helps to prevent OS and application issues caused by latency spikes during storage incidents or maintenance.
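On a Linux guest, the effective disk timeout can be inspected through sysfs. The sketch below reads `/sys/block/<disk>/device/timeout` for every block device that exposes it. The 180 second threshold is an assumption based on the value Tools commonly sets on Linux guests, so verify it against the documentation for your Tools version. The function names are made up for illustration.

```python
from pathlib import Path


def disk_timeouts(sysfs_root="/sys/block"):
    """Map each block device name to its disk timeout in seconds."""
    timeouts = {}
    for dev in sorted(Path(sysfs_root).glob("*")):
        timeout_file = dev / "device" / "timeout"
        # Devices such as loop devices do not expose a timeout file.
        if timeout_file.is_file():
            timeouts[dev.name] = int(timeout_file.read_text().strip())
    return timeouts


def below_threshold(timeouts, minimum_s=180):
    """Return the devices whose timeout is lower than minimum_s (assumed value)."""
    return {name: t for name, t in timeouts.items() if t < minimum_s}
```

Calling `below_threshold(disk_timeouts())` inside the guest returns the disks that still run with a lower timeout, which is a quick hint that Tools may be missing or not working as expected.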
The Tools version used inside the VM does not really depend on the version of the hypervisor used, since Tools are version flexible (forwards and backwards). Good practice is to keep them up-to-date.
Beware that frequent usage of the Tools ISO can corrupt SD cards or USB sticks when these are used as the ESXi boot device. An advanced parameter can be set to load the Tools into RAM and prevent excessive wear on those.
Did you know, by the way, that Tools have been decoupled from ESXi and have been a separate product since November 2020? Tools now have their own lifecycle. Check out this post.
Let’s discuss the package and distribution options.
For Windows, using Tools is very common. The version used in the VM is matched to the version provided by the hypervisor. If an older version is used in the VM, a warning is shown in the system tray icon and in vCenter. The Tools version within ESXi is normally updated when the hypervisor is patched. For quiescing and guest interaction to work, install the Tools and keep them updated.
The Tools can be mounted and installed from the vSphere Client, but can also be downloaded from VMware and distributed separately by using your patch / update tool of choice.
The default way to upgrade Tools on a Windows VM installs them and automatically reboots the VM. If a reboot needs to be postponed until a maintenance window, Tools can be upgraded automatically without rebooting. Even the Tools components to be installed can be changed.
For Linux based systems, the use of Tools could give you a headache in the past, because of all the package formats. If possible do not use the Linux (ISO) tar based tools anymore. This will make day-2 operations so much more difficult and time consuming.
A second reason not to use the Linux tar based Tools and OSP packages anymore is the fact that they are feature frozen as of v10.3.5 (November 2018) and only receive critical and security fixes.
OSP packages are Tools for selected (legacy) guest operating systems. They are packaged for online distribution and can be updated using the distribution’s native tools (yum, zypper and apt) and package formats.
OSP packages can be used with:
- CentOS 4.0 through 6.x
- Red Hat Enterprise Linux (RHEL) 3.0 through 6.x
- SUSE Linux Enterprise Server (SLES) 9 through 11
- SUSE Linux Enterprise Desktop (SLED) 10 through 11
- Ubuntu Linux 8.04 through 12.04
open-vm-tools are available for RHEL 7, SLES 12, Ubuntu 14.04, CentOS 7, Debian 7, Oracle Enterprise Linux (OEL) 7, Fedora 19 and OpenSuSE 11 and later.
Just like the Windows version of the Tools, they are required for quiescing and guest interaction. The biggest advantage is that open-vm-tools can be installed and updated using the Linux distro’s default repositories.
When looking at the Compatibility Matrix for vSphere Replication, only Windows (App and FS) and RHEL 7, SLES 11 and 12 (FS only) are supported for quiescing. During my testing, FS quiescing did not even work on a Debian VM, while it did on an Ubuntu VM. My advice is to check compatibility before depending on it.
Impact for Service Providers
For service providers, it can be quite difficult to support quiesced backups and replications, since they often do not have control over customer VMs, or cannot even access them. As a result, Tools may be outdated or not installed at all. Applications running on customer VMs might also be unsupported or outdated.
Even if the above is mitigated, there can still be issues with unsupported OSes and application / VSS (compatibility) issues, which can be hard to troubleshoot.
While this is one of the more time consuming posts I wrote, it gave me good insight into the differences between the quiescing and consistency options available in vSphere and related backup / replication software, and especially into how they relate to the OS and apps they protect.
In my opinion, knowing these differences is essential when advising on and designing an (app) consistent solution for your organization or customers.
If this post helped you understand the matter, or if you have feedback, reach out.