This one has been a long time coming. I’ve been using Veeam Backup & Replication and Veeam ONE since 2014 and, in anger, since v8.0. I’ve designed, deployed, and managed Veeam solutions; however, I’ve never gotten around to obtaining any Veeam certifications.
If you’ve ever met me in person, I’ve likely told you how much I love the Veeam Availability Suite and how cool it is that Veeam keeps cramming in so many awesome features with every release. Despite so many great features, Veeam still manages to keep things simple, and I’m a fan of keeping things simple.
Yesterday, Tuesday 16th October, saw the much-anticipated release of VMware’s vSphere 6.7 Update 1. However, shortly after the announcement, a number of Veeam users decried the release due to compatibility issues with Veeam Backup & Replication. None other than Veeam’s Anton Gostev first announced the issue with the below tweet:
Looks like vSphere 6.7 Update 1 completely breaks backups, so please avoid updating until further notice. I must say I really miss those times when we didn’t even have to test vSphere updates, and literally supported them by default because they never broke anything – for years!
The very next day, the Veeam team announced a workaround in the form of Veeam KB2784, and confirmed that ‘out-of-the-box’ support will be included in the highly awaited (and much delayed) next release, Update 4.
vSphere 6.7 U1 compatibility issue has been researched, and the simple workaround is now available for use in test labs. Official out-of-the-box support for vSphere 6.7 U1 will be included in Update 4. See this Veeam forums topic for more details > https://t.co/BNgMNWDOmS
Where the fault lies with such release/compatibility issues is not the focus of this post (Twitter seems to be more focused on that). However, with a high number of pros likely raising internal changes to upgrade their vCenter(s) and ESXi hosts, you’ll want to implement the Veeam workaround in line with this upgrade, followed by a number of solid backup/restore tests.
In this article we’ll cover the simple process of applying the latest Veeam Backup & Replication 9.5 update, Update 3a (released July 2nd 2018). First, however, I’d like to cover what’s new in Update 3a, and why you might like to upgrade.
Update 3a brings support for a host of new VMware and Microsoft features, as well as a substantial number of enhancements. With VMware vSphere 6.5 U2 and 6.7 now well into GA, the release of Update 3a is something most of us have been craving in order to obtain that final green light to upgrade our vSphere environments. From the Veeam Release Notes for Veeam Backup & Replication 9.5 Update 3a the enhancements and newly supported features are detailed below.
To restore an Exchange mailbox/folder/item via the Veeam Explorer for Exchange, the account connecting to the Exchange server will require Full Access to the mailbox in question. To perform such a restore (without having to give your entire backup admin team Full Access to every mailbox in your estate), we will cover the process of granting Application Impersonation to your administrative staff. This procedure applies to both on-premises Exchange and Office 365 and can be easily implemented via a few simple Exchange Management Shell commands.
A recurring issue this one, and usually due to a failed backup. In my case, this was due to a Veeam Backup & Replication disk backup job which had, effectively, failed to remove its delta disks following a backup run. As a result, a number of virtual machines reported disk consolidation alerts and, due to the locked vmdks, I was unable to consolidate the snapshots or Storage vMotion the VM to a different datastore. A larger and slightly more pressing concern was that, due to the size and number of delta disks being held, the underlying datastore had blown its capacity, taking a number of VMs offline.
So, how do we a) identify the locked file, b) identify the source of the lock, and c) resolve the locked vmdks and consolidate the disks?
Disk consolidation required.
Manual attempts at consolidating snapshots fail with either DISKLOCKED errors…
…and/or ‘msg.fileio.lock’ errors.
Storage vMotion attempts fail, identifying the locked file.
Identify the Locked File
As a first step, we’ll need to check the hostd.log to try and identify what is happening during the above tasks. To do this, SSH to the ESXi host hosting the VM in question, and follow the hostd.log as it is written.
tail -f /var/log/hostd.log
While the log is being displayed, jump back to either the vSphere Client for Windows (C#) or vSphere Web Client and re-run a snapshot consolidation (Virtual Machine > Snapshot > Consolidate). Keep an eye on the hostd.log output while the snapshot consolidation task attempts to run, as any/all file lock errors will be displayed. In my instance, the file-lock error detailed in the Storage vMotion screenshot above is confirmed via the hostd.log output (below), and clearly shows the locked disk in question.
File lock errors, detailed via the hostd.log, should be fairly easy to identify, and will enable you to identify the locked vmdk.
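If the live tail scrolls past too quickly, filtering the log after the fact can help. The below is a minimal sketch rather than anything official: lock_lines is a helper name I’ve made up, and matching on the word ‘lock’ is an assumption about how the errors are worded in your hostd.log.

```shell
# lock_lines is a hypothetical helper, not a VMware tool.
# It prints every line of a log file that mentions "lock" (case-insensitive).
# With no argument it reads the hostd.log path used earlier in this post.
lock_lines() {
    grep -i 'lock' "${1:-/var/log/hostd.log}"
}
```

Running lock_lines | tail -n 20 on the host should show the most recent matches, which can then be checked for the vmdk path.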
Identify the Source of the Locked File
Next, we need to identify which ESXi host is holding the lock on the vmdk by using vmkfstools.
We are specifically interested in the ‘RO Owner’, which (in the below example) shows both the lock itself and the MAC address of the offending ESXi host (in this example, ending ‘f1:64:09’).
The MAC address shown in the above output can be used to identify the ESXi host via vSphere.
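For reference, the vmkfstools check described above can be sketched as follows. The vmdk path and the owner value are placeholders I’ve made up, and owner_to_mac is a hypothetical helper. It assumes the owner is reported as a UUID whose final 12 hex digits are the host’s MAC address, so do verify that against your own output.

```shell
# Dump lock metadata for a disk (run on an ESXi host; the path is a placeholder):
#   vmkfstools -D /vmfs/volumes/<datastore>/<vm>/<disk>.vmdk

# owner_to_mac is a hypothetical helper, not a VMware tool.
# Assumption: the owner UUID ends in the 12 hex digits of the host's MAC address.
owner_to_mac() {
    # Keep the text after the last hyphen, then insert a colon after each pair.
    echo "$1" | sed 's/.*-//' | sed 's/\(..\)/\1:/g; s/:$//'
}

# Example with a made-up owner value:
#   owner_to_mac 00000000-00000000-0000-000000f16409
# List the physical NICs and their MAC addresses on a host, to compare:
#   esxcfg-nics -l
```

The colon-separated form makes it easier to compare against the MAC addresses listed for each host’s network adapters.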
Resolve the Locked VMDKs and Consolidate the Disks
Now that the host has been identified, place it in Maintenance Mode and restart the Management Agent/host daemon service (hostd) via the below command.
/etc/init.d/hostd restart
Following a successful restart of the hostd service, re-run the snapshot consolidation. This should now complete without any further errors and, once complete, any underlying datastore capacity issues (such as in my case) should be cleared.
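The Maintenance Mode and restart steps can also be run from the same SSH session. This is a rough sketch only: run and restart_hostd are names I’ve made up, the DRY_RUN switch is my own addition for checking the commands before committing to them, and it assumes the host is able to enter Maintenance Mode (i.e. its VMs have been migrated or powered off).

```shell
# run and restart_hostd are hypothetical helpers, not VMware tools.
# Set DRY_RUN=1 to print each command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

restart_hostd() {
    # Enter Maintenance Mode first, and stop if that fails.
    run esxcli system maintenanceMode set --enable true || return 1
    # Restart the Management Agent, as per the command in this post.
    run /etc/init.d/hostd restart
}
```

DRY_RUN=1 restart_hostd prints the two commands without touching the host, while restart_hostd on its own runs them. Taking the host back out of Maintenance Mode afterwards is left as a manual step.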
For more information, an official VMware KB is available by clicking here.