There is a new KB article, 1018029, written by my teammate Ruben, explaining how to configure vCenter Server to send alarms when a VM is running from snapshots. Very handy. There is also a video on the KB that I'm including here.
  
 
So, my first post in 2010 is just to announce that VMware is going to acquire Zimbra.

VCP Exam

Things around here are quiet, but I've been very busy over the last week. Lots of work to do, and when I was chez moi I spent my precious time studying for the VCP Exam. Well, I passed, so I'm now a VCP4 guy. I found this exam a little more difficult than the VCP 3.5. If you're taking this exam, I recommend having a look at the official documentation first, even if you're a TestKing fan. Don't forget to verify the new maximums, because a lot of things changed with vSphere 4. Cheers.
vSphere came with a new feature called CBT, Changed Block Tracking. If you're using backup software like VMware Data Recovery or Veeam Backup, you'll notice a couple of disk-ctk.vmdk files lying in the VM directory. These files keep track of the blocks that changed on each virtual disk since a given point in time, which speeds up replication and incremental backups. There is a great article written by Eric Sibert with more information about CBT, how it works and how you can enable it.
NFS performance issues on ESX can be caused by many factors. I'm just starting with the most basic stuff for now. Basic but pretty common stuff: network or hardware problems. In this case, we're noticing performance degradation on one specific host over the last hours or days: VMs are taking several minutes to power on, there is general slowness and, in the worst case, the ESX host hangs and disconnects from VirtualCenter. Let's open an SSH connection to the ESX host to get access to the service console. First, try to ping the IP address of the NFS NAS with:

# vmkping -c 20 <ip_address_of_nfs_filer>

If you don't know the address of your NAS, you can get it by issuing:

# esxcfg-nas -l

It will give you a list of available volumes and NAS ip addresses.

Check for any discrepancy in the response times and for any packet loss. If you have packet loss, there is a problem with your network (bad hardware: switch, Ethernet card, NAS controller; logical: broadcast storm, bad configuration, NAS performance, etc). Even if you don't have packet loss, NFS performance degradation can be caused by a wrong configuration or a change on the Ethernet switch ports (the ones your ESX hosts are connected to). Watch for Gigabit cards configured as FastEthernet, duplex mismatches (Half-Duplex means trouble), fixed speeds vs auto-negotiation, etc.

Most of the time, you should be using a dedicated vSwitch for NFS traffic. If you have 2 nics linked to the vSwitch, a faulty one (100Mbit speed instead of 1000Mbit speed) will impact your performance. You may think that using 2 nics on a vSwitch (one in Standby) will give you protection. And it will, if something bad happens to the other nic, like a port going down. But a change in speed will not trigger a failover. If you're using Load Balancing, you'll also be affected, because one of the nics will be running at a fraction of the expected throughput. To know which nics are being used for NFS traffic, run:

# esxcfg-vswitch -l

You'll have a list for each vSwitch, associated vmkernel ports and nics used (Uplinks). You can check the nics status with:

# esxcfg-nics -l

Watch for trouble on the values for Link (Up/Down), Speed and Duplex. Have fun.
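
If you have many hosts to check, the same inspection can be scripted. What follows is only a minimal sketch in Python: it assumes the usual column layout of the esxcfg-nics -l listing (Name, PCI, Driver, Link, Speed, Duplex, ...) and works on a pasted sample whose nic names, drivers and MAC addresses are made up for illustration. The function name find_suspect_nics and the 1000Mbit threshold are my own choices, not something provided by ESX.

```python
import re

# Made-up sample in the assumed layout of `esxcfg-nics -l`.
SAMPLE_OUTPUT = """Name    PCI      Driver  Link Speed     Duplex MAC Address       MTU  Description
vmnic0  03:00.00 bnx2    Up   1000Mbps  Full   00:1e:c9:aa:bb:01 1500 Broadcom NetXtreme II
vmnic1  03:00.01 bnx2    Up   100Mbps   Half   00:1e:c9:aa:bb:02 1500 Broadcom NetXtreme II
vmnic2  05:00.00 e1000   Down 0Mbps     Half   00:1e:c9:aa:bb:03 1500 Intel 82571EB
"""


def find_suspect_nics(output, expected_mbps=1000):
    """Return (nic name, list of problems) for every nic that looks wrong."""
    suspects = []
    for line in output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # blank or truncated line
        name, link, speed, duplex = fields[0], fields[3], fields[4], fields[5]
        problems = []
        if link != "Up":
            # Speed and duplex are meaningless while the link is down.
            problems.append("link is " + link)
        else:
            match = re.match(r"(\d+)Mbps$", speed)
            mbps = int(match.group(1)) if match else 0
            if mbps < expected_mbps:
                problems.append("speed is " + speed)
            if duplex != "Full":
                problems.append("duplex is " + duplex)
        if problems:
            suspects.append((name, problems))
    return suspects


if __name__ == "__main__":
    for name, problems in find_suspect_nics(SAMPLE_OUTPUT):
        print(name + ": " + ", ".join(problems))
```

On a real host you would feed it the text captured from esxcfg-nics -l instead of the sample, and adjust the parsing if your build prints the columns differently.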
VDR (VMware Data Recovery) is new backup software from VMware that aims to simplify how you back up your VMs. It's fully integrated with VirtualCenter (you have a VDR plugin) and you also have to install the VDR VM appliance (based on Linux CentOS) that will access your vmdk files (as raw disks) so you can back them up to a local mount point, NFS or CIFS share. It can back up at file level and image level, in full or incremental mode (with deduplication).

Some of the common VDR error messages are not so simple, though. I stumbled on this one last week:

Failed to create snapshot for <servername>, error 3948 (vcb api exception)

The error will be the same, regardless of which VMs are being backed up. The VDR Plugin will give you this error but you have to check the VDR logs on the appliance VM. Those logs are located at /var/vmware/datarecovery. Look for a file called vcbAPI.log or a similar one with a number, like vcbAPI-x.log. Find the entries that match the time of the issue and you'll see something similar to this:

[2009-11-11 05:25:59.633 'blocklist' 2960120720 info] Creating snapshot
[2009-11-11 05:26:26.738 'blocklist' 2960120720 info] Snapshot created, ID: snapshot-1192
[2009-11-11 05:26:26.784 'vcbAPI' 2960120720 info] Establishing NFC connection to host esx1 on port 902, service vpxa-nfc
[2009-11-11 05:26:26.785 'vcbAPI' 2960120720 error] Exception in  VCBSnapshot_NumberOfFiles: Could not connect to ESX esx1: Host address lookup for server esx1 failed: Host name lookup failure


As we can see, it's a DNS issue. The VM appliance cannot resolve the hostname of the ESX host that is running the VM, so it cannot connect and gives us this error. Check the DNS settings of the appliance, point it to the same nameservers used by your VMware Infrastructure and the issue should be fixed. Happy backups.
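
If the log is long, a few lines of Python can pull out the hosts that failed to resolve. This is only a sketch: the regular expression is modelled on the two vcbAPI lines quoted above and may need adjusting for other builds, and the function name unresolved_hosts is my own.

```python
import re

# The two vcbAPI lines from the log excerpt above.
SAMPLE_LOG = """\
[2009-11-11 05:26:26.784 'vcbAPI' 2960120720 info] Establishing NFC connection to host esx1 on port 902, service vpxa-nfc
[2009-11-11 05:26:26.785 'vcbAPI' 2960120720 error] Exception in  VCBSnapshot_NumberOfFiles: Could not connect to ESX esx1: Host address lookup for server esx1 failed: Host name lookup failure
"""

# Matches error entries that report a failed host address lookup.
LOOKUP_FAILURE = re.compile(
    r"\[(?P<timestamp>[\d-]+ [\d:.]+) .* error\] "
    r".*Host address lookup for server (?P<host>\S+) failed"
)


def unresolved_hosts(log_text):
    """Map each hostname that failed to resolve to its first failure time."""
    hosts = {}
    for line in log_text.splitlines():
        match = LOOKUP_FAILURE.search(line)
        if match:
            hosts.setdefault(match.group("host"), match.group("timestamp"))
    return hosts


if __name__ == "__main__":
    for host, first_seen in unresolved_hosts(SAMPLE_LOG).items():
        print(host + " failed to resolve, first seen at " + first_seen)
```

Each name it reports can then be tested from the appliance itself, for example with Python's socket.gethostbyname, once the DNS settings have been corrected.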
And finally, vSphere 4.0 U1 is out. I haven't had the time to check what's new, but here are the links for the Release Notes:

ESX 4.0 Update 1
vCenter Server 4.0 Update 1

Don't do it. Really. What happens when you decide to resize a virtual disk (vmdk) that has several snapshots associated with it? For each virtual disk, we have 2 vmdk files. One is the descriptor file, with metadata/geometry information on the disk: lsilogic/buslogic, size, thick/thin, geometry, CID, etc. This file also points to the real disk, the flat.vmdk file. Usually, the vmdk files are named VMname.vmdk|VMname-flat.vmdk, or VMname_1.vmdk|VMname_1-flat.vmdk for the second disk, and so on. Snapshot files follow the same pattern. The only difference is that the descriptor file has a number in its name, VMname-000001.vmdk, and points to the real disk, VMname-000001-delta.vmdk. It also holds a reference to the CID of its parent (parentCID), the base disk VMname.vmdk.

So, when we resize the disk through the VI Client, the VMname-flat.vmdk (base disk) will grow to the new size and the VMname.vmdk will be modified with the new RW value (the disk size, expressed in sectors). The problem is that you still have a snapshot pointing to the base disk, but with the old size in its RW value. Trying to power on the VM will fail with the error:

The parent virtual disk has been modified since the child was created

The error says it all. When a disk is running on snapshots, the .vmx file points to the delta.vmdk that is recording changes, and that file points back to the base disk. Since the geometry is not the same on both sides, there is no way of powering on that VM. So, what is the fix? As described in KB 1646892, you can edit the VMname.vmdk descriptor file and change the RW value back to the old one (the same value that is in the snapshot descriptor file). You can then get rid of the snapshots or clone the disk.
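
Before editing anything, it helps to compare the RW values on both sides. The Python sketch below does that on two pasted descriptors. The descriptor contents, CIDs and sizes are illustrative values I made up, the helper names are my own, and the conversion assumes the RW value counts 512-byte sectors.

```python
import re

# Illustrative base disk descriptor: already resized to 20 GB.
BASE_DESCRIPTOR = '''\
# Disk DescriptorFile
version=1
CID=a1b2c3d4
parentCID=ffffffff
createType="vmfs"

# Extent description
RW 41943040 VMFS "VMname-flat.vmdk"
'''

# Illustrative snapshot descriptor: still carries the old 10 GB size.
SNAPSHOT_DESCRIPTOR = '''\
# Disk DescriptorFile
version=1
CID=e5f6a7b8
parentCID=a1b2c3d4
createType="vmfsSparse"
parentFileNameHint="VMname.vmdk"

# Extent description
RW 20971520 VMFSSPARSE "VMname-000001-delta.vmdk"
'''

# An extent line looks like: RW <sectors> <type> "<file name>"
EXTENT_LINE = re.compile(r'^RW\s+(\d+)\s+(\S+)\s+"([^"]+)"', re.MULTILINE)


def extent_sectors(descriptor_text):
    """Return the size in sectors from the first RW extent line."""
    match = EXTENT_LINE.search(descriptor_text)
    if match is None:
        raise ValueError("no RW extent line found")
    return int(match.group(1))


def sizes_match(base_text, snapshot_text):
    """True when base and snapshot descriptors agree on the disk size."""
    return extent_sectors(base_text) == extent_sectors(snapshot_text)


def sectors_to_gib(sectors):
    """Convert a count of 512-byte sectors to GiB."""
    return sectors * 512 / 1024 ** 3


if __name__ == "__main__":
    base = extent_sectors(BASE_DESCRIPTOR)
    snap = extent_sectors(SNAPSHOT_DESCRIPTOR)
    print("base: %d sectors (%.1f GiB)" % (base, sectors_to_gib(base)))
    print("snapshot: %d sectors (%.1f GiB)" % (snap, sectors_to_gib(snap)))
    if not sizes_match(BASE_DESCRIPTOR, SNAPSHOT_DESCRIPTOR):
        print("Mismatch: the base disk was resized while a snapshot existed")
```

In this sample the snapshot still holds the original value, which is the number you would restore in the base descriptor. Always work on copies of the descriptor files.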

But there is a catch. I had a case where the disk had been mounted as a secondary disk on a Windows VM in order to resize the partition (growing it into the newly available space), which is a common way of resizing a system disk. Changing the RW value back to the original one would cause Windows to BSOD when powering on the VM. Windows already knew about the new size of the disk, but the hardware was presenting a disk with the old size. The only solution was to point directly to the base disk (bypassing the snapshot file and losing all the data written since the snapshot was created).

So, be very careful and check first if you have any snapshots on a specific disk you want to resize. Delete the snapshots first and resize later.
Last week I had a case related to using VCB to back up a VM participating in an MSCS cluster. A customer wanted to selectively back up just the system disk of the VM, leaving the other disks untouched. The vcbMounter command would return an error message saying:

Virtual Machine is configured to use a device that prevents the snapshot operation: Device "is a SCSI Controller engaged in bus-sharing"

The cluster was configured across boxes (VMs on two different ESX hosts) and RDM disks were in use in physical compatibility mode. Physical RDMs, like disks in independent/persistent mode, can't be snapshotted. We all know that, and that's why we reconfigure some disks to be independent/persistent, so they are bypassed by VCB and only the system disk is snapshotted and backed up. But when using MSCS, a disk has to be accessed by 2 VMs and you need a SCSI Controller with Bus Sharing enabled (Physical or Virtual). That's why snapshots are not supported on MSCS (if you right-click on the VM through VC you should notice that the Snapshot option is greyed out, KB 1006392). And that's easy to understand. When you snapshot a VM, we change the VM configuration (vmx) to point to a new COW disk (delta vmdk file). The other VM in the cluster doesn't know about that and would still be pointing to the original disk (flat vmdk file). Major pain.

You can selectively back up a VM disk using vcbVmName, vcbSnapshot and vcbExport, but when you use vcbSnapshot, you must be able to snapshot all the disks on the VM before choosing the vmdk you want to export on the VCB Proxy Server.

So, what's the choice? You can still back up the VM over the network, using an agent inside the guest OS (with your backup software of choice).

It begins

To start, let me say something about myself. My name is Frederico Marques, a tech guy with a passion for life, my family, good food and Portuguese wine. I've worked in the Telco/ISP industry in Portugal for the last 9 years, at companies like FCCN, Comnexo, Oni and Novis (Optimus/Sonaecom). Later I worked as an independent consultant and ended up joining serviSMART. Systems Administrator, Unix/Linux Engineer, Datacenter Geek, Security Consultant, Wireless Entrepreneur, I've done it all. My current focus is storage, but I don't fit into any single area of interest. I'm interested in virtualization and infrastructure consolidation, cluster filesystems, backup and storage, scaling the web, IP networks, internet systems and datacenter/disaster recovery planning.

Since June this year I've been working as a TSE (Technical Support Engineer) for VMware in Cork, Ireland. I'm a VCP VI3 and I'm going to apply for VCP4 next month. I have a need to document the stuff that I learn and work on every day, so why not share it on this blog instead of using a private notepad stored in some shady folder? Send your SPAM to frederico@marques.cx. I can also be found on LinkedIn, Twitter and my personal blog.

Disclaimer: The views expressed herein are my own and do not necessarily reflect the views of VMware.


Disclaimer

The views expressed herein are my own and do not necessarily reflect the views of VMware. This is not a blog from VMware.
Creative Commons License
This blog is licensed under a Creative Commons License.