Nigel Boulton's Blog
30 Jun 2015

The Circle of Doom with vRA using an External vRO Server

I recently had reason to reconfigure a VMware vRealize Automation (vRA) 6.2.2 appliance that was using the default (i.e. inbuilt) vRealize Orchestrator (vRO) 6 instance to use an external one instead, and came across an interesting problem that I felt was worth blogging.

First, I reconfigured vRA to use the external Orchestrator server under Administration – Orchestrator Configuration – Server Configuration as shown below, and tested connectivity successfully:

[Screenshot: successful connection test to the vRO server]

The mysterious symptom I then observed was the busy cursor (commonly known as 'The Circle of Doom') spinning constantly and indefinitely when trying to add an advanced services endpoint in vRA. As you may be aware, changing to an external vRO server or vice-versa deletes any existing advanced services endpoints, so adding them again is a necessary step.

When adding the endpoint, selecting the plug-in was fine:

[Screenshot: adding the AD plug-in]

But on moving to the details tab, the dreaded circle appeared:

[Screenshot: the busy cursor on the details tab]

I also noticed that, when attempting to create a new Service Blueprint, there was no vRO workflow tree displayed in the vRA UI:

[Screenshot: no workflow tree displayed]

The problem turned out to be one of authorisation: the account I had specified for authentication under Server Configuration didn't actually have rights to log on to the vRO server. Because the connectivity test in the first step had reported success, I assumed that all was OK, but a successful test clearly can't be treated as an indication that everything is fine between vRA and the vRO server.

After rectifying the underlying problem, I was then able to add advanced services endpoints, and the vRO workflow tree was populated correctly:

[Screenshot: workflow tree populated]
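
One way to check the account independently of the vRA connectivity test is to call the vRO REST API directly with the same credentials, for example from any machine that has curl. The following is only a minimal sketch, not something taken from VMware's documentation. It assumes that the vRO REST API is listening on port 8281 under /vco/api, and that the server uses a self-signed certificate (hence --insecure). The function names, host name and account are hypothetical.

```shell
#!/bin/sh
# Sketch only: the port (8281), the /vco/api path and all names below are
# assumptions to adjust for your own environment.

# Turn an HTTP status code into a human-readable verdict.
describe_status() {
    case "$1" in
        200)     echo "OK: the account can log on to vRO" ;;
        401|403) echo "FAIL: vRO rejected the account (HTTP $1)" ;;
        *)       echo "UNEXPECTED: HTTP $1" ;;
    esac
}

# Request the workflow list as the given account and report the outcome.
# $1 = vRO host name, $2 = user name (curl prompts for the password).
check_vro_login() {
    status=$(curl --silent --insecure --output /dev/null \
                  --write-out '%{http_code}' \
                  --user "$2" "https://$1:8281/vco/api/workflows")
    describe_status "$status"
}

# Example (hypothetical names):
#   check_vro_login vro.domain.com 'svc-vra@domain.com'
```

Anything other than the OK line suggests that the account needs looking at on the vRO side before adding endpoints in vRA.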

I hope this information is helpful to someone who may be troubleshooting a similar problem.

24 Apr 2015

Disabling Password Expiration for the vCenter 6 Appliance Root Password

I am working on a vSphere 6 design for a customer, and they have requested that password expiration for the vCenter Appliance root password be disabled. By default this password expires after 365 days, so it could be easy to forget to change it before it does (it is possible to configure the appliance to notify you by email of impending password expiry, but for that to work you must remember to specify an email address for the root account first).

The vSphere 6 documentation details the command that you need to use to disable password expiration for the root user, but there are some prerequisite steps that aren't covered in the same place, so I thought I'd document the whole process here.

  1. First, decide whether you want to access the vCenter Server Appliance (vCSA) using SSH, or its Direct Console User Interface (DCUI). If you want to use SSH, you will need to enable SSH access. If this wasn't enabled in the deployment wizard when the appliance was deployed, use the vCSA DCUI console to enable it (F2 – Customize System - Troubleshooting Mode Options) or the Web Client under Home - System Configuration - Nodes – (select node) - Manage - Settings – Access
  2. Use your favourite SSH client to connect to the vCSA and log on as root, or enable the shell using Alt-F1 at the vCSA DCUI console and do the same
  3. Enable bash shell access. To do this, issue the "shell.set --enabled True" command to the appliance shell. Note that the default timeout for this enablement is 1 hour, after which the bash shell is automatically disabled. You can use the "shell.get" command at the appliance shell Command> prompt to show the time remaining (you can also check it in the Web Client). As an alternative, you can enable the bash shell under Troubleshooting Mode Options in the DCUI if you prefer
  4. Start a bash shell using the "shell" command
  5. Run "chage -l root". This will display the current settings for the root user:

    [Screenshot: output of "chage -l root" before the change]

  6. Run "chage -M -1 root" to disable password expiration (i.e. set 'Maximum' to -1) for root
  7. Optionally, run "chage -l root" again to verify the change

    [Screenshot: output of "chage -l root" after the change]

  8. "exit" from the bash shell
  9. Finally, "exit" from the appliance shell
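
For reference, steps 5 to 7 above can be wrapped in a small function and run from the bash shell started in step 4. This is only a sketch. The function name and the CHAGE dry-run override are additions for illustration and are not part of the appliance.

```shell
#!/bin/bash
# Sketch only. Set CHAGE="echo chage" for a dry run that prints the commands
# instead of running them (the override is an illustrative addition).
CHAGE=${CHAGE:-chage}

disable_password_expiry() {
    user=${1:-root}
    $CHAGE -l "$user"       # step 5: show the current ageing settings
    $CHAGE -M -1 "$user"    # step 6: set the maximum password age to -1 (no expiry)
    $CHAGE -l "$user"       # step 7: confirm the change
}

# Example: disable_password_expiry root
```

Running it with CHAGE="echo chage" first is a cheap way to see exactly what would be executed before touching the root account.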

That's it!

28 Nov 2014

UK VMware User Group (VMUG) November 2014

I was lucky enough to be able to attend the fourth annual UK VMware User Group (VMUG) meeting at the National Motorcycle Museum in Birmingham last week. I have for quite some time been a regular attendee at the London VMUGs, but have only been to one UK meeting before - that was in fact the first one ever, so one of the first things I noticed was how much the event has grown over the last three years.

VMUGs are a great opportunity to meet like-minded professionals, customers and vendors, and most importantly gain and share information about VMware and partner products and solutions and participate in the fantastic community that surrounds VMware virtualisation worldwide.

Unfortunately I wasn’t able to attend the ‘vCurry’ event and quiz the previous evening but the day started with an introduction from the always entertaining Alaric Davies, which included an update on the previous evening’s festivities, so I was soon up to speed with those!

Helpfully, a number of the sessions were recorded on video and the VMUG Committee have kindly uploaded these recordings to YouTube - the playlist can be found here. Slides from many of the sessions and links to the individual videos on YouTube have also been uploaded to the London VMUG workspace on Box.com here.

Joe Baguley, CTO of VMware EMEA, gave the opening keynote, which was entertaining, as Joe’s sessions always are. Joe titled this session ‘CTO Rant-as-a-Service’. It wasn’t so much of a rant, but it was a great view of what's going on in the industry from VMware’s point of view. One of the key themes was how much shorter the traditional IT refresh cycles are going to become, and how the whole Software Defined Enterprise / SDDC concept supports that. He also talked about some of the exciting announcements that VMware made at VMworld, such as EVO:RAIL and EVO:RACK. I noted that Joe also commented that the VMUG is the best community event that he is involved with, which reinforces my views on the significance and importance of VMUGs within the ecosystem. The video recording of this session can be found here.

The next session I attended was Julian Wood’s ‘The Unofficial Low Down on Everything Announced at VMworld’. I wasn’t able to attend VMworld this year, so I thought this would be a great opportunity for me to get an overview of pretty much all the new products and improvements that were announced there. I was right - Julian put together and presented an excellent, information-packed session. To be honest, I was struggling to make coherent notes without missing anything as the information was coming thick and fast, but fortunately Julian's comprehensive slides (each of which includes supporting links) are available here, and the video recording here.

The next session I selected was ‘What's Coming for vSphere in Future Releases’ presented by VMware’s Chief Technologist Duncan Epping. Duncan expanded on a number of the products that Joe had mentioned in his keynote and also detailed some exciting improvements to existing products and features we all know and love! You can find the slides for this session here and watch the video here.

The first session I attended after lunch was presented by my good friend (and colleague again) Jonathan Medd, and was entitled ‘Designing Real-World vCO Workflows for vRealize Automation Center (vCAC)’. This session was one of the reasons I really wanted to attend the UK VMUG this year – Jonathan is an expert in his field whose sessions always draw a good crowd. I am lucky enough to have worked with him personally for a number of years on and off, and every time I speak to him I learn something new, so I was expecting a good session. I am going to be involved in vCAC and vCO at Xtravirt because of my interest and skills in scripting and automation, so I was quite excited about hearing tips from someone who has most definitely ‘been there and done that’. And I wasn’t disappointed..!

The space available in the mezzanine section for this session was overcommitted by at least 100%, and people were crowding round the table two rows deep in places, which to my mind demonstrates the community interest in automation based around vCAC and vCO. Jonathan ran this as an interactive session and got everybody to think about important aspects of designing an automation process that need to be considered at an early stage, and we discussed within the group the pros and cons of many of the possible approaches. As someone who is just getting up to speed with these products, I found it (as I expected to) an incredibly interesting and informative session – thanks Jonathan!

The next session I selected was ‘vSphere Availability Updates and Tech Preview’ by Lee Dilworth, Principal Systems Engineer at VMware. This was a great opportunity to brush up on the significant number of improvements that VMware have made, and continue to make, in this area. The slides for this session have been uploaded here and you can watch the video here.

After this, I went along to a partner session, ‘Re-thinking Storage by Virtualizing Flash and RAM’ by Frank Denneman, who is Chief Evangelist at PernixData. PernixData are doing some exciting things with their ‘FVP Cluster’ technology which allows any VM to remotely access flash RAM on any other vSphere host, enabling fault tolerant storage write acceleration, with pretty impressive results. FVP supports all VM operations with no impact on performance, so features such as vMotion, DRS, HA, snapshots, VDP and SRM continue to operate transparently. Nice!

The final session of the day was a hugely entertaining closing keynote by Chris Wahl, a double VCDX, prolific blogger, author and vExpert from Chicago who describes himself as a ‘Virtualization Whisperer’! This session was entitled ‘Stop Being a Minesweeper’, and in it Chris talked us through his journey into automation and included a number of good resources to help people begin the learning process - these are in the slides here. You can watch the video of the session here - I highly recommend that you do!

So all in all, a great day, and thanks go to the London & UK VMUG Committee who once again did a fantastic job of organising the event – primarily Jane Rimmer, Alaric Davies, Simon Gallagher and Stuart Thompson, but of course also the wider VMUG organisation.

Jane has written a great review of the event on her blog here. The Committee are running a competition for new community speakers, known as ‘V-Factor’! Entrants will have the opportunity to give a 10-minute lightning talk at the London VMUG on 22 Jan 2015 and could win one of a number of great prizes. You can find more information here if you are interested in entering.

The next London VMUG is 22 Jan 2015 and the next UK event is provisionally scheduled for 19 November 2015. Both will be great events, so put the dates in your diary now!

Filed under: VMware
9 Nov 2013

VMworld PowerCLI Group Discussion

Whilst at VMworld in Barcelona last month, my old friend Alan Renouf asked me to help him out with one of his PowerCLI and Automation sessions. For those who don't know him, Alan is the Automation Frameworks Product Manager at VMware, a total PowerCLI guru and a co-author of the definitive PowerCLI book 'VMware vSphere PowerCLI Reference: Automating vSphere Administration'.

The session was a lively group discussion and the audience comprised a great mix of people ranging from PowerCLI beginners right through to experts. It was my job as Alan's 'beautiful assistant' [don't know about that!] to capture the useful information flying around the room on a flip chart. With impressive use of his deciphering skills, Alan has written up the result on his personal blog in a series of four excellent posts. There is some good stuff in there for PowerCLI scripters at all stages in the learning process. You can find these posts via the links below – thanks Al!

VMworld PowerCLI Group Discussion–Part 1–Getting Started

VMworld PowerCLI Group Discussion–Part 2–Resources

VMworld PowerCLI Group Discussion–Part 3–Launching and Using

VMworld PowerCLI Group Discussion–Part 4–Advanced tools and scripting

Filed under: PowerCLI, VMware
30 May 2012

Configuring Syslog for all your vSphere Hosts using PowerCLI

If you have a need to configure remote Syslog logging for all (or perhaps a subset) of your vSphere hosts at the same time, PowerCLI can help!

Let's assume that your remote Syslog server has the IP address 192.168.1.10. Using the following PowerCLI one-liner, you can configure all hosts managed by a particular vCenter to send Syslog data to this server:

Get-VMHost | Set-VMHostSysLogServer -SysLogServer 192.168.1.10 -SysLogServerPort 514


To configure only the hosts in a particular cluster to do this, you would use:

Get-Cluster 'Cluster Name' | Get-VMHost | Set-VMHostSysLogServer -SysLogServer 192.168.1.10 -SysLogServerPort 514


The above assumes that you are using the vSphere PowerCLI console, and have already connected to the appropriate vCenter server like so:

Connect-VIServer -Server vcenter.domain.com

Easy! Automation really is great…

19 Oct 2011

Virtual Machine search in vSphere Client does not return expected results

I was recently responsible for troubleshooting a problem where searching for certain virtual machines in the vSphere Client didn't always return the expected results. The problem was occurring in both our production and non-production environments and the symptom was as described below:

In the vSphere Client, in Hosts and Clusters view, selecting (for example) a Datacenter in the tree in the left pane, then typing the name of an existing VM into the "Name, State or Guest OS contains:" box on the Virtual Machines tab wouldn't always return the VM in the search results. In some cases the VM search would behave in this way when targeted at the Datacenter level, in others at the vCenter level and in others still, at the cluster level. The "Search Inventory" box exhibited the same behaviour. It was, however, possible to target the search at the host on which the VM resided and have it returned consistently in the search results. Similar behaviour occurred when searching for VMs in VMs and Templates view, and in all cases the VM in question continued to be displayed in the tree in the left pane of the vSphere Client.

After some searching online, I decided to raise a call with VMware Support. The Engineer who called me back immediately knew the cause of the problem, and directed me to a VMware Knowledge Base article:

Sort sequence is incorrect and sorting/scrolling in the Virtual Machines tab in vCenter Server is slow (1029665)

Neither of the symptoms in the article's title was present in our environments, but the underlying cause was the same – it was, as the article says, "due to a conservative Java Memory Pool setting on which the Tomcat service depends for various functions. This issue usually occurs when the number of virtual machines is more than 500, but is dependent on a number of factors in your environment." In our environments at the time we had 850 and 300 VMs respectively.

If you have a 64-bit vCenter (4.x) Server, increasing the value of the memory pool in the Java Memory Pool settings is an easy fix:

  1. On the vCenter Server, click Start > All Programs > VMware > VMware Tomcat > Configure Tomcat
  2. Click the Java tab
  3. Double the numbers in the Initial and Maximum memory pool fields (defaults are 256 and 1024 MB respectively)
  4. Click OK
  5. Verify that there are no tasks running in the environment
  6. Restart the VirtualCenter Server service – this will also restart the VirtualCenter Management Webservices service, as the latter is dependent. Bear in mind that anybody running the vSphere Client will be logged off when the services restart

If you have a 32-bit vCenter Server, follow the instructions under "Additional Information" in the article.

30 Jun 2011

Black Console and 100% CPU after restoring a Windows 2003 Virtual Machine

I was recently involved in a Disaster Recovery rehearsal. The idea behind this was to prove that we could recover our key systems at another site should a disaster occur. We came across an interesting issue which I thought I would blog about in case it is of help to anyone who may also encounter it in a similar situation. Let's face it, in a disaster recovery scenario, you need as few difficult issues to deal with as possible..!

This issue is only really likely to affect virtual servers (provided, that is, you are using identical hardware to recover your physical ones).

We were using IBM Tivoli Storage Manager (TSM) to restore C: drive and System State backups of virtual servers (taken in the live environment) into a separate isolated network.

The process involved creating a new VM (typically from a template), with the same virtual hardware version, number of vCPUs, amount of memory, disk layout and virtual NIC type as the live server. This VM would have the same operating system version, edition, architecture (x86/amd64) and service pack installed as the live server, plus the TSM client to facilitate the restore.

On restoring the first Windows 2003 server in this manner, the server wouldn't boot. The VM console displayed a black screen (no error message) and the VM CPU usage immediately spiked to 100% and stayed there. This happened immediately after power on, so it was not possible to get the VM to respond to the F8 key in an attempt to put it into safe mode, to assist with troubleshooting.

It really looked like a hardware incompatibility, but I'd been very careful to make sure the necessary parameters matched, so was a bit mystified. After some head scratching and time spent comparing the hardware that Windows thought was present in the template and live VM (good job it wasn't a real disaster!), I spotted that the Hardware Abstraction Layer (HAL) didn't match between the two (the HAL can be checked via Device Manager – Computer). The template VM had an ACPI Uniprocessor HAL (which I expected as it had been built with only one vCPU), but the live VM had an ACPI Multiprocessor HAL. I was pretty sure at this point that this would be the cause of the issue.

Like the template VM, the live VM also had one vCPU, so why did it have a multiprocessor HAL? The key difference between the two VMs was how they had been created. The live VM had originally been created by P2V'ing a physical server. This physical server would no doubt have had multiple processors and hence when Windows was originally installed, had been given a multiprocessor HAL. This didn't change on P2V, but the person who did this elected for the VM to only have one vCPU - quite understandably as it was a relatively lightly loaded server. So it was running a single vCPU with a multiprocessor HAL (which is clearly a valid configuration).

The problem was introduced by the restore process. I assume that some aspect of the restore didn't replace something in the template, and part of the template VM's uniprocessor HAL was still operational after the restore and reboot – or not operational in fact!

The supported/correct way of setting a multiprocessor HAL would be to install the OS from scratch on a VM with more than one vCPU. However, that would have been time consuming for the number of variations of servers that we had to restore, and the time available didn't allow for that.

So how did I rectify this? Well, a few years ago I ran into a (different) issue attempting to give a single vCPU VM an additional processor, and in the process of troubleshooting that, came across this post on ngohq.com by Squall Leonhart. This describes how to change the Windows HAL without reinstalling. Note that this approach is, obviously, totally unsupported!

The method involved using DevCon, which is basically a command-line version of Device Manager. DevCon can be downloaded from Microsoft here.

By running the following commands within the template VM, prior to the restore, I was able to update the HAL to an ACPI Multiprocessor one:

devcon sethwid @ROOT\ACPI_HAL\0000 := +acpiapic_mp !acpiapic_up
devcon update c:\windows\inf\hal.inf acpiapic_mp

Squall recommends rebooting twice after doing this, to ensure that the device and IRQ tables get updated correctly.

After performing the steps above, a subsequent TSM restore was successful, and the server booted with no further problems. Result!

I have reproduced Squall's entire post below as this is such useful information which could be lost should the ngohq.com forum cease to exist for any reason – which would be a massive shame. It includes information on how to go to and from various HALs. Thanks for this incredibly helpful information Squall!

Heres some tips for upgraders!

You require the Devcon utility for this, unpack it to a folder, then navigate to the folder its in using Command prompt (command prompt on context menu PowerToy is handy for this)

How to enable APIC without repair installing windows
in device manager you will notice that under computer type it says Advanced Power and Control Interface PC.. this is a standard single processor HAL driver without APIC. to upgrade to the APIC driver you input the following:

devcon sethwid @ROOT\ACPI_HAL\0000 := +acpiapic_up !acpipic_up
devcon update c:\windows\inf\hal.inf acpiapic_up

after this, enable APIC in the bios if you haven't already, and reboot twice so windows can update the device and irq tables, it should now say ACPI Uniprocessor PC in the device manager

How to go back to PIC
if you wish to go back to PIC from APIC enter this:

devcon sethwid @ROOT\ACPI_HAL\0000 := +acpipic_up !acpiapic_up
devcon update c:\windows\inf\hal.inf acpipic_up

and reboot twice to update the device and IRQ tables, and then disable APIC in the bios (the reason is, if you disable APIC before the device and irq tables update, windows will crash at startup).

How to Update from a Single Core APIC compatible cpu to a Multicore APIC compatible cpu

under the computer entry in the device manager, you will see it says ACPI Uniprocessor PC, to update to the multiprocessor HAL input this:

devcon sethwid @ROOT\ACPI_HAL\0000 := +acpiapic_mp !acpiapic_up
devcon update c:\windows\inf\hal.inf acpiapic_mp

Then reboot twice again to update the device and IRQ tables.

How to go back to Single Core (should it be needed)
if you accidentally burn your processor and have to go back to a single core backup, you input this into the devcon:

devcon sethwid @ROOT\ACPI_HAL\0000 := +acpiapic_up !acpiapic_mp
devcon update c:\windows\inf\hal.inf acpiapic_up

and always reboot twice.

Filed under: VMware, Windows
16 Dec 2010

Taking Snapshots of all Virtual Machines using PowerCLI

I recently needed to apply a limited distribution patch to a number of Citrix servers, all of which are virtual on VMware ESXi 4.0. I wanted to take snapshots before doing this, to give me an easy backout route if things went horribly wrong. Of course I could always have done this using the VI Client, but that would have meant an awful lot of "mousing about" and clicking to be able to do this for 176 virtual machines.

With PowerCLI this is a cinch, in fact it's pretty much a one-liner! I chose to do this one host at a time, but with a small change to the code below you can easily expand this to encompass a larger chunk, or even all, of your virtual infrastructure.

First, connect to your vCenter Server (and provide the appropriate credentials when prompted):

Connect-VIServer -Server viserver.domain.com

Then run the following one-liner to take a snapshot of all VMs on a given host:

Get-VMHost vmhost.domain.com | Get-VM | New-Snapshot -Name "Pre patch" -Quiesce

In this case I chose to quiesce the file system first. Other options are available - see the help for the New-Snapshot cmdlet.

Once you have finished with the snapshots, delete them as follows:

Get-VMHost vmhost.domain.com | Get-VM | Get-Snapshot -Name "Pre patch" | Remove-Snapshot

And finally, disconnect from your vCenter Server:

Disconnect-VIServer -Server viserver.domain.com

How easy is that..?!

My good friends Alan Renouf and Jonathan Medd talk about how useful PowerCLI is for automating repetitive tasks in Episode 20 of the Get-Scripting Podcast, and this is a perfect example of that.

On the subject of the Get-Scripting Podcast, do be sure to check out Episode 20 - the guys interview none other than Jeffrey Snover, Lead Architect for Windows Server at Microsoft, and the man behind Windows PowerShell itself - excellent!

Filed under: PowerCLI, VMware