Upgrading VCSA from 6.0u3 to 6.5u1

Issues

Issue #1

I had an issue with Enhanced Linked Mode (ELM) where our NA and UK vCenters could see each other in the web client, but SA was orphaned. Everything else worked: I could see tasks happening in the task pane, NetApp VSC was functional, and so on. After many hours of fighting and sifting through the logs, even after reverting to the snapshots and redeploying (because, you know, maybe I messed up), the issue persisted. Due to time constraints, I decided that before reverting again I would contact VMware. That took several more hours with enterprise senior support, and eventually the below was found:

 18-03-11T12:05:33.093363+00:00 err vmdird  t@140174730172160: Bind Request Failed (ip of na vcenter) error 49: Protocol version: 3, Bind DN: "cn=fqdn-na.domain.local ou=Domain Controllers,dc=vsphere,dc=local", Method: SASL
 18-03-11T12:05:41.621318+00:00 err vmdird  t@140174730172160: Bind Request Failed (ip of uk vcenter) error 49: Protocol version: 3, Bind DN: "cn=fqdn-uk.domain.local ou=Domain Controllers,dc=vsphere,dc=local", Method: SASL
 18-03-11T12:05:42.058106+00:00 err vmdird  t@140174730172160: Bind Request Failed (ip of uk vcenter) error 49: Protocol version: 3, Bind DN: "cn=fqdn-uk.domain.local ou=Domain Controllers,dc=vsphere,dc=local", Method: SASL
 18-03-11T12:06:09.119862+00:00 err vmdird  t@140174746957568: Bind Request Failed (ip of na vcenter) error 49: Protocol version: 3, Bind DN: "cn=fqdn-na.domain.local ou=Domain Controllers,dc=vsphere,dc=local", Method: SASL

According to the engineer, this means the machine account passwords were not being accepted. Apparently this happens when the domain join gets interrupted, which possibly happened at some point during the upgrade process. The question is: why did it happen twice?
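Before resetting anything, it is worth confirming what each PSC thinks its replication partners are. This was a check I added myself rather than part of the engineer's steps; 'sso-admin-password' below is a placeholder for the administrator@vsphere.local password:

/usr/lib/vmware-vmdir/bin/vdcrepadmin -f showpartners -h localhost -u administrator -w 'sso-admin-password'
/usr/lib/vmware-vmdir/bin/vdcrepadmin -f showpartnerstatus -h localhost -u administrator -w 'sso-admin-password'

A partner that shows as unavailable, or a change number that never catches up, lines up with the bind failures above.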

The fix was:

On each PSC I had to run:

service-control --stop VMwareSTS
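To double-check the service actually stopped on each node before going further, a plain status query works (just a sanity check on my part, not part of the fix itself):

service-control --status

The STS service should show up under the stopped services on every PSC before you continue.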

I then had to log on to the orphaned PSC (which I'll call SA-PSCP1 here) and reset the machine account passwords for NA-PSCP1 and UK-PSCP1 using the following commands:

root@sa-pscp1 [ ~ ]# /usr/lib/vmware-vmdir/bin/vdcadmintool


 
 ==================
 Please select:
 0. exit
 1. Test LDAP connectivity
 2. Force start replication cycle
 3. Reset account password
 4. Set log level and mask
 5. Set vmdir state
 6. Get vmdir state
 7. Get vmdir log level and mask
 ==================
 
3
   Please enter account UPN : uk-pscp1.domain.local@vsphere.local
 New password is -
 somesupersecretpassword-uk
 
3
   Please enter account UPN : na-pscp1.domain.local@vsphere.local
 New password is -
 somesupersecretpassword-na
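If you are unsure which UPN corresponds to which node, the machine account each PSC uses is stored in the same Likewise registry key that gets edited below. Listing it (again, an extra check on my part) shows the account:

/opt/likewise/bin/lwregshell list_values '[HKEY_THIS_MACHINE\services\vmdir]'

The dcAccountDN value on uk-pscp1 should read something like cn=uk-pscp1.domain.local,ou=Domain Controllers,dc=vsphere,dc=local, which maps to the UPN entered in vdcadmintool above.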

Then I needed to log on to uk-pscp1 and na-pscp1 and run the following:

root@uk-pscp1 [ ~ ]# /opt/likewise/bin/lwregshell
 
 \> cd HKEY_THIS_MACHINE\services\vmdir
 
 HKEY_THIS_MACHINE\services\vmdir> set_value dcAccountPassword "somesupersecretpassword-uk"
 
 HKEY_THIS_MACHINE\services\vmdir> quit
root@na-pscp1 [ ~ ]# /opt/likewise/bin/lwregshell
 
 \> cd HKEY_THIS_MACHINE\services\vmdir
 
 HKEY_THIS_MACHINE\services\vmdir> set_value dcAccountPassword "somesupersecretpassword-na"
 
 HKEY_THIS_MACHINE\services\vmdir> quit

Then I logged back in to sa-pscp1 and performed the same process in reverse to reset the machine account password for the SA node.

Finally, all PSCs were rebooted.
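For what it's worth, a full reboot may not be strictly required; my assumption is that stopping and starting all services on each PSC would achieve the same thing, although I took the reboot route myself:

service-control --stop --all
service-control --start --all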

After the servers came back up, ELM started to work as it should. For my own sanity, I gave all systems another reboot; when they came up, they were all still functional.
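A quick way to confirm the bind failures have actually stopped is to check the vmdird log again (the path below is where it lives on my 6.5 appliances; it may differ on other builds):

grep "Bind Request Failed" /var/log/vmware/vmdird/vmdird-syslog.log | tail

If nothing new appears after the timestamps above, the machine accounts are authenticating again.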

Issue #2

VMware Update Manager was not functional: I couldn't scan, add baselines, import ISOs, or edit depot info.

It turned out the cache was busted. I remembered seeing a post about this somewhere and had taken notes, which helped.

The steps I had to run on each VCSA were:

service-control --stop vmware-updatemgr

/lib/vmware-updatemgr/bin/updatemgr-util reset-db

service-control --start vmware-updatemgr
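Once the service is back, a status check confirms Update Manager came up cleanly before you start rebuilding anything (optional, but cheap):

service-control --status vmware-updatemgr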

One thing I recommend is to record any custom repositories you currently have, as these are erased.

All migrated baselines and baseline groups were also destroyed, so I had to recreate those as well.

2 Comments

    Ytsejamer1

    This was a fantastic series! Thank you for the detail and full context of the steps for a project of this sort.

    One thing I ran into is when the Stage 1 deployment created the new VCSA VM. It was unable to set some VM property, but I had no idea which one. It went through and completed Stage 1, but then failed moving to Stage 2 because the migration couldn’t contact the new VCSA. The issue was that the vNIC was not set as connected.

    What was nice is that the migration application gave me the appliance link where I could manually kick off the Stage 2 portion of the migration. It worked without issue and was a mirror of what the migration would have presented. In fact, it worked a lot faster through the browser.

      Raymond

      Hi,

      I am glad you found this post useful. Let me know if there is anything else you are interested in me blogging about.
