Upgrading VCSA from 6.0u3 to 6.5u1

Issues

Issue #1

I had an issue with Enhanced Linked Mode (ELM) where our NA and UK vCenters could see each other in the web client, but SA was orphaned. Everything else worked: I could see tasks happening in the task pane, NetApp VSC was functional, and so on. After many hours of fighting and sifting through the logs, even after reverting to the snapshots and redeploying (because, you know, maybe I messed up), the issue persisted. Due to time constraints, I decided that before reverting again I would contact VMware. That took several more hours with enterprise senior support, and eventually the below was found:

 18-03-11T12:05:33.093363+00:00 err vmdird  t@140174730172160: Bind Request Failed (ip of na vcenter) error 49: Protocol version: 3, Bind DN: "cn=fqdn-na.domain.local ou=Domain Controllers,dc=vsphere,dc=local", Method: SASL
 18-03-11T12:05:41.621318+00:00 err vmdird  t@140174730172160: Bind Request Failed (ip of uk vcenter) error 49: Protocol version: 3, Bind DN: "cn=fqdn-uk.domain.local ou=Domain Controllers,dc=vsphere,dc=local", Method: SASL
 18-03-11T12:05:42.058106+00:00 err vmdird  t@140174730172160: Bind Request Failed (ip of uk vcenter) error 49: Protocol version: 3, Bind DN: "cn=fqdn-uk.domain.local ou=Domain Controllers,dc=vsphere,dc=local", Method: SASL
 18-03-11T12:06:09.119862+00:00 err vmdird  t@140174746957568: Bind Request Failed (ip of na vcenter) error 49: Protocol version: 3, Bind DN: "cn=fqdn-na.domain.local ou=Domain Controllers,dc=vsphere,dc=local", Method: SASL

According to the engineer, this means the machine account passwords were not being accepted. Apparently this happens when the domain join gets interrupted, which possibly happened at some point during the upgrade process. The question is: why did it happen twice?
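Before resetting anything, it is worth confirming what each PSC thinks its replication partners are. This was a check I added myself rather than part of the engineer's steps; 'sso-admin-password' below is a placeholder for the administrator@vsphere.local password:

/usr/lib/vmware-vmdir/bin/vdcrepadmin -f showpartners -h localhost -u administrator -w 'sso-admin-password'
/usr/lib/vmware-vmdir/bin/vdcrepadmin -f showpartnerstatus -h localhost -u administrator -w 'sso-admin-password'

A partner that shows as unavailable, or a change number that never catches up, lines up with the bind failures above.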

The fix was:

On each PSC I had to run:

service-control --stop VMwareSTS
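To double-check the service actually stopped on each node before going further, a plain status query works (just a sanity check on my part, not part of the fix itself):

service-control --status

The STS service should show up under the stopped services on every PSC before you continue.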

I then had to log on to the orphaned PSC (which I'll call SA-PSCP1 here) and reset the machine account passwords for NA-PSCP1 and UK-PSCP1 using the following commands:

root@sa-pscp1 [ ~ ]# /usr/lib/vmware-vmdir/bin/vdcadmintool


 
 ==================
 Please select:
 0. exit
 1. Test LDAP connectivity
 2. Force start replication cycle
 3. Reset account password
 4. Set log level and mask
 5. Set vmdir state
 6. Get vmdir state
 7. Get vmdir log level and mask
 ==================
 
3
   Please enter account UPN : uk-pscp1.domain.local@vsphere.local
 New password is -
 somesupersecretpassword-uk
 
3
   Please enter account UPN : na-pscp1.domain.local@vsphere.local
 New password is -
 somesupersecretpassword-na
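If you are unsure which UPN corresponds to which node, the machine account each PSC uses is stored in the same Likewise registry key that gets edited below. Listing it (again, an extra check on my part) shows the account:

/opt/likewise/bin/lwregshell list_values '[HKEY_THIS_MACHINE\services\vmdir]'

The dcAccountDN value on uk-pscp1 should read something like cn=uk-pscp1.domain.local,ou=Domain Controllers,dc=vsphere,dc=local, which maps to the UPN entered in vdcadmintool above.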

Then I needed to log on to uk-pscp1 and na-pscp1 and run the following:

root@uk-pscp1 [ ~ ]# /opt/likewise/bin/lwregshell
 
 \> cd HKEY_THIS_MACHINE\services\vmdir
 
 HKEY_THIS_MACHINE\services\vmdir> set_value dcAccountPassword "somesupersecretpassword-uk"
 
 HKEY_THIS_MACHINE\services\vmdir> quit
root@na-pscp1 [ ~ ]# /opt/likewise/bin/lwregshell
 
 \> cd HKEY_THIS_MACHINE\services\vmdir
 
 HKEY_THIS_MACHINE\services\vmdir> set_value dcAccountPassword "somesupersecretpassword-na"
 
 HKEY_THIS_MACHINE\services\vmdir> quit

Then I logged back in to sa-pscp1 and performed the same process in reverse to reset the machine account password for the SA node.

Finally, all PSCs were rebooted.
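For what it's worth, a full reboot may not be strictly required; my assumption is that stopping and starting all services on each PSC would achieve the same thing, although I took the reboot route myself:

service-control --stop --all
service-control --start --all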

After the servers came back up, ELM started to work as it should. For my own sanity, I gave all systems another reboot; when they came up, they were all still functional.
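A quick way to confirm the bind failures have actually stopped is to check the vmdird log again (the path below is where it lives on my 6.5 appliances; it may differ on other builds):

grep "Bind Request Failed" /var/log/vmware/vmdird/vmdird-syslog.log | tail

If nothing new appears after the timestamps above, the machine accounts are authenticating again.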

Issue #2

VMware Update Manager was not functional: I couldn't scan, add baselines, import ISOs, or edit depot info.

It turned out the cache was busted. I remembered seeing a post about this somewhere and had taken notes, which helped.

The steps I had to run on each VCSA were:

service-control --stop vmware-updatemgr

/lib/vmware-updatemgr/bin/updatemgr-util reset-db

service-control --start vmware-updatemgr
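Once the service is back, a status check confirms Update Manager came up cleanly before you start rebuilding anything (optional, but cheap):

service-control --status vmware-updatemgr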

One thing I recommend is to record any custom repositories you currently have, as these are erased.

All migrated baselines and baseline groups were also destroyed, so I had to recreate those as well.

2 Comments

    Ytsejamer1

    This was a fantastic series! Thank you for the detail and full context of the steps for a project of this sort.

    One thing I ran into is when the Stage 1 deployment created the new VCSA VM. It was unable to set some VM property, but I had no idea which one. It went through and completed Stage 1, but then failed moving to Stage 2 because the migration couldn’t contact the new VCSA. The issue was that the vNIC was not set as connected.

    What was nice is that the migration application gave me the appliance link where I could manually kick off the Stage 2 portion of the migration. It worked without issue and was a mirror of what the migration would have presented. In fact, it worked a lot faster through the browser.

      Raymond

      Hi,

      I am glad you found this post useful. Let me know if there is anything else you are interested in me blogging about.
