Welcome to my website about the hands-on quirks we bump into and, hopefully, how to resolve them!

vExpert Badge

About me: I am Abbed Sedkaoui. I have worked on VMware virtualization since GSX and ESX 3, and before that on Virtual Server and VirtualPC from Connectix, who also first made Virtual Game Station (VGS, a PSX that fit on a 1.41MB floppy disk) back in 1998, all the way up to today's latest VCF (VMware Cloud Foundation), the infrastructure that VMware Cloud SDDC is based on.

In my view, "it" (the Cloud) all started in 2008 with the advent of AMD "Nested Pages", and then in 2009 Intel "Extended Page Tables", in their processors. Virtualization then became the trend for a lot of compute: for routers I think of VRF (Virtual Routing and Forwarding), for firewalls (Context), for switches (VSI, Virtual Switching Instance).

And happily for us labbers, we have had the ability since then to deploy an end-to-end, fully virtualized infrastructure :) I have been following William Lam since around that time. Fast forward to 2023: after successfully deploying VCF, I am looking to certify as VCP-VMC, as the required course is offered for FREE! Look at Required Step 2.

About this site: I'll share what worked for me when facing issues, and "the problem-solving critical-thinking mindset" (I know, it's a mouthful :) used to document root cause analysis. Please don't mind the rustic look of this site, as I literally created it from scratch on AWS in a few hours.

11/22/2023 Broadcom announces successful acquisition of VMware

Hock Tan : President and Chief Executive Officer "Providing best-in-class solutions for our customers, partners and the industry"

11/12/2023 VMware Explore 2023 Breakout Session URLs

Links to videos with Customer Connect account and direct download links to supporting presentation slides.

VMware Explore EMEA 2023 Breakout Session URLs

VMware Explore US 2023 Breakout Session URLs

11/07/2023 Updated script Automated Tanzu lab Deployment with NSX VRF, Project, VPC

11/11/2023 Update Now merged to William Lam master repo! https://github.com/lamw/vsphere-with-tanzu-nsxt-automated-lab-deployment

My Fork with Branch NSX4 github.com/abbedsedk/vsphere-with-tanzu-nsxt-automated-lab-deployment/tree/nsx4

- Updated for vSphere 8.0 and NSX 4.1.1 due to API changes since vSphere 7 and NSX 3.
- Added a few checks to allow reuse of existing objects like vCenter VDS, VDPortGroup, StoragePolicy, Tag and TagCategory, NSX TransportNodeProfile.
- Added a FAQ on creating multiple clusters that use the same VDS/VDPortGroup. This allows multi Kubernetes cluster high availability with vSphere Zones and Workload Enablement.
- Added a few pauses for the use case where we deploy only a new cluster, to allow the Nested ESXi hosts to boot and fully come online (180s) and before VSAN Diskgroup creation (30s).
- Added FTT configuration for VSAN, allowing 0 redundancy and a one-node demo lab VSAN cluster. (This allows the whole Nested Multi-AZ Tanzu lab with NSX VRF, Project, VPC to run on a 128GB box; the play-by-play of this use case is next.)
$hostFailuresToTolerate = 0
- Added a pause to the script as a workaround, without babysitting, for owners of AMD Zen CPUs (DPDK FastPath capable).
$NSXTEdgeAmdZenPause = 0
- Added the -DownloadContentOnDemand option to the TKG Content Library to avoid downloading 250GB in advance and reduce it to a few GB.
- Added automated T0 VRF Gateway creation with a static route like the parent T0 (Note: an uplink segment '$NetworkSegmentProjectVRF' is connected to the parent T0 for connectivity to the outside world).
- Added Project and VPC Automated Creation.

11/07/2023 A usecase vSphere with Tanzu using NSX Project VPC Networks and with Multi K8s Cluster High Availability using vSphere Zones

  1. Deploy 1st VSAN Cluster (+1h) vSphere with Tanzu using NSX-T Automated Lab same as before
  2. Deploy 2nd and 3rd VSAN Clusters (15min each) vSphere with Tanzu using NSX-T Automated Lab
  3. Todo after 3 Clusters deployments
  4. Deploy NSX T0 VRF and Project and VPC Subnets Segments IP Blocks (3 min)
  5. Create Zonal Storage Policy Multi-AZ-Storage-Policy
  6. Create 3 zones with the 3 Clusters
  7. Workload Control Plane (WCP) Enablement in Workload Management
  8. Enablement Beginning to Ready
  9. Next Enterprise Developer's Tasks: Give a name to a Namespace, Deploy a Class-Based or Tanzu Kubernetes Cluster (TKC), and Deploy a stateful app with Cluster HA.
  10. Next Service Provider's Tasks: Create a provider VDC backed by a Supervisor Cluster, Publish a Provider VDC Kubernetes Policy to an Organization VDC in VMware Cloud Director, Offer Kubernetes as a Service (CaaS).

VMware Docs - VMware-vSphere 8.0 - Workflow for Deploying a Supervisor with NSX Networking
In the following section we will do a three-zone Supervisor deployment type.

Deploy 1st Cluster using vSphere with Tanzu using NSX-T Automated Lab same as before

With 3 Nested ESXi hosts, if the lab must fit in a box with 128GB of memory, specify only 1 ESXi hostname/IP; this is possible with $hostFailuresToTolerate = 0.
Fill in the values of these 3 variables:
$NestedESXiHostnameToIPs = @{...}
$NewVCVSANClusterName = "Workload-Cluster-1"
$vsanDatastoreName = "vsanDatastore-1"

1st Cluster

Now, to deploy the 2nd and 3rd clusters, follow these steps:
- Change the values of these 3 variables for the 2nd and 3rd cluster deployments,
- Change the VAppName to a fixed value,
- Change the values for the already deployed VMs (VCSA, NSXManager, NSXEdge) to 0,
- In postDeployNSXConfig, change the values from $true to $false for all variables except ($runHealth, $runTransportNodeProfile, $runAddEsxiTransportNode).
    $NestedESXiHostnameToIPs = @{
    $NewVCVSANClusterName = "Workload-Cluster-2"
    $vsanDatastoreName = "vsanDatastore-2" 

    $VAppName = "Nested-vSphere-with-Tanzu-NSX-T-Lab-qnateilb"
		# "Nested-vSphere-with-Tanzu-NSX-T-Lab-$random_string" 
		# Random string can be used on the first cluster but reuse the $VAppName for 2nd and 3rd cluster deployments.

	$preCheck = 1
	$confirmDeployment = 1
	$deployNestedESXiVMs = 1
	$deployVCSA = 0
	$setupNewVC = 1
	$addESXiHostsToVC = 1
	$configureVSANDiskGroup = 1
	$configureVDS = 1
	$clearVSANHealthCheckAlarm = 1
	$setupTanzuStoragePolicy = 1
	$setupTKGContentLibrary = 1
	$deployNSXManager = 0
	$deployNSXEdge = 0
	$postDeployNSXConfig = 1
	$setupTanzu = 1
	$moveVMsIntovApp = 1

	$deployProjectExternalIPBlocksConfig = 0
	$deployProject = 0
	$deployVpc = 0
	$deployVpcSubnetPublic = 0
	$deployVpcSubnetPrivate = 0
 if($postDeployNSXConfig -eq 1) {

2nd Cluster

3rd Cluster

NSX View


Todo after 3 Clusters deployments:

- ESXi -> Configure -> TCP/IP Configuration -> IPV6 CONFIGURATION -> Disable
- ESXi -> Configure -> TCP/IP Configuration -> Default -> Edit -> copy 'Search domains' to 'Domain'
- ESXi -> Configure -> TCP/IP Configuration -> Default -> Edit -> swap the Preferred and Alternate DNS servers if needed. (In my case this was part of why Workload Enablement wouldn't come up.)
- Reboot the ESXi hosts over SSH: tick 'Send to all' in 'Multitab Putty' and press Enter in each ESXi tab
- Snapshot/Export the Outer ESXi VM or the Lab vApp
- Start the Lab vApp and reset the alarms
- SSH into the virtual router (I use VyOS) and configure a static route for each Project and VPC Subnet IP/Netmask via $T0GatewayInterfaceAddress. (In my case this was the other part of why Workload Enablement wouldn't come up.)
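
The static routes from the last step can be generated rather than typed one by one. This is a minimal sketch, assuming the VyOS "set protocols static route ... next-hop ..." syntax; the function name and the addresses are hypothetical placeholders, so substitute the subnets from your own deployment summary and the value of $T0GatewayInterfaceAddress.

```shell
#!/bin/sh
# Print one VyOS static-route command per subnet.
# $1 = next hop (the value of $T0GatewayInterfaceAddress in the lab script)
# remaining arguments = Project / VPC subnets in CIDR notation
vyos_static_routes() (
    next_hop=$1
    shift
    for cidr in "$@"; do
        printf 'set protocols static route %s next-hop %s\n' "$cidr" "$next_hop"
    done
)

# Hypothetical addresses (documentation ranges), replace with your own.
vyos_static_routes 192.0.2.1 10.10.0.0/24 10.20.0.0/24
```

The printed lines are meant to be pasted into a VyOS configuration session, followed by commit and save.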

- Deploy NSX T0 VRF and Project and VPC Subnets Segments IP Blocks (3 min)

Fill the variables of section:
# Project ,Public Ip Block, Private Ip Block
# VPC, Public Subnet, Private Subnet
VMware Docs - VMware-NSX 4.1 - Add a Subnet in an NSX VPC
Self Service Consumption with Virtual Private Clouds Powered by NSX
(Gotcha: $VpcPublicSubnetIpaddresses must be a subset of $ProjectPUBcidr, and can't use the first or last subnet block size.)
# T0 VRF Gateway
# Which T0 to use for the Project external connectivity: $T0GatewayName or $T0GatewayVRFName (This option is important, as it determines whether the T0 VRF Gateway is created or not.)
$ProjectT0 = $T0GatewayVRFName
Set all variables to 0 except $preCheck, $confirmDeployment, and the Project and VPC ones, which are set to 1.
	$preCheck = 1
	$confirmDeployment = 1
	$deployNestedESXiVMs = 0
	$deployVCSA = 0
	$setupNewVC = 0
	$addESXiHostsToVC = 0
	$configureVSANDiskGroup = 0
	$configureVDS = 0
	$clearVSANHealthCheckAlarm = 0
	$setupTanzuStoragePolicy = 0
	$setupTKGContentLibrary = 0
	$deployNSXManager = 0
	$deployNSXEdge = 0
	$postDeployNSXConfig = 0
	$setupTanzu = 0
	$moveVMsIntovApp = 0

	$deployProjectExternalIPBlocksConfig = 1
	$deployProject = 1
	$deployVpc = 1
	$deployVpcSubnetPublic = 1
	$deployVpcSubnetPrivate = 1
Note: Screenshot the summary before confirming as a reminder of the Subnet IP/Netmask later.
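
The gotcha above, that $VpcPublicSubnetIpaddresses must be a subset of $ProjectPUBcidr, can be checked before running the script. This is only a sketch under assumptions: both values are taken to be single IPv4 CIDRs, the function names are made up for illustration, and the "first or last subnet block" restriction is not checked.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Return 0 if CIDR $1 lies entirely inside CIDR $2, non-zero otherwise.
cidr_within() (
    inner_ip=${1%/*}; inner_len=${1#*/}
    outer_ip=${2%/*}; outer_len=${2#*/}
    # A subset cannot have a shorter prefix than its parent block.
    [ "$inner_len" -ge "$outer_len" ] || exit 1
    # Netmask of the outer block (4294967295 is 0xFFFFFFFF).
    mask=$(( (4294967295 << (32 - outer_len)) & 4294967295 ))
    [ $(( $(ip_to_int "$inner_ip") & mask )) -eq $(( $(ip_to_int "$outer_ip") & mask )) ]
)

# Hypothetical values standing in for $VpcPublicSubnetIpaddresses and $ProjectPUBcidr.
if cidr_within 10.10.1.0/26 10.10.0.0/16; then
    echo "VPC public subnet fits inside the Project public block"
else
    echo "VPC public subnet is outside the Project public block"
fi
```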

Deploy VRF, Project, VPC with all associated networking (IpBlocks, Segments, Subnets, Routing, DHCP) in 3.27 minutes.

A selection of NSX API calls made from 2 PowerCLI modules and from direct REST calls.

NSX Topology T0/VRF - Project - VPC

- Create Zonal Storage Policy "Multi-AZ-Storage-Policy" -> No redundancy (if you configured FTT = 0 on a one-node cluster)

VMware Docs - VMware-vSphere 8.0 - Create Storage Policies for vSphere with Tanzu
VMware Docs - VMware-vSphere 8.0 - Deploy a Three-Zone Supervisor with NSX Networking

- Create 3 zones with the 3 Clusters

Workload Control Plane (WCP) Enablement in Workload Management

Enablement Beginning to Ready

Next Developer's Tasks: Give a name to a Namespace, Deploy a Class-Based or Tanzu Kubernetes Cluster (TKC), and Deploy a stateful app with Cluster HA.

vSphere with Supervisor Cluster Configuration Files

Next Service Provider's Tasks: Create a provider VDC backed by a Supervisor Cluster, Publish a Provider VDC Kubernetes Policy to an Organization VDC in VMware Cloud Director, Offer Kubernetes as a Service (CaaS).

Publish a Provider VDC Kubernetes Policy to an Organization VDC in VMware Cloud Director

Next: Recovering from an NSX UnrecoverableCorfuError due to a full disk. Stay tuned!


04/01/2023 Added Export option Nested VCF Lab VMs PR script.

The option can be set to run right after the deployment or at a later time, which is preferred in order to save the state of the Lab VMs as OVAs later on.

A FAQ is added to explain how to set the option.

Note that the script is coded to export the VMs of the latest vApp deployed by the script whose name starts with Nested-VCF-Lab-.

It takes 15 min to stop the VMs, export them as OVA, and start them again.

03/27/2023 Enable multiple vApp deployment on the same Cluster

Because I was unable to deploy multiple times, I created an issue and then a PR that got merged.
Fixed Automated VMware Cloud Foundation Lab Deployment
Credit to LucD from VMTN

03/05/2023 Comparing CPU I/O usage during VCF SDDC Management Bringup on 4 vs 1 Nested ESXi node

Follow-up on the previous 02/14/2022 issue.

I found the root cause to be specific to the nested lab environment use case, that is, CPU and I/O contention on the hosts,
occurring during a task towards the end of the bringup called "Configure Base Install Image Repository on SDDC Manager",
which copies the VCSA ISO and NSX OVA to an NFS share on the 4 Nested ESXi VSAN datastore.
This sent the CPU through the roof, and consequently the applications running in the three VMs (vCenter, NSX and SDDC Manager) had their kernel stuck once or multiple times.
Looking deeper into it, I think the subsequent tasks may have had issues because of the VMs with a stuck kernel (I feel there may be missing pieces to understand it all ...).
I was monitoring while that contention happened, then took screenshots of the CPU and I/O usage of 2 SDDC bringups at the time of that copy task to illustrate:

one when that whole issue occurred, with 4 nested ESXi
one with 1 nested ESXi using the FTT=0 trick given by William Lam.
Using fewer vCPUs (8 instead of 4x8) and a faster NVMe SSD (PCIe 4.0 instead of 3.0) confirmed that all is well, without any stuck kernel.
I think that on real gear this should not happen.

03/03/2023 PCIE 4.0 LAB UPGRADE - AMD Ryzen 3700X + Netac NV7000

B.O.M 308€
AMD Ryzen7 3700X 3,6 GHz 7NM L3 = 32M at 158€
Netac SSD 2tb M2 NVMe PCIe 4.0 x4 at 150€

Ordered on 02/11/2023 and received on 03/03/2023, but it was worth the wait: not only did it come from the official Netac store, but on the back it says Quality Check "QC PASS 02/2023".
Note that you need a PCIe 4.0 capable motherboard; I chose my MSI X570 just for that, and for the fact that it runs my older Ryzen 2700.
What speedup to expect? Going from PCIe 3.0 at 2000MB/s to PCIe 4.0 at 7000MB/s is sequential read/write throughput, and we won't really see that, because we all know an OS does mixed random 4KB reads/writes.
Nevertheless, the Nested VCF deploys twice as fast, in 15 minutes instead of 30, because the bandwidth is twice as high 😀.

02/24/2023 VMware Cloud Foundation with a single ESXi host for Workload Management Domain made by William Lam.

This will give room to play with AVN or a Workload VI Domain in the future.

02/23/2023 Removing NSX CPU/Memory reservations when deploying a VMware Cloud Foundation (VCF) Management or Workload Domain made by William Lam.

03/21/2023 Update
I followed the steps, but in my case I had some issues with the directory returned by ovftool, which needed /${NSX_FILENAME}/ in the path of the commands, and
as a final step I had to get the modified NSX OVA from "/work/" into the overlay part of "/mnt/iso/" known as "/upper/".
  /mnt/iso/...ova # the bringup sees this directory, which is the combination of the following 'oldiso' RO + 'upper' RW directories
  /root/oldiso/...ova # read-only filesystem
  /overlay/upper/...ova # read-write filesystem

  /overlay/work/work/...ova # read-write filesystem
I simply issued a "cp" of the OVA from "/work" to "/upper", which is writable, and it then showed up in "/mnt/iso", so
I shared on that page what worked for me.
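
To illustrate why copying into "/upper" makes the file appear in "/mnt/iso", here is a small simulation of the overlay lookup order. It is a sketch only: the function name is hypothetical, it does not mount anything, it ignores whiteouts and directory merging, and the mount options shown in the comment are an assumption about how such a merged directory is typically set up, not something read from Cloud Builder.

```shell
#!/bin/sh
# Simulate overlay lookup order: a file in the writable "upper" layer
# hides a file with the same name in the read-only "lower" layer.
# On a real system the kernel does this after a mount such as
# (illustrative options, assumed rather than taken from the appliance):
#   mount -t overlay overlay \
#     -o lowerdir=/root/oldiso,upperdir=/overlay/upper,workdir=/overlay/work /mnt/iso
# $1 = upper dir, $2 = lower dir, $3 = file name relative to the layers
overlay_resolve() (
    upper=$1 lower=$2 name=$3
    if [ -e "$upper/$name" ]; then
        echo "$upper/$name"      # upper layer wins
    elif [ -e "$lower/$name" ]; then
        echo "$lower/$name"      # fall back to the read-only layer
    else
        exit 1                   # not present in either layer
    fi
)
```

Calling overlay_resolve with the upper directory, the lower directory and a file name prints the copy the merged view would serve, which is the modified OVA as soon as it exists in the upper layer.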

In the last bringup tasks the NSX OVA is copied from "/mnt/iso" to an NFS share for SDDC Manager to consume when adding a 'Workload VI Domain'.

Feel free to check it out: it removes the NSX reservation not only for the 'Workload Management Domain' bringup but
also for any subsequent 'Workload VI Domain', which is what we want in a lab environment with limited resources.

And now, like "Neo" in "The Matrix" with "Ju-Jitsu", I can say 'Yay, I know the Linux overlay filesystem!' to make a read-only filesystem writable, thanks to that (just a side note: Docker uses exactly that for its layering).

02/14/2022 - SDDC Manager 8 accounts disconnected

1) NSXT MANAGER root admin audit account
Just as in the earlier post, click on the 3 dots and REMEDIATE using the same password as in the deployment script.

2) ESXI service accounts

Steps to recover expired Service Accounts in VMware Cloud Foundation (KB 83615)
SSH into each of the 4 Nested ESXi
[root@vcf-m01-esx01:~] passwd svc-vcf-vcf-m01-esx01
Changing password for svc-vcf-vcf-m01-esx01
Enter new password:
Re-type new password:
passwd: password updated successfully
(note: I didn't do the reset failed login part)
In SDDC Manager, for the ESXi svc accounts -> 3 dots -> REMEDIATE with this newly created password

3) PSC - KB: Password rotation for administrator@vsphere.local causes issues when multiple VMware Cloud Foundation instances share a single SSO domain (KB 85485)
we must be logged in as another SSO user with the ADMIN role
to be able to click REMEDIATE on PSC administrator@vsphere.local

I think a proper SSO ADMIN user like vcf-secure-user@vsphere.local, as illustrated in the KB, is the way to go in production.
In my case, since it was a lab, I found an existing SSO account and promoted it to the admin role.
Disclaimer: I do not know whether that is supported, even though:
from the remediate password window we learn that the service account will be rotated after the remediation,
and we can remove the admin role from this service account afterwards.
Using a) the SDDC Manager UI or b) the vCenter UI, it's easily done without the API.
a) SDDC manager UI as administrator@vsphere.local -> Single Sign On -> +USERS AND GROUPS -> Search User: svc , Refine search by: Single User, Domain: vsphere.local
Select the user svc-vcf-m01-nsx01-vcf-m01-vc01 -> Choose Role: ADMIN (note this can be done from vCenter see below), then click ADD.

b) vCenter UI as administrator@vsphere.local -> Licensing -> Single Sign On -> Users and Groups -> Users -> Domain: vsphere.local, Find: svc -> EDIT: Password, Confirm Password

c) SDDC manager UI login as svc-vcf-m01-nsx01-vcf-m01-vc01@vsphere.local -> Security -> Password Management -> PSC -> administrator@vsphere.local -> REMEDIATE again using the same original password

d) logout
optionally e) redo a) but select the 3dots and remove the admin role on this service SSO user.


When we hover the mouse, a bubble informs us that a sync should happen within 24h.
So mine falls within the expected result, because less than 24h after the deployment I hadn't given it a chance to sync and refresh.

Lesson learned: if this happens again I will wait 24h before taking action. This relates to someone experiencing a similar effect on VMTN: VCF 4.5.0 reporting accounts disconnected.

02/18/2023 - Importing VMs Vyos and nested ESXi, Checking and Configuring NTP

First, the OVA import wizards don't need to be filled in, as the defaults are already set for our environment.


One thing to do on the VyOS console
is to remove occurrences of the old MAC address "hw-id" and
any new interfaces in the file using
"vi /config/config.boot", then
the "dd" command to delete a line, then
save it with ":" "wq!" (note that you have to learn where the US QWERTY keys are if you have an AZERTY keyboard), then reboot.

nested ESXi and check NTP

One thing to do on all nested ESXi VMs upon import as well:
SSH into each of them to permanently remount the OS volume, with this one-liner for example, and recheck NTP.

using Multi Tabbed PuTTY (MTPuTTY)
ssh into all 4 nested ESXi
tick send to all
UUID=$( esxcfg-volume -l | grep UUID | cut -b 17-52 ); esxcfg-volume -M $UUID
ssh cb
tick send to all
ntpq -p

At this point not all ESXi hosts had NTP running or even set up, or it was sitting in INIT state.

Configure NTP server on nested ESXi

We're tempted to edit ntp.conf, but there is a comment that tells us not to:
[root@vcf-m01-esx02:~] cat /etc/ntp.conf
# Do not edit this file, config store overwites it
So how do we do it?
Troubleshooting NTP on ESX and ESXi 6.x / 7.x / 8.x (KB 1005092)
For builds 7.0.3 onwards,
this KB explains how to add the "tos maxdist 15" setting.
So we can use this same method to configure the server setting.

/etc/init.d/ntpd restart

NTPold="`cat /etc/ntp.conf | grep server`"
NTPprefered="server 0.pool.ntp.org"
cp /etc/ntp.conf /etc/ntp.conf.bak -f && sed -i 's/'"$NTPold"'/'"$NTPprefered"'/' /etc/ntp.conf.bak  && esxcli system ntp set -f /etc/ntp.conf.bak
cp /etc/ntp.conf /etc/ntp.conf.bak -f && echo "tos maxdist 15" >> /etc/ntp.conf.bak && esxcli system ntp set -f /etc/ntp.conf.bak
esxcli system ntp set -e 0 && esxcli system ntp set -e 1
/etc/init.d/ntpd restart
ntpq -p
NTP service auto start is not working in ESXi 7.0 (KB 80189)
chkconfig --list ntpd
chkconfig ntpd on
That's it, you're set for success! Remember, before you begin the bringup, to shut down all VMs and snapshot them all, just to be safe!
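
The sed substitution used above relies on /etc/ntp.conf holding exactly one "server" line. A variant that also copes with several server lines could look like the sketch below. The function name is hypothetical, it assumes server entries start at the beginning of a line, and the esxcli step is only shown as a comment because it needs a real ESXi host.

```shell
#!/bin/sh
# Build a candidate NTP config from an existing one:
# drop every existing "server" and "tos maxdist" line, then add the
# preferred server and the "tos maxdist 15" setting exactly once.
# $1 = source config, $2 = output file, $3 = preferred NTP server
build_ntp_conf() (
    src=$1 out=$2 server=$3
    grep -v -e '^server ' -e '^tos maxdist ' "$src" > "$out"
    printf 'server %s\n' "$server" >> "$out"
    printf 'tos maxdist 15\n' >> "$out"
)

# On an ESXi host the result would then be loaded with the same command
# as above (not run here):
#   build_ntp_conf /etc/ntp.conf /etc/ntp.conf.bak 0.pool.ntp.org
#   esxcli system ntp set -f /etc/ntp.conf.bak
```

Because the old lines are filtered out first, running it twice gives the same file rather than stacking duplicate settings.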

02/09/2023 UPDATE - Contributed to William Lam's vSphere with Tanzu using NSX-T Automated Lab Deployment script to allow the creation of additional Edge nodes. Now merged.

02/08/2023 - SDDC Manager account disconnected NSXT MANAGER

The trick here is to understand that the text "Specify the password that was set manually on the component" means the same password we set in the deployment script, rather than to trust the misleading warning.

02/06/2023 UPDATE - Finally a solution: the NSX installation and HA agent install issues on ESXi were due to a lack of memory.

Clicking on NSX Install Fail.. we see that the ESXi host is lacking memory.

This 2nd ESXi node happened to be the one hosting the NSX VM, but it had more than 13GB of free memory.

We can work around this issue by live migrating the NSX VM to the 3rd ESXi node, and then hitting the Resolve button.

We see an unknown node status, but from KB 94377 we learn that this is a health check issue.

Next, the install of the HA agent onto this exact same 3rd ESXi node fails.

I was thinking of doing the same trick with a live migration of NSX, but that was not possible, so I shut down NSX and migrated it to the 4th ESXi node.
But then it wouldn't power on, needing an extra ~200MB.

Looking at the 4th ESXi node, there was apparently plenty of memory: 28.7GB.

At that point I was curious. From vCenter I enabled the SSH service, since it's stopped during bringup, to have a look at the reservable memory available to the user namespace using this command found on VMTN:

memstats -r group-stats -g0 -l2 -s gid:name:parGid:nChild:min:max:conResv:availResv:memSize -u mb 2> /dev/null | sed -n '/^-\+/,/.*\n/p'

I figured out that NSX needs 16384MB of reservation, while here we see 16372MB of reservation available, plus 178MB of overhead:
16384-16372+178=190MB, close to the ~200MB requested, which would explain why a vCenter admission failure wouldn't let the NSX VM power on.

The solution is easy: just bump the ESXi memory a bit more. At that time I was testing with 42GB, so I redid the lab with 46GB and it worked flawlessly on these tasks. Now merged.
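
Plugging the figures above into shell arithmetic gives a shortfall of about 190MB, in line with the roughly 200MB the power-on asked for. The helper name is hypothetical; the three inputs are the NSX reservation, the availResv value reported by memstats, and the VM overhead.

```shell
#!/bin/sh
# Memory still missing before a fully reserved VM can power on.
# $1 = VM reservation (MB), $2 = availResv from memstats (MB),
# $3 = VM overhead (MB)
reservation_shortfall_mb() {
    echo $(( $1 - $2 + $3 ))
}

reservation_shortfall_mb 16384 16372 178   # prints 190
```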

Stay tuned for the next series of issues/solutions (VMCA, SSH Key Rotate, accounts disconnected from SDDC Manager).

02/04/2023 UPDATE - Note: for better performance, I do not recommend doing the bringup with the Nested environment itself nested like in the picture above (having 3 hypervisors "in a row" is only meant for a light lab deployment 😀). Below I'll explain how to deploy, then modify the Cloud Builder timeouts, then export.

This also avoids the BUG soft lockup, which feels pretty much like a PSOD with some nasty effects (more on that in the disconnected account posts here and here), and many other issues that could arise during bringup. Clearly the expected I/O throughput is at minimum in the 100s of MB/s, not the 10s of MB/s.

To play with the VCF Lab on a laptop with 16 cores and 128GB, it's perfectly acceptable to deploy like in the picture,
but after the script deployment, export the VCF VMs (it means double the time: 30min deploy + 30min export, then import; you'll see real SSD speed at this point!).

Export the Nested VCF Lab's VMs
If you know how to connect to the virtual infrastructure, the export of the vApp can be done with a PowerShell one-liner:

Get-VM -Name vcf-m01-* | Export-VApp -Destination "D:\VM\Nested\Vapp\" -Force -Format Ova | Out-Null

If you don't, I'll soon make a PR to William Lam's script to add an ExportVM option (just leave the option $exportVMs = 0 for the deployment).

Update 04/01/2023 PR Done

Customization before export:
Use a multi-tabbed SSH client; on Windows, MTPuTTY is free.
SSH into the Cloud Builder VM and extend these two timeouts:
sed -i 's/ovf.deployment.timeout.period.in.minutes=40/ovf.deployment.timeout.period.in.minutes=180/' /opt/vmware/bringup/webapps/bringup-app/conf/application.properties
sed -i -e's/nsxt.disable.certificate.validation=true/nsxt.disable.certificate.validation=true\nnsxt.manager.wait.minutes=180/' /opt/vmware/bringup/webapps/bringup-app/conf/application.properties

systemctl status vcf-bringup
systemctl restart vcf-bringup
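
One caveat with the second sed above is that running it twice appends "nsxt.manager.wait.minutes" twice. A repeatable alternative is sketched below: a hypothetical set_prop helper that replaces the key if present and appends it otherwise. It assumes awk and mktemp are available on the appliance, which has not been verified here.

```shell
#!/bin/sh
# Set key=value in a Java-style properties file:
# replace the line if the key exists, append it otherwise.
# $1 = file, $2 = key, $3 = value
set_prop() (
    file=$1 key=$2 value=$3
    tmp=$(mktemp)
    # index() does a literal match on "key=" at the start of the line,
    # so the dots in the key are not treated as regex wildcards.
    awk -v k="$key" -v v="$value" '
        index($0, k "=") == 1 { print k "=" v; found = 1; next }
        { print }
        END { if (!found) print k "=" v }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
)

# Intended use on Cloud Builder (not run here):
#   PROPS=/opt/vmware/bringup/webapps/bringup-app/conf/application.properties
#   set_prop "$PROPS" ovf.deployment.timeout.period.in.minutes 180
#   set_prop "$PROPS" nsxt.manager.wait.minutes 180
```

Writing back with cat keeps the original file's owner and permissions, which a plain mv of the temporary file would not.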

Note: the second timeout, "nsxt.manager.wait.minutes", is shown in vcf-bringup-debug.log in milliseconds; converted from 1 200 000 ms it is 20 minutes, and this is part of why the installation of the NSX bits is interrupted. The other reason is a lack of memory on ESXi, which has been fixed in the script by raising it to 46GB.
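
The unit conversion in the note can be double-checked with two tiny helpers (hypothetical names): the value seen in the log is 20 minutes, and the new 180-minute setting should show up as 10 800 000 ms.

```shell
#!/bin/sh
# Convert between the minutes used in application.properties and the
# milliseconds printed in vcf-bringup-debug.log.
ms_to_minutes() {
    echo $(( $1 / 60000 ))
}
minutes_to_ms() {
    echo $(( $1 * 60000 ))
}

ms_to_minutes 1200000   # prints 20
minutes_to_ms 180       # prints 10800000
```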

After the customization of the VM is done and the CB validation is all green, rerun the script with all options set to 0 except
$preCheck = 1
$confirmDeployment = 1
$exportVMs = 1
Export the Virtual router's VMs
Additionally, also export your virtual router(s); in my case it is a csr1000v, assuming they are deployed with the naming convention csr-*

Get-VM -Name csr-* | Export-VApp -Destination "D:\VM\Nested\Vapp\" -Force -Format Ova | Out-Null

If your virtual router(s) is/are VyOS, assuming they are deployed with the naming convention vyos-*

Get-VM -Name vyos-* | Export-VApp -Destination "D:\VM\Nested\Vapp\" -Force -Format Ova | Out-Null

01/25/2023 UPDATE - Good news: a new version of the Automated VMware Cloud Foundation Lab Deployment script is already here!

I asked for it just a few days ago here, then shared some of these tips on William Lam's website, and on the same day (would you believe it?) a PR and a merge made it happen! The virtualization community is fast 😀. This version includes fixes for steps 1, 3, 4 (you need to follow the KB; I chose option 2, patching with WinSCP, or integrate it in the OVA), 5 and 7.

12/25/2022 - VCF v4.5 Lab

PHYSICAL LAB B.O.M 900€ (GPU & HDD not counted)
• RYZEN 2700 BOX (230€), which officially supports 64GB but takes
• 128GB DDR4 Patriot, 4 x 32GB at (100€) each, with a few BSOD MEMORY MANAGEMENT
• 1TB SSD NVMe M.2 Micron P1 (100€) (100GB for the OS and 831GB for the LAB, which became full! I've got a story)


For Router specifically
1 adapter not tagged for management
8 adapter trunk port group vlan 4095 (coming from windows VMware Workstation VMNet adapter Configuration Jumbo + vlan 4095 + all IP protocols unchecked)
7 configured sub-interface dot1q tag corresponding to VLAN desired for the bringup
1 configured as trunk

For Nested ESXi specifically
4 adapter on trunk port group

Validation errors -> solutions

1. After deploying with Automated VMware Cloud Foundation Lab Deployment:
Open the Outer vCenter and change the 1st disk of the Nested ESXi VMs from 12GB to 32GB, or Cloud Builder fails with "VSAN_MIN_BOOT_DISKS.error".
2. Change the 3rd (VSAN Capacity) disk from 60GB to more than 150GB if the Nested ESXi are themselves nested in an ESXi VM!!
(I go full Inception, running the Outer ESXi in a VM on Windows VMware Workstation. The ability to snapshot the whole thing is
much appreciated, especially for the VCF bringup, but the slowness less so.) Regarding speed, I'm looking forward to trying a PCIe 4.0 NVMe
once I upgrade my CPU to a 3700X, to speed up some tasks of the bringup and avoid some related CPU issues (Windows BSOD).
3. Change all four SddcManager passwords to ones as strong as the NSX ones
4. I got "Gateway IP Management not contactable" -> patch it with KB 89990 (release notes)
5. Failed VSAN Diskgroup -> "esxcli system settings advanced set -o /VSAN/FakeSCSIReservations -i 1" on the Outer ESXi.
6. For DUP "esxcli system settings advanced set -o /Net/ReversePathFwdCheckPromisc -i 1"
7. Instead of DHCP, use an IP Pool: in the VMware Cloud Foundation API Reference Guide, under SDDC, look for
"ESXi Host Overlay TEP IP Pool"
8. Use a router IP as NTP for VCF, but configure a reliable external NTP server (by stratum) on the router
9. After Validation is all green, and before launching the bringup, modify some CloudBuilder timeouts:

vim /opt/vmware/bringup/webapps/bringup-app/conf/application.properties
sed -i 's/ovf.deployment.timeout.period.in.minutes=40/ovf.deployment.timeout.period.in.minutes=180/' /opt/vmware/bringup/webapps/bringup-app/conf/application.properties
sed -i -e's/nsxt.disable.certificate.validation=true/nsxt.disable.certificate.validation=true\nnsxt.manager.wait.minutes=180/' /opt/vmware/bringup/webapps/bringup-app/conf/application.properties
echo "bringup.mgmt.cluster.minimum.size=1" >> /etc/vmware/vcf/bringup/application.properties
systemctl restart vcf-bringup
watch "systemctl status vcf-bringup"

tail -f /opt/vmware/bringup/logs/vcf-bringup-debug.log
10. Disable Automatic DRS for the VC, NSX and SDDC Manager VMs after each one is deployed
in the Inner vCenter, or else VSAN will rebalance those critical VMs while the others are being deployed:
Cluster -> Configure -> VM Overrides -> Automatic DRS -> Disabled or Manual

Knowing these issues beforehand allows modifying the OVAs and scripts for Nested ESXi and Cloud Builder before deploying, until a new version comes out.

9/25/2022 - Deploying Cloud Director in small form factor: the troubleshooting

VMware Technology Network > Cloud & SDDC > vCloud > VMware vCloud Director Discussions > Re: Configure-vcd script failed to complete

Long story short: issue this command, "sed -i s/10s/60s/ /opt/vmware/appliance/bin/appliance-sync.sh", and bump up the vCPUs from 2 to 4.

The best way to avoid tinkering with the appliance script files is to give it at least 4 vCPUs before deploying, as there is a hard-coded value of 8 CPUs. I determined that 4 is sufficient based on the top utility showing 400% CPU usage, meaning 4 x 100% x 1 CPU core.
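
The sizing rule of thumb used here, one vCPU per 100% of CPU shown by top, rounded up, can be written as a one-line helper. The function name is hypothetical, and it only restates the arithmetic above rather than any official sizing guidance.

```shell
#!/bin/sh
# Number of fully busy cores implied by a top-style CPU percentage,
# rounded up to the next whole core.
# $1 = CPU usage in percent, as an integer (e.g. 400)
cores_for_cpu_pct() {
    echo $(( ($1 + 99) / 100 ))
}

cores_for_cpu_pct 400   # prints 4
```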