VMware vSphere Troubleshooting

Author: Jackson Chen

Check vSphere and vCenter Monitoring and Performance

https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.monitoring.doc/GUID-A8B06BE0-E5FC-435C-B12F-A31618B21E2C.html

Overview of VMware Tools

https://kb.vmware.com/s/article/340

This article:

  1. Includes an overview of VMware Tools.
  2. Lists the product documentation that contains instructions for installing VMware Tools.
  3. Provides additional links to VMware Tools installation instructions and troubleshooting information.

VMware Admin Tools

https://blogs.vmware.com/virtualblocks/2019/03/16/top-10-vmware-admin-tools/

Troubleshooting Virtual Machines

Error Message When You Try to Migrate Virtual Machine with USB Devices Attached

https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.vm_admin.doc/GUID-A9A20C0A-1B87-4BDE-8719-1546CC2CFB63.html

Migration with vMotion cannot proceed and issues a confusing error message when you connect multiple USB devices from an ESXi host to a virtual machine and one or more devices are not enabled for vMotion.

To successfully pass vMotion compatibility checks, you must enable all USB devices that are connected to the virtual machine from a host for vMotion. If one or more devices are not enabled for vMotion, migration will fail.

# Solution
1. Make sure that the devices are not in the process of transferring data before removing them.
2. Re-add the devices and enable vMotion for each affected USB device.

Cannot Copy Data From an ESXi Host to a USB Device That Is Connected to the Host

https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.vm_admin.doc/GUID-EBAF0AED-AC82-4BB8-B829-6AA313D5F02C.html

You can connect a USB device to an ESXi host and copy data to the device from the host. For example, you might want to gather the vm-support bundle from the host after the host loses network connectivity. To perform this task, you must stop the USB arbitrator.

This problem occurs because the nonbootable USB device is reserved for the virtual machine by default. It does not appear on the host's file system, even though lsusb can see the device.

# Solution
1. Stop the usbarbitrator service
    /etc/init.d/usbarbitrator stop
2. Physically disconnect and reconnect the USB device.
    By default, the device location is 
    /vmfs/devices/disks/mpx.vmhbaXX:C0:T0:L0.
3. After you reconnect the device, restart the usbarbitrator service
    /etc/init.d/usbarbitrator start
4. Restart hostd and any running virtual machines to restore access to the passthrough devices in the virtual machine
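
To confirm that the host could see the device while the arbitrator was stopped, you can list the disk devices from the ESXi Shell (device naming follows the default location shown above):

    ls /vmfs/devices/disks/ | grep mpx.    # the USB device should appear as mpx.vmhbaXX:C0:T0:L0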

Troubleshooting with logs

You can often obtain valuable troubleshooting information by looking at the logs provided by the various services and agents that your implementation is using. Most logs are located in /var/log/ for vCenter Server deployments.

# Common Log Directories
Log Directory   Description
--------------------------------
applmgmt        VMware Appliance Management Service
cloudvm         Logs for allotment and distribution of resources between services
cm              VMware Component Manager
firstboot       Location where first boot logs are stored
rhttpproxy      Reverse Web Proxy
sca             VMware Service Control Agent
statsmonitor    VMware Appliance Monitoring Service
vapi            VMware vAPI Endpoint
vmafdd          VMware Authentication Framework daemon
vmdird          VMware Directory Service daemon
vmon            VMware Service Lifecycle Manager
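
For example, to follow a service log on the appliance (the exact file name varies by version, so treat this path as an example):

    tail -f /var/log/vmware/vmon/vmon.log    # follow the Service Lifecycle Manager log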

Management Node Logs

The following logs are available if a management node deployment is chosen.

# Management Node Log Directories
Log Directory   Description
--------------------------------------------
autodeploy      VMware vSphere Auto Deploy Waiter
content-library VMware Content Library Service
eam             VMware ESX Agent Manager
invsvc          VMware Inventory Service
mbcs            VMware Message Bus Config Service
netdump         VMware vSphere ESXi Dump Collector
perfcharts      VMware Performance Charts
vmcam           VMware vSphere Authentication Proxy
vmdird          VMware Directory Service daemon
vmware-sps      VMware vSphere Profile-Driven Storage Service
vmware-vpx      VMware VirtualCenter Server
vpostgres       vFabric Postgres database service
vcha            VMware High Availability Service

Virtual machine hardware

Virtual machine hardware versions (1003746)

https://kb.vmware.com/s/article/1003746

Some common symptoms

  1. A virtual machine does not power on.
  2. Some virtual machine operations are greyed out and unavailable.
  3. You experience unexpected behavior in a guest operating system.
# Resolution
1. Check the table below for VMware products and their virtual hardware versions.
2. Upgrade the virtual hardware:
    a. Power on the virtual machine.
    b. Install VMware Tools.
    c. Power off the virtual machine.
    d. Change the hardware compatibility setting to upgrade the virtual hardware version.
Product                 Virtual Hardware Version
------------------------------------------------
ESXi 7.0 U2 (7.0.2)     19
ESXi 7.0 U1 (7.0.1)     18
ESXi 7.0 (7.0.0)        17
ESXi 6.7 U2             15
ESXi 6.7                14
ESXi 6.5                13
ESXi 6.0                11
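
To check a VM's current hardware version from the ESXi Shell, one option is to read the virtualHW.version key in its .vmx file (the paths below are placeholders):

    grep virtualHW.version /vmfs/volumes/<datastore>/<vm-name>/<vm-name>.vmx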

vSphere 7.0 – Upgrade virtual VM hardware and VMware Tools

https://4sysops.com/archives/vsphere-7-0-upgrade-virtual-vm-hardware-and-vmware-tools/

The website provides some common upgrade scenarios.

How to properly change virtual SCSI controller for VMware PVSCSI

https://kb.vmware.com/s/article/1002149

https://www.vladan.fr/how-to-change-virtual-scsi-controler/

The following explains how to properly change a virtual machine's virtual SCSI controller to VMware PVSCSI.

1. Power off VM
2. Clone the VM as backup
3. Add a new disk (1GB in size). 
    Important: This disk needs to use the VMware Paravirtual SCSI controller
4. Power on VM
    VM should install the drivers automatically and recognise the disk
    Note: You might have to initialise, bring online and format the volume
5. After all drivers have been properly installed, restart the VM, then gracefully shut down the VM
6. Change the first controller to PVSCSI
7. Delete the newly added 1GB disk

Stopping, Starting or Restarting VMware vCenter Server Appliance 6.x and above services and vCenter Services

https://kb.vmware.com/s/article/2109887

In VMware vCenter Server 6.0 and later, VMware recommends using the vSphere Web Client or the service-control command-line tool to stop, start, or restart vCenter Server Appliance services.

Listing, starting, and stopping the vCenter Server Appliance services
# To list the vCenter Server Appliance services within the vSphere Web Client
1. Log in to the vSphere Web Client with a vCenter Single Sign-on administrator account.
2. Navigate to Administration > Deployment > System Configuration.
3. Click Nodes, select the vCenter Server Appliance node and click the Related Objects tab.

# To list the vCenter Server Appliance services using the command-line:
1. Log in as root through an SSH or console session on the vCenter Server Appliance.
2. Run this command to launch the shell
    shell
3. Change directories to /bin:
    cd /bin
4. List the vCenter Server Appliance services:
    service-control --list
5. View the current status of the vCenter Server Appliance services:
    service-control --status

# Start services
service-control --start <servicename>   # start a specific service
service-control --start --all           # start all services

# To stop a vCenter Server Appliance service using the vSphere Web Client
1. Log in to the vSphere Web Client with a vCenter Single Sign-on administrator account.
2. Navigate to Administration > Deployment > System Configuration.
3. Click Nodes, select the vCenter Server Appliance node and click the Related Objects tab.
4. Right-click on the service you would like to stop and select Stop.

# Stop services
service-control --stop <servicename>    # stop a specific service
service-control --stop --all            # stop all services

vCenter Server 7.x services

Service         Description
--------------------------------------------------
vmware-vmon     VMware Service Lifecycle Manager
vmonapi         VMware Service Lifecycle Manager API
vmafdd          VMware Authentication Framework
vmdird          VMware Directory Service
vmcad           VMware Certificate Service
lookupsvc       VMware Lookup Service
vmware-sca      VMware Service Control Agent
vmware-stsd     VMware Security Token Service
vmware-rhttpproxy   VMware HTTP Reverse Proxy
vmware-envoy    VMware Envoy Proxy
vmware-netdumper    VMware vSphere ESXi Dump Collector
vmware-vapi-endpoint    VMware vAPI Endpoint
vmware-vpxd-svcs    VMware vCenter-Services
vmware-perfcharts   VMware Performance Charts
applmgmt        VMware Appliance Management Service
vmware-statsmonitor     VMware Appliance Monitoring Service
vmware-cis-license  VMware License Service
vmware-vpostgres    VMware Postgres
vmware-postgres-archiver    VMware Postgres Archiver
vmware-vdtc     VMware vSphere Distributed Tracing Collector
vmware-vpxd     VMware vCenter Server
vmware-eam      VMware ESX Agent Manager
vmware-vsm      VMware vService Manager
vmware-sps      VMware vSphere Profile-Driven Storage Service
pschealth       VMware Platform Services Controller Health Monitor
vmware-rbd-watchdog     VMware vSphere Auto Deploy Waiter
vmware-content-library  VMware Content Library Service
vmware-imagebuilder     VMware Image Builder Manager
lwsmd           Likewise Service Manager
vmcam           VMware vSphere Authentication Proxy
vmware-vcha     VMware vCenter High Availability
vmware-updatemgr    VMware Update Manager
vmware-vsan-health  VMware vSAN Health Service
vsphere-ui      VMware vSphere Client
vmware-hvc      VMware Hybrid VC Service
vmware-trustmanagement  VMware Trust Management Service
vmware-certificatemanagement    VMware Certificate Management Service
vmware-certificateauthority     VMware Certificate Authority Service
vmware-pod      VMware Patching and Host Management Service
vlcm            VMware vCenter Lifecycle API
vmware-analytics    VMware Analytics Service
vmware-topologysvc  VMware Topology Service
vmware-infraprofile     VMware Infraprofile Service
wcp             Workload Control Plane
vtsdb           VMware vTsdb Service
vstats          VMware vStats Service
observability   VMware VCSA Observability Service
observability-vapi  VMware VCSA Observability VAPI Service

vCenter Server Appliance Services

Service Name    Description
---------------------------------------------
applmgmt        VMware Appliance Management Service
vmware-cis-license  VMware License Service
vmware-cm       VMware Component Manager
vmware-eam      VMware ESX Agent Manager
vmware-sts-idmd VMware Identity Management Service
vmware-invsvc   VMware Inventory Service
vmware-mbcs     VMware Message Bus Configuration Service
vmware-netdumper    VMware vSphere ESXi Dump Collector
vmware-perfcharts   VMware Performance Charts
vmware-rbd-watchdog VMware vSphere Auto Deploy Waiter
vmware-rhttpproxy   VMware HTTP Reverse Proxy
vmware-sca      VMware Service Control Agent
vmware-sps      VMware vSphere Profile-Driven Storage Service
vmware-stsd     VMware Security Token Service
vmware-syslog   VMware Common Logging Service
vmware-syslog-health    VMware Syslog Health Service
vmware-vapi-endpoint    VMware vAPI Endpoint
vmware-vdcs     VMware Content Library Service
vmafdd          VMware Authentication Framework
vmcad           VMware Certificate Service
vmdird          VMware Directory Service
vmware-vpostgres    VMware Postgres
vmware-vpx-workflow VMware vCenter Workflow Manager
vmware-vpxd     VMware vCenter Server
vmware-vsm      VMware vService Manager
vsphere-client  vSphere Web Client
vmware-vws      VMware System and Hardware Health Manager
vmware-vsan-health  VMware vSAN Health Service

Troubleshooting with Logs

The following logs are common to all vCenter Server deployments. Most logs are located in /var/log/ for vCenter Server deployments.

Common vCenter Server Logs

# Common to all vCenter Server deployments
Log Directory       Description
--------------------------------------------------------------
vmware/applmgmt     VMware Appliance Management Service
cloudvm             Logs for allotment and distribution of resources between services
firstboot           Location where first boot logs are stored
rhttpproxy          Reverse Web Proxy
sca                 VMware Service Control Agent
vmware/applmgmt     VMware Appliance Monitoring Service
vapi                VMware vAPI Endpoint
vmafdd              VMware Authentication Framework daemon
vmdird              VMware Directory Service daemon
vmon                VMware Service Lifecycle Manager

Management Node Logs

The following logs are available if a management node deployment is chosen.

Log Directory       Description
-----------------------------------------------------------
vmware/rbd          VMware vSphere Auto Deploy Waiter
content-library     VMware Content Library Service
eam                 VMware ESX Agent Manager
netdumper           VMware vSphere ESXi Dump Collector
perfcharts          VMware Performance Charts
vmcam               VMware vSphere Authentication Proxy
vmdird              VMware Directory Service daemon
vmware-sps          VMware vSphere Profile-Driven Storage Service
vpxd                VMware VirtualCenter Server
vpostgres           vFabric Postgres database service
vcha                VMware High Availability Service

vCenter Server Upgrade Fails When Unable to Stop Tomcat Service

A vCenter Server upgrade can fail when the installer is unable to stop the Tomcat service.

If the vCenter Server installer cannot stop the Tomcat service during an upgrade, the upgrade fails with an error message similar to Unable to delete VC Tomcat service. This problem can occur even if you stop the Tomcat service manually before the upgrade, if some files that are used by the Tomcat process are locked.

Solution
1. From the Windows Start menu, select Settings > Control Panel > Administrative Tools > Services.
2. Right-click VMware VirtualCenter Server and select Manual.
3. Right-click VMware vCenter Management Webservices and select Manual.
4. Reboot the vCenter Server machine before upgrading.

This releases any locked files that are used by the Tomcat process, and enables the vCenter Server installer to stop the Tomcat service for the upgrade.
Alternatively, you can restart the vCenter Server machine and restart the upgrade process, but select the option not to overwrite the vCenter Server data.

Troubleshooting vCenter Server and ESXi Host Certificates

Certificates are automatically generated when you install vCenter Server. These default certificates are not signed by a commercial certificate authority (CA) and might not provide strong security. You can replace default vCenter Server certificates with certificates signed by a commercial CA. When you replace vCenter Server and ESXi certificates, you might encounter errors.

New vCenter Server Certificate Does Not Appear to Load

When you install new vCenter Server certificates, you might not see the new certificate.

Cause: Existing open connections to vCenter Server are not forcibly closed and might still use the old certificate.

# To force all connections to use the new certificate, use one of the following methods.
1. Restart the network stack or network interfaces on the server.
2. Restart the vCenter Server service.
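
On the vCenter Server Appliance, the vCenter Server service can be restarted with the service-control tool covered earlier in this document:

    service-control --stop vmware-vpxd     # stop the vCenter Server service
    service-control --start vmware-vpxd    # start it again so it picks up the new certificate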

vCenter Server Cannot Connect to Managed Hosts

vCenter Server cannot connect to managed hosts after vCenter server certificates are replaced and the system is restarted.

Log into the host as the root user and reconnect the host to vCenter Server.

Troubleshooting ESXi Hosts

vCenter Server reports vSphere HA host states that indicate an error condition on the host. Such errors can prevent vSphere HA from fully protecting the virtual machines on the host and can impede vSphere HA's ability to restart virtual machines after a failure. Errors can occur when vSphere HA is being configured or unconfigured on a host or, more rarely, during normal operation. When this happens, you should determine how to resolve the error, so that vSphere HA is fully operational.

vSphere HA Agent Is in the Agent Unreachable State

vSphere HA reports that an agent is in the Agent Unreachable state when the agent for the host cannot be contacted by the primary host or by vCenter Server. Consequently, vSphere HA is not able to monitor the virtual machines on the host and might not restart them after a failure.

Cause

A vSphere HA agent can be in the Agent Unreachable state for several reasons. This condition most often indicates that a networking problem is preventing vCenter Server or the primary host from contacting the agent on the host, or that all hosts in the cluster have failed. This condition can also indicate the unlikely situation that vSphere HA was disabled and then re-enabled on the cluster while vCenter Server could not communicate with the vSphere HA agent on the host, or that the ESXi host agent on the host has failed, and the watchdog process was unable to restart it. In any of these cases, a failover event is not triggered when a host goes into the Unreachable state.

Solution

Determine if vCenter Server is reporting the host as not responding. If so, there is a networking problem, an ESXi host agent failure, or a total cluster failure. After the condition is resolved, vSphere HA should work correctly. If not, reconfigure vSphere HA on the host. Similarly, if vCenter Server reports the hosts are responding but a host's state is Agent Unreachable, reconfigure vSphere HA on that host.

vSphere HA Agent is in the Uninitialized State

vSphere HA reports that an agent is in the Uninitialized state when the agent for the host is unable to enter the run state and become the primary host or to connect to the primary host. Consequently, vSphere HA is not able to monitor the virtual machines on the host and might not restart them after a failure.

Cause

A vSphere HA agent can be in the Uninitialized state for one or more reasons. This condition most often indicates that the host does not have access to any datastores. Less frequently, this condition indicates that the host does not have access to its local datastore on which vSphere HA caches state information, the agent on the host is inaccessible, or the vSphere HA agent is unable to open required firewall ports. It is also possible that the ESXi host agent has stopped.

Solution

Search the list of the host's events for recent occurrences of the event vSphere HA Agent for the host has an error. This event indicates the reason for the host being in the uninitialized state. If the condition exists because of a datastore problem, resolve whatever is preventing the host from accessing the affected datastores. If the ESXi host agent has stopped, you must restart it. After the problem has been resolved, if the agent does not return to an operational state, reconfigure vSphere HA on the host.
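
If the ESXi host agent (hostd) is the component that has stopped, it can be restarted from the ESXi Shell:

    /etc/init.d/hostd restart    # restart the ESXi host agent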

vSphere HA Agent is in the Initialization Error State

vSphere HA reports that an agent is in the Initialization Error state when the last attempt to configure vSphere HA for the host failed. vSphere HA does not monitor the virtual machines on such a host and might not restart them after a failure.

Cause

This condition most often indicates that vCenter Server was unable to connect to the host while the vSphere HA agent was being installed or configured on the host. This condition might also indicate that the installation and configuration completed, but the agent did not become a primary host or a secondary host within a timeout period. Less frequently, the condition is an indication that there is insufficient disk space on the host's local datastore to install the agent, or that there are insufficient unreserved memory resources on the host for the agent resource pool.

Solution

When a Configure HA task fails, a reason for the failure is reported.

Reason for Failure          Action
--------------------------------------------------------------------------------------------------------------
Host communication errors   Resolve any communication problems with the host and retry the configuration operation.
Timeout errors              Possible causes include that the host crashed during the configuration task, the agent
                            failed to start after being installed, or the agent was unable to initialize itself after
                            starting up. Verify that vCenter Server is able to communicate with the host. If so, see
                            vSphere HA Agent Is in the Agent Unreachable State or vSphere HA Agent is in the
                            Uninitialized State for possible solutions.
Lack of resources           Free up approximately 75MB of disk space. If the failure is due to insufficient unreserved
                            memory, free up memory on the host by either relocating virtual machines to another host or
                            reducing their reservations. In either case, retry the vSphere HA configuration task after
                            resolving the problem.
Reboot pending              If an installation for a 5.0 or later host fails because a reboot is pending, reboot the
                            host and retry the vSphere HA configuration task.
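
To check how much free space is available on the host's volumes before retrying, run df from the ESXi Shell:

    df -h    # verify roughly 75MB or more is free on the local datastore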

vSphere HA Agent is in the Uninitialization Error State

The vSphere HA agent on a host is in the Uninitialization Error state. User intervention is required to resolve this situation.

vSphere HA reports that an agent is in the Uninitialization Error state when vCenter Server is unable to unconfigure the agent on the host during the Unconfigure HA task. An agent left in this state can interfere with the operation of the cluster.

Cause

This condition usually indicates that vCenter Server lost the connection to the host while the agent was being unconfigured.

Solution

Add the host back to vCenter Server (version 5.0 or later).

Unable to Download VIBs When Using vCenter Server Reverse Proxy

If vCenter Server is using a custom port for the reverse proxy, the custom port is not automatically enabled in the ESXi firewall and the VIB downloads fail.

# Solution
1. Open an SSH connection to the host and log in as root.
2. List the existing firewall rules.
    esxcli network firewall ruleset list
3. Back up the /etc/vmware/firewall/service.xml file.
    cp /etc/vmware/firewall/service.xml /etc/vmware/firewall/service.xml.bak
4. Edit the access permissions of the service.xml file to allow writes by running the chmod command.
    a. To allow writes, run chmod 644 /etc/vmware/firewall/service.xml.
    b. To toggle the sticky bit flag, run chmod +t /etc/vmware/firewall/service.xml.
5. Open the service.xml file in a text editor.
6. Add a new rule to the service.xml file that enables the custom port for the vCenter Server reverse proxy.
    <service id='id_value'>
      <id>vcenterrhttpproxy</id>
      <rule id='0000'>
        <direction>outbound</direction>
        <protocol>tcp</protocol>
        <port type='dst'>custom_reverse_proxy_port</port>
      </rule>
      <enabled>true</enabled>
      <required>false</required>
    </service>
    
    Where id_value must be a unique value, for example, if the last listed service in the service.xml file has ID 0040, you must enter id number 0041.

7. Revert the access permissions of the service.xml file to the default read-only setting.
    chmod 444 /etc/vmware/firewall/service.xml
8. Refresh the firewall rules for the changes to take effect.
    esxcli network firewall refresh
9. List the updated rule set to confirm the change.
    esxcli network firewall ruleset list
10. If you want the firewall configuration to persist after a reboot of the ESXi host,
    copy the service.xml onto persistent storage and modify the local.sh file.
    a. Copy the modified service.xml file onto persistent storage, for example /store/, or onto a VMFS volume, for example /vmfs/volumes/volume/.
        cp /etc/vmware/firewall/service.xml location_of_xml_file
    If you use a VMFS volume, you can store the file in a single location and copy it to multiple hosts.
    b. Add the service.xml file information to the local.sh file on the host.
        cp location_of_xml_file /etc/vmware/firewall
        esxcli network firewall refresh
    Where location_of_xml_file is the location to which the file was copied.
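
As a minimal sketch, assuming the file was copied to /store/service.xml, the lines added to /etc/rc.local.d/local.sh would look like this:

    # restore the custom firewall rule at boot (added before the final "exit 0")
    cp /store/service.xml /etc/vmware/firewall
    esxcli network firewall refresh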

Troubleshooting vCenter HA Environment

https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.avail.doc/GUID-CB537DD5-B6FD-4F32-9743-6B1AB88D9DA3.html

vCenter HA Clone Operation Fails During Deployment

The clone operation fails during deployment. If the vCenter HA configuration process does not create the clones successfully, you have to resolve the cloning error.

# Cause
1. Look for the clone exception. It might indicate one of the following problems.
    a. You have a DRS-enabled cluster, but do not have three hosts.
    b. The host or database connection is lost.
    c. Not enough disk space.
    d. Other Clone Virtual Machine errors
# Solution
1. Resolve the error that caused the problem.
2. Remove the cluster and start configuration again.

Redeploy the Passive or Witness node

If the passive or witness node fails and the vCenter HA cluster was configured using the automatic cloning method, you can redeploy it in the vCenter HA Settings page.

# Procedure
1. Log in to the Active node with the vSphere Client.
2. Select the vCenter Server object in the inventory and select the Configure tab.
3. Select vCenter HA under Settings.
4. Click on the REDEPLOY button next to the node to start the Redeploy wizard.
5. Specify the management vCenter Server, if needed.
    a. If your vCenter Server is managed by another vCenter Server in the same SSO domain, proceed to step 6.
    b. If your vCenter Server is managed by another vCenter Server in a different SSO domain, 
        input the location and credential details of that management vCenter Server: 
        enter the Management vCenter Server FQDN or IP address and Single Sign-On credentials.
6. Specify a unique name and target location.
7. Select the destination compute resource for the operation.
8. Select the datastore in which to store the configuration and disk files.
9. Configure the virtual machine networks.
    a. If you are redeploying the Passive node, select virtual machine Management (NIC 0) and vCenter HA (NIC 1) networks.
    b. If you are redeploying the Witness node, select vCenter HA (NIC 1) network.
    c. If there are issues with your selections, errors or compatibility warnings are displayed.
10. Review your selections and click Finish to redeploy the node.

Resolving Failover Failures

When a Passive node does not become the Active node during a failover, you can force the Passive node to become the Active node.

# Cause
A vCenter HA failover might not succeed for these reasons.
    a. The Witness node becomes unavailable while the Passive node is trying to assume the role of the Active node.
    b. A server state synchronization issue exists between the nodes.
# Solution
You recover from this issue as follows.
1. If the Active node recovers from the failure, it becomes the Active node again.
2. If the Witness node recovers from the failure, follow these steps.
    a. Log in to the Passive node through the Virtual Machine Console.
    b. To enable the Bash shell, enter shell at the appliancesh prompt.
    c. Run the following command.
        vcha-reset-primary
    d. Reboot the Passive node.
3. If both Active node and Witness node cannot recover, you can force the Passive node to become a standalone vCenter Server.
    a. Delete the Active node and Witness node virtual machines.
    b. Log in to the Passive node through the Virtual Machine Console.
    c. To enable the Bash shell, enter shell at the appliancesh prompt.
    d. Run the following command.
        vcha-destroy
    e. Reboot the Passive node.

Troubleshooting a Degraded vCenter HA Cluster

For a vCenter HA cluster to be healthy, each of the Active, Passive, and Witness nodes must be fully operational and be reachable over the vCenter HA cluster network. If any of the nodes fails, the cluster is considered to be in a degraded state.

# Cause
The cluster can be in a degraded state for a number of reasons:
1. One of the nodes fails
    a. If the Active node fails, a failover of the Active node to the Passive node occurs automatically. 
        After the failover, the Passive node becomes the Active node.
        At this point, the cluster is in a degraded state because the original Active node is unavailable.

        After the failed node is repaired or comes online, it becomes the new Passive node and the cluster returns to a healthy state
        after the Active and Passive nodes synchronize.

    b. If the Passive node fails, the Active node continues to function, but no failover is possible and the cluster is in a degraded state.
       If the Passive node is repaired or comes online, it automatically rejoins the cluster and the cluster state 
       is healthy after the Active and Passive nodes synchronize.

    c. If the Witness node fails, the Active node continues to function and replication between Active and Passive node continues, 
       but no failover can occur.
       If the Witness node is repaired or comes online, it automatically rejoins the cluster and the cluster state is healthy.

2. Database replication fails
    If replication fails between the Active and Passive nodes, the cluster is considered degraded. 
    The Active node continues to synchronize with the Passive node. If it succeeds, the cluster returns to a healthy state. 
    This state can result from network bandwidth problems or other resource shortages.

3. Configuration file replication issues
    If configuration files are not properly replicated between the Active and Passive nodes, the cluster is in a degraded state. 
    The Active node continues to attempt synchronization with the Passive node. 
    This state can result from network bandwidth problems or other resource shortages.

# Solution
How you recover depends on the cause of the degraded cluster state. 
If the cluster is in a degraded state, events, alarms, and SNMP traps show errors.

If one of the nodes is down, check for hardware failure or network isolation. Check whether the failed node is powered on.
In case of replication failures, check if the vCenter HA network has sufficient bandwidth and ensure network latency is 10 ms or less.
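
A quick latency sanity check from one node's Bash shell over the vCenter HA network (the peer address is an example):

    ping -c 5 <peer-node-vcha-ip>    # round-trip times should be 10 ms or less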

Recovering from Isolated vCenter HA Nodes

If all nodes in a vCenter HA cluster cannot communicate with each other, the Active node stops serving client requests.

# Problem
Node isolation is a network connectivity problem.

# Solution
1. Attempt to resolve the connectivity problem. If you can restore connectivity, 
   isolated nodes rejoin the cluster automatically and the Active node starts serving client requests.
2. If you cannot resolve the connectivity problem, you have to log in to the Active node's console directly.
    a. Power off and delete the Passive node and the Witness node virtual machines.
    b. Log in to the Active node by using SSH or through the Virtual Machine Console.
    c. To enable the Bash shell, enter shell at the appliancesh prompt.
    d. Run the following command to remove the vCenter HA configuration.
        vcha-destroy -f
    e. Reboot the Active node.
        The Active node is now a standalone vCenter Server.
    f. Perform the vCenter HA cluster configuration again.

VMware vCenter HA Alarms and Events

https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.avail.doc/GUID-1A52926C-030E-4B72-946A-8292566CCEB8.html

If a vCenter HA cluster is in a degraded state, alarms and events show errors.

Configuring vSwitch or vNetwork Distributed Switch from the command line in ESXi/ESX

Symptoms

  1. You are unable to connect to an ESXi/ESX host via the network because uplinks (vmnics) have changed or are not in the correct order.
  2. The Primary Service Console/VMkernel management interface is moved from a standard switch to a distributed switch on a non-routable network.
  3. The VLAN for the Management Network is changed or configured incorrectly on the uplink switch port.

Verify vSwitch Configuration

# View the current vSwitch configuration and vmkernel interface configuration using these commands:
esxcli network vswitch standard list    # list current vswitch configuration
esxcli network vswitch dvs vmware list  # list Distributed Switch configuration
esxcli network ip interface list        # list vmkernel interfaces and their configuration
esxcli network nic list                 # display listing of physical adapters and their link state

Add or remove network cards (known as vmnics) - standard vSwitch

# Add or remove network cards (known as vmnics) to or from a Standard vSwitch using these commands:
esxcli network vswitch standard uplink remove --uplink-name=vmnic --vswitch-name=vSwitch    # unlink/remove an uplink
esxcli network vswitch standard uplink add --uplink-name=vmnic --vswitch-name=vSwitch       # add an uplink

Add or remove network cards (known as vmnics) - vNetwork Distributed Switch (vDS)

esxcfg-vswitch -Q vmnic -V dvPort_ID_of_vmnic dvSwitch  # unlink/remove a vDS uplink
esxcfg-vswitch -P vmnic -V unused_dvPort_ID dvSwitch    # add a vDS uplink

If connectivity was lost when migrating management networking to a Distributed Switch, it may be necessary to remove or disable the existing management vmkernel interface and recreate it in a Standard vSwitch port group with the same IP configuration.

On a vSphere Distributed Switch (vDS), delete an existing VMkernel port using this command:

esxcli network ip interface remove --interface-name=vmkX

Note: The vmk interface number used for management can be determined by running command
    esxcli network ip interface list

After the unreachable vmkernel port has been removed, it can be recreated on a Standard Switch.

If an existing Standard Switch does not exist, you can create a new one as well as a port-group to use with these commands:

esxcli network vswitch standard add --vswitch-name=vSwitch
esxcli network vswitch standard portgroup add --portgroup-name=portgroup --vswitch-name=vSwitch

Note: When creating a virtual switch, there are no linked vmnics by default. You will need to link vmnics as mentioned above

To create a VMkernel port and attach it to a portgroup on a Standard vSwitch, run these commands:

esxcli network ip interface add --interface-name=vmkX --portgroup-name=portgroup
esxcli network ip interface ipv4 set --interface-name=vmkX --ipv4=ipaddress --netmask=netmask --type=static

Note: By default on ESXi, the management vmkernel port is vmk0 and resides in a Standard Switch portgroup called Management Network.

If the vmnics associated with the management network are VLAN trunks, you may need to specify a VLAN ID for the management portgroup.

# To set or correct the VLAN ID required for management connectivity on a Standard vSwitch, run this command:
esxcli network vswitch standard portgroup set -p portgroup --vlan-id VLAN

It may be necessary to restart the host's management agents if network connectivity is not restored despite a correct configuration:

services.sh restart

Verify ESXi and vSAN datastore

ls /vmfs/volumes    # list all the datastores
cd /vmfs/volumes/vsan:<datastore-id>

Exit ESXi host maintenance mode via ssh

VMware vim-cmd Quick Tutorial

# ssh to the ESXi host
vim-cmd hostsvc/maintenance_mode_exit

Useful ESXi esxcli commands

# Verify ESXi host storage core adapter and network adapter
esxcli storage core adapter list
esxcli network nic list

# Restart ESXi host
esxcli system shutdown reboot --reason "Enter the reason"

ESXi VIB update

# Verify system in maintenance mode
esxcli system maintenanceMode get
    a. "Enabled"     # host is in maintenance mode
    b. "Disabled"    # host is in normal operation

# Method - remove existing and install new VIB
esxcli software vib list    # list all installed software VIBs, and identify the required VIB to be uninstalled
esxcli software vib remove -d "/vmfs/volumes/<path>/<vib-filename>.zip"    # Remove the existing VIB
esxcli software vib install -d "/tmp/<vib.zip>"   # Install the VIB file, if VIB file in tmp directory
esxcli software vib list --rebooting-image  # Check the newly installed VIB in the rebooting image

esxcli software vib remove -n <name>
    esxcli software vib remove -n scsi-bnx2fc

Note:
Open another ssh console, run
    tail -f /var/log/esxupdate.log  # monitoring the VIB installation

# Method - update VIB without need to remove it first
esxcli software vib update -d "/vmfs/volumes/<path>/<vib-filename>.zip"

# Reboot after VIB update
esxcli system shutdown poweroff --reason "system maintenance"   # shut down the ESXi host
esxcli system shutdown reboot --reason "system reboot"

How to backup and restore ESXi system

This process will back up the ESXi host configuration, then manually install the ESXi host with an updated version and restore the configuration.

1. On the management server, run vicfg-cfgbackup.pl to back up the ESXi host configuration
vicfg-cfgbackup.pl --server=<esxi-FQDN|IP-address>
    --username=root --password=<root-credential>
    -s "d:\backup\esxibackup\<esxi-FQDN>.tgz"
    Note:
    Use the "-s" switch to back up the configuration
2. Format the ESXi host USB or boot disk if required
3. Install fresh ESXi 6.7u3     # If upgrading from 6.7u1: after a successful upgrade and restore, upgrade to 7.0u1, then 7.0u3
    # Install as a dummy install, without DNS IP and hostname
4. Put ESXi host in maintenance mode
5. Restore the newly installed ESXi with the backup configuration
    vicfg-cfgbackup.pl --server=<esxi-FQDN|IP-address>
        --username=root --password=<root-credential>
        -l "d:\backup\esxibackup\<esxi-FQDN>.tgz"
    Note:
    Use the "-l" switch to restore the configuration
6. Verify ESXi host in vCenter and vSAN

How to enable ESXi Shell access using the Direct Console User Interface (DCUI)

https://kb.vmware.com/s/article/2004746

# Enabling ESXi Shell access using the vSphere Client
1. Log in to a vCenter Server system using the vSphere Client
2. Select the host in the inventory panel
3. Click the Configuration tab -> Security Profile
4. In the Services section -> Properties
5. Select ESXi Shell from the list of services
    a. ESXi Shell
    b. SSH
    c. Direct Console UI
6. Click Options and select "Start and stop manually"
    Note: When you select Start and stop manually, the service does not start when you reboot the host. 
        If you want the service to start when you reboot the host, select Start and stop with host.
7. Click Start to enable the service
8. Click OK

# Use the Host Client to enable local and remote access to the ESXi Shell
1. Log in to the Host Client using the IP address of the host in a browser
2. Click Manage under the Navigator section
3. Click the Services tab
4. In the Services section, select TSM from the list
5. Click Actions and select Start to enable the ESXi Shell.

# Use the direct console user interface to enable the ESXi Shell:
1. From the Direct Console User Interface, 
    press F2 to access the System Customization menu
2. Select Troubleshooting Options and press Enter
3. From the Troubleshooting Mode Options menu, 
    select Enable ESXi Shell

How to access ESXi Shell

# Accessing the local ESXi Shell
1. If you have direct access to the host, 
    press Alt+F1 to open the login page on the machine's physical console.
2. Enter credentials when prompted
Note: To return to the Direct Console User Interface, press Alt+F2

# Accessing the remote ESXi Shell
1. Open an SSH client, such as PuTTY
2. Specify the IP address or domain name of the ESXi host, using TCP port 22
3. Enter credentials when prompted

How to update software profile

# SSH to the host, or run the command locally in the ESXi Shell
esxcli software profile update -p <Vendor-ESXi> -d "/vmfs/volumes/<path>/VMware_ESXi_version_<vendor-version>.zip"
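
If you are unsure of the exact profile name to pass with -p, you can first list the profiles contained in the depot:

    esxcli software sources profile list -d "/vmfs/volumes/<path>/VMware_ESXi_version_<vendor-version>.zip"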

Challenges when the ESXi host boot USB is smaller than 32GB

When you try to upgrade or install additional VIBs, you may hit the 250MB RAM disk size limit because there is not enough free space in the RAM disk. The error reports that 250MB of disk space is required but only 238MB is free.
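
To see current RAM disk usage and free space from the ESXi Shell:

    vdf -h    # shows tardisk and RAM disk usage, including free space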

#*** Method 1 - Using a vendor customized ISO to rebuild the ESXi host
Note: Building from a USB key directly connected to the ESXi host is much faster than from a mounted ISO
# Preparation
1. Obtain a physical host with the same specification as the production ESXi host, or at least a very close version
2. Directly connect the build ESXi host to a build desktop, such as a Windows 10 desktop
3. The rebuild desktop and the rebuild ESXi host will be used to build and restore the ESXi host configuration

# Result
After the configuration restore, the ESXi host will have the required configuration.
It will retain the vSAN disk group configuration.

# Steps
1. Back up the ESXi host configuration from the management server using PowerCLI, such as
    Get-VMHostFirmware -VMHost <ESXi-host-FQDN> -BackupConfiguration -DestinationPath "D:\ESXi-Backups"
2. Install a fresh copy of the original build (such as HPE, Dell) to the physical ESXi host's dual boot USB
Note: The version needs to be the same as the running (backed-up) version of the ESXi host
    a. Plug dual boot USB into ESXi host internal USB slot
    b. Plug the HPE bootable USB at the front of the ESXi host
    c. Boot from the HPE bootable USB, and install ESXi to the internal USB
3. Configure the newly installed ESXi host
    a. Access DCUI and enable Shell and SSH access
        i. From the Direct Console User Interface, 
            press F2 to access the System Customization menu
        ii. Select Troubleshooting Options and press Enter
        iii. From the Troubleshooting Mode Options menu, 
            select Enable ESXi Shell
            Select Enable SSH
    b. Assign an IP address to the ESXi host, on the same subnet as the rebuild desktop
4. Use SCP or WinSCP to copy the backup configuration file to the ESXi host /tmp directory
5. Rename the backup configuration file
    Rename the configBundle-<ESXi-FQDN>.tgz file to configBundle.tgz    # The name needs to be exactly configBundle.tgz
6. In the host's ESXi Shell (logged in locally, or via SSH from the rebuild desktop), run these commands
    i. Place the host into maintenance mode
        vim-cmd hostsvc/maintenance_mode_enter
    ii. Restore rebuild ESXi host with target ESXi host configuration
        vim-cmd hostsvc/firmware/restore_config 1 /tmp/configBundle.tgz

Note:
    a. The host will be rebooted after a few seconds
    b. If using SSH, you need to delete the .ssh directory from the rebuild desktop.
        The .ssh directory contains the ESXi host certificate thumbprint
        c:\users\<login-user>\.ssh 
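
If the Windows OpenSSH client is used, an alternative to deleting the whole .ssh directory is removing just the stale host key entry (host address is an example):

    ssh-keygen -R <ESXi-host-ip>    # remove the cached host key for the rebuilt host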
7. Plug the required update version of the bootable vendor ESXi USB into the front of the rebuild ESXi host,
    then boot from the front USB
8. Select Upgrade to upgrade the internal USB (newly installed and restored ESXi host)
9. Reboot the rebuild ESXi host
    a. Verify host has no error
    b. Verify TCP/IP configuration
    c. Verify the root login credential as the restored ESXi host
10. Place the ESXi host into maintenance mode, by running command
    vim-cmd hostsvc/maintenance_mode_enter
11. Shut down the ESXi host gracefully
    esxcli system shutdown poweroff --reason "maintenance"
    OR, press F2 to shutdown
12. Remove the rebuild USB key (internal USB), and replace the target ESXi host's boot USB with the rebuilt USB
13. Verify the target ESXi host for any issue
14. Take the target ESXi host out of maintenance
15. Disable ESXi shell and SSH

#*** Method 2 - Build from the VMware vanilla ISO
1. Install the build ESXi host with the VMware vanilla ISO; create the bootable USB using the VMware vanilla ISO
    such as VMware-VMvisor-Installer-201912001-15160138.x86_64.iso
2. Install the required VIB and drivers
    a. Install NVIDIA vib
    esxcli software vib install -v /vmfs/volumes/temp/NVIDIA_bootbank_NVIDIA-VMware_ESXi_6.5_Host_Driver_440.107-1OEM.650.0.0.4598673.vib
    b. Install VMware tools
    esxcli software vib update -v /vmfs/volumes/temp/VMware_locker_tools-light_11.3.5.18557794-18558696.vib
    c. Install other drivers
    esxcli software vib update -d /vmfs/volumes/temp/VMW-ESX-6.7.0-nhpsa-2.0.44-offline_bundle-14136205.zip
    esxcli software vib update -d /vmfs/volumes/temp/ESXi670-202103001.zip
    esxcli software vib install -d /vmfs/volumes/temp/esxi6.7uX-mgmt-bundle-gen9.3.8.0.12-1.zip
    esxcli software vib install -v /vmfs/volumes/temp/HPE_bootbank_hpessacli-4.21.7.0-6.7.0.7535516.hpe.vib
3. Reboot
    Verify and ensure the ESXi host has no issue
4. Restore the target ESXi host configuration backup file to the build ESXi host
    vim-cmd hostsvc/firmware/restore_config 1 /tmp/configBundle.tgz
    Note:
        The target ESXi host and the rebuild ESXi host must be on the same ESXi version and build number.
5. After reboot, configure the ESXi host for NVIDIA GPU Shared Direct
    Update the Graphics Hardware Type
        Shared -> Shared Direct
    Note: Requirement for NVIDIA GPU    

# How to ssh to ESXi host from Windows 10
On the Windows 10 desktop, SSH to the ESXi host
    ssh <ESXi-host-ip> -l  root

# SSH to the ESXi host and verify the disk space and installed VIBs
    df
    esxcli software vib list

    Note:
    Need to delete the .ssh directory from the directly connected rebuild Windows 10 desktop  

VMware Useful Admin Tools

https://blogs.vmware.com/virtualblocks/2019/03/16/top-10-vmware-admin-tools/

1. RVTools
2. PowerCLI
3. vCheck
4. ESXTOP
5. Cross vCenter Workload Migration Utility

How to migrate a VMkernel adapter from a standard switch to distributed switch port groups

How to migrate service console / VMkernel port from standard switches to VMware vSphere Distributed Switch (1010614)

https://kb.vmware.com/s/article/1010614

Pre-requisites:

  1. Ensure that the dvPortGroup(s) used for the vmkernel or service console are configured to match the existing port group configuration on the standard switch (VLAN, Nic Teaming).
  2. When a new uplink is in use for the dVswitch, ensure you have network connectivity to the uplink with the same or relevant VLANs trunked through. Compare the configuration of the new uplink with the existing uplink of standard switch to ensure they are configured identically on the physical switch.
  3. When you are migrating the Management vmkernel or Service Console, it is best practice to have access to the remote console of the ESXi host, in case the host becomes unresponsive after the migration.

# To migrate one or more VMkernel or Service Console interfaces of one or more ESXi hosts from vCenter Server:
1. Navigate to Home > Inventory > Networking
2. Right-click the dVswitch
        that you have created or want to migrate to
3. If the host is already added to the dVswitch, click Manage Hosts;
        otherwise, click Add Host
4. Select the host(s), click Next
5. Select the physical adapters (vmnic, such as vmnicX) to use for the vmkernel, click Next
6. Select the virtual adapter (vmk, such as vmkX) to migrate and click the Destination port group field. 
        For each adapter, select the correct port group from the dropdown, click Next
7. Click Next to omit virtual machine networking migration
8. Click Finish after reviewing the new vmkernel and Uplink assignment.
9. The wizard completes the job, moving both the vmk interface (vmk0, etc.) and the vmnic (vmnic1, etc.) to the dVswitch.

# To migrate the service console or vmkernel of one host from vCenter

1. Click Host > Configuration > Networking
    Note: This is not a cluster wide option
2. Click the Distributed Virtual Switch view
3. Click Manage Virtual Adapters
4. Click Add
5. Select Migrate existing virtual network adapters
6. Click Next
7. Select the adapters you want to migrate
8. Select dvPortGroup from the dropdown
9. Click Next
10. Click Finish

Note:
    The migration does not interrupt traffic if the new Port Group and Uplinks are pre-configured properly.

How to migrate a vmkernel adapter to a port group in a distributed switch
1. In vCenter console, navigate to and expand the required datacenter -> cluster
2. Select the ESXi host, and select Configure -> Networking -> Virtual switches
3. On the right section, select "..." right next to "Manage Physical Adapter", and 
    select "Migrate Networking"
4. In the Migrate Networking window, select "Manage VMkernel adapter" from the left options, then
5. Under "On other switches/unclaimed", type and select vmkx (such as vmk0, vmk1, etc.)
6. Click "Assign port group"
7. In "Select Network" window
    a. under Name, type <port group name>, and select the port group name,
        such as pg-vds-pdc
    b. under Distributed Switch, type and select the pre-configured distributed switch
        such as vds-pdc
    Click OK, to close "Select Network" window
8. Back in the Migrate Networking window, verify and click Next
        Example
            Host/VMkernel Network Adapters  In Use by Switch    Source Port Group      Destination Port Group
            --------------------------------------------------------------------------------------------------
            vmk0 (Reassigned)               vSwitch0            Management             pg-vds-pdc
9. On Migrate VM Network option
    Select or omit any VM for migration
10. On Ready to complete option, verify and click Finish

How to add ESXi hosts to the distributed switches

1. After creating the distributed switch in vCenter, select Networking and choose the required distributed switch
2. Right-click and select "Add and Manage Hosts"
3. In Add and Manage Hosts window
    a. Add hosts    <-------- Select "Add hosts"
        Add new hosts to this distributed switch
    b. Manage host networking
        Manage networking of hosts attached to this distributed switch
    c. Remove hosts
        Remove hosts from this distributed switch
4. In Select hosts, select the existing ESXi host(s), 
    or click + New hosts to add new ESXi host(s)
5. In Manage physical adapters, select required physical vmnicx adapter
    Click Assign uplink
6. On Select an Uplink | vmnicx Window, select required uplink
    Uplink(x)
    Note:
        Select "Apply this uplink assignment to the rest of the hosts", if 
            want to configure all the remaining hosts
7. Verify that the following have been populated with the required/selected vmnicx, uplinkx, and uplink port group
    a. Host/Physical Network Adapters
    b. In Use by Switch
    c. Uplink
    d. Uplink Port Group
8. In Manage VMkernel adapter, verify
    a. Host/VMkernel Network Adapters
    b. In Use by Switch
    c. Source Port Group
    d. Destination Port Group    
    Note
        If migrating an existing vmk(x) from a standard switch to the distributed switch, then
        a. Host/VMkernel Network Adapters - vmk0 (Reassigned)
        b. In Use by Switch - vSwitch0
        c. Source Port Group - Management
        d. Destination Port Group - pg-mgmt-vds-pdc     # example
9. In Migrate VM networking
    Select any VM that will be migrated at the same time, 
        click "Assign port group" and select the required destination port group
10. Confirm and click Finish

How to remove the management network vmkernel adapter and re-create it

https://thevirtualist.org/remove-re-create-management-network-vmkernel-interface-using-esxi-command-line/

Confirm by checking the physical MAC address of all physical NICs and their link status, the existing vSwitch configuration, and the current vmkernel interfaces configuration. For ESXi, use these commands:

esxcfg-nics -l
esxcfg-vswitch -l
esxcfg-vmknic -l

# Prerequisite
Record the vmk0 TCP/IP configuration
    a. IP address
    b. Network Mask
    c. Default Gateway
    d. switch name and port group name, and port ID
    e. vLAN ID
# Method 1
1. Verify vmk0 port ID that is used
    esxcfg-vswitch -l | grep vmk0   # note down port ID number
2. Remove vmk0
    esxcli network ip interface remove --interface-name=vmk0
3. Recreate vmk0
    esxcli network ip interface add --interface-name=vmk0 --dvs-name=DVSWITCHNAME --dvport-id=PORT_ID_FROM_STEP_ONE
4. Configure TCP/IP
    esxcli network ip interface ipv4 set --interface-name=vmk0 --ipv4=IP --netmask=NETMASK --type=static
5. Set default gateway
    esxcfg-route -a default <vmk0-ip-gateway>
6. Mark vmk0 for Management traffic
    esxcli network ip interface tag add -i vmk0 -t Management

# Alternative
1. Remove vmk0
    esxcfg-vmknic -d -p Management      # "Management" portgroup name
2. Add a vmknic to a port group
    esxcfg-vmknic -a -i <vmk0-ip> -n <netmask> Management   # using the "Management" portgroup name
3. Mark vmk0 for Management 
    esxcli network ip interface tag add -i vmk0 -t Management

# Method 2 - Remove vmk0 and recreate it on a new standard vSwitch
https://gist.github.com/DevoKun/bfb5898da72050c95c5515755a4b5780

1. List the existing configuration and unlink the vmnic from the vDS
    esxcfg-vswitch -l
    esxcli network vswitch dvs vmware list 
    esxcfg-vswitch -Q vmnic0 -V ${DVS_ID_FROM_ABOVE} ${DVS_NAME}    # unlink vmnic0 from the vDS
2. Remove vmk0
    esxcli network ip interface remove --interface-name=vmk0
3. Create a new vSwitch
    esxcfg-vswitch -l
    esxcfg-vswitch -a vSwitch0

    esxcfg-vswitch -l
    esxcfg-vswitch -L vmnic0 vSwitch0
    esxcli network nic list
    esxcfg-vswitch -l
4. Create the Management port group
    esxcfg-vswitch -A Management vSwitch0
    esxcfg-vswitch -l
5. Create vmk0 and assign to the Management port group
    esxcli network ip interface list
    esxcli network ip interface add --interface-name=vmk0 --portgroup-name=Management
    esxcli network ip interface list
6. Configure TCP/IP on vmk0
    esxcli network ip interface ipv4 set \
    --interface-name=vmk0 \
    --ipv4=<vmk0-ip> \
    --netmask=<vmk0-netmask> \
    --type=static

    esxcfg-route -a default <vmk0-default-gateway>
7. Tag vmk0 for use as the Management interface
    esxcli network ip interface tag add -i vmk0 -t Management

In ESXi 5.x, most of the legacy commands used in 4.x will continue to work. VMware recommends using their esxcli equivalents where possible as legacy esxcfg commands will be deprecated in a future release.

# Method 3 - commands and information to restore management network connectivity via the correct vmnic interface
Configuring vSwitch or vNetwork Distributed Switch from the command line in ESXi/ESX (1008127)
https://kb.vmware.com/s/article/1008127

# In ESXi 5.x and 6.x
1. View the current vSwitch configuration and vmkernel interface configuration using these commands:
    esxcli network vswitch standard list    # list current vswitch configuration
    esxcli network vswitch dvs vmware list  # list Distributed Switch configuration
    esxcli network ip interface list        # list vmkernel interfaces and their configuration
    esxcli network nic list                 # display listing of physical adapters and their link state

2. Add or remove network cards (known as vmnics) to or from a Standard vSwitch using these commands:
    esxcli network vswitch standard uplink remove --uplink-name=vmnic --vswitch-name=vSwitch    # unlink an uplink
    esxcli network vswitch standard uplink add --uplink-name=vmnic --vswitch-name=vSwitch       # add an uplink

3. Add or remove network cards (known as vmnics) to or from a vNetwork Distributed Switch (vDS) using these commands:
    esxcfg-vswitch -Q vmnic -V dvPort_ID_of_vmnic dvSwitch # unlink/remove a vDS uplink
    esxcfg-vswitch -P vmnic -V unused_dvPort_ID dvSwitch # add a vDS uplink
    Note:
    If connectivity was lost when migrating management networking to a Distributed Switch, 
        it may be necessary to remove or disable the existing management vmkernel interface and 
        recreate it in a Standard vSwitch port group with the same IP configuration.

4. On a vSphere Distributed Switch (vDS), delete an existing VMkernel port using this command:
    esxcli network ip interface remove --interface-name=vmkX
        Note: 
        The vmk interface number used for management can be determined by running the command
            esxcli network ip interface list

After the unreachable vmkernel port has been removed, it can be recreated on a Standard Switch.

5. If an existing Standard Switch does not exist, 
        you can create a new one as well as a port-group to use with these commands:
    esxcli network vswitch standard add --vswitch-name=vSwitch0
    esxcli network vswitch standard portgroup add --portgroup-name=portgroup --vswitch-name=vSwitch0
        Note: 
        When creating a virtual switch, there are no linked vmnics by default. 
        You will need to link vmnics as described earlier in this article.

6. To create a VMkernel port and attach it to a portgroup on a Standard vSwitch, run these commands:
    esxcli network ip interface add --interface-name=vmkX --portgroup-name=portgroup
    esxcli network ip interface ipv4 set --interface-name=vmkX --ipv4=ipaddress --netmask=netmask --type=static

Note: By default on ESXi, the management vmkernel port is vmk0 and resides in a Standard Switch portgroup called Management Network.

7. If the vmnics associated with the management network are VLAN trunks, you may need to specify a VLAN ID for the management portgroup. 
To set or correct the VLAN ID required for management connectivity on a Standard vSwitch, run this command:
    esxcli network vswitch standard portgroup set -p portgroup --vlan-id VLAN
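
For example, assuming the management portgroup is named Management Network and the required VLAN is 100 (both values are examples only):
    esxcli network vswitch standard portgroup set -p 'Management Network' --vlan-id 100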

8. It may be necessary to restart the host's management agents if network connectivity is not restored despite a correct configuration:
    services.sh restart

How to troubleshoot vMotion

https://kb.vmware.com/s/article/65184

https://kb.vmware.com/s/article/2020669

https://kb.vmware.com/s/article/1030264

# Procedure
1. Identify which vmknics are used for vMotion on both hosts.
Option 1 - Using the vCenter Server User Interface (UI)
    a. Select the ESXi host
    b. In the right pane, select Configure -> Networking -> VMkernel adapters
        Verify which VMkernel adapter has vMotion enabled

Option 2 - Using the ESXi command line
    esxcli network ip interface list    # list the vmkernel adapters
    esxcli network ip interface tag get -i [VMkernel adapter name]
        # Example
            esxcli network ip interface tag get -i vmk0
            esxcli network ip interface tag get -i vmk1
        Verify that the output includes
            Tags: VMotion

2. Run diagnostic tools between source and destination vMotion VMkernel adapters
    vmkping
    vmkping -I vmkX <dest-esxi-host-ip|hostname>
Note
    a. vmkping uses a VMkernel TCP/IP stack to send ICMP traffic to a destination host.
    b. With long-distance vMotion, the maximum supported RTT is 150 milliseconds.
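
    A minimal example, assuming the vMotion interface is vmk1 and the peer host is 192.168.10.20 (both are examples only):
        vmkping -I vmk1 192.168.10.20
        # If the hosts use the dedicated vMotion TCP/IP stack, select the netstack as well:
        vmkping -I vmk1 -S vmotion 192.168.10.20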

    nc is the netcat utility and can be used to verify connectivity on a specific remote port.

3. Run the nc command to verify vMotion TCP port 8000
    a. From the source ESXi host, run this command:
        nc -z <dest-host-ip> 8000 -v    
            # -v  verbose
    b. From the destination ESXi host, run this command:
        nc -z <source-host-ip> 8000 -v
    c. Sample successful output:
        Connection to 192.168.0.2 8000 port [tcp/*] succeeded!
    d. Sample failed output:
        Connection to 192.168.0.2 8000 port [tcp/*] failed: connection refused

Troubleshooting Network and TCP/UDP Port Connectivity Issues

https://kb.vmware.com/s/article/2020669

This KB article covers these troubleshooting tools:

# Troubleshoot network connectivity between two servers
    ping/vmkping

# Troubleshoot TCP port connectivity
    netcat (nc)

# Troubleshoot SSL port connectivity and verify SSL certificate information
    openssl

# Collect packet traces to troubleshoot network issues
    tcpdump-uw & pktcap-uw

# View active TCP/UDP connections to the host
    netstat & esxcli network

vmkping
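
Common vmkping usage (the jumbo-frame check below is standard practice, not specific to this KB):
    vmkping <dest-ip>               # ping via the default VMkernel TCP/IP stack
    vmkping -I vmkX <dest-ip>       # source the ping from a specific VMkernel interface
    vmkping -d -s 8972 <dest-ip>    # jumbo frame test: 8972-byte payload with don't-fragment set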

netcat (nc)

Use netcat (nc) to confirm connectivity to a TCP port on a remote host.

# Get help
    nc -h

# Syntax
    nc -z <destination-ip> <destination-port>

The nc command can also be used to check the connectivity to a range of TCP ports on a remote host.

# Syntax - example
# -w 1      wait/timeout of 1 second per connection
# 20-81     example port range
    nc -w 1 -z 192.168.48.133 20-81
    nc -z 192.168.48.133 80 443 9443    # test a few different ports

Testing SSL port connectivity and certificate information with openssl

To test SSL ports, you can use the openssl command to test connectivity and also to confirm the current SSL information.

# Syntax
    openssl s_client -connect destination-ip:ssl-port

Example
    openssl s_client -connect 192.168.48.133:443
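
To inspect just the certificate validity dates, the s_client output can be piped to openssl x509 (standard openssl usage; the address below is the same example IP):
    echo | openssl s_client -connect 192.168.48.133:443 2>/dev/null | openssl x509 -noout -dates
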
tcpdump-uw and pktcap-uw

tcpdump-uw can be used on ESXi hosts to capture packet traces from a vmkernel (vmk) interface.

To display packets on the vmkernel interface vmk0, use the tcpdump-uw command with the -i option

    tcpdump-uw -i vmk0

To capture the entire packet, use the tcpdump-uw command with the -s option with a value of 1514 for normal traffic and 9014 if Jumbo Frames are enabled

# Normal traffic
    tcpdump-uw -i vmk0 -s 1514

# Jumbo Frames enabled
    tcpdump-uw -i vmk0 -s 9014 -B 9
# To display all of the packets on vmk0 with verbose detail, use the -vvv option
    tcpdump-uw -i vmk0 -s 1514 -vvv

# To display only the TCP packets, use the tcp filter expression
    tcpdump-uw -i vmk0 -s 1514 tcp

# To see traffic to/from only a single IP address, use the host filter expression
    tcpdump-uw -i vmk0 -s 1514 host x.x.x.x

# To avoid seeing unwanted traffic types in the tcpdump-uw output, use the not operator.
    tcpdump-uw -i vmk0 -s 1514 port not 22 and port not 53

To limit the capture files to a specified number, use the -W option together with -C. This is useful when a trace must be left running for a long period of time while waiting for an event to occur.

# This command creates up to 10 rotating trace files of 100 MB each
# Note: Make sure the host does not run out of disk space
    tcpdump-uw -i vmk0 -s 1514 -C 100M -W 10 -w /var/tmp/test.pcap

Using the pktcap-uw tool in ESXi 5.5 and later (2051814)

https://www.virten.net/2015/10/esxi-network-troubleshooting-with-tcpdump-uw-and-pktcap-uw/

The pktcap-uw tool is an enhanced packet capture and analysis tool that can be used in place of the legacy tcpdump-uw tool. The pktcap-uw tool is included by default in ESXi 5.5 and later versions.

# Get help
    pktcap-uw -h |more

# View a live capture of a VMkernel port's traffic
    pktcap-uw --vmk vmkX    # Example  pktcap-uw --vmk vmk1

# To view a live capture of a specific physical network card (vmnic) on the host:
    pktcap-uw --uplink vmnicX
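
Either capture point can also be written to a pcap file for offline analysis using the -o option (the same option is used in the Advanced Usage section below; vmnic0 and the path are examples only):
    pktcap-uw --uplink vmnic0 -o /tmp/vmnic0.pcap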

How to identify the VM Port ID

# Works with both distributed and standard switches
    To identify the Port-ID, open esxtop and press n (network view)

Useful pktcap-uw commands for packet capture of VM traffic

# Use the Port-ID (example: 33554439) of the virtual machine's network interface to capture its traffic
    pktcap-uw --switchport 33554439

# To capture traffic that goes into the virtual machine
    pktcap-uw --switchport 33554439 --capture PortOutput

# To capture any dropped packets use this command
    pktcap-uw --capture Drop

Stop pktcap-uw tracing with the kill command

    kill $(lsof |grep pktcap-uw |awk '{print $1}'| sort -u)

# Verify pktcap-uw tracing has stopped
    lsof | grep pktcap-uw | awk '{print $1}' | sort -u

Advanced Usage - trace multiple ports at the same time

As an example, trace a particular vSwitch port and its associated uplink at the same time:

# pktcap-uw Syntax for Capturing Packets
    pktcap-uw switch_port_arguments capture_point_options filter_options output_control_options

# To get the vSwitch port number, run this command:
    net-stats -l

# Identify and make a note of these parameters:
a. Port ID of the VM's switch port 
    returned by the esxtop or net-stats -l command (example: 50331665): --switchport 50331665
b. Physical uplink port that you want to trace 
    (example: vmnic2): --uplink vmnic2
c. Location of the output pcap files 
    for example /tmp/vmnic2.pcap

Then run the pktcap-uw command to capture packets at both points simultaneously:
# run multiple captures at the same time
    pktcap-uw --switchport 50331665 -o /tmp/50331665.pcap & pktcap-uw --uplink vmnic2 -o /tmp/vmnic2.pcap &


Viewing active TCP/UDP connections with netstat and esxcli network
# Syntax
    esxcfg-vmknic -l     # list all vmkernel interfaces
    esxcli network ip interface list   # list vmkernel interfaces / adapters
# Verify TCP connections
    netstat -tnp    # view active TCP connections
    netstat -tunp   # view active TCP/UDP connections
    esxcli network connection list  # view active network connections
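
To look for a specific port, filter the connection list with grep; for example, vMotion TCP port 8000 (standard shell filtering, port number as used earlier in this article):
    esxcli network connection list | grep 8000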

To retrieve errors and statistics for a network adapter, run:

    esxcli network nic stats get -n <vmnicX>

esxcli and vim-cmd commands

Collecting information about tasks in VMware ESXi/ESX (1013003)
https://kb.vmware.com/s/article/1013003

When troubleshooting issues with ESXi/ESX hosts and VMware vCenter Server, be aware that there may be differences between what vCenter Server and an ESXi/ESX host consider tasks.

esxcli vm process list  # list active VMs

vim-cmd vimsvc/task_list    # To get a list of tasks on the host

# To get a list of tasks associated with a specific virtual machine, 
    you must first get the Vmid of the virtual machine. Run the command
        vim-cmd vmsvc/getallvms 

# When you have the Vmid, you can then get a list of tasks associated with a specific virtual machine
    vim-cmd vmsvc/get.tasklist VMID
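
A minimal worked example, assuming getallvms shows a virtual machine with Vmid 1 (the Vmid is the first column of the output):
    vim-cmd vmsvc/getallvms         # note the Vmid column
    vim-cmd vmsvc/get.tasklist 1    # list tasks for the VM with Vmid 1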

# List network configuration (pipe to grep to filter for a keyword)
    vim-cmd hostsvc/net/info | grep <pattern>

Migrate a VMkernel adapter on a host from a Distributed Switch to a vSphere Standard Switch

https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.networking.doc/GUID-55AFA706-C424-4308-9892-199C9B7A5877.html

If a host is associated with a distributed switch, you can migrate VMkernel adapters from the distributed switch to a standard switch.

# Prerequisites
Verify that the destination standard switch has at least one physical NIC.

# Procedure
1. In the vSphere Client, navigate to the host, or
    select the ESXi host in the Hosts and Clusters view
2. On the Configure tab, 
    expand Networking and select Virtual Switches.
3. In the right pane, select the destination standard switch from the list.
4. Click "..." and select Migrate VMkernel Adapter.
5. On the Select VMkernel adapter page, 
    select the virtual network adapter to migrate to the standard switch from the list, such as vmk0.
6. On the Configure settings page, 
    edit the Network label and VLAN ID for the network adapter.
7. On the Ready to complete page, review the migration details and click Finish.
    Click Back if you need to edit any settings.

Access the ESXi host Direct Console User Interface (DCUI) from iDRAC (Dell), iLO (HP) or another vendor console method, then press F2 and log in as root.

  1. Select Configure Management Network. If Network Adapters and VLAN (optional) are greyed out, the ESXi host management VMkernel is managed by a vDS, and VMkernel management interfaces and VLANs must therefore be configured on the vDS. In that case, carry out the process of moving the host from the vDS back to a standard switch.
  2. Verify that the network adapter(s) are enabled or selected in the DCUI by navigating to Network Restore Options https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.esxi.install.doc/GUID-7EAF0251-1EEA-49DC-AD71-39E5F25906BF.html
a. Restore vDS
    If you want to continue using the vDS to manage the ESXi host
b. Restore Standard Switch  <---- select this option
    As we are migrating the ESXi host from the vDS back to a standard switch

Then press F11 to confirm, and the management network will restart.
  3. Change the management IP address and VLAN. From DCUI -> Configure Management Network, select VLAN (optional) to change the VLAN to the new VLAN. Then select IP Configuration to change
a. IP Address
b. Subnet Mask
c. Default Gateway
  4. Alternatively, you can restore the vmk0 interface from the ESXi Shell
# ESXi shell command
esxcli network vswitch standard add -v vSwitch0
esxcli network vswitch standard portgroup add -p 'Management Network' -v vSwitch0
esxcli network ip interface add -i vmk0 -p 'Management Network'

Note:
To verify the ESXi network configuration
esxcli network vswitch standard list
esxcli network vswitch dvs vmware list
esxcli network nic list
esxcli network ip interface ipv4 get
  5. To check and enable network adapters for management, go to DCUI -> Configure Management Network -> Network Adapters, then select the required network adapter(s) that have the physical connection(s).

  6. Finally, update the DNS entries for the ESXi host management IP.

  7. Ping the ESXi management IP and ensure it is successful.

How to rename a VMware ESXi host using the command line
  1. SSH to the ESXi host, or access the ESXi Shell from the DCUI
a. Enable SSH and the ESXi Shell from vCenter, or
b. From DCUI, press F2 and log in as root
    Navigate to Troubleshooting Options -> Enable ESXi Shell and SSH
  2. If the ESXi host is part of a cluster, first enter Maintenance Mode and remove it from the cluster
  3. If the ESXi host is managed by vCenter, remove the ESXi host from the Inventory
  4. Run these commands (see the example below)
esxcli system hostname set --host=hostname
esxcli system hostname set --fqdn=fqdn
  5. Join the ESXi host back to vCenter and the cluster
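
For example, with a hypothetical FQDN of esx01.lab.local (hostname and domain are examples only):
    esxcli system hostname set --fqdn=esx01.lab.local
    esxcli system hostname get      # verify the new name
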
vmkfstools

vmkfstools is one of the ESXi Shell commands for managing VMFS volumes, storage devices, and virtual disks. You can perform many storage operations using the vmkfstools command. For example, you can create and manage VMFS datastores on a physical partition, or manipulate virtual disk files stored on VMFS or NFS datastores.

# Command syntax
vmkfstools <options> <target>
    -v  # verbose

# Check and repair virtual disks
vmkfstools -x|--fix [check|repair]
    vmkfstools -x check /vmfs/volumes/..../test.vmdk   # Check the vmdk file

# Clone disk
vmkfstools -i /vmfs/volumes/sourcevmfs/testvm/test.vmdk  /vmfs/volumes/destvmfs/testvm2/test2.vmdk
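
The clone operation can also convert the target disk format with the -d option; for example, to produce a thin-provisioned copy (paths as in the example above):
    vmkfstools -i /vmfs/volumes/sourcevmfs/testvm/test.vmdk -d thin /vmfs/volumes/destvmfs/testvm2/test2.vmdk
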
Investigating disk space on an ESX or ESXi host

https://kb.vmware.com/s/article/1003564

df -h   # check disk free space
vdf -h  # determine the free disk space on each filesystem
        # Review the used % to see whether any volumes listed are 100% full
du -h --max-depth=1  <dir>
    This command lists the directories within a given filesystem that contain the largest files. 
    By starting at the root (/) directory and finding the largest directories, 
    you can then drill down into these directories (using cd) and execute the same command recursively,
    until you find the files themselves that are occupying the space.

# Directory usage
    /vmimages/      # Used to store operating system install files such as the VMware Tools or other ISO files.
    /var/core/ and /root/   # Used to store crash files for the service console and the VMkernel.
    /var/log/       # Used to store the majority of the logs for the ESX host.
    /vmfs/volumes/  # Used to store the virtual machine data.

# To review the space consumed by several of these common directories, run this command:    
    du -ch /vmimages /var/core /root /var/log

    find / -size +10M -exec du -h {} \; | less  # find files bigger than a defined size (10 MB here)

# To delete a file, use rm
# To zero a file without deleting it, redirect nothing to it
    > <file to be zeroed>       # Example:  > file.log
# To compress the old log files
tar czvf /tmp/vmkwarning-logs.tgz /var/log/vmkwarning*
tar czvf /tmp/vmkernel-logs.tgz /var/log/vmkernel.*
tar czvf /tmp/messages-logs.tgz /var/log/messages.*

# To delete the source files
rm /var/log/vmkwarning.* /var/log/vmkernel.* /var/log/messages.*

# Move the new archive files back to your /var/log/ partition for long-term storage using the command:
mv /tmp/vmkwarning-logs.tgz /tmp/vmkernel-logs.tgz /tmp/messages-logs.tgz /var/log/

Test remote port connectivity
nc -z <dest-ip>  <dest-port>
nc -w 1 -z 192.168.1.10  20-81      # test a range of ports