VMware vSphere Troubleshooting
Author: Jackson Chen
Check vSphere and vCenter Monitoring and Performance
Overview of VMware Tools
https://kb.vmware.com/s/article/340
This article:
- Includes an overview of VMware Tools.
- Lists the product documentation that contains instructions for installing VMware Tools.
- Provides additional links to VMware Tools installation instructions and troubleshooting information.
VMware Admin Tools
https://blogs.vmware.com/virtualblocks/2019/03/16/top-10-vmware-admin-tools/
Troubleshooting Virtual Machines
Error Message When You Try to Migrate Virtual Machine with USB Devices Attached
Migration with vMotion cannot proceed and issues a confusing error message when you connect multiple USB devices from an ESXi host to a virtual machine and one or more devices are not enabled for vMotion.
To successfully pass vMotion compatibility checks, you must enable all USB devices that are connected to the virtual machine from a host for vMotion. If one or more devices are not enabled for vMotion, migration will fail.
# Solution
1. Make sure that the devices are not in the process of transferring data before removing them.
2. Re-add and enable vMotion for each affected USB device.
Cannot Copy Data From an ESXi Host to a USB Device That Is Connected to the Host
You can connect a USB device to an ESXi host and copy data to the device from the host. For example, you might want to gather the vm-support bundle from the host after the host loses network connectivity. To perform this task, you must stop the USB arbitrator.
This problem occurs because the nonbootable USB device is reserved for the virtual machine by default. It does not appear on the host's file system, even though lsusb can see the device.
# Solution
1. Stop the usbarbitrator service
/etc/init.d/usbarbitrator stop
2. Physically disconnect and reconnect the USB device.
By default, the device location is
/vmfs/devices/disks/mpx.vmhbaXX:C0:T0:L0.
3. After you reconnect the device, restart the usbarbitrator service
/etc/init.d/usbarbitrator start
4. Restart hostd and any running virtual machines to restore access to the passthrough devices in the virtual machine
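A minimal sketch of the whole sequence from the ESXi Shell (hostd is restarted through its standard init script):
/etc/init.d/usbarbitrator stop # step 1: stop the USB arbitrator
# physically disconnect, then reconnect the USB device
/etc/init.d/usbarbitrator start # step 3: restart the USB arbitrator
/etc/init.d/hostd restart # step 4: restart hostd to restore passthrough access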
Troubleshooting with logs
You can often obtain valuable troubleshooting information by looking at the logs provided by the various services and agents that your implementation is using. Most logs are located in /var/log/ for vCenter Server deployments.
# Common Log Directories
Log Directory Description
--------------------------------
applmgmt VMware Appliance Management Service
cloudvm Logs for allotment and distribution of resources between services
cm VMware Component Manager
firstboot Location where first boot logs are stored
rhttpproxy Reverse Web Proxy
sca VMware Service Control Agent
statsmonitor VMware Appliance Monitoring Service
vapi VMware vAPI Endpoint
vmafdd VMware Authentication Framework daemon
vmdird VMware Directory Service daemon
vmon VMware Service Lifecycle Manager
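For example, to follow the Service Lifecycle Manager log live (a sketch; on the appliance these directories sit under /var/log/vmware/):
tail -f /var/log/vmware/vmon/vmon.log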
Management Node Logs
The following logs are available if a management node deployment is chosen.
# Management Node Log Directories
Log Directory Description
--------------------------------------------
autodeploy VMware vSphere Auto Deploy Waiter
content-library VMware Content Library Service
eam VMware ESX Agent Manager
invsvc VMware Inventory Service
mbcs VMware Message Bus Config Service
netdump VMware vSphere ESXi Dump Collector
perfcharts VMware Performance Charts
vmcam VMware vSphere Authentication Proxy
vmdird VMware Directory Service daemon
vmware-sps VMware vSphere Profile-Driven Storage Service
vmware-vpx VMware VirtualCenter Server
vpostgres vFabric Postgres database service
vcha VMware High Availability Service
Virtual machine hardware
Virtual machine hardware versions (1003746)
https://kb.vmware.com/s/article/1003746
Some common symptoms
- A virtual machine does not power on.
- Some virtual machine operations are greyed out and unavailable.
- You experience unexpected behavior in a guest operating system.
# Resolution
1. VMware products and their virtual hardware version
2. Upgrading the virtual hardware
a. Power on the virtual machine.
b. Install VMware Tools.
c. Power off the virtual machine.
d. Upgrade the virtual hardware by changing the hardware compatibility setting.
Products Virtual Hardware Version
--------------------------------------
ESXi 7.0 U2 (7.0.2) 19
ESXi 7.0 U1 (7.0.1) 18
ESXi 7.0 (7.0.0) 17
ESXi 6.7 U2 15
ESXi 6.7 14
ESXi 6.5 13
ESXi 6.0 11
vSphere 7.0 – Upgrade virtual VM hardware and VMware Tools
https://4sysops.com/archives/vsphere-7-0-upgrade-virtual-vm-hardware-and-vmware-tools/
The article walks through some common upgrade scenarios.
How to properly change virtual SCSI controller for VMware PVSCSI
https://kb.vmware.com/s/article/1002149
https://www.vladan.fr/how-to-change-virtual-scsi-controler/
The following explains how to properly change a virtual machine's virtual SCSI controller to VMware PVSCSI.
1. Power off VM
2. Clone the VM as backup
3. Add a new disk (1GB in size).
Important: This disk needs to use the VMware Paravirtual SCSI controller
4. Power on VM
The VM should install the drivers automatically and recognise the disk
Note: You might have to initialise, bring online and format the volume
5. After all drivers have been properly installed, restart the VM, then gracefully shut down the VM
6. Change the first controller to PVSCSI
7. Delete the newly added 1GB disk
Stopping, Starting or Restarting VMware vCenter Server Appliance 6.x and above services and vCenter Services
https://kb.vmware.com/s/article/2109887
In VMware vCenter Server 6.0 and later, VMware recommends using the vSphere Web Client or the service-control command-line tool to stop, start, or restart vCenter Server Appliance services.
Listing the vCenter Server Appliance services, start and stop
# To list the vCenter Server Appliance services within the vSphere Web Client
1. Log in to the vSphere Web Client with a vCenter Single Sign-on administrator account.
2. Navigate to Administration > Deployment > System Configuration.
3. Click Nodes, select the vCenter Server Appliance node and click the Related Objects tab.
# To list the vCenter Server Appliance services using the command-line:
1. Log in as root through an SSH or console session on the vCenter Server Appliance.
2. Run this command to launch the shell
shell
3. change directories to /bin:
cd /bin
4. list the vCenter Server Appliance services
service-control --list
5. To view the current status of the vCenter Server Appliance services
service-control --status
# Start services
service-control --start <servicename> # Start specific service
service-control --start --all # start all services
# To stop a vCenter Server Appliance service using the vSphere Web Client
1. Log in to the vSphere Web Client with a vCenter Single Sign-on administrator account.
2. Navigate to Administration > Deployment > System Configuration.
3. Click Nodes, select the vCenter Server Appliance node and click the Related Objects tab.
4. Right-click on the service you would like to stop and select Stop.
# Stop services
service-control --stop <servicename> # Stop specific service
service-control --stop --all # stop all services
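For example, a sketch of bouncing only the vCenter Server service (vmware-vpxd) after a configuration change:
service-control --stop vmware-vpxd # stop the vCenter Server service
service-control --start vmware-vpxd # start it again
service-control --status vmware-vpxd # confirm the service is running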
vCenter Server 7.x services
Service Description
--------------------------------------------------
vmware-vmon VMware Service Lifecycle Manager
vmonapi VMware Service Lifecycle Manager API
vmafdd VMware Authentication Framework
vmdird VMware Directory Service
vmcad VMware Certificate Service
lookupsvc VMware Lookup Service
vmware-sca VMware Service Control Agent
vmware-stsd VMware Security Token Service
vmware-rhttpproxy VMware HTTP Reverse Proxy
vmware-envoy VMware Envoy Proxy
vmware-netdumper VMware vSphere ESXi Dump Collector
vmware-vapi-endpoint VMware vAPI Endpoint
vmware-vpxd-svcs VMware vCenter-Services
vmware-perfcharts VMware Performance Charts
applmgmt VMware Appliance Management Service
vmware-statsmonitor VMware Appliance Monitoring Service
vmware-cis-license VMware License Service
vmware-vpostgres VMware Postgres
vmware-postgres-archiver VMware Postgres Archiver
vmware-vdtc VMware vSphere Distributed Tracing Collector
vmware-vpxd VMware vCenter Server
vmware-eam VMware ESX Agent Manager
vmware-vsm VMware vService Manager
vmware-sps VMware vSphere Profile-Driven Storage Service
pschealth VMware Platform Services Controller Health Monitor
vmware-rbd-watchdog VMware vSphere Auto Deploy Waiter
vmware-content-library VMware Content Library Service
vmware-imagebuilder VMware Image Builder Manager
lwsmd Likewise Service Manager
vmcam VMware vSphere Authentication Proxy
vmware-vcha VMware vCenter High Availability
vmware-updatemgr VMware Update Manager
vmware-vsan-health VMware vSAN Health Service
vsphere-ui VMware vSphere Client
vmware-hvc VMware Hybrid VC Service
vmware-trustmanagement VMware Trust Management Service
vmware-certificatemanagement VMware Certificate Management Service
vmware-certificateauthority VMware Certificate Authority Service
vmware-pod VMware Patching and Host Management Service
vlcm VMware vCenter Lifecycle API
vmware-analytics VMware Analytics Service
vmware-topologysvc VMware Topology Service
vmware-infraprofile VMware Infraprofile Service
wcp Workload Control Plane
vtsdb VMware vTsdb Service
vstats VMware vStats Service
observability VMware VCSA Observability Service
observability-vapi VMware VCSA Observability VAPI Service
vCenter Server Appliance Services
Service Name Description
---------------------------------------------
applmgmt VMware Appliance Management Service
vmware-cis-license VMware License Service
vmware-cm VMware Component Manager
vmware-eam VMware ESX Agent Manager
vmware-sts-idmd VMware Identity Management Service
vmware-invsvc VMware Inventory Service
vmware-mbcs VMware Message Bus Configuration Service
vmware-netdumper VMware vSphere ESXi Dump Collector
vmware-perfcharts VMware Performance Charts
vmware-rbd-watchdog VMware vSphere Auto Deploy Waiter
vmware-rhttpproxy VMware HTTP Reverse Proxy
vmware-sca VMware Service Control Agent
vmware-sps VMware vSphere Profile-Driven Storage Service
vmware-stsd VMware Security Token Service
vmware-syslog VMware Common Logging Service
vmware-syslog-health VMware Syslog Health Service
vmware-vapi-endpoint VMware vAPI Endpoint
vmware-vdcs VMware Content Library Service
vmafdd VMware Authentication Framework
vmcad VMware Certificate Service
vmdird VMware Directory Service
vmware-vpostgres VMware Postgres
vmware-vpx-workflow VMware vCenter Workflow Manager
vmware-vpxd VMware vCenter Server
vmware-vsm VMware vService Manager
vsphere-client vSphere Web Client
vmware-vws VMware System and Hardware Health Manager
vmware-vsan-health VMware vSAN Health Service
Troubleshooting with Logs
The following logs are common to all vCenter Server deployments. Most logs are located in /var/log/ for vCenter Server deployments.
Common vCenter Server Logs
# Common all vCenter Server deployments
Log Directory Description
--------------------------------------------------------------
vmware/applmgmt VMware Appliance Management Service
cloudvm Logs for allotment and distribution of resources between services
firstboot Location where first boot logs are stored
rhttpproxy Reverse Web Proxy
sca VMware Service Control Agent
vmware/applmgmt VMware Appliance Monitoring Service
vapi VMware vAPI Endpoint
vmafdd VMware Authentication Framework daemon
vmdird VMware Directory Service daemon
vmon VMware Service Lifecycle Manager
Management Node Logs
The following logs are available if a management node deployment is chosen.
Log Directory Description
-----------------------------------------------------------
vmware/rbd VMware vSphere Auto Deploy Waiter
content-library VMware Content Library Service
eam VMware ESX Agent Manager
netdumper VMware vSphere ESXi Dump Collector
perfcharts VMware Performance Charts
vmcam VMware vSphere Authentication Proxy
vmdird VMware Directory Service daemon
vmware-sps VMware vSphere Profile-Driven Storage Service
vpxd VMware VirtualCenter Server
vpostgres vFabric Postgres database service
vcha VMware High Availability Service
vCenter Server Upgrade Fails When Unable to Stop Tomcat Service
A vCenter Server upgrade can fail when the installer is unable to stop the Tomcat service.
If the vCenter Server installer cannot stop the Tomcat service during an upgrade, the upgrade fails with an error message similar to Unable to delete VC Tomcat service. This problem can occur even if you stop the Tomcat service manually before the upgrade, if some files that are used by the Tomcat process are locked.
Solution
1. From the Windows Start menu, select Settings > Control Panel > Administrative Tools > Services.
2. Right-click VMware VirtualCenter Server and select Manual.
3. Right-click VMware vCenter Management Webservices and select Manual.
4. Reboot the vCenter Server machine before upgrading.
This releases any locked files that are used by the Tomcat process, and enables the vCenter Server installer to stop the Tomcat service for the upgrade.
Alternatively, you can restart the vCenter Server machine and restart the upgrade process, but select the option not to overwrite the vCenter Server data.
Troubleshooting vCenter Server and ESXi Host Certificates
Certificates are automatically generated when you install vCenter Server. These default certificates are not signed by a commercial certificate authority (CA) and might not provide strong security. You can replace default vCenter Server certificates with certificates signed by a commercial CA. When you replace vCenter Server and ESXi certificates, you might encounter errors.
New vCenter Server Certificate Does Not Appear to Load
When you install new vCenter Server certificates, you might not see the new certificate.
Cause: Existing open connections to vCenter Server are not forcibly closed and might still use the old certificate.
# To force all connections to use the new certificate, use one of the following methods.
1. Restart the network stack or network interfaces on the server.
2. Restart the vCenter Server service.
vCenter Server Cannot Connect to Managed Hosts
vCenter Server cannot connect to managed hosts after vCenter server certificates are replaced and the system is restarted.
Log into the host as the root user and reconnect the host to vCenter Server.
Troubleshooting ESXi Hosts
vCenter Server reports vSphere HA host states that indicate an error condition on the host. Such errors can prevent vSphere HA from fully protecting the virtual machines on the host and can impede vSphere HA's ability to restart virtual machines after a failure. Errors can occur when vSphere HA is being configured or unconfigured on a host or, more rarely, during normal operation. When this happens, you should determine how to resolve the error, so that vSphere HA is fully operational.
vSphere HA Agent Is in the Agent Unreachable State
vSphere HA reports that an agent is in the Agent Unreachable state when the agent for the host cannot be contacted by the primary host or by vCenter Server. Consequently, vSphere HA is not able to monitor the virtual machines on the host and might not restart them after a failure.
Cause
A vSphere HA agent can be in the Agent Unreachable state for several reasons. This condition most often indicates that a networking problem is preventing vCenter Server or the primary host from contacting the agent on the host, or that all hosts in the cluster have failed. This condition can also indicate the unlikely situation that vSphere HA was disabled and then re-enabled on the cluster while vCenter Server could not communicate with the vSphere HA agent on the host, or that the ESXi host agent on the host has failed, and the watchdog process was unable to restart it. In any of these cases, a failover event is not triggered when a host goes into the Unreachable state.
Solution
Determine if vCenter Server is reporting the host as not responding. If so, there is a networking problem, an ESXi host agent failure, or a total cluster failure. After the condition is resolved, vSphere HA should work correctly. If not, reconfigure vSphere HA on the host. Similarly, if vCenter Server reports the hosts are responding but a host's state is Agent Unreachable, reconfigure vSphere HA on that host.
vSphere HA Agent is in the Uninitialized State
vSphere HA reports that an agent is in the Uninitialized state when the agent for the host is unable to enter the run state and become the primary host or to connect to the primary host. Consequently, vSphere HA is not able to monitor the virtual machines on the host and might not restart them after a failure.
Cause
A vSphere HA agent can be in the Uninitialized state for one or more reasons. This condition most often indicates that the host does not have access to any datastores. Less frequently, this condition indicates that the host does not have access to its local datastore on which vSphere HA caches state information, the agent on the host is inaccessible, or the vSphere HA agent is unable to open required firewall ports. It is also possible that the ESXi host agent has stopped.
Solution
Search the list of the host's events for recent occurrences of the event vSphere HA Agent for the host has an error. This event indicates the reason for the host being in the uninitialized state. If the condition exists because of a datastore problem, resolve whatever is preventing the host from accessing the affected datastores. If the ESXi host agent has stopped, you must restart it. After the problem has been resolved, if the agent does not return to an operational state, reconfigure vSphere HA on the host.
vSphere HA Agent is in the Initialization Error State
vSphere HA reports that an agent is in the Initialization Error state when the last attempt to configure vSphere HA for the host failed. vSphere HA does not monitor the virtual machines on such a host and might not restart them after a failure.
Cause
This condition most often indicates that vCenter Server was unable to connect to the host while the vSphere HA agent was being installed or configured on the host. This condition might also indicate that the installation and configuration completed, but the agent did not become a primary host or a secondary host within a timeout period. Less frequently, the condition is an indication that there is insufficient disk space on the host's local datastore to install the agent, or that there are insufficient unreserved memory resources on the host for the agent resource pool.
Solution
When a Configure HA task fails, a reason for the failure is reported.
Reason for Failure and Action
--------------------------------
- Host communication errors: Resolve any communication problems with the host and retry the configuration operation.
- Timeout errors: Possible causes include that the host crashed during the configuration task, the agent failed to start after being installed, or the agent was unable to initialize itself after starting up. Verify that vCenter Server is able to communicate with the host. If so, see vSphere HA Agent Is in the Agent Unreachable State or vSphere HA Agent is in the Uninitialized State for possible solutions.
- Lack of resources: Free up approximately 75MB of disk space. If the failure is due to insufficient unreserved memory, free up memory on the host by relocating virtual machines to another host or reducing their reservations. In either case, retry the vSphere HA configuration task after resolving the problem.
- Reboot pending: If an installation for a 5.0 or later host fails because a reboot is pending, reboot the host and retry the vSphere HA configuration task.
vSphere HA Agent is in the Uninitialization Error State
The vSphere HA agent on a host is in the Uninitialization Error state. User intervention is required to resolve this situation.
vSphere HA reports that an agent is in the Uninitialization Error state when vCenter Server is unable to unconfigure the agent on the host during the Unconfigure HA task. An agent left in this state can interfere with the operation of the cluster.
Cause
This condition usually indicates that vCenter Server lost the connection to the host while the agent was being unconfigured.
Solution
Add the host back to vCenter Server (version 5.0 or later).
Unable to Download VIBs When Using vCenter Server Reverse Proxy
If vCenter Server is using a custom port for the reverse proxy, the custom port is not automatically enabled in the ESXi firewall and the VIB downloads fail.
# Solution
1. Open an SSH connection to the host and log in as root.
2. List the existing firewall rules.
esxcli network firewall ruleset list
3. Back up the /etc/vmware/firewall/service.xml file.
cp /etc/vmware/firewall/service.xml /etc/vmware/firewall/service.xml.bak
4. Edit the access permissions of the service.xml file to allow writes by running the chmod command.
a. To allow writes, run chmod 644 /etc/vmware/firewall/service.xml.
b. To toggle the sticky bit flag, run chmod +t /etc/vmware/firewall/service.xml.
5. Open the service.xml file in a text editor.
6. Add a new rule to the service.xml file that enables the custom port for the vCenter Server reverse proxy.
<service id='id_value'>
<id>vcenterrhttpproxy</id>
<rule id='0000'>
<direction>outbound</direction>
<protocol>tcp</protocol>
<port type='dst'>custom_reverse_proxy_port</port>
</rule>
<enabled>true</enabled>
<required>false</required>
</service>
Where id_value must be a unique value. For example, if the last listed service in the service.xml file has ID 0040, enter ID 0041.
7. Revert the access permissions of the service.xml file to the default read-only setting.
chmod 444 /etc/vmware/firewall/service.xml
8. Refresh the firewall rules for the changes to take effect.
esxcli network firewall refresh
9. List the updated rule set to confirm the change.
esxcli network firewall ruleset list
10. If you want the firewall configuration to persist after a reboot of the ESXi host,
copy the service.xml onto persistent storage and modify the local.sh file.
a. Copy the modified service.xml file onto persistent storage, for example /store/, or onto a VMFS volume, for example /vmfs/volumes/volume/.
cp /etc/vmware/firewall/service.xml location_of_xml_file
You can store a VMFS volume in a single location and copy it to multiple hosts.
b. Add the service.xml file information to the local.sh file on the host.
cp location_of_xml_file /etc/vmware/firewall
esxcli network firewall refresh
Where location_of_xml_file is the location to which the file was copied.
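A sketch of the local.sh addition, assuming the file was copied to /store/service.xml and that local.sh is the standard /etc/rc.local.d/local.sh boot script:
# in /etc/rc.local.d/local.sh, before the exit line:
cp /store/service.xml /etc/vmware/firewall
esxcli network firewall refresh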
Troubleshooting vCenter HA Environment
vCenter HA Clone Operation Fails During Deployment
If the vCenter HA configuration process does not create the clones successfully, the clone operation fails and you must resolve the underlying cloning error.
# Cause
1. Look for the clone exception. It might indicate one of the following problems.
a. You have a DRS-enabled cluster, but do not have three hosts.
b. The host or database connection is lost.
c. Not enough disk space.
d. Other Clone Virtual Machine errors
# Solution
1. Resolve the error that caused the problem.
2. Remove the cluster and start configuration again.
Redeploy the Passive or Witness node
If the Passive or Witness node fails and the vCenter HA cluster was configured using the automatic cloning method, you can redeploy it from the vCenter HA Settings page.
# Procedure
1. Log in to the Active node with the vSphere Client.
2. Select the vCenter Server object in the inventory and select the Configure tab.
3. Select vCenter HA under Settings.
4. Click on the REDEPLOY button next to the node to start the Redeploy wizard.
5. Specify the management vCenter Server.
a. If your vCenter Server is managed by another vCenter Server in the same SSO domain, proceed to step 6.
b. If your vCenter Server is managed by another vCenter Server in a different SSO domain,
input the location and credential details of that management vCenter Server:
enter the management vCenter Server FQDN or IP address and Single Sign-On credentials.
6. Specify a unique name and target location.
7. Select the destination compute resource for the operation.
8. Select the datastore in which to store the configuration and disk files.
9. Configure the virtual machine networks.
a. If you are redeploying the Passive node, select virtual machine Management (NIC 0) and vCenter HA (NIC 1) networks.
b. If you are redeploying the Witness node, select vCenter HA (NIC 1) network.
c. If there are issues with your selections, errors or compatibility warnings are displayed.
10. Review your selections and click Finish to redeploy the node.
Resolving Failover Failures
When a Passive node does not become the Active node during a failover, you can force the Passive node to become the Active node.
# Cause
A vCenter HA failover might not succeed for these reasons.
a. The Witness node becomes unavailable while the Passive node is trying to assume the role of the Active node.
b. A server state synchronization issue between the nodes exists.
# Solution
You recover from this issue as follows.
1. If the Active node recovers from the failure, it becomes the Active node again.
2. If the Witness node recovers from the failure, follow these steps.
a. Log in to the Passive node through the Virtual Machine Console.
b. To enable the Bash shell, enter shell at the appliancesh prompt.
c. Run the following command.
vcha-reset-primary
d. Reboot the Passive node.
3. If both Active node and Witness node cannot recover, you can force the Passive node to become a standalone vCenter Server.
a. Delete the Active node and Witness node virtual machines.
b. Log in to the Passive node through the Virtual Machine Console.
c. To enable the Bash shell, enter shell at the appliancesh prompt.
d. Run the following command.
vcha-destroy
e. Reboot the Passive node.
Troubleshooting a Degraded vCenter HA Cluster
For a vCenter HA cluster to be healthy, each of the Active, Passive, and Witness nodes must be fully operational and be reachable over the vCenter HA cluster network. If any of the nodes fails, the cluster is considered to be in a degraded state.
# Cause
The cluster can be in a degraded state for a number of reasons:
1. One of the nodes fails
a. If the Active node fails, a failover of the Active node to the Passive node occurs automatically.
After the failover, the Passive node becomes the Active node.
At this point, the cluster is in a degraded state because the original Active node is unavailable.
After the failed node is repaired or comes online, it becomes the new Passive node and the cluster returns to a healthy state
after the Active and Passive nodes synchronize.
b. If the Passive node fails, the Active node continues to function, but no failover is possible and the cluster is in a degraded state.
If the Passive node is repaired or comes online, it automatically rejoins the cluster and the cluster state
is healthy after the Active and Passive nodes synchronize.
c. If the Witness node fails, the Active node continues to function and replication between Active and Passive node continues,
but no failover can occur.
If the Witness node is repaired or comes online, it automatically rejoins the cluster and the cluster state is healthy.
2. Database replication fails
If replication fails between the Active and Passive nodes, the cluster is considered degraded.
The Active node continues to synchronize with the Passive node. If it succeeds, the cluster returns to a healthy state.
This state can result from network bandwidth problems or other resource shortages.
3. Configuration file replication issues
If configuration files are not properly replicated between the Active and Passive nodes, the cluster is in a degraded state.
The Active node continues to attempt synchronization with the Passive node.
This state can result from network bandwidth problems or other resource shortages.
# Solution
How you recover depends on the cause of the degraded cluster state.
If the cluster is in a degraded state, events, alarms, and SNMP traps show errors.
If one of the nodes is down, check for hardware failure or network isolation. Check whether the failed node is powered on.
In case of replication failures, check if the vCenter HA network has sufficient bandwidth and ensure network latency is 10 ms or less.
Recovering from Isolated vCenter HA Nodes
If all nodes in a vCenter HA cluster cannot communicate with each other, the Active node stops serving client requests.
# Problem
Node isolation is a network connectivity problem.
# Solution
1. Attempt to resolve the connectivity problem. If you can restore connectivity,
isolated nodes rejoin the cluster automatically and the Active node starts serving client requests.
2. If you cannot resolve the connectivity problem, you have to log in to the Active node's console directly.
a. Power off and delete the Passive node and the Witness node virtual machines.
b. Log in to the Active node by using SSH or through the Virtual Machine Console.
c. To enable the Bash shell, enter shell at the appliancesh prompt.
d. Run the following command to remove the vCenter HA configuration.
vcha-destroy -f
e. Reboot the Active node.
f. The Active node is now a standalone vCenter Server.
g. Perform vCenter HA cluster configuration again.
VMware vCenter HA Alarms and Events
If a vCenter HA cluster is in a degraded state, alarms and events show errors.
Configuring vSwitch or vNetwork Distributed Switch from the command line in ESXi/ESX
Symptoms
- You are unable to connect to an ESXi/ESX host via the network because uplinks (vmnics) have changed or are not in the correct order.
- The Primary Service Console/VMkernel management interface is moved from a standard switch to a distributed switch on a non-routable network.
- The VLAN for the Management Network is changed or configured incorrectly on the uplink switch port.
Verify vSwitch Configuration
# View the current vSwitch configuration and vmkernel interface configuration using these commands:
esxcli network vswitch standard list # list current vswitch configuration
esxcli network vswitch dvs vmware list # list Distributed Switch configuration
esxcli network ip interface list # list vmkernel interfaces and their configuration
esxcli network nic list # display listing of physical adapters and their link state
Add or remove network cards (known as vmnics) - standard vSwitch
# Add or remove network cards (known as vmnics) to or from a Standard vSwitch using these commands:
esxcli network vswitch standard uplink remove --uplink-name=vmnic --vswitch-name=vSwitch # unlink/remove an uplink
esxcli network vswitch standard uplink add --uplink-name=vmnic --vswitch-name=vSwitch # add an uplink
Add or remove network cards (known as vmnics) - vNetwork Distributed Switch (vDS)
esxcfg-vswitch -Q vmnic -V dvPort_ID_of_vmnic dvSwitch # unlink/remove a vDS uplink
esxcfg-vswitch -P vmnic -V unused_dvPort_ID dvSwitch # add a vDS uplink
If connectivity was lost when migrating management networking to a Distributed Switch, it may be necessary to remove or disable the existing management vmkernel interface and recreate it in a Standard vSwitch port group with the same IP configuration.
On a vSphere Distributed Switch (vDS), delete an existing VMkernel port using this command:
esxcli network ip interface remove --interface-name=vmkX
Note: The vmk interface number used for management can be determined by running command
esxcli network ip interface list
After the unreachable vmkernel port has been removed, it can be recreated on a Standard Switch.
If an existing Standard Switch does not exist, you can create a new one as well as a port-group to use with these commands:
esxcli network vswitch standard add --vswitch-name=vSwitch
esxcli network vswitch standard portgroup add --portgroup-name=portgroup --vswitch-name=vSwitch
Note: When creating a virtual switch, there are no linked vmnics by default. You will need to link vmnics as mentioned above
To create a VMkernel port and attach it to a portgroup on a Standard vSwitch, run these commands:
esxcli network ip interface add --interface-name=vmkX --portgroup-name=portgroup
esxcli network ip interface ipv4 set --interface-name=vmkX --ipv4=ipaddress --netmask=netmask --type=static
Note: By default on ESXi, the management vmkernel port is vmk0 and resides in a Standard Switch portgroup called Management Network.
If the vmnics associated with the management network are VLAN trunks, you may need to specify a VLAN ID for the management portgroup.
# To set or correct the VLAN ID required for management connectivity on a Standard vSwitch, run this command:
esxcli network vswitch standard portgroup set -p portgroup --vlan-id VLAN
It may be necessary to restart the host's management agents if network connectivity is not restored despite a correct configuration:
services.sh restart
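Putting these commands together, a sketch that rebuilds a management network on a standard switch (vSwitch0, vmnic0, the Management Network portgroup, the IP settings and VLAN 10 are example values):
esxcli network vswitch standard add --vswitch-name=vSwitch0
esxcli network vswitch standard uplink add --uplink-name=vmnic0 --vswitch-name=vSwitch0
esxcli network vswitch standard portgroup add --portgroup-name="Management Network" --vswitch-name=vSwitch0
esxcli network vswitch standard portgroup set -p "Management Network" --vlan-id 10
esxcli network ip interface add --interface-name=vmk0 --portgroup-name="Management Network"
esxcli network ip interface ipv4 set --interface-name=vmk0 --ipv4=192.168.1.10 --netmask=255.255.255.0 --type=static
esxcli network ip interface tag add -i vmk0 -t Management # tag vmk0 for management traffic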
Verify ESXi and vSAN datastore
cd /vmfs/volumes
ls # list all the datastores
cd /vmfs/volumes/vsan:<datastore-id> # change into the vSAN datastore
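To check the host's vSAN cluster state at the same time, a quick sketch:
esxcli vsan cluster get # show vSAN cluster membership and agent state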
Exit ESXi host maintenance mode via ssh
# ssh to the ESXi host
vim-cmd hostsvc/maintenance_mode_exit
Useful ESXi esxcli commands
# Verify ESXi host storage core network adapter and network adapter
esxcli storage core adapter list
esxcli network nic list
# Restart ESXi host
esxcli system shutdown reboot --reason "Enter the reason"
ESXi VIB update
# Verify system in maintenance mode
esxcli system maintenanceMode get
a. Enabled # host is in maintenance mode
b. Disabled # host is in normal (operational) mode
# Method - remove existing and install new VIB
esxcli software vib list # list all installed VIBs, and identify the required VIB to be uninstalled
esxcli software vib remove -d "/vmfs/volumes/<path>/<vib-filename>.zip" # Remove the existing VIB
esxcli software vib install -d "/tmp/<vib.zip>" # Install the VIB file, if VIB file in tmp directory
esxcli software vib list --rebooting-image # Check the newly installed VIB in the rebooting image
esxcli software vib remove -n <name> # Remove a VIB by name
esxcli software vib remove -n scsi-bnx2fc # Example: remove the Broadcom FCoE driver VIB
Note:
Open another ssh console, run
tail -f /var/log/esxupdate.log # monitoring the VIB installation
# Method - update VIB without need to remove it first
esxcli software vib update -d "/vmfs/volumes/<path>/<vib-filename>.zip"
# Reboot after VIB update
esxcli system shutdown poweroff --reason "system maintenance" # Shut down the ESXi host
esxcli system shutdown reboot --reason "system reboot"
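Before applying a VIB update, a dry run reports what would change without modifying the host (a sketch; the bundle path is an example):
esxcli software vib update -d "/vmfs/volumes/<path>/<vib-filename>.zip" --dry-run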
How to backup and restore ESXi system
This process will backup the ESXi host configuration, then manually install ESXi host with an updated version and restore the configuration.
1. On the management server, run vicfg-cfgbackup.pl to back up the ESXi host configuration
vicfg-cfgbackup.pl --server=<esxi-FQDN|IP-address>
--username=root --password=<root-credential>
-s "d:\backup\esxibackup\<esxi-FQDN>.tgz"
Note:
Using "-s" switch to backup configuration
2. Format the ESXi host USB or boot disk if required
3. Install fresh ESXi 6.7u3 # If upgrading from 6.7u1: after a successful upgrade and restore, upgrade to 7.0u1, then 7.0u3
# Install as dummy install, without DNS IP and hostname
4. Put ESXi host in maintenance mode
5. Restore the newly installed ESXi with the backup configuration
vicfg-cfgbackup.pl --server=<esxi-FQDN|IP-address>
--username=root --password=<root-credential>
-l "d:\backup\esxibackup\<esxi-FQDN>.tgz"
Note:
using "-l" switch to restore configuration
6. Verify ESXi host in vCenter and vSAN
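Alternatively, a sketch of performing the backup and restore directly on the host with vim-cmd (the restore expects the bundle at /tmp/configBundle.tgz and the host in maintenance mode, as used later in this document):
vim-cmd hostsvc/firmware/backup_config # prints a download URL for the backup bundle
vim-cmd hostsvc/maintenance_mode_enter
vim-cmd hostsvc/firmware/restore_config 1 /tmp/configBundle.tgz # the host reboots after the restore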
How to enable ESXi Shell access using the Direct Console User Interface (DCUI)
https://kb.vmware.com/s/article/2004746
# Enabling ESXi Shell access using the vSphere Client
1. Login to a vCenter Server system using the vSphere Client
2. Select the host in the inventory panel
3. Click the Configuration tab -> Security Profile
4. In the Service section -> Properties
5. Select the ESXi Shell from the list
a. ESXi Shell
b. SSH
c. Direct Console UI
6. Click Options and select "Start and stop manually"
Note: When you select Start and stop manually, the service does not start when you reboot the host.
If you want the service to start when you reboot the host, select Start and stop with host.
7. Click Start to enable the service
8. click OK
# Use the Host Client to enable local and remote access to the ESXi Shell
1. Log in to a Host Client using IP address of the host in a browser
2. Click on Manage under Navigator section
3. Click the Services tab
4. In the Services section, select TSM from the list
5. Click Actions and select Start to enable the ESXi shell.
# Use the direct console user interface to enable the ESXi Shell:
1. From the Direct Console User Interface,
press F2 to access the System Customization menu
2. Select Troubleshooting Options and press Enter
3. From the Troubleshooting Mode Options menu,
select Enable ESXi Shell
How to access ESXi Shell
# Accessing the local ESXi Shell
1. If you have direct access to the host,
press Alt+F1 to open the login page on the machine's physical console.
2. Enter credentials when prompted
Note: To return to the Direct Console User Interface press Alt-F2
# Accessing the remote ESXi Shell
1. Open an SSH client, such as PuTTY
2. Specify the IP address or domain name of the ESXi host, using TCP port 22
3. Enter credentials when prompted
How to update software profile
# SSH to the host, or run the command locally in the ESXi Shell
esxcli software profile update -p <Vendor-ESXi> -d "/vmfs/volumes/<path>/VMware_ESXi_version_<vendor-version>.zip"
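To confirm which image profiles an offline bundle contains before updating, a sketch (the bundle path is an example):
esxcli software sources profile list -d "/vmfs/volumes/<path>/VMware_ESXi_version_<vendor-version>.zip"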
Challenges when ESXi host boot USB has less than 32GB in Size
When trying to upgrade or install an additional VIB, you can hit the 250MB RAM disk size limit, leaving not enough free space in the RAM disk. The error reports that 250MB of disk space is required but, for example, only 238MB is free.
#*** Method 1 - Using a vendor customized ISO to rebuild the ESXi host
Note: Building from a USB key directly connected to the ESXi host is much faster than from a mounted ISO
# Preparation
1. Obtain a physical host with the same specification as the production ESXi host, or at least a very close model
2. Directly connect the build ESXi host to a build desktop, such as a Windows 10 desktop
3. The rebuild desktop and the rebuild ESXi host will be used to build and restore the ESXi host configuration
# Result
After the configuration restore, the ESXi host will have the required ESXi host configuration.
It will retain the vSAN disk group configuration.
# Steps
1. Backup the ESXi host configuration from the management server using PowerCLI, such as
Get-VMHostFirmware -VMHost <ESXi-host-FQDN> -BackupConfiguration -DestinationPath "D:\ESXi-Backups"
2. Install a fresh copy of the original build to the physical ESXi host dual boot USB, such as HPE, DELL
Note: The version needs to be the same as the running (backed up) version of the ESXi host
a. Plug dual boot USB into ESXi host internal USB slot
b. Plug the HPE bootable USB at the front of the ESXi host
c. Boot from the HPE bootable USB, and install ESXi to the internal USB
3. Configure the newly installed ESXi host
a. Access DCUI and enable Shell and SSH access
i. From the Direct Console User Interface,
press F2 to access the System Customization menu
ii. Select Troubleshooting Options and press Enter
iii. From the Troubleshooting Mode Options menu,
select Enable ESXi Shell
Select Enable SSH
b. Assign IP address to ESXi host, same subnet as the rebuild desktop
4. Using SCP or WinSCP to copy the backup configuration file to the ESXi host /tmp directory
5. Rename the backup configuration file
rename the configBundle-<ESXi-FQDN>.tgz file to configBundle.tgz # The name needs to be exactly configBundle.tgz
6. In the host's ESXi shell, either login locally or SSH from the rebuild desktop, run command
i. Place the host into maintenance mode
vim-cmd hostsvc/maintenance_mode_enter
ii. Restore rebuild ESXi host with target ESXi host configuration
vim-cmd hostsvc/firmware/restore_config 1 /tmp/configBundle.tgz
Note:
a. The host will be rebooted after a few seconds
b. If using SSH, you need to delete the .ssh directory on the rebuild desktop.
The .ssh directory caches the ESXi host certificate thumbprint
c:\users\<login-user>\.ssh
7. Plug the required update version of the bootable vendor ESXi USB into the front of the rebuild ESXi host,
boot from the front USB
8. Select Upgrade to upgrade the internal USB (newly installed and restored ESXi host)
9. Reboot the rebuild ESXi host
a. Verify host has no error
b. Verify TCP/IP configuration
c. Verify the root login credential as the restored ESXi host
10. Place the ESXi host into maintenance mode, by running command
vim-cmd hostsvc/maintenance_mode_enter
11. Shutdown the ESXi host gracefully
esxcli system shutdown poweroff --reason "maintenance"
OR, press F2 to shutdown
12. Remove the rebuilt USB key (internal USB), and use it to replace the target ESXi host's boot USB
13. Verify the target ESXi host for any issue
14. Take the target ESXi host out of maintenance
15. Disable ESXi shell and SSH
#*** Method 2 - Build from the VMware vanilla ISO
1. Install the build ESXi host with the VMware vanilla ISO; create the bootable USB using the VMware vanilla ISO
such as VMware-VMvisor-Installer-201912001-15160138.x86_64.iso
2. Install the required VIB and drivers
a. Install NVIDIA vib
esxcli software vib install -v /vmfs/volumes/temp/NVIDIA_bootbank_NVIDIA-VMware_ESXi_6.5_Host_Driver_440.107-1OEM.650.0.0.4598673.vib
b. Install VMware tools
esxcli software vib update -v /vmfs/volumes/temp/VMware_locker_tools-light_11.3.5.18557794-18558696.vib
c. Install other drivers
esxcli software vib update -d /vmfs/volumes/temp/VMW-ESX-6.7.0-nhpsa-2.0.44-offline_bundle-14136205.zip
esxcli software vib update -d /vmfs/volumes/temp/ESXi670-202103001.zip
esxcli software vib install -d /vmfs/volumes/temp/esxi6.7uX-mgmt-bundle-gen9.3.8.0.12-1.zip
esxcli software vib install -v /vmfs/volumes/temp/HPE_bootbank_hpessacli-4.21.7.0-6.7.0.7535516.hpe.vib
3. Reboot
Verify and ensure the ESXi host has no issue
4. Restore the target ESXi host configuration backup file to the build ESXi host
vim-cmd hostsvc/firmware/restore_config 1 /tmp/configBundle.tgz
Note:
The target ESXi host and the rebuild ESXi host must be on the same ESXi version and build number.
5. After reboot, configure the ESXi host for NVIDIA GPU Shared Direct
Update the Graphics Hardware Type
Shared -> Shared Direct
Note: Requirement for NVIDIA GPU
# How to ssh to ESXi host from Windows 10
On the Windows 10 desktop, SSH to the ESXi host
ssh <ESXi-host-ip> -l root
# ssh to the ESXi host and verify the disk space and VIB installed
df
esxcli software vib list
Note:
Need to delete the .ssh directory on the directly connected rebuild Windows 10 desktop
VMware Useful Admin Tools
https://blogs.vmware.com/virtualblocks/2019/03/16/top-10-vmware-admin-tools/
1. RVTools
2. PowerCLI
3. vCheck
4. ESXTOP
5. Cross vCenter Workload Migration Utility
How to migrate VMkernel adapter from standard switch to distribute switch port groups
How to migrate service console / VMkernel port from standard switches to VMware vSphere Distributed Switch (1010614)
https://kb.vmware.com/s/article/1010614
Pre-requisites:
- Ensure that the dvPortGroup(s) used for the vmkernel or service console are configured to match the existing port group configuration on the standard switch (VLAN, Nic Teaming).
- When a new uplink is in use for the dVswitch, ensure you have network connectivity to the uplink with the same or relevant VLANs trunked through. Compare the configuration of the new uplink with the existing uplink of standard switch to ensure they are configured identically on the physical switch.
- When you are migrating the Management vmkernel or Service console, it is best practice to have access to the remote console of the ESXi host, in case the host becomes unresponsive after the migration.
# To migrate one or more VMkernel or Service Console of one or more ESXi hosts from vCenter Server:
1. Navigate to Home > Inventory > Networking
2. Right-click the dVswitch
that you have created or want to migrate to
3. If the host is already added to the dVswitch, click Manage Hosts
else Click Add Host
4. Select the host(s), click Next
5. Select the physical adapters (vmnic, such as vmnicX) to use for the vmkernel, click Next
6. Select the virtual adapter (vmk, such as vmkX) to migrate and click the Destination port group field.
For each adapter, select the correct port group from the dropdown, click Next
7. Click Next to omit virtual machine networking migration
8. Click Finish after reviewing the new vmkernel and Uplink assignment.
9. The wizard completes, moving both the vmk interface (vmk0, etc) and the vmnic (vmnic1, etc) to the dVswitch.
# To migrate the service console or vmkernel of one Host from vCenter
1. Click Host > Configuration > Networking
Note: This is not a cluster wide option
2. Click the Distributed Virtual Switch view
3. Click Manage Virtual Adapters
4. Click Add
5. Select Migrate existing virtual network adapters
6. Click Next
7. Select the adapters you want to migrate
8. Select dvPortGroup from the dropdown
9. Click Next
10. Click Finish
Note:
The migration does not interrupt traffic if the new Port Group and Uplinks are pre-configured properly.
How to migrate vmkernel to port group in distributed switch
1. In vCenter console, navigate to and expand the required datacenter -> cluster
2. Select the ESXi host, and select Configure -> Networking -> Virtual switches
3. On the right section, select "..." right next to "Manage Physical Adapter", and
select "Migrate Networking"
4. On Migrate Networking window, select "Manage VMkernel adapter" from left option, then
5. Under "On other switches/unclaimed", type and select vmkx (such vmk0, vmk1, etc)
6. Click "Assign port group"
7. In "Select Network" window
a. under Name, type <port group name>, and select the port group name,
such as pg-vds-pdc
b. under Distributed Switch, type and select the pre-configured distributed switch
such as vds-pdc
Click OK, to close "Select Network" window
8. Back to Migrate Network window, verify and click Next
Example
Host/VMkernel Network Adapters In Use by Switch Source Port Group Destination Port Group
--------------------------------------------------------------------------------------------------
vmk0 (Reassigned) vSwitch0 Management pg-vds-pdc
9. On Migrate VM Network option
Select or omit any VM for migration
10. On Ready to complete option, verify and click Finish
How to add ESXi hosts to the distributed switches
1. After creating the distributed switch in vcenter, select Network and choose the required distributed switch
2. Right click and select "Add and Managed Hosts"
3. In Add and Manage Hosts window
a. Add hosts <-------- Select "Add hosts"
Add new hosts to this distributed switch
b. Manage host networking
Manage networking of hosts attached to this distributed switch
c. Remove hosts
Remove hosts from this distributed switch
4. In Select hosts, select the existing ESXi host(s),
or click + New hosts to add new ESXi host(s)
5. In Manage physical adapters, select required physical vmnicx adapter
Click Assign uplink
6. On Select an Uplink | vmnicx Window, select required uplink
Uplink(x)
Note:
Select "Apply this uplink assignment to the rest of the hosts", if
want to configure all the remaining hosts
7. Verify that the list has been populated with the required/selected vmnicX, uplinkX, and uplink port group
a. Host/Physical Network Adapters
b. In Use by Switch
c. Uplink
d. Uplink Port Group
8. In Manage VMkernel adapter, verify
a. Host/VMkernel Network Adapters
b. In Use by Switch
c. Source Port Group
d. Destination Port Group
Note
If migrating an existing vmk(x) from a standard switch to the distributed switch, then
a. Host/VMkernel Network Adapters - vmk0 (Reassigned)
b. In Use by Switch - vSwitch0
c. Source Port Group - Management
d. Destination Port Group - pg-mgmt-vds-pdc # example
9. In Migrate VM networking
Select any VM that will be migrated at the same time,
click "Assign port group" and select the required destination port group
10. Confirm and click Finish
How to remove the management network vmkernel adapter and re-create it
Confirm by checking the physical MAC address of all physical NICs and their link status, the existing vSwitch configuration, and the current vmkernel interfaces configuration. For ESXi, use these commands:
esxcfg-nics -l
esxcfg-vswitch -l
esxcfg-vmknic -l
# Prerequisite
Record the vmk0 TCP/IP configuration
a. IP address
b. Network Mask
c. Default Gateway
d. switch name and port group name, and port ID
e. vLAN ID
# Method 1
1. Verify vmk0 port ID that is used
esxcfg-vswitch -l | grep vmk0 # note down port ID number
2. Remove vmk0
esxcli network ip interface remove --interface-name=vmk0
3. Recreate vmk0
esxcli network ip interface add --interface-name=vmk0 --dvs-name=DVSWITCHNAME --dvport-id=PORT_ID_FROM_STEP_ONE
4. Configure TCP/IP
esxcli network ip interface ipv4 set --interface-name=vmk0 --ipv4=IP --netmask=NETMASK --type=static
5. Set default gateway
esxcfg-route -a default <vmk0-ip-gateway>
6. Mark vmk0 for Management traffic
esxcli network ip interface tag add -i vmk0 -t Management
# Alternative
1. Remove vmk0
esxcfg-vmknic -d -p Management # "Management" portgroup name
2. Add a vmknic to a port group
esxcfg-vmknic -a -i <vmk0-ip> -n <netmask> Management # using "Management" portgroup name
3. Mark vmk0 for Management
esxcli network ip interface tag add -i vmk0 -t Management
# Method 2 -
https://gist.github.com/DevoKun/bfb5898da72050c95c5515755a4b5780
1. List the existing switch configuration, then unlink the uplink from the vDS
esxcfg-vswitch -l
esxcli network vswitch dvs vmware list
esxcfg-vswitch -Q vmnic0 -V ${DVS_ID_FROM_ABOVE} ${DVS_NAME} # unlink vmnic0 from the vDS
2. Remove vmk0
esxcli network ip interface remove --interface-name=vmk0
3. Create a new vSwitch
esxcfg-vswitch -l
esxcfg-vswitch -a vSwitch0
esxcfg-vswitch -l
esxcfg-vswitch -L vmnic0 vSwitch0
esxcli network nic list
esxcfg-vswitch -l
4. Create the Management port group
esxcfg-vswitch -A Management vSwitch0
esxcfg-vswitch -l
5. Create vmk0 and assign to the Management port group
esxcli network ip interface list
esxcli network ip interface add --interface-name=vmk0 --portgroup-name=Management
esxcli network ip interface list
6. Configure TCP/IP on vmk0
esxcli network ip interface ipv4 set \
--interface-name=vmk0 \
--ipv4=<vmk0-ip> \
--netmask=<vmk0-netmask> \
--type=static
esxcfg-route -a default <vmk0-default-gateway>
7. Tag vmk0 for use as the Management interface
esxcli network ip interface tag add -i vmk0 -t Management
In ESXi 5.x, most of the legacy commands used in 4.x will continue to work. VMware recommends using their esxcli equivalents where possible as legacy esxcfg commands will be deprecated in a future release.
# Method 3 - commands and information to restore management network connectivity via the correct vmnic interface
Configuring vSwitch or vNetwork Distributed Switch from the command line in ESXi/ESX (1008127)
https://kb.vmware.com/s/article/1008127
# In ESXi 5.x and 6.x
1. View the current vSwitch configuration and vmkernel interface configuration using these commands:
esxcli network vswitch standard list # list current vswitch configuration
esxcli network vswitch dvs vmware list # list Distributed Switch configuration
esxcli network ip interface list # list vmkernel interfaces and their configuration
esxcli network nic list # display listing of physical adapters and their link state
2. Add or remove network cards (known as vmnics) to or from a Standard vSwitch using these commands:
esxcli network vswitch standard uplink remove --uplink-name=vmnic --vswitch-name=vSwitch # unlink an uplink
esxcli network vswitch standard uplink add --uplink-name=vmnic --vswitch-name=vSwitch # add an uplink
3. Add or remove network cards (known as vmnics) to or from a vNetwork Distributed Switch (vDS) using these commands:
esxcfg-vswitch -Q vmnic -V dvPort_ID_of_vmnic dvSwitch # unlink/remove a vDS uplink
esxcfg-vswitch -P vmnic -V unused_dvPort_ID dvSwitch # add a vDS uplink
Note:
If connectivity was lost when migrating management networking to a Distributed Switch,
it may be necessary to remove or disable the existing management vmkernel interface and
recreate it in a Standard vSwitch port group with the same IP configuration.
4. On a vSphere Distributed Switch (vDS), delete an existing VMkernel port using this command:
esxcli network ip interface remove --interface-name=vmkX
Note:
The vmk interface number used for management can be determined by running the command
esxcli network ip interface list
After the unreachable vmkernel port has been removed, it can be recreated on a Standard Switch.
5. If an existing Standard Switch does not exist,
you can create a new one as well as a port-group to use with these commands:
esxcli network vswitch standard add --vswitch-name=vSwitch0
esxcli network vswitch standard portgroup add --portgroup-name=portgroup --vswitch-name=vSwitch0
Note:
When creating a virtual switch, there are no linked vmnics by default.
You will need to link vmnics as described earlier in this article.
6. To create a VMkernel port and attach it to a portgroup on a Standard vSwitch, run these commands:
esxcli network ip interface add --interface-name=vmkX --portgroup-name=portgroup
esxcli network ip interface ipv4 set --interface-name=vmkX --ipv4=ipaddress --netmask=netmask --type=static
Note: By default on ESXi, the management vmkernel port is vmk0 and resides in a Standard Switch portgroup called Management Network.
7. If the vmnics associated with the management network are VLAN trunks, you may need to specify a VLAN ID for the management portgroup.
To set or correct the VLAN ID required for management connectivity on a Standard vSwitch, run this command:
esxcli network vswitch standard portgroup set -p portgroup --vlan-id VLAN
8. It may be necessary to restart the host's management agents if network connectivity is not restored despite a correct configuration:
services.sh restart
How to troubleshoot vMotion
https://kb.vmware.com/s/article/65184
https://kb.vmware.com/s/article/2020669
https://kb.vmware.com/s/article/1030264
# Procedure
1. Identify which vmknics are used for vMotion on both hosts.
Option 1 Using the vCenter Server User Interface (UI)
a. Select the ESXi host
b. On the right pane, select Configure -> Networking -> VMkernel adapters
Verify the vmkernel adapter has vMotion enabled
Option 2 Using the ESXi command line
esxcli network ip interface list # list the vmkernel adapters
esxcli network ip interface tag get -i [VMkernel adapter name] # list the tags on an adapter
# Example
esxcli network ip interface tag get -i vmk0
esxcli network ip interface tag get -i vmk1
Verify the output shows
Tags: VMotion
2. Run diagnostic tools between source and destination vMotion VMkernel adapters
vmkping
vmkping -I vmk(x) <dest-esxi-host-ip|hostname>
Note
a. vmkping uses a VMkernel’s TCP/IP stack to send ICMP traffic to a destination host.
b. With long distance vMotion, the maximum supported RTT is 150 milliseconds.
nc is the netcat utility and can be used to verify connectivity on a specific remote port.
3. Run the nc command to verify vMotion TCP port 8000
a. From the source ESXi host, run this command:
nc -z -v <dest-host-ip> 8000
# -v verbose
b. From the destination ESXi host, run this command:
nc -z -v <source-host-ip> 8000
c. Sample successful output:
Connection to 192.168.0.2 8000 port [tcp/*] succeeded!
d. Sample failed output:
Connection to 192.168.0.2 8000 port [tcp/*] failed: connection refused
Troubleshooting Network and TCP/UDP Port Connectivity Issue
https://kb.vmware.com/s/article/2020669
Use these troubleshooting tools:
# Troubleshoot network connectivity between two servers
ping/vmkping
# Troubleshoot TCP port connectivity
netcat (nc)
# Troubleshoot SSL port connectivity and verify SSL certificate information
openssl
# collect packet traces to troubleshoot network issues
tcpdump-uw & pktcap-uw
# View active TCP/UDP connections to the host
netstat & esxcli network
vmkping
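vmkping sends ICMP traffic out of a specific vmkernel interface, as shown in the vMotion section above; adding -d (don't fragment) and -s (payload size) is a common way to test a jumbo-frame path end to end:
vmkping -I vmkX <destination-ip>
vmkping -I vmkX -d -s 8972 <destination-ip> # verify a 9000-byte MTU path without fragmentation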
netcat (nc)
Use netcat (nc) to confirm connectivity to a TCP port on a remote host
# Get help
nc -h
# Syntax
nc -z <destination-ip> <destination-port>
The nc command can also be used to check the connectivity to a range of TCP ports on a remote host
# syntax - example
# -w wait
# 20-81 Example port range
nc -w 1 -z 192.168.48.133 20-81
nc -z 192.168.48.133 80 443 9443 # test a few different ports
Testing SSL port connectivity and certificate information with openssl
To test SSL ports, you can use the openssl command to test connectivity and also to confirm the current SSL information.
# Syntax
openssl s_client -connect destination-ip:ssl-port
Example
openssl s_client -connect 192.168.48.133:443
tcpdump-uw and pktcap-uw
tcpdump-uw can be used on ESXi hosts to capture packet traces from a vmkernel (vmk) interface.
To display packets on the vmkernel interface vmk0, use the tcpdump-uw command with the -i option
tcpdump-uw -i vmk0
To capture the entire packet, use the tcpdump-uw command with the -s option with a value of 1514 for normal traffic and 9014 if Jumbo Frames are enabled
# Normal traffic
tcpdump-uw -i vmk0 -s 1514
# Jumbo Frames enabled
tcpdump-uw -i vmk0 -s 9014 -B 9
# To display all of the packets on vmk0 with verbose detail, with the -vvv option
tcpdump-uw -i vmk0 -s 1514 -vvv
# To display only TCP packets, use the tcp filter
tcpdump-uw -i vmk0 -s 1514 tcp
# To see traffic to/from only a single IP address, you can use the host option
tcpdump-uw -i vmk0 -s 1514 host x.x.x.x
# To avoid seeing unwanted traffic types in the tcpdump-uw output, use the not option.
tcpdump-uw -i vmk0 -s 1514 port not 22 and port not 53
To limit the trace files to a specified number, use the -W option together with -C. This is useful when a trace must run for a long period of time while waiting for an event to occur.
# This command rotates through 10 trace files of 100MB each
# Note: Ensure the ESXi host does not run out of disk space
tcpdump-uw -i vmk0 -s 1514 -C 100M -W 10 -w /var/tmp/test.pcap
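To review a saved trace on the host itself, tcpdump-uw can read the .pcap file back with the -r option (with -C/-W rotation, later files get numeric suffixes). Alternatively, copy the file to a workstation and open it in Wireshark. A sketch, assuming the file from the previous example:
tcpdump-uw -nr /var/tmp/test.pcap | head -20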
Using the pktcap-uw tool in ESXi 5.5 and later (2051814)
https://www.virten.net/2015/10/esxi-network-troubleshooting-with-tcpdump-uw-and-pktcap-uw/
The pktcap-uw tool is an enhanced packet capture and analysis tool that can be used in place of the legacy tcpdump-uw tool. The pktcap-uw tool is included by default in ESXi 5.5 and later versions.
# Get help
pktcap-uw -h |more
# View a live capture of a VMkernel port's traffic
pktcap-uw --vmk vmkX # Example pktcap-uw --vmk vmk1
# To view a live capture of a specific physical network card (vmnic) on the host:
pktcap-uw --uplink vmnicX
How to identify the VM Port ID
# Works with both distributed and standard switches
To identify the Port-ID, open esxtop and press n (network view).
Useful pktcap-uw commands for packet capture of VM traffic
# Use the Port-ID (example: 33554439) of the virtual machine's network interface to capture the traffic
pktcap-uw --switchport 33554439
# To capture traffic that goes inside the virtual machine
pktcap-uw --switchport 33554439 --capture PortOutput
# To capture any dropped packets use this command
pktcap-uw --capture Drop
Stop pktcap-uw tracing with the kill command
kill $(lsof |grep pktcap-uw |awk '{print $1}'| sort -u)
# Verify pktcap-uw tracing has stopped
lsof | grep pktcap-uw | awk '{print $1}' | sort -u
Advanced Usage - trace multiple ports at the same time
As an example, trace a particular vSwitch port and its associated uplink at the same time:
# pktcap-uw Syntax for Capturing Packets
pktcap-uw switch_port_arguments capture_point_options filter_options output_control_options
# To get the vSwitch port number, run this command:
net-stats -l
# Identify and make a note of these parameters:
a. The Port ID returned by net-stats -l (or esxtop), for example: --switchport 50331665
b. The physical uplink to trace, for example: --uplink vmnic2
c. The location of the output pcap file, for example: /tmp/vmnic2.pcap
d. Run the pktcap-uw command to capture packets at both points simultaneously:
# run multiple capture at the same time
pktcap-uw --switchport 50331665 -o /tmp/50331665.pcap & pktcap-uw --uplink vmnic2 -o /tmp/vmnic2.pcap &
Viewing active TCP/UDP connections with netstat and esxcli network
# syntax
esxcfg-vmknic -l # List all VMkernel interfaces
esxcli network ip interface list # list vmkernel interfaces / adapters
# Verify TCP connections
netstat -tnp # view active TCP connections
netstat -tunp # view active TCP/UDP connections
esxcli network ip connection list # view active network connections
To retrieve errors and statistics for a network adapter, run:
esxcli network nic stats get -n <vmnicX>
esxcli and vim-cmd commands
Collecting information about tasks in VMware ESXi/ESX (1013003)
https://kb.vmware.com/s/article/1013003
When troubleshooting issues that involve both ESXi/ESX hosts and VMware vCenter Server, be aware that vCenter Server and an ESXi/ESX host may not consider the same operations to be tasks.
esxcli vm process list # list active VMs
vim-cmd vimsvc/task_list # To get a list of tasks on the host
# To get a list of tasks associated with a specific virtual machine,
# you must first get the Vmid of the virtual machine:
vim-cmd vmsvc/getallvms
# When you have the Vmid, you can then get a list of tasks associated with a specific virtual machine
vim-cmd vmsvc/get.tasklist VMID
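A few related vim-cmd vmsvc operations are often useful alongside the task list when a task appears stuck. These are standard subcommands; VMID is the ID returned by getallvms:
vim-cmd vmsvc/get.summary VMID # Quick summary, including power state
vim-cmd vmsvc/power.getstate VMID # Power state only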
# List network configuration
vim-cmd hostsvc/net/info | grep <pattern> # Filter the output for the setting of interest
Migrate a VMkernel adapter on a host from a vSphere Distributed Switch to a vSphere Standard Switch
If a host is associated with a distributed switch, you can migrate VMkernel adapters from the distributed switch to a standard switch.
# Prerequisites
Verify that the destination standard switch has at least one physical NIC.
# Procedure
1. In the vSphere Client, navigate to the host, or
select the ESXi host in the Hosts and Clusters view
2. On the Configure tab,
expand Networking and select Virtual Switches.
3. On the right pane, select the destination standard switch from the list.
4. Click "..." and select Migrate VMkernel Adapter.
5. On the Select VMkernel adapter page,
select from the list the virtual network adapter to migrate to the standard switch, such as vmk0.
6. On the Configure settings page,
edit the Network label and VLAN ID for the network adapter.
7. On the Ready to complete page, review the migration details and click Finish.
Click Back to edit settings.
Access the ESXi host Direct Console User Interface (DCUI) from iDRAC (Dell), iLO (HP), or another vendor console method, then press F2 and log in as root.
- Select Configure Management Network. If Network Adapters and VLAN (optional) are greyed out, the management VMkernel interface is managed by a vDS, and its configuration and VLANs must be changed on the vDS. In that case, we need to migrate the host's management network from the vDS to a standard switch.
- Verify that the network adapter(s) are enabled or selected in the DCUI by navigating to Network Restore Options
https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.esxi.install.doc/GUID-7EAF0251-1EEA-49DC-AD71-39E5F25906BF.html
a. Restore vDS
Choose this option to continue managing the ESXi host with the vDS
b. Restore Standard Switch <---- select this option
Choose this option because we are migrating the ESXi host from the vDS back to a standard switch
Press F11 to confirm; the management network then restarts
- Change the management IP address and VLAN. From the DCUI, go to Configure Management Network, then select VLAN (optional) to change the VLAN to the new value. Then, select IP Configuration to change
a. IP Address
b. Subnet Mask
c. Default Gateway
- Alternatively, you can restore the vmk0 interface from the ESXi shell
# ESXi shell command
esxcli network vswitch standard add -v vSwitch0
esxcli network vswitch standard portgroup add -p 'Management Network' -v vSwitch0
esxcli network ip interface add -i vmk0 -p 'Management Network'
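The three commands above recreate the switch, port group, and VMkernel interface, but not the uplink or IP settings. A minimal sketch of the remaining steps, assuming example values for the uplink (vmnic0), IP address, netmask, and gateway:
# Attach a physical uplink to the standard switch
esxcli network vswitch standard uplink add -u vmnic0 -v vSwitch0
# Assign a static IPv4 address to vmk0
esxcli network ip interface ipv4 set -i vmk0 -I 192.168.1.10 -N 255.255.255.0 -t static
# Add the default route
esxcli network ip route ipv4 add -g 192.168.1.1 -n default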
Note:
To verify the ESXi network configuration
esxcli network vswitch standard list
esxcli network vswitch dvs vmware list
esxcli network nic list
esxcli network ip interface ipv4 get
To check and enable the network adapters used for management, go to DCUI -> Configure Management Network -> Network Adapters, then select the required network adapter(s) that have physical connections.
Finally, update the DNS entries for the ESXi host management IP.
Ping the ESXi management IP and ensure it is successful.
How to rename a VMware ESXi host using the command line
- SSH to ESXi, or access ESXi shell from DCUI
a. Need to enable ssh and ESXi shell from vCenter, or
b. From DCUI, press F2 and login as root
Navigate to Troubleshooting Options -> Enable ESXi Shell and SSH
- If the ESXi host is part of a cluster, first enter Maintenance Mode, then remove it from the cluster
- If the ESXi host is managed by vCenter, then remove the ESXi host from Inventory
- Run these commands:
esxcli system hostname set --host=hostname
esxcli system hostname set --fqdn=fqdn
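For example, with a hypothetical FQDN, followed by a verification step:
esxcli system hostname set --fqdn=esxi01.example.com
esxcli system hostname get # Confirm the new host name and FQDN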
- Rejoin the ESXi host to vCenter and the cluster
vmkfstools
vmkfstools is one of the ESXi Shell commands for managing VMFS volumes, storage devices, and virtual disks. You can perform many storage operations using the vmkfstools command. For example, you can create and manage VMFS datastores on a physical partition, or manipulate virtual disk files stored on VMFS or NFS datastores.
# command syntax
vmkfstools <options> <target>
-v # verbose
# check and repair virtual disks
vmkfstools -x|--fix [check|repair]
vmkfstools -x check /vmfs/volumes/..../test.vmdk # Check the vmdk file
# clone disk
vmkfstools -i /vmfs/volumes/sourcevmfs/testvm/test.vmdk /vmfs/volumes/destvmfs/testvm2/test2.vmdk
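Two other common vmkfstools operations, shown as a sketch with example datastore and disk names:
# Create a new 10GB thin-provisioned virtual disk
vmkfstools -c 10G -d thin /vmfs/volumes/datastore1/testvm/newdisk.vmdk
# Extend the virtual disk to 20GB
vmkfstools -X 20G /vmfs/volumes/datastore1/testvm/newdisk.vmdk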
Investigating disk space on an ESX or ESXi host
https://kb.vmware.com/s/article/1003564
df -h # Check disk free space
vdf -h # Determine the free disk space on each filesystem
# Review the Use% column for any volumes that are 100% full
du -h --max-depth=1 <dir>
This command lists the directories within a given filesystem that contain the largest files. Starting at the root (/) directory and finding the largest directories, you can drill down into them (using cd) and run the same command repeatedly until you find the files that are occupying the space.
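For example, to list the first-level directories under /var by size in kilobytes (assumes GNU-style du and sort, as on the classic ESX service console; on ESXi's busybox tools, use du -d 1 instead of --max-depth=1):
du -k --max-depth=1 /var | sort -n # Largest directories are listed last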
# Directory usage
/vmimages/ # Used to store operating system install files such as the VMware Tools or other ISO files.
/var/core/ and /root/ # Used to store crash files for the service console and the VMkernel.
/var/log/ # Used to store the majority of the logs for the ESX host.
/vmfs/volumes/ # Used to store the virtual machine data.
# To review the space consumed by several of these common directories, run this command:
du -ch /vmimages /var/core /root /var/log
find / -size +10M -exec du -h {} \; | less # Find files bigger than a defined size (here, 10MB)
# To delete a file, use rm
# To zero a file without deleting it, truncate it with a shell redirect
> <file-to-zero> # Example: > file.log
# To compress the old log files
tar czvf /tmp/vmkwarning-logs.tgz /var/log/vmkwarning*
tar czvf /tmp/vmkernel-logs.tgz /var/log/vmkernel.*
tar czvf /tmp/messages-logs.tgz /var/log/messages.*
# To delete the source files
rm /var/log/vmkwarning.* /var/log/vmkernel.* /var/log/messages.*
# Move the new archive files back to your /var/log/ partition for long-term storage using the command:
mv /tmp/vmkwarning-logs.tgz /tmp/vmkernel-logs.tgz /tmp/messages-logs.tgz /var/log/