NSX-T Cheatsheet and Troubleshooting

Authors
  • Name
    Jackson Chen

VMware NSX-T Data Center Troubleshooting

Enter privilege mode and command mode

st en   # NSX manager privilege mode
esxcli  # ESXi host to enter esxcli command mode
sudo -i # KVM privilege mode

NSX-T Troubleshooting

https://www.simongreaves.co.uk/nsx-t/

Check L2 before L3

# Retrieve all available commands to query and configure
list    

# API query and configure (Postman, Insomnia)
http    GET     # Query
http    PUT     # Create or replace
        PATCH   # Create or update
        POST    # Create, or invoke an action
http    DELETE  # Delete
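
As a minimal sketch of using these verbs outside Postman or Insomnia, the Python below builds (but does not send) a request with the standard library. The manager name `nsx-mgr.example.com`, the `admin` credentials and the `/api/v1/cluster/status` path are assumptions for illustration; check the API guide for your NSX-T version before relying on any path.

```python
import base64
import urllib.request


def build_nsx_request(manager, path, user, password, method="GET"):
    """Build an HTTPS request with a basic-auth header; nothing is sent."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"https://{manager}{path}",
        method=method,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )


# Hypothetical manager name, credentials and path.
req = build_nsx_request("nsx-mgr.example.com", "/api/v1/cluster/status", "admin", "secret")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; certificate handling is left out of this sketch.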

# PowerCLI
Connect-VIServer -Server <vCenterServer>    # Connect to vCenter

Check (L2)

1. MTU
2. VLAN
3. TEP
    IP  
    MTU
4. CCP

N-VDS settings (L3)

1. MTU (L2)
2. Routing table (L4)
3. TEP
4. vTEP tables
5. MAC tables

Manager Troubleshooting

1. CorfuDB
2. nodes
3. Quorum must be up, at least 2 corfu servers required for quorum
4. Group Member Leader Election Server (GMLE) helps in detecting the fault with an NSX Manager node failure.  It also helps elect a new leader per group.
5. Day 2 operations: use st en to enter engineering mode (root privileged mode)
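
The quorum rule in item 3 can be sketched as a majority check. This assumes a three-node manager cluster and a simple strict-majority rule, which is consistent with "at least 2" above but is not taken from CorfuDB itself; `has_quorum` is a hypothetical helper name.

```python
def has_quorum(nodes_up, cluster_size=3):
    """Return True when a strict majority of cluster members is up."""
    return nodes_up >= cluster_size // 2 + 1


for up in range(4):
    print(up, "nodes up ->", "quorum" if has_quorum(up) else "no quorum")
```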

Logs

https://docs.vmware.com/en/VMware-NSX-T-Data-Center/3.1/administration/GUID-406AF9C3-E8F7-447A-8E3D-92AFB9D5E973.html

## NSX Manager
Component               Log Files and Locations
NSX Policy Manager      /var/log/policy/policy.log

NSX Manager             /var/log/syslog
                        /var/log/proton/nsxapi.log
                        /var/log/nsx-audit.log
                        manager.log                              
                            get log-file manager.log # can only be viewed from the CLI

NSXAPI Logs             /var/log/proton/nsxapi.log
CorfuDB logs            /var/log/corfu    # directory contains CorfuDB logs
Cluster BootstrapManager (CBM)          /var/log/cbm    # directory contains cbm logs
General Logs            /var/log/syslog     # syslog messages file
NSX Controller          /var/log/cloudnet/nsx-ccp.log
Audit Logs              /var/log/nsx-audit.log

## ESXi host
ESXi host               /var/log/nsx-syslog  
                        /var/log/esxupdate.log
                        /var/log/nsxa-opsagent.log
                        /var/log/syslog
                        /var/log/vmkernel.log
                        /var/log/nsx-proxy.log
                                
## KVM host
KVM host                /var/log/vmware/nsx-syslog  # syslog message file
                        /var/log/syslog
                        /var/log/openvswitch/ovs-vswitchd.log
                        /var/log/dpkg.log 

DFW                     /var/log/dfwpktlogs.log (only fills if logging enabled on rule)

# Edge nodes
Edge Nodes              Syslog
                        (get log-file syslog)
                        /var/log/syslog

Load Balancer errors    Access-log [follow]     # Using follow to show log as it is being updated
                        Error-log [follow]
    get load-balancer <lb-uuid> error-log       # Example to view error log


## NSX Cli command
get log-file <filename>
get log-file <filename> follow

# Below are commonly used log files, there are many more log files
get log-file <auth.log | controller | controller-error | http.log | kern.log | manager.log | node-mgmt.log | policy.log | syslog> [follow]
    # use [follow] to keep monitoring the log as it is updated
    Example:  get log-file syslog follow


## Distributed firewall logs
# ESXi host         /var/log/nsx-syslog.log
                    /var/log/dfwpktlogs.log

# KVM host          /var/log/vmware/nsx-syslog
                    /var/log/dfwpktlogs.log    

Set logging level on NSX Manager with

set logging-server <remote-syslog-server>:514 proto [udp|tcp] level [info|debug]
set service manager logging-level debug

Other Logging Information

Log Message IDs
Infrastructure Preparation Logs
Policy Manager logs
View with       get log-file policy.log   
                get log-file syslog
                Controller Log
                CFG Agent Log 
                (ESXi)
                KVM

Syslog

Configure Syslog Exporter
Using vRLI with NSX (vRealize Log Insight)
    1. https://<vRLI>/admin
    2. Install log insight content pack for NSX-T data center

get logging-server  # Verify logging configuration
set logging-server <syslog-hostname or ip>:port proto [protocol] level <level> 

# ESXi host
esxcli system syslog config set --loghost=udp://<syslog-ip>:<port>
esxcli system syslog reload

# KVM
Edit /etc/rsyslog.d/<vmware log>.conf
*.* @<syslog-ip>:514
service rsyslog restart

Protocols Supported

TCP
UDP
TLS

Severity Level

0. Emergency
1. Alert
2. Critical
3. Error
4. Warning
5. Notice
6. Informational
7. Debug
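
RFC 5424 numbers these severities from 0 (Emergency) to 7 (Debug), so a lower number is more severe. The sketch below assumes the usual syslog convention that a configured level forwards that severity and everything more severe; whether NSX applies exactly this filter is an assumption, and `is_forwarded` is a hypothetical helper.

```python
# List index equals the RFC 5424 numeric severity code.
SEVERITIES = ["emergency", "alert", "critical", "error",
              "warning", "notice", "informational", "debug"]


def is_forwarded(message_level, configured_level):
    """A message passes when it is at least as severe as the configured level."""
    return SEVERITIES.index(message_level) <= SEVERITIES.index(configured_level)


print(is_forwarded("error", "informational"))
print(is_forwarded("debug", "informational"))
```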

Management and Edge Node configuration

set logging-server <hostname-or-ip-address[:port]> proto <protocol-type> level <level>

# Example
set logging-server <hostname-or-ip-address>:514 proto udp level info

# Remove logging
del logging-server <hostname-or-ip-address>:514 proto udp level info
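
When the same exporter has to be set on many managers and edges, the command string can be generated rather than typed. This is a small sketch; `logging_server_command` is a hypothetical helper, the address is a documentation example, and the protocol check only covers the three protocols listed above.

```python
SUPPORTED_PROTOCOLS = {"tcp", "udp", "tls"}


def logging_server_command(server, port=514, proto="udp", level="info", remove=False):
    """Return the set (or del) logging-server command line as a string."""
    if proto not in SUPPORTED_PROTOCOLS:
        raise ValueError(f"unsupported protocol: {proto}")
    verb = "del" if remove else "set"
    return f"{verb} logging-server {server}:{port} proto {proto} level {level}"


print(logging_server_command("192.0.2.10"))
print(logging_server_command("192.0.2.10", remove=True))
```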

ESXi Configuration

esxcli network firewall ruleset set -r syslog -e true
esxcli system syslog config set --loghost=<hostname-or-ip-address[:port]>
esxcli system syslog reload

KVM Troubleshooting - syslog

Login as root
Create this file    /etc/rsyslog.d/40-vmware-remote-logging.conf

Add this line to the file
*.* @<syslog_server_ip>:514;RFC5424fmt

Restart syslog
systemctl restart rsyslog

# Verify logging configuration
get logging-server

Monitoring Dashboards

Verify monitoring dashboards

Packet Capture

If you need detailed traffic info, use port mirroring.
Can use CLI to setup packet capture on:
1. NSX Manager
start capture interface <interface-name> [file <filename>] [count <packet-count>] [expression <expression>]

2. NSX Edges
set capture session <session-number> interface <port-uuid> direction <direction>

3. ESXi
Collect packets
pktcap-uw

4. View packets
tcpdump-uw

5. KVM
tcpdump

Troubleshooting scenarios

NSX Manager

If file corrupt         check OVA or QCOW2 install files
Password requirement    12 characters minimum on password
Check logs
get cluster status
get cluster config      # Verify cluster configuration
get service <service-name>
start service <service-name>

Installation problems

NSX CLI Commands:
    get services
    get service 
    get cluster status 
    get configuration 

nsxcli

Can see that ESXi is connected to 46, and KVM is on 47, showing the Shards are working correctly

Logical Switching

Common switching problems

N-VDS is incorrectly configured on a host
Overlay tunnel (GENEVE) is misconfigured
TEPs unable to reach each other

Validate switch

esxcfg-vswitch -l   # verify switch configuration

nsxdp-cli       # Verify nsx local datapath services and statistics

# Verify network interfaces
ifconfig
net-stats -l

Verification Process

# ssh to NSX manager node
get logical-switches    # Verify all logical switches/segments configured in NSX manager
get logical-switch <segment-uuid> ports     # verify the logical switch ports connected to the segment
get logical-switch <segment-VNI> transport-node-table   # list the transport node table of the segment logical switch
get logical-switch <segment-VNI> arp-table
get logical-switch <segment-VNI> map-table
get logical-switch <segment-VNI> vtep
get nodes   # list all the transport nodes

## ssh to ESXi host
nsxcli      # enter nsxcli command mode
get logical-switches    # It will list the switches VNI, UUID, DVS name, VIF numbers
get logical-switch <segment-VNI>
get logical-switch <segment-VNI> map-table
get logical-switch <segment-VNI> arp-table
get logical-switch <segment-VNI> vtep-table

## ssh to KVM host
sudo -i     # enter root mode
virsh dumpxml <vm-name> | grep interfaceid  # obtain the interfaceid of the required vm
nsxcli      # enter nsxcli command mode
get logical-switches    # It will list the switches VNI, UUID, DVS name, VIF numbers
get logical-switch <segment-VNI>
get logical-switch <segment-VNI> ports
get logical-switch <segment-VNI> map-table
get logical-switch <segment-VNI> arp-table
get logical-switch <segment-VNI> vtep

Check GENEVE VMKernel

esxcli network ip interface ipv4 get -i vmk10
        vmk10 is the TEP for NSX
esxcli network ip interface ipv4 get -i vmk50
        vmk50 is for intra-tier networking/routing and containers.

Verifying overlay tunnel reachability

Ping destination TEP interface from the source host

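vmkping ++netstack=vxlan -s <size> -d <TEP-IP>
    # The netstack is called vxlan on the host rather than GENEVE. It's the same stack for ESXi.
    # -s sets the ICMP data size, -d sets the don't-fragment bit.
Try 1572 if 1575 fails          This is the minimum size needed to support GENEVE. GENEVE adds 72 bytes to a 1500 byte data packet.
                                1572 data bytes plus 28 bytes of IPv4 and ICMP headers make a 1600-byte packet.
If 1572 fails try 1472          If that works, the overhead for the overlay hasn’t been configured (the path is still at a 1500-byte MTU).

Example
vmkping ++netstack=vxlan -s 1572 -d <TEP-IP>    # 1572 data bytes, don't fragment, ping the destination TEP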
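
The sizes used with `-s` follow from header arithmetic: the value is the ICMP data size, so the IPv4 header (20 bytes, no options) and the ICMP header (8 bytes) are subtracted from the MTU being tested. A short sketch, with `vmkping_size` as a hypothetical helper name:

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes


def vmkping_size(mtu):
    """Largest ICMP data size that fits in one unfragmented packet of this MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER


print(vmkping_size(1600))  # 1572
print(vmkping_size(1500))  # 1472
```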

N-VDS Not Initialised on a Host

If a VM cannot communicate on a specific host, check that the segment is present on that host. If it isn’t showing on the host, go into the GUI and check that the N-VDS segment is present. If it is, check the virtual switches under the advanced settings and look for errors such as Partial Success, or other information.

If this happens, check that the agents are running on the host.

/etc/init.d/nsx-mpa status
esxcli network ip connection list | grep 5671
/etc/init.d/nsx-proxy status
esxcli network ip connection list | grep 1235
/etc/init.d/nsx-opsagent status

# KVM host
service nsx-proxy status
netstat -nap | grep 1234
netstat -nap | grep 1235
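
A plain grep on a port number also matches longer numbers that contain it, and it does not check the connection state. The sketch below filters saved command output for an exact port in the ESTABLISHED state. The sample text is a made-up, simplified stand-in for real `esxcli` or `netstat` output, and `established_on_port` is a hypothetical helper.

```python
import re


def established_on_port(connection_output, port):
    """Return the lines that mention exactly this port and the ESTABLISHED state."""
    port_pattern = re.compile(rf":{port}\b")
    return [
        line
        for line in connection_output.splitlines()
        if port_pattern.search(line) and "ESTABLISHED" in line
    ]


# Hypothetical, simplified sample; real column layout differs per platform.
SAMPLE = """\
tcp  0  0  192.0.2.21:52344  192.0.2.11:1235  ESTABLISHED  nsx-proxy
tcp  0  0  192.0.2.21:52350  192.0.2.11:1234  TIME_WAIT    nsx-proxy
tcp  0  0  192.0.2.21:12350  192.0.2.12:443   ESTABLISHED  hostd
"""

for line in established_on_port(SAMPLE, 1235):
    print(line)
```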

Routing Problems

  1. BGP neighbours are misconfigured, so the neighbour relationship is not established.
  2. The internal route advertisement on the Tier-1 router is misconfigured.
  3. Route redistribution on the Tier-0 router is misconfigured.
Especially those check boxes!
Check the routing table with get logical-router
Check the SR for routing
Validate the routing table for the Tier-0
Check SR
Check VRF
get route       # in the output, b = BGP

DR (Distributed Routing)

For DR, check the forwarder for similar information with get forwarding

BGP neighbour

get bgp neighbor summary
Check the status is established     # Need status as "established"
    where
        Active means still setting up!
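
Established is the final state of the BGP session state machine (Idle, Connect, Active, OpenSent, OpenConfirm, Established); any other state means routes are not being exchanged yet. A minimal sketch that flags such neighbours, using a hand-written neighbour table rather than parsed CLI output; `neighbors_not_established` is a hypothetical helper.

```python
def neighbors_not_established(neighbors):
    """Return the neighbours whose state is anything other than Established."""
    return {ip: state for ip, state in neighbors.items() if state != "Established"}


# Hypothetical neighbour table keyed by neighbour IP.
neighbors = {"192.0.2.1": "Established", "192.0.2.2": "Active"}
print(neighbors_not_established(neighbors))
```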

BGP route table

Tier0 SR can show BGP route info
get bgp ipv4

Logical routers verification process

## ssh to NSX manager
get logical-routers
get logical-routers | find <router-name>    # It will show distributed router (DR) or service router (SR)

## ssh to ESXi host
nsxcli  # enter nsxcli command mode
get logical-routers
get logical-router <logical-router-uuid>    # show logical router information, such as LIF number, state
get logical-router <logical-router-uuid> interfaces # show LIF uuid, overlay VNI, IP and netmask

# exit the nsxcli command mode, back to root user command prompt
exit    # exit nsxcli command mode
net-vdr -I --brief -l   # list distributed router UUID, LIFs, routes
        # This command is equivalent in nsxcli "get logical-routers"
net-vdr -I --brief -l <router-uuid>

## ssh to KVM host
sudo -i
nsxcli      # enter nsxcli command mode
get logical-routers
get logical-router <logical-router-uuid>

## ssh to NSX edge node
get logical-routers     # list the logical routers
vrf <DR/SR-vrf-id>
get bgp neighbor summary    # list router id, local AS, neighbor IP, remote AS, state
get route       # list all routes in the routing table
get route connected     # list the directly connected routes
get route bgp       # list routes learned from bgp
get interfaces      # display details of the logical router interfaces
get forwarding      # display the logical router forwarding table, such as gateway IP, MAC

Firewall

Most common firewall issues are

Firewall policy rules are configured but not enabled or published
Firewall policy rules are not applied to the intended entity
The sequence of rules is incorrect, remember it’s top to bottom, left to right (categories)
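
Why the sequence matters can be shown with a first-match evaluator: the first rule that matches decides the action, so a broad rule placed above a specific one hides it. This is a simplified sketch with invented rule fields, not the DFW data model.

```python
from ipaddress import ip_address, ip_network

# Hypothetical rules, listed top to bottom.
RULES = [
    {"name": "allow-web", "source": "10.0.0.0/8", "port": 443, "action": "ALLOW"},
    {"name": "drop-all", "source": "0.0.0.0/0", "port": None, "action": "DROP"},
]


def first_match(rules, src_ip, port):
    """Return (name, action) of the first rule matching the source and port."""
    for rule in rules:
        source_ok = ip_address(src_ip) in ip_network(rule["source"])
        port_ok = rule["port"] in (None, port)  # None means any port
        if source_ok and port_ok:
            return rule["name"], rule["action"]
    return None


print(first_match(RULES, "10.1.2.3", 443))
print(first_match(list(reversed(RULES)), "10.1.2.3", 443))
```

With the order reversed, the same packet hits `drop-all` first and the allow rule is never reached.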

1. KVM
get firewall status summary 
    ovs-appctl is used for configuration of the firewall.
    
Validate with
    ovs-appctl -t /var/run/openvswitch/nsxa-ctl dfw/vif
Get the VIF then type
    ovs-appctl -t /var/run/openvswitch/nsxa-ctl dfw/rules <VIF-id>
    Rules are defined with addrsets (address sets). These have GUIDs on them as well.

2. ESXi
Use vsipioctl and summarize-dvfilter
    summarize-dvfilter | grep <VM-name>
    Look for the filter name then use vsipioctl getrules -f <dvfilter-name>

The example adds the -A16 variable
    which tells grep to add 16 lines to the output.
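
What `-A <n>` does can be mimicked in a few lines: every matching line is kept together with the `n` lines after it. The sketch uses plain substring matching rather than grep's regular expressions, and the sample lines are invented, not real `summarize-dvfilter` output.

```python
def grep_after(lines, pattern, after):
    """Mimic grep -A: each matching line plus the `after` lines that follow it."""
    keep = set()
    for i, line in enumerate(lines):
        if pattern in line:
            keep.update(range(i, min(i + after + 1, len(lines))))
    return [lines[i] for i in sorted(keep)]


# Hypothetical sample lines.
sample = [
    "world 1 vmm0:web-01",
    "port 100",
    "agentName: nsx",
    "world 2 vmm0:db-01",
    "port 200",
]
print(grep_after(sample, "web-01", 1))
```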
    
This is without the -A16  and with

Can also use the addrsets in the filter instead of name.  

Same commands again but with -f addrset number.

The edges give definition of what’s in the rule sets 

using   get firewall ruleset rules 

You get the interface_id by running get firewall interfaces

Distributed Firewall verification process

# ssh to ESXi host
summarize-dvfilter | grep -A<number-of-lines> <VM-name> # Retrieve the name of the dvfilter associated with the vNIC of the VM
vsipioctl getrules -f <dvfilter-name>   # get the distributed firewall rule associated with a dvfilter
vsipioctl getaddrsets -f <dvfilter-name> # get IP and MAC address associated with the distributed firewall rule for a dvfilter
vsipioctl getfwconfig -f <dvfilter-name> # get the distributed firewall configuration
        # this command provides the combined output of getrules and getaddrsets

# ssh to KVM host
sudo -i     # enter root
ovs-appctl -t /var/run/openvswitch/nsxa-ctl  dfw/vif    
    # get the virtual interface identifier for the vNICs that have associated distributed firewall rules on the KVM host
ovs-appctl -t /var/run/openvswitch/nsxa-ctl  dfw/rules  <VIF-id>
    # get the distributed firewall rule associated with a virtual interface
ovs-appctl -t /var/run/openvswitch/nsxa-ctl  dfw/addrset <addrset>
    # get the IP and MAC address associated with the distributed firewall rule for a dvfilter

# ssh to Edge node
get firewall interfaces # get all edge interfaces that have firewall rule configured
get firewall <uuid> ruleset rules   # get firewall rules associated with the uplink interface

nsxdp-cli

Can get deeper analysis with nsxdp-cli

Again, this command only shows 1 line as the -A1 is used in the egrep

Edge Validation

nsxcli
get configuration
get node-uuid
get interfaces
get managers
get host-switches
get tunnel-ports
get vtep

Verify network flow and packet

tcpdump

# KVM
Open vSwitch module must be installed on KVM hypervisor host before the host can be prepared and configured as a transport node

NSX-T CLI References

https://vdc-download.vmware.com/vmwb-repository/dcr-public/cc42e3c1-eb34-4567-a916-147e79798957/8264605c-a5e1-49a8-b603-cc78621eeeab/cli.html#about

https://www.simongreaves.co.uk/nsx-t-cli-reference-guide/

NSX CLI (nsxcli) is the command line tool for troubleshooting NSX-T. It runs in a non-root mode, so you have to use the command structure available. For instance, there is no grep, but you can pipe output to find instead.

get logical-switches | find segments

You can use the nsxcli command line tool from various elements throughout the NSX-T deployment including NSX Manager, Edges and ESXi Transport Nodes.

From Edges

SSH access to Edges or nodes: post deployment, open the console

get service ssh         # verify the SSH service is stopped.
start service ssh       # Start the SSH service.
set service ssh start-on-boot   # Set the SSH service to autostart when the VM is powered on.
get service ssh         # Verify that the SSH service is running and Start on boot is set to True.
set cli-timeout 0       # Disable the command-line timeout.
get logical-routers     # Obtain routing information for the gateways.
                        # Verify that the SERVICE_ROUTER_TIER0 service gateway appears with an associated VRF ID.

vrf

Vrf is used to access the gateway virtual routing functions. For BGP and other Tier-0 services you access the Tier-0 service router (SR) function of the gateway.

vrf <vrf_ID>     # Example: vrf 2
get bgp neighbor summary        # verify the BGP state
                                # Important: Check the status is established. A status of Active means still setting up!
get bgp neighbor                # View further information on the BGP connection. Also shows whether the connection is established or not.
get bgp ipv4         # View ipv4 bgp information.

Press q to quit out of the BGP neighbor output.
exit                            # Exit the Tier-0 VRF service gateway mode.

Route Information from the Edges

get logical-routers
vrf <vrf_id>
get route       # Shows the routes learned from the BGP peer

For DR check the forwarder for similar routing information.

get forwarding

Connect to the SR to collect that forwarding information, also get it from the DR. This is useful for seeing routing configuration for DR components throughout the environment, such as those on a Tier-1 DR.

DHCP

Runs on Edges.

get dhcp servers
get dhcp ip-pools
get dhcp leases

Load Balancer

# Load balancer commonly deployed as inline topology or one-arm topology

get load-balancer   
            # The output shows the general load balancer configuration, including UUID and Virtual Server ID.
get load-balancer <UUID> virtual-server <Virtual_Server_ID>
            # Copy the UUID and the Virtual Server ID values and paste them. Verify the virtual server configuration.
get load-balancer <UUID> pools
            # Verify the server pool configuration. <UUID> is the value that you recorded for the load balancer.

Load balancer verification

# ssh to NSX edge node
get load-balancer   # verify load balancer configuration
get load-balancer <uuid> virtual-server     # verify virtual server configuration
get load-balancer <uuid> virtual-server <vs-uuid> status
get load-balancer <uuid> virtual-server <vs-uuid> stats
get load-balancer <uuid> pools
get load-balancer <uuid> error-log  # get the error log

VPN Connectivity Tests on Edges

get ipsecvpn session active
            # Verify that the L2VPN session is active, identify the peers, and ensure that the tunnel status is up.
get ipsecvpn session status
            # Verify that the sessions are up.
get ipsecvpn session summary
            # Check whether the ipsecvpn session is up between the local and remote peers.
get ipsecvpn ikesa <session-id>
get ipsecvpn tunnel stats

get l2vpn service config
get l2vpn sessions
            # Get the l2vpn session, tunnel, and IPSEC session numbers, and check that the status is UP.
get l2vpn session stats
            # Get statistical information of the local and remote peers, whether the status is UP, count of packets received, bytes received (RX), packets transmitted (TX), and packets dropped, malformed, or loops.
get l2vpn session config
            # Get the session configuration information.

Layer 2 VPN verification process

# ssh to NSX Edge node
get ipsecvpn session summary    # list all IPsec VPN sessions
get ipsecvpn session sessionid <session-id>   # list session information, such as tunnel
get l2vpn sessions  # list Layer2 VPN sessions
get l2vpn sessions config   # show the L2 VPN sessions configuration
get l2vpn session <uuid> logical-switches   # show L2 VPN session's logical switches

NSX Manager

set user <username> [password <password> [old-password <old-password>]]
            # To change the password of an account run:
get certificate api thumbprint  # Obtain NSX manager certificate thumbprint

Authentication Policy Settings for Local Users

Use the following to set:

1. Password length
set auth-policy minimum-password-length <password-length>

2. UI and API authentication policies. The UI and API local users have the same policy.
set auth-policy api lockout-period <lockout-period>

set auth-policy api lockout-reset-period <lockout-reset-period>

set auth-policy api max-auth-failures <auth-failures>

3. Set CLI authentication policy
set auth-policy cli lockout-period <lockout-period>

View Logs

# NSX CLI
get log-file policy.log

# Engineering Mode
Use st en to enter engineering mode (root privileged mode)

Syslog - Manager and Edges

set logging-server <hostname-or-ip-address[:port]> proto <protocol> level <level>

Transport Nodes

Logical Switches

#* List all logical switches; it shows VNI, UUID, Name, Type
get logical-switches

#* List all transport nodes associated with a logical switch.
get logical-switch <switch_UUID> transport-node-table

#* List all TEPs associated with a logical switch.
get logical-switch <switch_UUID> vtep

#* List MAC table associated with a logical switch.
get logical-switch <switch_UUID> mac-table

#* List ARP table associated with a logical switch.
get logical-switch <switch_UUID> arp-table

#*** On ESXi host to retrieve the VTEP, MAC and ARP entries
get logical-switch <VNI-uuid> vtep-table
get logical-switch <VNI-uuid> mac-table
get logical-switch <VNI-uuid> arp-table

Deploy Manager Cluster

To deploy a manager to an existing cluster, get the cluster configuration ID, make a note of the existing Manager's certificate thumbprint, and use that to join the new node to the cluster. Finally, get the cluster status to confirm the new node has joined.

get cluster config  

join <NSX-Manager-IP> cluster-id <cluster-id> username <NSX-Manager-username> password <NSX-Manager-password> thumbprint <NSX-Manager1's-thumbprint>

get cluster status
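
Because the join command takes several values copied from the first manager, assembling it in a script avoids spacing mistakes. A minimal sketch; `join_command` is a hypothetical helper and all values shown are placeholders.

```python
def join_command(manager_ip, cluster_id, username, password, thumbprint):
    """Assemble the nsxcli join command from values noted on the existing manager."""
    return (
        f"join {manager_ip} cluster-id {cluster_id} "
        f"username {username} password {password} thumbprint {thumbprint}"
    )


# Placeholder values only.
print(join_command("192.0.2.11", "<cluster-id>", "admin", "<password>", "<thumbprint>"))
```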

OR via API

POST https://<nsx-mgr>/api/v1/cluster?action=join_cluster

ESXi Configuration

Several esxcli commands can be used to aid in NSX-T configuration.

esxcli network firewall ruleset set -r syslog -e true  
esxcli system syslog config set --loghost=<hostname-or-ip-address[:port]>  
esxcli system syslog reload 

Can also use the nsxcli command set such as:

get logical-switches    # It shows Overlay Kernel Entry - VNI, DVS Name, VIF number
                        # Also Overlay LCP Entry - VNI, Logical Switch UUID

get logical-switch <UUID>   # View detail of specific logical switch

N-VDS and Tunnel Information

#***** ESXi Host
# Verify the kernel modules installed
esxcli system module list | egrep -i 'vswitch|vdl2|ens'

# Verify TEP
esxcli network ip interface ipv4 address list

esxcfg-vswitch -l   # Verify switches and N-VDS configuration
esxcfg-vmknic -l    # Verify TEP interfaces configuration

esxcfg-nics -l      # verify the status of the uplinks
esxcli network nic up -n vmnic3     # bring up vmnic3

# Show details of N-VDS configuration
net-dvs     # very detailed information

# Show logical switch and N-VDS info
# Include - summary, Overlay Kernel Entry (VNI, DVS Name, VIF num), Overlay LCP Entry (VNI, UUID)
get logical-switch

# Display the status of the overlay tunnels. Note: N-VDSs used to be called host switches, hence the get host-switch command.
get host-switch <N-VDS_NAME> tunnel

#****** KVM Host
ovs-vsctl list open_vswitch     # display the Open vSwitch configuration
ovs-vsctl show      # list the NSX bridges installed on the KVM hosts

KVM Configuration

Syslog

Login as root
Create this file
/etc/rsyslog.d/40-vmware-remote-logging.conf

Add this line to the file
*.* @<syslog_server_ip>:514;RFC5424fmt

Restart syslog
systemctl restart rsyslog

nsxcli can also be used as outlined above. ESXi gives a bit more info as the kernel info is available in ESXi that isn’t there for KVM.

Verify the VM running state

1. ssh to KVM host
2. Verify VM running state
    sudo virsh list --all   # list all VM running state
3. Power on VM
    sudo virsh start <VM-name>

Packet Capture

Can use CLI to setup network packet capture on:

1. NSX Manager
start capture interface <interface-name> [file <filename>] [count <packet-count>] [expression <expression>]

2. NSX Edges
set capture session <session-number> interface <port-uuid> direction <direction>

Example.
set capture session 1 interface fp-eth1 direction in
set capture session 1 expression src net 172.20.10.0/24

3. Remove captured session information with:
del capture session 1

ESXi

# Collect packets. Can send to a file.
pktcap-uw       # pktcap-uw --help

# Can Pipe it to view captured packets on the screen.
pktcap-uw <capture-options> -o - | tcpdump-uw -enr -

# To review captured packets on the ESXi host
tcpdump-uw

KVM

tcpdump

Verify installation problems

get services  
get service <service name>  
get cluster status  
get configuration  
get managers 

Cluster Configuration Validation

NSX Manager nodes (Logs)

get cluster status (nsxcli)         get services (nsxcli)       get log-file (nsxcli)       Login as root (Linux)
DATASTORE                           datastore                       -                       /var/log/corfu/corfu.9000.log
CLUSTER_BOOT_MANAGER                cluster_manager                 -                       /var/log/cbm/cbm.log
CONTROLLER                          controller                      -                       /var/log/cloudnet/nsx-ccp.log
MANAGER                             manager                         manager.log             /var/log/proton/nsxapi.log
POLICY                              policy                          policy.log              /var/log/policy/policy.log
HTTP                                http                            http.log                /var/log/proxy/reverse-proxy.log
-                                   -                               syslog                  /var/log/syslog
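
The table can double as a lookup when `get cluster status` reports a degraded group. The sketch below maps the group names from the first column to the root-level log paths from the last column; the controller path uses `/var/log/cloudnet`, as in the Logs section. `logs_to_check` is a hypothetical helper.

```python
SERVICE_LOGS = {
    "DATASTORE": "/var/log/corfu/corfu.9000.log",
    "CLUSTER_BOOT_MANAGER": "/var/log/cbm/cbm.log",
    "CONTROLLER": "/var/log/cloudnet/nsx-ccp.log",
    "MANAGER": "/var/log/proton/nsxapi.log",
    "POLICY": "/var/log/policy/policy.log",
    "HTTP": "/var/log/proxy/reverse-proxy.log",
}


def logs_to_check(degraded_groups):
    """Return the log files for the given groups, skipping unknown names."""
    return [SERVICE_LOGS[g] for g in degraded_groups if g in SERVICE_LOGS]


print(logs_to_check(["DATASTORE", "HTTP"]))
```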

Example to verify cluster service and start the service

get service http
start service http

Process to detach a failed NSX manager node

get cluster status      # Verify the status as DEGRADED, and note down the failed node UUID
detach node <failed-node-uuid>
get cluster status      # Ensure the cluster status as STABLE
get certificate api thumbprint  # retrieve the NSX Manager thumbprint
get cluster config      # retrieve the cluster ID
# Then ssh to the new NSX Manager node, and join it to the NSX Manager cluster
join <NSX-node-IP> cluster-id <cluster-id> thumbprint <NSX-thumbprint> username <admin> password <password>
get cluster status      # verify cluster status as STABLE

Transport Node Preparation

Verify transport node preparation

# ESXi host
esxcli software vib list | egrep "nsx|vsip"     # verify NSX-T data center packages installed on the esxi host
esxcli system module list | grep nsx            # verify the kernel module installation
esxcli network ip interface ipv4 address list   # list the VMkernel IPv4 address list
esxcli network ip netstack list     # list the TCP/IP stacks available on the transport node
esxcfg-vswitch -l       # list the vSwitch available on the ESXi transport node
/etc/init.d/nsx-proxy  status    # Verify the NSX-Proxy agent service running status
esxcli network ip connection list | egrep "1234|1235"   # Verify the connections are established

# KVM host
dpkg --list | grep nsx
ifconfig    # verify IPv4 address
ovs-vsctl show      # Verify Open vSwitch configuration
service nsx-proxy status    # Verify NSX proxy running status
netstat -nap | grep 1234
netstat -nap | grep 1235

vSphere

Verify ESXi Hosts

# Check all VIBs / software installed on vSphere
esxcli software vib list | grep -e nsx -e vsip      
esxcfg-module -l | grep nsx         # This may work

# Check ESXi agents status
/etc/init.d/nsx-proxy status
esxcli network ip connection list | grep 1234
esxcli network ip connection list | grep 1235

KVM

# Ubuntu
dpkg --list | grep nsx

# Redhat
rpm -qa | grep nsx

Check TEP and Hyperbus

Hyperbus is for containers.

# verify TEP
esxcli network ip interface ipv4 address list
    # vmk10     TEP interface
esxcli network ip netstack list     # verify the TCP/IP stacks used by the TEP and hyperbus interfaces
                                    # TEP interface uses vxlan
esxcfg-vswitch -l       # verify N-VDS configuration

vSphere

# Check ip v4 address list
esxcli network ip interface ipv4 address list
    # Output - Overlay (TEP, vmk10 is the default vmk). Hyperbus (vmk50 is the default vmk).

# Verify TCP/IP for TEP and hyperbus.
esxcli network ip netstack list
    # Vxlan is GENEVE in ESXi.

KVM

# Check network interface and address
ifconfig
    # verify ethx, and nsx-vtep0.0

# Check open vswitch
ovs-vsctl list Open_vSwitch
    # It shows the UUID, bridges, datapath_types, ovs_version (Open vSwitch version), system_type (Host OS type and version)

ovs-vsctl show
    # It shows information, such as "Bridge nsx-managed", is_connected
    # Port hyperbus: Interface hyperbus     # The hyperbus interface is connected to the nsx-managed bridge, manages the Containers
    # Bridge "nsx-switch.0"     # The first bridge
    #       Port "nsx-vtep0.0"
    #       Interface "nsx-vtep0.0"     # The nsx-vtep 0.0 interface is connected to the nsx-switch.0 bridge, 
                                        # and is responsible for encapsulating and decapsulating the overlay traffic            

dpkg --list | grep nsx      # list the installed nsx modules
service nsx-proxy status    # verify nsx agent status

Agents and Connectivity

ESXi

/etc/init.d/nsx-mpa status
esxcli network ip connection list | grep 5671  
/etc/init.d/nsx-proxy status  
esxcli network ip connection list | grep 1235  
/etc/init.d/nsx-opsagent status

KVM

service nsx-mpa status  
netstat -nap | grep 5671 
service nsx-proxy status 
netstat -nap | grep 1235 

Other commands

ESXi

type the esxcli command.
esxcli network ip connection list | grep 1235

KVM

netstat -anp --tcp | grep 1235

Verify NSX Edge installation and configuration

get configuration   # verify configuration
get managers    # list NSX manager nodes
get node-uuid
get interfaces
get host-switches
get tunnel-ports
get vteps

Checking Communication from Host to Controller and Manager

1. ESXi - Verify output for status
# On an ESXi host using NSX-T CLI commands:
get managers
get controllers

2. KVM
# On a KVM host using NSX-T CLI commands:
get managers
get controllers

View details of N-VDS, the N-VDS has its own command line tool, net-vdl2.

# View detailed information, such as NSX VDS name, VDS ID, VTEP interface, logical network, Controller IP and status
net-vdl2 -l     # https://kb.vmware.com/s/article/66796

ESXi LIF MAC

View the LIF (Logical Interface) vMAC to pMAC on ESXi host.

# Detail information, such as DvsName, NumLifs, DRvMAC (vMAC), pMAC, uplink, team or non-team member
net-vdr -C -l
    # Note:     02:50:56:56:44:52 is always the vMAC for the LIFs for all DRs.

# To view the DR instance information on an ESXi host
net-vdr -l -I       # -l   "l" for list
                    # -I   "I" for instance

Verify Infrastructure Communication Events

Infrastructure communication events arise from the NSX Edge, KVM, ESXi, and public gateway nodes.

# ssh to Edge node, verify status
nsxcli
get tunnel-ports

# On each tunnel, check the stats for any drops
get tunnel-port <port-uuid> stats

# Verify service
get services
start service <service-name>    # if service stops

Verify node agent health

### For ESX
# Verify vmk50, if missing, recreate it
# if Hyperbus 4094 is missing,
    restart nsx-cfgagent

# if nsx-cfgagent has stopped
restart nsx-cfgagent

### For KVM
# if Hyperbus namespace is missing
restart nsx-cfgagent

# if nsx-agent has stopped
restart nsx-agent

DFW Validation

Distributed Firewall Configuration Validation

# Distributed firewall policies are divided into five default categories
Ethernet -> Emergency -> Infrastructure -> Environment -> Application
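
The category order can be sketched as a sort key: rules are ordered by category first, and within a category they keep their configured order. The rule names below are invented for illustration and `evaluation_order` is a hypothetical helper.

```python
DFW_CATEGORIES = ["Ethernet", "Emergency", "Infrastructure", "Environment", "Application"]


def evaluation_order(rules):
    """Sort (category, name) pairs by category; sorted() is stable within a category."""
    return sorted(rules, key=lambda rule: DFW_CATEGORIES.index(rule[0]))


# Hypothetical rules in the order they were created.
rules = [
    ("Application", "web-to-db"),
    ("Emergency", "quarantine"),
    ("Application", "app-default"),
    ("Infrastructure", "allow-dns"),
]
for category, name in evaluation_order(rules):
    print(category, name)
```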

NSX Manager

get firewall status
get firewall published-entities # List the rules published to CCP

ESXi

nsxt-vsip is the DFW module.

# Run commands to verify status
/etc/init.d/nsx-mpa status      # NSX-Management-Plane-Agent
/etc/init.d/nsx-proxy status    # nsx-proxy agent service running status
/etc/init.d/nsx-opsagent status     # opsAgent running status
esxcli system module list | grep nsxt-vsip      # Verify nsxt-vsip running status
esxcli network ip connection list | grep 5671   # Verify tcp 5671 for mpa established status
esxcli network ip connection list | grep 1235   # Verify nsx-proxy established status

Time based rules require NTP service

/etc/init.d/ntpd status     # Check NTP service
ntpq -p     # Verify NTP associations on the ESXi host

Check dvFilter for firewall rules.

summarize-dvfilter | grep <VM_NAME>

summarize-dvfilter | grep -A <num-lines> <VM_NAME>     # Show <num-lines> lines after the match to see the port number and dvFilter name of the VM

vsipioctl getrules -f <filter-name>     # Verify the firewall rules applied to the vNIC of the virtual machine
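Picking the filter name out of the `summarize-dvfilter` output by eye is error-prone, so it can be extracted with a short helper. This is a sketch under stated assumptions: `dvfilter_names` is a hypothetical name, each VM section is assumed to start with a line beginning `world `, and each filter is assumed to appear on a line of the form `name: <filter-name>`.

```shell
# dvfilter_names VM_NAME
# Reads summarize-dvfilter output on stdin and prints the "name:" value of
# every filter listed under a VM section whose header contains VM_NAME.
# Hypothetical helper; the section layout is an assumption about the output.
dvfilter_names() {
    awk -v vm="$1" '
        # A line starting with "world " opens a new VM section
        /^world / { in_vm = (index($0, vm) > 0) }
        in_vm && $1 == "name:" { print $2 }
    '
}

# Usage on an ESXi host (not executed here):
#   summarize-dvfilter | dvfilter_names <VM_NAME>
#   vsipioctl getrules -f "$(summarize-dvfilter | dvfilter_names <VM_NAME> | head -n 1)"
```

The VM name is matched as a substring, so a name that is a prefix of another VM name returns the filters of both.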

KVM

KVM Host firewall configuration verification

# Use these commands to validate the distributed firewall settings.
# View the firewall virtual interfaces (VIFs).
ovs-appctl -t /var/run/openvswitch/nsxa-ctl dfw/vif

# View the firewall rules, including the address sets they contain.
ovs-appctl -t /var/run/openvswitch/nsxa-ctl dfw/rules <VIF_ID_NUMBER>

Gateway Firewall Validation

Gateway Firewall predefined categories

Emergency -> System -> Pre Rules -> Local Gateway -> Auto Service Rules -> Default
            Where
                Pre Rules   # applied globally across all NSX gateways

Useful firewall operations and troubleshooting

## On ESXi host
vsipioctl -h    # help menu

# Get the list of VMs on the ESXi host and associated filter name
summarize-dvfilter | grep -A 3 vmm

# Get the firewall rules applied to a VM
vsipioctl getrules -f <filter-name>
        # Example:  vsipioctl getrules -f nic-7014985-eth0-vmware-sfw.2

# Get stats per FW rule per VM vNIC
# Use "-s" with the above command to get the firewall stats associated with the VM firewall rules.
vsipioctl getrules -f nic-7014985-eth0-vmware-sfw.2 -s

# Get the addrset/groups used in the VM's firewall rules
# Firewall rules use groups/addrsets as source or destination. This output lists all the addrsets used in the rules, based on the grouping configuration.
vsipioctl getaddrset -f nic-1371516-eth0-vmware-sfw.2

# Get the active firewall flows per VM
# NSX DFW maintains active flows per vNIC. This output lists all the active flows over that vNIC.
vsipioctl getflows -f nic-7014985-eth0-vmware-sfw.2

# Get the full firewall config per VM
# This output provides the full firewall config per vNIC: rules, addrsets and profiles used.
vsipioctl getfwconfig -f nic-7014985-eth0-vmware-sfw.2

### NSX CLI for firewall troubleshooting
nsxcli  # enter nsxcli command mode
get firewall <enter>    # list all get firewall command options
get firewall <interface_uuid> interface stats | json    # check flow table usage
get firewall packetlog last 10   # get the last 10 packetlog entries
get firewall exclusion
get firewall thresholds

### Troubleshooting Distributed Firewall on KVM Hosts
get firewall vifs   # list all VIFs
get firewall <vif-uuid> ruleset rules  # Discover firewall rules that apply to a specific VIF
get firewall <vif-uuid> addrsets    # Get the list of address sets used in a specific VIF
get firewall <vif-uuid> profile     # Get the list of APPIDs and FQDNs used in a specific VIF
get firewall <vif-uuid> fqdn        # Discover FQDN of specific VIF
ovs-appctl dpctl/dump-conntrack -m | grep <source-ip> | grep <dest-ip> # look for flows between two specific IP addresses

### Troubleshooting Gateway Firewall
# Gateway firewall is implemented on NSX Edge transport node
get logical-router  # Get UUID of the Gateway on which Firewall is enabled

# Get all Gateway interfaces using UUID
# Gateway firewall is implemented per Uplink interface of a Gateway. Identify the uplink interface and get the interface ID from the output below.
get logical-router <gateway-uuid/SR-uuid/DR-uuid> interfaces
get firewall <GW-interface-uuid> ruleset rules  # Get Gateway Firewall Rules on a GW Interface

# Check Gateway Firewall Sync status
# The gateway firewall syncs flow state between Edge nodes for high availability. The command below shows the sync configuration.
get firewall <GW-interface-uuid> sync config

# Check Gateway Firewall Active Flows
get firewall <GW-interface-uuid> connection

# Check Gateway Firewall Logs
# Gateway firewall logs provide the logical router (gateway) VRF, firewall interface ID, FW rule ID and flow details.
# The logs can be accessed on the Edge node or sent to a syslog server.
get log-file syslog | find datapathd.firewallpkt

# Other Command Line Options for debugging Gateway Firewall
get firewall <GW-interface-uuid>  <ENTER>

#### Distributed Firewall Packet Logs
 /var/log/dfwpktlogs.log    # The log file is for both ESXi and KVM hosts
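For the conntrack lookup in the KVM troubleshooting commands above, chaining two plain `grep` calls on bare IP addresses also matches longer addresses (for example, `10.0.0.1` matches `10.0.0.10`). The sketch below tightens the match. `flows_between` is a hypothetical helper name, and it assumes the dump prints addresses as `src=<ip>` and `dst=<ip>` tokens.

```shell
# flows_between SRC DST
# Reads conntrack dump text on stdin and prints the lines that contain both
# "src=SRC" and "dst=DST" as whole tokens (fixed strings, no regex).
# Hypothetical helper; the src=/dst= token format is an assumption.
flows_between() {
    grep -Fw "src=$1" | grep -Fw "dst=$2"
}

# Usage on a KVM host (not executed here):
#   ovs-appctl dpctl/dump-conntrack -m | flows_between <source-ip> <dest-ip>
```

If an entry lists both an original and a reply tuple on one line, a flow in the reverse direction can match as well, because the reply tuple carries the swapped addresses.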

Post Upgrade

After an upgrade, use these CLI commands to validate that NSX-T was upgraded successfully and that the correct NSX-T modules are installed.

vSphere

esxcli software vib list | grep nsx
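To compare the result across hosts, it can help to reduce the VIB list to name and version pairs. This is a minimal sketch: `nsx_vibs` is a hypothetical helper name, and it assumes the VIB name is the first column and the version is the second column of the listing.

```shell
# nsx_vibs
# Reads "esxcli software vib list" output on stdin and prints
# "<name> <version>" for every VIB whose name contains "nsx".
# Hypothetical helper; the column order is an assumption about the output.
nsx_vibs() {
    awk 'tolower($1) ~ /nsx/ { print $1, $2 }'
}

# Usage on an ESXi host (not executed here):
#   esxcli software vib list | nsx_vibs
```

Saving this output from each host after the upgrade and diffing the files is one way to spot a host that still carries older modules.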

KVM

# Ubuntu
dpkg -l | egrep 'nsx|openvswitch'

# Red Hat
rpm -qa | egrep 'nsx|openvswitch'

# nsxcli
get version

Simple Network Management Protocol (SNMP)

You can use Simple Network Management Protocol (SNMP) to monitor your NSX-T Data Center components. The SNMP service is not started by default after installation.

Download and install the file VMware-NSX-MIB.mib

### SNMP configuration - NSX manager CLI or NSX Edge CLI
# For SNMPv1/SNMPv2
set snmp community <community-string>
start service snmp

# For SNMPv3
set snmp v3-users <username> auth-password <auth-password> priv-password <priv-password>
start service snmp

Port Mirroring in Manager Mode

Note that logical SPAN is supported for overlay logical switches only and not VLAN logical switches.

For a local SPAN session, the mirror session source and destination ports must be on the same host vSwitch.

## Port mirroring session
# select session type, the available types are
Local SPAN      # Select transport node
Remote SPAN     # Session type - RSPAN source session, or RSPAN destination session
                # select transport node
                # Encapsulation VLAN ID
Remote L3 SPAN  # Encapsulation - select GRE, ERSPAN TWO, or ERSPAN THREE
Logical SPAN    # Logical switch (overlay logical switch only, NOT VLAN logical switch)

NSX-T Command Line Cheat Sheet

https://www.simongreaves.co.uk/nsx-t-command-line-cheat-sheet/