Linux Useful Commands

Author: Jackson Chen

Monitor Log in Real Time

By default, the tail command displays the last 10 lines of a file. For instance, if you want to watch only the last two lines of a log file in real time, use the -n option combined with the -f flag.

# Follow the log file; -F keeps following even if the file is rotated or recreated
sudo tail -F /var/log/apache2/access.log

# Only watch the last 2 lines
sudo tail -n2 -f /var/log/apache2/access.log
 
# without option, watch will run the command every two seconds
watch <command> 
watch -n <interval-in-seconds> <command>        # Example: watch -n 5 df -h
watch -d <command>      # highlight the differences between updates
watch '<command_1> | <command_2>'
      Example     watch "netstat -anp | grep -c ':80\b.*LISTEN'"

# Check file type
file filename

Log in over the network

To log in to a remote Linux server, run "ssh remoteuser@remotehost", then enter the password when prompted.

Useful Command-line editing shortcuts

SHORTCUT        DESCRIPTION
Ctrl+A          Jump to the beginning of the command line.
Ctrl+E          Jump to the end of the command line.
Ctrl+U          Clear from the cursor to the beginning of the command line.
Ctrl+K          Clear from the cursor to the end of the command line.
Ctrl+LeftArrow  Jump to the beginning of the previous word on the command line.
Ctrl+RightArrow Jump to the end of the next word on the command line.
Ctrl+R          Search the history list of commands for a pattern.

Directory Operations

# Create directory structure, include all the folders and subfolders
mkdir -p <Dir1>/<Dir2>/<Dir3>

# Copy all files and subdirectory to destination directory
cp -r  <DirSource> <DirDest>

# Forcefully delete all files and subdirectories. Use with care!
rm -rf /<Dir1>/<Dir2>

Pattern Matching

Command-line metacharacters are replaced by the match list prior to command execution.

PATTERN         MATCHES
*               Any string of zero or more characters.
?               Any single character.
[abc...]        Any one character in the enclosed class (between the square brackets).
[!abc...]       Any one character not in the enclosed class.
[^abc...]       Any one character not in the enclosed class.
[[:alpha:]]     Any alphabetic character.
[[:lower:]]     Any lowercase character.
[[:upper:]]     Any uppercase character.
[[:alnum:]]     Any alphabetic character or digit.
[[:punct:]]     Any printable character not a space or alphanumeric.
[[:digit:]]     Any single digit from 0 to 9.
[[:space:]]     Any single white space character. This may include tabs, newlines,
                carriage returns, form feeds, or spaces.
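
As a quick illustration, the following globs can be tried in any directory (the file names are only examples):

ls *.log                # any file ending in .log
ls file?.txt            # file1.txt, fileA.txt, ... (exactly one character after "file")
ls [abc]*               # files whose names start with a, b, or c
ls [![:digit:]]*        # files whose names do not start with a digit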

Variable Expansion

A variable stores a value in memory. Use variable expansion ${VariableName} to convert the variable name to its value.

echo ${LogfileName}
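
For example (the variable name and path are just placeholders):

LogfileName=/var/log/messages         # assign a value; no spaces around "="
echo ${LogfileName}                   # prints /var/log/messages
echo "Rotating ${LogfileName}.1"      # the braces keep the name separate from the trailing text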

Command Substitution

Command substitution occurs when a command is enclosed in parentheses and preceded by a dollar sign, as in $(command); the shell replaces the expression with the command's output.

echo The output is $(date +%A)
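
The substituted output can also be stored in a variable, for example:

Today=$(date +%A)             # run date and capture its output
echo "Today is ${Today}"
Kernel=$(uname -r)            # works with any command
echo "Running kernel ${Kernel}"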

Man Pages

Man pages are the Linux system manual pages. Each section contains information about a particular topic.

SECTION CONTENT TYPE
1   User commands (both executable and shell programs)
2   System calls (kernel routines invoked from user space)
3   Library functions (provided by program libraries)
4   Special files (such as device files)
5   File formats (for many configuration files and structures)
6   Games (historical section for amusing programs)
7   Conventions, standards, and miscellaneous (protocols, file systems)
8   System administration and privileged commands (maintenance tasks)
9   Linux kernel API (internal kernel calls)

To read the required section of a man page, specify the section number

man 1 echo

Search man Pages

# Lowercase k lists the short descriptions of man pages matching the keyword (similar to apropos)
man -k <Keyword/command>

# search for a string while viewing a man page
/string

Standard Input, Standard Output and Standard Error

                            |           | --------> stdout (1)  |                   |
   stdin (0) ------------>  |           |                       | Display/Monitor   |
   (Keyboard)               |  Process  | --------> stderr (2)  |                   |
                            |           | 
                            |           | --------> Other (3,4+) [Files for read/write]

We could redirect output to files

> filename      # Redirect stdout to file (overwrite if file exists)
>> filename     # Redirect stdout and append to file
2> filename     # Redirect stderr to file (overwrite if file exists)
2> /dev/null    # Discard stderr by redirecting it to /dev/null
&> filename     # Redirect both stdout and stderr to the same file (overwrite if file exists)
&>> filename    # Redirect both stdout and stderr to the same file, appending

# Redirect stderr (2) to the terminal/display, and only redirect stdout (1) to the file.
# The order matters: 2>&1 first points stderr at the current stdout (the terminal),
# and only then is stdout redirected to the file.
2>&1 > file

Redirect Output Usage

To see only the results found, ignoring errors, redirect stderr

find /var -name apache 2> /dev/null

To save both the found results and the errors to separate output files

find /var -name apache > /tmp/output 2> /tmp/errors

The tee command is useful when working with pipelines. tee copies its stdin to its stdout and also writes a copy to the file(s) named as command arguments.

# It will save the list result to file, also display on the monitor/terminal
ls /var/apache -l | tee /tmp/apache-files

Verify Process

# View process information with the owning user shown
# (user names are resolved from /etc/passwd)
ps au

User Management

id          # show the current user's information, such as UID and GID
id test1    # show user test1's information
   
useradd test1   # create user
usermod test1   # update user
passwd test1    # change user password
userdel -r test1    # remove/delete user

chage -d 0 test1    # force a password change at next login
chage -M 90 test1   # set the password maximum age to 90 days
chage -l test1      # verify password expiry details

User Password Policy

The user password policy is managed in /etc/login.defs
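
The password-aging defaults in that file typically look like the excerpt below (the values shown are examples; check your own /etc/login.defs):

# /etc/login.defs (excerpt, example values)
PASS_MAX_DAYS   90      # maximum number of days a password may be used
PASS_MIN_DAYS   1       # minimum number of days between password changes
PASS_MIN_LEN    8       # minimum password length (often enforced by PAM instead)
PASS_WARN_AGE   7       # days of warning before a password expires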

User Account Expiry Management

# Set the user account expiry date (365 days from today)
ExpiryDate=$(date -d "+365 days" +%F)
chage -E $ExpiryDate test1

Group Management

getent group groupname  # verify group information (GID and members)
groupadd -r groupname   # create new group
groupmod    groupname   # modify group
groupdel    groupname   # delete group

Permission Update

chmod, chown, and chgrp
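
A few typical invocations as a sketch (the file, user, and group names are placeholders):

chmod 640 report.txt            # owner rw, group r, others no access
chmod u+x,go-w script.sh        # symbolic form: add execute for owner, remove write for group/others
chown test1 report.txt          # change the owner
chown test1:devops report.txt   # change owner and group in one step
chgrp devops report.txt         # change the group only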

Special Permission

Setting special permissions

setuid = u+s    (4)
setgid = g+s    (2)
sticky = o+t    (1)

where
g+s (setgid)    On an executable file, the program runs with the group that owns the file.
                On a directory, new files created inside inherit the directory's group.

chmod g+s  Dir1     # Add the setgid bit on the Dir1 directory
chmod 2770 Dir1     # Octal form: the leading 2 is the setgid bit (the 770 part is just an example)

umask

Run the "umask" command to view the current file-creation mask.

A common default is 0002: newly created directories get read, write, and execute for the owner and group, and read and execute for others (775); newly created files get 664.

To grant others full permission on newly created files and directories, run "umask 0"
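
A quick way to see the effect of the mask (run in a scratch directory; the names are arbitrary):

umask                   # show the current mask
umask 022               # new directories -> 755, new files -> 644
mkdir demo-dir  && ls -ld demo-dir
touch demo-file && ls -l  demo-file
umask 0002              # restore a common default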

Process and Job Management

# List process status
ps

r       # Only running processes (BSD-style option, no dash)
-a      # All processes associated with a terminal, except session leaders
x       # Also show your processes without a controlling terminal (BSD-style, no dash)
-p process_id   # Select by process ID
-s session_id   # Select by session ID
-f      # Full-format listing
-F      # Extra full format
v       # Display virtual memory format (BSD-style, no dash)
e       # Show the environment after the command (BSD-style, no dash)

# View processes using highest memory
ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%mem

kill -l                 # list the names and numbers of all available signals
kill <pid>              # send the default TERM signal to the process ID
kill -<signal> <pid>    # send the given signal number (or name) to the process ID

Any command or pipeline can be started in the background by appending an ampersand (&) to the end of the command line.

Example
ls -l &
find /var -name test | sort &

To interact with jobs

jobs            # list all the jobs
fg %<JobID>     # bring the background job with the given job ID to the foreground

ps j            # display job-control information (PGID, SID, state, and so on)
        T       # "T" in the STAT column means the process is stopped/suspended
ps jT           # job format for processes attached to the current terminal

bg %<SuspendedJobID>    # resume the suspended job in the background
      [JobID]+          # the "+" marks the current default job

To kill all processes run by a user

pgrep -l -u <TestUser>      # first, list the user's processes
pkill -9 -u <TestUser>      # then kill them all (pkill selects by user itself; it does not read a pipe)
pkill -P <ParentProcessID>  # kill all children of the given parent process

To view the process tree

pstree
pstree -p <testuser1>

System Process and Services

The systemd daemon manages startup for Linux, including service startup and service management in general. It activates system resources, server daemons, and other processes both at boot time and on a running system.

Some common commands:
systemctl                           # list all units that are loaded and active
systemctl list-units
systemctl status <name.type>
systemctl is-active <name.type>
systemctl is-failed <name.type>
systemctl is-enabled <name.type>
systemctl list-dependencies <name.type>
systemctl enable <name.type>
systemctl start <name.type>
systemctl restart <name.type>
systemctl reload <name.type>

Remote Management

ssh <user1>@remotehost      # ssh to remote host
w   # list the users currently logged in to the computer

Configure OpenSSH

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/system_administrators_guide/ch-openssh

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_basic_system_settings/using-secure-communications-between-two-systems-with-openssh_configuring-basic-system-settings
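
As a rough sketch of what is commonly adjusted in /etc/ssh/sshd_config (the values below are typical hardening choices, not requirements; see the Red Hat guides above for details):

# /etc/ssh/sshd_config (excerpt, example values)
PermitRootLogin no              # disallow direct root logins
PasswordAuthentication no       # require key-based authentication
MaxAuthTries 3

# Validate the syntax, then apply the change
sudo sshd -t
sudo systemctl reload sshd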

Hardening Red Hat

Find files or directories, or words

find /file/path/name    -name "filename"
      find  /   -name "messages"
locate filename               # locate searches a prebuilt database (updated by updatedb)
grep 'ospf'  /var/log/messages
find /file/path/name  -type f | grep "filename"
      find / -type f | grep messages
tail -f /var/log/messages
tail -F /var/log/messages     # -F keeps following even if the file is rotated or recreated

List Directory Usage and Size

du -sch
ls -lSr /var/log                          # List files by size, in reverse order (smallest first)
find . -name "*.gz" | xargs du -sch       # disk usage of all .gz files, with a combined total
ls -sh *.gz                               # list .gz files with their sizes

View Disk Usage and Troubleshooting

df -h       # disk free and human readable format
vdf -h      # verify disk free
du -h       # verify disk usage
find / -xdev -size +1024k     # find files larger than 1 MB, without crossing filesystem boundaries
find / -size +1024k -exec du -h {} \; | less    # find files bigger than 1MB

df -h  /<dir>   # verify the disk usage of the filesystem holding <dir>: size, used, and available space
du -hc /var/log /var/log/audit | sort -h    # disk usage for the required directories, sorted by size

Useful Find Commands

find /<dir> -empty      # find empty files and directories
find /<dir> -perm 664   # Example: find files with exactly "664" permissions
find ./ -type f -name "*.txt" -exec grep 'Update' {} \;     # Print lines which have 'Update' in them
find / -type d -name Update   # find all directories named Update under /
find / -perm /a=x       # find all executable files
find / -perm /u=r       # find all files readable by their owner
find /tmp -type f -name ".*"        # find all hidden files in /tmp
find / -type f -perm 0777 -print -exec chmod 644 {} \;      # find all files with 777 permissions, and set them to 644

Verify network configuration

netstat -i        # interface table (or use: ip addr)

DNS forward and reverse lookup

# Forward lookup
dig <hostname>    # dig www.google.com

# Reverse lookup
dig -x <ip>

Removing carriage returns in Linux or Unix

1. Use sed:
      sed 's/\r$//' file.txt > out.txt
2. Use tr:
      tr -d '\r' < input.txt > out.txt

Set a service account to never expire

# verify password expiry
chage -l service-account

# set password never expire
chage -m 0 -M 9999 -I -1 -E -1 service-account

Boot into Single User Mode

In single-user mode, your computer boots to runlevel 1

Your local file systems are mounted, but your network is not activated. You have a usable system maintenance shell. Unlike rescue mode, single-user mode automatically tries to mount your file system.

Note:
Do not use single-user mode if your file system cannot be mounted successfully. 
You cannot use single-user mode if the runlevel 1 configuration on your system is corrupted.
# How to boot to single user mode
1. Access the server console, via iLo, iDRAC, or Hitachi BMC, or from vSphere VM console
2. Reboot the server, at the GRUB splash screen at boot time, press any key to enter the GRUB interactive menu.
3. Select Red Hat Enterprise Linux with the version of the kernel that you wish to boot, and type a to append to the line.
      "a"   Type a to append to the line
4. Go to the end of the line that starts with linux, and type single as a separate word (press the Spacebar and then type single).
      Note
      You could type 1 at the end of the line instead of "single".
5. Press Enter to exit edit mode.
6. Once in single-user mode, you can carry out maintenance tasks such as disk partition resizing.
7. Finally reboot the server after finishing the maintenance tasks.

How to boot RHEL into Maintenance Mode

In the event that the root password is forgotten, or a bad /etc/fstab entry prevents the mount points from mounting, it is necessary to boot the system into maintenance mode.

Note
Sometimes people refer to maintenance mode as single-user mode, but they are different.
1. On a RHEL or CentOS system, reboot the server and wait for the GRUB boot menu to appear
Note:
At the bottom of the screen shows
      Press `e` to edit the select item, or 'c' for a command prompt
2. Select the kernel version from the GRUB menu, and press the "e" key to edit the first boot option
3. Use the Down arrow key to find the kernel line that starts with "linux", then change the argument
      ro   to     rw init=/sysroot/bin/sh
Note:
      Keep the other options intact
4. Once the update has been done, press Ctrl+X or F10 to boot into the emergency shell (maintenance mode, sometimes called single-user mode)
5. Now switch into the mounted root (/) filesystem using the following command
      chroot  /sysroot
6. Carry out maintenance tasks required
      passwd root       # Reset root password
      vi /etc/fstab     # Fix the /etc/fstab mount points
7. Tell SELinux to relabel the filesystem on the next boot
      touch /.autorelabel
8. Reboot the system
      Type the exit command twice, or type "reboot -f"

Netcat - nc command

https://linuxize.com/post/netcat-nc-command-with-examples/

https://www.studytonight.com/linux-guide/netcat-nc-command-with-examples

https://www.programmerhat.com/nc-command-not-found/

Netcat (or nc) is a command-line utility that reads and writes data across network connections, using the TCP or UDP protocols. It is one of the most powerful tools in the network and system administrator's arsenal, and it is considered the Swiss army knife of networking tools.

Netcat is cross-platform, and it is available for Linux, macOS, Windows, and BSD. You can use Netcat to debug and monitor network connections, scan for open ports, transfer data, as a proxy, and more.

# netcat syntax
nc [options] host port
      -z    # scan for open ports without sending data
      -u    # UDP
      -t    # TCP (the default; nc attempts a TCP connection even without -t)
      -v    # verbose
      -h    # help
      -n    # numeric-only IP addresses, no DNS lookups (faster)

nc -z -v 10.10.10.10 20-80       # scanning tcp for open ports in the range of 20-80
nc -z -v 10.10.10.10 20-80 2>&1 | grep succeeded       # only print the lines with open TCP ports
nc -z -v -u 10.10.10.10 20-80       # scanning udp for open ports in the range of 20-80

# find server software and version
echo "EXIT" | nc 10.10.8.8 22       # send EXIT command to server on default SSH port 22

# Creating a simple chat between two hosts
# on the first server - listen on port 5555 (or another port)
nc -l 5555        # example port 5555

# on the second server
nc firstserver.local    5555        # or nc <firstserver-ip> 5555

If you type a message and press Enter, it will show up on both hosts.
To close the connection, press CTRL+C.

nmap-ncat

Ncat comes with the nmap package, and is arguably a lot better than netcat in terms of features. It does miss out on a few basic features like port scanning, but it doesn't really need them, as the nmap command by itself is capable enough for that purpose. Otherwise, the features that Ncat offers over netcat are quite useful. Here are a few of the features of Ncat that netcat does not have.

  1. Support for IPv6
  2. Support for SSL (see the example below)
  3. Connection Broker
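
For instance, the SSL support can be exercised as a simple encrypted chat (host name and port are placeholders; when no certificate is supplied, ncat generates a temporary one for the listener):

# on the listening host
ncat --ssl -l 8443

# on the client
ncat --ssl server.example.com 8443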

Verify whether a command is installed (and where)

which <command>
whereis <command>

Network protocol troubleshooting

ss -l             # display listening sockets
netstat -tulpn    # listening TCP/UDP sockets with the owning process
netstat -nat      # TCP sockets
netstat -nau      # UDP sockets
ss -s             # socket summary: totals of established, closed, orphaned, and waiting sockets
netstat -s        # per-protocol statistics

SELinux troubleshooting

ls  -ldZ    # Uppercase "Z" shows the SELinux security context

Journalctl

journalctl  -l  | grep <what-to-filter>

NetCat nc command

https://www.varonis.com/blog/netcat-commands

https://ioflood.com/blog/nc-linux-command/

The Netcat (nc) command is a command-line utility for reading and writing data between two computers across a network. The communication happens using either TCP or UDP. The command name differs depending on the system (netcat, nc, ncat, and others).

# syntax
nc  [option]  <ip>  <port>
man  netcat       # To see all the options

nc -zv -w <timeout-seconds> <ip> <port>     # -w sets the connection timeout in seconds

Client and server connection
# On device 1, run nc command in listen mode and provide a port
nc -lv  1234

# On device 2, run nc command with the ip address of device 1 and port
nc -v  <device-1 ip>  <device-1 listening port>
      nc -v  10.0.2.4  1234

# Send a message from either device, the same message shows up on the other device. Press CTRL+C to end the connection

Ping specific port
nc  -zv  <remote-ip>  <port>
      nc -zv www.google.com  443          # ping google.com on port 443

Scanning ports
nc -zv  <ip>  <port | port range>
      nc -zv 10.0.2.4   1234        # Verify whether port 1234 is open
      nc -zv 10.0.2.4  1230-12345

      nc -zv  10.0.2.4  1230-1235  2>&1   | grep 'succeeded'

Transfer file

# On device 1
nc -lv  <port> < <local-file>             # Listen on device 1 and redirect the file into the nc command
      nc  -lv  1234  <  file.txt

# On device 2
nc -v <device-1-ip>  <listening port>  >  <local-file>      # note: no -z here, or no data is transferred
      nc -v 10.0.2.4  1234  >   file.txt

List the directory entry itself (not its contents)

ls -ld     <path-to-directory>
      ls  -ld  /etc/

ps

ps  -aux  | grep nginx        # verify whether nginx is running

dnf

dnf provides /usr/bin/semanage        # find which package provides a file
dnf whatprovides '*/semanage'         # same idea, matching any path that ends in semanage

sudo !!
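
!! is the shell's history expansion for the previous command, so sudo !! re-runs the last command with root privileges. A typical sequence (the package-manager command is just an example):

dnf update        # fails because it needs root privileges
sudo !!           # expands to: sudo dnf update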

shred

If you ever wanted a file to be almost impossible to recover, shred can help you with this task.

shred file-to-be-shredded.txt
shred -u file-to-be-shredded.txt      # shred and then delete the file right away

less

less (opposite of more) is a program that lets you inspect files backward and forward

less  <path-to-file>

tail

tail -f  <file-to-print>      # print the last 10 lines, then follow the file as it grows
tail -n 20  <file-to-print>   # last 20 lines
head  longfile.txt            # first 10 lines
head -n 20 longfile.txt       # first 20 lines

grep

grep is one of the most powerful utilities for working with text files. It searches for lines that match a regular expression and prints them.

grep "match-text"  /path-to-file
grep -c "match-text"  /path-to-file       # "-c"      count the number of matching lines

whatis

whatis prints a single-line description of any other command, making it a helpful reference

whatis python

wc

wc stands for "word count" and, as the name suggests, it counts the words in a text file (by default it also reports the line and byte counts).

wc longfile.txt

# output
. lines
. words
. byte size

wc -w longfile.txt      # -w        only number of words

uname Command

uname (short for "Unix name") prints operating system information, which comes in handy when you need to know your current Linux version.

uname
uname -a    # "-a"      all

neofetch

Neofetch is a CLI (command-line interface) tool that displays information about your system — like kernel version, shell, and hardware — next to an ASCII logo of your Linux distro

neofetch

curl and wget

wget (World Wide Web get) is a utility to retrieve content from the internet. It has one of the largest collections of flags out there.
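
A few commonly used wget invocations as a sketch (the URLs are placeholders):

wget https://example.com/file.iso                       # download a single file
wget -O custom-name.iso https://example.com/file.iso    # save under a different name
wget -c https://example.com/file.iso                    # continue a partially downloaded file
wget --limit-rate=500k https://example.com/file.iso     # throttle the download rate
wget -r -np -l 2 https://example.com/docs/              # recursive download, no parent dirs, depth 2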

Find currently logged-in users in xrdp sessions

ps -e --format user:20,tty | uniq -id | sort -u

# Normal ssh or rdp session
who
who -a
who -a -H
last
last -p w
last -p now

du

du -sh  <directory>

env

System environment variables

env | grep -i proxy     # find proxy
env | grep http         # find http variables

xrandr

X11 display configuration utility (screen resolutions, outputs, rotation)
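
Basic usage, assuming an X session (the output name HDMI-1 is an example; use the names reported by xrandr itself):

xrandr                                        # list connected outputs and supported modes
xrandr --output HDMI-1 --mode 1920x1080       # set a resolution on a given output
xrandr --output HDMI-1 --rotate left          # rotate that output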

curl and wget

# curl and wget differences
. wget's major strong side compared to curl is its ability to download recursively.
. wget is command line only; there's no library behind it, whereas curl's features are powered by libcurl.
. curl supports FTP, FTPS, GOPHER, HTTP, HTTPS, SCP, SFTP, TFTP, TELNET, DICT, LDAP, LDAPS, FILE, POP3, IMAP, SMTP, RTMP and RTSP. wget supports HTTP, HTTPS and FTP.
. curl builds and runs on more platforms than wget.
. wget is released under a free software copyleft license (the GNU GPL). curl is released under a free software permissive license (an MIT derivative).
. curl offers upload and sending capabilities. wget only offers plain HTTP POST support.

Also
. wget is a tool to download files from servers
. curl is a tool that lets you exchange requests/responses with a server

Another interesting feature of curl not possible with wget is communicating with UNIX sockets (i.e., communication even without a network). For instance we can use curl to talk to Docker Engine using its socket in /var/run/docker.sock to get a list of all pulled docker images in JSON format (useful for "programming", in contrast to the docker images CLI command which is good for "readability").

curl --unix-socket /var/run/docker.sock http://localhost/images/json | jq

curl

Curl is a command line tool that enables data exchange between a device and a server through a terminal, using any of the supported protocols (HTTP, FTP, IMAP, POP3, SCP, SFTP, SMTP, TFTP, TELNET, LDAP, or FILE)

# syntax
curl [option] [url]

# http
curl http://<url-site>

# URLs with alternative name parts can be written using braces:
curl http://site.{one,two,three}.com

# FTP
curl ftp://ftp.example.com/file[1-20].jpeg

# Progress meter
curl displays a progress meter during use to indicate the transfer rate, amount of data transferred, time left, etc. 
     curl -# -O ftp://ftp.example.com/file.zip
           # If you like a progress bar instead of a meter, you can use the -# option
     curl --silent ftp://ftp.example.com/file.zip

# Save file
-o: saves the downloaded file on the local machine with the name provided in the parameters. 
     curl -o [file_name] [URL...]

     curl -o hello.zip ftp://speedtest.tele2.net/1MB.zip

-O    # This option downloads the file and saves it with the same name as in the URL. 
     curl -O ftp://speedtest.tele2.net/1MB.zip

-C -  # This option resumes a download which was stopped for some reason.
           Useful when the download of a large file was interrupted.
     curl -C - -O ftp://speedtest.tele2.net/1MB.zip

# Limit rate
--limit-rate: This option limits the upper bound of the rate of data transfer and keeps it around the given value in bytes
     curl --limit-rate [value] [URL]

     curl --limit-rate 1000K -O ftp://speedtest.tele2.net/1MB.zip

# User login credential
-u: curl also provides options to download files from user authenticated FTP servers. 

curl -u {username}:{password} [FTP_URL]
     curl -u demo:password -O ftp://test.rebex.net/readme.txt

# Upload file
-T: This option helps to upload a file to the FTP server.
curl -u {username}:{password} -T {filename} {FTP_Location}

If you want to append to an already existing FTP file, you can use the -a or --append option
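
For example, against the same public test server used above (the file names are placeholders, and a read-only demo server may reject the upload):

curl -u demo:password -T local.txt ftp://test.rebex.net/                    # upload local.txt
curl -u demo:password -T extra.txt --append ftp://test.rebex.net/readme.txt # append to an existing remote file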

# --libcurl:
This option is very useful from a developer's perspective.
If this option is appended to any curl command, it outputs the C source code that uses libcurl for the specified operation.
The code is similar to the command-line implementation.
Syntax:
     curl [URL...] --libcurl [filename]

     curl https://www.geeksforgeeks.org > log.html --libcurl code.c

# Proxy
-x, --proxy: curl also lets us use a proxy to access the URL.

curl -x [proxy_name]:[port] [URL...]

Note: If the proxy requires authentication, it can be used with the command: 
curl -u [user]:[password] -x [proxy_name]:[port] [URL...]

# Send mail
Sending mail: As curl can transfer data over different protocols, including SMTP, we can use curl to send mail.
curl --url [SMTP URL] --mail-from [sender_mail] --mail-rcpt [receiver_mail] -n --ssl-reqd -u {email}:{password} -T [Mail text file]

iotop

https://www.cyberciti.biz/hardware/linux-iotop-simple-top-like-io-monitor/

Options       Description
------------------------------------------------------------------
--version         show program’s version number and exit
-h, --help        show this help message and exit
-o, --only        only show processes or threads actually doing I/O
-b, --batch       non-interactive mode
-n NUM, --iter=NUM  number of iterations before ending [infinite]
-d SEC, --delay=SEC delay between iterations [1 second]
-p PID, --pid=PID         processes/threads to monitor [all]
-u USER, --user=USER    users to monitor [all]
-P, --processes       only show processes, not all threads
-a, --accumulated         show accumulated I/O instead of bandwidth
-k, --kilobytes       use kilobytes instead of a human friendly unit
-t, --time              add a timestamp on each line (implies --batch)
-q, --quiet             suppress some lines of header (implies --batch)

Important keyboard shortcuts for iotop command
-------------------------------------------------------------------
Hit the left and right arrow keys         change the sorting.
Press  r          reverse the sorting order.
Hit    o          only to see processes or threads actually doing I/O, 
                  instead of showing all processes or threads.
Use    p          only show processes. 
                  Normally iotop shows all threads.
Hit   a           display accumulated I/O instead of bandwidth. 
                  In this mode, iotop shows the amount of I/O processes have done since iotop started.
Type  i           change the priority of a thread or a process’ thread(s) i.e. ionice.
Press q           quit iotop.


# iotop - Check only processes or threads actually doing the I/O
https://www.cyberciti.biz/hardware/linux-iotop-simple-top-like-io-monitor/

sudo  iotop  --only     # sudo  iotop  -o

iostat

https://www.geeksforgeeks.org/iostat-command-in-linux-with-examples/

The iostat command in Linux is used for monitoring system input/output statistics for devices and partitions. It monitors system input/output by observing the time the devices are active in relation to their average transfer rates. The reports iostat produces may be used to change the system configuration to better balance the input/output load between the physical disks. iostat is included in the sysstat package; if you don't have it, you need to install that package first.
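
A minimal sketch of installing and running it (dnf is shown for RHEL-family systems; use your distribution's package manager):

sudo dnf install sysstat      # Debian/Ubuntu: sudo apt install sysstat

iostat                        # CPU and device utilization since boot
iostat -xz 2 5                # extended device stats, skip idle devices, every 2 seconds, 5 reports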

List files and folders with time, in reverse order

sudo ls -ltR --time-style="+%Y-%m-%d" <dir-path/dir-name> \
      | grep -v '^total' \
      | sort -k6,7 \
      | cut -d' ' -f6-

List top level folder usage

# list top level folder usage
sudo du -h /<path>/     -d 1        # depth 1   

list tree view of directory and show modify time

sudo  tree -D  <path/folder>

How to check directory and file usage or size

# Check the top sub-directory disk usage
sudo du -h /data/backup -d1

# Check which directory use the highest disk space
sudo du -h <dir> 2>/dev/null | grep '[0-9\.]\+G'
      2>/dev/null             # discard error output (for example, permission-denied messages)
      grep '[0-9\.]\+G'       # keep only directories whose size is 1 GB or more

# Identify the large files in the current directory (and below)
sudo du -a . | sort -nr | head -n10

# Identify the large directories in the current directory
sudo du -s * | sort -nr | head -n10

# Display large files with size
find <dir> -type f -size +<size> -exec ls -lh {} \; 2>/dev/null | awk '{ print $NF ": " $5 }' | sort -hrk 2,2

Note: -exec ls -lh {} \;      
            # for each file that matches the criteria, the find command executes the ls -lh command,
              the  {} placeholder is replaced with the current file's name during execution
      awk '{ print $NF ": " $5 }'
            command processes the output from the previous command. It extracts the file name ($NF), 
            and the file size ($5), and prints them in the format "filename: size"
      sort -hrk 2,2
            -h    # human-readable sort, which is useful for sizes with suffixes like "K" or "M"
            -r    # Specifies a reverse-order sort, listing the largest files first
            -k 2,2      # Instructs sort to use the second field (the size) as the key for sorting

# Display large files with size
find <dir> -type f -printf '%b %p\0' | sort -rzn | head -zn 20 | tr '\0' '\n'

      -printf '%b %p\0'
            # for each file found, to format the output as follows:
                  %b    Represents the number of 512-byte blocks allocated for the file
                  %p    Represents the file’s path
                  \0    Terminates each output entry with a null character (\0). 
                        The null character is used to separate file entries, 
                        and is especially important when dealing with file paths that contain spaces or special characters

      | sort -rzn
            -r    Performs a reverse order sort, arranging the entries from largest to smallest
            -z    Informs sort that the input entries are null-terminated, 
                  which is why the null character (\0) was added at the end of each entry by the find command

            -n    Specifies a numerical sort, ensuring that file sizes are sorted based on their numeric values rather than lexicographically

      | head -zn 20
            -z    Informs head that the input entries are null-terminated
            -n 20       Limits the output to the first 20 entries, which are the 20 largest files based on size

      | tr '\0' '\n'    
            The tr (translate) command is used to replace the null characters (\0) in the output with newline characters (\n). 
            This is done to format the output in a more human-readable way, with each file and its size on a separate line

# Assessing directory usage
      cd <dir>
      du -sh * | grep G       # Measure in Gigabytes


# Show maximum depth or level of directories disk usage
du -h -x -d <level> <dir>
      -x    Stay on one filesystem (skip directories on other mounted filesystems)
      Example:
            du -h -x -d 1 /
                  # Show the usage of the top-level sub-directories under /
                  -x     do not descend into other mounted filesystems
                  -d 1   max depth 1

ls commands

ls -l             # long listing, show all the details
ls -R             # list files recursively, including files inside subdirectories
ls -S             # list all files, sorted by size (largest first)
ls -d */          # list only directories
ls -h             # human-readable sizes (KB, MB, GB); use together with -l
ls -X             # sort in alphabetical order by file extension
ls -alh           # list all files (including hidden) in long, human-readable format

ls -ltR           # list all files recursively, with timestamps, newest first

Third party tools - Disk usage and I/O performance

  1. iotop - This is a valuable tool for system administrators, DevOps professionals, and anyone responsible for system performance management. It helps diagnose performance bottlenecks related to disk I/O, identify resource-hungry processes, and take appropriate actions to optimize system performance.

# command options
iotop -oPa
      -o    only show processes or threads actually doing I/O
      -P    only show processes, not individual threads
      -a    show accumulated values, which can be useful for identifying processes with high disk I/O over time

  2. gdu - Go Disk Usage - This is a fast and efficient disk usage analyzer written in Go, suitable for both SSD and HDD disks. The tool allows users to analyze disk usage on their Linux system.

  3. Dua - Disk Usage Analyzer - This is designed to help users conveniently assess and manage disk space usage in a given directory on a Linux system. It's optimized for performance, using parallel processing to quickly provide relevant information about disk space usage.

Additionally, Dua offers a safe and efficient way to delete unnecessary data from your storage.
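
Assuming the tools are installed, typical invocations look like this (the path is a placeholder):

gdu /data          # open an interactive disk-usage browser for /data
dua /data          # print the sizes of the entries under /data
dua i /data        # interactive mode for browsing and deleting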