Docker Container Troubleshooting

Reference

https://www.docker.com/blog/how-to-fix-and-debug-docker-containers-like-a-superhero/

Useful commands

List docker containers

# list all the docker containers
docker container ls --all

# Change the CLI output format for readability
# If "jq" is installed
docker container ls --all --format '{{ json . }}' | jq -C .

# Without jq installed
docker container ls --all --format '{{ json . }}' | python3 -m json.tool --json-lines
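
When the full JSON is more than you need, a Go template can narrow the listing to a few columns. The sketch below wraps that in a helper; the function name list_containers_brief and the choice of columns are assumptions made for illustration.

```shell
# Hypothetical helper: list all containers with only a few columns.
# .ID, .Names, .Status and .Image are placeholders accepted by
# "docker container ls --format".
list_containers_brief() {
  docker container ls --all \
    --format 'table {{.ID}}\t{{.Names}}\t{{.Status}}\t{{.Image}}'
}
```

Source the function in your shell and run list_containers_brief to get a compact table.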

Run interactive terminal within the container

# Open an interactive terminal within the container
docker run --rm -it --name <container-name> <image> bash

--rm        # remove the container when the bash shell exits
-it         # interactive terminal (-i keeps STDIN open, -t allocates a pseudo-TTY)

Leverage container logs - docker logs

docker logs <container-id>      # see all the logs
docker logs --tail 100 <container-id>   # see the last 100 lines of logs
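
For a crashing container it often helps to limit the logs by time as well as by line count. The following is a minimal sketch; the function name recent_logs and its default values (10 minutes, 100 lines) are assumptions, not Docker defaults.

```shell
# Hypothetical helper: show recent, timestamped logs for a container.
# Usage: recent_logs <container-id> [since] [lines]
recent_logs() {
  # $2 defaults to the last 10 minutes, $3 to the last 100 lines (assumed defaults).
  docker logs --timestamps --since "${2:-10m}" --tail "${3:-100}" "$1"
}
```

For example, recent_logs <container-id> 1h 20 prints at most 20 lines from the last hour.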

View all active processes within the running container

docker top <container-id>

Tackle issues with ENTRYPOINT

The entrypoint refers to the default executable that's invoked when you run a container.

The --entrypoint flag expects a string value, representing the name or path of the binary that you want to invoke when the container starts.

When running applications, you’ll need to run executable files within your container. The ENTRYPOINT portion of your Dockerfile sets the main command within a container and basically assigns it a task. These ENTRYPOINT instructions rely on executable files being in the container.

Consider a scenario where improper permissions prevent Docker from successfully mounting and running an entrypoint.sh executable:

1. Use the ls -l $PWD/examples/v6/entrypoint.sh command to view the file's permissions, which may be inadequate.
2. Confirm that the permissions are incorrect.
3. Run chmod 774 on the file to give its owner and group read, write, and execute permissions (other users get read-only access).
4. Use docker run to spin up a container (v7) with the original entrypoint; it may work briefly but soon stops running.
5. Inspect the entrypoint.sh file to confirm that the desired command exists.
6. Run docker container inspect v7-exiting to view the container definition and parameters. While the Entrypoint is specified, its Cmd definition is null. That's what's causing the issue:

"Cmd" : null,
"Image": "httpd:2.4",
"Volumes": null,
"WorkingDir": "/usr/local/apache2",
"Entrypoint": [
     "/entrypoint.sh"
],

Setting --entrypoint automatically clears any default command defined by the image, which is easy to overlook. You need to restate the command for the container to work properly.

# Proper command
docker run -d -v $PWD/example/v7/entrypoint.sh:/entrypoint.sh --entrypoint /entrypoint.sh --name v7-running httpd:2.4 httpd-foreground

You can reset a container's entrypoint by passing an empty string, for example:

docker run -it --entrypoint="" mysql bash

Note:
Passing --entrypoint clears out any default command set on the image. That is, any CMD instruction in the Dockerfile used to build it.
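
To check quickly whether a container ended up with an entrypoint but no command, you can print just those two fields instead of reading the whole inspect output. This is a sketch; the function name show_entrypoint_and_cmd is hypothetical.

```shell
# Hypothetical helper: print the Entrypoint and Cmd of a container as JSON.
# A "null" Cmd next to a custom Entrypoint matches the failure described above.
show_entrypoint_and_cmd() {
  docker container inspect \
    --format 'Entrypoint={{json .Config.Entrypoint}} Cmd={{json .Config.Cmd}}' "$1"
}
```

For example, show_entrypoint_and_cmd v7-exiting prints both fields on a single line.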

Exposed ports

By default, when you run a container, none of the container's ports are exposed to the host. This means you won't be able to access any ports that the container might be listening on. To make a container's ports accessible from the host, you need to publish the ports.

You can start the container with the -P or -p flags to publish its ports:

The -P (or --publish-all) flag publishes all the exposed ports to the host. Docker binds each exposed port to a random port on the host.

The -P flag only publishes port numbers that are explicitly flagged as exposed, either using the Dockerfile EXPOSE instruction or the --expose flag for the docker run command.

The -p (or --publish) flag lets you explicitly map a single port or range of ports in the container to the host.

The port number inside the container (where the service listens) doesn't need to match the port number published on the outside of the container (where clients connect). For example, inside the container an HTTP service might be listening on port 80. At runtime, the port might be bound to 42800 on the host. To find the mapping between the host ports and the exposed ports, use the docker port command.
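
The docker port lookup can be scripted when you only need the host port number. The sketch below assumes docker port prints one host address per line, such as 0.0.0.0:42800 or [::]:42800; the function name host_ports is hypothetical.

```shell
# Hypothetical helper: print the host port(s) bound to a container port.
# Usage: host_ports <container> <container-port>, e.g. host_ports web 80/tcp
host_ports() {
  # The host port is the text after the last colon on each line;
  # sort -u collapses the IPv4 and IPv6 entries into one.
  docker port "$1" "$2" | awk -F: '{ print $NF }' | sort -u
}
```

For example, after docker run -d -P --name web nginx:alpine, running host_ports web 80/tcp prints the randomly assigned host port.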

Docker run command and options

https://docs.docker.com/engine/reference/run/

# A docker run command takes the following form:
docker run [OPTIONS] IMAGE[:TAG|@DIGEST] [COMMAND] [ARG...]

docker run IMAGE[:TAG][@DIGEST]
docker create IMAGE[:TAG][@DIGEST]

An image tag is the image version, which defaults to latest when omitted. Use the tag to run a container from a specific version of an image. For example, to run version 23.10 of the ubuntu image:

docker run ubuntu:23.10

Options

Options let you configure the container. For example, you can give the container a name (--name), or run it as a background process (-d). You can also set options to control things like resource constraints and networking.

Commands and arguments

You can use the COMMAND and ARG positional arguments to specify commands and arguments for the container to run when it starts up. For example, you can specify sh as the command, combined with the -i and -t flags, to start an interactive shell in the container (if the image you select has an sh executable on PATH).

docker run -it IMAGE sh

Foreground and background

When you start a container, the container runs in the foreground by default. If you want to run the container in the background instead, you can use the --detach (or -d) flag. This starts the container without occupying your terminal window.

docker run -d IMAGE

While the container runs in the background, you can interact with the container using other CLI commands. For example, docker logs lets you view the logs for the container, and docker attach brings it to the foreground.

# Example
docker run -d nginx
docker ps   # find nginx container ID
docker logs -n 100 <container-id>

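docker run --detach <image>           # -d: run the container in the background
docker run --attach stdout <image>    # -a: attach to STDIN, STDOUT, or STDERR
docker run --tty <image>              # -t: allocate a pseudo-TTY
docker run --interactive <image>      # -i: keep STDIN open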

Open a shell in a container

# Open a shell in a running container
docker exec -it <container-id> sh

Note: docker exec expects a container identifier (name or ID), not an image, so "docker exec nginx:alpine sh" fails.

To find the IDs of containers created from an image, use the --filter flag:
# Get the IDs of all running containers based on the nginx:alpine image
docker ps -q --filter ancestor=nginx:alpine
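
The ancestor lookup and docker exec can be combined into one step. This is a minimal sketch that assumes the image provides an sh executable and that the first matching container is the one you want; the function name shell_into_image is hypothetical.

```shell
# Hypothetical helper: open a shell in the first running container
# created from the given image.
# Usage: shell_into_image nginx:alpine
shell_into_image() {
  # Several containers may share an image; take the first ID only.
  cid="$(docker ps -q --filter "ancestor=$1" | head -n 1)"
  if [ -z "$cid" ]; then
    echo "no running container found for image $1" >&2
    return 1
  fi
  docker exec -it "$cid" sh
}
```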

Container Networking

Containers have networking enabled by default, and they can make outgoing connections. If you're running multiple containers that need to communicate with each other, you can create a custom network and attach the containers to the network.

When multiple containers are attached to the same custom network, they can communicate with each other using the container names as a DNS hostname.

# Example
docker network create test-network      # create a network named "test-network"
docker run -d --name web --network test-network nginx:alpine    # start a container attached to that network
docker run --rm -it --network test-network testcontainer        # start a second container on the same network
    # ping web      # run inside the second container; "web" should respond

Volume mounts

To create a volume mount

docker run --mount source=<volume-name>,target=</path/> [IMAGE] [COMMAND...]

Example:
# Wrap the command in sh -c so the redirection runs inside the container, not on the host
docker run --rm --mount source=test_volume,target=/data testcontainer \
  sh -c 'echo "Testing mount volume." > /data/test.txt'

docker run --mount source=test_volume,target=/data testcontainer cat /data/test.txt

Note:
1. The target must always be an absolute path, such as /data/testing.
2. A volume name must start with an alphanumeric character, followed by a-z0-9, _ (underscore), . (period), or - (hyphen).
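
The naming rule in the note can be checked before calling docker run. The sketch below only approximates the rule as stated here; Docker's own validation may differ (for example, on minimum length), accepting uppercase letters is an assumption, and the function name is_valid_volume_name is hypothetical.

```shell
# Hypothetical helper: check a volume name against the rule in the note above.
# Returns 0 if the name looks valid, 1 otherwise.
is_valid_volume_name() {
  case "$1" in
    "")                 return 1 ;;  # empty name
    [!a-zA-Z0-9]*)      return 1 ;;  # first character must be alphanumeric
    *[!a-zA-Z0-9_.-]*)  return 1 ;;  # only letters, digits, _ . and - are allowed
    *)                  return 0 ;;
  esac
}
```

For example, is_valid_volume_name test_volume succeeds, while is_valid_volume_name bad/name fails.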

Bind mounts

# Create a bind mount
docker run -it --mount type=bind,source=[path],target=[path] testcontainer

Exit status

The exit code from docker run gives information about why the container failed to run or why it exited.

# Exit codes from docker run
docker run --foo <image>; echo $?
125     # exit code 125: the error is with the Docker daemon itself (here, an undefined flag)

docker run <image> /etc; echo $?
126     # exit code 126: the contained command can't be invoked

docker run <image> foo; echo $?
127     # exit code 127: the contained command can't be found

echo $?     # print the exit code of the previous command
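
The exit codes can be turned into readable messages in a wrapper script. This is a sketch; the function name explain_run_exit_code is hypothetical, and any code other than 125, 126, or 127 is treated as the exit code of the contained command.

```shell
# Hypothetical helper: describe an exit code returned by "docker run".
explain_run_exit_code() {
  case "$1" in
    0)   echo "contained command exited successfully" ;;
    125) echo "error with the Docker daemon itself" ;;
    126) echo "contained command cannot be invoked" ;;
    127) echo "contained command cannot be found" ;;
    *)   echo "exit code $1 comes from the contained command" ;;
  esac
}
```

For example: docker run <image> foo; explain_run_exit_code $?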

Environment variables

Docker automatically sets some environment variables when creating a Linux container. Docker doesn't set any environment variables when creating a Windows container.

The following environment variables are set for Linux containers:

Variable    Value
------------------------------------------
HOME        Set based on the value of USER
HOSTNAME    The hostname associated with the container
PATH        Includes popular directories, such as /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
TERM        xterm if the container is allocated a pseudo-TTY

Additionally, you can set any environment variable in the container by using one or more -e flags. You can even override the variables mentioned above, or variables defined using a Dockerfile ENV instruction when building the image.

If you name an environment variable without specifying a value, the current value of the named variable on the host is propagated into the container's environment:

export today=Wednesday
docker run -e "deep=purple" -e today --rm alpine env

User

The default user within a container is root (uid = 0). You can set a default user to run the first process with the Dockerfile USER instruction. When starting a container, you can override the USER instruction by passing the -u (or --user) option.

# The following forms are all valid:
--user=[ user | user:group | uid | uid:gid | user:gid | uid:group ]

Working directory

The default working directory for running binaries within a container is the root directory (/). The default working directory of an image is set using the Dockerfile WORKDIR command. You can override the default working directory for an image using the -w (or --workdir) flag for the docker run command:

$ docker run --rm -w /my/workdir alpine pwd
/my/workdir

docker node ls

https://docs.docker.com/reference/cli/docker/node/ls/

Lists all the nodes that the Docker Swarm manager knows about.

Option          Description
------------------------------------------------------------------------------
-f, --filter    Filter output based on conditions provided
--format        Format output using a custom template:
                  'table':          Print output in table format with column headers (default)
                  'table TEMPLATE': Print output in table format using the given Go template
                  'json':           Print in JSON format
                  'TEMPLATE':       Print output using the given Go template.
                Refer to https://docs.docker.com/go/formatting/ for more information about formatting output with templates
-q, --quiet     Only display IDs

# Filtering (--filter)
The filtering flag (-f or --filter) takes a value of the form "key=value".

The currently supported filters are:
    id
    label
    node.label
    membership
    name
    role

$ docker node ls

ID                           HOSTNAME        STATUS  AVAILABILITY  MANAGER STATUS
1bcef6utixb0l0ca7gxuivsj0    swarm-worker2   Ready   Active
38ciaotwjuritcdtn9npbnkuz    swarm-worker1   Ready   Active
e216jshn25ckzbvmwlnh5jr3g *  swarm-manager1  Ready   Active        Leader
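
Filters and format templates can be combined to narrow the node list. The sketch below prints the hostname and status of nodes with a given role; the function name nodes_by_role is hypothetical.

```shell
# Hypothetical helper: print hostname and status for nodes with a given role.
# Usage: nodes_by_role manager   (or: nodes_by_role worker)
nodes_by_role() {
  docker node ls --filter "role=$1" --format '{{.Hostname}}: {{.Status}}'
}
```

With the sample cluster above, nodes_by_role worker would list the two swarm-worker nodes.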

How to troubleshoot a container that can't start

1. The image can't be downloaded from the registry
2. Constraints can't be fulfilled
  docker service ps <service name> --no-trunc

3. Not all errors can be found this way; another useful tool is journalctl
  journalctl -u docker.service | tail -n 50

4. Look at log files
  tail -F *.log
  tail -F /var/log/*.log
  tail -f /var/log/*.log | grep docker
  tail -f /var/log/*.log | grep error


Also try
1. ssh to the worker node
2. Scale the service to 0 replicas
3. Start another ssh session to the same worker node
  a. docker service logs -f <service name> --raw        # follow the logs
  b. Scale the service back to 1 replica
    docker service scale <service name>=<number>      # number: 0, 1
4. Analyze any errors in the container log
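
The scale-down and scale-up steps above can be sketched as a single helper, leaving the log session to a second terminal. The function name bounce_service is hypothetical, and the default of 1 replica is an assumption.

```shell
# Hypothetical helper: restart a swarm service by scaling it to 0 and back.
# Usage: bounce_service <service name> [replicas]
bounce_service() {
  # Stop here if scaling down fails, so a broken service is not scaled back up blindly.
  docker service scale "$1=0" || return 1
  docker service scale "$1=${2:-1}"
}
```

Keep docker service logs -f <service name> --raw running in the other session while calling bounce_service <service name>.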

Run on the controller node
  docker service ps <service name>    # locate the worker node that is hosting the service's containers
  docker service ls --filter name=<service name>

# Inspect individual node
  docker node inspect self --pretty
  docker node inspect <node name> --pretty

docker container ls   # list the containers running on the current node
docker node ls        # run on the controller to list all nodes

How to run a container on a Docker network

docker run --network myNetwork -it --rm --entrypoint bash my-service

How to clean up orphan containers

1. ssh to controller
docker node ls                        # list all nodes
docker node ps $(docker node ls -q)   # list all the tasks (containers) and the nodes they run on

2. ssh to worker node
# remove or delete exited orphan containers
docker rm $(docker ps -aq --filter status=exited)

3. To stop all running containers
docker stop $(docker ps -q)
  Note:
    Docker swarm starts a new container if you stop a running container that belongs to a service.
    To properly stop the container, scale the service to 0.