Containers - Behind the curtain

Containers are these fancy new thingies (technical term) that are made to make our lives as developers and operators easier. Or aren't they? This article shows what a container really is and why cgroups and the Linux kernel are an essential part of it.

Docker, Podman and more

Docker, Podman and Kubernetes have made packaging and shipping software really easy. In addition, container technologies add some security and management layers to our deployments. The technology behind containers is so convenient that even Flatpak uses it to provide sandboxing and permission control for the packaged software.

But why is this the case? What is provided by Linux that makes containers so convenient, and what is happening behind the scenes?

Namespaces

Under the hood, most container software uses two major technologies: namespaces and cgroups. In this article, I want to demonstrate how these work and how you can make them work for you, too.

If you start a container, you are basically creating new namespaces, in which some data lives and a binary is executed. Sounds weird? Let's start with a simple example.

# List all processes and count them
$ ps auxww | wc -l
335

In my case, the count is 335 (including the header line that ps prints), so roughly 334 processes are running on my workstation. This is pretty typical for a desktop with a couple of applications opened.

Next, let's create a new namespace, where we want to execute bash. This can be done with the unshare command.

# Create a new PID namespace and run bash in it
$ sudo unshare --fork --pid --mount-proc /usr/bin/bash

Let's also count the processes here:

# List all processes
$ ps auxww
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.7  0.0 226896  8296 pts/0    S    23:41   0:00 /usr/bin/bash
root          37  0.0  0.0 225880  4052 pts/0    R+   23:41   0:00 ps auxww

As you can see, even without the wc -l command, we can easily count the processes: only bash (now running as PID 1) and ps itself are visible. This is the power of namespaces.
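If you want to double-check that the new bash really lives in a different PID namespace, you can compare namespace identifiers. The following is a minimal sketch, assuming a Linux system with /proc mounted. The helper name ns_id is made up for this example. Every process exposes its namespaces as symbolic links under /proc/<pid>/ns, and two processes share a namespace when the link targets are identical.

```shell
# ns_id TYPE [PID]: print the namespace identifier of a process.
# ns_id is a hypothetical helper name, not part of any existing tool.
ns_id() {
    # Without a PID, "self" is used. It refers to the readlink process,
    # which shares its namespaces with the calling shell.
    readlink "/proc/${2:-self}/ns/$1"
}

# Prints something like pid:[4026531836]; the number differs per system.
ns_id pid
```

Run it once in your regular shell and once in the shell started by unshare. The two values should differ.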

Above, we created a PID namespace. But the Linux kernel offers more namespace types. Let's have a look at these, too.

  • Process
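    The process (PID) namespace creates a new branch of the process tree with its own PID numbering, starting at 1. Processes started in this namespace cannot see or signal processes outside of it, while the parent namespace can still see everything inside the branch.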
  • Mount
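    Processes running in a separate mount (MNT) namespace have their own view of the mount points. Mounts made inside the namespace are typically not visible on the host. Combined with a changed root directory, this hides files outside of it, which is somewhat similar to the chroot command, but works on the kernel level.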
  • Network
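    A network (NET) namespace has its own network devices, addresses and routing tables, which lets you limit the access to network devices and network features. A new namespace starts with only a loopback device, so you still need to create further devices and move them into the namespace.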
  • User
    The user namespace maps virtual UIDs and GIDs. This allows a process to have root privileges inside the namespace, but not outside. Even a regular user can create a user namespace in which they are privileged.
  • UTS
    The UTS namespace controls hostname and domain information, and allows processes to think they’re running on differently named machines.
  • Inter Process Communication
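    The IPC namespace isolates inter-process communication resources like System V IPC objects and POSIX message queues, so only processes in the same namespace can use them to talk to each other.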
  • Control Group
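    The cgroup namespace is somewhat special. It only virtualizes the view a process has of the cgroup hierarchy. Control groups (cgroups) themselves are a separate kernel feature, with a dedicated set of tools, to limit and account for resources like CPU, memory, disk I/O, etc.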
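
Control groups and the cgroup namespace are easy to mix up, so here is a small sketch that shows both for the current shell. It assumes a Linux system with /proc mounted. The exact paths in the output depend on your system and on whether cgroup v1 or v2 is in use.

```shell
# Show which control group(s) the current shell belongs to.
# On a cgroup v2 system this is usually a single line starting with "0::".
cat "/proc/$$/cgroup"

# Show the cgroup namespace through which the shell sees that hierarchy.
readlink "/proc/$$/ns/cgroup"
```

The first command is about resource control (which group is this process accounted in?), the second one is about isolation (which part of the hierarchy can this process see?).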

As you might guess already, combining the above allows you to create an environment where you only see some processes, can act as a different user, have access to another filesystem, etc.

This is exactly what containers are about. Docker and Podman are (basically) sets of tools that combine these namespaces to create slices of your system.
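
To get a feeling for such a combination, the sketch below puts a user namespace and a PID namespace together in a single unshare call. It is only an illustration, not how Docker or Podman work internally. It assumes that unshare from util-linux is installed and that your distribution allows unprivileged user namespaces; the function name demo_sandbox is made up.

```shell
# demo_sandbox: run a tiny shell command in new user and PID namespaces.
# demo_sandbox is a hypothetical name used only for this example.
demo_sandbox() {
    # --user --map-root-user: become root inside a new user namespace.
    # --pid --fork: the forked child becomes PID 1 in a new PID namespace.
    # The single quotes make sure $$ is expanded by the inner shell.
    if unshare --user --map-root-user --pid --fork \
        sh -c 'echo "uid=$(id -u) pid=$$"' 2>/dev/null; then
        return 0
    fi
    # Some systems and container runtimes forbid this for regular users.
    echo "unprivileged user namespaces seem to be unavailable here"
}

demo_sandbox
```

If it works, the inner shell reports that it is root and runs as PID 1, although you started it as a regular user and nothing changed on the host.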

Working with Namespaces

After learning about namespaces, we should give them a shot and play with them a bit. Most of the time, you don't need to do this, but for me this is quite interesting stuff.

First, I want to list all namespaces. Let's see what my workstation does by running lsns.

$ lsns
        NS TYPE   NPROCS   PID USER    COMMAND
4026531834 time      141  1884 dschier /usr/lib/systemd/systemd --user
4026531835 cgroup    141  1884 dschier /usr/lib/systemd/systemd --user
4026531836 pid       117  1884 dschier /usr/lib/systemd/systemd --user
4026531837 user      114  1884 dschier /usr/lib/systemd/systemd --user
4026531838 uts       141  1884 dschier /usr/lib/systemd/systemd --user
4026531839 ipc       141  1884 dschier /usr/lib/systemd/systemd --user
4026531840 net       141  1884 dschier /usr/lib/systemd/systemd --user
4026531841 mnt       114  1884 dschier /usr/lib/systemd/systemd --user

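It looks like systemd has created some namespaces for me, but these are just the initial namespaces my processes belong to by default. lsns lists the lowest PID it can see in each namespace, which here is my systemd user instance. What if I create another one? This can be done with the unshare command. To check from inside and outside the namespace, it is a good idea to have two terminal sessions open.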

# Create new namespace (terminal 1)
$ unshare --fork --pid --user --mount-proc /usr/bin/bash

# Check existing namespaces (terminal 1)
$ lsns
        NS TYPE   NPROCS PID USER   COMMAND
4026531834 time        2   1 nobody /usr/bin/bash
4026531835 cgroup      2   1 nobody /usr/bin/bash
4026531838 uts         2   1 nobody /usr/bin/bash
4026531839 ipc         2   1 nobody /usr/bin/bash
4026531840 net         2   1 nobody /usr/bin/bash
4026532752 user        2   1 nobody /usr/bin/bash
4026532756 mnt         2   1 nobody /usr/bin/bash
4026532757 pid         2   1 nobody /usr/bin/bash

# Check existing namespaces (terminal 2, output trimmed to the new entries)
$ lsns
4026532752 user        2 69002 dschier unshare --fork --pid --user --mount-proc /usr/bin/bash
4026532756 mnt         2 69002 dschier unshare --fork --pid --user --mount-proc /usr/bin/bash
4026532757 pid         1 69003 dschier └─/usr/bin/bash
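
lsns reads its information from /proc, so you can approximate it with a few lines of shell. This is a rough sketch, assuming /proc is mounted. The helper name count_ns is made up, and unlike lsns it only prints a process count and the namespace identifier.

```shell
# count_ns TYPE: count the visible processes per namespace of a given type.
# count_ns is a hypothetical helper, not part of util-linux.
count_ns() {
    for link in /proc/[0-9]*/ns/"$1"; do
        # Links of other users' processes are not readable without root,
        # and processes may exit while we iterate, so errors are ignored.
        readlink "$link" 2>/dev/null
    done | sort | uniq -c
}

# Each output line is a count followed by an identifier like pid:[4026531836].
count_ns pid
```

With the unshare session from terminal 1 still open, running this in terminal 2 should show a second PID namespace next to the initial one.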

This way, you can create all kinds of namespaces. But you can also enter namespaces that already exist.

You just need to know the PID of a process that runs in the namespace you want to enter. In our case above, this is 69002.

# Enter namespace
$ nsenter -t 69002 --user --preserve-credentials

This allows you to debug a process in this namespace, and to check what is accessible from within it. For now, this should be sufficient to introduce namespaces.
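
To verify that you actually ended up where you wanted to be, you can compare the namespace of your shell with the one of the target process. A minimal sketch, assuming /proc is mounted and the links are readable for your user; same_ns is a made-up helper name.

```shell
# same_ns TYPE PID_A PID_B: succeed if both processes share the namespace.
# same_ns is a hypothetical helper, not part of util-linux.
same_ns() {
    a="$(readlink "/proc/$2/ns/$1")" || return 2
    b="$(readlink "/proc/$3/ns/$1")" || return 2
    [ "$a" = "$b" ]
}

# Example: the current shell trivially shares a user namespace with itself.
if same_ns user "$$" "$$"; then
    echo "same user namespace"
else
    echo "different user namespace"
fi
```

From the shell started by nsenter, comparing $$ with the target PID (69002 in the example above) should report the same user namespace, while a regular shell on the host should not.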

A technology like namespaces comes with extensive documentation and lots of articles across the web. The links below may be interesting for you.

Namespaces — The Linux Kernel documentation
namespaces(7) - Linux manual page
The 7 most used Linux namespaces

Conclusion

As you might guess, using namespaces can be interesting and powerful. In fact, they are so powerful that Docker, Podman, Kubernetes and other container-based technologies like Flatpak make use of them.

Were you aware of namespaces? How have you used them? Please let me know if you want to learn more about this topic.