Nowadays, many applications and the environments they run in have started to evolve rapidly around container architecture. Looking at the bigger picture, this technology has become a trend across many sectors, especially IT, in a wave that really took off with Docker. That being the case, it has become important to understand where today’s Docker features come from. By knowing the history and development of this technology, we can understand what to expect from it, fill in the missing pieces, see how the tools we use today evolved to their current state, and, when necessary, build new tools on top of them. Here is the history of “Container Architecture”…
As you can easily see from the diagram above, container technology has gone through many stages of evolution. Let’s examine them in order:
> chroot (Unix V7) — 1979
In the beginning, everything was just a cloud of dust… “chroot” was introduced and added to UNIX systems. It gave processes their own place in the file system, and the first steps toward process isolation were taken. With this technology it became possible to isolate a process and its child processes from the rest of the system. To do this, the libraries to be used were moved under a separate “root directory”, and the process was run under that folder. You could then act as root in this restricted environment and do whatever you wanted without affecting the global system. However, root processes could easily escape the chrooted environment, so the technology was still somewhat lacking. In March 1982, Bill Joy added chroot to BSD. The simple picture below should make this easier to understand.
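For example, assuming /srv/newroot has already been populated with a shell and the libraries it needs (the path here is just an illustrative example), a process can be confined like this:
sudo chroot /srv/newroot /bin/sh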
> Jail — 1990
In the 1990s, security and networking expert Bill Cheswick was researching how crackers gain access to computer systems. To study them, he built a chroot environment that traced the key moves of crackers so he could follow how they reached their goals. This is how the technology we call a jail came about. In addition to the capabilities of chroot, a jail created and used its own filesystem. You will often see this referred to as a “chrooted jail” or “chroot jail”.
> FreeBSD Jails — 2000
This is the technology created by bringing chroot to FreeBSD systems and adding a few extra features. Although it is largely similar to chroot, it made a difference by isolating users, the network, and the file system, and by creating environments for specific applications. Thanks to these features, security was further strengthened with this technology.
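As a rough sketch using the modern jail(8) syntax (the name, path, hostname, and address below are placeholders, and the jail directory must already contain a FreeBSD userland), starting a jail looks something like this:
jail -c name=www path=/jails/www host.hostname=www.example.org ip4.addr=192.0.2.10 command=/bin/sh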
> Linux vServer — 2001
Linux VServer was introduced in 2001, bringing operating-system-level virtualization features to the Linux kernel. It uses the same idea as chroot and, in addition, can safely partition resources on the system such as CPU time, memory, and network addresses. Each partition is called a “security context”, and the virtualized system inside it is called a “virtual private server”.
> Oracle Solaris Containers — 2004
Oracle Solaris Containers began to be used to divide system resources, limit their use, and split them into separate areas. In addition, Solaris Containers brought cloning and snapshot features on top of ZFS (Zettabyte File System). The concept of a “zone” is used to define resource limits. Within a zone, an application can access operating system resources and the filesystem, but it can only see what happens inside its own zone.
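As a rough sketch of the usual workflow (the zone name and path below are only examples), a zone is configured, installed, and booted with the zonecfg and zoneadm tools:
zonecfg -z web "create; set zonepath=/zones/web; commit"
zoneadm -z web install
zoneadm -z web boot
zlogin web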
> OpenVZ (Open Virtuozzo) — 2005
OpenVZ uses OS-level virtualization, similar to Linux VServer. Since it uses OS-level virtualization, there are some restrictions: all containers share the host’s architecture and kernel version, which becomes a disadvantage if a container needs a different kernel version than the host.
> Google Containers — 2006 / Cgroups — 2007
In 2006, Google engineers announced “process containers”, which they designed and developed for isolating and limiting resources (CPU, RAM, network, I/O, etc.). In 2007, “process containers” were renamed “cgroups (control groups)”.
Let’s open a parenthesis for “cgroups” here and examine them a little more deeply…
Cgroups go by several names, for example “control groups”, “subsystems”, or “resource controllers”. The important thing here is that we understand what they do: cgroups control a process’s access to resources, isolate it, and limit it. This subject is really detailed and, as far as I can see, a bit difficult to grasp; it could be a completely different article. But briefly, we can say:
Cgroups were merged into the Linux kernel with v2.6.24, and a redesigned cgroups v2 later became official with kernel v4.5. We shouldn’t think of cgroups as a single structure. As you can see below, the Linux kernel contains more than one cgroup controller. Each of them has a different task, and using them requires more detailed study. For now, let’s just look at what some of them do (a small usage sketch follows the list):
blkio: This cgroup is used to limit the amount of I/O for each process group.
cpu: This cgroup monitors CPU usage, sets usage weights, and limits access to the CPU resource.
cpuacct: This cgroup generates automatic reports on the CPU usage of a process or process group.
cpuset: This cgroup handles pinning a process or group of processes to specific CPUs.
devices: This cgroup is used to allow or restrict a task’s access to devices. In other words, with this cgroup you can set permissions, such as read and write, for any process on a device.
freezer: This cgroup allows a process to be suspended or resumed.
memory: This cgroup lets us keep track of how much memory a process is using and also generates automatic reports accordingly.
net_cls: This cgroup tags network packets with a “classid”, which makes it possible to identify packets originating from a specific process group. This means traffic control can be configured to give different packets different priorities. We can think of it as a kind of QoS (Quality of Service).
net_prio: This cgroup provides a way to dynamically set the priority of network traffic per network interface.
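Here is the small usage sketch mentioned above: limiting a process with the memory cgroup through the cgroup filesystem (cgroup v1 paths; the group name “demo” and the 100 MB limit are only examples, and the exact paths may differ on your distribution):
sudo mkdir /sys/fs/cgroup/memory/demo
echo 104857600 | sudo tee /sys/fs/cgroup/memory/demo/memory.limit_in_bytes
echo $$ | sudo tee /sys/fs/cgroup/memory/demo/cgroup.procs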
To be useful, I’ll leave a link on the subject that I find helpful below:
https://events.static.linuxfound.org/sites/events/files/slides/cgroup_and_namespaces.pdf
> LXC (Linux Containers) — 2008
In 2008, the first version of LXC (Linux Containers) was introduced. We can say LXC is similar to OpenVZ, Solaris Containers, and Linux VServer; however, LXC uses the cgroups that are already in the Linux kernel. LXC is “operating-system-level virtualization” (OS-level virtualization), which enables running multiple isolated Linux environments (containers) on a shared Linux kernel.
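As a rough sketch (the container name, distribution, release, and architecture below are only examples), creating and starting a container with the LXC tools looks like this:
lxc-create -n demo -t download -- -d ubuntu -r focal -a amd64
lxc-start -n demo
lxc-attach -n demo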
> CloudFoundry Warden — 2011
Warden, created in 2011, is an API used by Cloud Foundry to manage isolated, ephemeral, resource-controlled environments. In its early days it used the LXC infrastructure and later developed its own implementation. It uses a client-server model to manage containers and container groups, and it also has a service for managing cgroups, namespaces, and the process lifecycle.
> Google Let Me Contain That For You (LMCTFY) — 2013
In 2013, Google once again changed open source container history. LMCTFY, released as the open source version of Google’s container stack, provided Linux application containers. Applications running in this architecture could be designed to be “container aware” and gained the ability to create and manage their own subcontainers. LMCTFY runs applications in isolation on the same Linux kernel using “cgroups”, “namespaces”, and other Linux kernel mechanisms. Active development officially stopped in 2015, but after that Google began contributing the core concepts of LMCTFY to “libcontainer”. Through this, it also contributed to Docker. Libcontainer is now part of the Open Container Initiative (OCI).
(P.S. More information about libcontainer can be found in the Docker section.)
> Docker — 2013
Docker was introduced in 2013 as an open-source project by dotCloud, a San Francisco company offering PaaS cloud services, whose founder is Solomon Hykes. When it first came out, it aimed to turn monolithic applications into images and containers using LXC (Linux Containers). Later, Docker began developing its own container runtime, libcontainer, and from that point on libcontainer was used instead.
By the way, let’s talk about “libcontainer”…
Libcontainer is a library written by Docker, mostly in Go with a small amount of C. Its purpose was to create a broader isolation technology: to build containers using features already in the Linux kernel, such as cgroups and namespaces, and to give full control over the lifecycle of those containers. It is supported by the OCI (Open Container Initiative).
Libcontainer has been tied to Docker Engine since its early versions (it became the default with Docker 0.9 in March 2014). Later, Docker developed runC, a runtime completely independent of Docker Engine, and folded libcontainer into it. The aim of this development was to let containers use a common runtime and to achieve standardization. runC was also donated to the OCI (Open Container Initiative).
As I mentioned earlier, the players developing container technologies (Docker, Red Hat, VMware, Oracle, AWS, etc.) have always tried to achieve standardization. So where do things stand today when we look at Docker?
Finally, after handing the runC project it developed over to the OCI, Docker started using containerd in 2016. Meanwhile, runC began to be used as the standard OCI runtime. Containerd works like an interface for runC, which runs at a low level, and it can also control other OCI runtimes. Containerd uses runC to run containers, but it also provides other higher-level features such as image management and high-level APIs. In the end, with Docker’s adoption, containerd became the industry standard implementation of the OCI specifications. It is currently available for Linux and Windows.
You can see which runtime you have in the “Default Runtime” field by running the following command:
docker info | grep -i runtime
The key thing to understand here is that Docker did not simply drop the existing projects. Rather, over time, some projects were replaced by others, and new projects were produced by improving the existing ones. Although it is a little difficult to follow the changes, relationships, and differences between the projects, the pieces fall into place over time.
To better understand the current situation, we can see the high- and low-level runtimes Docker has today by looking at the following images:
To simplify, we can summarize all this as follows:
- LXC was used in the first versions of Docker. Later, Docker moved on to libcontainer, runC, and containerd, respectively.
- In versions prior to Docker 1.11.0, Docker was a monolith and all jobs were handled by Docker Engine. Today, Docker is basically divided into 4 parts, with the following responsibilities:
1- Docker Engine → Creates containers from images and hands their control over to containerd.
2- containerd → Responsible for areas such as managing the container lifecycle, image push/pull, storage control, network management, namespace management, and so on. It also has control over the OCI runtime, runC. Today it can be used in both Linux and Windows environments as an OCI standard.
3- runC → It can actually be seen as a part of containerd. runC is a low-level runtime that packages applications according to the OCI format and runs them in those environments. When we want to configure a container, we do it with bundles; a bundle consists of a “config.json” file containing the configuration and a root filesystem (see the small runC sketch after this list).
4- containerd-shim → It handles running the container by using runC. It also provides a “daemonless container” environment, which means there is no need for a long-running runtime process per container. There are 2 benefits of running daemonless containers:
- runC exits after the container starts, so it does not have to keep running for the lifetime of the container.
- containerd-shim keeps file descriptors such as stdin (standard input), stdout (standard output), and stderr (standard error) open, even if Docker or containerd becomes inoperable for any reason.
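Here is the small runC sketch referenced above: creating and running an OCI bundle directly with runc (the directory and container names are just examples, and the rootfs must already be populated, for example by exporting the filesystem of an existing image):
mkdir -p mybundle/rootfs
cd mybundle
runc spec
sudo runc run demo-container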
Let’s clear up the confusion a little more… Looking back over our story, many things are called a “runtime”. For example, containerd and runC are both runtimes, so why did Docker separate them? Couldn’t one be used in place of the other? The answer is actually simple: runtimes are divided into high-level and low-level. Each runtime has its own features and uses. When we say low-level, we should think of something lightweight like runC, a runtime that starts containers quickly and works in harmony with a high-level runtime. To keep the container ecosystem standardized, a low-level runtime does nothing but run containers.
When we say high-level, we can talk about “containerd”. We can better understand the runtime levels with the following image:
If desired, other runtimes, whether OCI-standard or not, can be used instead of the default one. It all depends on the pros and cons and on what is expected. To do this, the following command should be run for Docker:
sudo dockerd --add-runtime=<runtime-name>=<runtime-path>
For example:
sudo apt-get install nvidia-container-runtime
sudo dockerd --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
Also, if you want to know more, you can take a look at:
When we look at the history of containers in general, we can say that their popularization and rapid development have largely been achieved thanks to Docker. Thanks to the new features and ecosystem it provides, Docker has taken the leading position among container technologies.
> Rocket — 2014
Rkt is a secure, lightweight container system developed by CoreOS as an alternative to Docker. It is built on a container standard known as “App Container” or “appc”. For this reason, rkt images can run on any container system that supports the “appc” format. It can be examined in more detail here.
So what are the differences from other container technologies?
→ It is designed to be much more suitable for production systems, with security in the foreground:
As is known, containers are process groups that can be created either by granting certain rights to users on the system or by running as root. In addition, what a user does in one container is not visible to other containers. Users are safe this way as long as the Linux kernel is not abused. However, in some systems such as Docker, a malicious user who escapes the container through a kernel exploit can ruin everything. That risk exists despite the countermeasures.
Unlike Docker, rkt runs containers as unprivileged users. Thus, even if there is a kernel-level vulnerability and a user manages to get out of the container, this does not affect other containers and users.
In addition, rkt verifies cryptographic signatures on downloaded images. As a result, only trusted containers can run on the server.
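As a rough sketch (the image below is just an example), trusting a signing key and then fetching and running a signed image looks like this:
rkt trust --prefix coreos.com/etcd
rkt fetch coreos.com/etcd:v3.1.7
rkt run coreos.com/etcd:v3.1.7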
→ It has a lightweight structure suitable for packaging applications:
It packages applications into images more efficiently and makes them executable.
→ Portable image format… an image structure that can easily be moved to other container systems:
In a rapidly evolving container landscape, rkt uses the “appc” format, which makes it easier to move images to a better container technology in the future. Thus, all kinds of images effectively become images that rkt can run.
→ Manageable features… Different features from different sources:
Depending on the work being done, features needed in one container system may not be needed in another. In such a situation, Docker has some problems. The first is that a large application developed by a single provider is likely to contain bugs. The second is that the features in Docker may stay limited to the perspective of the provider that develops the system. In such cases, alternatives such as rkt will always have more features, because different providers can extend the system by adding or modifying features themselves. This is the advantage of rkt’s “pluggable architecture”. For example, rkt can enable “Intel Clear Containers” support, or enable a feature on VMware that provides hardware-level isolation. In this way, features suitable for the work at hand are available, and the risk of vulnerabilities is also reduced.
> Windows Containers — 2016
In 2015, Microsoft set out to run Windows-based applications in containers on Windows Server and called this container technology Microsoft Containers. With the developments in 2016, Microsoft announced the general availability of Docker containers on Windows Server 2016, so Docker Engine containers became native to Windows.
With this, it became possible to run Docker containers on Windows without having to install a virtual machine. There are also many additional features and components on Azure that can run Docker containers.
The following sources can be examined to have more information:
> What’s Next: Podman and Buildah
If you remember, we saw in the Docker part of this article that some runtimes can work “daemonless”. Docker itself, however, is natively tied to a daemon (the Docker daemon, dockerd). Its job is to pass all the basic operations it receives from the CLI down to the lower components and to make sure they are carried out. If the daemon stops working for some reason, you cannot perform basic operations. We can understand this better from the picture below:
When we look at this architecture, we see places that can cause a few problems. A few of them are as follows:
- If the daemon stops responding, there is no access to the processes.
- All Docker operations are performed by one or more users with the same root privileges. This can create a vulnerability.
Podman, a new container platform, was developed to eliminate these risks. Podman works with the “runC” we mentioned earlier, so it operates in accordance with the “daemonless” concept.
In addition, Podman does not require root privileges from a user for operations such as building images. To do this, it uses namespaces in the Linux kernel. A namespace is a kernel feature that, in short, controls which resources a process running on Linux can see and access. So how does Podman use namespaces? Namespaces perform UID/GID mapping for users. A UID/GID range defined for an unprivileged user is reserved on the host. The user that is root inside the container (UID/GID = 0) is mapped onto this range, which gives it the privileges needed to build an image. And since that user’s privileges are defined and limited on the host, the root user in the container can really only perform certain operations (e.g. the build command). Thus, a security hole that might open from the container onto the host is prevented.
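You can see this mapping for yourself. For example, assuming a rootless setup with subordinate UID ranges configured for your user, the following command shows how container UIDs are mapped onto the host:
podman unshare cat /proc/self/uid_map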
In addition, when using Podman, the CLI commands used in Docker, such as build, stop, etc., can be used exactly as they are. Podman commands are also available to non-root users; for this, a local image store is kept under the directory “~/.local/share/containers”.
docker images --> podman images
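For example, a rootless user can run the familiar workflow directly (the image and container name below are only examples):
podman pull nginx
podman run -d --name web -p 8080:80 nginx
podman ps
podman stop web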
For more information, I recommend you to review the link below:
What is Buildah?
Buildah is a containerization tool for container systems that comply with the OCI (Open Container Initiative) standards. One of the reasons it was developed, and perhaps the most important one, is its strength in building container images. It also offers many different commands, which gives you freedom and makes it powerful. So why is this tool really necessary when Docker and Podman can also build images? We can explain the difference and its relationship with Podman as follows…
As you know, we can build images with a Dockerfile. In addition to Dockerfiles, Buildah has features that make it possible to build images with Bash scripts; you can do in a script everything you would do when writing a Dockerfile. This way, you can also convert your existing scripted work to Buildah and use it.
It is much more flexible and powerful, as it has many different commands for creating and managing container images. The build commands in Podman are actually a subset of the Buildah commands, and they share the same code.
buildah bud -t hello .
It is possible to produce a new image by making changes to an existing container with Buildah, using commands such as “buildah mount” or “buildah config”.
Buildah can also produce images from scratch. This means producing an image that is completely empty: if we look inside, we see a completely empty folder. It is possible to do this with the following command:
buildah from scratch
Flags such as port mapping and volume mounting are not used with the “run” command in the Buildah command set. If we want to make such a change, we instead run Linux commands inside the working container with “buildah run”. For example:
buildah run <container-id> dnf -y install nginx
Buildah also works rootless and daemonless.
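To tie this together, a minimal scripted build with Buildah might look roughly like this (the base image, package, and image name below are only examples):
ctr=$(buildah from fedora)
buildah run $ctr -- dnf -y install nginx
buildah config --port 80 $ctr
buildah commit $ctr my-nginx-image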
You can find CLI usage, examples, and more information at the link below:
That’s all about containers for now…
See you in next stories…
Goodbye..!