1. Process Isolation
In this demo, we’ll illustrate:
- What containerized process IDs look like inside versus outside of a kernel namespace
- How to impose control group limitations on the CPU and memory consumption of a containerized process
Exploring the PID Kernel Namespace
Start a simple container we can explore:
[user@node ~]$ docker container run -d --name pinger centos:7 ping 8.8.8.8
Use docker container exec
to launch a child process inside the container’s namespaces:
[user@node ~]$ docker container exec pinger ps -aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.1 0.0 24860 1884 ? Ss 02:20 0:00 ping 8.8.8.8
root 5 0.0 0.0 51720 3504 ? Rs 02:20 0:00 ps -aux
Run the same ps
directly on the host, and search for your ping process:
[user@node ~]$ ps -aux | grep ping
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 11622 0.0 0.0 24860 1884 ? Ss 02:20 0:00 ping 8.8.8.8
centos 11839 0.0 0.0 112656 2132 pts/0 S+ 02:23 0:00 grep --color=auto ping
The ping process appears as PID 1 inside the container, but as some higher PID (11622 in this example) from outside the container.
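Rather than grepping for the process, you can also ask docker for the host PID of the container's PID 1 directly, and then list the kernel namespaces that process lives in (a sketch using docker's Go-template output; listing /proc/<pid>/ns requires root):
[user@node ~]$ docker container inspect --format '{{.State.Pid}}' pinger
11622
[user@node ~]$ sudo ls -l /proc/11622/ns
Each entry under /proc/<pid>/ns is a namespace this process belongs to; by default, containerized processes get their own pid, net, mnt, uts, and ipc namespaces, distinct from the host's.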
List your containers to show this ping container is still running:
[user@node ~]$ docker container ls
CONTAINER ID IMAGE COMMAND ... STATUS ... NAMES
bb3a3b1cbb78 centos:7 "ping 8.8.8.8" ... Up 6 minutes pinger
Kill the ping process by host PID, and show the container has stopped:
[user@node ~]$ sudo kill -9 [host PID of ping]
[user@node ~]$ docker container ls
CONTAINER ID IMAGE COMMAND ... STATUS ... NAMES
Killing the ping process on the host also kills the container: a running container is nothing more than its PID 1 process plus the kernel tooling that isolates it from the host. Note that kill -9 is used here for demonstration purposes only; never stop containers this way.
Imposing Resource Limitations With Cgroups
Start a container that consumes two full CPUs:
[user@node ~]$ docker container run -d training/stress:3.0 --vm 2
Here the --vm
flag starts 2 dummy processes that allocate and free memory as fast as they can, each consuming as many CPU cycles as possible.
Check the CPU consumption of processes in the container:
[user@node ~]$ docker container top <container ID>
UID PID PPID C ... CMD
root 5806 5789 0 ... /usr/bin/stress --verbose --vm 2
root 5828 5806 99 ... /usr/bin/stress --verbose --vm 2
root 5829 5806 99 ... /usr/bin/stress --verbose --vm 2
The C column represents CPU consumption, in percent; this container is hogging two full CPUs! You can see the same thing by running ps -aux both inside and outside this container, like we did above; the same processes and their CPU utilization are visible in both places:
[user@node ~]$ docker container exec <container ID> ps -aux
USER PID %CPU %MEM ... COMMAND
root 1 0.0 0.0 ... /usr/bin/stress --verbose --vm 2
root 5 98.9 6.4 ... /usr/bin/stress --verbose --vm 2
root 6 99.0 0.4 ... /usr/bin/stress --verbose --vm 2
root 7 2.0 0.0 ... ps -aux
And on the host directly, via the PIDs we found from docker container top
above:
[user@node ~]$ ps -aux | grep <PID>
USER PID %CPU %MEM ... COMMAND
root 5828 99.3 4.9 ... /usr/bin/stress --verbose --vm 2
centos 6327 0.0 0.0 ... grep --color=auto 5828
Kill off this container:
[user@node ~]$ docker container rm -f <container ID>
This is the right way to kill and remove a running container (not kill -9
).
Run the same container again, but this time with a cgroup limitation on its CPU consumption:
[user@node ~]$ docker container run -d --cpus="1" training/stress:3.0 --vm 2
Do docker container top
and ps -aux
again, just like above; you’ll see the processes taking up half a CPU each, for a total of 1 CPU consumed. The --cpus="1"
flag has imposed a control group limitation on the processes in this container, constraining them to consume a total of no more than one CPU.
Find the host PID of a process running in this container using docker container top
again, and then see what cgroups that process lives in on the host:
[user@node ~]$ cat /proc/<host PID of containerized process>/cgroup
12:memory:/docker/31d03...
11:freezer:/docker/31d03...
10:hugetlb:/docker/31d03...
9:perf_event:/docker/31d03...
8:net_cls,net_prio:/docker/31d03...
7:cpuset:/docker/31d03...
6:pids:/docker/31d03...
5:blkio:/docker/31d03...
4:rdma:/
3:devices:/docker/31d03...
2:cpu,cpuacct:/docker/31d03...
1:name=systemd:/docker/31d03...
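If you want to see the limit docker actually wrote into the cgroup filesystem, a quick check is to read the CFS quota and period for the container's cgroup (a sketch, assuming a cgroups-v1 layout like the one shown above, with the cpu controller mounted at /sys/fs/cgroup/cpu,cpuacct, and using the full container ID):
[user@node ~]$ cat /sys/fs/cgroup/cpu,cpuacct/docker/<full container ID>/cpu.cfs_quota_us
[user@node ~]$ cat /sys/fs/cgroup/cpu,cpuacct/docker/<full container ID>/cpu.cfs_period_us
With --cpus="1", the quota typically equals the period (for example 100000 and 100000 microseconds), which is exactly what constrains these processes to one CPU's worth of time per scheduling period.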
Get a summary of resources consumed by processes in a control group via systemd-cgtop
:
[user@node ~]$ systemd-cgtop
Path Tasks %CPU Memory Input/s Output/s
/ 68 112.3 1.0G - -
/docker - 99.3 301.0M - -
/docker/31d03... 3 99.3 300.9M - -
...
Here again we can see that the processes living in the container’s control group (/docker/31d03…
) are constrained to take up only about 1 CPU.
Remove this container, spin up a new one that creates a lot of memory pressure, and check its resource consumption with docker stats
:
[user@node ~]$ docker container rm -f <container ID>
[user@node ~]$ docker container run -d training/stress:3.0 --vm 2 --vm-bytes 1024M
[user@node ~]$ docker stats
CONTAINER CPU % MEM USAGE / LIMIT MEM % ...
b29a6d877343 198.94% 937.2MiB / 3.854GiB 23.75% ...
Kill this container off, start it again with a memory constraint, and list your containers:
[user@node ~]$ docker container rm -f <container ID>
[user@node ~]$ docker container run \
-d -m 256M training/stress:3.0 --vm 2 --vm-bytes 1024M
[user@node ~]$ docker container ls -a
CONTAINER ID IMAGE ... STATUS
296c8f76af5c training/stress:3.0 ... Exited (1) 26 seconds ago
It exited immediately this time.
Inspect the metadata for this container, and look for the OOMKilled
key:
[user@node ~]$ docker container inspect <container ID> | grep 'OOMKilled'
"OOMKilled": true,
When the containerized process tried to exceed its memory limit, it was killed with an Out Of Memory (OOM) exception.
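The memory limit itself is recorded in the container's metadata; a quick way to confirm it is with the same --format technique used elsewhere in these demos (HostConfig.Memory reports the limit in bytes):
[user@node ~]$ docker container inspect --format '{{.HostConfig.Memory}}' <container ID>
268435456
268435456 bytes is the 256M we passed with -m.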
Conclusion
In this demo, we explored some of the most important technologies that make containerization possible: kernel namespaces and control groups. The core message here is that containerized processes are just processes running on their host, isolated and constrained by these technologies. All the tools and management strategies you would use for conventional processes apply just as well for containerized processes.
2. Creating Images
In this demo, we’ll illustrate:
- How to read each step of the image build output
- How intermediate image layers behave in the cache and as independent images
- What the meanings of 'dangling' and <missing> image layers are
Understanding Image Build Output
Make a folder demo
for our image demo:
[user@node ~]$ mkdir demo ; cd demo
And create a Dockerfile therein with the following content:
FROM centos:7
RUN yum update -y
RUN yum install -y which
RUN yum install -y wget
RUN yum install -y vim
Build your image from your Dockerfile, just like we did in the last exercise:
[user@node demo]$ docker image build -t demo .
Examine the output from the build process. The very first line looks like:
Sending build context to Docker daemon 2.048kB
Here the Docker daemon is archiving everything at the path specified in the docker image build
command (.
or the current directory in this example). This is why we made a fresh directory demo to build in, so that nothing extra is included in this process.
The next lines look like:
Step 1/5 : FROM centos:7
---> 49f7960eb7e4
Do an image ls:
[user@node demo]$ docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
demo latest 59e595750dd5 10 seconds ago 645MB
centos 7 49f7960eb7e4 2 months ago 200MB
Notice the Image ID for centos:7
matches that second line in the build output. The build starts from the base image defined in the FROM
command.
The next few lines look like:
Step 2/5 : RUN yum update -y
---> Running in 8734b14cf011
Loaded plugins: fastestmirror, ovl
...
This is the output of the RUN
command, yum update -y
. The line Running in 8734b14cf011
specifies a container that this command is running in, which is spun up based on all previous image layers (just the centos:7 base at the moment). Scroll down a bit and you should see something like:
---> 433e56d735f6
Removing intermediate container 8734b14cf011
At the end of this first RUN
command, the temporary container 8734b14cf011
is saved as an image layer 433e56d735f6
, and the container is removed. This is the exact same process as when you used docker container commit
to save a container as a new image layer, but now running automatically as part of a Dockerfile build.
Look at the history of your image:
[user@node demo]$ docker image history demo
IMAGE CREATED CREATED BY SIZE
59e595750dd5 2 minutes ago /bin/sh -c yum install -y vim 142MB
bba17f8df167 2 minutes ago /bin/sh -c yum install -y wget 87MB
b9f2efa616de 2 minutes ago /bin/sh -c yum install -y which 86.6MB
433e56d735f6 2 minutes ago /bin/sh -c yum update -y 129MB
49f7960eb7e4 2 months ago /bin/sh -c #(nop) CMD ["/bin/bash"] 0B
<missing> 2 months ago /bin/sh -c #(nop) LABEL org.label-schema.... 0B
<missing> 2 months ago /bin/sh -c #(nop) ADD file:8f4b3be0c1427b1... 200MB
As you can see, each layer of demo corresponds to a separate line in the Dockerfile, and each layer has its own ID. The image layer 433e56d735f6 committed in the second build step appears in this list of layers for the image.
Look through your build output for where steps 3/5 (installing which
), 4/5 (installing wget
), and 5/5 (installing vim
) occur; in each case you can see the same behavior: a temporary container is started from the previous image layers, the RUN command executes in it, the container is saved as a new image layer (visible in your docker image history output), and the temporary container is deleted.
Every layer can be used as you would use any image, which means we can inspect a single layer. Let’s inspect the wget layer, which in my case is bba17f8df167 (yours will be different, look at your docker image history
output):
[user@node demo]$ docker image inspect bba17f8df167
Let’s look for the command associated with this image layer by using --format
:
[user@node demo]$ docker image inspect \
--format='{{.ContainerConfig.Cmd}}' bba17f8df167
[/bin/sh -c yum install -y wget]
We can even start containers based on intermediate image layers; start an interactive container based on the wget
layer, and look for whether wget
and vim
are installed:
[user@node demo]$ docker container run -it bba17f8df167 bash
[root@a766a3d616b7 /]# which wget
/usr/bin/wget
[root@a766a3d616b7 /]# which vim
/usr/bin/which: no vim in
(/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin)
wget
is installed in this layer, but since vim
didn’t arrive until the next layer, it’s not available here.
Managing Image Layers
Change the last line in the Dockerfile from the last section to install nano
instead of vim
:
FROM centos:7
RUN yum update -y
RUN yum install -y which
RUN yum install -y wget
RUN yum install -y nano
Rebuild your image, and list your images again:
[user@node demo]$ docker image build -t demo .
[user@node demo]$ docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
demo latest 5a6aedc1feab 8 seconds ago 590MB
<none> <none> 59e595750dd5 23 minutes ago 645MB
centos 7 49f7960eb7e4 2 months ago 200MB
What is that image named <none>
? Notice the image ID is the same as the old image ID for demo:latest
(see your history output above). The name and tag of an image is just a pointer to the stack of layers that make it up; reuse a name and tag, and you are effectively moving that pointer to a new stack of layers, leaving the old one (the one containing the vim
install in this case) as an untagged or 'dangling' image.
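Dangling images like this accumulate as you iterate on builds; the docker CLI can list them with a filter, and clean them up once you no longer need them (hold off on pruning until the end of this demo if you want your image listings below to match):
[user@node demo]$ docker image ls --filter dangling=true
[user@node demo]$ docker image prune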
Rewrite your Dockerfile one more time, to combine some of those install steps:
FROM centos:7
RUN yum update -y
RUN yum install -y which wget nano
Rebuild using a new
tag this time, and list your images one more time:
[user@node demo]$ docker image build -t demo:new .
...
[user@node demo]$ docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
demo new 568b29a0dce9 20 seconds ago 416MB
demo latest 5a6aedc1feab 5 minutes ago 590MB
<none> <none> 59e595750dd5 28 minutes ago 645MB
centos 7 49f7960eb7e4 2 months ago 200MB
Image demo:new
is much smaller in size than demo:latest
, even though it contains the exact same software - why?
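One way to investigate is to compare the layer breakdown of the two tags side by side; the per-layer sizes reported by image history point at the answer:
[user@node demo]$ docker image history demo:latest
[user@node demo]$ docker image history demo:new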
Conclusion
In this demo, we explored the layered structure of images; each layer is built as a distinct image and can be treated as such, on the host where it was built. This information is preserved on the build host for use in the build cache; build another image based on the same lower layers, and they will be reused to speed up the build process. Notice that the same is not true of downloaded images like centos:7
; intermediate image caches are not downloaded, but rather only the final complete image.
3. Basic Volume Usage
In this demo, we’ll illustrate:
- Creating, updating, destroying, and mounting docker named volumes
- How volumes interact with a container's layered filesystem
- Use cases for mounting host directories into a container
Using Named Volumes
Create a volume, and inspect its metadata:
[user@node ~]$ docker volume create demovol
[user@node ~]$ docker volume inspect demovol
[
{
"CreatedAt": "2018-11-03T19:07:56Z",
"Driver": "local",
"Labels": {},
"Mountpoint": "/var/lib/docker/volumes/demovol/_data",
"Name": "demovol",
"Options": {},
"Scope": "local"
}
]
We can see that by default, named volumes are created under /var/lib/docker/volumes/<name>/_data
.
Run a container that mounts this volume, and list the filesystem therein:
[user@node ~]$ docker container run -it -v demovol:/demo centos:7 bash
[root@f4aca1b60965 /]# ls
anaconda-post.log bin demo dev etc home ...
The demo
directory is created as the mountpoint for our volume, as specified in the flag -v demovol:/demo
. This should also appear in your container filesystem’s list of mountpoints:
[root@f4aca1b60965 /]# cat /proc/self/mountinfo | grep demo
1199 1180 202:1 /var/lib/docker/volumes/demovol/_data /demo
rw,relatime - xfs /dev/xvda1 ...
Put a file in this volume:
[root@f4aca1b60965 /]# echo 'dummy file' > /demo/mydata.dat
Exit the container, and list the contents of your volume on the host:
[user@node ~]$ sudo ls /var/lib/docker/volumes/demovol/_data
You’ll see your mydata.dat
file present at this point in the host’s filesystem. Delete the container:
[user@node ~]$ docker container rm -f <container ID>
The volume and its contents will still be present on the host.
Start a new container mounting the same volume, attach a bash shell to it, and show that the old data is present in your new container:
[user@node ~]$ docker container run -d -v demovol:/demo centos:7 ping 8.8.8.8
[user@node ~]$ docker container exec -it <container ID> bash
[root@11117d3de672 /]# cat /demo/mydata.dat
Exit this container, and inspect its mount metadata:
[user@node ~]$ docker container inspect <container ID>
"Mounts": [
{
"Type": "volume",
"Name": "demovol",
"Source": "/var/lib/docker/volumes/demovol/_data",
"Destination": "/demo",
"Driver": "local",
"Mode": "z",
"RW": true,
"Propagation": ""
}
],
Here too we can see the volumes and host mountpoints for everything mounted into this container.
Build a new image out of this container using docker container commit
, and start a new container based on that image:
[user@node ~]$ docker container commit <container ID> demo:snapshot
[user@node ~]$ docker container run -it demo:snapshot bash
[root@ad62f304ba18 /]# cat /demo/mydata.dat
cat: /demo/mydata.dat: No such file or directory
The information mounted into the original container is not part of the container’s layered filesystem, and therefore is not captured in the image creation process; volume mounts and the layered filesystem are completely separate.
Clean up by removing that volume:
[user@node ~]$ docker volume rm demovol
You will get an error saying the volume is in use - docker will not delete a volume mounted to any container (even a stopped container) in this way. Remove the offending container first, then remove the volume again.
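To find which container is still holding the volume, you can filter the container list by volume name (a sketch; the volume filter is part of the standard docker container ls filters):
[user@node ~]$ docker container ls -a --filter volume=demovol
[user@node ~]$ docker container rm -f <container ID>
[user@node ~]$ docker volume rm demovol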
Mounting Host Paths
Make a directory with some source code in it for your new website:
[user@node ~]$ mkdir /home/centos/myweb
[user@node ~]$ cd /home/centos/myweb
[user@node myweb]$ echo "<h1>Hello Wrld</h1>" > index.html
Start up an nginx container that mounts this as a static website:
[user@node myweb]$ docker container run -d \
-v /home/centos/myweb:/usr/share/nginx/html \
-p 8000:80 nginx
Visit your website at the public IP of this node, port 8000.
Fix the spelling of 'world' in your HTML, and refresh the webpage; the content served by nginx gets updated without having to restart or replace the nginx container.
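If you want the container to be able to read but not modify your source, a small variation on the same flag mounts the directory read-only; note the :ro suffix and a different host port (8001, chosen arbitrarily here) so it doesn't collide with the container already bound to 8000:
[user@node myweb]$ docker container run -d \
    -v /home/centos/myweb:/usr/share/nginx/html:ro \
    -p 8001:80 nginx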
Conclusion
In this demo, we saw two key points about volumes. First, they exist outside the container's layered filesystem: not only are they not captured on image creation, they don't participate in the usual copy-on-write procedure used for files in the writable container layer. Second, manipulating files on the host that have been mounted into a container immediately propagates those changes to the running container. This is a popular technique for developers who containerize their runtime environment and mount in their in-development code: they can edit that code with the familiar tools on their host machine, and the changes are immediately available inside the running container without restarting or rebuilding anything.
4. Single Host Networks
In this demo, we’ll illustrate:
- Creating docker bridge networks
- Attaching containers to docker networks
- Inspecting networking metadata from docker networks and containers
- How network interfaces appear in different network namespaces
- What network interfaces are created on the host by docker networking
- What iptables rules are created by docker to isolate docker software-defined networks and forward network traffic to containers
Following Default Docker Networking
Switch to a fresh node you haven't run any containers on yet, and list your networks:
[centos@node-1 ~]$ docker network ls
NETWORK ID NAME DRIVER SCOPE
7c4e63830cbf bridge bridge local
c87d2a849036 host host local
902af00d5511 none null local
Get some metadata about the bridge
network, which is the default network containers attach to when doing docker container run
:
[centos@node-1 ~]$ docker network inspect bridge
Notice the IPAM
section:
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "172.17.0.0/16",
"Gateway": "172.17.0.1"
}
]
}
Docker’s IP address management driver assigns a subnet (172.17.0.0/16
in this case) to each bridge network, and uses the first IP in that range as the network’s gateway.
Also note the containers
key:
"Containers": {}
So far, no containers have been plugged into this network.
Have a look at what network interfaces are present on this host:
[centos@node-1 ~]$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP qlen 1000
link/ether 12:eb:dd:4e:07:ec brd ff:ff:ff:ff:ff:ff
inet 10.10.17.74/20 brd 10.10.31.255 scope global dynamic eth0
valid_lft 2444sec preferred_lft 2444sec
inet6 fe80::10eb:ddff:fe4e:7ec/64 scope link
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
link/ether 02:42:e2:c5:a4:6b brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 scope global docker0
valid_lft forever preferred_lft forever
We see the usual eth0
and loopback interfaces, but also the docker0
linux bridge, which corresponds to the docker software defined network we were inspecting in the previous step; note it has the same gateway IP as we found when doing docker network inspect
.
Create a docker container without specifying any networking parameters, and do the same docker network inspect
as above:
[centos@node-1 ~]$ docker container run -d centos:7 ping 8.8.8.8
[centos@node-1 ~]$ docker network inspect bridge
...
"Containers": {
"f4e8f3f1b918900dd8c9b8867aa3c81e95cf34aba7e366379f2a9ade9987a40b": {
"Name": "zealous_kirch",
"EndpointID": "f9f246a...",
"MacAddress": "02:42:ac:11:00:02",
"IPv4Address": "172.17.0.2/16",
"IPv6Address": ""
}
}
...
The Containers key now contains the metadata for the container you just started; it received the next available IP address from the default network's subnet. Also note that the last four octets of the container's MAC address (ac:11:00:02) are the hexadecimal encoding of its IP on this network (172.17.0.2); this scheme gives each container a locally unique MAC address that linux bridges can route traffic to.
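You can pull just the IP and MAC address out of the container's metadata with a Go template, rather than reading the full network inspect output (a sketch; these keys are part of docker's standard inspect structure):
[centos@node-1 ~]$ docker container inspect --format \
    '{{range .NetworkSettings.Networks}}{{.IPAddress}} {{.MacAddress}}{{end}}' <container ID>
172.17.0.2 02:42:ac:11:00:02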
Look at your network interfaces again:
[centos@node-1 ~]$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP qlen 1000
link/ether 12:eb:dd:4e:07:ec brd ff:ff:ff:ff:ff:ff
inet 10.10.17.74/20 brd 10.10.31.255 scope global dynamic eth0
valid_lft 2188sec preferred_lft 2188sec
inet6 fe80::10eb:ddff:fe4e:7ec/64 scope link
valid_lft forever preferred_lft forever
3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
link/ether 02:42:e2:c5:a4:6b brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 scope global docker0
valid_lft forever preferred_lft forever
inet6 fe80::42:e2ff:fec5:a46b/64 scope link
valid_lft forever preferred_lft forever
5: vethfbd45f0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
master docker0 state UP
link/ether 6e:3c:e4:21:7b:e2 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::6c3c:e4ff:fe21:7be2/64 scope link
valid_lft forever preferred_lft forever
A new interface has appeared: interface number 5 is the veth connection connecting the container’s network namespace to the host’s network namespace. But, what happened to interface number 4? It’s been skipped in the list.
Look closely at interface number 5:
5: vethfbd45f0@if4
That @if4
indicates that interface number 5 is connected to interface 4. In fact, these are the two endpoints of the veth connection mentioned above; each end of the connection appears as a distinct interface, and ip addr
only lists the interfaces in the current network namespace (the host in the above example).
Look at the interfaces in your container’s network namespace (you’ll first need to connect to the container and install iproute
):
[centos@node-1 ~]$ docker container exec -it <container ID> bash
[root@f4e8f3f1b918 /]# yum install -y iproute
...
[root@f4e8f3f1b918 /]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue
state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
4: eth0@if5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
state UP group default
link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.17.0.2/16 scope global eth0
valid_lft forever preferred_lft forever
Not only does interface number 4 appear inside the container's network namespace, connected to interface 5, but we can also see that this veth endpoint is treated as the container's eth0 interface.
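If you'd rather not install tooling inside a container just to look at its interfaces, an alternative is to run the host's ip binary inside the container's network namespace (a sketch, assuming nsenter from util-linux is available on the host):
[centos@node-1 ~]$ PID=$(docker container inspect --format '{{.State.Pid}}' <container ID>)
[centos@node-1 ~]$ sudo nsenter --target $PID --net ip addr
This prints the same lo and eth0@if5 interfaces we just saw from inside the container.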
Establishing Custom Docker Networks
Create a custom bridge network:
[centos@node-1 ~]$ docker network create my_bridge
[centos@node-1 ~]$ docker network ls
NETWORK ID NAME DRIVER SCOPE
7c4e63830cbf bridge bridge local
c87d2a849036 host host local
a04d46bb85b1 my_bridge bridge local
902af00d5511 none null local
my_bridge
gets created as another linux bridge-based network by default.
Run a couple of containers named c2
and c3
attached to this new network:
[centos@node-1 ~]$ docker container run \
--name c2 --network my_bridge -d centos:7 ping 8.8.8.8
[centos@node-1 ~]$ docker container run \
--name c3 --network my_bridge -d centos:7 ping 8.8.8.8
Inspect your new bridge:
[centos@node-1 ~]$ docker network inspect my_bridge
...
"IPAM": {
"Driver": "default",
"Options": {},
"Config": [
{
"Subnet": "172.18.0.0/16",
"Gateway": "172.18.0.1"
}
]
},
...
"Containers": {
"084caf415784fb4d58dc6fb4601321114b93dc148793fd66c95fc2c9411b085e": {
"Name": "c3",
"EndpointID": "8046005...",
"MacAddress": "02:42:ac:12:00:03",
"IPv4Address": "172.18.0.3/16",
"IPv6Address": ""
},
"23d2e307325ec022ce6b08406bfb0f7e307fa533a7a4957a6d476c170d8e8658": {
"Name": "c2",
"EndpointID": "730ac71...",
"MacAddress": "02:42:ac:12:00:02",
"IPv4Address": "172.18.0.2/16",
"IPv6Address": ""
}
},
...
The next subnet in sequence (172.18.0.0/16
in my case) has been assigned to my_bridge
by the IPAM driver, and containers attached to this network get IPs from this range exactly as they did with the default bridge network.
Try to contact container c3
from c2
:
[centos@node-1 ~]$ docker container exec c2 ping c3
It works - containers on the same custom network are able to resolve each other via DNS lookup of container names. This means that our application logic (c2 ping c3
in this simple case) doesn’t have to do any of its own service discovery; all we need to know are container names, and docker does the rest.
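Under the hood, containers attached to user-defined networks are pointed at docker's embedded DNS server; you can confirm this by looking at the resolver configuration inside c2 (the nameserver entry should be 127.0.0.11):
[centos@node-1 ~]$ docker container exec c2 cat /etc/resolv.conf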
Start another container on my_bridge
, but don’t name it:
[centos@node-1 ~]$ docker container run --network my_bridge -d centos:7 ping 8.8.8.8
[centos@node-1 ~]$ docker container ls
CONTAINER ID IMAGE ... STATUS PORTS NAMES
625cb95b922d centos:7 ... Up 2 seconds competent_leavitt
084caf415784 centos:7 ... Up 5 minutes c3
23d2e307325e centos:7 ... Up 5 minutes c2
f4e8f3f1b918 centos:7 ... Up 21 minutes zealous_kirch
As usual, it got a default name generated for it (competent_leavitt
in my case). Try resolving this name by DNS as above:
[centos@node-1 ~]$ docker container exec c2 ping competent_leavitt
ping: competent_leavitt: Name or service not known
DNS resolution fails. Containers must be explicitly named in order to appear in docker’s DNS tables.
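If you need an automatically named container to be resolvable anyway, one option is to give it a network-scoped alias at creation time with the --network-alias flag (pinger4 is just a hypothetical alias chosen for this sketch):
[centos@node-1 ~]$ docker container run --network my_bridge \
    --network-alias pinger4 -d centos:7 ping 8.8.8.8
[centos@node-1 ~]$ docker container exec c2 ping pinger4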
Find the IP of your latest container (competent_leavitt
in my case) via docker container inspect
, and ping it from c2
directly by IP:
[centos@node-1 ~]$ docker network inspect my_bridge
...
"625cb95b922d2502fd016c6517c51652e84f902f69632d5d399dc38f3f7b2711": {
"Name": "competent_leavitt",
"EndpointID": "2fdb093d97b23da43023b07338a329180995fc0564ed0762147c8796380c51e7",
"MacAddress": "02:42:ac:12:00:04",
"IPv4Address": "172.18.0.4/16",
"IPv6Address": ""
}
...
[centos@node-1 ~]$ docker container exec c2 ping 172.18.0.4
PING 172.18.0.4 (172.18.0.4) 56(84) bytes of data.
64 bytes from 172.18.0.4: icmp_seq=1 ttl=64 time=0.083 ms
64 bytes from 172.18.0.4: icmp_seq=2 ttl=64 time=0.060 ms
The ping succeeds. While the default-named container isn’t resolvable by DNS, it is still reachable on the my_bridge
network.
Finally, create container c1
attached to the default network:
[centos@node-1 ~]$ docker container run --name c1 -d centos:7 ping 8.8.8.8
Attempt to ping it from c2
by name:
[centos@node-1 ~]$ docker container exec c2 ping c1
ping: c1: Name or service not known
DNS resolution is scoped to user-defined docker networks. Find c1
's IP manually as above (mine is at 172.17.0.3
), and ping this IP directly from c2
:
[centos@node-1 ~]$ docker container exec c2 ping 172.17.0.3
The request hangs until it times out (press CTRL+C
to give up early if you don’t want to wait for the timeout). Different docker networks are firewalled from each other by default; dump your iptables rules and look for lines similar to the following:
[centos@node-1 ~]$ sudo iptables-save
...
-A DOCKER-ISOLATION-STAGE-1 -i br-dfda80f70ea5
! -o br-dfda80f70ea5 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -i docker0 ! -o docker0 -j DOCKER-ISOLATION-STAGE-2
-A DOCKER-ISOLATION-STAGE-1 -j RETURN
-A DOCKER-ISOLATION-STAGE-2 -o br-dfda80f70ea5 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -o docker0 -j DROP
-A DOCKER-ISOLATION-STAGE-2 -j RETURN
...
The first line above forwards traffic originating from br-dfda80f70ea5
(that’s your custom bridge) but destined somewhere else to the stage 2 isolation chain, where if it is destined for the docker0
bridge, it gets dropped, preventing traffic from going from one bridge to another.
Forwarding a Host Port to a Container
Start an nginx
container with a port exposure:
[centos@node-1 ~]$ docker container run -d -p 8000:80 nginx
This syntax asks docker to forward all traffic arriving on port 8000 of the host’s network namespace to port 80 of the container’s network namespace. Visit the nginx landing page at <node-1 public IP>:8000
.
Inspect your iptables rules again to see how docker forwarded this traffic:
[centos@node-1 ~]$ sudo iptables-save | grep 8000
-A DOCKER ! -i docker0 -p tcp -m tcp --dport 8000
-j DNAT --to-destination 172.17.0.4:80
Inspect your default bridge network to find the IP of your nginx container; you should find that it matches the IP in the network address translation rule above, which states that any traffic arriving on port tcp/8000 on the host should be network address translated to 172.17.0.4:80
- the IP of our nginx container and the port we exposed with the -p 8000:80
flag when we created this container.
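You can also ask docker directly for a container's port mappings, without digging through iptables:
[centos@node-1 ~]$ docker container port <container ID>
80/tcp -> 0.0.0.0:8000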
Clean up your containers and networks:
[centos@node-1 ~]$ docker container rm -f $(docker container ls -aq)
[centos@node-1 ~]$ docker network rm my_bridge
Conclusion
In this demo, we stepped through the basic behavior of docker software defined bridge networks, and looked at the technology underpinning them such as linux bridges, veth connections, and iptables rules. From a practical standpoint, in order for containers to communicate they must be attached to the same docker software defined network (otherwise they’ll be firewalled from each other by the cross-network iptables rules we saw), and in order for containers to resolve each other’s name by DNS, they must also be explicitly named upon creation.
5. Docker Compose
In this demo, we’ll illustrate:
- Starting an app defined in a docker compose file
- Inter-service communication using DNS resolution of service names
Exploring the Compose File
Please download the DockerCoins app from Github and change directory to ~/orchestration-workshop/dockercoins.
[user@node ~]$ git clone -b ee3.0 \
https://github.com/docker-training/orchestration-workshop.git
[user@node ~]$ cd ~/orchestration-workshop/dockercoins
Let’s take a quick look at our Compose file for Dockercoins:
version: "3.1"
services:
rng:
image: training/dockercoins-rng:1.0
networks:
- dockercoins
ports:
- "8001:80"
hasher:
image: training/dockercoins-hasher:1.0
networks:
- dockercoins
ports:
- "8002:80"
webui:
image: training/dockercoins-webui:1.0
networks:
- dockercoins
ports:
- "8000:80"
redis:
image: redis
networks:
- dockercoins
worker:
image: training/dockercoins-worker:1.0
networks:
- dockercoins
networks:
dockercoins:
This Compose file contains 5 services, along with a bridge network.
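Before starting anything, you can have docker-compose parse the file and echo back what it defines; listing the services is a quick sanity check (output order may differ):
[user@node dockercoins]$ docker-compose config --services
rng
hasher
webui
redis
worker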
When we start the app, we will see the service images getting downloaded one at a time:
[user@node dockercoins]$ docker-compose up -d
After starting, the images required for this app have been downloaded:
[user@node dockercoins]$ docker image ls | grep "dockercoins"
Make sure the services are up and running, as is the dedicated network:
[user@node dockercoins]$ docker-compose ps
[user@node dockercoins]$ docker network ls
If everything is up, visit your app at <node-0 public IP>:8000
to see Dockercoins in action.
Communicating Between Containers
In this section, we’ll demonstrate that containers created as part of a service in a Compose file are able to communicate with containers belonging to other services using just their service names. Let’s start by listing our DockerCoins containers:
[user@node dockercoins]$ docker container ls | grep 'dockercoins'
Now, connect into one container; let’s pick webui
:
[user@node dockercoins]$ docker container exec -it <Container ID> bash
From within the container, ping rng
by name:
[root@<Container ID>]# ping rng
You should see output resembling this:
PING rng (172.18.0.5) 56(84) bytes of data.
64 bytes from dockercoins_rng_1... (172.18.0.5): icmp_seq=1 ttl=64 time=0.108 ms
64 bytes from dockercoins_rng_1... (172.18.0.5): icmp_seq=2 ttl=64 time=0.049 ms
64 bytes from dockercoins_rng_1... (172.18.0.5): icmp_seq=3 ttl=64 time=0.073 ms
64 bytes from dockercoins_rng_1... (172.18.0.5): icmp_seq=4 ttl=64 time=0.067 ms
64 bytes from dockercoins_rng_1... (172.18.0.5): icmp_seq=5 ttl=64 time=0.057 ms
64 bytes from dockercoins_rng_1... (172.18.0.5): icmp_seq=6 ttl=64 time=0.074 ms
64 bytes from dockercoins_rng_1... (172.18.0.5): icmp_seq=7 ttl=64 time=0.052 ms
64 bytes from dockercoins_rng_1... (172.18.0.5): icmp_seq=8 ttl=64 time=0.057 ms
64 bytes from dockercoins_rng_1... (172.18.0.5): icmp_seq=9 ttl=64 time=0.080 ms
Use CTRL+C
to terminate the ping. DNS lookup for the services in DockerCoins works because they are all attached to the user-defined dockercoins
network.
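Compose prefixes network names with the project (directory) name, so the dockercoins network typically appears as dockercoins_dockercoins in docker network ls. Since docker's DNS answers for any container on that network, you can even test service-name resolution from a throwaway container outside of Compose (a sketch; substitute whatever network name docker network ls actually shows):
[user@node dockercoins]$ docker network ls | grep dockercoins
[user@node dockercoins]$ docker container run --rm --network dockercoins_dockercoins \
    centos:7 ping -c 3 rng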
After exiting this container, let’s navigate to the worker
folder and take a look at a section of worker.py
:
[user@node dockercoins]$ cd worker
[user@node ~ worker]$ cat worker.py
import logging
import os
from redis import Redis
import requests
import time
DEBUG = os.environ.get("DEBUG", "").lower().startswith("y")
log = logging.getLogger(__name__)
if DEBUG:
logging.basicConfig(level=logging.DEBUG)
else:
logging.basicConfig(level=logging.INFO)
logging.getLogger("requests").setLevel(logging.WARNING)
redis = Redis("redis")
def get_random_bytes():
r = requests.get("http://rng/32")
return r.content
def hash_bytes(data):
r = requests.post("http://hasher/",
data=data,
headers={"Content-Type": "application/octet-stream"})
hex_hash = r.text
return hex_hash
As we can see in the last two functions, the worker directs traffic to other services via DNS names (http://rng/32 and http://hasher/) that exactly match the service names defined in the Compose file.
Shut down Dockercoins and clean up its resources:
[user@node dockercoins]$ docker-compose down
Conclusion
In this exercise, we stood up an application using Docker Compose. The most important new idea here is the notion of Docker Services, which are collections of identically configured containers. Docker Service names are resolvable by DNS, so that we can write application logic designed to communicate service to service; all service discovery and load balancing between your application’s services is abstracted away and handled by Docker.