Linux & Container Basics
Prerequisites
To try out the demos in the following sections you need a Linux system, e.g. Ubuntu Desktop or Server. The easiest way is to install Linux in a separate VM (e.g. using VirtualBox, VMware or Parallels).
In addition, the following tools have to be installed:
Tools for getting and setting Linux capabilities. On Ubuntu/Debian install them using
sudo apt install libcap2-bin
File permissions
In Linux, everything is a file, like:
Binary application code
Data
Configuration
Logs
Devices
Permissions on such files determine which users are allowed to access those files and what actions they can perform on the files.
Each file and directory has three user based permission groups:
u - owner – The Owner permissions apply only to the owner of the file or directory; they do not affect the actions of other users.
g - group – The Group permissions apply only to the group that has been assigned to the file or directory; they do not affect the actions of other users.
o - others (all users) – The All Users permissions apply to all other users on the system; this is the permission group that you want to watch the most.
The Permission Types that are used are:
r – Read
w – Write
x – Execute
The permissions are displayed as: -rwxrwxrwx 1 owner:group. In the output of ls -l test.txt the parts of this line have the following meaning:
The first character indicates the file type (- for a regular file, d for a directory, l for a symbolic link)
The following set of three characters (rwx) is for the owner permissions
The second set of three characters (rwx) is for the group permissions
The third set of three characters (rwx) is for the all users (others) permissions
The number following the permission characters is the number of hard links to the file
The last piece is the owner and group assignment
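To connect these fields to real output, the following sketch creates a scratch file and prints each part separately. It assumes GNU coreutils (for stat -c); the temporary directory and the variable names are illustrative only.

```shell
# Create a scratch file in a temporary directory (illustrative path).
workdir=$(mktemp -d)
touch "$workdir/test.txt"
chmod u=rw,g=r,o=r "$workdir/test.txt"

# Full long listing: type and permissions, hard links, owner, group, size, date, name.
ls -l "$workdir/test.txt"

# The individual parts, extracted with GNU stat.
mode=$(stat -c '%A' "$workdir/test.txt")      # file type followed by the three rwx sets
links=$(stat -c '%h' "$workdir/test.txt")     # number of hard links
owner=$(stat -c '%U:%G' "$workdir/test.txt")  # owner and group

echo "mode=$mode links=$links owner=$owner"
```

The leading - in the mode string marks a regular file; for a directory the same position would show d.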
The file owner and group can be changed using the chown command. After running chown myuser:mygroup test.txt the owner of the file test.txt is myuser and the group is mygroup.
The file permissions are edited using the chmod command. You can assign the permissions symbolically or by using octal (numeric) notation. You may add the read and write permission for the group using chmod g+rw test.txt. To remove the same permissions for all other users you would type chmod o-rw test.txt.
You may also specify the complete file permissions using octal notation instead: each digit is the sum of the values of the permissions granted to the owner, the group and all other users (in that order).
r (read) = 4
w (write) = 2
x (execute) = 1
So chmod 644 test.txt gives the owner read and write access (4 + 2 = 6) and gives the group and all other users read-only access (4).
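The relationship between the symbolic and the numeric form can be checked on a scratch file. This is a minimal sketch assuming GNU coreutils (stat -c '%a' prints the permissions as octal digits); the variable names are illustrative.

```shell
f=$(mktemp)                           # scratch file
chmod 600 "$f"                        # start from rw------- (owner 4+2, group 0, others 0)

chmod g+rw "$f"                       # symbolic: add read and write for the group
after_symbolic=$(stat -c '%a' "$f")   # owner 6, group 6, others 0

chmod 644 "$f"                        # numeric: owner 4+2, group 4, others 4
after_numeric=$(stat -c '%A' "$f")    # same permissions shown as an rwx string

chmod u=rwx,g=rx,o= "$f"              # symbolic form of 750 (7 = 4+2+1, 5 = 4+1, 0 = none)
after_mixed=$(stat -c '%a' "$f")

echo "$after_symbolic $after_numeric $after_mixed"
rm -f "$f"
```

The symbolic form changes individual bits relative to the current state, while the numeric form always sets all nine bits at once.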
Special permissions using setuid and setgid
When executing a file, usually the process that gets started inherits your user ID. If the file has the setuid/setgid bit set, the process will have the user/group ID of the file’s owner/group instead.
We will try that using the sleep command. Because we are going to change permissions, we first make our own copy of the binary to experiment with. To find the installation path of sleep, run which sleep. With this path perform the copy command, e.g. cp /usr/bin/sleep ./mysleep if which reported /usr/bin/sleep.
Now let's check the file permissions for the mysleep file using ls -l mysleep.
This should show the usual permissions of an executable (e.g. -rwxr-xr-x) with your own user as the owner.
If you now start it as root in one terminal (e.g. sudo ./mysleep 100) and list the process from another terminal (e.g. ps -C mysleep -o user,pid,cmd), you will see that it runs with the root user ID.
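Now set the setuid bit on the copy (chmod u+s mysleep) and check again with ls -l mysleep: the owner's x is now shown as s.
If you start mysleep as root in one terminal again and list the process from the other terminal, you will see that, even though root started it, the command runs with the user ID of the file's owner.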
This bit is typically used to give a program privileges that it needs but that are not usually extended to regular users. Because setuid provides a dangerous pathway to privilege escalation, some container image security scanners report on the presence of files with the setuid bit set.
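What such a scanner looks for can be approximated with find. The sketch below is not taken from any particular scanner; it builds a small scratch directory (standing in for an unpacked image file system) with one setuid copy of sleep and one ordinary copy, then lists only the setuid file. The directory and file names are illustrative.

```shell
scan_dir=$(mktemp -d)                          # stand-in for an image file system
cp "$(command -v sleep)" "$scan_dir/mysleep"
cp "$(command -v sleep)" "$scan_dir/plain"
chmod u+s "$scan_dir/mysleep"                  # set the setuid bit on one copy only

# -perm -4000 matches files that have at least the setuid bit (octal 4000) set.
setuid_files=$(find "$scan_dir" -type f -perm -4000)
echo "$setuid_files"
```

Setting the bit on a file you own does not require root; the file only becomes dangerous when it is owned by a more privileged user than the one executing it.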
Linux capabilities
Traditionally, Linux only distinguished between privileged processes (running as root) and unprivileged processes (all other users).
With Linux capabilities you can break the privileges of root down into separate units and grant a process or thread only the least privileges it requires to run successfully.
The detailed documentation for Linux capabilities is available via man capabilities. Some examples:
CAP_CHOWN
Make arbitrary changes to file UIDs and GIDs
CAP_NET_ADMIN
Perform network operations like modify routing tables
CAP_NET_BIND_SERVICE
Bind a socket to Internet domain privileged ports (port numbers less than 1024)
CAP_NET_RAW
Use RAW and PACKET sockets
CAP_SETUID
Make arbitrary manipulations of process UIDs
CAP_SYS_ADMIN
Perform system admin operations like mount, swapon, sethostname or perform privileged syslog
CAP_SYS_BOOT
Use reboot
CAP_SYS_CHROOT
Use chroot
CAP_SYS_TIME
Set system clock
CAP_SYSLOG
Perform privileged syslog operations
If you want to query the capabilities of a process, use getpcaps with its process ID (getpcaps <pid>).
This is most informative for root processes; processes of non-root users usually do not have any capabilities set.
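The kernel also exposes the capability sets of a process as hexadecimal bitmasks in the Cap* lines of /proc/<pid>/status. The following sketch reads the effective set of the current process and tests a few bits; has_cap is a hypothetical helper written for this example, and the bit numbers are the ones defined in linux/capability.h.

```shell
# has_cap MASK BIT: succeed if bit number BIT is set in the hexadecimal MASK.
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Effective capability set of the current process, as a hex bitmask.
cap_eff=$(awk '/^CapEff:/ { print $2 }' /proc/self/status)
echo "CapEff: $cap_eff"

# name:bit pairs for some of the capabilities listed above.
for entry in CAP_CHOWN:0 CAP_NET_BIND_SERVICE:10 CAP_NET_RAW:13 CAP_SYS_ADMIN:21; do
    name=${entry%%:*}
    bit=${entry##*:}
    if has_cap "$cap_eff" "$bit"; then
        echo "$name: yes"
    else
        echo "$name: no"
    fi
done
```

Run as a regular user this typically prints a mask of all zeros and "no" for every entry; run under sudo it typically prints "yes" for all of them.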
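You may query the capabilities of a file with getcap, e.g. getcap $(which ping).
On distributions that ship ping with file capabilities, the output lists cap_net_raw.
As you can see, the ping command requires the net_raw capability to open a raw network socket.
You may also set capabilities for a file using setcap (this requires root), e.g. sudo setcap cap_net_raw+ep <file>.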
Docker starts containers with a default set of capabilities that balances security and usability. You can print this default set from inside a container, e.g. with capsh --print, which (among other things) lists the capabilities Docker grants by default.
If you run the container in privileged mode using --privileged (you should usually never do that), you get full root access with all Linux capabilities set.
In privileged mode you can, for example, list and change partition tables or directly mount file systems.
Usually you do not even need the default capabilities defined by Docker. A common use case is a container listening on a privileged TCP port (below 1024), e.g. an HTTP server. For this you just need the capability CAP_NET_BIND_SERVICE, so you can drop all others with docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE.
For more details on docker security consult the docker security docs.
Privilege Escalation
Privilege escalation happens when a user extends their privileges beyond those they were supposed to have. The user can then take actions that they should not be permitted to take. To escalate privileges, an attacker takes advantage of a system vulnerability or misconfiguration.
Usually, the attacker starts as a non-privileged user and wants to gain root privileges on the machine. A common method of escalating privileges is to look for software that’s already running as root and then take advantage of known vulnerabilities in this software.
Even when running a container as a non-root user, there is potential for privilege escalation based on the Linux permissions mechanisms in cases such as
Container images that include a setuid binary
Additional capabilities granted to a container running as a non-root user
You can also prohibit such privilege escalation in docker by adding --security-opt="no-new-privileges:true" to your docker run command.
See the docker run security section for more details.
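The Docker option corresponds to the kernel's no_new_privs flag: once it is set for a process, executing a setuid binary or a file with capabilities no longer grants additional privileges. The flag can be observed without Docker. This sketch assumes a kernel that reports NoNewPrivs in /proc/<pid>/status and, optionally, the setpriv tool from util-linux; nnp is a hypothetical helper name.

```shell
# Print the no_new_privs flag of the process reading the file (0 = off, 1 = on).
nnp() {
    awk '/^NoNewPrivs:/ { print $2 }' /proc/self/status
}

outer=$(nnp)
echo "current environment: NoNewPrivs=$outer"

# setpriv can set the flag for a single command without root privileges.
inner=""
if command -v setpriv >/dev/null 2>&1; then
    inner=$(setpriv --no-new-privs awk '/^NoNewPrivs:/ { print $2 }' /proc/self/status) \
        || inner="unavailable"
    echo "under setpriv --no-new-privs: NoNewPrivs=$inner"
fi
```

The flag is inherited by child processes and cannot be cleared again, which is why it is an effective guard against the setuid and capability cases listed above.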
Linux Namespaces
CGroups (see next section) control the resources that a process can use; namespaces control what a process can see.
Linux currently provides the following namespaces:
Unix Timesharing System (UTS): This namespace is responsible for the hostname and domain names.
Process IDs
Mount points
Network
User and group IDs
Inter-process communications (IPC)
Control groups (cgroups)
You can easily see the namespaces on your machine using the lsns command. Also try running it as root: you will then see the namespaces of all processes, not only your own.
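lsns collects its information from the /proc file system, where every process has one entry per namespace under /proc/<pid>/ns. A minimal sketch that inspects the namespaces of the current shell directly (variable names are illustrative):

```shell
# Each entry is a symbolic link of the form type:[inode]. Two processes are in
# the same namespace exactly when they show the same inode number.
ls -l /proc/$$/ns

uts_ns=$(readlink /proc/$$/ns/uts)
pid_ns=$(readlink /proc/$$/ns/pid)
echo "UTS namespace: $uts_ns"
echo "PID namespace: $pid_ns"
```

Comparing these values between a shell on the host and a shell inside a container shows which namespaces the container does not share with the host.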
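By using the unshare tool you may run a process with some namespaces unshared from its parent (i.e. simulating a Linux container).
Let's try this by isolating the hostname: start a shell in a new UTS namespace (e.g. sudo unshare --uts sh) and change the hostname inside it (e.g. hostname experiment).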
Now in the new shell we have our own hostname isolated by the UTS namespace. Just open a new terminal and check the hostname, then you will see that the host still has the original name.
This basically is the isolation mechanism used by linux containers (combined with linux capabilities).
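The hostname experiment can also be scripted. The sketch below is a variant that avoids sudo by additionally creating a user namespace (--user --map-root-user), which is an assumption beyond the text above and only works if the kernel permits unprivileged user namespaces; otherwise the fallback message is printed. The hostname isolated-demo is an arbitrary example value.

```shell
before=$(uname -n)    # hostname as seen by the current shell

# Inside the new user and UTS namespaces we appear as root, so changing the
# hostname is allowed there and only affects that namespace.
inner=$(unshare --user --map-root-user --uts \
            sh -c 'hostname isolated-demo && uname -n' 2>/dev/null) \
    || inner="(unshare not permitted on this system)"

after=$(uname -n)     # unchanged: the new name existed only in the new namespace

echo "before: $before"
echo "inside: $inner"
echo "after:  $after"
```

The name set inside the namespace disappears together with the last process running in it.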
Linux CGroups
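Docker uses Linux control groups (cgroups) to limit the resource usage of containers. Cgroups are a kernel feature of their own, separate from namespaces.
To limit a container to a maximum of 200 MiB of memory and one half of a CPU, add the options --memory=200m --cpus=0.5 to the docker run command.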
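As a worked example of what these two limits amount to: the memory limit is enforced in bytes, and a fractional CPU limit is enforced as a CPU-time quota per scheduling period. The sketch assumes the default period of 100000 microseconds; the variable names are illustrative.

```shell
# 200 MiB expressed in bytes.
memory_limit_bytes=$((200 * 1024 * 1024))

# Half a CPU: 50 ms of CPU time per 100 ms period.
cpu_period_us=100000
cpu_quota_us=$((cpu_period_us / 2))

echo "memory limit: $memory_limit_bytes bytes"
echo "cpu limit:    $cpu_quota_us us per $cpu_period_us us"

# The cgroup(s) the current shell itself belongs to.
if [ -r /proc/self/cgroup ]; then
    cat /proc/self/cgroup
fi
```

A process that exhausts its quota is throttled until the next period begins, which is why CPU-bound work such as application start-up takes longer under such a limit.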
You will notice that the Spring Boot application starts up much more slowly in this container because it has less CPU power.
You can always check the state of the app by viewing the container logs with docker logs <container>.
To see the actual resource consumption of the container use the docker stats command:
All details on limiting resources can be found in docker resource constraints.
Looking inside containers
Just run the following container:
There are different possibilities to look inside a running container.
Docker exec
By using the docker exec command you can open a shell inside the container (if bash is installed in the container), e.g. docker exec -it <container> bash.
Nsenter
Another option is the nsenter tool, which runs a program in the namespaces of another process.
First we need the PID of the container process, which docker inspect reports for a given container identifier (e.g. docker inspect --format '{{.State.Pid}}' <container>). Then we can use nsenter to open a shell inside the container (e.g. sudo nsenter --target <pid> --mount --uts --ipc --net --pid).