Manual decryption of drives via SSH on headless systemd machines

published by Christian Seiler on Sun, 06/28/2015 - 19:40

This article describes how to set up a headless machine that contains encrypted partitions where the administrator can log in via SSH after boot, enter the decryption password and then continue to start the remaining services. It assumes Debian Jessie and systemd as the init system.

To summarize the scenario:

The main partitions containing the operating system itself are not encrypted
At boot most system services are not automatically started, only SSH and syslog are.
After booting, the administrator may log in via SSH and enter the password for the encrypted partitions.
After mounting the encrypted partitions, all installed services will be started up properly.

A note on the scenario itself

A better way of having an encrypted headless server is to encrypt the entire disk and decrypt it in the initrd. However, in some cases one might not be able to do this - and this scenario presents an opportunity to explain many concepts of systemd via a specific use case.

Note that if the headless system is housed elsewhere, and one does not have complete physical control over it, using encryption in this manner will always be susceptible to manipulation by people with physical access - a possible attack vector would be to install a rootkit when the system is offline, to gain access after the administrator enters the password to decrypt the data. This also applies to other systems, e.g. laptops, but there it is at least somewhat more likely to be noticed if something like this occurs - as compared to a completely headless system one does not have physical control over.

Some basic systemd concepts

Before presenting a solution, here is a short recapitulation of some basic systemd concepts that will help in understanding how the solution works.

Units

A unit is anything that systemd manages and/or monitors. Relevant for this use case are service, target and mount units, but systemd supports more than just those (see man:systemd.unit(5) and the references therein for more details).

A service unit is the simplest type to understand: it corresponds to services and executables run at boot. This is somewhat equivalent to /etc/init.d scripts in sysvinit/LSB.

A mount unit represents any mount point in the system. If one manually mounts something via the mount(8) command, systemd will track that mount point by synthesizing a mount unit for it. So for example:

mkdir /mnt/a /mnt/b
mount --bind /mnt/a /mnt/b

Of course, one may umount the mount point via the command umount /mnt/b, but the following is also possible:

systemctl stop mnt-b.mount

For mount points that were manually created, this might not be that interesting (probably even most systemd developers will use umount in that case), but /etc/fstab is also handled in this way via systemd: a so-called generator creates a mount unit for each entry there. When the system is booted, systemd will then mount those filesystems automatically (by calling mount(8) itself), because the generator also hooks those mount units up to the system startup. (It's also possible to explicitly write mount unit files that systemd will read in addition to /etc/fstab, but usually that's not necessary.)

A target unit is the simplest kind of unit: it can only have dependencies (and things such as a description), but nothing else. It is in some sense very similar to a trivial service unit that just runs /bin/true. Targets are useful for grouping other units together and for providing synchronization points. For example, as is documented in man:bootup(7), there's a target called local-fs.target that has the following semantics: every local filesystem in /etc/fstab is ordered before that target, so that every service that orders after it can be sure that all local filesystems are already mounted at that point. (Most services are implicitly ordered after that target, because by default units are ordered after basic.target, which itself is ordered after local-fs.target.)

Booting the system

For people used to sysvinit, at first glance it might appear to be the case that a lot of the bootup logic is hardcoded into systemd. That is not quite true: there is some logic in systemd that mounts some necessary kernel filesystems (mainly /proc and /sys), and it also does some very elementary things like setting the hostname from /etc/hostname, but beyond that the entire boot process is very customizable.

After the very few initial things that are hard-coded, systemd will just try to start a specific unit. By default this unit is default.target, but it can be overridden by the kernel command line option systemd.unit=. Also, default.target is just a link to the actual target that is to be started (usually graphical.target), and the destination to where that points can be overridden.
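Because default.target is an ordinary symlink, the target it points to can be worked out with plain shell tools. The following is only a sketch: the function name default_target and its optional root-directory argument are made up for this illustration, and it looks at nothing but the symlink in /etc/systemd/system and /lib/systemd/system (systemctl get-default is the way to ask systemd itself).

```shell
# default_target [ROOT]
# Print the name of the unit that the default.target symlink points to.
# ROOT is an optional prefix (hypothetical, e.g. for inspecting a chroot);
# it defaults to the running system. Only one level of symlink is resolved.
default_target() {
    root=${1:-}
    # The administrator's directory in /etc takes precedence over /lib.
    for dir in "$root/etc/systemd/system" "$root/lib/systemd/system"; do
        link=$dir/default.target
        if [ -L "$link" ]; then
            # Only the last path component of the link destination matters.
            basename "$(readlink "$link")"
            return 0
        fi
    done
    echo "no default.target symlink found" >&2
    return 1
}
```

Calling default_target without arguments inspects the running system; on a stock installation one would expect it to print graphical.target.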

The default target then pulls in all of the other units required to boot the system. If one tells systemd to start a unit that doesn't pull in all or even any of the default dependencies, those will not be started. This is how emergency.target and rescue.target work. But this also means that one can heavily customize the boot process by supplying an alternative boot target that can then pull in anything one wants.

Implementing the scenario

There are multiple possible solutions for implementing this scenario, but the focus here will be on the following logic:

Tell systemd not to decrypt the encrypted partition at boot and to not automatically mount it during the boot process.
Create a target unit that will serve as the new boot target. It will only pull in the most basic required system services, and also SSH and syslog.
Create a second target unit that will pull in the encrypted partition and all other services, so that when one manually tells systemd to start that new target, the partition will be decrypted and all other daemons will be started.

Don't automatically mount encrypted partitions

First of all, the encrypted partition needs to be added to /etc/crypttab. For this article, this will be implemented in a KVM, so /dev/vda is the main disk - and /dev/vda5 will be the encrypted partition. It will be assumed that it is already formatted, that LUKS is used, and that there is already a filesystem in the encrypted container. The device that contains the plaintext filesystem will be called /dev/mapper/crypto here. The entry in /etc/crypttab should look something like this:

crypto /dev/vda5 none luks,noauto

It is very important that the noauto setting is there: it means that the drive should not be automatically decrypted at boot - otherwise systemd would ask for the password during boot, and it would do so before starting the SSH daemon.

Internally, every entry in /etc/crypttab is processed by a generator that dynamically transforms them into systemd units. It is called once very early at boot and then every time systemd is told to reload its configuration. For this specific entry in /etc/crypttab the generator creates the unit systemd-cryptsetup@crypto.service. If there had not been the noauto here, the unit would have been marked to be automatically started, but since it's present, it will only be started manually or if another unit declares it as a dependency.

One may now check if systemd picks up on the /etc/crypttab entry by running

systemctl daemon-reload

This will tell systemd to re-run all its generators (those for /etc/crypttab, /etc/fstab, ...) and then re-read all systemd units (both static and dynamically generated). Important: any time something that affects systemd units directly is changed, systemd should be told about this via this command.

Now one may check that the unit was properly picked up:

systemctl status systemd-cryptsetup@crypto.service

It should show that the unit exists (loaded), but it should also show that it's inactive, so it wasn't started automatically (noauto keyword).

The filesystem should now also be added to /etc/fstab:

/dev/mapper/crypto /srv ext4 rw,noauto 0 0

The noauto is also present here, since it shouldn't be mounted automatically at boot. This is really important here, since systemd considers local mounts that fail to be a fatal error by default. And because the device will not be decrypted automatically at boot, systemd would wait until its default timeout of 90 seconds for the device to appear before dropping the system into an emergency shell, if the noauto option were missing.
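Since a missing noauto in either file leads to a boot that hangs or ends in an emergency shell, it can be worth checking both entries before rebooting. The following sketch relies on the fact that /etc/crypttab and /etc/fstab both keep their comma-separated options in the fourth field; the function name has_noauto and its argument order are made up for this illustration.

```shell
# has_noauto FILE COLUMN KEY
# Succeed if the entry in FILE whose COLUMN-th field equals KEY lists
# "noauto" among its options (fourth field in crypttab and fstab alike).
has_noauto() {
    awk -v col="$2" -v key="$3" '
        /^[[:space:]]*#/ { next }            # skip comment lines
        $col == key {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "noauto") found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# Usage for the entries shown in this article:
#   has_noauto /etc/crypttab 1 crypto && echo "crypttab: noauto is set"
#   has_noauto /etc/fstab    2 /srv   && echo "fstab: noauto is set"
```

In crypttab the entry is looked up by its name (first field), in fstab by its mount point (second field).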

Alternative boot target

Now that this is done, the next step is to create the target unit for booting the system with just the selected minimal set of daemons. There is good reason not to interfere with the early boot process, so one still wants to pull in basic.target for basic system initialization. But beyond that only SSH and syslog should be pulled in.

systemd knows two directories where static units are stored: /lib/systemd/system and /etc/systemd/system. The former in /lib is the domain of the distribution and the installed software - it's for all distribution/vendor-supplied units. The latter directory in /etc is purely for the administrator: any file in /etc/systemd/system will override any file with the same name in /lib/systemd/system, so that the administrator has the ability to customize any unit file of their choosing.

Therefore, all changes should happen in /etc/systemd/system. The new boot target will be called before-decrypt.target in this article (but the name is freely choosable, as long as it's not a unit that already exists), so one should now create a file /etc/systemd/system/before-decrypt.target, and it should have the following contents:

[Unit]
Description=System before Decryption
Requires=basic.target
Conflicts=rescue.service rescue.target
After=basic.target rescue.service rescue.target
AllowIsolate=yes

Its contents - apart from Description= and Documentation= - are identical to the default multi-user.target unit in /lib/systemd/system. (graphical.target is just an additional layer that pulls in multi-user.target and the display manager, so that multi-user.target is the target all non-GUI services are hooked up to.)

To understand this file, one needs to understand systemd dependencies: there are quite a few different types of dependencies, but the four relevant dependencies here are:

Requires=: if a unit A requires the unit B, it means that if A is to be started, B will also be pulled in to be started (unless it's already running). If B fails to start, A will then not be started. If B is to be stopped, A will be stopped as well. (Note that the latter is only true for explicit actions: if B segfaults or otherwise dies unexpectedly, A will not be stopped unless one tells systemd to do so explicitly with an additional setting, BindsTo=.)
Wants=: a weaker version of Requires=: if A is to be started, B will also be pulled in to be started. But if B fails to start or B is to be stopped, this will have no effect on A.
Conflicts=: if a unit B conflicts with a unit A, unit B will be stopped if A is to be started, and vice versa.
After= and Before=: declares ordering within a single transaction (i.e. if a bunch of units are to be started/stopped at the same time). Note that ordering is orthogonal to pulling things in in systemd: Requires/Wants= does not imply ordering by default (except for target units in some cases, see below)
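To see at a glance which of these settings a unit file declares, the relevant lines can be pulled out with a few lines of awk. This is a rough sketch: the function name unit_deps is made up, and it only reads the one file it is given, so symlinks in .wants/ directories and dependencies that systemd adds implicitly are not shown (systemctl show with the -p option reports systemd's complete view).

```shell
# unit_deps FILE
# Print every dependency or ordering setting found in a unit file,
# one "Directive unit" pair per line.
unit_deps() {
    awk -F= '
        /^(Requires|Wants|Conflicts|After|Before)=/ {
            # The value is a whitespace-separated list of unit names.
            n = split($2, units, " ")
            for (i = 1; i <= n; i++) print $1, units[i]
        }
    ' "$1"
}

# Usage:
#   unit_deps /lib/systemd/system/multi-user.target
```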

In the case of before-decrypt.target, the following dependencies are specified:

Requires=basic.target: Pull in all basic system initialization (mounting of local and remote filesystems, starting of early-boot services, etc.)
Conflicts=rescue.service rescue.target: If one was dropped into an emergency shell at boot (or requested it explicitly via the kernel command line), this conflict will cause systemd to stop the emergency shell once one tells it to continue with the regular boot process.
After=basic.target rescue.service rescue.target: Makes sure that the basic system initialization is done before before-decrypt.target should be started.

This may look a bit odd at first, because multi-user.target is supposed to pull in all the installed system services (and the settings were copied from there), but they are not listed in the unit above. The reason is that it would be tedious to have to modify the target unit every time a new service is installed, so systemd has additional ways of augmenting unit dependencies.

In this specific case, symlinks in the directories /lib/systemd/system/multi-user.target.wants and /etc/systemd/system/multi-user.target.wants to other units will imply a Wants= dependency on those units - so services just have to install the proper symlinks and the dependencies will be added automatically.

The directory in /lib contains all services that shouldn't be accidentally disabled by the administrator (of course, there are still ways to explicitly do so if one really wants to, but systemctl disable won't work on them). These are mainly systemd-internal services. The directory in /etc, on the other hand, contains all units that can be disabled by the administrator (by calling systemctl disable or directly removing the symlinks).

For this use case, the systemd-internal services that should always be run will be kept, even for the new target:

mkdir /etc/systemd/system/before-decrypt.target.wants
cd /etc/systemd/system/before-decrypt.target.wants
for i in /lib/systemd/system/multi-user.target.wants/* ; do
   ln -s "/lib/systemd/system/$(basename "$i")" .
done

Also, SSH and syslog should be explicitly enabled:

cd /etc/systemd/system/before-decrypt.target.wants
ln -s /etc/systemd/system/syslog.service .
ln -s /lib/systemd/system/ssh.service .

A couple of comments:

syslog.service is taken from /etc instead of /lib, because syslog.service is an alias to the currently installed syslog daemon (there are multiple possibilities in Debian). Unfortunately, Debian currently has the packaging bug Bug #760426, so that the symlink is not updated if one changes syslog daemons (and syslog will not work properly at all). The first syslog daemon installed (typically rsyslog because it's Debian's default) will set the proper symlink, however.
Both syslog and SSH have native service files, so one can link them directly to the .wants/ directory.
One can also enable additional services if they are required for the given scenario. In this article, things like cron, atd or the default MTA are not included, because they might rely on data sitting in the encrypted partition. (It will depend on the specific setup.)
The .wants/ symlinks only work for services with native unit files. For sysvinit scripts (/etc/init.d/*) executed in compatibility mode, one needs to explicitly add them via Wants=INITSCRIPT.service (e.g. Wants=exim4.service) to before-decrypt.target. (One may wonder why systemd starts init scripts at all at boot, but that's because systemd's generator that processes the sysvinit scripts handles multi-user.target explicitly, which it won't do for any custom target.)
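A symlink in a .wants/ directory whose destination does not exist (for example if syslog.service is missing from /etc/systemd/system because of the packaging bug mentioned above) is easy to overlook. The following sketch lists such links; the function name dangling_wants is made up for this illustration.

```shell
# dangling_wants DIR
# Print every symlink in DIR whose destination does not exist and return
# non-zero if at least one was found.
dangling_wants() {
    status=0
    for link in "$1"/*; do
        # -L: the entry is a symlink; ! -e: its destination cannot be resolved
        if [ -L "$link" ] && [ ! -e "$link" ]; then
            printf 'dangling: %s -> %s\n' "$link" "$(readlink "$link")"
            status=1
        fi
    done
    return "$status"
}

# Usage:
#   dangling_wants /etc/systemd/system/before-decrypt.target.wants
```

No output and a zero exit status mean that every link in the directory resolves to an existing file.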

The system is now nearly ready for a test boot into the new target. To summarize again what has been done so far:

/etc/crypttab and /etc/fstab now have noauto for the encrypted partition
the unit file /etc/systemd/system/before-decrypt.target has been created
the directory /etc/systemd/system/before-decrypt.target.wants has been created and filled with symlinks to all the essential systemd-internal services, SSH and syslog

As a final step, one needs to make the new target the default boot target, via

systemctl set-default before-decrypt.target

Now one should be able to reboot the system. It should boot again, all early-boot initialization should have taken place properly, and one should only see SSH and syslog as services that were started. All other services should be stopped.
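One way to check the result after the reboot is to compare the running services against a list of those one expects. The sketch below reads unit names (first word of each line) from standard input and prints the ones that are not on the list given as arguments; the function name unexpected_services is made up, and which internal systemd services belong on the list depends on the installation.

```shell
# unexpected_services ALLOWED_UNIT...
# Read lines whose first word is a unit name from stdin and print every
# unit that is not among the arguments.
unexpected_services() {
    allowed=" $* "
    while read -r unit _rest; do
        [ -n "$unit" ] || continue          # ignore empty lines
        case $allowed in
            *" $unit "*) ;;                 # on the list: fine
            *) printf '%s\n' "$unit" ;;     # anything else is reported
        esac
    done
}

# Usage (the list of expected units is only an example):
#   systemctl list-units --type=service --state=running --no-legend |
#       unexpected_services ssh.service rsyslog.service dbus.service
```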

Decrypting the partition

The final step is now to provide a mechanism for decrypting the partition and starting the rest of the services. This can be achieved with an additional target decrypt.target, stored in /etc/systemd/system/decrypt.target with the following contents:

[Unit]
Description=Decrypted System
Requires=before-decrypt.target
After=before-decrypt.target
Conflicts=systemd-ask-password-console.path systemd-ask-password-console.service systemd-ask-password-plymouth.path systemd-ask-password-plymouth.service
Requires=systemd-cryptsetup@crypto.service srv.mount start-full-system.service

The interesting settings in the file are:

Conflicts=...: This is to work around a quirk in systemd's handling of password prompts for encrypted partitions. The problem is that the systemd service that asks logged-in root users for the password, systemd-ask-password-wall.service, contains a line telling systemd to stop all services that ask for the password during boot. But because systemd serializes transactions, if systemd-ask-password-wall.service is started as part of an automated transaction, things will deadlock and fail after a timeout. This Conflicts= line makes sure that those services are explicitly stopped, circumventing that problem.
Requires=systemd-cryptsetup@crypto.service: Decrypt the partition when starting this target
Requires=srv.mount: Mount the encrypted partition when starting this target
Requires=start-full-system.service: Pull in a service that causes the full system to be started

The service to cause the full system to be started, /etc/systemd/system/start-full-system.service, looks like this:

[Unit]
Description=Start full system after decryption
After=decrypt.target

[Service]
Type=oneshot
ExecStartPre=/bin/systemctl is-active --quiet decrypt.target
ExecStart=/bin/systemctl --no-block start multi-user.target

It calls systemctl to start the original multi-user.target (one could also put graphical.target here, but this is a headless system). In principle, one could also pull multi-user.target into decrypt.target directly, but that has the problem that none of the services in multi-user.target are explicitly ordered after decrypt.target, so systemd would start them immediately.

But by just having an additional service that then calls back out to systemd again, one can achieve proper ordering. Note the --no-block, which causes systemctl to immediately return and not wait for the target to be started: this is because systemd serializes transactions, and will only begin the transaction to start multi-user.target after the current transaction to start decrypt.target is done - but that can't happen if a service in the original transaction is waiting on the second one (causing a deadlock). It has the unfortunate disadvantage that the second transaction starting all the services occurs in the background and that the original transaction starting the decryption process is completed before all services are done starting.

Alternatively, one can manually add Before=all services started by multi-user.target to decrypt.target and then pull in multi-user.target directly instead of start-full-system.service.
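For that alternative, the Before= line can be generated from the .wants/ directories instead of being typed by hand. The sketch below is only an illustration (the function name before_line and its argument order are made up). It leaves out every unit that is also linked into the directory given as first argument: units already started by before-decrypt.target are ordered before decrypt.target, so listing them in Before= as well would presumably create an ordering cycle. Services that only have a sysvinit script do not appear in any .wants/ directory and are therefore not covered, and the line has to be regenerated whenever services are installed or removed.

```shell
# before_line EARLY_WANTS_DIR WANTS_DIR...
# Print one "Before=" line naming every unit found in the WANTS_DIRs,
# except for units that also appear in EARLY_WANTS_DIR.
before_line() {
    exclude=$1
    shift
    units=$(
        for dir in "$@"; do
            for entry in "$dir"/*; do
                # Skip the unexpanded pattern of an empty or missing directory.
                [ -e "$entry" ] || [ -L "$entry" ] || continue
                name=$(basename "$entry")
                # Already pulled in by the early target: leave it out.
                if [ -e "$exclude/$name" ] || [ -L "$exclude/$name" ]; then
                    continue
                fi
                printf '%s\n' "$name"
            done
        done | sort -u | tr '\n' ' '
    )
    printf 'Before=%s\n' "${units% }"
}

# Usage:
#   before_line /etc/systemd/system/before-decrypt.target.wants \
#               /lib/systemd/system/multi-user.target.wants \
#               /etc/systemd/system/multi-user.target.wants
```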

A word on target unit ordering: unless one sets DefaultDependencies=no, a target unit is always ordered After= all of its other dependencies (Requires=, Wants=, Conflicts=, ...), except for those with an explicit ordering. (This is only true for targets, not other unit types!) This means that:

systemd-cryptsetup@crypto.service and srv.mount are implicitly ordered before decrypt.target
start-full-system.service is ordered after decrypt.target, because it has an explicit ordering in there

The ordering of those four units can be visualized as:

   systemd-cryptsetup@crypto.service             srv.mount
                  |                                  |
                  \-------------------+--------------/
                                      |
                                      v
                                decrypt.target
                                      |
                                      v
                            start-full-system.service

There is no need to order srv.mount relative to systemd-cryptsetup@crypto.service since mount units by default wait for the corresponding devices to appear, so that ordering is established implicitly.

The ExecStartPre= line in start-full-system.service is to make sure that decryption has taken place before the full system has started (because of the directionality of the Requires= dependency, systemd doesn't take care of this automatically). Alternatively, instead of starting decrypt.target explicitly, one could also start start-full-system.service and declare the Requires= dependency the other way around, then ExecStartPre= would not be required.

Once these changes have been made, one can tell systemd to reload its configuration:

systemctl daemon-reload

Afterwards, one can start decrypt.target to decrypt the partition and explicitly boot the entire system:

systemctl start decrypt.target

systemctl will then ask for the encrypted partition's password. Once that has been correctly entered, the filesystem will be mounted, systemctl start will return to the command line and in the background the rest of the services will be started.

A reboot and then the same command should also work.

It might be counter-intuitive how the password prompt works, so here's an explanation:

systemd-cryptsetup@crypto.service tells systemd that it needs a password to decrypt the partition
systemd has two ways of getting passwords: either via a so-called ask-password agent (typically used at boot or if the administrator plugs in a known encrypted drive dynamically), or systemctl will ask the password directly if the device that requires the password is to be unlocked as part of the transaction that systemctl was used for

This has the nice consequence that simply telling systemd to start the target that pulls in the encrypted drive will immediately give a password prompt for that drive - and that same command will later start all the system services automatically. No need for any shell script in this case.
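From the administrator's workstation, the login and the unlock command can be combined into a single call. The sketch below assumes that root may log in via SSH and that the password prompt needs a terminal, which is why ssh is given the -t option; the function name unlock_remote and the host name in the usage line are placeholders.

```shell
# unlock_remote HOST
# Log in to HOST as root and start decrypt.target there. "-t" asks ssh to
# allocate a terminal even though a command is given, so that the password
# prompt has somewhere to appear.
unlock_remote() {
    ssh -t "root@$1" systemctl start decrypt.target
}

# Usage (host name is a placeholder):
#   unlock_remote server.example.org
```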

Implementation summary

/etc/crypttab and /etc/fstab: use noauto

/etc/systemd/system/before-decrypt.target with the following contents:

[Unit]
Description=System before Decryption
Requires=basic.target
Conflicts=rescue.service rescue.target
After=basic.target rescue.service rescue.target
AllowIsolate=yes

enable required services for before-decrypt.target:

mkdir /etc/systemd/system/before-decrypt.target.wants
cd /etc/systemd/system/before-decrypt.target.wants
for i in /lib/systemd/system/multi-user.target.wants/* ; do
   ln -s "/lib/systemd/system/$(basename "$i")" .
done
ln -s /etc/systemd/system/syslog.service .
ln -s /lib/systemd/system/ssh.service .

create /etc/systemd/system/decrypt.target with the following contents:

[Unit]
Description=Decrypted System
Requires=before-decrypt.target
After=before-decrypt.target
Conflicts=systemd-ask-password-console.path systemd-ask-password-console.service systemd-ask-password-plymouth.path systemd-ask-password-plymouth.service
Requires=systemd-cryptsetup@crypto.service srv.mount start-full-system.service

create /etc/systemd/system/start-full-system.service with the following contents:

[Unit]
Description=Start full system after decryption
After=decrypt.target

[Service]
Type=oneshot
ExecStartPre=/bin/systemctl is-active --quiet decrypt.target
ExecStart=/bin/systemctl --no-block start multi-user.target

reboot
to decrypt after logging in via SSH: systemctl start decrypt.target, then enter the password
system services will then be started automatically

Some comments about the implementation

Multiple partitions

If one has multiple encrypted partitions, one just needs to add them to the Requires= line of decrypt.target. systemd mangles mount point names a bit for unit files, but the name systemd uses can easily be determined via systemctl status /mount/point (regardless of whether there's currently something mounted): it will print the name of the mount unit for that directory.
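For simple paths the mangling just turns the slashes into dashes, which can be sketched in a few lines of shell. The function name mount_unit_name is made up, and the sketch deliberately ignores the escaping systemd applies to other characters (a dash inside a directory name, for instance, is escaped rather than kept), so for such paths one should rely on systemctl status as described above.

```shell
# mount_unit_name PATH
# Print the mount unit name for a simple absolute PATH, i.e. one whose
# components contain only letters, digits and underscores.
mount_unit_name() {
    path=${1#/}      # drop the leading slash
    path=${path%/}   # drop a trailing slash, if any
    if [ -z "$path" ]; then
        printf '%s\n' '-.mount'   # the root directory is special-cased
        return 0
    fi
    # The remaining slashes become dashes.
    printf '%s.mount\n' "$(printf '%s' "$path" | tr '/' '-')"
}

# Usage:
#   mount_unit_name /srv        # srv.mount
#   mount_unit_name /mnt/b      # mnt-b.mount
```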

System services

Any system service that's installed will be hooked up by Debian's maintainer scripts to multi-user.target, so that any system service not explicitly hooked up to before-decrypt.target will only be started when the partition has been decrypted. In case a specific service is needed also before decrypting the partition, it may be added manually to the list of services pulled in by before-decrypt.target.

dbus

When systemd is installed, it is recommended (but strictly speaking not required) to also have dbus installed. That is the only service besides internal systemd services (and some internal plymouth services that are not relevant on headless systems) that is statically enabled in /lib/systemd/system/multi-user.target.wants. If dbus is installed after performing these modifications, it is a good idea to also have it started by before-decrypt.target. If it was already installed when the modifications described in this article were performed, it will already have been picked up and no action is required.

Using LVM

Often one wants to use multiple partitions, but entering multiple passphrases can be tedious. So instead of having lots of encrypted partitions, one can also use a single encrypted partition, and use LVM to subdivide it further. It will not be discussed here how to set up LVM, but let's assume that there's a volume group vg_crypto with its only PV being the encrypted drive (/dev/mapper/crypto). The /etc/fstab entries are straightforward (just use /dev/vg_crypto/... as source device, but remember the noauto!), and the /etc/crypttab entry for the device is the same as in the rest of this guide.

However, one now also needs to take care that the volume group is activated after decryption of the partition. Ideally, one would use lvmetad to automatically activate the volume groups and everything would just work[tm]. However, Debian Jessie's lvmetad doesn't work properly, see Bug #774082. Therefore, the volume group needs to be activated manually. (The early-boot LVM scripts that do that normally cannot be used for this, unfortunately.)

Therefore, one should create a unit /etc/systemd/system/decrypt-activate-lvm.service with the following contents (replace the volume group name appropriately):

[Unit]
Description=Activate encrypted LVM VGs
After=systemd-cryptsetup@crypto.service

[Service]
Type=oneshot
ExecStart=/sbin/vgchange -ay vg_crypto

Then, edit /etc/systemd/system/decrypt.target and add the following line:

Requires=decrypt-activate-lvm.service

Then the LVM volume group will automatically be activated and all mount points can then be properly mounted.

Information leaks

Making sure that no information leaks into the unencrypted partition in this kind of setup is hard. In this section a couple of possible leaks will be discussed, but this list is far from complete. One should know what one is doing before running such a setup!

/var/lib, /var/spool

Many services store their databases in subdirectories of /var/lib (and some in /var/spool). Unfortunately, encrypting /var/lib fully won't work in this scenario, since some early-boot services might also need it. In that case, it is best to identify those services and encrypt just /var/lib/$service. bind mounts from an encrypted storage may help here (those should then also have noauto and be pulled in by decrypt.target). If one uses rsyslog and wants to start that before decryption, directories beneath /var/spool also need to be treated individually (otherwise /var/spool could be encrypted entirely).
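As an illustration of the bind-mount approach, the sketch below prints the /etc/fstab line for one service directory. The function name bind_entry and the location /srv/var-lib on the encrypted filesystem are assumptions made up for this example; the corresponding mount unit (var-lib-mysql.mount for the usage line shown) would then be added to the Requires= line of decrypt.target.

```shell
# bind_entry SERVICE [STORE]
# Print an fstab line that bind-mounts STORE/SERVICE onto /var/lib/SERVICE.
# STORE defaults to /srv/var-lib, a hypothetical directory on the
# encrypted filesystem.
bind_entry() {
    service=$1
    store=${2:-/srv/var-lib}
    # noauto: the mount must not be attempted during the regular boot.
    printf '%s/%s /var/lib/%s none bind,noauto 0 0\n' \
        "$store" "$service" "$service"
}

# Usage:
#   bind_entry mysql >> /etc/fstab
```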

swap

Swap (if used at all) should be encrypted, either by a fixed key/passphrase (according to the same logic as mount points; it's also configured in /etc/crypttab and /etc/fstab), or with a random key at each boot, in which case the crypttab entry could look something like:

cswap /dev/vda6 /dev/urandom swap,cipher=aes-xts-plain,size=256

(This is supposed to happen automatically, because no password is needed to generate a new random key.)

/tmp, /var/tmp

Temporary files are also information leaks. /tmp should either be its own encrypted partition or a tmpfs (i.e. ramdisk). systemd provides an easy way of enabling /tmp in RAM:

systemctl enable tmp.mount

(As long as there's no /tmp entry in /etc/fstab, otherwise that will take precedence.)

/var/tmp is a bit more complicated, since that is supposed to be preserved between boots. In certain scenarios this requirement may be relaxed and it could also be on a tmpfs (that would then need an explicit /etc/fstab entry), but ideally it would be its own encrypted partition.

/var/log (no persistent journal)

The problem with enabling syslog before decrypting is that /var/log already needs to be available at that point, potentially leaking information. Alternatively, /var/log could be put on an encrypted partition, and syslog simply not started before decryption. If one still needs to have access to logs, systemd's journal will provide access to them (see man:journalctl(1)).

Note that some files in /var/log are also relevant for logins and runlevel changes (specifically wtmp, btmp and lastlog). Since it's unwise to disable the usage of those files (and probably really difficult to catch all programs doing that), in that case one will have to live with the fact that once the encrypted /var/log is mounted, the files that were created there before decryption will be shadowed by the files on the encrypted partition.

Also note that all syslog messages received before decrypting the partition are forwarded by the journal to the already active syslog socket, but they won't be consumed by syslog until that has been started. The kernel buffer has a limited size, so some log messages that are generated before decrypting the partition might be dropped in syslog (they will appear in the journal, however).

/var/log (persistent journal)

If one activates systemd's persistent journal (by creating /var/log/journal), then one also needs to take care of that. During boot, systemd's journal stores a copy of the journal in RAM under /run/log/journal, and once /var is mounted during the early-boot process, systemd-journal-flush.service sends SIGUSR1 to journald to flush it to /var/log/journal. (If that exists; on Debian by default it doesn't, so journald always keeps everything in RAM and never is persistent.)

If that service is started during boot before decrypting, it will not work properly. It is pulled in by sysinit.target.wants/, so one can't easily remove the dependency. It is probably easiest to mask it and create a copy that doesn't have that dependency and hook it up to be run after decrypting the drives:

systemctl mask systemd-journal-flush.service
cp /lib/systemd/system/systemd-journal-flush.service /etc/systemd/system/systemd-journal-flush-after-decrypting.service
ln -s /etc/systemd/system/systemd-journal-flush-after-decrypting.service /etc/systemd/system/multi-user.target.wants/

Final remarks

This use case has been a good way to delve into several aspects of systemd and provide a practical example of how to customize systemd-based systems.

But as already mentioned in the beginning: a better solution to the problem itself is to encrypt the entire system and decrypt it in the initrd.

Tags: systemd, jessie, debian, luks
