
Bind mounts with options in /etc/fstab vs. systemd

Bind mounts under Linux can be tricky. While certain options (ro/rw, nosuid, nodev, ...) can differ between mountpoints of the same file system (set up via bind mounts), the kernel has the unfortunate property of ignoring all of them when initially establishing a bind mount. Only on remounting the given mountpoint will new options take effect.

For example, take the following command:

mount -o bind,ro /mnt/first /mnt/second

It will create a bind mount, but unless /mnt/first was read-only anyway, the mountpoint /mnt/second will be read-write. The canonical way of solving this is to first bind mount the filesystem and then set the options:

mount -o bind /mnt/first /mnt/second
mount -o remount,bind,ro /mnt/second

Note that the second command also has the bind option set; otherwise all instances of the filesystem would be remounted read-only, not just the specific mountpoint. This behavior has been documented many times.
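The two-step sequence can be wrapped in a small helper. The following is only a sketch: the function name bind_with_opts and the RUN variable are made up for illustration and are not part of any existing tool. Setting RUN=echo prints the commands instead of running them, which is handy for checking the option string without root privileges.

```shell
#!/bin/sh
# Hypothetical helper (not from any existing tool): bind mount SRC on DST,
# then apply OPTS to that single mountpoint with a second remount call.
# Set RUN=echo to print the commands instead of executing them.
bind_with_opts() {
    src=$1
    dst=$2
    opts=$3
    # Step 1: plain bind mount; extra options would be ignored at this point.
    ${RUN:-} mount -o bind "$src" "$dst" || return 1
    # Step 2: remount only this mountpoint; "bind" keeps the change local.
    if [ -n "$opts" ]; then
        ${RUN:-} mount -o "remount,bind,$opts" "$dst" || return 1
    fi
}
```

With RUN=echo set, calling bind_with_opts /mnt/first /mnt/second ro prints the same two mount commands shown above.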

Unfortunately, if you want to use /etc/fstab to create a bind mount, this behavior gets in the way. For example, if you have the following line in /etc/fstab,

/mnt/first /mnt/second none bind,ro 0 0

the path /mnt/second will not be mounted read-only after boot.

On systems running sysvinit that call mount -a, this could be mitigated by adding two lines to /etc/fstab, such as:

/mnt/first /mnt/second none bind            0 0
/mnt/first /mnt/second none bind,remount,ro 0 0

This would cause mount -a to perform the operations for both lines at boot, so after boot the filesystem would be in a read-only state. (Note, however, that when calling mount /mnt/second directly, the command would stop after matching the first line and just mount it read-write.)

On systems running systemd as init system, this workaround doesn't work. The problem is that systemd tries to be more intelligent about processing /etc/fstab, such as processing it in parallel and being able to use mounts as dependencies. This means, however, that it has to enforce that there can only be one mount on a target directory, because that uniquely identifies the entry. There has been a feature request for systemd to support bind mount options directly, but it was closed with the response that the behavior should rather be fixed in the kernel or in the mount command.

And while that may be a long-term solution for this issue, it does not help in the short term. Fortunately, there has been an interesting proposal for working around this manually, by creating a service unit that is ordered after the mount unit for the bind mount and remounts the mountpoint with the corresponding options. The unit shown there has some drawbacks, however:

  • It's not ordered before the corresponding filesystem target, in this case local-fs.target. On a typical system, reaching local-fs.target means that all local filesystems have been mounted. But here, the lack of ordering implies that the service unit might run after that target is reached, breaking the usual semantics.
  • Even worse, since DefaultDependencies=yes (implicit value), it's ordered after basic.target, which is ordered after local-fs.target, so it will always be run after that. In the scenario described there, this is not a huge issue, because the read-only state is only required after boot, once the user has logged in; for other bind mount scenarios, however, a service not seeing the mountpoint with the proper options might be a problem.
  • It's a static configuration for a single mount, and it only deals with read-only instead of all the other possible options.

But despite these drawbacks, it is a great starting point for further improvements.

Ideally, one would want to specify lines in /etc/fstab with the corresponding options and everything should just work[tm]. The good news: this is indeed possible, by using two features of systemd, templated units and generators. Generators are small programs that generate units and dependency information between units on the fly. They are run at early boot and every time systemd reloads its configuration. systemd itself already comes with a few generators. Most notably, /etc/fstab is parsed via a generator, which then creates the corresponding mount units out of it.

Templated units are units that are instantiated dynamically from a single unit file. The unit file itself is named NAME@.service (NAME being a unique name), and the instances are called NAME@INST.service, where INST can be anything (as long as it's escaped properly). Within the unit file definition itself one can reference the value after the @ sign by the special replacement characters %i and %I.
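To make the escaping concrete, here is a rough sketch of how a mountpoint path maps to an instance name. Recent systemd versions ship systemd-escape --path for this purpose; the function below (escape_path is a made-up name) is only a simplified approximation that handles slashes and dashes and deliberately ignores all other special characters.

```shell
#!/bin/sh
# Simplified approximation of systemd's path escaping (illustration only).
# Handles "/" and "-"; other special characters are NOT escaped here.
escape_path() {
    # Collapse repeated slashes and strip leading/trailing ones.
    p=$(printf '%s\n' "$1" | sed -e 's,//*,/,g' -e 's,^/,,' -e 's,/$,,')
    # The root directory is represented by a single dash.
    if [ -z "$p" ]; then
        printf '%s\n' '-'
        return 0
    fi
    # Escape literal dashes first, then turn slashes into dashes.
    printf '%s\n' "$p" | sed -e 's,-,\\x2d,g' -e 's,/,-,g'
}
```

For /mnt/second this yields mnt-second, which is the instance name used in the examples in this article.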

This provides a way to work around the limitation described above. First, a templated unit is needed, e.g. bindmount@.service:

[Unit]
Description=Remounting bind mount with proper fstab options
DefaultDependencies=no
After=%i.mount
Before=local-fs.target

[Service]
Type=oneshot
ExecStart=/bin/mount -o remount,bind /%I

Going through the settings:

DefaultDependencies=no
It should not be ordered after basic.target, so instead of relying on the default dependencies (which are good for most typical services), the unit will have to declare everything explicitly. Note that setting this also removes the default Conflicts=shutdown.target, but since this is a oneshot unit, i.e. it's stopped immediately after having completed the specified command, that conflict dependency is not needed anyway for this specific unit (but for other units without default dependencies, re-adding that shutdown conflict is advisable).
After=%i.mount
Obviously, it should be ordered after the mount itself, otherwise remounting with the proper options will fail. %i will be replaced by the template instance, which is the mountpoint in this case (to see how the template is instantiated, see below).
Before=local-fs.target
The unit should be ordered before the local filesystem target, so that the semantics that all filesystems have been set up properly after reaching that target remain intact.
Type=oneshot
oneshot is the best service type here (without RemainAfterExit) so that it's executed once when the unit is activated, but the unit immediately regresses back to the inactive state. So if for any reason one uses systemctl to stop and start the mount unit again, the settings will be applied again (since the unit is always in the stopped state and can thus be started again at any time).
ExecStart=/bin/mount -o remount,bind /%I
This command remounts the mountpoint with the options from /etc/fstab. The mount command reads the options from there automatically, so whatever they are, calling it this way will apply them. The leading slash is needed because the template instance is the mount unit name, and mount unit names don't contain a leading slash. (Technically, systemd executes all commands with / as the working directory by default, so it would work without the slash, but it's better to be explicit about it.) Also, %I (capital i) is required here instead of %i, since mount expects the path in its raw form, not escaped for systemd unit names.

The only thing that's missing now is how the templated unit is instantiated and hooked up to the mount units required, which is what the generator is for. In order to make everything work out of the box with no additional configuration, the generator should parse /etc/fstab, look for entries that contain bind mounts and add the appropriate dependencies. The latter can be done in multiple ways, the easiest in this case is via drop-ins, i.e. creating a directory MOUNTPOINT.mount.d in the generator runtime directory and dumping a configuration snippet there which contains the following:

[Unit]
Requires=bindmount@MOUNTPOINT.service

For example, for the following /etc/fstab line

/mnt/first /mnt/second none bind,ro 0 0

the generator should generate mnt-second.mount.d/bindmount.conf with the contents:

[Unit]
Requires=bindmount@mnt-second.service

The Requires=bindmount@mnt-second.service line does two things: by explicitly referencing it, this specific instance of the template will be created. Also, Requires= will pull in the remounting service as a dependency every time the mount unit is to be started (i.e. the bind mount to be mounted via systemd).
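The drop-in step on its own could look roughly like the following sketch. It assumes the output directory is the one systemd passes to a generator as its first argument and that the instance name has already been escaped; the function name write_dropin is hypothetical and this is not the code of the actual generator.

```shell
#!/bin/sh
# Sketch of the drop-in step only. OUTDIR stands for the directory systemd
# passes to a generator as its first argument; INSTANCE is the already
# escaped mountpoint (e.g. mnt-second). The function name is hypothetical.
write_dropin() {
    outdir=$1
    instance=$2
    dropin_dir="$outdir/$instance.mount.d"
    mkdir -p "$dropin_dir" || return 1
    # Pull in the remount service whenever the mount unit is started.
    cat > "$dropin_dir/bindmount.conf" <<EOF
[Unit]
Requires=bindmount@$instance.service
EOF
}
```

Calling write_dropin "$1" mnt-second from within a generator would create mnt-second.mount.d/bindmount.conf with the contents shown above.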

To summarize the logic: mnt-second.mount is to be started (either at boot implicitly, or explicitly by systemctl start). This pulls in bindmount@mnt-second.service as a dependency (via Requires= in the drop-in). Since requirement dependencies are independent of ordering dependencies in systemd, Requires= alone does not determine the order; it is the After=%i.mount line in the unit template that orders the mount unit before the service unit. So starting the mount unit (i.e. mounting the filesystem) implies that two units will be started: first the mount unit and thereafter the service unit. Once the latter has run, the mountpoint will have the correct flags set. Since the service unit is of type oneshot, it will immediately go back to the inactive state, awaiting its next activation the next time systemd wants to start the mount unit.

Actually writing the generator is non-trivial and touches a couple of internal details of systemd. Also, bind mounts can be created for network filesystems (e.g. NFS or standard filesystems on iSCSI etc.), which are mounted at a later point in time by systemd (remote-fs.target is the corresponding target). For bind mounts on network filesystems, one typically would specify the _netdev option for it to work properly (regardless of the init system), so depending on whether that option exists or not, the unit remounting the filesystem needs to have a different Before= dependency. Therefore, two unit templates are actually required, and the generator should hook them up depending on the fstab entry. And finally, the details of the contents of the services for bind mounts depend very much on the way the generator works, so it is advantageous to generate the service units from within the generator, so that there is only one thing to modify.

Note that the actual generator presented here doesn't generate a dependency if only bind is specified as an option, because then doing this is superfluous. For simplicity and transparency, both local and remote service templates are always generated.
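The selection logic described in the last two paragraphs can be sketched as follows. This is not the code of the actual generator; the function name scan_fstab and its output format (mountpoint followed by the target the remount service should be ordered before) are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: list bind mounts from an fstab-style file that need a remount,
# together with the target the remount service should be ordered before.
# The function name and output format are assumptions for illustration.
scan_fstab() {
    fstab=${1:-/etc/fstab}
    while read -r src dst _fstype opts _rest; do
        # Skip blank lines and comments.
        case $src in ''|'#'*) continue ;; esac
        # Only consider entries whose option list contains "bind".
        case ",$opts," in *,bind,*) ;; *) continue ;; esac
        # Nothing to remount if "bind" is the only option.
        if [ "$opts" = "bind" ]; then
            continue
        fi
        # _netdev marks network-backed mounts, handled via remote-fs.target.
        case ",$opts," in
            *,_netdev,*) target=remote-fs.target ;;
            *)           target=local-fs.target ;;
        esac
        printf '%s %s\n' "$dst" "$target"
    done < "$fstab"
}
```

For the example line used throughout this article, the sketch prints /mnt/second together with local-fs.target.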

Also note that technically there are now two generators that process /etc/fstab: systemd's own generator for creating the mount units from it and this generator for generating and hooking up the service units that set the proper bind mount options. This is not ideal, but one of the goals of this exercise was not to patch systemd.

Finally, there is a short time span between the initial mounting of the bind mount and the point at which the proper options get applied to it. If anything that relies on e.g. the read-only state of the mountpoint is running at that moment, its expectations are not met for this brief period of time. Also, any service monitoring mounts during that time will see two events: first that a mount is established, and shortly afterwards that its properties are altered. This cannot be avoided (it would need kernel support to do that atomically), although in most setups that want to use this, it is not a huge problem.

Using the generator

The complete generator can be downloaded here. It is available under the MIT license. Currently, it's written as a shell script, which is probably the worst way to write a generator, but has the distinct advantage that the installation is really easy, since the script can just be dropped in the right location and no compilation is required. It needs some standard utilities (such as sed and grep). It's been tested under Debian Jessie (systemd-215) and Fedora 19 (systemd-204). Just drop the generator in /usr/lib/systemd/system-generators (Fedora, SuSE, ...) or /lib/systemd/system-generators (Debian, Ubuntu, ...), call it e.g. bindmount-generator (the name doesn't matter much, as long as no existing generator is touched), make it executable, and run systemctl daemon-reload (the daemon-reload only makes sense if you already have corresponding entries in /etc/fstab). With the example entry of

/mnt/first /mnt/second none bind,ro 0 0

after calling systemctl daemon-reload one should be able to see the effect of the generator in the dependencies:

$ systemctl show -p Before /mnt/second
Before=umount.target local-fs.target bindmount-local@mnt-second.service

If you try to unmount the mountpoint and then mount it again via systemd, it should be read-only:

$ systemctl restart /mnt/second
$ mount | grep /mnt/second
/dev/something on /mnt/second (ro,...)

Note that manually mounting it via mount will not make it read-only, but that was already the case with the sysvinit hack (see above).
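To check the result from a script rather than by eye, something along these lines could be used. It is a sketch under the assumption that the mount table is read from /proc/self/mounts (or a file in the same format); the function name mount_has_opt is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: does MOUNTPOINT currently carry option OPT?
# Reads a file in /proc/self/mounts format (device, mountpoint, type,
# options, ...); the last matching line is taken as the active mount.
mount_has_opt() {
    mp=$1
    opt=$2
    table=${3:-/proc/self/mounts}
    opts=$(awk -v mp="$mp" '$2 == mp { o = $4 } END { print o }' "$table")
    case ",$opts," in
        *,"$opt",*) return 0 ;;
        *)          return 1 ;;
    esac
}
```

For example, mount_has_opt /mnt/second ro returns success only if the mountpoint is listed with the ro flag.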

After that, try to reboot the machine and you will see that options besides bind in /etc/fstab for these types of mounts will have been applied. And voilà - bind mount options in fstab now work.

Caution: some options can only be set filesystem-wide and not on a per-mountpoint basis (such as relatime/noatime/...). Others work on a per-mountpoint basis, such as ro, nosuid, nodev, etc. If an option can't be set for a mountpoint, the mountpoint will still be mounted, but none of the options specified in /etc/fstab will have been applied to it, although systemctl --failed will show which bindmount-{local,remote}@.service instances have failed. To check whether an option would work, try to set it manually first via mount -o remount,bind,OPTION /mountpoint; if that works, it will also work in /etc/fstab.
