
Managing virtualenvs with dynamic libraries on NixOS (Part 2)

Written by Maximilian Bosch | Blog

The previous part of this mini-series demonstrated the pitfalls of the common LD_LIBRARY_PATH approach and introduced an alternative implementation. This article covers how we automated that solution in our application deployments.

A brief introduction to batou

At the Flying Circus, we use our in-house deployment tool batou to manage application deployments. If you're curious, you may be interested in the introduction that Christian Theune gave at FrOSCon 2022 (German).

To make the rest of this article understandable, I'll explain the parts of batou that are relevant here. Please note that this overview is not exhaustive and does not necessarily reflect the vision the authors of batou had.

  • Everything is written in Python.
  • The tooling follows a convergent model.
  • A deployment is split into multiple components (across multiple hosts technically, but that's not relevant for the rest).
  • Each component can have sub-components. These are added with the += operator.
  • Each component can have a verify() method that checks if the state of the component is outdated.
  • Each component can have an update() method that mutates the system if verify() determined the component's state doesn't match the system's state.

Let's give a short example:

from batou.component import Component, Attribute


class File(Component):
    namevar = "path"
    content = Attribute(str)

    def verify(self):
        # Check if a file under `self.path` exists with content
        # as in `self.content`
        pass

    def update(self):
        # Write `self.content` to `self.path`
        pass


class AddFiles(Component):
    def configure(self):
        # Components can have sub-components
        self += File("/etc/local/nixos/foo.nix", content="…")
        self += File("/etc/local/nixos/bar.nix", content="…")

When we deploy the component AddFiles to a host, it will have the files /etc/local/nixos/foo.nix and /etc/local/nixos/bar.nix with the specified content after that.
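
To make the verify()/update() contract more tangible, here is a minimal toy model of the convergent idea in plain Python. It does not use batou; the UpdateNeeded exception, the converge() helper and this File class are names made up for the sketch and only illustrate the principle that update() runs solely when verify() reports a mismatch.

```python
import tempfile
from pathlib import Path


class UpdateNeeded(Exception):
    """Raised by verify() when the system does not match the desired state."""


class File:
    """Toy stand-in for a file component (hypothetical, not batou's File)."""

    def __init__(self, path, content):
        self.path = Path(path)
        self.content = content

    def verify(self):
        # Outdated if the file is missing or its content differs.
        if not self.path.exists() or self.path.read_text() != self.content:
            raise UpdateNeeded()

    def update(self):
        # Mutate the system so that it matches the desired state.
        self.path.write_text(self.content)


def converge(component):
    """Call update() only if verify() reports a mismatch.

    Returns True if the component had to be updated.
    """
    try:
        component.verify()
    except UpdateNeeded:
        component.update()
        return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    foo = File(Path(tmp) / "foo.nix", content="{ }")
    first = converge(foo)   # file is missing, so update() runs
    second = converge(foo)  # state already matches, nothing happens
    print(first, second)
```

Running the same deployment twice is therefore harmless: the second run finds nothing to change.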

The solution from the previous part is now available as such a component: it is called batou_ext.python.FixELFRunPath and lives in batou_ext, our collection of reusable batou components.

Deploying Python applications

Let's take a look at an example component. The snippets below are all parts of the configure method:

self += File("requirements.txt", content="numpy")
self += VirtualEnvRequirements(version="3.11", requirements_path=self._.path)

Each component has its own workdir in the HOME directory of the deployment user. In this workdir we create a requirements.txt that requests numpy and install a virtualenv with this library.

Without anything else, importing numpy would fail since loading libz.so would fail.

Now, we need to install the dependencies, zlib and libgcc.lib (for libgcc_s.so) into the user's profile:

self += UserEnv(
    "numpy",
    packages=["libgcc.lib", "zlib"],
    channel="https://releases.nixos.org/nixos/24.05/nixos-24.05.4997.086b448a5d54/nixexprs.tar.xz"
)

And finally, we run patchelf over the virtualenv installation:

self += FixELFRunPath(
    path=self.map("."),
    env_directory=os.path.expanduser("~/.nix-profile/lib"),
)

The self.map(".") gives us the absolute path to the component's workdir in which we've installed the virtualenv before. The env_directory is the directory which contains the shared libraries that are required by the Python dependencies in the virtualenv.
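
As a rough sketch of what such a step involves (this is not the code from batou_ext): walk the prefix, pick out ELF files and build a patchelf call that keeps the existing search path and appends the environment directory. The helper names are made up for this illustration. Only the patchelf options --set-rpath and --force-rpath are real; whether FixELFRunPath uses exactly these is an assumption. The commands are only constructed here, not executed.

```python
import os

ELF_MAGIC = b"\x7fELF"


def find_elf_files(root):
    """Yield regular files below root that start with the ELF magic bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as f:
                if f.read(4) == ELF_MAGIC:
                    yield path


def extend_rpath(current_rpath, env_directory):
    """Append env_directory to a colon-separated rpath, avoiding duplicates."""
    entries = [entry for entry in current_rpath.split(":") if entry]
    if env_directory not in entries:
        entries.append(env_directory)
    return ":".join(entries)


def patchelf_command(path, current_rpath, env_directory):
    """Build (but do not run) a patchelf call that writes DT_RPATH."""
    new_rpath = extend_rpath(current_rpath, env_directory)
    return ["patchelf", "--force-rpath", "--set-rpath", new_rpath, path]


# Hypothetical file name and profile path, for illustration only.
print(patchelf_command(
    "example.so", "$ORIGIN/../../numpy.libs", "/srv/s-myservice/.nix-profile/lib"
))
```

In a real run, the current search path of each file would first be read (for example with patchelf --print-rpath) and the resulting command executed with subprocess.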

For the sake of completeness, here's the code for the entire component:

import os.path

from batou.component import Component
from batou.lib.file import File
from batou_ext.nix import UserEnv
from batou_ext.python import FixELFRunPath, VirtualEnvRequirements


class VirtualEnvInstallation(Component):
    def configure(self):
        self += File("requirements.txt", content="numpy")
        # Keep a reference to the File: `self._` points to the most
        # recently added sub-component, which changes with the next `+=`.
        requirements = self._
        self += UserEnv(
            "numpy",
            packages=["libgcc.lib", "zlib"],
            channel="https://releases.nixos.org/nixos/24.05/nixos-24.05.4997.086b448a5d54/nixexprs.tar.xz"
        )
        self += VirtualEnvRequirements(
            version="3.11", requirements_path=requirements.path
        )
        self += FixELFRunPath(
            path=self.map("."),
            env_directory=os.path.expanduser("~/.nix-profile/lib")
        )

With this component deployed, let's take a look at one of the dynamic libraries in numpy:

$ readelf -d ./lib/python3.11/site-packages/numpy/_core/_multiarray_umath.cpython-311-x86_64-linux-gnu.so
Dynamic section at offset 0x9f2000 contains 30 entries:
  Tag        Type                         Name/Value
 0x000000000000000f (RPATH)              Library rpath: [$ORIGIN/../../numpy.libs:/srv/s-myservice/.nix-profile/lib/]
 0x0000000000000001 (NEEDED)             Shared library: [libscipy_openblas64_-ff651d7f.so]
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [ld-linux-x86-64.so.2]
[…]

The DT_RPATH entry is the effect of FixELFRunPath: it contains $ORIGIN/../../numpy.libs (another Python package) as the search path entry for libscipy_openblas64 and, as mentioned above, the Nix profile for libgcc_s.so.1.
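
To illustrate how these entries turn into actual directories, the following sketch expands $ORIGIN relative to the directory that contains the library. The /venv prefix and the shortened file name are placeholders, and the assumption is that the extension module lives in site-packages/numpy/_core/. The dynamic linker does this expansion itself; the helper below merely mimics it lexically and ignores symlinks.

```python
import posixpath


def resolve_rpath(library_path, rpath):
    """Expand $ORIGIN in each rpath entry and normalize the result."""
    origin = posixpath.dirname(library_path)
    resolved = []
    for entry in rpath.split(":"):
        if not entry:
            continue
        expanded = entry.replace("${ORIGIN}", origin).replace("$ORIGIN", origin)
        resolved.append(posixpath.normpath(expanded))
    return resolved


# Placeholder location of the extension module inside a virtualenv.
library = "/venv/lib/python3.11/site-packages/numpy/_core/_multiarray_umath.so"
rpath = "$ORIGIN/../../numpy.libs:/srv/s-myservice/.nix-profile/lib/"

for directory in resolve_rpath(library, rpath):
    print(directory)
```

The first entry ends up in the numpy.libs directory next to the numpy package, the second one in the lib directory of the Nix profile.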

If you're curious, the code can be found in src/batou_ext/python.py in batou_ext.

This code is used in several deployments of customer applications written in Python. It improved the situation compared to the LD_LIBRARY_PATH approach described in the previous part.

Caveats

Even though we fixed a bunch of Python installations with it and improved the overall situation, it's still a hack. As discussed in the previous part, it uses the technically deprecated DT_RPATH field, and it has a few additional downsides:

  • The dependencies that are actually needed must still be found by hand and then mapped to package names from nixpkgs.
  • When a library in nixpkgs gets e.g. a security update, all dependent packages are rebuilt. This is necessary since programs from nixpkgs only use libraries from other store paths, not from a global prefix such as /usr/lib. Since store paths are unique, a package must be rebuilt when one of its dependencies changes so that it gets a new store path that links to the store path of the patched library.

    To give an example: after an openssl security patch is applied, all packages in nixpkgs that depend on openssl get rebuilt. You only have to make sure that nixpkgs is up to date, which is taken care of by our regular platform releases.

    This is not the case for the local environment though: one has to manually update the channel of the environment and roll out the update by hand.

    To be fair, this issue also existed with the old LD_LIBRARY_PATH approach.

  • After each change to the virtualenv this component must be re-applied since it simply searches for dynamic libraries in the prefix.

    On security updates, the virtualenv must be explicitly re-deployed.
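
Regarding the first point, finding the needed libraries by hand can at least be made a little less tedious. The sketch below extracts the NEEDED entries from readelf -d output such as the one shown earlier; the helper name is made up, and mapping the resulting names to nixpkgs packages remains a manual step.

```python
import re

# Matches lines such as:
#  0x0000000000000001 (NEEDED)  Shared library: [libm.so.6]
NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library: \[(.+?)\]")


def needed_libraries(readelf_output):
    """Return the shared library names listed as NEEDED, in order."""
    return NEEDED_RE.findall(readelf_output)


sample = """\
 0x000000000000000f (RPATH)              Library rpath: [$ORIGIN/../../numpy.libs]
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
"""

for name in needed_libraries(sample):
    print(name)
```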

And in addition to that, there are a few pitfalls one must avoid:

  • Even though it's tempting, glibc must not be placed into such an environment. Another glibc has already been loaded into memory by the time the Python interpreter starts, and this can lead to interesting mixups of different versions at runtime, which are almost guaranteed to cause a crash at an inconvenient time.

    To give an example, one case I hit during development was when I used an older Nix profile with a glibc older than 2.34, where some of the core components were still split across multiple dynamic libraries. As a result, the process tried to load another, older libpthread.

  • When installing Python packages like psycopg2 that build their C code on install, it's important to add gcc to the environment. That way, the build already uses an up-to-date glibc.
  • A lot of packages also need libgcc_s.so which should be taken from libgcc.lib. The .lib references a package output here. Nix packages can define multiple outputs to separate e.g. man pages, executables and libraries.

    However, there's a small trap that's easy to miss: the Nix environments are put together using a function called buildEnv.

    One of the attributes passed to this function is called extraOutputsToInstall. If it doesn't contain lib, libgcc.lib won't be installed, only libgcc.out.

    Installing libgcc.out is also a problem since it contains gcc, but the wrong one! When you install a gcc on NixOS, it's always a shell script that sets a bunch of compiler flags, such as hardening flags and flags to find the correct glibc. The unwrapped gcc doesn't do that, and builds using it fail with errors like this when building the dynamic libraries of psycopg2:

    /nix/store/01bi48ga8vzbz4zgh7g3bsl620cj27kj-binutils-2.40/bin/ld: cannot find crti.o: No such file or directory

    The correct GCC is the one with gcc-wrapper in the derivation name.
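
The glibc pitfall from the first point can be guarded against with a simple check of the environment's lib directory. The following is only a sketch: the helper name is made up, and the set of file names is an assumption that covers common glibc libraries on x86_64 and is not exhaustive.

```python
import os
import tempfile

# Assumed, non-exhaustive list of libraries that belong to glibc.
GLIBC_SONAMES = {
    "libc.so.6",
    "libpthread.so.0",
    "libdl.so.2",
    "libm.so.6",
    "librt.so.1",
    "ld-linux-x86-64.so.2",
}


def glibc_libraries_in(env_directory):
    """Return the glibc libraries found directly in env_directory."""
    return sorted(
        name for name in os.listdir(env_directory) if name in GLIBC_SONAMES
    )


# Demonstration with a temporary directory standing in for ~/.nix-profile/lib.
with tempfile.TemporaryDirectory() as env:
    for name in ("libz.so.1", "libgcc_s.so.1", "libpthread.so.0"):
        open(os.path.join(env, name), "wb").close()
    offenders = glibc_libraries_in(env)
    print(offenders)
```

A non-empty result would indicate that a glibc ended up in the environment and that the package list should be revisited.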

So while the approach makes things easier for us, there's still room for improvement. Thoughts on that will be shared in the final part of this mini-series.

 
