
Downloading and extracting files with batou

Created by Frank Lanitz | | Blog

In our daily work with deployments, customizing a package downloaded from the internet is routine. In most cases this means downloading a tar file, extracting it and finally modifying some of its content to fit our needs.

This blog post shows how this can be done during the deployment using batou, without fiddling with the command line options of tar.

Scenario

Let's assume we need to download Wordpress, extract the tarball and add a custom file.

Bootstrapping

First we bootstrap our batou environment, as you already know from the first part of the series on deploying a Django project. As always, we make sure everything is stored in a version control system:

$ mkdir downloadtest
$ cd downloadtest
$ curl -L batou.readthedocs.io/en/latest/batou -o batou
$ chmod +x batou
$ git init
$ git add batou
$ git commit
$ ./batou

Now we can start to implement our little example. To do this we create a file components/downloadwp/component.py and open it with our favorite editor. The folder components stores all available components of a batou deployment. Best practice is to put each component into its own subfolder. A component (component.py) contains all declarations needed to set up one part of our application.

The minimal component

A truly minimal component consists of only a few lines of code:

import batou.component

class DownloadWP(batou.component.Component):
    def configure(self):
        pass

We need to define a component: a Python class that inherits from Component and has at least one method, configure(). This method defines the target state of your component. For more complicated cases you can also add an update() or verify() method. If you want to learn more, have a look at our documentation!
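The interplay of verify() and update() can be pictured with a small toy model. This is not batou's code: ToyFile, UpdateNeeded and deploy are names made up for this sketch, and it assumes that verify() signals an unmet target state by raising an exception and that update() is only run in that case.

```python
import os
import tempfile


class UpdateNeeded(Exception):
    """Raised by verify() when the target state is not met (toy name)."""


class ToyFile:
    """Toy component: ensures a file exists with the given content."""

    def __init__(self, path, content):
        self.path = path
        self.content = content

    def verify(self):
        # Compare the current state with the target state.
        if not os.path.exists(self.path):
            raise UpdateNeeded()
        with open(self.path) as f:
            if f.read() != self.content:
                raise UpdateNeeded()

    def update(self):
        # Bring the system into the target state.
        with open(self.path, "w") as f:
            f.write(self.content)


def deploy(component):
    """Run update() only if verify() reports a difference."""
    try:
        component.verify()
    except UpdateNeeded:
        component.update()
        return True   # something had to be changed
    return False      # already in the target state


workdir = tempfile.mkdtemp()
hello = ToyFile(os.path.join(workdir, "hello.txt"), "Hello world")
first_run = deploy(hello)    # file is missing, so update() runs
second_run = deploy(hello)   # nothing left to do
```

The second run changes nothing, which is the property that makes repeated deployments cheap.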

For a deployment we need a configuration for the environment. We will deploy to localhost for this example so we create a file environments/local.cfg containing the basic environmental information:

[environment]
connect_method = local

[hosts]
localhost = downloadwp

Inside the [hosts] section we find the mapping between the host (localhost) and the component assigned to that host (downloadwp). The component is written in lowercase and is identified by its class name from the component.py we just wrote.

The foundation of the deployment is done.

Downloading a file

Downloading a file is a very common task, so batou ships a component for this purpose. It's called Download and takes at least the URL to download from and a checksum of the expected file. Batou checks whether the file already exists and whether its checksum is correct, and downloads the file only if that is not the case. This saves a lot of time and traffic during a deployment.
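The "download only when needed" decision can be sketched with the standard library alone. This is an illustration of the idea, not batou's implementation: file_matches, ensure_downloaded and fake_fetch are hypothetical names, and the real HTTP download is replaced by a stand-in so the example runs offline.

```python
import hashlib
import os
import tempfile


def file_matches(path, checksum):
    """Return True if `path` exists and matches 'algorithm:hexdigest'."""
    if not os.path.exists(path):
        return False
    algorithm, expected = checksum.split(":", 1)
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected


def ensure_downloaded(path, checksum, fetch):
    """Call fetch(path) only if the file is missing or has a wrong checksum."""
    if file_matches(path, checksum):
        return False
    fetch(path)
    if not file_matches(path, checksum):
        raise ValueError("checksum mismatch after download: " + path)
    return True


payload = b"pretend this is a tarball"
expected = "sha256:" + hashlib.sha256(payload).hexdigest()
target = os.path.join(tempfile.mkdtemp(), "archive.tar.gz")


def fake_fetch(path):
    # Stand-in for the real HTTP download.
    with open(path, "wb") as f:
        f.write(payload)


first = ensure_downloaded(target, expected, fake_fetch)   # downloads
second = ensure_downloaded(target, expected, fake_fetch)  # skipped
```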

Download is part of the batou.lib.download module so you can import it via

import batou.lib.download

Once we have imported the module we can use it inside our configure() method by just adding it to our component with the += operator. The extended minimal example could now look like:

import batou.component
import batou.lib.download 

class DownloadWP(batou.component.Component): 
    def configure(self): 
        self += batou.lib.download.Download( 
            'https://wordpress.org/wordpress-4.4.2.tar.gz', 
            checksum=('sha256:c8a74c0f7cfc0d19989d235759e70' 
                      'cebd90f42aa0513bd9bc344230b0f79e08b'))

Let's run ./batou deploy local and see what happens. Rerun to see that wordpress-4.4.2.tar.gz does not get downloaded again. Now we commit the changes to the git repository.

Extract the file

We can extract files to a given target with the Extract component out of the batou.lib.archive module:

import batou.lib.archive

The Extract component only needs the path to an archive and supports the most popular archive types, such as tar.gz or zip.

Similar to Download, Extract recognizes whether anything changed: it extracts the archive only if needed and skips the job otherwise.
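One simple way to get this kind of "extract only if needed" behavior is sketched below with Python's tarfile module. How batou actually detects a change is not described here, so the stamp file and the modification-time comparison are assumptions made for this sketch, and extract_if_needed is a hypothetical helper.

```python
import os
import tarfile
import tempfile


def extract_if_needed(archive, target):
    """Extract `archive` into `target` unless it was already extracted.

    A stamp file inside `target` records the last extraction (assumption
    of this sketch). Extraction is skipped while the stamp is at least as
    new as the archive.
    """
    stamp = os.path.join(target, ".extracted")
    if (os.path.exists(stamp)
            and os.path.getmtime(stamp) >= os.path.getmtime(archive)):
        return False
    os.makedirs(target, exist_ok=True)
    # Use the safer 'data' filter where this Python version supports it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive) as tar:  # compression is auto-detected
        tar.extractall(target, **kwargs)
    with open(stamp, "w"):
        pass  # create or refresh the stamp file
    return True


# Build a tiny demo archive so the example runs without network access.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "index.php")
with open(source, "w") as f:
    f.write("<?php // demo\n")
archive = os.path.join(workdir, "demo.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    tar.add(source, arcname="wordpress/index.php")

target = os.path.join(workdir, "orig")
first = extract_if_needed(archive, target)    # extracts
second = extract_if_needed(archive, target)   # skipped
```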

Assuming we want to keep old versions as well as having different versions of the Wordpress download for different environments, we need to adjust our component a little to make it more flexible:

import batou.component
import batou.lib.archive
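import batou.lib.download

class DownloadWP(batou.component.Component):
    # Both attributes have defaults and can be overridden per environment.
    wordpress_version = batou.component.Attribute(str, 'wordpress-4.4.2')
    wordpress_checksum = batou.component.Attribute(str,
        ('sha256:'
         'c8a74c0f7cfc0d19989d235759e70cebd'
         '90f42aa0513bd9bc344230b0f79e08b'))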

    def configure(self):
        wordpress_url = "https://wordpress.org/{}.tar.gz".format(
            self.wordpress_version)
        download = batou.lib.download.Download(
            wordpress_url,
            checksum=self.wordpress_checksum)
        self += download
        self += batou.lib.archive.Extract(
            download.target,
            target="orig/{}".format(self.wordpress_version))

Now wordpress_version and wordpress_checksum can be configured by setting them inside an environment file. We will come to this later.

Let's run ./batou deploy local without any configuration change again.

To adjust the version and the corresponding checksum we add a new section to environments/local.cfg:

[component:downloadwp]
wordpress_version = wordpress-4.4.1
wordpress_checksum = sha256:4895ac8c2ee348513ead161103ae2d97d3ef3f684167e5ce602d3998370c05f4

Keep the wordpress_checksum value on a single line, then rerun your deployment. Do you see the difference?
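Why a line break inside the checksum matters can be seen with Python's standard configparser. Whether batou reads environment files with exactly this module is an assumption here, but INI-style parsers commonly treat an indented line as a continuation of the previous value, which puts a newline into the checksum. read_checksum is a hypothetical helper for this sketch.

```python
import configparser

BROKEN = """\
[component:downloadwp]
wordpress_version = wordpress-4.4.1
wordpress_checksum = sha256:4895ac8c2ee348513ead161103ae2d97d3e
      f3f684167e5ce602d3998370c05f4
"""

FIXED = """\
[component:downloadwp]
wordpress_version = wordpress-4.4.1
wordpress_checksum = sha256:4895ac8c2ee348513ead161103ae2d97d3ef3f684167e5ce602d3998370c05f4
"""


def read_checksum(text):
    """Parse an environment snippet and return the configured checksum."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser["component:downloadwp"]["wordpress_checksum"]


broken = read_checksum(BROKEN)   # contains an embedded newline
fixed = read_checksum(FIXED)     # one clean 'algorithm:hexdigest' string
```

With the value on one line, the part after the colon is the full 64-character SHA-256 digest.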

The checksum consists of two parts: the hash algorithm, e.g. sha256 or md5, and the actual value of the checksum. We can use every algorithm supported by your Python.
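A checksum string in this two-part format can be produced with Python's hashlib, for example to obtain the value for a new release. The sketch below assumes the algorithm name is one that hashlib knows; checksum_string and split_checksum are hypothetical helpers, not part of batou.

```python
import hashlib


def checksum_string(data, algorithm="sha256"):
    """Build an 'algorithm:hexdigest' string for the given bytes."""
    if algorithm not in hashlib.algorithms_available:
        raise ValueError("unsupported hash algorithm: " + algorithm)
    return "{}:{}".format(algorithm, hashlib.new(algorithm, data).hexdigest())


def split_checksum(value):
    """Split 'algorithm:hexdigest' into its two parts."""
    algorithm, _, digest = value.partition(":")
    return algorithm, digest


example = checksum_string(b"")
# -> 'sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'
algorithm, digest = split_checksum(example)
```

For a real archive, pass the file's bytes (or feed the file in chunks) instead of the empty byte string used here.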

Syncing into another folder and adding your changes

Now it's time for some customizing. We will only add a file called hello.txt, but in the real world you might want to add a database configuration or some static files at this point.

Batou has the right tools on board for this as well: the component SyncDirectory is able to rsync folders, and the component File helps you deploy all kinds of files. We can import them from batou.lib.file with

import batou.lib.file

Now we extend our configure() method with an rsync step that copies every file from the orig/wordpress-4.4.x folder to prepared/wordpress-4.4.x, and use File to add a "hello world" file to the freshly synced folder:

self += batou.lib.file.SyncDirectory(
    'prepared/{}'.format(self.wordpress_version),
    source=self.map('orig/{}'.format(self.wordpress_version)))
self += batou.lib.file.File(
    'prepared/{}/hello.txt'.format(self.wordpress_version),
    content='Hello world')

File can not only add files but also ensure symlinks:

self += batou.lib.file.File(
    'current',
    ensure='symlink',
    link_to='prepared/{}'.format(self.wordpress_version)) 

Adding this section to our deployment code creates a current symlink that always points to the current version of Wordpress. Placing it at the end of the configure() method makes the switch happen only after downloading, unpacking and templating are done. This allows for smooth updates through the deployment.
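The effect of switching such a symlink between versions can be sketched with the standard library. This is not how batou implements it: switch_symlink is a hypothetical helper, and the sketch assumes a POSIX system, where renaming a new link over the old one replaces it in a single step.

```python
import os
import tempfile


def switch_symlink(link, destination):
    """Point `link` at `destination`, replacing an existing link."""
    temporary = link + ".new"
    if os.path.lexists(temporary):
        os.remove(temporary)
    os.symlink(destination, temporary)
    # Renaming over the old link swaps the target in one step (POSIX).
    os.replace(temporary, link)


workdir = tempfile.mkdtemp()
for version in ("wordpress-4.4.1", "wordpress-4.4.2"):
    os.makedirs(os.path.join(workdir, "prepared", version))

current = os.path.join(workdir, "current")
# Relative destinations are resolved relative to the link's own folder.
switch_symlink(current, os.path.join("prepared", "wordpress-4.4.1"))
switch_symlink(current, os.path.join("prepared", "wordpress-4.4.2"))
resolved = os.readlink(current)
```

Both prepared folders stay on disk, so switching back to the older version is just another call with the other destination.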

Of course the symlink could also point to the freshly extracted folder. But if we templated a file directly into the extracted folder, batou would recognize a change and trigger a new extraction cycle. After this, the templated file might be gone, so batou would recreate it. This would result in a non-convergent deployment.

Conclusion

Putting this together, we

  • downloaded Wordpress with a given version and prefix from wordpress.org and verified that the archive matches the checksum,

  • extracted it,

  • deployed some custom data into a synced copy of the extracted folder, and

  • set a symlink to the latest deployed version.

Altogether, our component looks like this:

import batou.component
import batou.lib.archive
import batou.lib.download
import batou.lib.file

class DownloadWP(batou.component.Component):
    wordpress_version = batou.component.Attribute(str, 'wordpress-4.4.2')
    wordpress_checksum = batou.component.Attribute(str,
        ('sha256:'
        'c8a74c0f7cfc0d19989d235759e70cebd'
        '90f42aa0513bd9bc344230b0f79e08b'))

    def configure(self):
        wordpress_url = "https://wordpress.org/{}.tar.gz".format(
            self.wordpress_version)
        download = batou.lib.download.Download(
            wordpress_url,
            checksum=self.wordpress_checksum)
        self += download
        self += batou.lib.archive.Extract(
            download.target,
            target="orig/{}".format(self.wordpress_version))
        self += batou.lib.file.SyncDirectory(
            'prepared/{}'.format(self.wordpress_version),
            source=self.map(
                'orig/{}'.format(self.wordpress_version)))
        self += batou.lib.file.File(
            'prepared/{}/hello.txt'.format(self.wordpress_version),
            content='Hello world')
        self += batou.lib.file.File(
            'current',
            ensure='symlink',
            link_to='prepared/{}'.format(self.wordpress_version))