Pull, Tag, Save, Delete Multiple Docker Containers

Posted on

Problem

First time posting and first real Python program that I use fairly frequently. I needed a way to pull a large amount of containers, re-tag them, save them (docker save) and clean up the recently pulled containers.

I did this pretty easily with bash, though depending on the container and depending how many containers I quickly did not like how slow it was and wanted a way to do it faster.

I am looking to make any improvements possible, it works, and it works well (at least I think). I tend to always come across more Pythonic ways to do things and would love the additional feedback.

Couple things that I am currently tracking:

  • I need docstrings (I know, small program, but I want to learn to do it
    right)
  • I tossed in the if __name__ == "__main__":, and if I am being honest, I am not confident I did that right.
  • I know, I am terrible at naming functions and variables 🙁
#!/usr/bin/env python3

import sys
import os
from os import path
import concurrent.futures  # https://docs.python.org/3/library/concurrent.futures.html
import docker  # https://docker-py.readthedocs.io/en/stable/

cli = docker.APIClient(base_url="unix://var/run/docker.sock")
current_dir = os.getcwd()
repository = sys.argv[2]
tar_dir = os.path.join(current_dir, "move")


if path.exists(tar_dir) is not True:
    os.mkdir(tar_dir)


def the_whole_shebang(image):
    img_t = image.split(":")
    img = img_t[0].strip()
    t = img_t[1].strip()
    image = f"{img}:{t}"

    print(f"Pulling, retagging, saving and rmi'ing: {image}")
    # Pulls the container
    cli.pull(image)
    # Tags the container with the new tag
    cli.tag(image, f"{repository}/{img}", t)

    new_image_name = f"{img}{t}.tar"
    im = cli.get_image(image)
    with open(os.path.join(tar_dir, new_image_name), "wb+") as f:
        for chunk in im:
            f.write(chunk)

    # Deletes all downloaded images
    cli.remove_image(image)
    cli.remove_image(f"{repository}/{image}")

if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor() as executor:
        f = open(sys.argv[1], "r")
        lines = f.readlines()
        executor.map(the_whole_shebang, lines)

Anyways, I assume there are many things that I can do better, I would love to have any input so that I can improve and learn.

Thank you!

Solution

Global variables

These:

cli = docker.APIClient(base_url="unix://var/run/docker.sock")
current_dir = os.getcwd()
repository = sys.argv[2]
tar_dir = os.path.join(current_dir, "move")

probably shouldn’t live in the global namespace. Move them to a function, and feed them into other functions or class instances via arguments.

Command-line arguments

Use the built-in argparse library, if not something fancier like Click, rather than direct use of sys.argv. It will allow nicer help generation, etc.

Boolean comparison

if path.exists(tar_dir) is not True:

should just be

if not path.exists(tar_dir):

Pathlib

Use it; it’s nice! For instance, if tar_dir is an instance of Path, then you can replace the above with if not tar_dir.exists().

It should also be preferred over manual path concatenation like f"{repository}/{img}"; i.e.

Path(repository) / img

there are other instances that should also use pathlib; for instance rather than manually appending .tar, use with_suffix.

Split unpack

img_t = image.split(":")
img = img_t[0].strip()
t = img_t[1].strip()
image = f"{img}:{t}"

can be

image, t = image.split(':')
image_filename = f'{image.strip()}:{t.strip()}'

Leave a Reply

Your email address will not be published. Required fields are marked *