Problem
First time posting and first real Python program that I use fairly frequently. I needed a way to pull a large amount of containers, re-tag them, save them (docker save) and clean up the recently pulled containers.
I did this pretty easily with bash, though depending on the container and depending how many containers I quickly did not like how slow it was and wanted a way to do it faster.
I am looking to make any improvements possible, it works, and it works well (at least I think). I tend to always come across more Pythonic ways to do things and would love the additional feedback.
Couple things that I am currently tracking:
- I need docstrings (I know, small program, but I want to learn to do it
right) - I tossed in the
if __name__ == "__main__":
, and if I am being honest, I am not confident I did that right. - I know, I am terrible at naming functions and variables 🙁
#!/usr/bin/env python3
import sys
import os
from os import path
import concurrent.futures # https://docs.python.org/3/library/concurrent.futures.html
import docker # https://docker-py.readthedocs.io/en/stable/
cli = docker.APIClient(base_url="unix://var/run/docker.sock")
current_dir = os.getcwd()
repository = sys.argv[2]
tar_dir = os.path.join(current_dir, "move")
if path.exists(tar_dir) is not True:
os.mkdir(tar_dir)
def the_whole_shebang(image):
img_t = image.split(":")
img = img_t[0].strip()
t = img_t[1].strip()
image = f"{img}:{t}"
print(f"Pulling, retagging, saving and rmi'ing: {image}")
# Pulls the container
cli.pull(image)
# Tags the container with the new tag
cli.tag(image, f"{repository}/{img}", t)
new_image_name = f"{img}{t}.tar"
im = cli.get_image(image)
with open(os.path.join(tar_dir, new_image_name), "wb+") as f:
for chunk in im:
f.write(chunk)
# Deletes all downloaded images
cli.remove_image(image)
cli.remove_image(f"{repository}/{image}")
if __name__ == "__main__":
with concurrent.futures.ProcessPoolExecutor() as executor:
f = open(sys.argv[1], "r")
lines = f.readlines()
executor.map(the_whole_shebang, lines)
Anyways, I assume there are many things that I can do better, I would love to have any input so that I can improve and learn.
Thank you!
Solution
Global variables
These:
cli = docker.APIClient(base_url="unix://var/run/docker.sock")
current_dir = os.getcwd()
repository = sys.argv[2]
tar_dir = os.path.join(current_dir, "move")
probably shouldn’t live in the global namespace. Move them to a function, and feed them into other functions or class instances via arguments.
Command-line arguments
Use the built-in argparse
library, if not something fancier like Click
, rather than direct use of sys.argv
. It will allow nicer help generation, etc.
Boolean comparison
if path.exists(tar_dir) is not True:
should just be
if not path.exists(tar_dir):
Pathlib
Use it; it’s nice! For instance, if tar_dir
is an instance of Path
, then you can replace the above with if not tar_dir.exists()
.
It should also be preferred over manual path concatenation like f"{repository}/{img}"
; i.e.
Path(repository) / img
there are other instances that should also use pathlib
; for instance rather than manually appending .tar
, use with_suffix
.
Split unpack
img_t = image.split(":")
img = img_t[0].strip()
t = img_t[1].strip()
image = f"{img}:{t}"
can be
image, t = image.split(':')
image_filename = f'{image.strip()}:{t.strip()}'