Problem
This is a pretty basic bash script (3.2 on Mac). I am downloading 584 images from a site in order to create an album.
#!/bin/bash
urlFirst="http://150.216.68.252:8080/adore-djatoka/resolver?url_ver=Z39.88-2004&rft_id=http://150.216.68.252/ncgre000/00000012/00011462/00011462_ac_0"
urlSecond=".jp2&svc_id=info:lanl-repo/svc/getRegion&svc_val_fmt=info:ofi/fmt:kev:mtx:jpeg2000&svc.format=image/jpeg&svc.level=6"
for i in {1..584}
do
printf -v j "%03d" $i
url=$urlFirst$j$urlSecond
wget $url -O $j".jpeg"
done
What improvements should be made to this?
I understand that I could remove the variables in most cases but for debug purposes it was helpful to have them, as I could add basic echo statements.
I’m not very well versed in Bash scripting though (lots of basic scripts) – any improvements or best practices from Bash I’m missing would be appreciated.
Solution
I would write it this way:
#!/bin/bash
url_template="http://150.216.68.252:8080/adore-djatoka/resolver?url_ver=Z39.88-2004&svc_id=info:lanl-repo/svc/getRegion&svc_val_fmt=info:ofi/fmt:kev:mtx:jpeg2000&svc.format=image/jpeg&svc.level=6&rft_id=http://150.216.68.252/ncgre000/00000012/00011462/00011462_ac_%04d.jp2"
for i in {1..584}; do
wget -O $(printf '%03d.jpeg' $i) "$(printf "$url_template" $i)"
done
Specifically,
- At first glance, I thought that
$urlFirst
and$urlSecond
were two URLs, until I figured out what was happening. - If there is another leading 0, then
%04d
is more appropriate than%03d
. - The web server isn’t going to care about the order of the parameters in the query string. For the benefit of humans, though, it would be nicer to put the parameter that changes either first or last.
- Fewer variables is nice. I do everything by interpolation. You can get a feel for what the
wget
command does, without tracing which variable came from where.
Nicely written. I especially like the printf -v
trick to write to a variable directly, as opposed to j=$(printf ...)
. I didn’t know this feature of printf
, very cool, so thanks for that.
Speaking of which, j
is a poor name, like most single-letter variable names. filename
would have been better, and the code would have been instantly easier to read.
ShellCheck gives a warning for the wget $url
, suggesting to double quote to prevent globbing and word splitting. For example if the URL contained a space, the script wouldn’t work. Although this is not a real concern in your example, I suggest to take the good advice and double-quote it anyway.
The double-quoting of ".jpeg"
is a bit odd. You didn’t need to quote that.