Problem
I wrote the following code for the question I asked here. Are there any catastrophic mistakes? (actually it did not really work with perltex and I am in doubt about having syntax mistake(s))
Also, I read it about lists
, hashs
(still list vs hash
is a huge question mark in my mind) and forearch
to improve my script. I am kinda confused about the best way to combine them (actually my attempt define variables with a list is failed for now). I am sure, I will find my way, but it will be nicer quicker jump with some help. You can see how I repeat same lines again and again.
#!/usr/bin/perl
#
# Strict and warnings are recommended.
use strict;
use warnings;
no warnings "uninitialized"; # Don't warn about unitialized variables#
use Term::ANSIColor; #Some fun
# use 5.010;
# print colored ['bold blue'], "Hey I am a Perl,n";
# print "I can find all the informations you need for a specific reference from regular";
# print colored ['bold blue'], " bibtex ";
# print "filen";
my $in = 0; #logical to check am I in a good line or not
my $refPattern='Wang2013'; #which reference I am trying to read
# my $refPattern='test2013';
# my $refPattern='',"$_[0]",''; #which reference I am trying to read
my $filename = '/home/solak/WORKDIR/ARTICLE/allReferences.bib'; #bibtex file storing informations
foreach (@ARGV) {
$refPattern = $_;
};
# print "My variable $refPatternn";
#possible to improve with list or hash, find them with forearch etc. (did not work yet)
my $abstract ; #lesson:perl scope has crazy/sensible way
my $author ; #lesson:perl scope has crazy/sensible way
my $doi ; #lesson:perl scope has crazy/sensible way
my $issn ; #lesson:perl scope has crazy/sensible way
my $journal ; #lesson:perl scope has crazy/sensible way
my $keywords ; #lesson:perl scope has crazy/sensible way
my $localfile ; #lesson:perl scope has crazy/sensible way
my $note ; #lesson:perl scope has crazy/sensible way
my $number ; #lesson:perl scope has crazy/sensible way
my $pages ; #lesson:perl scope has crazy/sensible way
my $title ; #lesson:perl scope has crazy/sensible way
my $url ; #lesson:perl scope has crazy/sensible way
my $volume ; #lesson:perl scope has crazy/sensible way
my $year ; #lesson:perl scope has crazy/sensible way
# my %hash = ( 'abstract','author','journal','keywords','localfile','note','title','url');
my $line; #raw line
my $field; #field name
my $content_temp; #raw content, without trims
my $content ; #content for all fields, with trims
open(my $bibfile, '<:encoding(UTF-8)', $filename)
or die "Could not open file '$filename' $!";
while (<$bibfile>) { #remember perl default variable is in use, $_
$in = 1 if /$refPattern/i;
last if (/^$/ && $in == 1); #break the while on the first following empty line after $refPattern
if ($in == 1) {
$line = $_ if($in);
$field = substr($line, 1, index($line, ' = ')); #it finds bibtex vartype
$field =~ s/^s+|s+$//g; #trim both ends, may be not necessary, do not know how to check if there are spaces around the $field
print "$fieldn";
$line =~ s/^s*S+s*//; #split field and content, field is the first word always
$content_temp = $line;
$content_temp =~ s/^s+|s+$//g; #trim both ends
$content_temp =~ s/^s*S+s*//; #like split field and content, to delete = on the beginning
$content_temp =~ s/,z//; #trim last character "," if exist
next if (length($content_temp) < 1); #Check if you really read something
$content = substr($content_temp, 1, -1); #If you know the first and last character are quotes, use it, no need to regec
# print "$contentn";
$abstract = $content if ($field eq 'abstract');
$author = $content if ($field eq 'author');
$doi = $content if ($field eq 'doi');
$issn = $content if ($field eq 'issn');
$journal = $content if ($field eq 'journal');
$keywords = $content if ($field eq 'keywords');
$localfile = $content if ($field eq 'localfile');
$note = $content if ($field eq 'note');
$number = $content if ($field eq 'number');
$pages = $content if ($field eq 'pages');
$title = $content if ($field eq 'title');
$url = $content if ($field eq 'url');
$volume = $content if ($field eq 'volume');
$year = $content if ($field eq 'year');
# print "$titlen";
}
}
# print "$journaln";
# print "$contentn";
# print "$titlen";
my $filename = 'bibREF.tex';
open(my $fh, '>', $filename) or die "Could not open file '$filename' $!";
print $fh '%HEYY This is generated by perl.pl, do not touch it, if you need changes go to mother file!!! ',"n";
print $fh 'documentclass[12pt]{standalone} ',"n";
print $fh 'usepackage{amsmath} ',"n";
print $fh 'usepackage{amssymb} ',"n";
print $fh 'usepackage{tikz} ',"n";
print $fh 'usepackage{gnuplot-lua-tikz} ',"n";
print $fh 'usepackage[shell]{gnuplottex} ',"n";
print $fh 'usepackage[]{tcolorbox} ',"n";
print $fh 'usetikzlibrary{backgrounds,calc,positioning} ',"n";
print $fh 'thispagestyle{empty} ',"n";
print $fh ' ',"n";
print $fh 'begin{document} ',"n";
print $fh ' ',"n";
print $fh 'begin{tikzpicture}[inner sep=0pt] ',"n";
print $fh '% block/.style={draw,fill=white,rectangle, minimum width={width("Magnetometer")+2pt},font=small},',"n";
print $fh '% blockl/.style={draw,fill=white,rectangle, minimum width={400pt},font=large}]',"n";
print $fh 'tikzstyle{styBIBR} = [draw,fill=black!60,minimum height={30pt},rectangle,text width=5.5cm ,text centered,font=bf,text=white,font=huge];',"n";
print $fh 'tikzstyle{styYEAR} = [draw,fill=blue!60,minimum height={20pt},rectangle,text width=5.5cm ,text centered,font=bf,text=white];',"n";
print $fh 'tikzstyle{styTITLE} = [draw,,rectangle,minimum height={60pt},text width=11.5cm, ,font=bf, font=Large];',"n";
print $fh 'tikzstyle{styATHR} = [draw,,rectangle,minimum height={30pt},text width=17cm, , font=large];',"n";
print $fh 'tikzstyle{styJRNL} = [draw,,minimum height={30pt},rectangle ,text width=5.5cm ,text centered,text=black,font=small];',"n";
print $fh 'tikzstyle{styKWRD} = [draw,rectangle,minimum height={30pt},text width=17cm, ,font=bf, font=small];',"n";
print $fh 'tikzstyle{styLCLF} = [draw,rectangle,minimum height={10pt},text width=17cm, text=blue, font=footnotesize]; %local file',"n";
print $fh 'tikzstyle{styURL} = [draw,fill=blue!20,rectangle,minimum height={20pt},text width=11.5cm, ,font=bf, font=Large]; %local file',"n";
print $fh 'tikzstyle{styNOTES} = [draw,rectangle,minimum height={380pt},text width=17cm,];',"n";
print $fh 'tikzstyle{styTODO} = [draw,rectangle,minimum height={110pt},text width=17cm];',"n";
print $fh 'tikzstyle{styCTD} = [draw,rectangle,minimum height={90pt},text width=8.5cm];',"n";
print $fh 'draw [draw,use as bounding box] (0cm,0cm) rectangle (17cm, 25cm);',"n";
print $fh ' ',"n";
print $fh ' ',"n";
print $fh ' ',"n";
print $fh '%%%%%%%%%%%%%%% ',"n";
print $fh '%top group ',"n";
print $fh '%%%%%%%%%%%%%%% ',"n";
print $fh ' node[styTITLE,left=0pt of current bounding box.north west, ,anchor=north west](nodeTITLE) {',"$title",'}; ',"n";
print $fh ' node[styATHR,left=0pt of nodeTITLE.south west, anchor=north west](nodeATHR) {',"$refPattern",'}; ',"n";
print $fh ' node[styKWRD,left=0pt of nodeATHR.south, anchor=north](nodeKWRD) {Keywords: ',"$author",'}; ',"n";
print $fh ' node[styBIBR,left=0pt of nodeTITLE.north east, anchor=north west] (nodeBIBR) {',"$refPattern",'}; ',"n";
print $fh 'node[styJRNL,left=0pt of nodeBIBR.south, anchor=north] (nodeJRNL) {',"$journal",'}; ',"n";
print $fh ' ',"n";
print $fh ' ',"n";
print $fh '%%%%%%%%%%%%%%% ',"n";
print $fh '%bottom group ',"n";
print $fh '%%%%%%%%%%%%%%% ',"n";
print $fh '%local file link ',"n";
print $fh 'node[styLCLF,left=0pt of current bounding box.south west,anchor=south west](nodeLCLF) {localfile link: ',"$refPattern",'}; ',"n";
print $fh 'node[styCTD,left=0pt of nodeLCLF.north west,anchor=south west](nodeCTD) {textbf{Cited:} ',"$refPattern",' }; ',"n";
print $fh 'node[styCTD,left=0pt of nodeLCLF.north east,anchor=south east](nodeCTDBY) {textbf{Cited by:} ',"$refPattern",'}; ',"n";
print $fh 'node[styTODO,left=0pt of nodeCTD.north west,anchor=south west](nodeCTD) {textbf{TODO:} ',"$refPattern",' }; ',"n";
print $fh ' ',"n";
print $fh '%%%%%%%%%%%%%%% ',"n";
print $fh '%mid group ',"n";
print $fh '%%%%%%%%%%%%%%% ',"n";
print $fh ' node[styNOTES,below=0.pt of nodeKWRD](NOTES) {NOTES:}; ',"n";
print $fh ' ',"n";
print $fh 'end{tikzpicture} ',"n";
print $fh ' ',"n";
print $fh 'end{document} ',"n";
close $fh;
print "donen";
Solution
There are some nice things about this script: It seems to work, you’re using strict
, and how you open your files is very good (you even consider encoding!).
Unfortunately, it is extremely difficult to follow. Reasons for this are:
-
This program has experienced “organic growth”. There are unnecessary
print
commands for debugging everywhere, and commented out code has not been removed. That will be the first thing we’ll get rid of. -
You are not very experienced with Perl. Your logic is encoded in a roundabout way, and you are not aware of techniques for better abstraction and simplification.
-
You seem to love fixed-width lines, without any need. Sometimes it may be useful to vertically align related items, but you take this to absurd lengths.
Let’s start with your while (<$bibfile>)
loop. The control flow in this loop is:
for each line:
set $in if a pattern matches
break if we encounter an empty line
if $in is set:
set $line to current line if $in is set
parse the $line
next iteration if $content is empty
set the correct variable to the $content
The complete second part of the loop body is an if
. There is no else. This can be replaced by going to the next iteration if the if-condition is false. Both have the same effect of skipping over the rest of the loop body. Also, the line $line = $_ if $in
is unnecessary once we already know that $in == 1
.
We can get rid of the $in
variable, by splitting the while
loop into two loops. The first loop only searches for the marker pattern. The second loop does the parsing.
We can also get rid of the $foo = $content if $field eq "foo"
copy&paste-madness. You correctly noted that a hash makes sense here. We do not have to declare the fields of a hash up front. Instead, we declare an empty hash like my %hash
, and fill in fields like $hash{$field} = $content
.
Your parsing code can be drastically simplified by building a larger regex. Your syntax seems to be $field = "$content",
. This can be expressed by a regex with two capture groups:
/A s* (w+) s*[=]*s "(.+)" [,]? s*z/x
If that regex matches, the field name is in $1
and the contents in $2
.
Putting all of this together, we can rewrite your loop as
my $marker = qr/$rePattern/i;
my %fields;
# find the marker
while (<$bibfile>) {
if (/$marker/) {
parse_line($_, %fields);
last;
}
}
while (my $line = <$bibfile>) {
last if not $line =~ /S/;
parse_line($line, %fields);
}
sub parse_line {
my ($line, $fields) = @_;
return if not $line =~ /A s* (w+) s*[=]*s "(.+)" [,]? s*z/x;
my ($field, $contents) = ($1, $2);
$fields->{$field} = $contents;
return;
}
In the above code I did two things that may be new to you:
-
I declared a subroutine called
parse_line
, which conveniently encapsulates all the line parsing logic. This is good because the first line is parsed separately from the other lines, and I didn’t want to repeat all the code. Theparse_line
subroutine takes two arguments: The line and the fields. -
While Perl subroutines cannot take a hash as argument, they can take a reference to a hash as argument. This is similar to pointers in C, except that Perl uses the
backslash operator to get a reference to something. Because we are dealing with a reference and not with a hash, we don’t access the field as
$hash{$field}
but as$hashref->{$field}
.
Now since we store all the fields in a hash, we don’t have variables like $title
any more. This is no problem, since we can substitute that with $fields{title}
in the remainder of your code.
The rest of your code is just a template. However, you print each line out separately, and explicitly add newlines. If you use 5.010
or use feature 'say'
, we unlock the say
function that behaves exactly like print
, but adds a newline to the end of its arguments. In this case, I’d prefer to underline the template character of this code instead, by defining a mini template language: s/[$][{](w+)[}]/$fields{$1}/eg
. That substitution replaces any word in braces preceded by a dollar sign by the contents of the field of the same name in %fields
.
The template itself can be stored in the __DATA__
section of the script. Everything after the __DATA__
marker is not interpreted as Perl code, but is available via the special file handle DATA
.
Implementing that might look like this:
# fill additional fields
$fields{refPattern} = $refPattern;
# process template
my $template = join '', <DATA>;
$template =~ s/[$][{](w+)[}]/$fields{$1}/eg;
# output result
open my $fh, '>', $filename or die "Could not open file '$filename': $!";
print $fh $template;
__DATA__
%HEY This is generated by perl.pl, do not touch it, if you need changes go to mother file!
documentclass[12pt]{standalone}
usepackage{amsmath}
usepackage{amssymb}
usepackage{tikz}
usepackage{gnuplot-lua-tikz}
usepackage[shell]{gnuplottex}
usepackage[]{tcolorbox}
usetikzlibrary{backgrounds,calc,positioning}
thispagestyle{empty}
begin{document}
begin{tikzpicture}[inner sep=0pt]
tikzstyle{styBIBR} = [draw,fill=black!60,minimum height={30pt},rectangle,text width=5.5cm,text centered,font=bf,text=white,font=huge];
tikzstyle{styYEAR} = [draw,fill=blue!60,minimum height={20pt},rectangle,text width=5.5cm,text centered,font=bf,text=white];
tikzstyle{styTITLE}= [draw,,rectangle,minimum height={60pt},text width=11.5cm,,font=bf, font=Large];
tikzstyle{styATHR} = [draw,,rectangle,minimum height={30pt},text width=17cm,,font=large];
tikzstyle{styJRNL} = [draw,,minimum height={30pt},rectangle,text width=5.5cm,text centered,text=black,font=small];
tikzstyle{styKWRD} = [draw,rectangle,minimum height={30pt},text width=17cm,,font=bf, font=small];
tikzstyle{styLCLF} = [draw,rectangle,minimum height={10pt},text width=17cm,text=blue, font=footnotesize]; %local file
tikzstyle{styURL} = [draw,fill=blue!20,rectangle,minimum height={20pt},text width=11.5cm,,font=bf, font=Large]; %local file
tikzstyle{styNOTES}= [draw,rectangle,minimum height={380pt},text width=17cm,];
tikzstyle{styTODO} = [draw,rectangle,minimum height={110pt},text width=17cm];
tikzstyle{styCTD} = [draw,rectangle,minimum height={90pt},text width=8.5cm];
draw [draw,use as bounding box] (0cm,0cm) rectangle (17cm, 25cm);
%%%%%%%%%%%%%%%
% top group
%%%%%%%%%%%%%%%
node[styTITLE,left=0pt of current bounding box.north west, ,anchor=north west](nodeTITLE) {${title}};
node[styATHR,left=0pt of nodeTITLE.south west, anchor=north west](nodeATHR) {${refPattern}};
node[styKWRD,left=0pt of nodeATHR.south, anchor=north](nodeKWRD) {Keywords: ${author}};
node[styBIBR,left=0pt of nodeTITLE.north east, anchor=north west] (nodeBIBR) {${refPattern}};
node[styJRNL,left=0pt of nodeBIBR.south, anchor=north] (nodeJRNL) {${journal};
%%%%%%%%%%%%%%%
% bottom group
%%%%%%%%%%%%%%%
%local file link
node[styLCLF,left=0pt of current bounding box.south west,anchor=south west](nodeLCLF) {localfile link: ${refPattern};
node[styCTD,left=0pt of nodeLCLF.north west,anchor=south west](nodeCTD) {textbf{Cited:} ${refPattern} };
node[styCTD,left=0pt of nodeLCLF.north east,anchor=south east](nodeCTDBY) {textbf{Cited by:} ${refPattern};
node[styTODO,left=0pt of nodeCTD.north west,anchor=south west](nodeCTD) {textbf{TODO:} ${refPattern} };
%%%%%%%%%%%%%%%
% mid group
%%%%%%%%%%%%%%%
node[styNOTES,below=0.pt of nodeKWRD](NOTES) {NOTES:};
end{tikzpicture}
end{document}
Where can we go from here? We could make the script more Unix-y. Instead of opening specific files for the input and output, we could simply use STDIN
and STDOUT
instead. We can then use shell redirection to specify the files:
$ perl bibreport.pl 'some pattern' <~/WORKDIR/ARTICLE/allReferences.bib >bibREF.tex
This is good because now we don’t have to edit the script just to change the filenames, and because we can use our script more easily together with other tools.
Oh, and if you want to take arguments from the command line, you don’t need a loop like
my $refPattern = "default";
foreach (@ARGV) {
$refPattern = $_;
}
– that loops through all arguments just to assign the last argument, or to use the default value if no args were provided. This can be abbreviated to
my $refPattern = (@ARGV) ? shift @ARGV : "default";
# or:
my $refPattern = "default";
$refPattern = shift @ARGV if @ARGV;
(shift
removes and returns the first element of an array. To make this code behave equivalent to your original code, use pop
which removes and returns the last element).