Automatic report generation from BibTeX

Posted on

Problem

I wrote the following code for the question I asked here. Are there any catastrophic mistakes? (actually it did not really work with perltex and I am in doubt about having syntax mistake(s))

Also, I read it about lists, hashs (still list vs hash is a huge question mark in my mind) and forearch to improve my script. I am kinda confused about the best way to combine them (actually my attempt define variables with a list is failed for now). I am sure, I will find my way, but it will be nicer quicker jump with some help. You can see how I repeat same lines again and again.

#!/usr/bin/perl 
#
# Strict and warnings are recommended.
use strict;
use warnings;
  no warnings "uninitialized";  # Don't warn about unitialized variables#
use Term::ANSIColor; #Some fun
# use 5.010;

# print colored ['bold blue'], "Hey I am a Perl,n";
# print "I can find all the informations you need for a specific reference from regular";
# print colored ['bold blue'], " bibtex ";
# print "filen";


my $in = 0;                                                  #logical to check am I in a good line or not
my $refPattern='Wang2013';                                   #which reference I am trying to read
# my $refPattern='test2013'; 
# my $refPattern='',"$_[0]",'';                                   #which reference I am trying to read
my $filename = '/home/solak/WORKDIR/ARTICLE/allReferences.bib';                                #bibtex file storing informations

foreach (@ARGV) {
    $refPattern = $_;
};

# print "My variable $refPatternn";


#possible to improve with list or hash, find them with forearch etc. (did not work yet)
my $abstract  ;                                  #lesson:perl scope has crazy/sensible way
my $author    ;                                  #lesson:perl scope has crazy/sensible way
my $doi       ;                                  #lesson:perl scope has crazy/sensible way
my $issn      ;                                  #lesson:perl scope has crazy/sensible way
my $journal   ;                                  #lesson:perl scope has crazy/sensible way
my $keywords  ;                                  #lesson:perl scope has crazy/sensible way
my $localfile ;                                  #lesson:perl scope has crazy/sensible way
my $note      ;                                  #lesson:perl scope has crazy/sensible way
my $number    ;                                  #lesson:perl scope has crazy/sensible way
my $pages     ;                                  #lesson:perl scope has crazy/sensible way
my $title     ;                                  #lesson:perl scope has crazy/sensible way
my $url       ;                                  #lesson:perl scope has crazy/sensible way
my $volume    ;                                  #lesson:perl scope has crazy/sensible way
my $year      ;                                  #lesson:perl scope has crazy/sensible way
# my %hash = ( 'abstract','author','journal','keywords','localfile','note','title','url');

my $line;                                        #raw line
my $field;                                       #field name
my $content_temp;                                #raw content, without trims
my $content   ;                                  #content for all fields, with trims


open(my $bibfile, '<:encoding(UTF-8)', $filename)
  or die "Could not open file '$filename' $!";

while (<$bibfile>) {                                         #remember perl default variable is in use, $_
    $in = 1 if /$refPattern/i;
    last if (/^$/ && $in == 1);                              #break the while on the first following empty line after $refPattern
    if ($in == 1) {
      $line = $_ if($in);
      $field = substr($line, 1, index($line, ' = '));        #it finds bibtex vartype
      $field =~ s/^s+|s+$//g;                       #trim both ends, may be not necessary, do not know how to check if there are spaces around the $field
      print "$fieldn";
      $line =~ s/^s*S+s*//;                               #split field and content, field is the first word always
      $content_temp = $line; 
      $content_temp =~ s/^s+|s+$//g;                       #trim both ends
      $content_temp =~ s/^s*S+s*//;                       #like split field and content, to delete = on the beginning
      $content_temp =~ s/,z//;                              #trim last character "," if exist
      next if (length($content_temp) < 1);                   #Check if you really read something
      $content = substr($content_temp, 1, -1);               #If you know the first and last character are quotes, use it, no need to regec



#       print "$contentn";
      $abstract  = $content if ($field eq 'abstract');
      $author    = $content if ($field eq 'author');
      $doi       = $content if ($field eq 'doi');
      $issn      = $content if ($field eq 'issn');
      $journal   = $content if ($field eq 'journal');
      $keywords  = $content if ($field eq 'keywords');
      $localfile = $content if ($field eq 'localfile');
      $note      = $content if ($field eq 'note');
      $number    = $content if ($field eq 'number');
      $pages     = $content if ($field eq 'pages');
      $title     = $content if ($field eq 'title');
      $url       = $content if ($field eq 'url');
      $volume    = $content if ($field eq 'volume');
      $year      = $content if ($field eq 'year');           
#       print "$titlen";
    }                          
}

# print "$journaln";
# print "$contentn";
# print "$titlen";

my $filename = 'bibREF.tex';
open(my $fh, '>', $filename) or die "Could not open file '$filename' $!";
print $fh '%HEYY This is generated by perl.pl, do not touch it, if you need changes go to mother file!!!                                                                             ',"n";                                                                                                                                   
print $fh 'documentclass[12pt]{standalone}                                                                                                                                          ',"n";
print $fh 'usepackage{amsmath}                                                                                                                                                      ',"n";
print $fh 'usepackage{amssymb}                                                                                                                                                      ',"n";
print $fh 'usepackage{tikz}                                                                                                                                                         ',"n";
print $fh 'usepackage{gnuplot-lua-tikz}                                                                                                                                             ',"n";
print $fh 'usepackage[shell]{gnuplottex}                                                                                                                                            ',"n";
print $fh 'usepackage[]{tcolorbox}                                                                                                                                                  ',"n";
print $fh 'usetikzlibrary{backgrounds,calc,positioning}                                                                                                                             ',"n";
print $fh 'thispagestyle{empty}                                                                                                                                                     ',"n";
print $fh '                                                                                                                                                                          ',"n";
print $fh 'begin{document}                                                                                                                                                          ',"n";
print $fh '                                                                                                                                                                          ',"n";
print $fh 'begin{tikzpicture}[inner sep=0pt]                                                                                                                                        ',"n";
print $fh '% block/.style={draw,fill=white,rectangle, minimum width={width("Magnetometer")+2pt},font=small},',"n";
print $fh '% blockl/.style={draw,fill=white,rectangle, minimum width={400pt},font=large}]',"n";
print $fh 'tikzstyle{styBIBR}  = [draw,fill=black!60,minimum height={30pt},rectangle,text width=5.5cm   ,text centered,font=bf,text=white,font=huge];',"n";
print $fh 'tikzstyle{styYEAR}  = [draw,fill=blue!60,minimum height={20pt},rectangle,text width=5.5cm   ,text centered,font=bf,text=white];',"n";
print $fh 'tikzstyle{styTITLE}  = [draw,,rectangle,minimum height={60pt},text width=11.5cm,                            ,font=bf, font=Large];',"n";
print $fh 'tikzstyle{styATHR}  = [draw,,rectangle,minimum height={30pt},text width=17cm,                            , font=large];',"n";
print $fh 'tikzstyle{styJRNL}  = [draw,,minimum height={30pt},rectangle                       ,text width=5.5cm   ,text centered,text=black,font=small];',"n";
print $fh 'tikzstyle{styKWRD}  = [draw,rectangle,minimum height={30pt},text width=17cm,                            ,font=bf, font=small];',"n";
print $fh 'tikzstyle{styLCLF}  = [draw,rectangle,minimum height={10pt},text width=17cm,                             text=blue, font=footnotesize];  %local file',"n";
print $fh 'tikzstyle{styURL}  = [draw,fill=blue!20,rectangle,minimum height={20pt},text width=11.5cm,                            ,font=bf, font=Large];  %local file',"n";
print $fh 'tikzstyle{styNOTES} = [draw,rectangle,minimum height={380pt},text width=17cm,];',"n";
print $fh 'tikzstyle{styTODO} = [draw,rectangle,minimum height={110pt},text width=17cm];',"n";
print $fh 'tikzstyle{styCTD} = [draw,rectangle,minimum height={90pt},text width=8.5cm];',"n";
print $fh 'draw [draw,use as bounding box] (0cm,0cm) rectangle (17cm, 25cm);',"n";
print $fh '                                                                                                                                                                          ',"n";
print $fh '                                                                                                                                                                          ',"n";
print $fh '                                                                                                                                                                          ',"n";
print $fh '%%%%%%%%%%%%%%%                                                                                                                                                           ',"n";
print $fh '%top group                                                                                                                                                                ',"n";
print $fh '%%%%%%%%%%%%%%%                                                                                                                                                           ',"n";
print $fh '  node[styTITLE,left=0pt of current bounding box.north west, ,anchor=north west](nodeTITLE) {',"$title",'};                                                              ',"n";
print $fh '  node[styATHR,left=0pt of nodeTITLE.south west, anchor=north west](nodeATHR) {',"$refPattern",'};                                                                       ',"n";
print $fh '  node[styKWRD,left=0pt of nodeATHR.south, anchor=north](nodeKWRD) {Keywords: ',"$author",'};                                                                            ',"n";
print $fh '  node[styBIBR,left=0pt of nodeTITLE.north east, anchor=north west] (nodeBIBR) {',"$refPattern",'};                                                                      ',"n";
print $fh 'node[styJRNL,left=0pt of nodeBIBR.south, anchor=north] (nodeJRNL) {',"$journal",'};                                                                                      ',"n";
print $fh '                                                                                                                                                                          ',"n";
print $fh '                                                                                                                                                                          ',"n";
print $fh '%%%%%%%%%%%%%%%                                                                                                                                                           ',"n";
print $fh '%bottom group                                                                                                                                                             ',"n";
print $fh '%%%%%%%%%%%%%%%                                                                                                                                                           ',"n";
print $fh '%local file link                                                                                                                                                          ',"n";
print $fh 'node[styLCLF,left=0pt of current bounding box.south west,anchor=south west](nodeLCLF) {localfile link: ',"$refPattern",'};                                               ',"n";
print $fh 'node[styCTD,left=0pt of nodeLCLF.north west,anchor=south west](nodeCTD) {textbf{Cited:} ',"$refPattern",' };                                                            ',"n";
print $fh 'node[styCTD,left=0pt of nodeLCLF.north east,anchor=south east](nodeCTDBY) {textbf{Cited by:} ',"$refPattern",'};                                                        ',"n";
print $fh 'node[styTODO,left=0pt of nodeCTD.north west,anchor=south west](nodeCTD) {textbf{TODO:} ',"$refPattern",' };                                                             ',"n";
print $fh '                                                                                                                                                                          ',"n";
print $fh '%%%%%%%%%%%%%%%                                                                                                                                                           ',"n";
print $fh '%mid group                                                                                                                                                                ',"n";
print $fh '%%%%%%%%%%%%%%%                                                                                                                                                           ',"n";
print $fh ' node[styNOTES,below=0.pt of nodeKWRD](NOTES) {NOTES:};                                                                                                                  ',"n";
print $fh '                                                                                                                                                                          ',"n";
print $fh 'end{tikzpicture}                                                                                                                                                         ',"n";
print $fh '                                                                                                                                                                          ',"n";
print $fh 'end{document}                                                                                                                                                            ',"n";

close $fh;                                                                                                                                    
print "donen";

Solution

There are some nice things about this script: It seems to work, you’re using strict, and how you open your files is very good (you even consider encoding!).

Unfortunately, it is extremely difficult to follow. Reasons for this are:

  • This program has experienced “organic growth”. There are unnecessary print commands for debugging everywhere, and commented out code has not been removed. That will be the first thing we’ll get rid of.

  • You are not very experienced with Perl. Your logic is encoded in a roundabout way, and you are not aware of techniques for better abstraction and simplification.

  • You seem to love fixed-width lines, without any need. Sometimes it may be useful to vertically align related items, but you take this to absurd lengths.

Let’s start with your while (<$bibfile>) loop. The control flow in this loop is:

for each line:
  set $in if a pattern matches
  break if we encounter an empty line
  if $in is set:
    set $line to current line if $in is set
    parse the $line
    next iteration if $content is empty
    set the correct variable to the $content

The complete second part of the loop body is an if. There is no else. This can be replaced by going to the next iteration if the if-condition is false. Both have the same effect of skipping over the rest of the loop body. Also, the line $line = $_ if $in is unnecessary once we already know that $in == 1.

We can get rid of the $in variable, by splitting the while loop into two loops. The first loop only searches for the marker pattern. The second loop does the parsing.

We can also get rid of the $foo = $content if $field eq "foo" copy&paste-madness. You correctly noted that a hash makes sense here. We do not have to declare the fields of a hash up front. Instead, we declare an empty hash like my %hash, and fill in fields like $hash{$field} = $content.

Your parsing code can be drastically simplified by building a larger regex. Your syntax seems to be $field = "$content",. This can be expressed by a regex with two capture groups:

/A s* (w+) s*[=]*s "(.+)" [,]? s*z/x

If that regex matches, the field name is in $1 and the contents in $2.

Putting all of this together, we can rewrite your loop as

my $marker = qr/$rePattern/i;
my %fields;

# find the marker
while (<$bibfile>) {
  if (/$marker/) {
     parse_line($_, %fields);
     last;
  }
}

while (my $line = <$bibfile>) {
  last if not $line =~ /S/;
  parse_line($line, %fields);
}

sub parse_line {
  my ($line, $fields) = @_;
  return if not $line =~ /A s* (w+) s*[=]*s "(.+)" [,]? s*z/x;
  my ($field, $contents) = ($1, $2);
  $fields->{$field} = $contents;
  return;
}

In the above code I did two things that may be new to you:

  1. I declared a subroutine called parse_line, which conveniently encapsulates all the line parsing logic. This is good because the first line is parsed separately from the other lines, and I didn’t want to repeat all the code. The parse_line subroutine takes two arguments: The line and the fields.

  2. While Perl subroutines cannot take a hash as argument, they can take a reference to a hash as argument. This is similar to pointers in C, except that Perl uses the backslash operator to get a reference to something. Because we are dealing with a reference and not with a hash, we don’t access the field as $hash{$field} but as $hashref->{$field}.

Now since we store all the fields in a hash, we don’t have variables like $title any more. This is no problem, since we can substitute that with $fields{title} in the remainder of your code.

The rest of your code is just a template. However, you print each line out separately, and explicitly add newlines. If you use 5.010 or use feature 'say', we unlock the say function that behaves exactly like print, but adds a newline to the end of its arguments. In this case, I’d prefer to underline the template character of this code instead, by defining a mini template language: s/[$][{](w+)[}]/$fields{$1}/eg. That substitution replaces any word in braces preceded by a dollar sign by the contents of the field of the same name in %fields.

The template itself can be stored in the __DATA__ section of the script. Everything after the __DATA__ marker is not interpreted as Perl code, but is available via the special file handle DATA.

Implementing that might look like this:

# fill additional fields
$fields{refPattern} = $refPattern;

# process template
my $template = join '', <DATA>;
$template =~ s/[$][{](w+)[}]/$fields{$1}/eg;

# output result
open my $fh, '>', $filename or die "Could not open file '$filename': $!";
print $fh $template;

__DATA__
%HEY This is generated by perl.pl, do not touch it, if you need changes go to mother file!
documentclass[12pt]{standalone}
usepackage{amsmath}
usepackage{amssymb}
usepackage{tikz}
usepackage{gnuplot-lua-tikz}
usepackage[shell]{gnuplottex}
usepackage[]{tcolorbox}
usetikzlibrary{backgrounds,calc,positioning}
thispagestyle{empty}

begin{document}

begin{tikzpicture}[inner sep=0pt]
    tikzstyle{styBIBR} = [draw,fill=black!60,minimum height={30pt},rectangle,text width=5.5cm,text centered,font=bf,text=white,font=huge];
    tikzstyle{styYEAR} = [draw,fill=blue!60,minimum height={20pt},rectangle,text width=5.5cm,text centered,font=bf,text=white];
    tikzstyle{styTITLE}= [draw,,rectangle,minimum height={60pt},text width=11.5cm,,font=bf, font=Large];
    tikzstyle{styATHR} = [draw,,rectangle,minimum height={30pt},text width=17cm,,font=large];
    tikzstyle{styJRNL} = [draw,,minimum height={30pt},rectangle,text width=5.5cm,text centered,text=black,font=small];
    tikzstyle{styKWRD} = [draw,rectangle,minimum height={30pt},text width=17cm,,font=bf, font=small];
    tikzstyle{styLCLF} = [draw,rectangle,minimum height={10pt},text width=17cm,text=blue, font=footnotesize];  %local file
    tikzstyle{styURL}  = [draw,fill=blue!20,rectangle,minimum height={20pt},text width=11.5cm,,font=bf, font=Large];  %local file
    tikzstyle{styNOTES}= [draw,rectangle,minimum height={380pt},text width=17cm,];
    tikzstyle{styTODO} = [draw,rectangle,minimum height={110pt},text width=17cm];
    tikzstyle{styCTD}  = [draw,rectangle,minimum height={90pt},text width=8.5cm];
    draw [draw,use as bounding box] (0cm,0cm) rectangle (17cm, 25cm);

    %%%%%%%%%%%%%%%
    % top group
    %%%%%%%%%%%%%%%
    node[styTITLE,left=0pt of current bounding box.north west, ,anchor=north west](nodeTITLE) {${title}};
    node[styATHR,left=0pt of nodeTITLE.south west, anchor=north west](nodeATHR) {${refPattern}};
    node[styKWRD,left=0pt of nodeATHR.south, anchor=north](nodeKWRD) {Keywords: ${author}};
    node[styBIBR,left=0pt of nodeTITLE.north east, anchor=north west] (nodeBIBR) {${refPattern}};
    node[styJRNL,left=0pt of nodeBIBR.south, anchor=north] (nodeJRNL) {${journal};

    %%%%%%%%%%%%%%%
    % bottom group
    %%%%%%%%%%%%%%%
    %local file link
    node[styLCLF,left=0pt of current bounding box.south west,anchor=south west](nodeLCLF) {localfile link: ${refPattern};
    node[styCTD,left=0pt of nodeLCLF.north west,anchor=south west](nodeCTD) {textbf{Cited:} ${refPattern} };
    node[styCTD,left=0pt of nodeLCLF.north east,anchor=south east](nodeCTDBY) {textbf{Cited by:} ${refPattern};
    node[styTODO,left=0pt of nodeCTD.north west,anchor=south west](nodeCTD) {textbf{TODO:} ${refPattern} };

    %%%%%%%%%%%%%%%
    % mid group
    %%%%%%%%%%%%%%%
    node[styNOTES,below=0.pt of nodeKWRD](NOTES) {NOTES:};
end{tikzpicture}

end{document}

Where can we go from here? We could make the script more Unix-y. Instead of opening specific files for the input and output, we could simply use STDIN and STDOUT instead. We can then use shell redirection to specify the files:

$ perl bibreport.pl 'some pattern' <~/WORKDIR/ARTICLE/allReferences.bib >bibREF.tex

This is good because now we don’t have to edit the script just to change the filenames, and because we can use our script more easily together with other tools.

Oh, and if you want to take arguments from the command line, you don’t need a loop like

my $refPattern = "default";
foreach (@ARGV) {
  $refPattern = $_;
}

– that loops through all arguments just to assign the last argument, or to use the default value if no args were provided. This can be abbreviated to

my $refPattern = (@ARGV) ? shift @ARGV : "default";
# or:
my $refPattern = "default";
$refPattern = shift @ARGV if @ARGV;

(shift removes and returns the first element of an array. To make this code behave equivalent to your original code, use pop which removes and returns the last element).

Leave a Reply

Your email address will not be published. Required fields are marked *