Problem
This Question now has a follow-up question
After learning a lot about applying OO principles, interfaces and a bit about argument-handling I post a revised version of my previous code. This is still an exercise for me to improve my general coding for a simple program. As such I would like to ask to pay special attention towards the application of OO programming principles and Exception handling, because those are the areas I feel weakest at.
The code is tested and – just like before – works properly.
Here a quote of the still relevant sections of the previous question:
The task itself is to go through a large .txt file that contains lines with “>” as their first character and to remove all spaces in these lines. I am using this program to modify files in FASTA format, a format often used in biology.
Algorithm:
- Read next line from input file and store in “line”
- If “line” contains “>” as first character, remove all spaces in “line”
- Print “line” to output file
- If next line is not null, go back to Step 1
Step 2 was changed from “If line contains “>” ” to “If Line contains “>” as first character”, since that reduces the number of comparisons the if-statement has to make – I think.
Code structure
The code structure changed a lot compared to last time and is now a lot more complex, which is why it has its own section.
Argument parsing is no longer handled by ArgumentHandler, which was split into a class ArgumentCollection
– which does the argument parsing now – and a class ArgumentHandler
– which triggers the argument parsing and contains the code that slowy previously suggested putting into a createArgumentHandler()
method. ArgumentCollection
implements an interface ArgumentManager
that provides a bunch of method to manipulate, set and test arguments of the various flags. ArgumentHandler
now also implements an Interface Configuration
that provides the methods getSourceFile()
and getSinkFile()
that returns the files to read from/write to. For the sake of some semblance of brevity ‘ArgumentCollection’ is not shown (It has over 250 lines of code).
Besides Configuration
, the Client also works with the classes ‘FileLineReader’ , and ‘FileLineWriter’ (both implementing AutoCloseable
) , which have the methods readLine()
/writeLine()
.
The Client
public interface Configuration {
/**
* Returns the name and path of a file to read from in form of a String.
*/
String getSourceFile();
/**
* Returns the name and path of a file to write to in form of a String.
*/
String getSinkFile();
/** Prints all arguments */
void printArguments();
}
public class RemoveSpaces_Client {
private static void printProgramProgress(int i) {
if (i % 1000000 == 0) {
System.out.println(i / 1000000 + " * 10^6 lines written.");
}
}
public static void main(String[] args) {
Configuration arguments = new ArgumentHandler(args);
System.out.println("Starting Program with the following arguments: ");
arguments.printArguments();
try (FileLineReader inputReader = new FileLineReader(arguments.getSourceFile());
FileLineWriter outputWriter = new FileLineWriter(arguments.getSinkFile());) {
/*
* Write every line from input file to output file. If the line is a
* name (contains ">"), remove all spaces in it before writing.
*/
int i = 0;
for (String line = inputReader.readLine(); line != null; line = inputReader.readLine()) {
if (line.charAt(0) == '>') {
line = line.replace(" ", "");
}
outputWriter.writeLine(line);
printProgramProgress(++i);
}
} catch (IOException e) {
System.out.println("Reader or Writer caused an IOException!");
e.printStackTrace();
throw new UncheckedIOException(e);
}
System.out.println("Finished!");
}
}
FileLineReader
public class FileLineReader implements AutoCloseable {
private BufferedReader reader;
public FileLineReader(String filename) throws FileNotFoundException {
this.reader = new BufferedReader(new FileReader(filename));
}
public String readLine() throws IOException {
return reader.readLine();
}
public void close() throws IOException {
reader.close();
}
}
FileLineWriter
public class FileLineWriter implements AutoCloseable {
private Writer writer;
public FileLineWriter(String filename) throws IOException {
try {
this.writer = new BufferedWriter(new FileWriter(filename));
} catch (IOException e) {
System.out.println(
"Output file name " + filename + " was not accessible or could not be created. Printing to "
+ (filename + ".nospace.txt instead"));
this.writer = new BufferedWriter(new FileWriter(filename + ".nospace.txt"));
}
}
public void writeLine(String line) throws IOException {
this.writer.write(line);
this.writer.write(System.lineSeparator());
}
public void close() throws IOException {
this.writer.close();
}
}
ArgumentHandler
public class ArgumentHandler implements Configuration {
private ArgumentManager arguments;
ArgumentHandler(String[] args) {
/*
* Define all allowed Flags and their associated arguments in an
* ArgumentCollection. If possible, assign arguments their default
* values. "null" are arguments that must be specified through parsing
* of args or the program can't run. "" are arguments that are optional
* or whose default value depends on an argument whose value needs to be
* parsed.
*/
String[] argumentList = { "-i", null, "-o", "", "-h", "Text to display if -h is called" };
arguments = new ArgumentCollectionImplementation1(argumentList);
arguments.parseArguments(args);
/*
* If flag "-o" does not have an argument associated with it after
* argument parsing, set its argument to its default value.
*/
if (!arguments.flagHasArguments("-o")) {
arguments.setFlagArgument("-o", 0, arguments.getFlagStringArgument("-i") + ".nospace.txt");
}
}
public String getSourceFile() {
return arguments.getFlagStringArgument("-i");
}
public String getSinkFile() {
return arguments.getFlagStringArgument("-o");
}
public void printArguments() {
arguments.printArguments();
}
}
Final Comment
One major thing this definitely taught me – If you can avoid it, don’t code easy tasks in java. This program totals over 300 lines of code (not counting comments and package/import statements) to do something, as Vogel612 pointed out, that you can do in 1 line in sed in a linux-terminal.
Solution
I’ll handle the big thing first.
Over-Abstraction
Your current code is quite a bit over-engineered (the rest of the answer elaborates on that). In my opinion, if something is meant to be used for only a very specific purpose, there is no reason to make it more general than it needs to be, adding complexity and reducing maintainability in the process. Take my word for it – I learnt it the hard way.
In your case, BufferedReader
is already sufficiently general for your needs – why bother implementing a class which just wraps it within another layer of indirection without providing any different or extra functionality?
BufferedReader
and BufferedWriter
implement AutoCloseable
, so it’s noise to abstract that behind classes which do not add significant functionality. (Yeah, I’m mostly an FP programmer now so I don’t really care about what OOP purists say about decoupling interface from implementation, etc., etc.). But, you know, those OOP purists are (mostly) right. So what are we missing here?
The Java Standard Library
PrintWriter
to the rescue!
However, there’s an easier way out. You want PrintWriter
. It’ll completely, utterly replace FileLineWriter
(of course it also implements AutoCloseable
and is buffered). You can even use outputWriter.println(...)
!
You want this constructor: public PrintWriter(File file, String csn)
. Why the csn
bit?
Always specify the output charset!
csn
is the output charset name. You want it, if, say, you move this code from macOS to Windows and the default charset changes from say MacRoman to CP-1252 or whatever. Don’t make your code essentially single-platform when it’s Java. Java will choose the default charset when outputting, so… you’ll probably want "UTF-8"
for csn
.
Interface-Implementation design
getSourceFile
and getSinkFile
should return java.io.File
(as their names indicate). You probably don’t want them to return String
s containing the file path (principle of least surprise). If you do, name them appropriately (getSourceFilePath
, etc.).
ArgumentHandler
should probably be a private static
inner class of RemoveSpaces_Client
, as it is specific to it (if otherwise, justify in the documentation, and name ArgumentHandler
more specifically (IOArgumentsHandler
should be enough, a help function can be assumed to be common)).
Core code:
Yes, you’re very much right that checking the initial character directly is faster (in fact, much faster). Just in case, here’s why:
contains
would have iterated the String
in order, and since the character in question is constrained by the format to always be at the start, the loop would have run for only 1 iteration before returning if the String
did indeed start with the character in question, but it would have run till the end if it didn’t.
When you’re measuring file sizes in millions of lines, these things add up. I project about a n× performance improvement on average, if each line had n characters.
Also, if you’re going to be using the source/sink
terminology, try to be consistent about it – the names should be sourceReader
and sinkwriter
.
I’d prefer a while
loop for this case, as “loop while progression condition is satisfied” is more like a while
loop. Of course, there’s nothing wrong with using a for
loop, it’s just a matter of personal preference. I use whichever is convenient under the circumstances. However, as always, better to be consistent.
Note:
Where is UncheckedIOException
implemented? I suspect it’s a subclass of RuntimeException
encapsulating the parent IOException
in its cause
field, and the message associated with the parent in its message
field.
Edit: I have since realised thanks to the comments that UncheckedIOException
is very much part of the library, and I retract my below statement and appreciate the OP simulating the logging of the occurrence of an exception.
However, I still think that the entire wrap-and-rethrow business seems convoluted in this context, however, it would benefit in the long run by reducing the burden on consumers of the OP’s API by not enforcing the throws
clause in all their method signatures. This is now an appreciated effort.
Then, if you’re going to notify the user that an exception has occurred, print a stack trace, and exit, why bother catching the exception at all and wrapping it? The default JVM uncaught exception handler does exactly what you do manually, minus a vague error message printed to STDOUT
.
Syntax
Nitpick: To keep in line with standard Java camelCase identifiers, RemoveSpaces_Client
should be named RemoveSpacesClient
, without the underscore.
All together now
A grand total of 89 lines (that’s not considering your supporting API, though; argument parsing in Java can be a pain – use List
s and Map
s judiciously to make your job easier), without much lost functionality:
RemoveSpacesClient.java
import java.io.*;
public class RemoveSpacesClient {
private static class ArgumentHandler implements Configuration {
private ArgumentManager arguments;
ArgumentHandler(String[] args) {
/*
* Define all allowed Flags and their associated arguments in an
* ArgumentCollection. If possible, assign arguments their default
* values. "null" are arguments that must be specified through parsing
* of args or the program can't run. "" are arguments who are optional
* or whose default value depends on an argument whose value needs to be
* parsed.
*/
String[] argumentList = { "-i", null, "-o", "", "-h", "Text to display if -h is called" };
arguments = new ArgumentCollectionImplementation1(argumentList);
arguments.parseArguments(args);
/*
* If flag "-o" does not have an argument associated with it after
* argument parsing, set its argument to its default value.
*/
if (!arguments.flagHasArguments("-o")) {
arguments.setFlagArgument("-o", 0, arguments.getFlagStringArgument("-i") + ".nospace.txt");
}
}
public File getSourceFile() {
return new File(arguments.getFlagStringArgument("-i"));
}
public File getSinkFile() {
return new File(arguments.getFlagStringArgument("-o"));
}
public void printArguments() {
arguments.printArguments();
}
}
private static void printProgramProgress(int i) {
// You can use underscores to group digits in numeric literals since Java 7
if (i % 1_000_000 == 0) {
System.out.println(i / 1_000_000 + " * 10^6 lines written.");
}
}
public static void main(String[] args) {
Configuration arguments = new ArgumentHandler(args);
System.out.println("Starting Program with the following arguments: ");
arguments.printArguments();
try (BufferedReader sourceReader = new BufferedReader(new FileReader(arguments.getSourceFile()));
PrintWriter sinkWriter = new PrintWriter(arguments.getSinkFile(), "UTF-8");) {
/*
* Write every line from input file to output file. If the line is a
* name (contains ">"), remove all spaces in it before writing.
*/
int i = 0;
for (String line = sourceReader.readLine(); line != null; line = sourceReader.readLine()) {
if (line.charAt(0) == '>') {
line = line.replace(" ", "");
}
sinkWriter.println(line);
printProgramProgress(++i);
}
} catch (IOException e) {
throw new UncheckedIOException(e);
}
System.out.println("Finished!");
}
}
Configuration.java
import java.io.File;
public interface Configuration {
/**
* Returns the name and path of a file to read from in form of a String.
*/
File getSourceFile();
/**
* Returns the name and path of a file to write to in form of a String.
*/
File getSinkFile();
/** Prints all arguments */
void printArguments();
}
Philosophy lessons with slowy
a priori: OP and me talked about the the solution in the initial question, let me explain why he came up with this solution.
a prior II: I call it philosophy lesson, because it’s less of a programming lesson 😉
I must highly disagree with the “over-engineered” statement of Tamoghna Chowdhury. Of course, it depends, but since it is a learning excercise, it’s the best opportunity to apply object oriented principles, right? If not applied to an easy problem, I doubt, OP will have it easy, applied to a more complex problem. I agree, make the solution, as easy as possible, but not easier. I draw the line between easy and easier earlier, it seems.
And my philosphy – and the philosophy of most of the developers I talk to, which are developing large enterprise applications – is: Good code runs its test. Good code has a decent code coverage. That’s how I made OP do this solution.
I highly disagree with the statement “reducing maintainability” / “adding complexity”, I state the opposite – of course, always with unit testing in mind: Having unit tests not only verifies, it also – and that’s imo more important – enforces good design – Because of one reason: If you can’t test it, you have a design problem. Now, if you have code, which is not tested and you have to refactor it, you do not know if it still works. Having a set of unit tests, which can be executed after every step of the refactoring, will always help and give you confidence. Unit tests also are a very good documentation. My experience has been: Code with decent test cases, which may seem “a bit over engineered” are easier to maintain, than code without tests – and therefore more classes/abstraction/etc.
The single responsibility principle’s goal – which we tried to apply here – is to reduce complexity: A class should only have one reason to change. I have catched myself too often, facing routines, which started easy as this one, but “historically grew”, without having test cases and putting new requirements without thinking too much about future self. It often ends up in reverse engineering, trying to test it, introducing bugs and also – the worst – staying late friday night. In my opinion and my experience, the initial investment will pay off, sooner or later, when on the other hand, not applying the principles, will be expensive.
code review
Since I’m whining about test cases, I’m gonna start with that one. I applied the tdd approach as best as I could. I didn’t care too much about naming (otherwise, I’d sit here all night) or exception handling, I try to explain my “ranting” from above. I used JUnit4 and mockito 2.7.22.
// This test class tests "two", the "remove spaces when line starts with >", and the "do not touch the line if not" aspect.
public class RemoveSpacesClientTest {
/* LineReader and Writer are mocked - this can be easily done, since those are interface.
* Unit tests by definition, do not touch file systems, services or other external components. */
@Mock
private LineReader lineReader;
@Mock
private LineWriter lineWriter;
private RemoveSpacesClient removeSpacesClient;
@Test
public void linesWithGTSignRemovesEmptySpaces() throws Exception {
String lineToRead = "> A A";
String readLineWithoutSpaces = ">AA";
when(lineReader.readLine()).thenReturn(lineToRead).thenReturn(null);
removeSpacesClient.start();
verify(lineWriter).writeLine(readLineWithoutSpaces);
}
@Test
public void noGTSignWritesSameLine() throws Exception {
String lineToRead = "A B C";
when(lineReader.readLine()).thenReturn(lineToRead).thenReturn(null);
removeSpacesClient.start();
verify(lineWriter).writeLine(lineToRead);
}
@Before
public void setUp() throws Exception {
initMocks(this);
removeSpacesClient = new RemoveSpacesClient(lineReader, lineWriter);
}
}
Here’s the actual program:
public class RemoveSpacesClient {
private final LineReader lineReader;
private final LineWriter lineWriter;
public RemoveSpacesClient(LineReader lineReader, LineWriter lineWriter) throws IOException {
// readers and writers get injected, so no need for a configuration here for now.
this.lineReader = lineReader;
this.lineWriter = lineWriter;
}
public void start() throws IOException {
int i = 0;
String line;
// I transformed the for loop to a while loop, since imo it's easier to read
while ((line = lineReader.readLine()) != null) {
if (line.charAt(0) == '>') {
line = line.replace(" ", "");
}
lineWriter.writeLine(line);
printProgramProgress(++i);
}
}
private void printProgramProgress(int i) {
if (i % 1000000 == 0) {
System.out.println(i / 1000000 + " * 10^6 lines written.");
}
}
public static void main(String[] args) {
// dedicate the main method to the creation and wiring of the needed components aka "dependency injection"
// I don't have the argument handler code, so this just a hard coded configuration impl
Configuration configuration = createArgumentHandlerConfiguration(args);
// The creation of the LineReader and Writer are done by a LineFactory (aka static factory: hide implementation, provide a name to the object generating method (!)
try (LineReader lineReader = LineFactory.createFileLineReader(configuration.getSourceFile());
LineWriter lineWriter = LineFactory.createFileLineWriter(configuration.getSinkFile())) {
new RemoveSpacesClient(lineReader, lineWriter).start();
} catch (Exception e) {
e.printStackTrace();
}
}
}
The LineFactory
is a plain static Factory which actually creates FileLineReader
‘s and FileWriteReader
‘s which use LineNumberReader
‘s and FileWriter
‘s – pretty straight forward.
You can call it overkill. I call it super sexy! 😛
Hope this helps…