Problem
I am working on a small custom Markup script in Java that converts a Markdown/Wiki style markup into HTML.
The code below works, but as I add more markup I can see it becoming unwieldy and hard to maintain. Is there a better, more elegant way to do something similar?
private String processString(String t) {
    t = setBoldItal(t);
    t = setBold(t);
    t = setItal(t);
    t = setUnderline(t);
    t = setHeadings(t);
    t = setImages(t);
    t = setOutLinks(t);
    t = setLocalLink(t);
    return t;
}
On top of that, passing in the string and assigning the result back to the same variable just doesn’t feel right, but I don’t know of any other way to go about this.
Solution
You could create a StringProcessor interface:
public interface StringProcessor {
    String process(String input);
}
public class BoldProcessor implements StringProcessor {
    public String process(final String input) {
        ...
    }
}
Then create a List from the available implementations:
final List<StringProcessor> processors = new ArrayList<StringProcessor>();
processors.add(new ItalicProcessor());
processors.add(new BoldProcessor());
...
and use it:
String result = input;
for (final StringProcessor processor : processors) {
    result = processor.process(result);
}
return result;
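As a minimal sketch of how this chain might look on Java 8 or later: StringProcessor has a single method, so each processor can be a lambda. The MarkupPipeline class, the render method, and the `***`, `**` and `//` delimiters are assumptions made for illustration, not the asker's actual syntax.

```java
import java.util.Arrays;
import java.util.List;

@FunctionalInterface
interface StringProcessor {
    String process(String input);
}

// Hypothetical holder for the processor chain.
final class MarkupPipeline {
    // Order matters: the three-asterisk rule must run before the two-asterisk rule,
    // otherwise "***x***" would be consumed as bold with stray asterisks.
    private static final List<StringProcessor> PROCESSORS = Arrays.asList(
        s -> s.replaceAll("\\*\\*\\*(.+?)\\*\\*\\*", "<b><i>$1</i></b>"),
        s -> s.replaceAll("\\*\\*(.+?)\\*\\*", "<b>$1</b>"),
        s -> s.replaceAll("//(.+?)//", "<i>$1</i>")
    );

    // Feeds the output of each processor into the next one.
    static String render(String input) {
        String result = input;
        for (StringProcessor processor : PROCESSORS) {
            result = processor.process(result);
        }
        return result;
    }
}
```

Adding a new markup rule then means adding one entry to the list rather than one more line to processString.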
If you want to process a language, even a simple one like a Wiki Markup, you should eventually write a proper parser, not do step-by-step replacement, nor chain a number of individual processors, no matter how fancy their implementation.
You can go with the fully generic approach: generate an AST from the markup (this would look similar to @rolfl’s StyledString), and then use an AST serializer to create the end result (but for efficiency’s sake, please append to a StringBuilder instead of repeatedly creating new strings). This allows you to use multiple serializers; e.g. if at one point you want to create PDF instead of HTML, this gives you a huge advantage. Your AST nodes should implement the visitor pattern for this purpose. (The serializer would be the visitor.)
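A rough sketch of the AST-plus-visitor idea, assuming only text, bold, and sequence nodes. All type names here (Node, NodeVisitor, TextNode, BoldNode, SequenceNode, HtmlSerializer, PlainTextSerializer) are hypothetical, the parser that would build the tree is omitted, and HTML escaping is left out for brevity.

```java
import java.util.Arrays;
import java.util.List;

interface Node {
    void accept(NodeVisitor visitor);
}

interface NodeVisitor {
    void visit(TextNode node);
    void visit(BoldNode node);
    void visit(SequenceNode node);
}

final class TextNode implements Node {
    final String value;
    TextNode(String value) { this.value = value; }
    public void accept(NodeVisitor visitor) { visitor.visit(this); }
}

final class BoldNode implements Node {
    final Node child;
    BoldNode(Node child) { this.child = child; }
    public void accept(NodeVisitor visitor) { visitor.visit(this); }
}

final class SequenceNode implements Node {
    final List<Node> children;
    SequenceNode(Node... children) { this.children = Arrays.asList(children); }
    public void accept(NodeVisitor visitor) { visitor.visit(this); }
}

// One serializer per output format; each appends to a single StringBuilder.
final class HtmlSerializer implements NodeVisitor {
    private final StringBuilder out = new StringBuilder();
    public void visit(TextNode node) { out.append(node.value); }
    public void visit(BoldNode node) {
        out.append("<b>");
        node.child.accept(this);
        out.append("</b>");
    }
    public void visit(SequenceNode node) {
        for (Node child : node.children) child.accept(this);
    }
    String result() { return out.toString(); }
}

// A second serializer over the same tree: drops all styling.
final class PlainTextSerializer implements NodeVisitor {
    private final StringBuilder out = new StringBuilder();
    public void visit(TextNode node) { out.append(node.value); }
    public void visit(BoldNode node) { node.child.accept(this); }
    public void visit(SequenceNode node) {
        for (Node child : node.children) child.accept(this);
    }
    String result() { return out.toString(); }
}
```

The same tree can be handed to either serializer, which is where the flexibility of this approach comes from.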
But that would probably be overkill here. A simple parser that outputs the HTML as it parses would be simpler and probably sufficient.
You can use parser generators like ANTLR to generate the parser, or you can hand-write a parser.
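To give an idea of the hand-written option, here is a small sketch of a single-pass parser that emits HTML while it scans. It assumes `**` marks bold and `//` marks italic (both invented for this example), handles inline markup only, and closes any span that is still open at the end of the input. The class name InlineParser is hypothetical.

```java
final class InlineParser {
    private final String src;
    private int pos = 0;
    private final StringBuilder out = new StringBuilder();

    private InlineParser(String src) { this.src = src; }

    static String toHtml(String markup) {
        InlineParser parser = new InlineParser(markup);
        parser.parseUntil(null);
        return parser.out.toString();
    }

    // Emits HTML until the closing delimiter is consumed,
    // or until the end of input when closing is null or never found.
    private void parseUntil(String closing) {
        while (pos < src.length()) {
            if (closing != null && src.startsWith(closing, pos)) {
                pos += closing.length();
                return;
            }
            if (src.startsWith("**", pos)) {
                span("**", "b");
            } else if (src.startsWith("//", pos)) {
                span("//", "i");
            } else {
                appendEscaped(src.charAt(pos++));
            }
        }
    }

    // Consumes an opening delimiter and parses the nested content recursively.
    private void span(String delimiter, String tag) {
        pos += delimiter.length();
        out.append('<').append(tag).append('>');
        parseUntil(delimiter);
        out.append("</").append(tag).append('>');
    }

    // Escapes the characters that are special in HTML text.
    private void appendEscaped(char c) {
        switch (c) {
            case '<': out.append("&lt;"); break;
            case '>': out.append("&gt;"); break;
            case '&': out.append("&amp;"); break;
            default:  out.append(c);
        }
    }
}
```

Because the parser knows which span it is inside, nesting and escaping are handled in one place instead of depending on the order of independent replacements.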
I like @palacsint’s approach, but I have one thing to add: you can probably do most of the processing with a single class.
public class TagProcessor implements StringProcessor {
    private final String wrapWith;
    public TagProcessor(String wrapWith) {
        this.wrapWith = wrapWith;
    }
    @Override
    public String process(String input) {
        return "<" + wrapWith + ">" + input + "</" + wrapWith + ">";
    }
}
processors.add(new TagProcessor("i"));
processors.add(new TagProcessor("b"));
I also believe that you can generalize a lot of the functionality of the other processors into a single class and use its constructor to pass the appropriate parameters (wrapping in <div class="someclass">...</div>, for example).
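One way this generalization could look, as a sketch only: a single class parameterized by the markup delimiter, the tag name, and an optional CSS class. Unlike TagProcessor above, it wraps only the delimited spans rather than the whole input. The name DelimitedTagProcessor and the delimiters in the usage lines are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

interface StringProcessor {
    String process(String input);
}

// Hypothetical: rewrites <delimiter>text<delimiter> as <tag class="...">text</tag>.
final class DelimitedTagProcessor implements StringProcessor {
    private final Pattern pattern;
    private final String replacement;

    // cssClass may be null, in which case no class attribute is written.
    DelimitedTagProcessor(String delimiter, String tag, String cssClass) {
        String quoted = Pattern.quote(delimiter);
        this.pattern = Pattern.compile(quoted + "(.+?)" + quoted);
        String open = cssClass == null
                ? "<" + tag + ">"
                : "<" + tag + " class=\"" + cssClass + "\">";
        String close = "</" + tag + ">";
        // quoteReplacement guards against '$' or '\' in the tag text; $1 is the captured span.
        this.replacement = Matcher.quoteReplacement(open) + "$1" + Matcher.quoteReplacement(close);
    }

    @Override
    public String process(String input) {
        return pattern.matcher(input).replaceAll(replacement);
    }
}
```

With this, `new DelimitedTagProcessor("**", "b", null)` and `new DelimitedTagProcessor("!!", "div", "someclass")` would cover two different rules with the same class.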
This sounds like a case where you should encapsulate the data with a ‘Decorator Pattern’.
You should declare a simple interface such as:
public interface StyledString {
    public String toFormatted();
    public StyledString getSource();
}
Then create a concrete class for each style you have:
public class BoldStyle implements StyledString {
    private final StyledString source;
    public BoldStyle(StyledString source) {
        this.source = source;
    }
    public String toFormatted() {
        return "<b>" + source.toFormatted() + "</b>";
    }
    public StyledString getSource() {
        return source;
    }
}
You should also have a ‘NoStyle’ class that takes a raw String input and returns null from getSource().
Using this system you can easily add styles, and you can have styles that join phrases, and so on.
Also, you can add the styles together in a way that makes decomposing the value easier at a later point, and you only need to add/wrap the styles that you want.
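To make the composition and decomposition concrete, here is a small sketch. NoStyle follows the description above; ItalicStyle and the Styles.rawText helper are hypothetical additions for illustration.

```java
interface StyledString {
    String toFormatted();
    StyledString getSource();
}

// The innermost element of every chain: plain text with no decoration.
final class NoStyle implements StyledString {
    private final String raw;
    NoStyle(String raw) { this.raw = raw; }
    public String toFormatted() { return raw; }
    public StyledString getSource() { return null; }
}

final class BoldStyle implements StyledString {
    private final StyledString source;
    BoldStyle(StyledString source) { this.source = source; }
    public String toFormatted() { return "<b>" + source.toFormatted() + "</b>"; }
    public StyledString getSource() { return source; }
}

// Hypothetical second style, built the same way as BoldStyle.
final class ItalicStyle implements StyledString {
    private final StyledString source;
    ItalicStyle(StyledString source) { this.source = source; }
    public String toFormatted() { return "<i>" + source.toFormatted() + "</i>"; }
    public StyledString getSource() { return source; }
}

final class Styles {
    // Walks the decorator chain back to the undecorated text.
    static String rawText(StyledString styled) {
        StyledString current = styled;
        while (current.getSource() != null) {
            current = current.getSource();
        }
        return current.toFormatted();
    }
}
```

Styles are then stacked by nesting constructors, for example `new BoldStyle(new ItalicStyle(new NoStyle("hi")))`, and getSource() lets the chain be unwrapped again later.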