Problem
This is a component of a lexer that operates on a byte input stream; it reads and decodes the contents of a string enclosed in double quotes. The validity of the encoding is handled by CharsetDecoder#decode(), which throws an exception on an invalid buffer.
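As a minimal sketch of that behavior (the class name StrictDecodeDemo and the helper decodes are hypothetical, introduced only for illustration): a decoder obtained from Charset#newDecoder() reports malformed input by default, so CharsetDecoder#decode(ByteBuffer) throws a CharacterCodingException instead of substituting a replacement character.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the lexer under review.
class StrictDecodeDemo {
    // Returns true if the bytes decode cleanly in the given charset,
    // false if the decoder rejects them.
    static boolean decodes(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] valid = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        // 0xC3 announces a two-byte sequence, but 0x28 is not a continuation byte.
        byte[] invalid = {(byte) 0xC3, (byte) 0x28};
        System.out.println(decodes(valid, StandardCharsets.UTF_8));   // prints "true"
        System.out.println(decodes(invalid, StandardCharsets.UTF_8)); // prints "false"
    }
}
```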
The charset argument defines the encoding of the string contents, not the encoding of the quotation marks themselves (which is defined by the lexer). This allows the contents of the string to use a character set other than that of the parent document. However, it must be a valid subset of Unicode whose encoded contents never erroneously contain a byte matching code point 34 (the double quote), for example as one byte of a multi-byte character. Some form of readHeredoc() could provide an alternative for this specific use case.
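To make the byte 34 restriction concrete, here is a small sketch (QuoteByteCheck and containsQuoteByte are hypothetical names). It checks whether the encoded form of a text contains the byte 0x22. In UTF-8 every byte of a multi-byte sequence has its high bit set, so 0x22 can only ever be a real quotation mark; in UTF-16, a character such as U+2200 encodes to the bytes 0x22 0x00 and would end the string early.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of the lexer under review.
class QuoteByteCheck {
    // True if the encoded form of text contains the byte 0x22 (code point 34).
    static boolean containsQuoteByte(String text, Charset charset) {
        for (byte b : text.getBytes(charset)) {
            if (b == 0x22) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String forAll = "\u2200"; // FOR ALL sign; contains no quotation mark
        // UTF-8 encodes it as E2 88 80.
        System.out.println(containsQuoteByte(forAll, StandardCharsets.UTF_8));    // prints "false"
        // UTF-16BE encodes it as 22 00.
        System.out.println(containsQuoteByte(forAll, StandardCharsets.UTF_16BE)); // prints "true"
    }
}
```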
String readString(
        Reader r,
        Charset charset)
        throws IOException
{
    ByteArrayOutputStream ostream = new ByteArrayOutputStream();
    int cp;
    while ((cp = r.read()) > 0) {
        if (cp == '"')
            break;
        ostream.write(cp);
    }
    return charset.newDecoder().decode(ByteBuffer.wrap(ostream.toByteArray()))
            .toString();
}
Reaching the end of the buffer before a matching quotation mark is not handled; I haven’t decided on the defined behavior for this, and its absence is known.
Solution
The only advice I can give you is to split the logic into multiple methods. This makes each method shorter and allows the code to be reused.
Personally, I see two other methods: one that copies the input up to the closing quote, and one that decodes the resulting buffer.
String readString(Reader r, Charset charset) throws IOException {
    ByteArrayOutputStream ostream = copyInputToStream(r);
    final ByteBuffer byteBuffer = ByteBuffer.wrap(ostream.toByteArray());
    return decodeBufferAsString(charset, byteBuffer);
}

private String decodeBufferAsString(Charset charset, ByteBuffer byteBuffer) throws CharacterCodingException {
    return charset.newDecoder().decode(byteBuffer).toString();
}

private ByteArrayOutputStream copyInputToStream(Reader reader) throws IOException {
    ByteArrayOutputStream ostream = new ByteArrayOutputStream();
    int cp;
    while ((cp = reader.read()) > 0) {
        if (cp == '"') {
            break;
        }
        ostream.write(cp);
    }
    return ostream;
}
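Beyond the split, two details of the loop may deserve a second look. Reader#read() returns a char value (0 to 65535) or -1 at end of input, and ByteArrayOutputStream#write(int) keeps only the low-order eight bits, so reading through a Reader can alter the bytes before they reach the decoder. The condition > 0 also stops at a NUL, not only at end of input. The sketch below is one possible variant, not a definitive implementation: it assumes the lexer can hand over an InputStream instead of a Reader, and it throws EOFException for an unterminated string purely as an illustration, since the question leaves that behavior open. The class name QuotedStringReader is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical variant; assumes the caller has already consumed the opening quote.
class QuotedStringReader {
    static String readString(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        int b;
        // InputStream#read() returns 0..255, or -1 at end of input,
        // so a NUL byte is kept as ordinary content.
        while ((b = in.read()) != -1) {
            if (b == '"') {
                // Closing quote found: decode strictly; malformed input
                // surfaces as CharacterCodingException (an IOException).
                return charset.newDecoder()
                        .decode(ByteBuffer.wrap(buffer.toByteArray()))
                        .toString();
            }
            buffer.write(b);
        }
        // Assumed policy for illustration only: treat a missing quote as an error.
        throw new EOFException("unterminated string literal");
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                "h\u00e9llo\" rest".getBytes(StandardCharsets.UTF_8));
        System.out.println(readString(in, StandardCharsets.UTF_8));
    }
}
```

Returning from inside the loop keeps the "quote found" and "input exhausted" outcomes apart, which the break-based version cannot distinguish once the loop has ended.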