# Javascript run length encoding algorithm

Posted on

Problem

I have implemented a custom run length encoding algorithm that converts an m x n array into an p x (n+1) array. The extra column is to record counts. Probably easier to show through examples:

``````encode = (array) => {
// crude array equality function
arrayEquals = (...arr) => {
return arr.map(x => {
return JSON.stringify(x);
}).every((x, _, a) => {
return x === a;
});
};

let result = [],
count = -1,
len = array.length;

array.reduce((acc, val, i) => {
if (!arrayEquals([acc, val])) {
// if current value differs from last
if (i !== len - 1) {
// push the count and data to the result
result.push([++count, acc]);
count = 0; // reset the count
} else { // last iteration
// push the (n - 1)th  value
result.push([++count, acc]);
// now push the nth value
result.push([1, val]);
}
} else {
// if current value equals the last, increment counter and continue
count++;
if (i === len - 1) { // last iteration
// push regardless
count === 1 ? result.push([++count, val]) : result.push([count, val]);
}
}
return val;
}, array);

return result;
};

console.log(encode([1, 1, 2, 2, 3, 3, 4]));
console.log(encode([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
[10, 11, 12]
]));
console.log(encode([
[1, 2, 3],
[1, 2, 3],
[4, 5, 6]
]));
console.log(encode([
,
,
,
[2, 3],
[4, 5, 6],
[],
[]
]));``````

I think I’ve tried the full range of common test cases and it has worked so far for all of them but I feel there must be a better way to implement this. The logic was awkward to write given the need to have knowledge of previous iterations and the current. I used reduce as it seemed to lend itself to this but even so, keeping track of when to increment was very difficult and doesn’t make for the most readable code. My hunch is that there’s a neat solution where one adds dummy elements to the beginning or end of the initial array and then can do away with the need for the (`if (i === len - 1)` logic)?

Any improvements would be much appreciated.

Solution

### Separation of concerns

The problem has two distinct, independent components:

• Compute run length
• Decide if two values are equal

I agree that using `reduce` makes sense here to compute the run length.
If you separate this computation from the equality decision,
the logic of the reducer can become a lot simpler:

``````  const reducer = (acc, val) => {
if (acc.length > 0 && equals(acc[acc.length - 1], val)) {
acc[acc.length - 1]++;
return acc;
}
acc.push([1, val]);
return acc;
};
return arr.reduce(reducer, []);
``````

That is:

• When the accumulator is not empty, and the last element has matching value,
then increment the count of the last element, and return the accumulator.

• Append `[1, val]` to the accumulator.

• The accumulator starts empty.

The equality condition is to be implemented.
You can even require the implementation to be passed in as parameter.

it’s good to get the run-length computation working without worrying about mixed value types.
The implementation of the `equals` function here is not interesting at all for run-length computation.
You can start with a simple `const equals = (a, b) => a === b`.
Having mixed values in the input doesn’t affect at all the run-length computation logic.

The converse is also true: it should be possible to implement the equality logic independently from run-length.
This reduces your mental burden and helps you think clearly,
focusing on one problem at a time.
I came up with this:

``````  const equals = (a, b) => {
if (a === b) {
return true;
}
if (typeof(a) !== typeof(b)) {
return false;
}
if (Array.isArray(a)) {
return arrayEquals([a, b]);
}
return false;
};
``````

This implementation handles some easy basic cases,
and delegates some non-trivial ones,
such as comparing arrays.
Your `arrayEquals` will work nicely for the purpose.
Looking at this implementation,
it’s also quite clear what kind of values are supported:
primitive and arrays only.
(As far as we can trust the externalized `arrayEquals` implementation…)

Output formatting is unreadable. It is extremely difficult to verify correct results.

`// crude array equality function` comment is not helpful because the function name tells me. But a comment about what `arrayEquality` equates on would be very helpful. I don’t see any point to saying “crude” rather what would be good is a comment to future you (and present-time me) of how you planned to crush the crudeness. Generally don’t diss the program or yourself with unhelpful put-down comments. We’re all learning. Reviewers here aren’t shy about offering up crude adjectives.

Yeah, we all memorized the `Array.reduce()` parameters, nonetheless that is no excuse for such bad names. Names should reflect the “array encoding problem domain.” For example `acc` is bewildering; what is it in the context of the “high level” idea of what this array represents?

Make informative, verbose output to aid development and troubleshooting. Put the “before” array with its encode result for example. Write independent functions so it’s easy to turn off if desired.

`++` and the other unary operators: I see a trend of the unary operator falling out of favor. I like them a lot but I’m just mentioning this because someone will tell you not to use them. My guideline is avoid ambiguous or confusing code.

`count = 0; // reset the count` – duuuhh.

``````array.reduce((acc, val, i) => {
``````

I think the program can be cleaned up a bit if the “acc” (accumulator) argument is the results array. This change will cause a chain reaction of program modification but I suggest you do a rewrite anyway to eliminate the end of array checks. Let the built-in iterator functions do that.

Rewrite thoughts

• Write a working program that handles only numbers at first. This will help you see how to structure the program and nest the iterations. Have faith that modifying for number-arrays is easily doable.

• Big picture: for each array element, iterate (forEach, every, et.al.) the array looking for duplicates – nested “for eaches”. ALWAYS ITERATE THE WHOLE ARRAY. Each number will find itself but starting the counter at -1 will take care of that. So there is no need for a “one too many” special case and no need to check for the array end.

• Get the program working correctly then refactor to handle number-array values.

• Wherever the actual equality compare happens is where you will check for a number-array vice a primitive number, I suspect. Make the equality code a separate function to keep the looping code clean.

• If you think of the primitive number values as single element arrays then the code overall may simplify even more. Also the function you just wrote above could now take two arrays to compare. Then there are no extra equality functions or extra equality code lying about or convoluted code in a single function. No specialized code for number/number, number/array, array/array, etc combinations. Just one simple, clean equality function.

• Might be able to use the spread operator to smooth handling of primitive/array combination?

• I assume “equal” means same length and same numbers in the same order. This makes it all dead simple.

• This equals method could be added to `Array.prototype` just for fun.

# Review and alternatives.

• Use strict mode. This should go without saying, however I add it as a point in reviews when I see bugs or potencial bugs that would be caught easily.
• Never use a variable that has not been declared. An example in your code is `encode` and if you were in strict mode you would have found it before you posted the review.
• Use constants when the variable does not change. In your example you declare `result` as a variable, and define it as an array. But you never change the variable. It should be declared as a constant. `const result = [];`
• Comments are bad. They clutter up the source and are seldom of any help. Only coders will read your code and stating the obvious helps no one, and creating ambiguity via comments is dangerous. See more on comments below
• Don’t clutter your source with redundant code, less is best.
• Don’t repeat. You have the variable `len` and the only time you use it you subtract 1 from it, repeating computations for no reason. Better to subtract the one when you define the value `const lastIdx = array.length;`
• When iterating use `for of` loops in favour of array methods as they are slightly quicker, can be exited early unlike many array methods, and tend to be more suited to the logic and need of the code.

## Don’t use JSON to compare.

• JSON.stringify is slow and should not be used unless you are saving or transporting data.
• JSON.stringify will throw on cyclic references, meaning your compare function can throw even though the code state is good.
• JSON.stringify converts undefined array `items` to null, meaning that is you compare `[undefined]` and `[null]` they will be equal.
• JSON.stringify can be customised and thus there is no guarantee that it accurately reflects the data content.

Run length encoding is about the equality of items. A more detailed comparison will make for a better encoder. See rewrites for two examples.

## Redundent code

``````   arrayEquals = (...arr) => {
return arr.map(x => {
return JSON.stringify(x);
}).every((x, _, a) => {
return x === a;
});
};
``````

The returns are not needed

``````const arrayEquals = (...arr) => arr.map(x => JSON.stringify(x)).every((x, _, a) => x === a);
``````

or

``````const arrayEquals = (...arr) => {
return arr
.map(x => JSON.stringify(x))
.every((x, _, a) => x === a);
};
``````

or

``````const arrayEquals = (...arr) => {
return arr.map(x => JSON.stringify(x)).every((x, _, a) => x === a);
};
``````

Comments are rare and only when the immediate code (within eye shoot) is lacking needed abstract context, or has side effects beyond the scope of the current context.

Comment are not checked or vetted, and can remain in source even when the code that is commented has changed.

This mean it is easy for comments to lie, and an old comment can make your intent ambiguous. This is very dangerous when you later make changes, what do you believe the comment or the code.

If you feel some code needs a comment, before adding a comment ask yourself “Can I change the naming to make the code more readable?” Most of the time the answer will be yes.

Example of a poor comment in your code

``````  // if current value differs from last
if (i !== len - 1) {
``````

The comment is confusing, `i` is an index, ` \ current val` infers a data value, and `\ last` is obscure so anyone reading the comment will still have to find `len` to workout what it is anyway

You could have commented

``````  if (i !== len - 1) { // is not last index

``````

or better let the code say it all

``````  if (i !== arrayLen - 1) {
``````

Personally I use `idx` for callback indexes, rather than `i` and resurve `i`, `j`, `k` for use in loops only.

Example of stating the obvious in your code

``````count = 0; // reset the count

``````

Does that code really need clarification?

## Rewrites

The following rewrites simplify the encoding, while making the comparison more robust.

There are two versions. The first is cyclic unsafe and will throw a range error if the array contains references to itself. The second uses a depth limit to stop endless recursion marking such items as equal.

Cyclic unsafe version

Example 2 cyclic safe.