Is this query-string parser bug-free?

Problem

I tried to implement a definitive, reliable URL query string parser that handles every corner case:

  • it tries to be efficient by avoiding regex
  • it takes full URLs or just query strings (as long as the query string begins with a question mark)
  • it ignores the hash value
  • it handles parameter names that occur more than once
  • it handles parameter names that equal built-in JavaScript methods and keywords

What do you think – did I miss something?

function parseURLParams(url) {
  if (url === null) return;

  var queryStart = url.indexOf("?") + 1,
      queryEnd   = url.indexOf("#") + 1 || url.length + 1,
      query      = url.slice(queryStart, queryEnd - 1);

  if (query === url || query === "") return;

  var params = {}, 
      nvPairs = query.replace(/\+/g, " ").split("&");

  for (var i=0; i<nvPairs.length; i++) {
    var nv = nvPairs[i],
        eq = nv.indexOf("=") + 1 || nv.length + 1,
        n  = decodeURIComponent( nv.slice(0, eq - 1) ),
        v  = decodeURIComponent( nv.slice(eq) );
    if ( n !== "" ) {
      if ( !Object.prototype.hasOwnProperty.call(params, n) ) {
        params[n] = [];
      }
      params[n].push(v);
    }
  }
  return params;
}

It returns an object of arrays for parsed URLs with query strings and undefined if a query string could not be identified.

I used this in an answer over at SO.

Solution

Is it bug free? No.

These two corner-cases have been missed:

  1. parameter values containing ‘=’, e.g. ‘example.com?foo==bar’ (double equals) or ‘?foo=k=v’
  2. cannot handle parameters called ‘toString’ and ‘valueOf’ (amongst others.)

The first may well count as a malformed URL, but Chrome handles it and passes == through unencoded in location.search. To handle this, go back to basic indexOf usage and cut each pair at the first = only, so that any later = stays part of the value.
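A minimal sketch of that indexOf approach, for illustration only. The helper name splitPair is hypothetical and not part of the original code; it works on one raw name/value pair before any decoding.

```javascript
// Hypothetical helper: split one "name=value" pair at the FIRST "=" only.
function splitPair(pair) {
  var eq = pair.indexOf("=");
  if (eq === -1) {
    // No "=" at all: treat the whole pair as a name with an empty value.
    return [pair, ""];
  }
  // Everything after the first "=" belongs to the value, further "=" included.
  return [pair.slice(0, eq), pair.slice(eq + 1)];
}

console.log(splitPair("foo==bar")); // [ 'foo', '=bar' ]
console.log(splitPair("foo=k=v"));  // [ 'foo', 'k=v' ]
console.log(splitPair("flag"));     // [ 'flag', '' ]
```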

The second problem’s just pedantic really. You could try and work around it using !params.hasOwnProperty(n) instead of !(n in params), but you’ll still get stuck if someone passes a parameter called hasOwnProperty. The only way I see around this is to fall back to some dire array-based collection populated something like:

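var keys = [], params = [];
for (...) {
    var n = ..., v = ...;
    var i = keys.indexOf(n);
    if (i >= 0) {
        // key seen before: promote the stored value to an array, then append
        if (!(params[i] instanceof Array)) {
            params[i] = [params[i]];
        }
        params[i].push(v);
    } else {
        // first occurrence: push both so keys[j] and params[j] stay aligned
        keys.push(n);
        params.push(v);
    }
}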

I guess you’d then have to resort to returning an array of arrays rather than an object, i.e. each element of the returned array would be either [key, value] or [key, [values]]. Clients might find it easier to work with if you returned something like [key, value1, value2, ...] instead, which also caters nicely for properties without values.
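One way the [key, value1, value2, ...] shape could look, as a sketch only. The function name parseToEntries is hypothetical, it assumes a bare query string without the leading question mark, and it relies on Array.prototype.indexOf, so it needs an ES5 environment.

```javascript
// Hypothetical sketch: returns [[key, value1, value2, ...], ...].
// Parameter names are never used as object properties, so names such as
// "toString" or "hasOwnProperty" are treated like any other key.
function parseToEntries(query) {
  var keys = [], entries = [];
  var pairs = query.replace(/\+/g, " ").split("&");
  for (var p = 0; p < pairs.length; p++) {
    var pair = pairs[p];
    var eq = pair.indexOf("=");
    var n = decodeURIComponent(eq === -1 ? pair : pair.slice(0, eq));
    if (n === "") continue;            // skip empty names, as the original does
    var i = keys.indexOf(n);
    if (i === -1) {
      // First occurrence: keys[j] and entries[j] always share an index.
      keys.push(n);
      entries.push([n]);
      i = entries.length - 1;
    }
    if (eq !== -1) {
      // Only pairs that contain "=" contribute a value.
      entries[i].push(decodeURIComponent(pair.slice(eq + 1)));
    }
  }
  return entries;
}

console.log(parseToEntries("a=1&toString=x&a=2&flag"));
// [ [ 'a', '1', '2' ], [ 'toString', 'x' ], [ 'flag' ] ]
```

A parameter without a value shows up as a one-element entry, which is the case the flat shape handles more naturally than [key, [values]].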

You could also do a null check on the url argument, because without one the following call will throw an exception:

parseURLParams(null);

Seems a tiny bit over-engineered. Something like this should work just as well, and addresses searlea’s points in his answer:

function parseURLParams(url) {
  var out = {}; 
  (url.split('?')[1] || url).split('#')[0].split('&').forEach(function(p) { 
    var kv = p.match(/([^=]*)=?(.*)/), 
        k = decodeURIComponent(kv[1]), 
        v = decodeURIComponent(kv[2] || ''); 
    hasOwnProperty.call(out, k) ? out[k].push(v) : out[k] = [v]; 
  });
  return out;
}

The regex match is only needed if you want to support equals signs in values, otherwise you can use split and the indices 0 and 1.
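To make that trade-off concrete, here is a small comparison on a pair whose value contains an equals sign. The variable names are only for illustration.

```javascript
var pair = "foo=k=v";

// Regex form: the first group stops at the first "=", the second takes the rest.
var kv = pair.match(/([^=]*)=?(.*)/);
console.log(kv[1], kv[2]);        // foo k=v

// Plain split: every "=" is a separator, so index 1 holds only part of the value.
var parts = pair.split("=");
console.log(parts[0], parts[1]);  // foo k
```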

The main (only?) difference is that pretty much any string will be treated as a viable query — if there are no equals signs or ampersands, it’s a query with just one key and no value.
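The behaviour is easiest to see by looking at the extraction step in isolation. The sketch below copies the expression from the shorter function into a hypothetical helper, extractQuery, purely so its result can be printed.

```javascript
// Same extraction expression as in the shorter function, wrapped in a
// hypothetical helper so its result can be inspected on its own.
function extractQuery(url) {
  return (url.split("?")[1] || url).split("#")[0];
}

console.log(extractQuery("http://example.com/?a=1#top")); // a=1
console.log(extractQuery("a=1&b=2"));                     // a=1&b=2
console.log(extractQuery("http://example.com/"));         // http://example.com/
```

With the last input the whole URL ends up as a single key with an empty value, whereas the original function returns undefined for a string without a question mark.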
