Problem
I made code to shuffle an array in a semi-random way. That is, it has to be random based on a string, but for the same string, it should return the same order.
Is there a better way of doing it?
function djbHash(s) {
let hash = 5381;
for (let c of s) {
hash = hash * 33 + c.charCodeAt(0);
}
return hash;
}
function mapToValues(s, values) {
return values[djbHash(s) % values.length];
}
var randomArray=[];
var myArray = [0,1,2,3,4,5,6,7,8,9];
var string = "bigcat";
var length = myArray.length;
for(var i=0;i<length;i++){
console.log(myArray)
var item = mapToValues(string, myArray);
var removed = myArray.splice(myArray.indexOf(item),1);
console.log(removed);
randomArray.push(removed[0]);
}
console.log(randomArray);
Solution
Random numbers, hashes and precision.
Good hasher.
The hash function you use is very poor with it only producing unique hash values for the first 8 characters of a string.
The following words all produce the same hash
"characters","charactesr","charactezz" = 8246171808294230000
The reason is that numbers when used in computers have a limited precision. Once you go over that precision you can no longer add numbers that are smaller.
For example adding 100 to the hash number does nothing the num+100 === num
returns true.
var num = 8246171808294230000
console.log(num + 100 === num); // >> true
To generate a good hash you need to limit the size of the hash number. This can easily be done by just using integer math and bitwise manipulation.
An improvement (not perfect) hasher
function stringTo32BitHash(str){
const a = 14827; // Use a prime
var v = 0;
for(var i = 0; i < str.length; i += 1){
v = ((v*a) + str.charCodeAt(i))%0xFFFFFFFF;
}
return v;
}
Pseudo random shuffle.
Next your shuffle is not working very well. You are not providing enough variance in the selection of which array items to reorder. With very different hashes producing the very same shuffle.
- For hash 8.97895922819539e+21 the result is 4,1,0,2,8,9,3,6,5,7
- For hash 8.978959228225448e+21 the result is 4,1,0,2,8,9,3,6,5,7
The best way to use a hash to scramble a list is to use the hash as a seed for a pseudo random number generator. The seed ensures that the sequence of random numbers will be the same for the same seed. Unfortunately javascript does not provide a seed value that we can access for it’s Math.random()
function so we have to create our own.
The most basic pseudo random number generator is seed = (A * seed + B) % M where A,B,M are large integers. Each time the function is called a new seed is generated and return as a random value. The seed is also used as the seed for the next value.
Once you have a good sequence of random values you can use them to create a better shuffle.
Putting it all together.
Thus a rewrite of your code with a random number generator to do the shuffling and an improved string to hash function.
// creates a random number generator function.
function createRandomGenerator(seed){
const a = 5486230734; // some big numbers
const b = 6908969830;
const m = 9853205067;
var x = seed;
// returns a random value 0 <= num < 1
return function(seed = x){ // seed is optional. If supplied sets a new seed
x = (seed * a + b) % m;
return x / m;
}
}
// function creates a 32bit hash of a string
function stringTo32BitHash(str){
var v = 0;
for(var i = 0; i < str.length; i += 1){
v += str.charCodeAt(i) << (i % 24);
}
return v % 0xFFFFFFFF;
}
// shuffle array using the str as a key.
function shuffleArray(str,arr){
var rArr = [];
var random = createRandomGenerator(stringTo32BitHash(str));
while(arr.length > 1){
rArr.push(arr.splice(Math.floor(random() * arr.length), 1)[0]);
}
rArr.push(arr[0]);
return rArr;
}
The need to test.
At this point I am 60% happy but by just looking at the code you can not tell how reliable it is. You need to test the function over a large set of values.
I have created two test.
Testing the random number generator.
This is a simple test, and rather than test it for standard deviation I do a visual test by graphing the distribution of the random numbers generated. I compare them to the inbuilt random function. It can give a good indication of the randomness. Why? because it’s more fun…
var createImage=function(w,h){var i=document.createElement("canvas");i.width=w;i.height=h;i.ctx=i.getContext("2d");return i;}
var canvas = createImage(512,200);
document.body.appendChild(canvas);
var ctx = canvas.ctx;
// creates a random number generator function.
function createRandomGenerator(seed){
const a = 54862307348; // some big numbers
const b = 69089698305;
const m = 98532050673;
var x = seed;
// returns a random value 0 <= num < 1
return function(seed = x){ // seed is optional. If supplied sets a new seed
x = (seed * a + b) % m;
return x / m;
}
}
ctx.font="18px arial"
var rand = createRandomGenerator(10394);
var values = new Array(canvas.width);
var valuesR = new Array(canvas.width);
for(var i = 0; i < canvas.width; i ++){
values[i]=0;
valuesR[i]=0;
}
function testRandom(){
for(var i = 0; i < 1000; i ++){
var r = Math.floor(rand()*canvas.width);
var r1 = Math.floor(Math.random()*canvas.width);
values[r] += 1;
valuesR[r1] += 1;
}
var max = 0;
for(var i = 0; i < canvas.width; i ++){
max = Math.max(max,values[i],valuesR[i]);
}
ctx.clearRect(0,0,512,200)
ctx.strokeStyle = "blue"
ctx.beginPath();
ctx.moveTo(0,canvas.height - (values[0]/max)*canvas.height);
for(var i = 1; i < canvas.width; i ++){
ctx.lineTo(i,canvas.height - (values[i]/max)*canvas.height);
}
ctx.stroke();
ctx.strokeStyle = "red"
ctx.beginPath();
ctx.moveTo(0,canvas.height - (valuesR[0]/max)*canvas.height);
for(var i = 1; i < canvas.width; i ++){
ctx.lineTo(i,canvas.height - (valuesR[i]/max)*canvas.height);
}
ctx.stroke();
ctx.fillStyle = "red";
ctx.fillText("Red builtin Math.random()",10,canvas.height - 40)
ctx.fillStyle = "Blue";
ctx.fillText("Blue custom random function",10,canvas.height - 20)
setTimeout(testRandom,100);
}
testRandom()
note You would need to do a proper test if you were to use this function in an enterprise situation. It looks good does not cut it in the real world.
Test the shuffle and hash
Next I test the number of repeated shuffled array results for unique words, and also test the hash function to see how often it repeats a hash. There is no perfect solution but the aim it to keep the number of matches as low as possible.
Note that if the test was left to run indefinitely the number of matches as a fraction of unique words would start to approch one. This is normal. The test counts how many unique words are required to end up with 200 matched shuffled arrays then displays the score as number of words. The greater the value the better the shuffle. For a perfect result and 10 array items you would expect a score of about 38141.
// creates a random number generator function.
function createRandomGenerator(seed){
const a = 5486230734; // some big numbers
const b = 6908969830;
const m = 9853205067;
var x = seed;
// returns a random value 0 <= num < 1
return function(seed = x){ // seed is optional. If supplied sets a new seed
x = (seed * a + b) % m;
return x / m;
}
}
// function creates a 32bit hash of a string
function stringTo32BitHash(str){
var v = 0;
for(var i = 0; i < str.length; i += 1){
v += str.charCodeAt(i) << (i % 24);
}
return v % 0xFFFFFFFF;
}
// shuffle array using the str as a key.
function shuffleString(str,arr){
var rArr = [];
var random = createRandomGenerator(stringTo32BitHash(str));
while(arr.length > 1){
rArr.push(arr.splice(Math.floor(random() * arr.length), 1)[0]);
}
rArr.push(arr[0]);
return rArr;
}
var resultsEl = document.createElement("div");
document.body.appendChild(resultsEl);
const chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
const array = [0,1,2,3,4,5,6,7,8,9];
// for an array of ten unquie values
function randomWord(maxLen){
var wLen = Math.floor(Math.random() * (maxLen-7)) + 7;
var str = "";
while(wLen-- >0){
str += chars[Math.floor(Math.random()*chars.length)];
}
return str;
}
var count = 0;
const resultSet = new Set();
const wordSet = new Set();
const hashSet = new Set();
var matches = 0;
var hashMatches = 0;
function testASet(){
var c =0;
while(c++ < 1000){
var word = randomWord(10);
if(!wordSet.has(word)){
wordSet.add(word);
var hash = stringTo32BitHash(word)
if(hashSet.has(hash)){
hashMatches += 1;
}else{
hashSet.add(hash);
}
var r = shuffleString(word,[...array]).join("");
if(resultSet.has(r)){
matches += 1;
if(matches === 200){
break;
}
}else{
resultSet.add(r);
}
count++;
}
}
resultsEl.innerHTML = "Matches : " + matches + "<br>Hash matches : " + hashMatches+"<br> Unique words : " + count;
if(matches === 200){
resultsEl.innerHTML += "<br>Test complete. Score : "+ count + " higher is better ";
resultsEl.innerHTML += "<br>"+((count / 38141)*100).toFixed(0) + "% of ideal score.";
}else{
setTimeout(testASet,350);
}
}
testASet()
The results of the test indicate that the function shuffle string is only 9% of the ideal situation. But if you look at the numbers you will see that there is a high rate of hash clashes. Finding a better hash function will improve the score, also I used words from 7 -10 characters long, which puts the function at a slight disadvantage.
If you are wondering how I got the ideal score, I did it with the following test that simulated the ideal shuffle.
Improvements.
In the end the weak point is the hash generator. Using a 64bit hash will improve the result. But that will not get much more. Using the characters of the word as an integer will give a perfect hash, but then you will have numbers with bit counts as long as the string * 8, and the random number generator would have to be able to accommodate these large integers as seeds. It leaves the question, how many bits long does the number type have to be for the fnction to get within 99% of ideal?
You are unnecessarily running the djbHash(s)
function myArray.length
many times while in your example, it always return the same value which is 6953352744047
. I would advise you to call the djbHash(s)
function only once within the body of your code and pass the result of it to the mapToValues(s, values)
function instead of the string argument.