Checking whether two lists contain the same elements

Posted on

Problem

I have two lists of numbers:

{i1,i2,i3,i4}{i1,i2,i3,i4}

and

{j1,j2,j3,j4}{j1,j2,j3,j4}

The problem I want to solve is: Inside a loop, if the following two conditions are satisfied, then it will do something I want, otherwise it just cycle the loop. The conditions are:

  1. Both lists contain four different numbers;
  2. If ignoring the ordering, in other words, we view these two lists as two set, then they are the same.

I want to find a very fast way to get the answer to the problem. I have very slow Fortran code as follows:

if (i2==i1) cycle
if (i3==i1) cycle
if (i3==i2) cycle
if (i4==i1) cycle
if (i4==i2) cycle
if (i4==i3) cycle

if (j2==j1) cycle
if (j3==j1) cycle
if (j3==j2) cycle
if (j4==j1) cycle
if (j4==j2) cycle
if (j4==j3) cycle 
q1=0
q2=0
q3=0
q4=0
if (i1==j1)q1=1
if (i1==j2)q1=2
if (i1==j3)q1=3
if (i1==j4)q1=4
if (i2==j1)q2=1
if (i2==j2)q2=2
if (i2==j3)q2=3
if (i2==j4)q2=4
if (i3==j1)q3=1
if (i3==j2)q3=2
if (i3==j3)q3=3
if (i3==j4)q3=4
if (i4==j1)q4=1
if (i4==j2)q4=2
if (i4==j3)q4=3
if (i4==j4)q4=4

if (q1*q2*q3*q4==24)then
 Something I want to do
endif

I think to find a good program, the following should also be considered as well as cutting down the number of instructions:

  1. The probability to have four different numbers in both lists are about
    1/10;
  2. The probability to have for these two lists containing two same set
    of numbers are very small, about 10^(-11);
  3. I really have to do many loops.

Could someone help me out? I think a good idea can really boost the code.

Solution

The only way to reduce the number of operations is to sort one of the arrays. But 4 elements is hardly worth it.

Perhaps when adding the numbers to the array then add them in a sorted order, otherwise quicksort is good for arrays under 10 elements in size.

Performing a binary search on a sorted array would give you O(log n) time for each search – with O(n log n) time to compare two full arrays of the same size. Quicksort is O(n log n), with binary search that’s roughly O(2[n log n]).

Adding items in a sorted order, then searching, would be a similar complexity: O(n log n) for adding ‘n’ items, and O(n log n) for searching through n items. I prefer this solution because it’s one simple algorithm, and it’s cleaner.

Though I think the problem here is not speed, but number of code lines.
For short code arrays would be best.

As extra in pseudo-code short (readable) code (with arrays):

// Determine equality between i[] and j[].
dim k(1 to 4)     // Indices into j[].
for m = 1 to 4
    k[m] = m
for m = 1 to 4
    found = false
    for n = m to 4
        if (i[m] == j[k[n]])
            found = true
            t = k[m]
            k[m] = k[n]
            break
    next n
    if !found fail
next m

One can roll this out using variables:

// (First version.)
// Determine equality between {i1, i2, i3, i4}] and {j1, j2, j3, j4}.
if i1 != j1
    if i1 != j2
        if i1 != j3
            if i1 != j4
                cycle
            else
                j4 = j1
        else
            j3 = j1
    else
        j2 = j1
if i2 != j2
    if i2 != j3
        if i2 != j4
            cycle
        else
            j4 = j2
    else
        j3 = j2
if i3 != j3
    if i3 != j4
        cycle
    else
        j4 = j3
if i4 != j4
    cycle
success

Which is the same as:

// (Second version.)
// Determine equality between {i1, i2, i3, i4}] and {j1, j2, j3, j4}.
if i1 == j4
    j4 = j1
else if i1 == j3
    j3 = j1
else if i1 == j2
    j2 = j1
else if i1 != j1

if i2 = j4
    j4 = j2
else if i2 = j3
    j3 = j2
eise if i2 != j2
    cycle

if i3 = j4
    j4 = j3
else if i3 != j3
    cycle

if i4 != j4
    cycle

Checking for no duplicates should be done first, and can be for instance:

if ((i1-i2)*(i1-i3)*(i1-i4)*(i2-i3)*(i2-i4) == 0) cycle

(Disregarding overflow of multiplication to 0!)

As code review

The array code version I gave, as that is reminiscant to your use of q1, q2, q3, q4: a kind of indexing. I would use primes in order to prevent a mix-up with 1*4=4 and 2*2=4. (Or OR-ing of bit masks.)

For the rest the code has more the feel of assembly language (my code included).

The main optimisation would be in the prelude to this phase in coding: keeping a set, a sorted unique array.

Leave a Reply

Your email address will not be published. Required fields are marked *