Problem

This is the original question

https://www.geeksforgeeks.org/union-and-intersection-of-two-sorted-arrays-2/

Given two sorted arrays, find their union and intersection.

`Example: Input : arr1[] = {1, 3, 4, 5, 7} arr2[] = {2, 3, 5, 6} Output : Union : {1, 2, 3, 4, 5, 6, 7} Intersection : {3, 5} Input : arr1[] = {2, 5, 6} arr2[] = {4, 6, 8, 10} Output : Union : {2, 4, 5, 6, 8, 10} Intersection : {6}`

I also added one more case of finding items which are only in one of the two arrays and called it *Diff*.

Please review for performance.

**Please do not comment** about code in the same class as the test and the functions not being static. It is just faster for me like this to get to the point of the exercise.

```
using System.Collections.Generic;
using Microsoft.VisualStudio.TestTools.UnitTesting;
namespace ArrayQuestions
{
/// <summary>
/// https://www.geeksforgeeks.org/union-and-intersection-of-two-sorted-arrays-2/
/// </summary>
[TestClass]
public class UnionAndIntersectionOfTwoSortedArrays2
{
[TestMethod]
public void UnionTest()
{
int[] arr1 = { 1, 3, 4, 5, 7 };
int[] arr2 = { 2, 3, 5, 6 };
int[] union = { 1, 2, 3, 4, 5, 6, 7 };
CollectionAssert.AreEqual(union, Union(arr1, arr2));
}
[TestMethod]
public void IntersectionTest()
{
int[] arr1 = { 1, 3, 4, 5, 7 };
int[] arr2 = { 2, 3, 5, 6 };
int[] intersection = { 3, 5 };
CollectionAssert.AreEqual(intersection, Intersection(arr1, arr2));
}
[TestMethod]
public void DiffTest()
{
int[] arr1 = { 1, 3, 4, 5, 7 };
int[] arr2 = { 2, 3, 5, 6 };
int[] diff = { 1, 2, 4, 6, 7 };
CollectionAssert.AreEqual(diff, Diff(arr1, arr2));
}
private int[] Diff(int[] arr1, int[] arr2)
{
int i = 0;
int j = 0;
int n = arr1.Length;
int m = arr2.Length;
List<int> list = new List<int>();
while (i < n && j < m)
{
if (arr1[i] == arr2[j])
{
i++;
j++;
}
else if (arr1[i] < arr2[j])
{
list.Add(arr1[i]);
i++;
}
else
{
list.Add(arr2[j]);
j++;
}
}
while (i < n)
{
list.Add(arr1[i]);
i++;
}
while (j < m)
{
list.Add(arr2[j]);
j++;
}
return list.ToArray();
}
private int[] Intersection(int[] arr1, int[] arr2)
{
int i = 0;
int j = 0;
int n = arr1.Length;
int m = arr2.Length;
List<int> list = new List<int>();
while (i < n && j < m)
{
if (arr1[i] == arr2[j])
{
list.Add(arr1[i]);
i++;
j++;
}
else if (arr1[i] < arr2[j])
{
i++;
}
else
{
j++;
}
}
return list.ToArray();
}
public int[] Union(int[] arr1, int[] arr2)
{
int i = 0;
int j = 0;
int n = arr1.Length;
int m = arr2.Length;
List<int> list = new List<int>();
while (i < n && j < m)
{
if (arr1[i] < arr2[j])
{
list.Add(arr1[i]);
i++;
}
else if (arr2[j] < arr1[i])
{
list.Add(arr2[j]);
j++;
}
else // equals
{
list.Add(arr1[i]);
i++;
j++;
}
}
//handle the rest
for (; i < n; i++)
{
list.Add(arr1[i]);
}
for (; j < m; j++)
{
list.Add(arr2[j]);
}
return list.ToArray();
}
}
}
```

Solution

- You can optimize all 3 methods if you initialize
`list`

‘s capacity to the longest of the two arrays. Resizing a list involves allocating a new internal array and copying old items into the new array, which is something to keep in mind if you care about performance. - Your tests often contain small pieces of manually crafted data. That’s not a very scalable approach. It’s easy to generate large amounts of test data with a bit of Linq:
`Enumerable.Range(0, 1000).Select(i => random.Next(0, 1000)).OrderBy(n => n).ToArray()`

gives you an input array. Verification can be done with a few loops (is every number in`arr1`

and`arr2`

present in`union`

, and is every number in`union`

present in either`arr1`

or`arr2`

?). For the duplicate-handling implementation verification is even easier with a combination of Linq’s`Union`

,`Intersect`

and`OrderBy`

methods. - Names like
`i`

,`j`

,`n`

and`m`

are not very descriptive – specifically which array they’re associated with is not clear from their names. Changing them to something like`index1`

,`index2`

,`length1`

and`length2`

will make that relationship clear. - If you’re storing array lengths in local variables, then why not also store the results of
`arr1[i]`

and`arr2[j]`

in local variables before comparing them? Both of these are micro-optimizations that affect the readability of the code, so I would only do this for performance-sensitive code, and only when profiling shows that it’s actually an improvement.