Extract index of first unique element in large array in Swift

Problem

I’m using the following code to return the first index of a unique character in a large String. It works fine until I get to large strings, where it times out.

Is there a faster way to get the index of the first unique character using NSCountedSet?

Update

The string contains 25,000 characters. I refactored the original post to extract the unique chars, then cycle through the array and check whether each element is contained in the uniqueChars array. It’s a little faster, but not fast enough to pass LeetCode’s timer.

func firstUniqChar(_ s: String) -> Int {
    guard Set(s.characters).count > 0 && s.characters.count > 0 else { return -1 }
    let stringArray = s.characters.map({String($0)})

    let countedSet = NSCountedSet(array: stringArray)

    var uniqueChars: [String] = []

    for char in countedSet {
        if countedSet.count(for: char) == 1 {
            uniqueChars.append(String(describing: char))
        }
    }

    for index in 0..<stringArray.count {
        if uniqueChars.contains(stringArray[index]) {
            return index
        }
    }

    return -1
}

Solution

Your initial test

guard Set(s.characters).count > 0 && s.characters.count > 0 else { return -1 }

is not needed: the remaining code already handles the case of an
empty string.

Determining the unique characters from countedSet can be done more
simply with a filter operation instead of a for loop:

let uniqueChars = countedSet.filter { 
    countedSet.count(for: $0) == 1
} as! [String]

But actually that list is not needed at all because all you have to do
in the final loop is to find the first character which has a count
of one. The function then looks like this:

func firstUniqChar(_ s: String) -> Int {
    let stringArray = s.characters.map({String($0)})
    let countedSet = NSCountedSet(array: stringArray)
    for index in 0..<stringArray.count {
        if countedSet.count(for: stringArray[index]) == 1 {
            return index
        }
    }
    return -1
}

which is simpler and a bit faster than the original one.

This can be improved further by avoiding both the conversion of each
character to a string and the intermediate array, and operating on
the UTF-16 view of the given string directly:

func firstUniqChar(_ s: String) -> Int {
    let countedSet = NSCountedSet()
    for char in s.utf16 {
        countedSet.add(char)
    }
    for (index, char) in s.utf16.enumerated() {
        if countedSet.count(for: char) == 1 {
            return index
        }
    }
    return -1
}
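
One thing to keep in mind with the UTF-16 view: the returned index
counts UTF-16 code units, not Characters. The two agree for strings
like the a...z test input, but can differ once a string contains
characters outside the Basic Multilingual Plane. A small illustration,
assuming Swift 4.2 or later (the `sample` string and the constant
names are made up for this sketch):

```swift
// Hypothetical sample: U+1F600 lies outside the Basic Multilingual Plane,
// so it occupies two UTF-16 code units but is a single Character.
let sample = "\u{1F600}a"

let characterCount = sample.count      // 2 Characters
let utf16Count = sample.utf16.count    // 3 UTF-16 code units

// Offset of "a" (code unit 0x61) in each view; -1 would mean "not found".
let characterOffset = Array(sample).firstIndex(of: "a") ?? -1
let utf16Offset = Array(sample.utf16).firstIndex(of: 0x61) ?? -1

print(characterCount, utf16Count)      // prints "2 3"
print(characterOffset, utf16Offset)    // prints "1 2"
```

If a Character-based index is required for such input, the counting
would have to be keyed on Character rather than UInt16.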

NSCountedSet is from the Foundation library and works with
NSObject instances. The previous method works because the
UInt16 value is automatically wrapped into an object when
added to the counted set. This conversion can be avoided by
using a native Swift dictionary instead, which makes the
code much faster:

func firstUniqChar(_ s: String) -> Int {
    // Map from character to number of occurrences:
    var counts: [UInt16: Int] = [:]

    for char in s.utf16 {
        if let cnt = counts[char] {
            counts[char] = cnt + 1
        } else {
            counts[char] = 1
        }
    }
    for (index, char) in s.utf16.enumerated() {
        // Force-unwrapping is safe: every code unit was counted above.
        if counts[char]! == 1 {
            return index
        }
    }
    return -1
}

Benchmarks. Test code:

let s = String(repeating: "abcdefghijklmnopqrstuvwxy", count: 1000) + "z" + String(repeating: "abcdefghijklmnopqrstuvwxy", count: 1000)
print(s.characters.count) // 50001

let start = Date()
let i = firstUniqChar(s)
let end = Date()

print(i, end.timeIntervalSince(start))
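
A single Date-based measurement like this can be noisy. As a rough
sketch, one could repeat the call a few times and keep the shortest
time. The helper name `bestTime` and its `runs` parameter are
hypothetical and not part of the original test code; the
`firstUniqChar` shown is the compact dictionary-based variant from
this answer, repeated only to keep the example self-contained. The
figures reported after this aside come from the single-shot test code
above, not from this helper.

```swift
import Foundation

// Dictionary-based variant from this answer, repeated for self-containment.
func firstUniqChar(_ s: String) -> Int {
    var counts: [UInt16: Int] = [:]
    for char in s.utf16 {
        counts[char] = (counts[char] ?? 0) + 1
    }
    let index = s.utf16.enumerated()
        .first(where: { counts[$0.element]! == 1 })?
        .offset
    return index ?? -1
}

// Hypothetical helper: calls `body` `runs` times and returns the last
// result together with the shortest observed wall-clock time in seconds.
func bestTime<T>(runs: Int = 5, _ body: () -> T) -> (result: T, seconds: TimeInterval) {
    precondition(runs > 0, "runs must be positive")
    var best = TimeInterval.infinity
    var lastResult: T?
    for _ in 0..<runs {
        let start = Date()
        lastResult = body()
        best = min(best, Date().timeIntervalSince(start))
    }
    return (lastResult!, best)
}

let block = "abcdefghijklmnopqrstuvwxy"
let s = String(repeating: block, count: 1000) + "z" + String(repeating: block, count: 1000)

let timing = bestTime { firstUniqChar(s) }
print(timing.result, timing.seconds)   // the index is 25000; the time varies by machine
```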

Results (on a 3.5 GHz Intel Core i5 iMac, compiled in Release
configuration):

Your original function:  0.084 sec
First improvement:       0.058 sec
Second improvement:      0.014 sec
Last function:           0.003 sec

The last method can be more compactly written as

func firstUniqChar(_ s: String) -> Int {
    // Map from character to number of occurrences:
    var counts: [UInt16: Int] = [:]

    for char in s.utf16 {
        counts[char] = (counts[char] ?? 0) + 1
    }

    let index = s.utf16.enumerated()
        .first(where: { counts[$0.element]! == 1 })?
        .offset
    return index ?? -1
}

without changing the performance.
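
On Swift 4 or later, the counting step could also be written with the
dictionary's `subscript(_:default:)`, which updates the count in place.
This is only a sketch under that assumption: the function name
`firstUniqCharDefaultSubscript` is made up here, and this variant was
not part of the benchmark above.

```swift
// Hypothetical variant (assumes Swift 4+): same algorithm as the
// dictionary-based function above, using the default-value subscript.
func firstUniqCharDefaultSubscript(_ s: String) -> Int {
    // Map from UTF-16 code unit to number of occurrences.
    var counts: [UInt16: Int] = [:]
    for unit in s.utf16 {
        counts[unit, default: 0] += 1
    }
    // Return the offset of the first code unit that occurs exactly once.
    for (offset, unit) in s.utf16.enumerated() where counts[unit] == 1 {
        return offset
    }
    return -1
}

print(firstUniqCharDefaultSubscript("loveleetcode"))  // prints "2"
print(firstUniqCharDefaultSubscript("aabb"))          // prints "-1"
```

The second loop compares the optional lookup result with 1 directly,
so no force-unwrap is needed.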
