Problem
I’m using the following code to return the index of the first unique character in a String. It works fine for small inputs, but it times out on large strings.
Is there a faster way to get the unique character’s index using NSCountedSet?
Update
The string contains 25,000 characters. I refactored the original post to extract the unique characters first, then loop over the array and check whether each element is contained in the uniqueChars array. It’s a little faster, but not fast enough to pass LeetCode’s time limit.
func firstUniqChar(_ s: String) -> Int {
    guard Set(s.characters).count > 0 && s.characters.count > 0 else { return -1 }
    let stringArray = s.characters.map({String($0)})
    let countedSet = NSCountedSet(array: stringArray)
    var uniqueChars: [String] = []
    for char in countedSet {
        if countedSet.count(for: char) == 1 {
            uniqueChars.append(String(describing: char))
        }
    }
    for index in 0..<stringArray.count {
        if uniqueChars.contains(stringArray[index]) {
            return index
        }
    }
    return -1
}
Solution
Your initial test
guard Set(s.characters).count > 0 && s.characters.count > 0 else { return -1 }
is not needed: the remaining code already handles the case of an
empty string.
Determining the unique characters from countedSet can be done more simply
with a filter operation instead of a for loop:
let uniqueChars = countedSet.filter {
    countedSet.count(for: $0) == 1
} as! [String]
But actually that list is not needed at all because all you have to do
in the final loop is to find the first character which has a count
of one. The function then looks like this:
func firstUniqChar(_ s: String) -> Int {
    let stringArray = s.characters.map({String($0)})
    let countedSet = NSCountedSet(array: stringArray)
    for index in 0..<stringArray.count {
        if countedSet.count(for: stringArray[index]) == 1 {
            return index
        }
    }
    return -1
}
which is simpler and a bit faster than the original one.
This can be improved further by avoiding both the conversion of each
character to a string and the intermediate array, and operating on the
UTF-16 view of the given string directly:
func firstUniqChar(_ s: String) -> Int {
    let countedSet = NSCountedSet()
    for char in s.utf16 {
        countedSet.add(char)
    }
    for (index, char) in s.utf16.enumerated() {
        if countedSet.count(for: char) == 1 {
            return index
        }
    }
    return -1
}
NSCountedSet is from the Foundation library and works with NSObject
instances. The previous method works because the UInt16 value is
automatically wrapped into an object when added to the counted set.
This conversion can be avoided by using a native Swift dictionary
instead, which makes the code much faster:
func firstUniqChar(_ s: String) -> Int {
    // Map from character to number of occurrences:
    var counts: [UInt16: Int] = [:]
    for char in s.utf16 {
        if let cnt = counts[char] {
            counts[char] = cnt + 1
        } else {
            counts[char] = 1
        }
    }
    for (index, char) in s.utf16.enumerated() {
        if counts[char]! == 1 {
            return index
        }
    }
    return -1
}
Benchmarks. Test code:
let s = String(repeating: "abcdefghijklmnopqrstuvwxy", count: 1000) + "z" + String(repeating: "abcdefghijklmnopqrstuvwxy", count: 1000)
print(s.characters.count) // 50001
let start = Date()
let i = firstUniqChar(s)
let end = Date()
print(i, end.timeIntervalSince(start))
Results (on a 3.5 GHz Intel Core i5 iMac, compiled in Release
configuration):
Your original function: 0.084 sec
First improvement: 0.058 sec
Second improvement: 0.014 sec
Last function: 0.003 sec
The last method can be written more compactly as
func firstUniqChar(_ s: String) -> Int {
    // Map from character to number of occurrences:
    var counts: [UInt16: Int] = [:]
    for char in s.utf16 {
        counts[char] = (counts[char] ?? 0) + 1
    }
    let index = s.utf16.enumerated()
        .first(where: { counts[$0.element]! == 1 })?
        .offset
    return index ?? -1
}
without changing the performance.
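One caveat about the UTF-16 based functions above: they return an offset in UTF-16 code units. For ASCII-only input, such as the lowercase strings in the LeetCode problem, that is the same as the character index. For strings containing characters that need two UTF-16 code units (most emoji, for example), the two can differ. As a rough sketch, assuming Swift 4 or later (where a String can be iterated as a collection of Character values and dictionaries offer a subscript with a default value), the same two-pass idea can be keyed on Character instead. The name firstUniqCharacterIndex is made up for this illustration, and this variant was not part of the benchmark above, so no timing is claimed for it.

```swift
// Hypothetical variant (not benchmarked): counts Character values
// instead of UTF-16 code units. Assumes Swift 4 or later.
func firstUniqCharacterIndex(_ s: String) -> Int {
    // Map from character to number of occurrences.
    var counts: [Character: Int] = [:]
    for char in s {
        counts[char, default: 0] += 1
    }
    // Second pass: return the offset of the first character seen exactly once.
    for (index, char) in s.enumerated() {
        if counts[char] == 1 {
            return index
        }
    }
    return -1
}

print(firstUniqCharacterIndex("loveleetcode")) // 2
print(firstUniqCharacterIndex("aabb"))         // -1
print(firstUniqCharacterIndex(""))             // -1
// The emoji U+1F600 takes two UTF-16 code units, so the UTF-16 based
// functions would report offset 4 for "a" here rather than 2.
print(firstUniqCharacterIndex("😀😀a"))         // 2
```

Which variant is appropriate depends on what the caller does with the result: a Character offset matches what s.count and s.enumerated() report, while a UTF-16 offset matches NSString-style indexing.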