Problem
Can you think of a reason the following extension should not be used in production, or a better way of implementing it:
public extension Range {
@warn_unused_result
public func intersect(other: Range) -> Range {
let ds = startIndex.distanceTo(other.startIndex)
let de = startIndex.distanceTo(other.endIndex)
let s = ds <= 0 ? startIndex : startIndex.advancedBy(ds, limit: endIndex)
let e = de <= 0 ? startIndex : startIndex.advancedBy(de, limit: endIndex)
return s..<e
}
}
The method should pass the following assertions:
(5...7).intersect(9...9) == (8..<8) // ....|||.---..
(5...7).intersect(1...3) == (5..<5) // ....---.|||..
(5...7).intersect(8...9) == (8..<8) // ....|||---...
(5...7).intersect(1...4) == (5..<5) // ....---|||...
(5...7).intersect(7...9) == (7..<8) // ....||+--....
(5...7).intersect(1...5) == (5..<6) // ....--+||....
(5...7).intersect(6...9) == (6..<8) // ....|++-.....
(5...7).intersect(1...6) == (5..<7) // ....-++|.....
(5...7).intersect(6...6) == (6..<7) // ....|+|......
(5...7).intersect(4...8) == (5..<8) // ....-+++-....
Note the first four assertions with an empty intersection: the method is currently returning startIndex..<startIndex
if the other
range is on the “left”, or endIndex..<endIndex
if it is on the “right”. Is this a reasonable behaviour?
Solution
There is a general problem: The
distanceTo(_: Self) -> Self.Distance
of the
ForwardIndexType
protocol requires that
start
andend
are part of the same sequence when conforming toRandomAccessSequenceType
.
end
is reachable fromself
by incrementation otherwise.
(and RandomAccessSequenceType
is probably a typo and should be
RandomAccessIndexType
).
Here is a simple example where this conditions are not met, and
your code crashes:
let r1 = "f oo".rangeOfString(" ")!
print(r1) // 1..<5
let r2 = "ba r".rangeOfString(" ")!
print(r2) // 2..<6
let r = r1.intersect(r2) // fatal error: cannot increment endIndex
Incrementing the start index of r1
never hits the start index
of r2
.
Here is another example:
struct MyIndex: ForwardIndexType {
let value: Int
func successor() -> MyIndex {
return MyIndex(value: self.value + 1)
}
}
func == (lhs: MyIndex, rhs: MyIndex) -> Bool {
return lhs.value == rhs.value
}
is an index which can only be incremented. Then
let r1 = Range(MyIndex(value: 4) ... MyIndex(value: 6))
let r2 = Range(MyIndex(value: 3) ... MyIndex(value: 5))
let r = r1.intersect(r2)
will loop forever, because incrementing MyIndex(value: 4)
never reaches MyIndex(value: 3)
.
So intersecting ranges makes only sense if both ranges refer
to the same sequence. I don’t know if that can be ensured at
compile time with a suitable restriction, I assume that it is
not possible. It should at least be documented (similar to the
above mentioned requirements of ForwardIndexType
).
You could restrict the method to ranges of integer elements
public extension Range where Element : IntegerType { ... }
but of course that reduces its usability considerably.
The implementation itself looks good to me. The behavior in the case
of an empty intersection it a sensible choice. Is similar to that of the clamp()
method of ClosedInterval
:
ClosedInterval(3, 4).clamp(ClosedInterval(5, 6)) // 4 ... 4
ClosedInterval(3, 4).clamp(ClosedInterval(1, 2)) // 3 ... 3
which returns the left or right bound, depending on wether the other
interval lies on the left or on the right.
If your ranges just represent intervals and do not refer to indices
of sequences, then ClosedInterval
(and the existing clamp()
method) might be an alternative.
Martin’s excellent insight into the nature of the problem got me thinking if we could do any better than extension Range where Element : IntegerType
. Here is one possibility:
public extension Range where Element : Comparable {
@warn_unused_result
public func intersect(other: Range) -> Range {
guard endIndex > other.startIndex else {
return endIndex..<endIndex
}
guard other.endIndex > startIndex else {
return startIndex..<startIndex
}
let s = other.startIndex > startIndex ? other.startIndex : startIndex
let e = other.endIndex < endIndex ? other.endIndex : endIndex
return s..<e
}
}
This at least passes Martin’s range of string test.
Of course, when working with strings we must take the usual precautions since:
let s1 = "f oo"
let s2 = "ba r"
let r1 = s1.characters.indices // 0..<7
r1.count // 4
s1.utf16.count // 7
let i = r1.startIndex // 0
i.successor().successor() // 5 (i therefore holds reference to s1!!!)
This means that even though the intersect
method does what it’s supposed to:
import Foundation
let flag1r = s1.rangeOfString(" ")! // 1..<5
let flag2r = s2.rangeOfString(" ")! // 2..<6
let intersection = flag1r.intersect(flag2r) // 2..<5
The result may still be surprising if you then use it to:
s1[flag1r] // " "
s1[flag2r] // "� o"
s1[intersection] // "� "
This is because:
let u = " ".unicodeScalars
.map{ "\u{(String($0.value, radix: 16))}" }
.joinWithSeparator("")
print(u) // u{1f1e9}u{1f1f0}
"u{1f1e9}u{1f1f0}" //
"u{1f1e9}" //
"u{1f1f0}" //
Personally, I believe that the String API should guaranty that the following is always true:
string.characters.indices.description == (0..<string.characters.count).description
… and NOT, as it is currently the case:
string.characters.indices.description == (0..<string.utf16.count).description
… simply because Character
in swift represents extended grapheme clusters, not utf code points.
The fundamental problem, though, which goes beyond String API, is that the collection index is not currently a scalar value, but also holds a reference to its collection! Swift 3.0 is set to completely rethink indexing throughout the language, so we are yet to see whether we will no longer have to deal with these complexities (unless we explicitly want to).
Note, however, that this particular issue does not (in itself) make the intersect
method on ranges wrong, though it does require the same care that rangeOfString
requires when you are using it to cast a view into a collection whose index type has a dodgy sense of successor()
.