Problem
For an iOS app which helps me roll back vandalism on Stack Exchange, I have a piece of Swift code which downloads a revision page (example) and tries to find the ‘spacer’ fragment just above a certain revision. It might not be clear what I’m talking about, so here’s a picture from Firefox + developer tools:
I have the HTML content of this page, and the GUID of the revision (8f9ab85f-1401-41e9-8f75-8a07b10bad32) from the Stack Exchange API. I’m looking for that element just above the revision header, since those are the only HTML elements with IDs on the page. I need that spacer-9617187a-fe48-4212-9a1a-f3a366e62736 so I can link directly to https://codereview.stackexchange.com/posts/189958/revisions#spacer-9617187a-fe48-4212-9a1a-f3a366e62736
For that, I’ve written a few lines of Swift code. The problem is that string handling in Swift confuses the **** out of me. Most of the language feels rather good, but I’d rather do string manipulation in SQL than in Swift…
Here is what I have so far. It works, but I was wondering if it could break in cases I haven’t foreseen, or if it can be made more understandable/manageable by a future me. You see, even Stack Exchange’s syntax highlighter has problems understanding it…
The input parameters for this piece of code are html (a String containing the content of the revisions page, e.g. https://codereview.stackexchange.com/posts/189958/revisions) and revisionGUID (8F9AB85F-1401-41E9-8F75-8A07B10BAD32 in the example above – the API returns them in upper case). fragment is eventually used as the output parameter. The 43 is the length of spacer- plus a GUID.
// Find fragment just above selected revision
let range = html.range(of: #"onclick="StackExchange.revisions.toggle('"# + revisionGUID.lowercased() + #"')""#)!
let index = html.range(of: #"<tr id=""#, options: .backwards, range: html.startIndex..<range.lowerBound)!.upperBound
let fragment = String(html[index..<html.index(index, offsetBy: 43)])
Solution
I don’t know how stable the precise HTML structure of those pages is. Could it change in the future? Using an HTML parsing library might be a more robust approach.
Some remarks concerning the Swift implementation:
- Don’t force-unwrap optionals. If one of the searched strings is not found, your program will terminate with a runtime error. Use optional binding with if let or guard let instead, and handle the failure case properly.
- Instead of converting revisionGUID to lowercase you can do a case-insensitive search.
- The first search string can be created with string interpolation instead of concatenation, which makes the expression slightly shorter:
  #"onclick="StackExchange.revisions.toggle('\#(revisionGUID)')""#
- Use a regular expression with positive look-ahead and look-behind for the second search. That makes it possible to find the precise range of the spacer, without relying on a particular length.
- Put the code in a function, and add documentation.
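As a small illustration of the first three remarks, here is a minimal sketch. The html value is a made-up one-line stand-in for the real page, and needle is a name chosen only for this example. Note that inside a raw string delimited by #"…"#, interpolation has to be written as \#(…).

```swift
import Foundation

// Hypothetical sample values, for illustration only.
let revisionGUID = "8F9AB85F-1401-41E9-8F75-8A07B10BAD32"
let html = #"<a onclick="StackExchange.revisions.toggle('8f9ab85f-1401-41e9-8f75-8a07b10bad32')">"#

// In a raw string, \#(...) interpolates; a plain \(...) would be literal text.
let needle = #"onclick="StackExchange.revisions.toggle('\#(revisionGUID)')""#

// Optional binding instead of force-unwrapping; .caseInsensitive lets the
// upper-case GUID from the API match the lower-case GUID in the HTML.
if let range = html.range(of: needle, options: .caseInsensitive) {
    print("Found:", html[range])
} else {
    print("Revision not found")
}
```

Without the .caseInsensitive option the same search returns nil for this input, because the GUIDs differ in case.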
Putting it together, the function could look like this:
/// Find spacer fragment for GUID on revisions page
/// - Parameter html: HTML of a revisions page
/// - Parameter revisionGUID: A revision GUID from the StackExchange API
/// - Returns: The spacer fragment, or `nil` if not found
func findFragment(html: String, revisionGUID: String) -> String? {
    // Raw string: interpolation is written \#(...)
    let pattern1 = #"onclick="StackExchange.revisions.toggle('\#(revisionGUID)')""#
    guard let range1 = html.range(of: pattern1, options: .caseInsensitive) else {
        return nil
    }
    // Match the id attribute value of the closest preceding <tr id="...">
    let pattern2 = #"(?<=<tr id=")[^"]+(?=")"#
    guard let range2 = html.range(of: pattern2,
                                  options: [.backwards, .regularExpression],
                                  range: html.startIndex..<range1.lowerBound) else {
        return nil
    }
    return String(html[range2])
}
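If combining .backwards with .regularExpression does not behave as expected on some platform, one possible alternative is to collect all matches in front of the toggle link with NSRegularExpression and take the last one. The following is only a sketch of that idea, not the function above: findFragmentUsingLastMatch and sampleHTML are hypothetical names, and the sample markup is invented and far simpler than a real revisions page.

```swift
import Foundation

/// Hypothetical variant of `findFragment`: collects every `<tr id="...">`
/// before the revision's toggle link and returns the last id found.
func findFragmentUsingLastMatch(html: String, revisionGUID: String) -> String? {
    let toggle = #"onclick="StackExchange.revisions.toggle('\#(revisionGUID)')""#
    guard let toggleRange = html.range(of: toggle, options: .caseInsensitive) else {
        return nil
    }
    // Only search the part of the page that precedes the toggle link.
    let head = String(html[..<toggleRange.lowerBound])
    guard let regex = try? NSRegularExpression(pattern: #"<tr id="([^"]+)""#) else {
        return nil
    }
    let matches = regex.matches(in: head, range: NSRange(head.startIndex..., in: head))
    // The last match is the row closest to the revision header.
    guard let last = matches.last,
          let idRange = Range(last.range(at: 1), in: head) else {
        return nil
    }
    return String(head[idRange])
}

// Made-up miniature page with two revisions (not real Stack Exchange markup).
let sampleHTML = #"""
<tr id="spacer-11111111-1111-1111-1111-111111111111"></tr>
<tr><td><a onclick="StackExchange.revisions.toggle('aaaaaaaa-0000-0000-0000-000000000001')"></a></td></tr>
<tr id="spacer-22222222-2222-2222-2222-222222222222"></tr>
<tr><td><a onclick="StackExchange.revisions.toggle('aaaaaaaa-0000-0000-0000-000000000002')"></a></td></tr>
"""#

if let fragment = findFragmentUsingLastMatch(
    html: sampleHTML,
    revisionGUID: "AAAAAAAA-0000-0000-0000-000000000002") {
    // Build the direct link from the fragment.
    print("https://codereview.stackexchange.com/posts/189958/revisions#\(fragment)")
} else {
    print("Spacer not found")
}
```

The call site handles the nil case explicitly, so a changed page layout leads to a message instead of a crash.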