Problem
I have an input where users can enter a Twitter account in any of several formats, and I want to extract the username.
For example:
twitters = [
  "www.twitter.com/twitteruser1",
  "@twitteruser2",
  "twitteruser3",
  "https://twitter.com/twitteruser4",
  "https://www.twitter.com/twitteruser5",
  "www.twitter.com/twitteruser6",
  "http://www.twitter.com/twitteruser7",
  "http://www.twitter.com/twitteruser8",
  "twitter.com/twitteruser9"
]
The script that I’ve written to extract the data is the following:
twitters.each do |twitter|
  # for the URL
  twitter_user = twitter.match(/twitter.com\/([^\/.]*)$/)
  if twitter_user != nil
    puts twitter_user[1]
    next
  end
  # for a leading @
  twitter_user = twitter.match(/^@([^\/.]*)$/)
  if twitter_user != nil
    puts twitter_user[1]
    next
  end
  # if we get here, no pattern matched
  puts twitter
end
It actually works, outputting the following:
twitteruser1
twitteruser2
twitteruser3
twitteruser4
twitteruser5
twitteruser6
twitteruser7
twitteruser8
twitteruser9
But as I’m really new to Ruby, I wanted to check for possible improvements.
Solution
I would put the regular expressions into a list:
TWITTER_PATTERNS = [
  /twitter.com\/([^\/.]*)$/, # "www.twitter.com/twitteruser1"
  /^@([^\/.]*)$/,            # "@twitteruser2"
]
This function can iterate over that list:
def twitter_user(twitter)
  TWITTER_PATTERNS.each do |pattern|
    return $1 if twitter =~ pattern
  end
  twitter
end
This clearly separates the “policy” (what patterns are used to extract twitter users) from the “mechanism” (the loop we use to apply the policy). A change to the function’s implementation won’t require a change to the patterns, and vice versa.
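To illustrate that separation, here is a minimal sketch in which the pattern list is passed in as an argument, so a new pattern can be added without touching the function. The `patterns` parameter, `EXTENDED_PATTERNS`, and the trailing-slash pattern are hypothetical additions for illustration, not part of the original answer. The sketch also escapes the dot in `twitter\.com` and uses `Regexp#match` instead of `$1`.

```ruby
# Policy: patterns that recognise a Twitter handle.
# %r{...} avoids escaping the forward slashes.
TWITTER_PATTERNS = [
  %r{twitter\.com/([^/.]*)$}, # "www.twitter.com/twitteruser1"
  /^@([^\/.]*)$/              # "@twitteruser2"
].freeze

# Mechanism: return the first capture of the first matching pattern,
# or the input unchanged when no pattern matches.
def twitter_user(twitter, patterns = TWITTER_PATTERNS)
  patterns.each do |pattern|
    match = pattern.match(twitter)
    return match[1] if match
  end
  twitter
end

# Hypothetical policy change: also accept a profile URL with a trailing slash.
EXTENDED_PATTERNS = ([%r{twitter\.com/([^/.]+)/$}] + TWITTER_PATTERNS).freeze

puts twitter_user("twitter.com/twitteruser9")
puts twitter_user("twitter.com/twitteruser9/", EXTENDED_PATTERNS)
```

Only the list changed between the two calls; the loop stayed the same.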
You can use a Ruby case statement to check for different regex matches. A when branch runs only if its pattern matched, so the explicit nil checks are no longer needed.
twitters.each do |twitter|
  case twitter
  # for the URL
  when /twitter.com\/([^\/.]*)$/
    puts $1
  # for a leading @
  when /^@([^\/.]*)$/
    puts $1
  # if we get here, no pattern matched
  else
    puts twitter
  end
end
When Ruby does a regex match, any capture groups are assigned to the special variables $1, $2, $3, and so on. See this question on StackOverflow for more details about the mechanic.
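A small sketch of how the numbered variables line up with capture groups. The helper name `host_and_handle` and the two-group pattern are made up for this example.

```ruby
# Hypothetical helper: returns [host, handle] for a profile URL, or nil.
def host_and_handle(input)
  return nil unless input =~ %r{(twitter\.com)/([^/.]+)$}

  # $1 is the first group; Regexp.last_match(2) is the longhand for $2.
  [$1, Regexp.last_match(2)]
end

p host_and_handle("https://twitter.com/twitteruser4")
p host_and_handle("twitteruser3")
```

The first call prints both captures as an array; the second prints nil because the pattern does not match.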
Note on your regex: If you want the capture groups to actually contain anything, you should change the asterisk to a plus: ([^\/.]+). With the asterisk, “www.twitter.com/” matches the URL regex with an empty capture. With the plus, that string will fall into the default case.
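The difference can be checked directly. In this sketch the constant names `STAR` and `PLUS` are illustrative only, and `String#[]` with a capture index is used to pull out the first group.

```ruby
STAR = %r{twitter\.com/([^/.]*)$} # zero or more characters in the handle
PLUS = %r{twitter\.com/([^/.]+)$} # at least one character in the handle

p "www.twitter.com/"[STAR, 1] # matches, but the capture is an empty string
p "www.twitter.com/"[PLUS, 1] # no match, so nil
```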
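Another option is to chain the lookups with ||:
twitters.each do |twitter|
  # String#[] with a capture index returns the captured text, or nil if there is no match
  puts twitter[/twitter.com\/([^\/.]*)$/, 1] ||
       twitter[/^@([^\/.]*)$/, 1] ||
       twitter
end
It’s shorter, and I’d say at least equally readable. A chain of || operators returns the first truthy value. String#[] is used here rather than match because puts on a MatchData prints the entire matched text (for example “twitter.com/twitteruser1”), not the capture group.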
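One detail worth checking when chaining with ||: what each operand evaluates to. The sketch below (the names `HANDLE_PATTERN` and `handle_or_input` are hypothetical) contrasts a MatchData's string form with its first capture, and shows the fallback when nothing matches.

```ruby
HANDLE_PATTERN = /^@([^\/.]*)$/

# Returns the first truthy candidate: the captured handle, else the raw input.
def handle_or_input(input)
  input[HANDLE_PATTERN, 1] || input
end

match = HANDLE_PATTERN.match("@twitteruser2")
puts match.to_s # the whole matched text, "@" included
puts match[1]   # only the first capture group
puts handle_or_input("@twitteruser2")
puts handle_or_input("twitteruser3")
```

In Ruby only nil and false are falsy, so an empty capture (from an input of just "@") still counts as a hit in the chain.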