Problem
While preparing a solution for the CodeChef – magic tricks, an easy question about counting the frequency of given character from the string slice, I came up with the following solution:
input_str = input("")
t = int(input(""))
for i in range(t):
c, l1, l2 = input("").split()
l, r = int(l1),int(l2)
print(input_str[l-1:r].count(c))
But It turns out to be inefficient because it’s not meeting the time constraint. How can I better implement it?
(here, the indexing starts from 1)
Solution
Rewriting your code with Caridorc’s suggestions gets me something like
input_str = input()
iterations = int(input())
for _ in range(iterations):
try:
character, start, end = input().split()
except ValueError:
raise ValueError("Need exactly three items (character, start, end)")
start, end = int(start), int(end)
print(input_str[start-1:end].count(character))
Note that you should probably put this in a main
method.
The slowest part is going to involve the inner iteration; namely
try:
character, start, end = input().split()
except ValueError:
raise ValueError("Need exactly three items (character, start, end)")
start, end = int(start), int(end)
print(input_str[start-1:end].count(character))
Getting input can’t get much faster, I’d assume, so this line to blame is
print(input_str[start-1:end].count(character))
If start-1
and end
are wildly different (they can be up to 106−1 different) then this will take a long time. For a maximum of 106 iterations, this could take a while.
What you should do instead is pre-process the string so you can quickly do
print(number_between(start, end, letter))
How? One way is to build cumulative lists, such as
string: a b c b a c b a b a
a: 1 1 1 1 2 2 2 3 3 4
b: 0 1 1 2 2 2 3 3 4 4
c: 0 0 1 1 1 2 2 2 2 2
That way you can quickly it calculate with
def number_between(start, end, letter):
return counts[letter][end] - count[letter][start-1]
This initial generation might take a long time, though, as you need to generate a list for every possible letter – and there are a lot of those. If you only generate a list for which there is a letter in the string, you’ve improved your worst case but you’re still generating a large square. This could be expensive.
One other option is to store indexes:
string: a b c b a c b a b a
a: 0 4 7 9 = [0, 4, 7, 9]
b: 1 3 6 8 = [1, 3, 6, 8]
c: 2 5 = [2, 5]
Then you can do a binary search on the list to find the number of instances before an index, and do
def number_between(start, end, letter):
number_before_end = ... # binary search
number_before_start = ... # binary search
return number_before_end - number_before_start
Depending on how much startup costs affect you, this could either be better or worse than the original strategy.
I’ll stick with the second of these, since it’s less problematic in the face of Unicode. The test suite seems to ignore anything but lowercase ASCII, but I’m not willing to make such a horrible assumption.
So, first we want a bunch of lists indexed by character. This will be easiest as a defaultdict(list)
, since they all start out empty:
indices = defaultdict(list)
Then we want to go over the characters in the string and add the respective indices to the particular list:
for i, character in enumerate(input_string):
indices[character].append(i)
We want to do a binary search to find the counts. This is just bisect_right
and bisect_left
, depending on whether you want to be inclusive:
def number_between(start, end, counts):
number_before_start = bisect_left(counts, start)
number_before_end = bisect_right(counts, end)
return number_before_end - number_before_start
Now all that’s needed is input, giving:
from bisect import bisect_left, bisect_right
from collections import defaultdict
def preprocess(input_string):
indices = defaultdict(list)
for i, character in enumerate(input_string):
indices[character].append(i)
return indices
def number_between(start, end, counts):
return bisect_right(counts, end) - bisect_left(counts, start)
def main():
indices = preprocess(input())
for _ in range(int(input())):
try:
character, start, end = input().split()
except ValueError:
raise ValueError("Need exactly three items (character, start, end)")
# Change to 0-based indexing
start, end = int(start) - 1, int(end) - 1
print(number_between(start, end, indices[character]))
main()
Avoid one/two letter variable names
Your code is full of one and two character letters that make it very hard to read
Avoid cryptic failure on bad inputs
- If the input has not exactly three values and you try to split your code is going to fail badly. It is better to add
:
if input_var.count(',') != 3:
raise ValueError("Exactly three commas expected, found more or less.")
Miscellanous
-
It is common practice to use _ as a loop variable if you do not care about its value
-
Usually it is better to insert a prompt like
var = input("Question?")
, if you do not want that, you should anyway remove the""
because it is unnecessary
Performance
I don’t think Python can get faster than this, maybe you should try the low level C.
it turns out that this problem can be better solved by dynamic programming,
here is the better solution
input_str = input("")
d = {x:[0 for y in input_str] for x in 'abcdefghijklmnopqrstuvwxyz'}
test_cases = int(input())
d[input_str[0]][0]=1
index=0
for letter in input_str[1:]:
index += 1
for alpha in 'abcdefghijklmnopqrstuvwxyz':
if alpha == letter:
d[alpha][index] = d[alpha][index-1]+1
else:
d[alpha][index] = d[alpha][index-1]
for cases in range(test_cases):
letter,start,end = input("").split()
start=int(start)-1
end =int(end)-1
print(d[letter][end])