Posted on

Problem

# Description

Master Locksmith has just finished the work of his life: a combination
lock so big and complex that no one will ever open it without knowing
the right combination. He’s done testing it, so now all he has to do
is put the lock back into a neutral state, as to leave no trace of the
correct combination.

The lock consists of some number of independently rotating discs. Each
disc has k numbers from 0 to k-1 written around its circumference.
Consecutive numbers are adjacent to each other, i.e. 1 is between
0 and 2, and because the discs are circular, 0 is between k-1
and 1. Discs can rotate freely in either direction. On the front of
the lock there’s a marked bar going across all discs to indicate the
currently entered combination.

Master Locksmith wants to reset the lock by rotating discs until all
show the same number. However, he’s already itching to start work on
an even greater and more complicated lock, so he’s not willing to
waste any time: he needs to choose such a number that it will take as
little time as possible to reach it. He can only rotate one disc at a
time, and it takes him 1 second to rotate it by one position. Given
k and the initialState of the lock, find the number he should
choose. If there are multiple possible answers, return the smallest
one.

# Example

For k = 10 and initialState = [2, 7, 1]

the output should be masterLocksmith(k, initialState) = 1

It takes 1 second for the first disc to reach 1 (2 → 1). It takes 4
seconds for the second disc to reach 1 (7 → 8 → 9 → 0 → 1). The
third disc is already at 1. The whole process can be completed in
5 seconds. Reaching any other number would take longer, so this is
the optimal solution.

# Constraints

Guaranteed constraints:

3 ≤ k ≤ 10 ** 14

2 ≤ initialState.length ≤ 10**4

0 ≤ initialState[i] < k for all valid i

# Code

from operator import itemgetter
from collections import defaultdict

def masterLocksmith(k, initial_state):
occurence = defaultdict(int)
for i in initial_state:
occurence[i] += 1

num_set = set(initial_state)

shortest = {j: 0 for j in num_set}
for i in occurence:
for j in num_set:
shortest[j] += min(abs(i-j), k - abs(i-j)) * occurence[i]
return min([[k, shortest[k]] for k in shortest], key=itemgetter(1,0))


The TLE’s were really hard, I’ve tried some optimization but the best I could solve this in was still O(nk)

$O\left(n\ast k\right)$

$O(n * k)$ which was not enough to complete the challenge. I am less concerned about readability but more how this can be sped up.

Solution

def masterLocksmith(k, initial_state):
...
for i in occurence:
for j in num_set:
shortest[j] += min(abs(i-j), k - abs(i-j)) * occurence[i]
return min([[k, shortest[k]] for k in shortest], key=itemgetter(1,0))


The aliasing of k is not very helpful for readability; that last line is hard enough to decipher without confusion about the names. Another answer observed that using a running minimum would reduce memory usage and therefore possibly performance; in my opinion it would also be more readable.

from collections import defaultdict
...
occurence = defaultdict(int)
for i in initial_state:
occurence[i] += 1


There’s a specialised class for this:

from collections import Counter
...
occurence = Counter(initial_state)


the best I could solve this in was still O(nk)

$O\left(n\ast k\right)$

$O(n∗k)$

I concur with Quuxplusone that it’s actually O(n2)

$O\left({n}^{2}\right)$

$O(n^2)$. Looking at the small example case by hand, I think there’s a reasonably straightforward way to do it in O(nlgn)

$O\left(n\mathrm{lg}n\right)$

$O(n lg n)$ time. This is just a sketch: I haven’t implemented it.

Basically the task is to add a series of functions which look something like

  ^
/
/
/
/
v


Note that there are two points of inflexion: at the initial value, and opposite the initial value. (Correctly handling odd vs even values of k will be a minor complication). Also note that a sum of piecewise linear functions is piecewise linear.

So you can first calculate in O(n)

$O\left(n\right)$

$O(n)$ time the score if the target number is 0 and the gradient of the sum at 0

$0$

$0$. Then sort a list of inflexion points in O(nlgn)

$O\left(n\mathrm{lg}n\right)$

$O(n lg n)$ time and iterate through them, calculating the score by extrapolation from the previous one and the new gradient. Finally extrapolate to k-1.

Style nits: misspelling of occurence, gratuitous renaming of the problem statement’s initialState to initial_state.

I was initially skeptical of your algorithm, but I think I’ve convinced myself now that it’s correct — even the initially mysterious addition of 0 to the num_set. Basically you’re observing that if we have m dials set to the Mth position, and n dials set to Nth position, then it never makes sense to send them all to some position POS in the middle (M < POS < N). If m > n then we can save time by sending them all to M; if m < n then we save time by sending them all to N; and if m = n then sending them all to POS costs exactly the same amount as sending them all to either M or N. The one special case is when m = n and M < 0 < N, in which case the problem statement requires us to send them all to 0… and that’s why you special-case num_set.add(0)!

The above reasoning could productively have been recorded in a block comment.

Speed-wise, I see one obvious improvement and one possible improvement.

shortest = {j: 0 for j in num_set}
for i in occurence:
for j in num_set:
shortest[j] += min(abs(i-j), k - abs(i-j)) * occurence[i]
return min([[k, shortest[k]] for k in shortest], key=itemgetter(1,0))


The obvious improvement is that you are repeatedly doing the lookup of occurence[i] (sic) inside your inner loop, even though you could have cached it in the outer loop. Let’s do that. And similarly shortest[k], even though I can’t imagine that being a bottleneck:

shortest = {j: 0 for j in num_set}
for i, population in occurence.iteritems():
for j in num_set:
shortest[j] += min(abs(i-j), k - abs(i-j)) * population
return min(shortest.iteritems(), key=itemgetter(1,0))


The possible improvement is that you’ve got this dict shortest that you’re keeping only so that you can take its minimum at the end. This is a classic pattern that’s going to come up over and over, so you might as well learn it: keep a running min.

best = (0, float('inf'))
for j in num_set:
score = 0
for i, population in occurence.iteritems():
score += min(abs(i-j), k - abs(i-j)) * population
if score < best:
best = (j, score)
return best


This isn’t much of anything to look at, but it does completely eliminate the repeated computation of shortest[j] inside your inner loop.

If you’d included some test vectors in your question, then we could run them and see if these changes made any difference in the running time.

P.S. — You say your algorithm is O(n*k) (where n = initialState.length), but I believe your num_set optimization makes it actually O(n*n). Since n is less than 10000, you’re doing on the order of 100 million ops here, so I’d guesstimate it should be doable in a couple of seconds, tops. How much time are you seeing it take, and on what kind of test cases?

Clicking through to the source of the puzzle, I observe that the problem statement does not require that you use Python; in fact it offers compiled languages such as C++ as possible alternatives. In the real world, many problems can be solved by switching tools than by trying to hack a solution using the first tool you thought of. You should consider the very real possibility that the puzzle is deliberately not solveable in Python given the constraints imposed by the problem statement.

Here’s the mechanical translation to C++17 I mentioned in my previous comment.

#include <algorithm>
#include <cmath>
#include <map>
#include <set>
#include <tuple>
#include <vector>

long long masterLocksmith(long long k, std::vector<long long> initialState) {
std::map<long long, long long> occurence;
for (auto&& i : initialState)
occurence[i] += 1;

std::set<long long> num_set(initialState.begin(), initialState.end());
num_set.insert(0);

auto best = std::tuple(INFINITY, 0);
for (auto&& j : num_set) {
long long score = 0;
for (auto&& [i, population] : occurence)
score += std::min(std::abs(i-j), k - std::abs(i-j)) * population;
if (std::tuple(score, j) < best)
best = std::tuple(score, j);
}
return std::get<1>(best);
}


On my machine it takes 0.018s to run the 10 provided test cases; the Python version takes 0.073s. Neither takes anywhere close to 4 seconds. 🙂
OTOH, when I randomly generate a k=10000, n=10000 test case and try running that, I see C++ taking 6.8s (and with -O3 it takes 1.8s) whereas Python takes 76.1s. That’s quite the penalty for using an interpreted language!

You can cut the running time of the naïve C++ code in half again by noticing that iterating a map inside your inner loop involves following a lot of pointers, and that’s no good for your L1 cache. Replace the map with a cache-friendly vector, leaving the rest of the code the same:

std::map<long long, long long> temp;
for (auto&& i : initialState)
temp[i] += 1;
std::vector<std::pair<long long, long long>> occurence(temp.begin(), temp.end());


and the running time on my huge random test case drops to 3.5s (0.48s with -O3). I suspect that you could hack a similar cache-friendliness fix in Python, but I have no idea what it would look like; and anyway, if it halved your running time, you’d still be at 38s — a lot more hacks would be needed to get down to the compiled-language level.