Problem

# Description

Master Locksmith has just finished the work of his life: a combination lock so big and complex that no one will ever open it without knowing the right combination. He’s done testing it, so now all he has to do is put the lock back into a neutral state, so as to leave no trace of the correct combination.

The lock consists of some number of independently rotating discs. Each disc has k numbers from `0` to `k-1` written around its circumference. Consecutive numbers are adjacent to each other, i.e. `1` is between `0` and `2`, and because the discs are circular, `0` is between `k-1` and `1`. Discs can rotate freely in either direction. On the front of the lock there’s a marked bar going across all discs to indicate the currently entered combination.

Master Locksmith wants to reset the lock by rotating discs until all show the same number. However, he’s already itching to start work on an even greater and more complicated lock, so he’s not willing to waste any time: he needs to choose a number that will take as little time as possible to reach. He can only rotate one disc at a time, and it takes him `1` second to rotate it by one position. Given `k` and the `initialState` of the lock, find the number he should choose. If there are multiple possible answers, return the smallest one.

# Example

For `k = 10` and `initialState = [2, 7, 1]` the output should be

`masterLocksmith(k, initialState) = 1`

It takes 1 second for the first disc to reach `1` (`2 → 1`). It takes 4 seconds for the second disc to reach `1` (`7 → 8 → 9 → 0 → 1`). The third disc is already at `1`. The whole process can be completed in `5` seconds. Reaching any other number would take longer, so this is the optimal solution.

# Constraints

Guaranteed constraints:

- `3 ≤ k ≤ 10**14`
- `2 ≤ initialState.length ≤ 10**4`
- `0 ≤ initialState[i] < k` for all valid `i`

# Code

```
from operator import itemgetter
from collections import defaultdict


def masterLocksmith(k, initial_state):
    occurence = defaultdict(int)
    for i in initial_state:
        occurence[i] += 1
    num_set = set(initial_state)
    num_set.add(0)
    shortest = {j: 0 for j in num_set}
    for i in occurence:
        for j in num_set:
            shortest[j] += min(abs(i-j), k - abs(i-j)) * occurence[i]
    return min([[k, shortest[k]] for k in shortest], key=itemgetter(1,0))[0]
```

The TLEs were really hard. I’ve tried some optimization, but the best I could get was still $O(n \ast k)$, which was not enough to complete the challenge. I am less concerned about readability and more with how this can be sped up.

Solution

```
def masterLocksmith(k, initial_state):
    ...
    for i in occurence:
        for j in num_set:
            shortest[j] += min(abs(i-j), k - abs(i-j)) * occurence[i]
    return min([[k, shortest[k]] for k in shortest], key=itemgetter(1,0))[0]
```

The aliasing of `k` is not very helpful for readability; that last line is hard enough to decipher without confusion about the names. Another answer observed that using a running minimum would reduce memory usage and therefore possibly improve performance; in my opinion it would also be more readable.

```
from collections import defaultdict
...
occurence = defaultdict(int)
for i in initial_state:
    occurence[i] += 1
```

There’s a specialised class for this:

```
from collections import Counter
...
occurence = Counter(initial_state)
```

> the best I could solve this in was still $O(n \ast k)$

I concur with Quuxplusone that it’s actually $O(n^2)$. Looking at the small example case by hand, I think there’s a reasonably straightforward way to do it in $O(n \lg n)$ time. This is just a sketch: I haven’t implemented it.

Basically the task is to add a series of functions which each look something like

```
     ^
    /
   /
  /
 /
v
```

Note that there are two points of inflexion: at the initial value, and opposite the initial value. (Correctly handling odd vs even values of `k` will be a minor complication.) Also note that a sum of piecewise linear functions is piecewise linear.

So you can first calculate in $O(n)$ time the score if the target number is `0` and the gradient of the sum at `0`. Then sort a list of inflexion points in $O(n \lg n)$ time and iterate through them, calculating the score by extrapolation from the previous one and the new gradient. Finally extrapolate to `k-1`.
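To make that sketch concrete, here is one possible implementation of the sweep (my own rough code, not from the original answer; the name `master_locksmith_fast` is made up, and odd `k` is handled by treating the opposite point as a one-step flat plateau):

```python
from collections import Counter


def master_locksmith_fast(k, initial_state):
    """O(n lg n) sweep over the piecewise-linear total-cost function."""
    occurence = Counter(initial_state)
    h = k // 2  # steps from a disc to the point opposite it

    # Score if the target number is 0, computed in O(n).
    score0 = sum(min(i, k - i) * c for i, c in occurence.items())

    # Gradient of one disc's cost at j=0: +1 while j is moving away from
    # the disc, -1 while moving back towards it, 0 on the flat step (odd k).
    def disc_slope(i):
        d = (-i) % k  # how far j=0 sits "ahead of" disc i
        if d < h:
            return 1
        if k % 2 == 1 and d == h:
            return 0
        return -1

    slope = sum(disc_slope(i) * c for i, c in occurence.items())

    # Slope-change events at each disc's two inflexion points.
    delta = Counter()
    for i, c in occurence.items():
        delta[i] += 2 * c                # trough at the disc itself
        if k % 2 == 0:
            delta[(i + h) % k] -= 2 * c  # sharp peak opposite the disc
        else:
            delta[(i + h) % k] -= c      # odd k: the peak is a flat step
            delta[(i + h + 1) % k] -= c

    best_j, best_score, score, prev = 0, score0, score0, 0
    for p in sorted(delta):              # O(n lg n)
        if p == 0:
            continue                     # already folded into the j=0 slope
        score += slope * (p - prev)      # extrapolate to the next event
        slope += delta[p]
        prev = p
        if score < best_score:           # strict '<' keeps the smallest j on ties
            best_j, best_score = p, score
    return best_j
```

Starting the sweep from `j = 0` also takes care of the smallest-answer tie-break when the minimal plateau wraps around past `k-1`.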

Style nits: misspelling of `occurence`, gratuitous renaming of the problem statement’s `initialState` to `initial_state`.

I was initially skeptical of your algorithm, but I think I’ve convinced myself now that it’s correct, even the initially mysterious addition of `0` to the `num_set`. Basically you’re observing that if we have `m` dials set to the `M`th position, and `n` dials set to the `N`th position, then it never makes sense to send them all to some position `POS` in the middle (`M < POS < N`). If `m > n` then we can save time by sending them all to `M`; if `m < n` then we save time by sending them all to `N`; and if `m = n` then sending them all to `POS` costs exactly the same amount as sending them all to either `M` *or* `N`. The one special case is when `m = n` and `M < 0 < N`, in which case the problem statement requires us to send them all to `0`… and that’s why you special-case `num_set.add(0)`!

The above reasoning could productively have been recorded in a block comment.
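As a sanity check of that reasoning, a quick brute-force comparison (my own script, not part of the original review) confirms that restricting the candidates to `num_set | {0}` never changes the answer on small random cases:

```python
import random


def cost(k, state, j):
    """Total seconds to bring every disc to position j."""
    return sum(min(abs(i - j), k - abs(i - j)) for i in state)


random.seed(0)
for _ in range(500):
    k = random.randint(3, 12)
    state = [random.randrange(k) for _ in range(random.randint(2, 5))]
    # Smallest optimal target over all k positions...
    best_all = min(range(k), key=lambda j: (cost(k, state, j), j))
    # ...versus over the restricted candidate set only.
    candidates = sorted(set(state) | {0})
    best_restricted = min(candidates, key=lambda j: (cost(k, state, j), j))
    assert best_all == best_restricted, (k, state)
```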

Speed-wise, I see one obvious improvement and one possible improvement.

```
shortest = {j: 0 for j in num_set}
for i in occurence:
    for j in num_set:
        shortest[j] += min(abs(i-j), k - abs(i-j)) * occurence[i]
return min([[k, shortest[k]] for k in shortest], key=itemgetter(1,0))[0]
```

The obvious improvement is that you are repeatedly doing the lookup of `occurence[i]` (sic) inside your inner loop, even though you could have cached it in the outer loop. Let’s do that. And similarly `shortest[k]`, even though I can’t imagine *that* being a bottleneck:

```
shortest = {j: 0 for j in num_set}
for i, population in occurence.items():
    for j in num_set:
        shortest[j] += min(abs(i-j), k - abs(i-j)) * population
return min(shortest.items(), key=itemgetter(1,0))[0]
```

The possible improvement is that you’ve got this dict `shortest` that you’re keeping *only* so that you can take its minimum at the end. This is a classic pattern that’s going to come up over and over, so you might as well learn it: *keep a running min.*

```
best = (0, float('inf'))
for j in sorted(num_set):  # sorted, so ties resolve to the smallest number
    score = 0
    for i, population in occurence.items():
        score += min(abs(i-j), k - abs(i-j)) * population
    if score < best[1]:
        best = (j, score)
return best[0]
```

This isn’t much of anything to look at, but it does completely eliminate the repeated computation of `shortest[j]` inside your inner loop.

If you’d included some test vectors in your question, then we could run them and see if these changes made any difference in the running time.

P.S. You say your algorithm is `O(n*k)` (where `n` = `initialState.length`), but I believe your `num_set` optimization makes it actually `O(n*n)`. Since `n` is less than `10000`, you’re doing on the order of 100 million ops here, so I’d guesstimate it should be doable in a couple of seconds, tops. How much time are you seeing it take, and on what kind of test cases?

Clicking through to the source of the puzzle, I observe that the problem statement does not require that you use Python; in fact it offers compiled languages such as C++ as possible alternatives. In the real world, many problems are more easily solved by *switching tools* than by trying to hack a solution using the first tool you thought of. You should consider the very real possibility that the puzzle is deliberately *not solvable in Python* given the constraints imposed by the problem statement.

Here’s the mechanical translation to C++17 I mentioned in my previous comment.

```
#include <algorithm>
#include <cstdlib>
#include <limits>
#include <map>
#include <set>
#include <tuple>
#include <vector>

long long masterLocksmith(long long k, std::vector<long long> initialState) {
    std::map<long long, long long> occurence;
    for (auto&& i : initialState)
        occurence[i] += 1;
    std::set<long long> num_set(initialState.begin(), initialState.end());
    num_set.insert(0);
    auto best = std::tuple(std::numeric_limits<long long>::max(), 0LL);
    for (auto&& j : num_set) {
        long long score = 0;
        for (auto&& [i, population] : occurence)
            score += std::min(std::abs(i-j), k - std::abs(i-j)) * population;
        if (std::tuple(score, j) < best)
            best = std::tuple(score, j);
    }
    return std::get<1>(best);
}
```

On my machine it takes 0.018s to run the 10 provided test cases; the Python version takes 0.073s. Neither takes anywhere close to 4 seconds. 🙂

OTOH, when I randomly generate a `k=10000, n=10000` test case and try running *that*, I see C++ taking 6.8s (and with `-O3` it takes 1.8s) whereas Python takes 76.1s. That’s quite the penalty for using an interpreted language!

You can cut the running time of the naïve C++ code in half again by noticing that *iterating a map* inside your inner loop involves following a lot of pointers, and that’s no good for your L1 cache. Replace the map with a cache-friendly `vector`, leaving the rest of the code the same:

```
std::map<long long, long long> temp;
for (auto&& i : initialState)
    temp[i] += 1;
std::vector<std::pair<long long, long long>> occurence(temp.begin(), temp.end());
```

and the running time on my huge random test case drops to 3.5s (0.48s with `-O3`). I suspect that you could hack a similar cache-friendliness fix in Python, but I have no idea what it would look like; and anyway, if it halved your running time, you’d still be at 38s, so a lot more hacks would be needed to get down to the compiled-language level.
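One guess at what such a fix could look like (my own sketch, not benchmarked, and a different technique than the map-to-vector change: vectorizing both loops with NumPy so the inner arithmetic runs over contiguous arrays instead of interpreted bytecode; the name `master_locksmith_np` is made up):

```python
import numpy as np


def master_locksmith_np(k, initial_state):
    # Deduplicate the dials, like the num_set/occurence split in the original.
    vals, counts = np.unique(np.asarray(initial_state, dtype=np.int64),
                             return_counts=True)
    candidates = np.union1d(vals, [0])  # sorted, so argmin ties pick the smallest
    # Circular distance from every candidate to every distinct dial position.
    d = np.abs(candidates[:, None] - vals[None, :])
    scores = (np.minimum(d, k - d) * counts).sum(axis=1)
    return int(candidates[np.argmin(scores)])
```

Whether this actually closes the gap to C++ would need measuring on the big `k=10000, n=10000` case, but the double loop disappears from Python-land entirely.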