Problem
The Scenario
You are given a matrix of size m x n (width x height) with m*n spots where there are a few obstacles. Spots with obstacles are marked as 1, and those without are marked as 0. You can move vertically or horizontally, but can visit each spot only once. The goal is to cover as much area/spots as possible, avoiding obstacles.
Input
A 2D Matrix consisting of 0s and 1s and the coordinates of starting point.
Example Matrix:
0 1 0
0 0 0
1 0 0
Starting Point (0,0).
The coordinates are nothing but array indices.
Output
Output should be a line containing comma separated moves. A move is a space separated direction (n,s,e,w) and number of units of movement in the direction (range 1-1000).
Example output for the above matrix would be,
s 1,e 1,s 1,e 1,n 2
(0,0) --> (1,0) --> (1,1) --> (2,1) --> (2,2) --> (1,2) --> (0,2)
which is the longest path, visiting each coordinate only once and avoiding those coordinates with obstacles.
My algorithm
for every vertex/coordinate, recursively choose the neighbor which
yields the longest path
Code
def getDirections(p1, p2):
return {(1,0):"n", (-1,0):"s", (0,1):"w", (0,-1):"e"}[(p1[0]-p2[0], p1[1]-p2[1])]
def findPath(start, mat):
m, paths = getMax(mat, getNeighbours(mat), start, list())
print(paths)
if len(paths) == 1:
return ""
directions = ""
prev = getDirections(paths[0], paths[1])
current = ""
ctr = 1
for i in range(1, len(paths)-1):
current = getDirections(paths[i], paths[i+1])
if current == prev:
ctr += 1
else:
directions, prev, ctr = directions + prev + " " + str(ctr) + ",", current, 1
return directions + current+ " " +str(ctr)
#Neighbor is (-1,-1) if indices are out of range. For top and bottom rows for example.
def getNeighbours(mat):
neighbours = dict()
for i in range(len(mat)):
for j in range(len(mat[i])):
n = (i-1, j) if i-1 >= 0 else (-1, -1)
s = (i+1, j) if i+1 < len(mat) else (-1, -1)
w = (i, j-1) if j-1 >= 0 else (-1, -1)
e = (i, j+1) if j+1 < len(mat[i]) else (-1, -1)
neighbours[(i,j)] = {'n':n, 's':s, 'w':w, 'e':e}
return neighbours
def getMax(mat, neighbours, coordinates, visited):
if coordinates == (-1,-1):
return -1, []
elif mat[coordinates[0]][coordinates[1]] == 1:
return -1, []
elif coordinates in visited:
return -1, []
else:
visited.append(coordinates)
n, nlist = getMax(mat, neighbours, neighbours[coordinates]['n'], visited[:])
s, slist = getMax(mat, neighbours, neighbours[coordinates]['s'], visited[:])
w, wlist = getMax(mat, neighbours, neighbours[coordinates]['w'], visited[:])
e, elist = getMax(mat, neighbours, neighbours[coordinates]['e'], visited[:])
if max(n,s,w,e) == -1:
return 0, visited
if max(n,s,w,e) == n:
visited = nlist
elif max(n,s,w,e) == s:
visited = slist
elif max(n,s,w,e) == w:
visited = wlist
elif max(n,s,w,e) == e:
visited = elist
return 1+max(n,s,w,e), visited
Solution
Well, first of all (as I said in my earlier comment); “Extended algorithmic discussion may be better on Programmers.SE, however performance improvement is definitely on-topic here!“. So, let me start with (1) a quick review of some more basic things, then maybe give you (2) some advice about your approach and (3) the performance of your code.
1: A Quick Review
1.1) Avoid Magical Complicated One-Liners
The very first thing I see in your code (really, before anything else), is the second line (the implementation of getDirections
):
return {(1,0):"n", (-1,0):"s", (0,1):"w", (0,-1):"e"}[(p1[0]-p2[0], p1[1]-p2[1])]
While Python is a particularly powerful language when it comes to doing a LOT of things in one line… This usually (almost always) is not the best idea. Both for readability, as well as for maintainability. After all, this isn’t Code Golf.
Additionally, you are also constructing the dictionary in each call – this is certainly not best practise performance wise (although in this case Python has probably optimized this for you).
But in general, if you have a lookup data structure that will not change throughout your program, you would make it a global variable, or class variable if you were using OOP.
1.2) Function Naming, and other Coding Standards
Taken from PEP8 section on Function Names:
Function names should be lowercase, with words separated by underscores as necessary to improve readability.
mixedCase is allowed only in contexts where that’s already the prevailing style (e.g. threading.py), to retain backwards compatibility
Your function names resemble Java methods, rather than Python functions. This strikes me as particularly unusual since you seem to be coding in a procedural style. (Often Java programmers will name things in a Java way, but they also use OO).
Rather than:
getDirections
findPath
getNeighbours
getMax
You should use a more Pythonic naming scheme, like this:
path_direction
find_path
cell_neighbours
max_path_length
1.3) Argument and Variable Names – Meaning is Important
In a lot (almost all) of your functions, I see some argument names like p1
, or mat
. As well as variable names like n
, s
, i
, j
, and wlist
, etc.
Certainly, at a minimum typing matrix
instead of mat
is MUCH better. p1
and p2
can be figured to mean path1
and path2
, but I feel the latter is much more expressive and clear.
When it comes to iteration variables like i
and j
, you are okay using things like this. But it is better to restructure you loops such that you can do things like for row in matrix:
, rather than for i in range(len(mat)):
.
2: A Graphical Approach
2.1) The Idea
As I eluded to in my earlier comment; “have you looked into any of the Graph Search Algorithms, such as Dijkstra and A-star (A)?*” – I (personally) think they key to this answer lays in one or another graph search algorithm.
2.2) Transformation
Let’s take the example 3×3 matrix from your question:
0 1 0
0 0 0
1 0 0
Now we want to represent each of these 9 matrix cells as a node in a graph. Given that 0
represents an empty node, and 1
is a blocked node, such a representation may look like the following:
Note that as the cell at row 0
, column 0
is always the start node, we can label it as such. This is important as most graph search algorithms will require a starting node to work from.
My apologies for the lack of planarity, however a matrix does not lend itself easily to having no cross-overs for 15 vertices!
Note, giving vertices from the blocked nodes back to the end node is (probably, depending on which exact algorithm you use) optional. I’ve included it here for completeness, but you may very well decide for performance reasons to just ignore blocked nodes all together.
2.3) Weighting the Graph
The key to this approach, is the different colours links (vertices) in the graph above.
Imagine if each green vertex has a zero (or negative) weight, and each red vertex had some positive weight (say 1
).
then an algorithm seeking to find the “best” (*i.e. optimum, in terms of cost) path would naturally visit the most nodes (via green links).
Taking the path produced in your question as an example:
s 1,e 1,s 1,e 1,n 2
This would represent a path with a cost of -1 + -1 + -1 + -1 + -2 + 1= -5
.
Which is cheaper (in terms of aggregated weight cost), than the sub-optimal path:
s 1,e 2,n 1
Which would have the cost: -1 + -2 + -1 + 1 = -3
!
2.4) Finding the Algorithm
Obviously, there are many choices of Graph Search algorithms out there. And going through and examining each one is certainly not within the scope of an answer on Rode Review. But there is plenty of information on Google/Wikipedia regarding these.
Essentially, you’ll want an algorithm that can avoid cycles (i.e. only visits nodes once), find the “best” (or shortest, or lowest score, etc) path – and (most importantly) can do this in a reasonable time.
Algorithms such as DFS come to mind, although you’ll no doubt need to alter most implementations for your purposes.
3: Performance of Your Code
3.1) Data Structures
The main reason for your codes slow performance is that you appear to not be using the correct data structures for your problem space.
Instead of using lists and tuples for everything, I suggest you take a look at something like Graph-tool and use graph based data structures.
3.2) Use Existing Algorithms
As I mentioned in ยง2 (A Graphical Approach) above, your best bet is to find an existing algorithm to accomplish this task (i.e. change the data structures you’re working with to fit an algorithm rather than the other way around as you’re currently doing).