Finding the longest path, avoiding obstacles in a 2D plane

Posted on

Problem

The Scenario

You are given a matrix of size m x n (width x height) with m*n spots where there are a few obstacles. Spots with obstacles are marked as 1, and those without are marked as 0. You can move vertically or horizontally, but can visit each spot only once. The goal is to cover as much area/spots as possible, avoiding obstacles.

Input

A 2D Matrix consisting of 0s and 1s and the coordinates of starting point.

Example Matrix:

0 1 0
0 0 0
1 0 0

Starting Point (0,0).
The coordinates are nothing but array indices.

Output

Output should be a line containing comma separated moves. A move is a space separated direction (n,s,e,w) and number of units of movement in the direction (range 1-1000).

Example output for the above matrix would be,

s 1,e 1,s 1,e 1,n 2

(0,0) --> (1,0) --> (1,1) --> (2,1) --> (2,2) --> (1,2) --> (0,2)

which is the longest path, visiting each coordinate only once and avoiding those coordinates with obstacles.

My algorithm

for every vertex/coordinate, recursively choose the neighbor which
yields the longest path

Code

def getDirections(p1, p2):
    return {(1,0):"n", (-1,0):"s", (0,1):"w", (0,-1):"e"}[(p1[0]-p2[0], p1[1]-p2[1])]

def findPath(start, mat):
    m, paths = getMax(mat, getNeighbours(mat), start, list())
    print(paths)
    if len(paths) == 1:
        return ""
    directions = ""
    prev = getDirections(paths[0], paths[1])
    current = ""
    ctr = 1
    for i in range(1, len(paths)-1):
        current = getDirections(paths[i], paths[i+1])
        if current == prev:
            ctr += 1
        else:
            directions, prev, ctr = directions + prev + " " + str(ctr) + ",", current, 1
    return directions + current+ " " +str(ctr)

#Neighbor is (-1,-1) if indices are out of range. For top and bottom rows for example.  
def getNeighbours(mat):
    neighbours = dict()
    for i in range(len(mat)):
        for j in range(len(mat[i])):
            n = (i-1, j) if i-1 >= 0 else (-1, -1)
            s = (i+1, j) if i+1 < len(mat) else (-1, -1)
            w = (i, j-1) if j-1 >= 0 else (-1, -1)
            e = (i, j+1) if j+1 < len(mat[i]) else (-1, -1)
            neighbours[(i,j)] = {'n':n, 's':s, 'w':w, 'e':e}
    return neighbours

def getMax(mat, neighbours, coordinates, visited):
    if coordinates == (-1,-1):
        return -1, []
    elif mat[coordinates[0]][coordinates[1]] == 1:
        return -1, []
    elif coordinates in visited:
        return -1, []
    else:
        visited.append(coordinates)
        n, nlist = getMax(mat, neighbours, neighbours[coordinates]['n'], visited[:])
        s, slist = getMax(mat, neighbours, neighbours[coordinates]['s'], visited[:])
        w, wlist = getMax(mat, neighbours, neighbours[coordinates]['w'], visited[:])
        e, elist = getMax(mat, neighbours, neighbours[coordinates]['e'], visited[:])
        if max(n,s,w,e) == -1:
            return 0, visited
        if max(n,s,w,e) == n:
            visited = nlist     
        elif max(n,s,w,e) == s:
            visited = slist     
        elif max(n,s,w,e) == w:
            visited = wlist     
        elif max(n,s,w,e) == e:
            visited = elist     
        return 1+max(n,s,w,e), visited

Solution

Well, first of all (as I said in my earlier comment); “Extended algorithmic discussion may be better on Programmers.SE, however performance improvement is definitely on-topic here!“. So, let me start with (1) a quick review of some more basic things, then maybe give you (2) some advice about your approach and (3) the performance of your code.


1:   A Quick Review

1.1) Avoid Magical Complicated One-Liners

The very first thing I see in your code (really, before anything else), is the second line (the implementation of getDirections):

return {(1,0):"n", (-1,0):"s", (0,1):"w", (0,-1):"e"}[(p1[0]-p2[0], p1[1]-p2[1])]

While Python is a particularly powerful language when it comes to doing a LOT of things in one line… This usually (almost always) is not the best idea. Both for readability, as well as for maintainability. After all, this isn’t Code Golf.

Additionally, you are also constructing the dictionary in each call – this is certainly not best practise performance wise (although in this case Python has probably optimized this for you).
But in general, if you have a lookup data structure that will not change throughout your program, you would make it a global variable, or class variable if you were using OOP.

1.2) Function Naming, and other Coding Standards

Taken from PEP8 section on Function Names:

Function names should be lowercase, with words separated by underscores as necessary to improve readability.
mixedCase is allowed only in contexts where that’s already the prevailing style (e.g. threading.py), to retain backwards compatibility

Your function names resemble Java methods, rather than Python functions. This strikes me as particularly unusual since you seem to be coding in a procedural style. (Often Java programmers will name things in a Java way, but they also use OO).

Rather than:

  • getDirections
  • findPath
  • getNeighbours
  • getMax

You should use a more Pythonic naming scheme, like this:

  • path_direction
  • find_path
  • cell_neighbours
  • max_path_length

1.3) Argument and Variable Names – Meaning is Important

In a lot (almost all) of your functions, I see some argument names like p1, or mat. As well as variable names like n, s, i, j, and wlist, etc.

Certainly, at a minimum typing matrix instead of mat is MUCH better. p1 and p2 can be figured to mean path1 and path2, but I feel the latter is much more expressive and clear.
When it comes to iteration variables like i and j, you are okay using things like this. But it is better to restructure you loops such that you can do things like for row in matrix:, rather than for i in range(len(mat)):.


2:   A Graphical Approach

2.1) The Idea

As I eluded to in my earlier comment; “have you looked into any of the Graph Search Algorithms, such as Dijkstra and A-star (A)?*” – I (personally) think they key to this answer lays in one or another graph search algorithm.

2.2) Transformation

Let’s take the example 3×3 matrix from your question:

0 1 0
0 0 0
1 0 0

Now we want to represent each of these 9 matrix cells as a node in a graph. Given that 0 represents an empty node, and 1 is a blocked node, such a representation may look like the following:

       Graph of example 3x3 Matrix
Note that as the cell at row 0, column 0 is always the start node, we can label it as such. This is important as most graph search algorithms will require a starting node to work from.

       Matrix Graph with Sink Node
My apologies for the lack of planarity, however a matrix does not lend itself easily to having no cross-overs for 15 vertices!

Note, giving vertices from the blocked nodes back to the end node is (probably, depending on which exact algorithm you use) optional. I’ve included it here for completeness, but you may very well decide for performance reasons to just ignore blocked nodes all together.

2.3) Weighting the Graph

The key to this approach, is the different colours links (vertices) in the graph above.
Imagine if each green vertex has a zero (or negative) weight, and each red vertex had some positive weight (say 1).
then an algorithm seeking to find the “best” (*i.e. optimum, in terms of cost) path would naturally visit the most nodes (via green links).

Taking the path produced in your question as an example:

s 1,e 1,s 1,e 1,n 2

This would represent a path with a cost of -1 + -1 + -1 + -1 + -2 + 1= -5.
Which is cheaper (in terms of aggregated weight cost), than the sub-optimal path:

 s 1,e 2,n 1

Which would have the cost: -1 + -2 + -1 + 1 = -3!

2.4) Finding the Algorithm

Obviously, there are many choices of Graph Search algorithms out there. And going through and examining each one is certainly not within the scope of an answer on Rode Review. But there is plenty of information on Google/Wikipedia regarding these.

Essentially, you’ll want an algorithm that can avoid cycles (i.e. only visits nodes once), find the “best” (or shortest, or lowest score, etc) path – and (most importantly) can do this in a reasonable time.
Algorithms such as DFS come to mind, although you’ll no doubt need to alter most implementations for your purposes.


3:   Performance of Your Code

3.1) Data Structures

The main reason for your codes slow performance is that you appear to not be using the correct data structures for your problem space.

Instead of using lists and tuples for everything, I suggest you take a look at something like Graph-tool and use graph based data structures.

3.2) Use Existing Algorithms

As I mentioned in ยง2 (A Graphical Approach) above, your best bet is to find an existing algorithm to accomplish this task (i.e. change the data structures you’re working with to fit an algorithm rather than the other way around as you’re currently doing).

Leave a Reply

Your email address will not be published. Required fields are marked *