Problem
I wanted to represent graphs in Python, and wrote the following class. I don’t care about values associated with nodes or the weights of the connections between them, simply the connections between nodes.
Is this a good enough implementation? Or is there a significantly better way?
class Graph:
def __init__(self, nodes):
self.array_representation = [[] for _ in range(nodes)]
def directly_connected(self, node_1, node_2):
# Predicate, checks if the two nodes are connected by one edge
return node_2 in self.array_representation[node_1]
def connect(self, node_1, node_2):
# Draws an edge between the node at node_1 and the node at node_2
self.array_representation[node_1].append(node_2)
self.array_representation[node_2].append(node_1)
Solution
If your graph becomes big (or rather if a node has many connections), node_2 in self.array_representation[node_1]
might take a long time, because in
is O(n) for lists. Therefore I would use a set for the connections of a node. You don’t even needed to change a lot:
class Graph:
def __init__(self, nodes):
self.connections = [set() for _ in range(nodes)]
def directly_connected(self, node_1, node_2):
"""Predicate, checks if the two nodes are connected by one edge"""
return node_2 in self.connections[node_1]
def connect(self, node_1, node_2):
"""Draws an edge between the node at node_1 and the node at node_2"""
self.connections[node_1].add(node_2)
self.connections[node_2].add(node_1)
And while I was at it, I promoted your comments to proper docstrings. Now you can do help(Graph.connect)
in an interactive session.
I also changed the name of the internal representation to self.connections
.
The business logic of graphs is mathematical in nature. As implemented the Graph
class does not directly reflect the underlying mathematics. Poor mapping between code and graph mathematics is not uncommon even for graph implementations in well regraded text books and online tutorials.
Mathematics
A graph G
is defined as a set of nodes or vertices V = {v1, v2...vn}
and a bag of edges E = {e1, e2, ...em}
. The only relation between edges and vertices is that for each edge e
between vertices u
and v
both u
and v
must be members of V
.
Dependencies
Mathematically, a set of vertices V
is independent of the set of edges E
. Two different graphs G1
and G2
can be defined across the same set of vertices based solely on the difference between two sets of edges E1
and E2
.
G1 = V, E1
G2 = V, E2
Nodes are necessarily properties of Edges. Edges are not properties of nodes.
Edges are [ properties | fields | objects ] of a graph. The dependencies are:
Graph <- Edges
Graph <- Nodes
Edge <- Node, Node
An Minimal Implementation
This code does not attempt to be particularly Pythonic. It attempts to reflect the underlying mathematics in an object based manner.
from collections import Counter
class Graph:
E = Counter()
V = set()
def add_node(self, node):
self.V.add(node)
def remove_node(self, node):
self.V.remove(node)
def add_edge(self, node_1, node_2):
if self.exists_node(node_1) and self.exists_node(node_2):
edge = str(node_1) + '_' + str(node_2)
self.E[edge]+=1
def remove_edge(self, node_1, node_2):
edge = str(node_1) + '_' + str(node_2)
if self.E[edge] > 0:
self.E[edge]-=1
else:
self.E[edge] = 0
def exists_edge(self, node_1, node_2):
edge_1 = str(node_1) + '_' + str(node_2)
return self.E[edge_1] > 0
def exists_node(self, node):
return node in self.V
The Graph
class maintains the nodes as a set in order to provide set semantics. It uses collection.Counter
to maintain a bag of edges as recommended by the Python documentation. The implementation of undirectional edges is left as an exercise.
Building on Ben Rudgers answer above, I would change some small things. Storing the edge as string seems odd, so I changed it to a namedtuple. Naming the internal variables to something more readable. Raising an error on adding an edge with unknown nodes, instead of ignoring it.
I also added the directly_connected() and connected() methods the OP is asking for.
from collections import namedtuple
Edge = namedtuple('Edge', ['node_1', 'node_2'])
class Graph:
edges = set()
nodes = set()
def add_node(self, node):
self.nodes.add(node)
def remove_node(self, node):
self.nodes.remove(node)
def add_edge(self, node_1, node_2):
if self.exists_node(node_1) and self.exists_node(node_2):
self.edges.add(Edge(node_1, node_2))
else:
msg = 'Either {} or {} are not registered nodes!'
msg = msg.format(node_1, node_2)
raise KeyError(msg)
def remove_edge(self, node_1, node_2):
self.edges.remove(Edge(node_1, node_2))
def exists_edge(self, node_1, node_2):
return Edge(node_1, node_2) in self.edges
def exists_node(self, node):
return node in self.nodes
def directly_connected(self, node_1, node_2):
return self.exists_edge(node_1, node_2,)
def connected(self, node_1, node_2):
if self.directly_connected(node_1, node_2):
return True
for edge in self.edges:
if edge.node_1 == node_1:
return self.connected(edge.node_2, node_2)
return False
g = Graph()
g.add_node('A')
g.add_node('B')
g.add_node('C')
g.add_edge('A', 'B')
g.add_edge('B', 'C')
print("A -> C Directly connected: ", g.directly_connected('A', 'C'))
print("A -> C Connected: ", g.connected('A', 'C'))
Returns
A -> C Directly connected: False
A -> C Connected: True