Graphs
Graphs arise in fascinating ways in the real world, and are an intuitive way to structure and understand many phenomenon, such as networking, digital mapping, spatially distributions, and causally relations. They find useful applications in path finding, routing, and recommendation systems.
Graphs consist of nodes (vertices) and edges. You should know at least three types of graphs: undirected, directed, and directed acyclic. Topological ordering is possible for directed acyclic graphs---it is the process of representing a graph as a list whereby each node with outbound edges precedes all of it's downstream nodes. In other words, no node in the ordering comes before a node directed at it. For the DAG above, a topological ordering is $[1, 2, 3, 4, 7, 6, 5]$. A graph is said to be connected if it is undirected, and you can access each node from any other node. A connected component is a connected subgraph of any larger graph that may itself be disconnected. Finally, a weakly connected graph is a directed graph which, if it were be be converted into an undirected graph, would be connected. In-degree is the number of inbound edges to a node; out-degree being the opposite.
A binary tree is a special type of directed acyclic graph.
It is best to represent a graph using an adjacency list---a hash map with nodes as keys, and a list of nodes to which it has an edge. You may create graph nodes with additional data to supplement the data structure, such as weights. See also adjacency matrix, a sparse graph representation.
Complexity
- DFS & BFS on a graph have linear time complexity with the size of the graph $O(V + E)$, and linear space complexity $O(V)$.
Working with graphs
- Know how to implement topological sorting for both BFS and DFS.
- DFS works great for finding structure in graphs, e.g., cycles and connected components
- BFS works great for optimization problems, e.g., finding the shortest path from one vertex to another. These include Dijkstra's algorithm and minimum spanning tree.
Python: Adjacency List
Create an adjacency list by declaring a
defaultdict
of typelist
and then fill, for example,pythonfrom collections import defaultdict adj_list = defaultdict(list) for dest, src in known_edges: adj_list[src].append(dest)
from collections import defaultdict adj_list = defaultdict(list) for dest, src in known_edges: adj_list[src].append(dest)
Searching (DFS & BFS)
def graph_dfs(graph):
visited = set()
def traverse(node):
print(node)
if node in visited:
return
visited.add(node)
for neighbor in graph[node]:
traverse(neighbor)
return
traverse(1) # Assuming we know the first element in the DAG
return
def graph_dfs(graph):
visited = set()
def traverse(node):
print(node)
if node in visited:
return
visited.add(node)
for neighbor in graph[node]:
traverse(neighbor)
return
traverse(1) # Assuming we know the first element in the DAG
return
def graph_bfs(graph):
visited = set()
q = collections.deque()
q.append(1)
while q:
node = q.popleft()
if node in visited:
continue
visited.add(node)
for dest in graph[node]:
q.append(dest)
return
def graph_bfs(graph):
visited = set()
q = collections.deque()
q.append(1)
while q:
node = q.popleft()
if node in visited:
continue
visited.add(node)
for dest in graph[node]:
q.append(dest)
return
Topological sorting
+ Wikipedia
Precisely, a topological sort is a graph traversal in which each node v is visited only after all its dependencies are visited_._ A topological ordering is possible if and only if the graph has no directed cycles, that is, if it is a directed acyclic graph (DAG). Any DAG has at least one topological ordering, and algorithms are known for constructing a topological ordering of any DAG in linear time.
def topo_ordering_dfs(graph):
visited = set()
output = []
def traverse(node):
if node in visited:
return
visited.add(node)
for neighbor in graph[node]:
traverse(neighbor)
output.append(node)
return
traverse(1) # Assuming we know the first element in the DAG
return list(reversed(output))
def topo_ordering_dfs(graph):
visited = set()
output = []
def traverse(node):
if node in visited:
return
visited.add(node)
for neighbor in graph[node]:
traverse(neighbor)
output.append(node)
return
traverse(1) # Assuming we know the first element in the DAG
return list(reversed(output))
Review Kahn's algorithm for BFS
Classes of problems
- Shortest path: Dijkstra or A*
- Minimum spanning tree: Lowest weight subtree for a graph
- Matching: Only one inbound edge for each vertex
- Maximum flow:
Dijkstra's algorithm
Context. You have a graph with vertices and edges, each edge is annotated with a weight representing the difficultly of traversal.
The key underlying data structure of this search algorithm is the priority queue (or it may be implemented with a heap). It is used to keep track of the graph nodes that have been visited.
- Starting from the origin graph node, push it into the queue with a weight attribute representing the path length OR difficulty of traversal.
- Next visit each neighbouring vertex. For each vertex visited, push it into the queue with the vertex that it most recently travelled through and the the cumulative weight of all edges travelled along its journey.
- Once you've visited all neighbours of the current vertex and placed them into the priority queue, set the current vertex aside (into a stack data structure LIFO)
- Pop the highest priority vertex from the queue, and repeat from step 2. Once the vertex popped in this step is the destination vertex, proceed to step 5.
- Find the stack that you've been pushing completed nodes into
Vertex | Score | Travelled through |
---|---|---|
C | 15 | B |
Relevant problems. see here and Network Delay Time (LeetCode 743)
A* algorithm
A-star is a small addition to Dijkstra that says "we're getting a bit closer". It builds directionality into the algorithm by additionally tagging each vertex with the Euclidean distance to the destination node. (Note that the edges are still tagged with a difficulty score.) So to identify which vertices are rejected from queue, the sum of the difficulty and distance are used. In effect, this prioritizes nodes closer to the destination.
Minimum spanning trees
- A tree is a connected acyclic graph
- Spanning tree of graph G is a tree within G that contains all the vertices of G with a subset of edges
- The find a minimum spanning tree (MST) you need an undirected weighted graph G = (V, E)
- We want to find a spanning tree T of minimum total weight, whereby the weight is the sum of all edges of T
- The solution the MST is a Greedy Algorithms & Invariants.
Prim's algorithm
This is a greedy algorithm.
- Begin with an origin point, an empty hash map $H$ for representing the MST, and a hash map $L$ with the current lowest values.
- From the starting point, add it to $H$, and fill $L$ with values from neighbours of the origin node.
- Traverse to the node with the lowest edge weight, add this entry to $H$
- Update $L$ with the lowest known edge weights to all observed neighbours
Relevant problems. See Min Cost to Connect All Points (LeetCode 1584)
Bipartite Graph
A bipartite graph consists of two classes of nodes, where each node of class $A$ is connected only to nodes of class $B$. Below is three representations of the same bipartite graph.
Problems
18.1 Search a maze
Difficult(ish) problem. I got many of the fundamental components, but had trouble putting it all together, especially without test cases. This type of problem is perfect for a test case, in an interview, you should build a simple one out and (only 3x3 or such) and test out your DFS solutions on it.
- No adjaceny list was required for this problem. More of a DFS/BFS problem.
- Understand how to conduct DFS & BFS on a graph instead of a tree.
18.2 Paint a Boolean Matrix
Good work on this one, using the IOCFDNB rule, was able to digest and create a good plan for the solution. I used DFS. Only small syntax errors, assumptions, etc. Good work.
- Review complexities for DFS and BFS
- Review BFS strategy in general
18.7 Transform one string to another
- I didn't actually code this problem, just thought about it.
- Shortest paths in an undirected graph are naturally computed with BFS
- The solution to this problem used the vertices as the entries in the dictionary (which in the python implementation is sorted by default now). Next, loop through all possible 1-character changes to the candidate string, if the candidate string, with 1-char change is in the dictionary, add it to the BFS queue, and increase the 'distance' by one.
- Again, being familiar with named tuples is a great way to give syntactical meaning to your types.
Course Schedule
Create a schedule list that accomodates a set of prerequisites. Ensure you check for cycles.
- Use an adjacency list and DFS (or BFS) to conduct topological sorting
Parallel sorting
Parallel sorting: Find the fewest number of semesters to complete all courses
- Use an adjacency list and DFS (or BFS) to conduct topological sorting
18.6 Making wired connections (Graph is bipartite)
Network Delay Time (LeetCode 743)
Given a set of nodes, and a broadcast network request relayed through these nodes with weights on each edge, determine the minimum time for the signal to reach all nodes.
To solve this problem you have to use Dijkstra's algorithm. The key is to use the priority queue (heap) to filter the next lowest
def network_delay_time(times: List[List[int]], n: int, k: int) -> int:
graph = collections.defaultdict(list)
for u, v, w in times:
edge = (w, v)
graph[u].append(edge)
heap = [(0, k)]
running_weights = {}
while heap:
curr_w, u = heapq.heappop(heap)
if u in running_weights:
continue
running_weights[u] = curr_w
for edge_w, v in graph[u]:
if v in running_weights:
continue
new_weight = edge_w + curr_w
heapq.heappush(heap, (new_weight, v))
return -1 if len(running_weights) < n else max(running_weights.values())
def network_delay_time(times: List[List[int]], n: int, k: int) -> int:
graph = collections.defaultdict(list)
for u, v, w in times:
edge = (w, v)
graph[u].append(edge)
heap = [(0, k)]
running_weights = {}
while heap:
curr_w, u = heapq.heappop(heap)
if u in running_weights:
continue
running_weights[u] = curr_w
for edge_w, v in graph[u]:
if v in running_weights:
continue
new_weight = edge_w + curr_w
heapq.heappush(heap, (new_weight, v))
return -1 if len(running_weights) < n else max(running_weights.values())
Min Cost to Connect All Points (LeetCode 1584)
This is a minimum spanning tree problem. The solution implemented below uses Prim's algorithm, which shadow's Dijkstra's algorithm in that it uses a priority queue (heap) to keep track of which node to evaluate next.
def min_cost_to_connect_points(points: List[List[int]]) -> int:
cost_map = {}
pq = [(0, 0)]
visited = set()
taxicab_distance = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])
while pq:
w, u = heappop(pq)
if u in visited or cost_map.get(u, inf) < w:
continue
visited.add(u)
for v in range(len(points)):
if v not in visited:
distance = taxicab_distance(points[u], points[v])
if distance < cost_map.get(v, inf):
cost_map[v] = distance
heappush(pq, (distance, v))
return sum(cost_map.values())
def min_cost_to_connect_points(points: List[List[int]]) -> int:
cost_map = {}
pq = [(0, 0)]
visited = set()
taxicab_distance = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])
while pq:
w, u = heappop(pq)
if u in visited or cost_map.get(u, inf) < w:
continue
visited.add(u)
for v in range(len(points)):
if v not in visited:
distance = taxicab_distance(points[u], points[v])
if distance < cost_map.get(v, inf):
cost_map[v] = distance
heappush(pq, (distance, v))
return sum(cost_map.values())
Is Graph Bipartite? (LeetCode 785)
Use BFS from the first point, label each neighbour as 'black', add them to the queue, and then for each new node you pop, check it's colour and assign all neighbours as the opposing colour. If you encounter a neighbour that already has a colour, then ensure that it has the correct colour, otherwise exit with a False, indicating that the graph is not bipartite.
def isBipartite(graph: List[List[int]]) -> bool:
status = [0] * len(graph)
def bfs(start):
if status[start]:
return True
q = deque([start])
status[start] = 1
while q:
idx = q.popleft()
edges = graph[idx]
for e in edges:
if status[e] == status[idx]:
# Then then there is a matching neighbor
# => graph is not bipartite
return False
elif not status[e]:
# Then it's 0 => hasn't been visited
status[e] = -1 * status[idx]
q.append(e)
return True
for i in range(len(graph)):
if not bfs(i):
return False
return True
def isBipartite(graph: List[List[int]]) -> bool:
status = [0] * len(graph)
def bfs(start):
if status[start]:
return True
q = deque([start])
status[start] = 1
while q:
idx = q.popleft()
edges = graph[idx]
for e in edges:
if status[e] == status[idx]:
# Then then there is a matching neighbor
# => graph is not bipartite
return False
elif not status[e]:
# Then it's 0 => hasn't been visited
status[e] = -1 * status[idx]
q.append(e)
return True
for i in range(len(graph)):
if not bfs(i):
return False
return True