Skip to content

Graphs

Graphs arise in fascinating ways in the real world, and are an intuitive way to structure and understand many phenomenon, such as networking, digital mapping, spatially distributions, and causally relations. They find useful applications in path finding, routing, and recommendation systems.

null

Graphs consist of nodes (vertices) and edges. You should know at least three types of graphs: undirected, directed, and directed acyclic. Topological ordering is possible for directed acyclic graphs---it is the process of representing a graph as a list whereby each node with outbound edges precedes all of it's downstream nodes. In other words, no node in the ordering comes before a node directed at it. For the DAG above, a topological ordering is $[1, 2, 3, 4, 7, 6, 5]$. A graph is said to be connected if it is undirected, and you can access each node from any other node. A connected component is a connected subgraph of any larger graph that may itself be disconnected. Finally, a weakly connected graph is a directed graph which, if it were be be converted into an undirected graph, would be connected. In-degree is the number of inbound edges to a node; out-degree being the opposite.

A binary tree is a special type of directed acyclic graph.

It is best to represent a graph using an adjacency list---a hash map with nodes as keys, and a list of nodes to which it has an edge. You may create graph nodes with additional data to supplement the data structure, such as weights. See also adjacency matrix, a sparse graph representation.

Complexity
  • DFS & BFS on a graph have linear time complexity with the size of the graph $O(V + E)$, and linear space complexity $O(V)$.
Working with graphs
  • Know how to implement topological sorting for both BFS and DFS.
  • DFS works great for finding structure in graphs, e.g., cycles and connected components
  • BFS works great for optimization problems, e.g., finding the shortest path from one vertex to another. These include Dijkstra's algorithm and minimum spanning tree.
Python: Adjacency List

Create an adjacency list by declaring a defaultdict of type list and then fill, for example,

python
from collections import defaultdict
adj_list = defaultdict(list)
for dest, src in known_edges:
    adj_list[src].append(dest)
from collections import defaultdict
adj_list = defaultdict(list)
for dest, src in known_edges:
    adj_list[src].append(dest)

Searching (DFS & BFS)

python
def graph_dfs(graph):
    visited = set()

    def traverse(node):
        print(node)
        if node in visited:
            return

        visited.add(node)

        for neighbor in graph[node]:
            traverse(neighbor)
        return

    traverse(1)  # Assuming we know the first element in the DAG
    return
def graph_dfs(graph):
    visited = set()

    def traverse(node):
        print(node)
        if node in visited:
            return

        visited.add(node)

        for neighbor in graph[node]:
            traverse(neighbor)
        return

    traverse(1)  # Assuming we know the first element in the DAG
    return
python
def graph_bfs(graph):
    visited = set()

    q = collections.deque()
    q.append(1)

    while q:
        node = q.popleft()
        if node in visited:
            continue
        visited.add(node)

        for dest in graph[node]:
            q.append(dest)
    return
def graph_bfs(graph):
    visited = set()

    q = collections.deque()
    q.append(1)

    while q:
        node = q.popleft()
        if node in visited:
            continue
        visited.add(node)

        for dest in graph[node]:
            q.append(dest)
    return

Topological sorting

+ Wikipedia

Precisely, a topological sort is a graph traversal in which each node v is visited only after all its dependencies are visited_._ A topological ordering is possible if and only if the graph has no directed cycles, that is, if it is a directed acyclic graph (DAG). Any DAG has at least one topological ordering, and algorithms are known for constructing a topological ordering of any DAG in linear time.

python
def topo_ordering_dfs(graph):
    visited = set()
    output = []

    def traverse(node):
        if node in visited:
            return

        visited.add(node)

        for neighbor in graph[node]:
            traverse(neighbor)
        output.append(node)

        return

    traverse(1)  # Assuming we know the first element in the DAG
    return list(reversed(output))
def topo_ordering_dfs(graph):
    visited = set()
    output = []

    def traverse(node):
        if node in visited:
            return

        visited.add(node)

        for neighbor in graph[node]:
            traverse(neighbor)
        output.append(node)

        return

    traverse(1)  # Assuming we know the first element in the DAG
    return list(reversed(output))
Review Kahn's algorithm for BFS
Classes of problems
  1. Shortest path: Dijkstra or A*
  2. Minimum spanning tree: Lowest weight subtree for a graph
  3. Matching: Only one inbound edge for each vertex
  4. Maximum flow:

Dijkstra's algorithm

Youtube

Context. You have a graph with vertices and edges, each edge is annotated with a weight representing the difficultly of traversal.

The key underlying data structure of this search algorithm is the priority queue (or it may be implemented with a heap). It is used to keep track of the graph nodes that have been visited.

  1. Starting from the origin graph node, push it into the queue with a weight attribute representing the path length OR difficulty of traversal.
  2. Next visit each neighbouring vertex. For each vertex visited, push it into the queue with the vertex that it most recently travelled through and the the cumulative weight of all edges travelled along its journey.
  3. Once you've visited all neighbours of the current vertex and placed them into the priority queue, set the current vertex aside (into a stack data structure LIFO)
  4. Pop the highest priority vertex from the queue, and repeat from step 2. Once the vertex popped in this step is the destination vertex, proceed to step 5.
  5. Find the stack that you've been pushing completed nodes into
VertexScoreTravelled through
C15B

Relevant problems. see here and Network Delay Time (LeetCode 743)

A* algorithm

A-star is a small addition to Dijkstra that says "we're getting a bit closer". It builds directionality into the algorithm by additionally tagging each vertex with the Euclidean distance to the destination node. (Note that the edges are still tagged with a difficulty score.) So to identify which vertices are rejected from queue, the sum of the difficulty and distance are used. In effect, this prioritizes nodes closer to the destination.

Minimum spanning trees

  • A tree is a connected acyclic graph
  • Spanning tree of graph G is a tree within G that contains all the vertices of G with a subset of edges
  • The find a minimum spanning tree (MST) you need an undirected weighted graph G = (V, E)
  • We want to find a spanning tree T of minimum total weight, whereby the weight is the sum of all edges of T
  • The solution the MST is a Greedy Algorithms & Invariants.

Prim's algorithm

This is a greedy algorithm.

  1. Begin with an origin point, an empty hash map $H$ for representing the MST, and a hash map $L$ with the current lowest values.
  2. From the starting point, add it to $H$, and fill $L$ with values from neighbours of the origin node.
  3. Traverse to the node with the lowest edge weight, add this entry to $H$
  4. Update $L$ with the lowest known edge weights to all observed neighbours

Relevant problems. See Min Cost to Connect All Points (LeetCode 1584)

Bipartite Graph

A bipartite graph consists of two classes of nodes, where each node of class $A$ is connected only to nodes of class $B$. Below is three representations of the same bipartite graph.

BipartiteGraph_1000


Problems

18.1 Search a maze

Difficult(ish) problem. I got many of the fundamental components, but had trouble putting it all together, especially without test cases. This type of problem is perfect for a test case, in an interview, you should build a simple one out and (only 3x3 or such) and test out your DFS solutions on it.

  • No adjaceny list was required for this problem. More of a DFS/BFS problem.
  • Understand how to conduct DFS & BFS on a graph instead of a tree.

18.2 Paint a Boolean Matrix

Good work on this one, using the IOCFDNB rule, was able to digest and create a good plan for the solution. I used DFS. Only small syntax errors, assumptions, etc. Good work.

  • Review complexities for DFS and BFS
  • Review BFS strategy in general

18.7 Transform one string to another

  • I didn't actually code this problem, just thought about it.
  • Shortest paths in an undirected graph are naturally computed with BFS
  • The solution to this problem used the vertices as the entries in the dictionary (which in the python implementation is sorted by default now). Next, loop through all possible 1-character changes to the candidate string, if the candidate string, with 1-char change is in the dictionary, add it to the BFS queue, and increase the 'distance' by one.
  • Again, being familiar with named tuples is a great way to give syntactical meaning to your types.

Course Schedule

Create a schedule list that accomodates a set of prerequisites. Ensure you check for cycles.

  • Use an adjacency list and DFS (or BFS) to conduct topological sorting

Parallel sorting

Parallel sorting: Find the fewest number of semesters to complete all courses

  • Use an adjacency list and DFS (or BFS) to conduct topological sorting

18.6 Making wired connections (Graph is bipartite)

Network Delay Time (LeetCode 743)

Given a set of nodes, and a broadcast network request relayed through these nodes with weights on each edge, determine the minimum time for the signal to reach all nodes.

To solve this problem you have to use Dijkstra's algorithm. The key is to use the priority queue (heap) to filter the next lowest

python
def network_delay_time(times: List[List[int]], n: int, k: int) -> int:
    graph = collections.defaultdict(list)

    for u, v, w in times:
        edge = (w, v)
        graph[u].append(edge)

    heap = [(0, k)]
    running_weights = {}

    while heap:
        curr_w, u = heapq.heappop(heap)

        if u in running_weights:
            continue
        running_weights[u] = curr_w

        for edge_w, v in graph[u]:
            if v in running_weights:
                continue

            new_weight = edge_w + curr_w
            heapq.heappush(heap, (new_weight, v))

    return -1 if len(running_weights) < n else max(running_weights.values())
def network_delay_time(times: List[List[int]], n: int, k: int) -> int:
    graph = collections.defaultdict(list)

    for u, v, w in times:
        edge = (w, v)
        graph[u].append(edge)

    heap = [(0, k)]
    running_weights = {}

    while heap:
        curr_w, u = heapq.heappop(heap)

        if u in running_weights:
            continue
        running_weights[u] = curr_w

        for edge_w, v in graph[u]:
            if v in running_weights:
                continue

            new_weight = edge_w + curr_w
            heapq.heappush(heap, (new_weight, v))

    return -1 if len(running_weights) < n else max(running_weights.values())

Min Cost to Connect All Points (LeetCode 1584)

This is a minimum spanning tree problem. The solution implemented below uses Prim's algorithm, which shadow's Dijkstra's algorithm in that it uses a priority queue (heap) to keep track of which node to evaluate next.

python
def min_cost_to_connect_points(points: List[List[int]]) -> int:
    cost_map = {}
    pq = [(0, 0)]
    visited = set()
    taxicab_distance = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])

    while pq:
        w, u = heappop(pq)

        if u in visited or cost_map.get(u, inf) < w:
            continue
        visited.add(u)

        for v in range(len(points)):
            if v not in visited:
                distance = taxicab_distance(points[u], points[v])

                if distance < cost_map.get(v, inf):
                    cost_map[v] = distance
                    heappush(pq, (distance, v))

    return sum(cost_map.values())
def min_cost_to_connect_points(points: List[List[int]]) -> int:
    cost_map = {}
    pq = [(0, 0)]
    visited = set()
    taxicab_distance = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])

    while pq:
        w, u = heappop(pq)

        if u in visited or cost_map.get(u, inf) < w:
            continue
        visited.add(u)

        for v in range(len(points)):
            if v not in visited:
                distance = taxicab_distance(points[u], points[v])

                if distance < cost_map.get(v, inf):
                    cost_map[v] = distance
                    heappush(pq, (distance, v))

    return sum(cost_map.values())

Is Graph Bipartite? (LeetCode 785)

Use BFS from the first point, label each neighbour as 'black', add them to the queue, and then for each new node you pop, check it's colour and assign all neighbours as the opposing colour. If you encounter a neighbour that already has a colour, then ensure that it has the correct colour, otherwise exit with a False, indicating that the graph is not bipartite.

python
def isBipartite(graph: List[List[int]]) -> bool:
    status = [0] * len(graph)

    def bfs(start):
        if status[start]:
            return True
        q = deque([start])
        status[start] = 1

        while q:
            idx = q.popleft()
            edges = graph[idx]


            for e in edges:
                if status[e] == status[idx]:
                    # Then then there is a matching neighbor 
                    # => graph is not bipartite
                    return False
                elif not status[e]:
                    # Then it's 0 => hasn't been visited
                    status[e] = -1 * status[idx]
                    q.append(e)
        return True

    for i in range(len(graph)):
        if not bfs(i):
            return False
    return True
def isBipartite(graph: List[List[int]]) -> bool:
    status = [0] * len(graph)

    def bfs(start):
        if status[start]:
            return True
        q = deque([start])
        status[start] = 1

        while q:
            idx = q.popleft()
            edges = graph[idx]


            for e in edges:
                if status[e] == status[idx]:
                    # Then then there is a matching neighbor 
                    # => graph is not bipartite
                    return False
                elif not status[e]:
                    # Then it's 0 => hasn't been visited
                    status[e] = -1 * status[idx]
                    q.append(e)
        return True

    for i in range(len(graph)):
        if not bfs(i):
            return False
    return True