Union-Find (DSU): The Pattern for Connectivity Problems

Union-find — also called disjoint set union (DSU) — tracks a set of elements split into non-overlapping groups and answers two questions fast: which group is an element in (find), and merge these two groups (union). With its two standard optimizations it runs in near-O(1) amortized time, which makes it the cleanest tool for connectivity and grouping problems in a coding interview.

When to recognize this pattern

You need to count connected components or check whether two nodes are in the same component.
A grid problem in the number of islands family, especially the dynamic “add land one cell at a time and report component count” variants.
You must detect a cycle in an undirected graph, or find the redundant connection that creates one.
Grouping by a transitive relationship: accounts merge, friend circles, or equation equivalence (“a == b, b == c, so a == c”).
Edges arrive incrementally and you must keep answering connectivity queries without re-traversing the graph each time.

The structure

Each element points to a parent; following parents leads to the group’s root, which is its identity. Two elements are connected exactly when they share a root. The whole structure is two arrays: parent and rank. Initially every element is its own parent — n singleton groups — and each union that merges two distinct groups reduces the group count by one.

It helps to see what DSU deliberately does not do. It cannot tell you the path between two nodes, list the members of a group cheaply, or efficiently split a group back apart — union-find is a one-way merge structure. What it does superbly is answer “same group?” and “merge,” repeatedly and incrementally, faster than re-running a traversal. When a problem only needs those two questions, DSU is almost always the shortest correct solution to write under time pressure.

Union by rank + path compression

The two optimizations are what make DSU fast. Union by rank attaches the shorter tree under the taller one so trees stay flat. Path compression re-points every node visited during a find straight at the root, so the next lookup is instant.

class DSU:
    def __init__(self, n):
        self.parent = list(range(n))   # each node is its own root
        self.rank = [0] * n            # tree height upper bound
        self.count = n                 # number of disjoint groups

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path compression
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False               # already connected
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra            # ra is the taller root
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        self.count -= 1
        return True

Why it’s near-O(1). With both optimizations, a sequence of m operations on n elements costs O(m · α(n)), where α is the inverse Ackermann function. α(n) grows so slowly it is at most 4 or 5 for any n that fits in this universe — effectively a constant. See the Big-O cheat sheet for how amortized bounds like this are reasoned about.

Worked examples

Number of connected components

Problem: Given n nodes and an edge list, count the connected components.

Union every edge; the DSU’s running count drops by one on each successful merge and ends at the component count.

def countComponents(n, edges):
    dsu = DSU(n)
    for a, b in edges:
        dsu.union(a, b)
    return dsu.count

O(E · α(n)) time. The count bookkeeping inside union means you never run a separate pass to tally roots.

Redundant connection (cycle detection)

Problem: A tree had one extra edge added, forming exactly one cycle. Return that edge.

Process edges in order. The first edge whose endpoints already share a root is the one that closes the cycle.

def findRedundantConnection(edges):
    dsu = DSU(len(edges) + 1)   # nodes are 1-indexed
    for a, b in edges:
        if not dsu.union(a, b):  # already connected -> this edge is redundant
            return [a, b]
    return []

O(E · α(n)) time. This “union returns False means cycle” check is the same primitive that drives Kruskal’s minimum spanning tree algorithm — see graph algorithms for coding interviews.

Number of islands

Problem: Count islands of 1s in a grid where adjacent land cells connect.

Map each cell to an id r * cols + c, union each land cell with its right and down land neighbors, then count distinct roots among land cells.

def numIslands(grid):
    rows, cols = len(grid), len(grid[0])
    dsu = DSU(rows * cols)
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == "1":
                for dr, dc in ((1, 0), (0, 1)):
                    nr, nc = r + dr, c + dc
                    if nr < rows and nc < cols and grid[nr][nc] == "1":
                        dsu.union(r * cols + c, nr * cols + nc)
    roots = {dsu.find(r * cols + c)
             for r in range(rows) for c in range(cols)
             if grid[r][c] == "1"}
    return len(roots)

O(R · C · α) time. BFS or DFS also solves the static grid; DSU wins when cells are added one at a time (the “islands II” variant).

Accounts merge

Problem: Merge accounts that share any email; each account also has an owner name.

Union accounts that share an email by mapping each email to the first account index that owned it. Then group emails by root.

def accountsMerge(accounts):
    dsu = DSU(len(accounts))
    email_to_id = {}
    for i, acc in enumerate(accounts):
        for email in acc[1:]:
            if email in email_to_id:
                dsu.union(i, email_to_id[email])
            else:
                email_to_id[email] = i
    groups = {}
    for email, i in email_to_id.items():
        groups.setdefault(dsu.find(i), []).append(email)
    return [[accounts[r][0]] + sorted(emails)
            for r, emails in groups.items()]

O(N · K log K) dominated by sorting emails. The DSU itself is near-linear; the trick is mapping the merge key (email) back to an account id.

Variations & pitfalls

Skip one optimization and it degrades. Without union by rank or path compression, trees can become linked lists and find degrades toward O(n). Keep both.
Union by size works too. Tracking subtree size instead of rank gives the same bound and is handy when a problem asks for the largest component’s size.
Indexing. Many graph problems are 1-indexed; size the arrays to n + 1 so node n is valid.
Undirected only for cycle detection. The “same root means cycle” rule is for undirected graphs. Directed-cycle detection needs DFS colors or Kahn’s topological sort instead.
Pitfall: comparing raw parents instead of roots. Always compare find(a) == find(b), never parent[a] == parent[b] — a node deep in a tree has the wrong parent.

Reach for the right structure faster

CoPilot Interview suggests the likely pattern — union-find, monotonic stack, graph traversal — with a working solution and Big-O in about four seconds during real Zoom and Teams calls. Free for Windows and macOS.

See the coding interview copilot

FAQ

What is union-find / disjoint set union?

Union-find (also called DSU, disjoint set union) is a data structure that tracks a collection of elements partitioned into disjoint groups. It supports two operations: find, which returns the representative root of an element’s group, and union, which merges two groups. It is the go-to tool for connectivity and grouping questions.

When should I use union-find in a coding interview?

Use it when the problem is about connectivity or grouping: counting connected components, number-of-islands variants, detecting a cycle in an undirected graph, redundant connection, accounts merge, or any “are these two things in the same group / merge these groups” question — especially when edges arrive incrementally.

What makes union-find near O(1)?

Two optimizations: union by rank (or size) attaches the smaller tree under the larger to keep trees flat, and path compression flattens the path to the root during find. Together they give an amortized cost per operation of O(α(n)), where α is the inverse Ackermann function — effectively a small constant (at most 4 or 5) for any input you will ever see.

Can union-find detect cycles in a graph?

Yes, in an undirected graph. Process each edge: if its two endpoints already share the same root, adding the edge would close a cycle. If they have different roots, union them. This is the basis of redundant-connection problems and of Kruskal’s minimum spanning tree algorithm.

Union-find or BFS/DFS for connected components?

BFS or DFS works well when the whole graph is known up front. Union-find shines when edges arrive incrementally (a streaming or dynamic-connectivity setting) or when you must repeatedly answer “are these two connected” as you go, since each query is near-constant time without re-traversing the graph.

Union-Find (DSU): The Pattern for Connectivity Problems

When to recognize this pattern

The structure

Union by rank + path compression

Worked examples

Number of connected components

Redundant connection (cycle detection)

Number of islands

Accounts merge

Variations & pitfalls

Reach for the right structure faster

FAQ

Related guides