Union-find — also called disjoint set union (DSU) — tracks a set of elements split into non-overlapping groups and answers two questions fast: which group is an element in (find), and merge these two groups (union). With its two standard optimizations it runs in near-O(1) amortized time, which makes it the cleanest tool for connectivity and grouping problems in a coding interview.
When to recognize this pattern
- You need to count connected components or check whether two nodes are in the same component.
- A grid problem in the number of islands family, especially the dynamic “add land one cell at a time and report component count” variants.
- You must detect a cycle in an undirected graph, or find the redundant connection that creates one.
- Grouping by a transitive relationship: accounts merge, friend circles, or equation equivalence (“a == b, b == c, so a == c”).
- Edges arrive incrementally and you must keep answering connectivity queries without re-traversing the graph each time.
The structure
Each element points to a parent; following parents leads to the group’s root, which is its identity. Two elements are connected exactly when they share a root. The whole structure is two arrays: parent and rank. Initially every element is its own parent — n singleton groups — and each union that merges two distinct groups reduces the group count by one.
It helps to see what DSU deliberately does not do. It cannot tell you the path between two nodes, list the members of a group cheaply, or efficiently split a group back apart — union-find is a one-way merge structure. What it does superbly is answer “same group?” and “merge,” repeatedly and incrementally, faster than re-running a traversal. When a problem only needs those two questions, DSU is almost always the shortest correct solution to write under time pressure.
Union by rank + path compression
The two optimizations are what make DSU fast. Union by rank attaches the shorter tree under the taller one so trees stay flat. Path compression re-points every node visited during a find straight at the root, so the next lookup is instant.
class DSU:
def __init__(self, n):
self.parent = list(range(n)) # each node is its own root
self.rank = [0] * n # tree height upper bound
self.count = n # number of disjoint groups
def find(self, x):
while self.parent[x] != x:
self.parent[x] = self.parent[self.parent[x]] # path compression
x = self.parent[x]
return x
def union(self, a, b):
ra, rb = self.find(a), self.find(b)
if ra == rb:
return False # already connected
if self.rank[ra] < self.rank[rb]:
ra, rb = rb, ra # ra is the taller root
self.parent[rb] = ra
if self.rank[ra] == self.rank[rb]:
self.rank[ra] += 1
self.count -= 1
return True
Worked examples
Number of connected components
Problem: Given n nodes and an edge list, count the connected components.
Union every edge; the DSU’s running count drops by one on each successful merge and ends at the component count.
def countComponents(n, edges):
dsu = DSU(n)
for a, b in edges:
dsu.union(a, b)
return dsu.count
O(E · α(n)) time. The count bookkeeping inside union means you never run a separate pass to tally roots.
Redundant connection (cycle detection)
Problem: A tree had one extra edge added, forming exactly one cycle. Return that edge.
Process edges in order. The first edge whose endpoints already share a root is the one that closes the cycle.
def findRedundantConnection(edges):
dsu = DSU(len(edges) + 1) # nodes are 1-indexed
for a, b in edges:
if not dsu.union(a, b): # already connected -> this edge is redundant
return [a, b]
return []
O(E · α(n)) time. This “union returns False means cycle” check is the same primitive that drives Kruskal’s minimum spanning tree algorithm — see graph algorithms for coding interviews.
Number of islands
Problem: Count islands of 1s in a grid where adjacent land cells connect.
Map each cell to an id r * cols + c, union each land cell with its right and down land neighbors, then count distinct roots among land cells.
def numIslands(grid):
rows, cols = len(grid), len(grid[0])
dsu = DSU(rows * cols)
for r in range(rows):
for c in range(cols):
if grid[r][c] == "1":
for dr, dc in ((1, 0), (0, 1)):
nr, nc = r + dr, c + dc
if nr < rows and nc < cols and grid[nr][nc] == "1":
dsu.union(r * cols + c, nr * cols + nc)
roots = {dsu.find(r * cols + c)
for r in range(rows) for c in range(cols)
if grid[r][c] == "1"}
return len(roots)
O(R · C · α) time. BFS or DFS also solves the static grid; DSU wins when cells are added one at a time (the “islands II” variant).
Accounts merge
Problem: Merge accounts that share any email; each account also has an owner name.
Union accounts that share an email by mapping each email to the first account index that owned it. Then group emails by root.
def accountsMerge(accounts):
dsu = DSU(len(accounts))
email_to_id = {}
for i, acc in enumerate(accounts):
for email in acc[1:]:
if email in email_to_id:
dsu.union(i, email_to_id[email])
else:
email_to_id[email] = i
groups = {}
for email, i in email_to_id.items():
groups.setdefault(dsu.find(i), []).append(email)
return [[accounts[r][0]] + sorted(emails)
for r, emails in groups.items()]
O(N · K log K) dominated by sorting emails. The DSU itself is near-linear; the trick is mapping the merge key (email) back to an account id.
Variations & pitfalls
- Skip one optimization and it degrades. Without union by rank or path compression, trees can become linked lists and
finddegrades toward O(n). Keep both. - Union by size works too. Tracking subtree size instead of rank gives the same bound and is handy when a problem asks for the largest component’s size.
- Indexing. Many graph problems are 1-indexed; size the arrays to
n + 1so nodenis valid. - Undirected only for cycle detection. The “same root means cycle” rule is for undirected graphs. Directed-cycle detection needs DFS colors or Kahn’s topological sort instead.
- Pitfall: comparing raw parents instead of roots. Always compare
find(a) == find(b), neverparent[a] == parent[b]— a node deep in a tree has the wrong parent.
Reach for the right structure faster
CoPilot Interview suggests the likely pattern — union-find, monotonic stack, graph traversal — with a working solution and Big-O in about four seconds during real Zoom and Teams calls. Free for Windows and macOS.
See the coding interview copilotFAQ
What is union-find / disjoint set union?
Union-find (also called DSU, disjoint set union) is a data structure that tracks a collection of elements partitioned into disjoint groups. It supports two operations: find, which returns the representative root of an element’s group, and union, which merges two groups. It is the go-to tool for connectivity and grouping questions.
When should I use union-find in a coding interview?
Use it when the problem is about connectivity or grouping: counting connected components, number-of-islands variants, detecting a cycle in an undirected graph, redundant connection, accounts merge, or any “are these two things in the same group / merge these groups” question — especially when edges arrive incrementally.
What makes union-find near O(1)?
Two optimizations: union by rank (or size) attaches the smaller tree under the larger to keep trees flat, and path compression flattens the path to the root during find. Together they give an amortized cost per operation of O(α(n)), where α is the inverse Ackermann function — effectively a small constant (at most 4 or 5) for any input you will ever see.
Can union-find detect cycles in a graph?
Yes, in an undirected graph. Process each edge: if its two endpoints already share the same root, adding the edge would close a cycle. If they have different roots, union them. This is the basis of redundant-connection problems and of Kruskal’s minimum spanning tree algorithm.
Union-find or BFS/DFS for connected components?
BFS or DFS works well when the whole graph is known up front. Union-find shines when edges arrive incrementally (a streaming or dynamic-connectivity setting) or when you must repeatedly answer “are these two connected” as you go, since each query is near-constant time without re-traversing the graph.