
Design Google Drive / Dropbox — System Design Walkthrough

A senior-level walkthrough: requirements, scale math, chunking and dedup, the metadata-vs-blob split, sync with conflict handling, and the sharing model the design turns on.

"Design Google Drive" — or its near-twin "Design Dropbox" — is a staple file storage system design interview question. It looks like "just store files," but it forces you to confront huge binary objects, multi-device sync, conflicts, sharing, and the question of what stays strongly consistent. Below is a structured cloud storage system design walkthrough you can mirror in a real interview, narrating each stage as you go. Keep the system design interview cheat sheet open alongside for the underlying template.

1. Clarify the requirements

Don't start drawing boxes. Scope the problem first so you optimize the right thing.

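Functional requirements

- Upload, download, and organize files and folders.
- Sync changes automatically across a user's devices.
- Share files and folders with other users as viewer or editor.
- Keep version history so any prior state can be restored.

Non-functional requirements

- High durability: an uploaded file must never be lost.
- High availability for downloads and sync.
- Strong consistency for metadata and permissions; eventual consistency is acceptable for blob propagation.
- Bandwidth and storage efficiency: transfer only changed chunks and store duplicate content once.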

I'll de-scope real-time collaborative editing (Google Docs-style operational transforms) and full-text search for now — calling that out shows judgment without burning time.

2. Scale estimates (back-of-the-envelope)

Rough numbers justify the architecture; precision isn't the point.

Quantity        Assumption                            Result
Users           ~500M users, ~100M DAU                100M active/day
Stored data     ~10GB average/user × 500M             ~5 EB raw (before dedup)
Dedup savings   shared/duplicate content is common    large fraction reclaimed
Daily uploads   ~10M files/day average, 4MB chunks    tens of TB/day ingest
Read:write      downloads + syncs exceed uploads      read-heavy on blobs

The headline takeaways: storage is enormous (so dedup matters a lot), files are big binary objects (so they don't belong in a database), and the metadata is small but accessed constantly. That split drives the whole architecture in sections 4 and 5.
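The table's arithmetic is easy to sanity-check in a few lines. This sketch uses decimal units and assumes an average uploaded file of about 4MB (roughly one chunk); that file size is an assumption added here, since the table only fixes the chunk size. The function names are hypothetical.

```python
# Back-of-the-envelope checks for the scale table (decimal units).
MB = 10**6
GB = 10**9
TB = 10**12
EB = 10**18
SECONDS_PER_DAY = 86_400


def raw_storage_bytes(users: int, avg_bytes_per_user: int) -> int:
    """Total stored bytes before deduplication."""
    return users * avg_bytes_per_user


def daily_ingest_bytes(files_per_day: int, avg_file_bytes: int) -> int:
    """Bytes uploaded per day, before dedup skips chunks the server already has."""
    return files_per_day * avg_file_bytes


if __name__ == "__main__":
    storage = raw_storage_bytes(500_000_000, 10 * GB)
    ingest = daily_ingest_bytes(10_000_000, 4 * MB)  # assumed ~4MB average file
    print(f"raw storage: {storage / EB:.0f} EB")                       # 5 EB
    print(f"daily ingest: {ingest / TB:.0f} TB/day")                   # 40 TB/day
    print(f"average ingest: {ingest / SECONDS_PER_DAY / MB:.0f} MB/s")  # 463 MB/s
```

About 40 TB/day averages out to several hundred MB/s of sustained writes, which is why the blob path needs its own horizontally scaled store rather than sharing a database with metadata.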

3. API design

A small surface covers it. Uploads are chunked and resumable; the metadata commit is separate from the byte transfer.

POST /v1/files/upload-init   { name, parentFolderId, size, chunkHashes:[...] }
       -> { fileId, missingChunks:[...] }   // server says which chunks it lacks

PUT  /v1/blocks/{chunkHash}   (binary)      // upload only missing chunks
POST /v1/files/commit         { fileId, version, orderedChunkHashes:[...] }

GET  /v1/files/{fileId}                      // metadata + chunk list
GET  /v1/blocks/{chunkHash}                  // download a chunk (via CDN)
POST /v1/files/{fileId}/share { userId, role }
GET  /v1/changes?cursor=...                  // sync: changes since last cursor

Note upload-init returns missingChunks — the client only uploads chunks the system doesn't already have. That single step is what powers both delta sync and deduplication.
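The upload-init / PUT / commit handshake can be sketched with an in-memory stand-in for the server. `InMemoryBlockServer` and `upload` are hypothetical names, and tiny byte strings stand in for 4MB chunks; the point is only that the client sends the chunks the server reports as missing.

```python
import hashlib


class InMemoryBlockServer:
    """Hypothetical stand-in for upload-init, PUT /v1/blocks/{hash}, and commit."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}     # chunkHash -> bytes (blob store)
        self.files: dict[str, list[str]] = {}  # fileId -> ordered chunk hashes (metadata)

    def upload_init(self, file_id: str, chunk_hashes: list[str]) -> list[str]:
        # Report each hash the blob store has never seen, once.
        return [h for h in dict.fromkeys(chunk_hashes) if h not in self.blocks]

    def put_block(self, chunk_hash: str, data: bytes) -> None:
        # Content addressing lets the server verify bytes against their key.
        if hashlib.sha256(data).hexdigest() != chunk_hash:
            raise ValueError("chunk hash mismatch")
        self.blocks[chunk_hash] = data

    def commit(self, file_id: str, ordered_hashes: list[str]) -> None:
        if any(h not in self.blocks for h in ordered_hashes):
            raise ValueError("cannot commit: chunks still missing")
        self.files[file_id] = list(ordered_hashes)


def upload(server: InMemoryBlockServer, file_id: str, chunks: list[bytes]) -> int:
    """Client side of the handshake; returns how many chunks were actually sent."""
    ordered = [hashlib.sha256(c).hexdigest() for c in chunks]
    by_hash = dict(zip(ordered, chunks))
    missing = server.upload_init(file_id, ordered)
    for h in missing:
        server.put_block(h, by_hash[h])
    server.commit(file_id, ordered)
    return len(missing)


if __name__ == "__main__":
    server = InMemoryBlockServer()
    print(upload(server, "f1", [b"intro", b"body", b"outro"]))     # 3: nothing stored yet
    print(upload(server, "f1", [b"intro", b"BODY v2", b"outro"]))  # 1: only the edited chunk
    print(upload(server, "f2", [b"intro", b"outro"]))              # 0: fully deduplicated
```

The second call is delta sync (one edited chunk travels), and the third is dedup (a different file whose chunks already exist costs no upload at all).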

4. The core idea: chunking and deduplication

The central decision in file storage system design is to never treat a file as one opaque blob. Instead, split each file into fixed-size chunks (commonly ~4MB) and content-address each chunk by its hash (e.g., SHA-256).
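Fixed-size chunking with SHA-256 content addresses takes only a few lines. This is a minimal sketch, assuming a buffered binary stream (such as `open(path, "rb")` or `io.BytesIO`), where `read(n)` returns `n` bytes until the final chunk; `chunk_stream` and `chunk_hashes` are hypothetical helper names.

```python
import hashlib
import io
from typing import BinaryIO, Iterator

CHUNK_SIZE = 4 * 1024 * 1024  # ~4MB chunks


def chunk_stream(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[tuple[str, bytes]]:
    """Yield (sha256_hex, chunk_bytes) for each fixed-size chunk of the stream."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield hashlib.sha256(chunk).hexdigest(), chunk


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Ordered chunk hashes for an in-memory byte string."""
    return [h for h, _ in chunk_stream(io.BytesIO(data), chunk_size)]


if __name__ == "__main__":
    size = 16                      # tiny chunks so the demo is readable
    original = b"A" * 40           # chunks: [0:16], [16:32], [32:40]
    edited = original[:20] + b"B" + original[21:]
    before, after = chunk_hashes(original, size), chunk_hashes(edited, size)
    changed = [i for i, (a, b) in enumerate(zip(before, after)) if a != b]
    print(len(before), changed)    # 3 [1]
```

Only chunk 1 changes, so only chunk 1 would be re-uploaded; chunks 0 and 1 of the original share a hash, so even within one file they are stored once. Fixed boundaries handle in-place edits well, but inserting bytes near the start shifts every later boundary; content-defined chunking (boundaries chosen by a rolling hash) is the common alternative when that matters.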

Metadata vs. blob is the key split. A metadata service holds the file system — names, folder tree, versions, permissions, and the ordered list of chunk hashes that reconstruct each file — in a strongly consistent, indexed database. A blob store holds the raw chunk bytes, content-addressed, in a cheap, massively durable object store behind a CDN. Keeping them separate lets each scale and be consistent on its own terms: small/structured/consistent metadata, huge/immutable/eventually-replicated blobs.
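The split can be illustrated with two toy stores. The schema below (`FileVersion`, `FileMetadata`, `BlobStore`, `read_file`) is hypothetical: metadata keeps names, ACLs, and an ordered chunk list per version, while the blob side only maps hashes to immutable bytes.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FileVersion:
    version: int
    chunk_hashes: list[str]  # ordered: concatenating these chunks rebuilds the file


@dataclass
class FileMetadata:
    """Metadata-side record (hypothetical schema): small, structured, strongly consistent."""
    file_id: str
    name: str
    parent_folder_id: str
    acl: dict[str, str] = field(default_factory=dict)  # userId -> "viewer" | "editor"
    versions: list[FileVersion] = field(default_factory=list)


class BlobStore:
    """Blob-side stand-in: immutable bytes keyed by their SHA-256 hex digest."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(key, data)  # identical bytes are stored once
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def __len__(self) -> int:
        return len(self._blobs)


def read_file(meta: FileMetadata, blobs: BlobStore, version: Optional[int] = None) -> bytes:
    """Look up the chunk list in metadata, then fetch and concatenate the chunks."""
    if version is None:
        fv = meta.versions[-1]
    else:
        fv = next(v for v in meta.versions if v.version == version)
    return b"".join(blobs.get(h) for h in fv.chunk_hashes)


if __name__ == "__main__":
    blobs = BlobStore()
    meta = FileMetadata("f1", "report.docx", "root")
    meta.versions.append(FileVersion(1, [blobs.put(b"hello "), blobs.put(b"world")]))
    meta.versions.append(FileVersion(2, [blobs.put(b"hello "), blobs.put(b"there")]))
    print(read_file(meta, blobs))     # b'hello there'
    print(read_file(meta, blobs, 1))  # b'hello world'
    print(len(blobs))                 # 3: the shared b"hello " chunk is stored once
```

Because a version is just a list of hashes, keeping full version history costs only the chunks that actually changed.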

5. Architecture diagram

Clients talk to stateless services behind a gateway; metadata and blob paths diverge.

                    ┌──────────────┐
   ┌────────┐  meta  │  Metadata    │   names, folders, versions,
   │ Client │ ─────► │  service     │── permissions, chunk-hash lists
   │ (sync  │ ◄───── │  (strong DB) │   (sharded by userId)
   │  agent)│  list  └──────┬───────┘
   └───┬────┘               │ change events
       │ chunks             ▼
       │              ┌──────────────┐   pub/sub
       │   missing?   │ Notification │──► other devices pull /changes
       ▼   ──────────►│   service    │
   ┌──────────────┐   └──────────────┘
   │ Block / blob │   content-addressed chunks (hash → bytes)
   │ store + CDN  │   highly durable object storage, dedup'd
   └──────────────┘
Metadata (small, strongly consistent) is separated from blob storage (huge, content-addressed, dedup'd). A notification service drives multi-device sync.
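The diagram shards metadata by userId. A minimal routing function, assuming a fixed shard count (`NUM_SHARDS` and `metadata_shard` are hypothetical), needs a hash that every server computes identically:

```python
import hashlib

NUM_SHARDS = 64  # hypothetical shard count


def metadata_shard(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Route a user's metadata to a shard using a stable hash of the userId."""
    # Python's built-in hash() is randomized per process for str, so servers
    # would disagree; a cryptographic digest gives the same answer everywhere.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    print(metadata_shard("alex") == metadata_shard("alex"))  # True: deterministic
```

Plain modulo remaps most users whenever the shard count changes, so consistent hashing or a user-to-shard directory is a common refinement. Sharding by owner also means a folder shared with you lives on its owner's shard, so a "shared with me" view needs cross-shard lookups.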

6. Sync and conflict handling

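Sync is the part interviewers probe hardest. Each device runs a sync agent that watches the local folder and the server:

- Local change: the agent chunks and hashes the edited file, calls upload-init, uploads only the missing chunks, then commits the new ordered chunk list as a new version.
- Remote change: the notification service tells the agent that something changed; the agent calls GET /v1/changes with its last cursor, applies each change, and downloads only the chunks it does not already have.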
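The GET /v1/changes?cursor=... endpoint suggests a cursor over an append-only change log. Here is a minimal sketch, assuming one log per user and an integer sequence number as the cursor; `Change`, `ChangeLog`, and `SyncAgent` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    seq: int      # 1-based position in the log; the last seq seen is the client's cursor
    file_id: str
    version: int


class ChangeLog:
    """Hypothetical append-only journal behind GET /v1/changes?cursor=..."""

    def __init__(self) -> None:
        self._entries: list[Change] = []

    def append(self, file_id: str, version: int) -> None:
        self._entries.append(Change(len(self._entries) + 1, file_id, version))

    def changes_since(self, cursor: int, limit: int = 100) -> tuple[list[Change], int]:
        """Return up to `limit` changes with seq > cursor, plus the next cursor."""
        page = self._entries[cursor:cursor + limit]
        return page, (page[-1].seq if page else cursor)


class SyncAgent:
    def __init__(self, log: ChangeLog) -> None:
        self.log = log
        self.cursor = 0  # a real agent would persist this across restarts
        self.local_versions: dict[str, int] = {}

    def on_notification(self) -> int:
        """Drain the feed page by page; returns how many changes were applied."""
        applied = 0
        while True:
            page, self.cursor = self.log.changes_since(self.cursor)
            if not page:
                return applied
            for change in page:
                # A real agent would fetch the chunk list and download missing chunks here.
                self.local_versions[change.file_id] = change.version
                applied += 1


if __name__ == "__main__":
    log = ChangeLog()
    log.append("f1", 1)
    log.append("f2", 1)
    log.append("f1", 2)
    agent = SyncAgent(log)
    print(agent.on_notification(), agent.local_versions)  # 3 {'f1': 2, 'f2': 1}
    print(agent.on_notification())                         # 0: already caught up
```

The push only wakes the agent; the pull does the real work. If a notification is lost or the device was offline, the next pull from the saved cursor still returns everything it missed.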

Conflicts. Two devices can edit the same file version offline. Every change is versioned, and the metadata service orders writes; when it sees a commit against a version that has already been superseded, it does not silently overwrite. Instead it preserves both — the standard move is to create a "conflicted copy" (e.g., report (Alex's conflicted copy).docx) so no data is lost, leaving the user to reconcile. Version history makes any prior state recoverable.
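Conflict detection is optimistic concurrency: each commit names the version it was edited from, and a stale base version produces a conflicted copy instead of an overwrite. This sketch is single-threaded and its names (`FileRecord`, `MetadataService`, the copy-id scheme) are hypothetical; a real metadata database would do the version check and update as one atomic conditional write.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    name: str
    version: int
    chunk_hashes: list[str]


@dataclass
class MetadataService:
    files: dict[str, FileRecord] = field(default_factory=dict)

    def commit(self, file_id: str, base_version: int,
               chunk_hashes: list[str], device_owner: str) -> str:
        """Apply an edit made on top of base_version; return the id that received it."""
        current = self.files[file_id]
        if base_version == current.version:  # compare-and-set on the version
            current.version += 1
            current.chunk_hashes = list(chunk_hashes)
            return file_id

        # The edit was based on a superseded version: keep both, overwrite nothing.
        stem, dot, ext = current.name.rpartition(".")
        if not dot:  # no extension, e.g. "notes"
            stem, ext = current.name, ""
        copy_id = f"{file_id}-conflict-{len(self.files)}"  # hypothetical id scheme
        self.files[copy_id] = FileRecord(
            f"{stem} ({device_owner}'s conflicted copy){dot}{ext}", 1, list(chunk_hashes))
        return copy_id


if __name__ == "__main__":
    svc = MetadataService({"f1": FileRecord("report.docx", 1, ["h1"])})
    svc.commit("f1", 1, ["h2"], "Sam")          # wins: f1 becomes version 2
    cid = svc.commit("f1", 1, ["h3"], "Alex")   # stale base version 1
    print(svc.files[cid].name)                  # report (Alex's conflicted copy).docx
```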

7. Sharing, permissions, and consistency

Sharing lives entirely in the metadata service. A file or folder has an access-control list mapping users (or links) to roles (viewer/editor). A request to read or write checks the ACL before serving any chunk.
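An ACL check in front of chunk reads might look like the sketch below. Two assumptions are added here: roles on a folder are inherited by everything under it, and a block request carries the fileId being read (the GET /v1/blocks/{chunkHash} route above does not show one). All names are hypothetical.

```python
from enum import IntEnum
from typing import Optional


class Role(IntEnum):
    VIEWER = 1
    EDITOR = 2


REQUIRED = {"read": Role.VIEWER, "write": Role.EDITOR}


class AccessDenied(Exception):
    pass


def effective_role(node_id: Optional[str], user_id: str,
                   acls: dict[str, dict[str, Role]],
                   parents: dict[str, Optional[str]]) -> Optional[Role]:
    """Highest role the user holds on the node or any ancestor folder (assumed inheritance)."""
    best: Optional[Role] = None
    while node_id is not None:
        role = acls.get(node_id, {}).get(user_id)
        if role is not None and (best is None or role > best):
            best = role
        node_id = parents.get(node_id)
    return best


def authorize(node_id: str, user_id: str, action: str,
              acls: dict[str, dict[str, Role]],
              parents: dict[str, Optional[str]]) -> None:
    role = effective_role(node_id, user_id, acls, parents)
    if role is None or role < REQUIRED[action]:
        raise AccessDenied(f"{user_id} may not {action} {node_id}")


def serve_chunk(file_id: str, chunk_hash: str, user_id: str,
                acls: dict[str, dict[str, Role]],
                parents: dict[str, Optional[str]],
                file_chunks: dict[str, list[str]],
                blobs: dict[str, bytes]) -> bytes:
    authorize(file_id, user_id, "read", acls, parents)  # ACL check before any bytes
    # Hashes are shared across files, so also confirm the chunk belongs to the
    # file the user was authorized for; otherwise a known hash unlocks any file.
    if chunk_hash not in file_chunks.get(file_id, []):
        raise AccessDenied(f"{chunk_hash} is not part of {file_id}")
    return blobs[chunk_hash]
```

The membership check matters precisely because of dedup: a chunk hash is not a secret, so authorization has to be on the file, never on the hash alone.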

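This is where you make a sharp consistency call:

- Metadata (permissions, the folder tree, version pointers) is strongly consistent, so no user acts on a stale view of who can access what; a revoked share must take effect immediately.
- Blob data and its propagation to other devices and regions can be eventually consistent, because chunks are immutable and content-addressed, and a few seconds of sync delay is acceptable.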

Naming this metadata-strong / blob-eventual split is the highest-signal moment of the interview — it shows you know which guarantees actually matter where.

8. Scaling and bottlenecks

Framework reminder: Cloud storage follows the same arc as every system design answer — requirements → estimates → API → data model → high-level design → the core decision → trade-offs. The metadata-vs-blob and content-addressing ideas recur whenever you separate a small structured index from large immutable payloads — the same instinct behind generating short keys in Design a URL Shortener and storing crawled pages in a web crawler.



FAQ

Why split files into chunks when designing Google Drive or Dropbox?

Chunking a file into fixed blocks (commonly around 4MB) enables resumable uploads, parallel transfer, and delta sync: when a file changes, only the modified chunks are re-uploaded instead of the whole file. Each chunk is content-addressed by its hash, which also makes block-level deduplication possible.

How does deduplication work in a cloud storage system?

Each chunk is hashed (e.g., SHA-256) and stored under that hash in blob storage. Before uploading, the client or server checks whether a chunk's hash already exists; if it does, the chunk is not stored again, and only a reference is recorded in metadata. Identical chunks across files and users are stored once, saving large amounts of space.

What is the difference between the metadata service and blob storage in Design Google Drive?

The metadata service holds the file system structure (file names, folder hierarchy, versions, permissions, and the ordered list of chunk hashes that make up each file) in a strongly consistent, indexed database. Blob storage holds the actual chunk bytes, content-addressed by hash, in a cheap, highly durable object store behind a CDN. Splitting them lets each scale and be consistent independently.

How do you handle sync conflicts in Dropbox-style storage?

Each file change is versioned. Clients watch for changes and pull updates; the server orders writes and detects when two clients edited the same version concurrently. Rather than silently overwriting, the system keeps both versions, typically by creating a "conflicted copy", so no data is lost, and lets the user reconcile.

Is a cloud storage system strongly or eventually consistent?

It is split. Metadata (permissions, file structure, version pointers) is kept strongly consistent so users never act on a stale view of who can access what. Bulk blob data, and its propagation to other devices and regions, can be eventually consistent, since a few seconds of sync delay is acceptable.