Monorepo Git techniques: sparse checkout, partial clones, and staying fast at scale
A 5 GB monorepo with a 14-minute clone is fixable. Sparse checkout, partial clones, commit-graph, and fsmonitor — practical Git techniques for monorepos at scale.
A new engineer joined a team I work with last month. On her first day, she ran git clone. Fourteen minutes later, she had a 5 GB working directory containing forty services, half of which she would never touch. Her laptop fans never stopped.
Monorepos work. Google, Meta, Stripe, and many others run them at huge scale. But "monorepo" out of the box, with default Git settings, gets slow as the repo grows. The fix is not to abandon the model. The fix is a handful of Git features — most of them recent, all of them well-supported — that make a big repo feel small on each engineer's machine.
This post is the operational companion to the strategy posts in this series. It covers what a monorepo actually is, why companies use them, and the specific Git commands and configuration that keep one fast at scale.
What a monorepo actually is
A monorepo is one Git repository that contains many projects. Inside, you might have:
my-monorepo/
├── apps/
│   ├── web/
│   ├── mobile/
│   └── admin/
├── services/
│   ├── auth/
│   ├── billing/
│   ├── search/
│   └── notifications/
├── libs/
│   ├── ui-components/
│   ├── api-client/
│   └── shared-types/
└── infra/
    ├── terraform/
    └── k8s/
Each folder might be its own deployable thing. The point of putting them in one repo is atomic cross-project changes. If you change libs/api-client, you can update every consumer in services/ and apps/ in the same commit. With multi-repo, that change requires N PRs across N repos, often merged in a careful order.
This is the upside. The downside is the repo gets large, and naive git operations get slow.
Why companies use monorepos
The case for monorepos is well documented. Three primary sources worth reading:
- Google's 2016 paper, "Why Google Stores Billions of Lines of Code in a Single Repository," describes the technical and organisational reasons their monorepo works at the scale of tens of thousands of engineers and roughly a billion files.
- Meta has written about Sapling, the source-control tool they built to handle their own monorepo when Git could not keep up.
- Microsoft has written about Scalar and partial clones, which were partly motivated by the Windows codebase migration to Git.
The recurring themes: atomic cross-project changes, shared tooling, one place to search, easier dependency management. The recurring caveat: at large scale, you need specialised tooling.
The good news for most teams: you are not Google. You can get most of the monorepo benefits with techniques that are now part of plain git, no exotic tooling required.
Sparse checkout: don't download files you don't use
Sparse checkout lets you have the full repo history but only check out a subset of files into your working directory. The engineer working on services/billing/ does not need apps/mobile/ on disk.
The modern interface is git sparse-checkout, added in Git 2.25. The reference is the git-sparse-checkout(1) man page.
Set it up:
# Clone normally
git clone https://github.com/org/monorepo.git
cd monorepo
# Enable sparse checkout in "cone" mode (faster, simpler)
git sparse-checkout init --cone
# Pick the directories you want
git sparse-checkout set services/billing libs/api-client libs/shared-types
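After set, your working directory shrinks to just those folders, plus the files that sit directly in the repository root and in their parent directories (services/ and libs/ here). The other directories are still stored as objects in .git/objects, but they are no longer checked out into the working tree. git pull, git checkout, and git status all run faster because they have fewer files to write and scan.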
To add another directory later:
git sparse-checkout add services/notifications
To see what is currently checked out:
git sparse-checkout list
To turn it off:
git sparse-checkout disable
Caveats:
- Cone mode is much faster than the older pattern mode. Use cone mode unless you have a specific reason not to.
- Searches across the whole codebase (git grep, IDE-wide search) only see the checked-out files. You may want to keep docs/ and a few common paths always present.
- Tools that read files outside the cone (build systems, lint configs) need to be aware. Most modern build tools handle this well.
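To see the effect without touching a real repository, here is a minimal sketch that builds a throwaway monorepo in a temporary directory, clones it, and narrows the working tree. The directory names and the demo identity are placeholders I made up for illustration; only the git sparse-checkout commands come from the steps above.

```shell
#!/bin/sh
# Throwaway demo: build a tiny "monorepo", clone it, and narrow the working tree.
set -eu
# Placeholder identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
git init -q "$demo/origin"
for dir in services/billing services/auth apps/mobile libs/api-client; do
    mkdir -p "$demo/origin/$dir"
    echo "placeholder" > "$demo/origin/$dir/README.md"
done
echo "root file" > "$demo/origin/README.md"
git -C "$demo/origin" add .
git -C "$demo/origin" commit -q -m "Initial layout"

git clone -q "$demo/origin" "$demo/clone"
git -C "$demo/clone" sparse-checkout init --cone
git -C "$demo/clone" sparse-checkout set services/billing libs/api-client

# Cone directories currently in effect
git -C "$demo/clone" sparse-checkout list
# What is actually in the working tree now
ls "$demo/clone" "$demo/clone/services"
```

The root README.md stays because cone mode always keeps files in the repository root; apps/ and services/auth/ disappear from the working tree but remain in history.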
Partial clone: don't even download history you don't use
Sparse checkout still downloads the whole .git/objects directory. For a 5 GB monorepo, that is the slow part of git clone.
Partial clone (Git 2.22+) downloads only what you need when you need it. The most common form filters out blobs (file contents):
git clone --filter=blob:none https://github.com/org/monorepo.git
The clone now downloads every commit and tree, but only the blobs needed to check out the default branch, which is typically 10–50× smaller than a full clone. When you check out a different commit, Git fetches the blobs it is missing on demand.
Combine with sparse checkout for maximum effect:
git clone --filter=blob:none --no-checkout https://github.com/org/monorepo.git
cd monorepo
git sparse-checkout init --cone
git sparse-checkout set services/billing libs/api-client
git checkout main
The 14-minute clone becomes 90 seconds. The working directory is 200 MB instead of 5 GB.
Microsoft's introduction to Scalar covers the engineering behind partial clone in more depth. Scalar itself is now shipped with Git as a wrapper that turns these features on by default.
Caveats:
- Some operations (like git log -p on old commits) trigger on-demand fetches. They work, but they can be slow.
- Servers must support the protocol. GitHub and GitLab do. Older self-hosted servers may not.
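If you want to confirm that a clone really is partial, one approach is to ask Git which objects it knows about but has not downloaded. The sketch below does this against a throwaway local repository; the file names and demo identity are made up, and the two uploadpack settings are only needed because the "server" here is a local directory.

```shell
#!/bin/sh
# Throwaway demo: partial-clone a local repository and count the blobs left behind.
set -eu
# Placeholder identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
git init -q "$demo/origin"
echo "one" > "$demo/origin/a.txt"
echo "two" > "$demo/origin/b.txt"
git -C "$demo/origin" add .
git -C "$demo/origin" commit -q -m "Add two files"

# A local "server" has to opt in to filtering and on-demand object requests.
git -C "$demo/origin" config uploadpack.allowFilter true
git -C "$demo/origin" config uploadpack.allowAnySHA1InWant true

# file:// forces the real fetch protocol instead of a plain local copy.
git clone -q --filter=blob:none --no-checkout "file://$demo/origin" "$demo/partial"

# Git records the filter on the remote so later fetches keep using it.
filter=$(git -C "$demo/partial" config remote.origin.partialclonefilter)
# Objects that are referenced but not downloaded are printed with a leading "?".
missing=$(git -C "$demo/partial" rev-list --objects --all --missing=print | grep -c '^[?]' || true)
echo "filter=$filter missing_objects=$missing"
```

Because the clone used --no-checkout, neither file's contents have been fetched yet, so both blobs show up as missing until something needs them.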
commit-graph: skip the work Git did not need to do
Git stores history as a graph of commits, each pointing back at its parents. To answer "is commit X an ancestor of Y?" Git walks back through those parent links, reading and parsing each commit object as it goes. On a million-commit monorepo, this walk takes seconds.
The commit-graph file (Git 2.18+) pre-computes commit metadata into a single file. Operations that walk history get dramatically faster.
Enable it once:
git config core.commitGraph true
git config gc.writeCommitGraph true
git commit-graph write --reachable
Now git log --graph, git branch --contains, git merge-base, and many other commands run faster. The file regenerates during git gc.
The git-commit-graph(1) man page has the full reference.
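To check that the file was actually written, you can look for it on disk and ask Git to verify it. A minimal sketch on a throwaway repository (the commits and demo identity are placeholders):

```shell
#!/bin/sh
# Throwaway demo: write a commit-graph file, then verify it.
set -eu
# Placeholder identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
git init -q "$demo/repo"
for n in 1 2 3; do
    echo "change $n" > "$demo/repo/file.txt"
    git -C "$demo/repo" add file.txt
    git -C "$demo/repo" commit -q -m "Commit $n"
done

git -C "$demo/repo" commit-graph write --reachable
# Cross-check the file against the object database; exits non-zero on a mismatch.
git -C "$demo/repo" commit-graph verify

# A non-split commit-graph lives at this path inside the repository.
graph_file="$demo/repo/.git/objects/info/commit-graph"
ls -l "$graph_file"
```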
fsmonitor: tell Git which files changed without scanning them
When you run git status in a 5 GB working directory, Git scans every file to see what changed. This is slow.
fsmonitor (Git 2.36+ for the built-in version) lets Git ask the operating system "which files changed since I last looked?" The OS already knows — it just had to be asked properly.
Enable the built-in fsmonitor:
git config core.fsmonitor true
git config core.untrackedCache true
git status in a large repo goes from 3 seconds to under 200 milliseconds. This is the single biggest quality-of-life improvement for engineers working in a large repo every day.
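The git-config documentation for core.fsmonitor covers the options. At the time of writing, the built-in daemon supports macOS and Windows; on Linux, core.fsmonitor is usually pointed at a hook script instead, such as the Watchman integration that ships with Git as a sample hook.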
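If these settings go into an onboarding script, it helps to guard the fsmonitor line by platform. The sketch below assumes, as far as I know of upstream Git, that the built-in daemon is available on macOS and Git for Windows only; adjust the case patterns if your Git build differs. The untracked cache is enabled everywhere.

```shell
#!/bin/sh
# Enable fast-status settings, turning on fsmonitor only where the
# built-in daemon is assumed to exist (an assumption, see the text above).
set -eu

demo=$(mktemp -d)
git init -q "$demo/repo"

case "$(uname -s)" in
    Darwin|MINGW*|MSYS*)
        git -C "$demo/repo" config core.fsmonitor true
        ;;
    *)
        echo "Skipping core.fsmonitor on $(uname -s); enabling the untracked cache only."
        ;;
esac
git -C "$demo/repo" config core.untrackedCache true

# Show which of the two settings ended up in this repository's config.
git -C "$demo/repo" config --get-regexp '^core\.(fsmonitor|untrackedcache)$'
```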
CODEOWNERS sharding by path
A monorepo is shared by many teams. Reviews must route to the right people. CODEOWNERS is the tool, but in a monorepo, it has to be carefully sharded by path.
A good pattern:
# Top-level defaults
* @org/platform-leads
# Per-service ownership
/services/auth/ @org/auth-team
/services/billing/ @org/billing-team
/services/search/ @org/search-team
/services/notifications/ @org/messaging-team
# App-level
/apps/web/ @org/web-team
/apps/mobile/ @org/mobile-team
/apps/admin/ @org/internal-tools
# Shared libraries: owned by the team that maintains each library
/libs/api-client/ @org/platform-leads
/libs/ui-components/ @org/design-system
# Infrastructure — locked down
/infra/ @org/sre @org/security
/.github/workflows/ @org/sre
This way, a PR touching only services/billing/ automatically requests review from @org/billing-team and does not bother anyone else. A PR touching libs/api-client/ triggers platform review. A cross-cutting PR triggers multiple teams.
Combine with branch protection that requires Code Owner approval. Now the routing is enforced, not just suggested.
Conditional CI per path
Running every test in a 5 GB monorepo for every PR is wasteful. A change to services/billing/ should not trigger the mobile-app test suite.
Most CI providers support path filters. A GitHub Actions example:
on:
  pull_request:
    paths:
      - "services/billing/**"
      - "libs/api-client/**"
      - "libs/shared-types/**"
This workflow only runs when the PR touches those paths. Each service gets its own workflow file with its own path filter. Time saved per PR adds up across thousands of PRs a month.
For more complex setups (transitive dependencies between projects), build tools like Bazel, Nx, or Turborepo can compute the actual affected set. But for many monorepos, simple path filters get you 80% of the way there with 5% of the complexity.
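For CI systems without built-in path filters, the same idea can be approximated in a script: diff the branch against its base and reduce each changed path to its project directory. This is a rough sketch on a throwaway repository. The two-level project layout and the branch name are assumptions for the demo, and it does not account for transitive dependencies.

```shell
#!/bin/sh
# Throwaway demo: list the projects touched by a feature branch.
set -eu
# Placeholder identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
git init -q "$demo/repo"
cd "$demo/repo"
mkdir -p services/billing services/search apps/mobile
echo v1 > services/billing/main.txt
echo v1 > services/search/main.txt
echo v1 > apps/mobile/main.txt
git add .
git commit -q -m "Baseline"
base=$(git rev-parse HEAD)   # stands in for the PR's target branch

git checkout -q -b feature
echo v2 > services/billing/main.txt
git commit -q -am "Change billing only"

# Diff against the merge base, then keep the first two path components
# (for example services/billing) and de-duplicate.
affected=$(git diff --name-only "$base"...HEAD | cut -d/ -f1,2 | sort -u)
echo "$affected"
```

A CI job could loop over that list and run only the matching test suites. Files in the repository root come through as bare file names, so a real script would need a rule for those.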
Maintenance commands that keep a monorepo fast
A few git commands and configurations that pay off at monorepo scale. Run them periodically (or automate them).
git maintenance
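Git 2.29 introduced the git maintenance command, which packages routine maintenance into a single scheduled task. Its tasks include gc, prefetching from remotes, updating the commit-graph, and incremental repacking. The background schedule set up by git maintenance start favours the lighter incremental tasks (prefetch, commit-graph, loose-object cleanup, incremental repack) over a full gc, so it stays cheap enough to run in the background.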
# Schedule background maintenance on this repo
git maintenance start
The git-maintenance(1) man page covers the schedule options. Enabling this once per repo gives you most of the maintenance benefits without manual cron jobs.
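Before registering anything with the system scheduler, you can run individual tasks once in the foreground to see what they do. A small sketch on a throwaway repository (the commit and demo identity are placeholders):

```shell
#!/bin/sh
# Throwaway demo: run two maintenance tasks once, without any scheduler.
set -eu
# Placeholder identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
git init -q "$demo/repo"
echo hello > "$demo/repo/file.txt"
git -C "$demo/repo" add file.txt
git -C "$demo/repo" commit -q -m "Initial commit"

# Foreground run of selected tasks; nothing is added to cron, launchd, or systemd.
git -C "$demo/repo" maintenance run --task=commit-graph --task=loose-objects

# The commit-graph task writes an incremental ("split") graph under this directory.
ls "$demo/repo/.git/objects/info/commit-graphs"
```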
Repacking
Over time, a busy repo accumulates many small pack files. Repacking merges them, often shrinking the repo by 20–50%.
git gc --aggressive
# or for very large repos:
git repack -a -d -f --depth=250 --window=250
This is slow (minutes for a large monorepo), so run it during off-hours.
Pruning unused objects
git prune --expire=now
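Removes unreachable objects from the object database. git gc already prunes them after a two-week grace period; --expire=now skips that grace period, so only run it when no other Git process is writing to the repo. Combine with git gc to fully reclaim disk space.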
A weekly cron
Many monorepo teams run a weekly maintenance cron that performs these steps and reports the repo size. If size grows unexpectedly, that signals a problem (large binary blobs committed, or build output and other generated files checked in by mistake).
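What such a job might look like, as a minimal sketch: report the object counts, run gc, and report again. The report_size helper is a hypothetical name of mine, and the demo runs against a throwaway repository; a real job would point at the shared clone and send the numbers somewhere useful.

```shell
#!/bin/sh
# Throwaway demo: size report before and after a gc run.
set -eu
# Placeholder identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
git init -q "$demo/repo"
for n in 1 2 3; do
    echo "change $n" > "$demo/repo/file.txt"
    git -C "$demo/repo" add file.txt
    git -C "$demo/repo" commit -q -m "Commit $n"
done

# Hypothetical helper: print the loose-object count and the packed size (in KiB).
report_size() {
    git -C "$1" count-objects -v |
        awk -F': ' '$1 == "count" || $1 == "size-pack" { print $1 "=" $2 }'
}

echo "before gc:"; report_size "$demo/repo"
git -C "$demo/repo" gc --quiet
echo "after gc:";  report_size "$demo/repo"

# Loose objects remaining after gc; reachable ones have all been packed.
loose=$(git -C "$demo/repo" count-objects -v | awk -F': ' '$1 == "count" { print $2 }')
```

Tracking size-pack week over week is what surfaces unexpected growth.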
Path-based ownership at scale
In a monorepo with 40 services and 8 teams, ownership routing is what keeps reviews sensible. CODEOWNERS by path was covered above; two additions worth knowing.
Hierarchical defaults. In GitHub's CODEOWNERS, the last matching pattern wins and replaces any earlier match, so put broader rules earlier and narrower rules later. To say "anything under /services/ is owned by service-platform, but /services/payments/ is also owned by payments-team," list both teams on the /services/payments/ line; the narrower rule does not inherit owners from the broader one.
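Cross-team review for shared libraries. When a library is consumed by many teams, involving its consumers in review prevents one team from making changes that break others. List multiple owners on the path so each is requested for review. Note that with "Require review from Code Owners," GitHub accepts approval from any one of the listed owners; if every team must sign off, that needs enforcement beyond CODEOWNERS, such as a custom required status check.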
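The ordering rule is easier to see with a toy resolver. The sketch below is deliberately simplified: owners_for is a hypothetical helper that understands only "*" and anchored directory prefixes, not GitHub's full pattern syntax, and it assumes the last matching line replaces earlier ones, which is my understanding of GitHub's behaviour. The team names are placeholders.

```shell
#!/bin/sh
# Toy CODEOWNERS resolver: prints the owners from the LAST matching line.
set -eu

demo=$(mktemp -d)
cat > "$demo/CODEOWNERS" <<'EOF'
# Broad rules first, narrower rules later
*                   @org/platform-leads
/services/          @org/service-platform
/services/payments/ @org/service-platform @org/payments-team
EOF

# Hypothetical helper. Supports "*" and directory prefixes only.
owners_for() {
    awk -v path="/$1" '
        /^[ \t]*(#|$)/ { next }   # skip comments and blank lines
        {
            pattern = $1
            if (pattern == "*" || index(path, pattern) == 1) {
                # A later match overwrites whatever an earlier line set.
                owners = ""
                for (i = 2; i <= NF; i++) owners = owners (i > 2 ? " " : "") $i
            }
        }
        END { print owners }
    ' "$demo/CODEOWNERS"
}

owners_for services/payments/api.py
owners_for services/search/index.py
owners_for README.md
```

The payments path resolves to both teams only because both are listed on its own line; the search path falls back to the /services/ rule, and the root file falls back to the catch-all.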
When NOT to monorepo
Monorepos have a real cost. They are not always the right answer.
Strongly independent teams. If your "platform" team has not touched the "data science" team's code in two years, and they have different release cadences, a monorepo creates coupling that does not need to exist.
External open-source release. If one of your projects needs to ship as an open-source library, putting it in a monorepo with proprietary code makes that release awkward. A separate repo from day one is simpler.
Wildly different toolchains. A monorepo where one service is in Rust, another in Python, another in Java is workable, but each toolchain wants its own conventions. Path-scoped CI helps; cultural overhead remains.
Teams below 5 engineers, fewer than 3 services. The benefits of a monorepo show up at coordination scale. With 3 engineers and 2 services, multi-repo is simpler.
A worked example: cutting a 5 GB clone to 90 seconds
To make the techniques concrete, here is a worked example combining everything. Start state: 5 GB monorepo, 40 services, default Git settings, 14-minute clones, 3-second git status.
Step 1: Partial clone.
git clone --filter=blob:none --no-checkout \
https://github.com/org/monorepo.git
cd monorepo
Clone now downloads ~250 MB instead of 5 GB. Time: ~90 seconds.
Step 2: Sparse checkout for this engineer.
This engineer works on the billing service. She also touches shared-types and api-client regularly.
git sparse-checkout init --cone
git sparse-checkout set \
services/billing \
libs/shared-types \
libs/api-client \
docs
git checkout main
Working directory now has just those folders. About 180 MB on disk.
Step 3: Enable maintenance and fsmonitor.
git config core.commitGraph true
git config gc.writeCommitGraph true
git config core.fsmonitor true
git config core.untrackedCache true
git maintenance start
Background tasks now keep the commit-graph fresh and prefetch from remotes. git status drops from 3 seconds to under 200ms.
Step 4: Repository-level configuration (commit once, applies to everyone).
In the repo, set defaults that benefit everyone:
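# .gitattributes
# Treat lockfiles as binary: no textual diffs, but also no line-level merges
*.lock binary
# Leave build output out of `git archive` exports (does not affect diffs or clones)
build/** export-ignore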
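And a .gitconfig shipped with the repo or referenced in onboarding docs. Git does not read a config file committed inside the repo on its own, so each engineer opts in once, for example with git config --local include.path ../.gitconfig.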
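One way to wire up a shared config file is Git's include.path setting. The sketch below uses a throwaway repository; the contents of the shared file are illustrative choices of mine, not a recommended set.

```shell
#!/bin/sh
# Throwaway demo: opt a clone in to a config file committed at the repo root.
set -eu

demo=$(mktemp -d)
git init -q "$demo/repo"

# Hypothetical shared settings file; it would normally be committed with the code.
cat > "$demo/repo/.gitconfig" <<'EOF'
[core]
    untrackedCache = true
[fetch]
    prune = true
EOF

# One-time, per-clone opt-in. The path is resolved relative to .git/config,
# so ../.gitconfig points at the file in the repository root.
git -C "$demo/repo" config --local include.path ../.gitconfig

# Settings from the shared file are now visible to Git in this clone.
git -C "$demo/repo" config --get core.untrackedCache
git -C "$demo/repo" config --get fetch.prune
```

The opt-in is deliberate on Git's part: an included file can change how Git behaves, so it should be reviewed like any other code in the repo.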
Result: a new engineer can be productive in under 5 minutes from git clone. The full repo history is there. The full set of files is available when she needs to expand her sparse checkout. But day-to-day, she operates on a 180 MB slice.
This is what most monorepo teams converge on. The techniques are not exotic. They are just default-off in Git, and you have to opt in.
Common myths
Myth 1: "Monorepos require Bazel." Wrong. Bazel solves build speed at huge scale. It is a build-system answer to a build-system problem. Most monorepos do fine with the build tool each language already provides, plus path-scoped CI. Bazel becomes worth it when build times start dominating engineer-hours.
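Myth 2: "Monorepos don't scale past a few GB." Wrong with modern Git. Sparse checkout + partial clone + commit-graph + fsmonitor make a 50 GB monorepo behave like a 500 MB one for daily work. Microsoft runs the far larger Windows codebase on Git with this family of techniques; Google and Meta run larger monorepos still, though on their own source-control tooling rather than stock Git.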
Myth 3: "Monorepos force everyone onto the same release schedule." Wrong. Each service in a monorepo can deploy on its own schedule. The repo is shared; the deployment pipelines are not. Many monorepos have services that deploy every commit alongside services that deploy quarterly.
What to read next
- Multi-repo coordination: submodules, subtrees, and internal packages compared — the alternative when a monorepo is not right.
- Scaling a Git workflow from solo to large team — how team scale interacts with repo shape.
- Trunk-based development: when it wins and when it doesn't — the workflow that pairs naturally with monorepos.
- Choosing a Git workflow: a decision guide for real teams — the series overview.