Monorepo Git techniques: sparse checkout, partial clones, and staying fast at scale

A 5 GB monorepo with a 14-minute clone is fixable. Sparse checkout, partial clones, commit-graph, and fsmonitor — practical Git techniques for monorepos at scale.

A new engineer joined a team I work with last month. On her first day, she ran git clone. Fourteen minutes later, she had a 5 GB working directory containing forty services, half of which she would never touch. Her laptop fans never stopped.

Monorepos work. Google, Meta, Stripe, and many others run them at huge scale. But a monorepo on default Git settings gets slower as the repo grows. The fix is not to abandon the model. The fix is a handful of Git features, most of them recent and all of them well supported, that make a big repo feel small on each engineer's machine.

This post is the operational companion to the strategy posts in this series. It covers what a monorepo actually is, why companies use them, and the specific Git commands and configuration that keep one fast at scale.

What a monorepo actually is

A monorepo is one Git repository that contains many projects. Inside, you might have:

my-monorepo/
├── apps/
│   ├── web/
│   ├── mobile/
│   └── admin/
├── services/
│   ├── auth/
│   ├── billing/
│   ├── search/
│   └── notifications/
├── libs/
│   ├── ui-components/
│   ├── api-client/
│   └── shared-types/
└── infra/
    ├── terraform/
    └── k8s/

Each folder might be its own deployable thing. The point of putting them in one repo is atomic cross-project changes. If you change libs/api-client, you can update every consumer in services/ and apps/ in the same commit. With multi-repo, that change requires N PRs across N repos, often merged in a careful order.

This is the upside. The downside is the repo gets large, and naive git operations get slow.

Why companies use monorepos

The case for monorepos is well documented.

The recurring themes: atomic cross-project changes, shared tooling, one place to search, easier dependency management. The recurring caveat: at large scale, you need specialised tooling.

The good news for most teams: you are not Google. You can get most of the monorepo benefits with techniques that are now part of plain git, no exotic tooling required.

Sparse checkout: don't download files you don't use

Sparse checkout lets you have the full repo history but only check out a subset of files into your working directory. The engineer working on services/billing/ does not need apps/mobile/ on disk.

The modern interface is git sparse-checkout, added in Git 2.25. The reference is the git-sparse-checkout(1) man page.

Set it up:

# Clone normally
git clone https://github.com/org/monorepo.git
cd monorepo

# Enable sparse checkout in "cone" mode (faster, simpler)
git sparse-checkout init --cone

# Pick the directories you want
git sparse-checkout set services/billing libs/api-client libs/shared-types

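After set, your working directory shrinks to just those folders, plus the files that sit directly in the repository root and in their parent directories (services/ and libs/ here). The other directories still exist as objects in .git/objects but are not checked out into the working tree. git pull, git checkout, and git status all run faster because they have fewer files to update and scan.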

To add another directory later:

git sparse-checkout add services/notifications

To see what is currently checked out:

git sparse-checkout list

To turn it off:

git sparse-checkout disable
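To see the effect without touching a real monorepo, here is a minimal sketch that builds a throwaway repository and applies the same commands to a clone of it. The directory names mirror the layout above; the temporary paths and the demo identity are assumptions made for illustration.

```shell
#!/bin/sh
set -eu

# Throwaway directory; every path and name below is illustrative.
work=$(mktemp -d)
origin="$work/origin"

# Build a tiny stand-in monorepo with a few project directories.
git init -q "$origin"
git -C "$origin" symbolic-ref HEAD refs/heads/main
for dir in services/billing services/auth apps/mobile libs/api-client; do
    mkdir -p "$origin/$dir"
    echo "placeholder" > "$origin/$dir/README.md"
done
echo "root file" > "$origin/README.md"
git -C "$origin" add .
git -C "$origin" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial layout"

# Clone it, then restrict the working tree to two directories.
git clone -q "$origin" "$work/clone"
git -C "$work/clone" sparse-checkout init --cone
git -C "$work/clone" sparse-checkout set services/billing libs/api-client

# Show the active sparse directories and what is left on disk.
git -C "$work/clone" sparse-checkout list
ls "$work/clone" "$work/clone/services"
```

The listing should show services/billing but not services/auth or apps/, while the root README.md stays in place.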

Partial clone: don't even download history you don't use

Sparse checkout only trims the working tree. A normal clone still downloads every object in the repository's history into .git/objects. For a 5 GB monorepo, that is the slow part of git clone.

Partial clone (Git 2.22+) downloads only what you need when you need it. The most common form filters out blobs (file contents):

git clone --filter=blob:none https://github.com/org/monorepo.git

The clone now downloads only commits and trees, which is typically 10–50× smaller. When you check out a commit, Git fetches the blobs for that commit on demand.

Combine with sparse checkout for maximum effect:

git clone --filter=blob:none --no-checkout https://github.com/org/monorepo.git
cd monorepo
git sparse-checkout init --cone
git sparse-checkout set services/billing libs/api-client
git checkout main

The 14-minute clone becomes 90 seconds. The working directory is 200 MB instead of 5 GB.

Microsoft's introduction to Scalar covers the engineering behind partial clone in more depth. Scalar itself is now shipped with Git as a wrapper that turns these features on by default.
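A quick way to convince yourself that a blobless clone really skips file contents is to try it against a local repository. The sketch below is only an illustration: the paths are temporary, the demo identity is made up, and the two uploadpack settings are needed only because the "server" here is a plain local repo (hosted Git services typically enable filtering for you).

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
server="$work/server"

# A stand-in for the remote monorepo.
git init -q "$server"
git -C "$server" symbolic-ref HEAD refs/heads/main
mkdir -p "$server/services/billing"
echo "invoice logic" > "$server/services/billing/main.txt"
git -C "$server" add .
git -C "$server" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "add billing"

# A local "server" has to opt in to filters and on-demand object requests.
git -C "$server" config uploadpack.allowFilter true
git -C "$server" config uploadpack.allowAnySHA1InWant true

# file:// forces the real transport, so the filter is actually applied.
git clone -q --filter=blob:none --no-checkout "file://$server" "$work/client"

# The filter is recorded on the remote so later fetches stay partial.
clone_filter=$(git -C "$work/client" config remote.origin.partialclonefilter)

# Objects printed with a leading "?" are known but not downloaded yet.
missing_blobs=$(git -C "$work/client" rev-list --objects --missing=print HEAD \
    | grep -c '^?' || true)

echo "filter:  $clone_filter"
echo "missing: $missing_blobs"
```

Running git checkout in the client afterwards would fetch the missing blobs on demand.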

commit-graph: skip the work Git did not need to do

Git stores history as a graph of commits, each pointing to its parents. To answer "is commit X an ancestor of Y?" Git walks back through those parent links, reading and parsing one commit object at a time. On a million-commit monorepo, this walk can take seconds.

The commit-graph file (Git 2.18+) pre-computes commit metadata into a single file. Operations that walk history get dramatically faster.

Enable it once:

git config core.commitGraph true
git config gc.writeCommitGraph true
git commit-graph write --reachable

Now git log --graph, git branch --contains, git merge-base, and many other commands run faster. The file regenerates during git gc.

The git-commit-graph(1) man page has the full reference.
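If you want to confirm that the file was actually written, a small check like the following can help. It is a sketch against a throwaway repository with a made-up demo identity; in a real repo you would run only the two commit-graph commands.

```shell
#!/bin/sh
set -eu

# Throwaway repository with a handful of empty commits.
repo=$(mktemp -d)
git init -q "$repo"
for i in 1 2 3 4 5; do
    git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "commit $i"
done

# Write the graph for everything reachable from refs, then sanity-check it.
git -C "$repo" commit-graph write --reachable
git -C "$repo" commit-graph verify

# Without --split, the graph is stored as a single file at this path.
graph_file="$repo/.git/objects/info/commit-graph"
ls -l "$graph_file"
```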

fsmonitor: tell Git which files changed without scanning them

When you run git status in a 5 GB working directory, Git checks every tracked file for changes and walks every directory looking for untracked files. This is slow.

fsmonitor (Git 2.36+ for the built-in version) lets Git ask the operating system "which files changed since I last looked?" The OS already knows — it just had to be asked properly.

Enable the built-in fsmonitor:

git config core.fsmonitor true
git config core.untrackedCache true

git status in a large repo goes from 3 seconds to under 200 milliseconds. This is the single biggest quality-of-life improvement for engineers working in a large repo every day.

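The git-config documentation for core.fsmonitor covers the options. The built-in daemon supports macOS and Windows. Linux support depends on your Git version; where the built-in daemon is unavailable, core.fsmonitor can instead point to a hook script that queries a file watcher such as Watchman.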
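Because support varies by platform and build, it can be worth checking before you flip the setting in onboarding scripts. The sketch below assumes that builds which include the daemon list fsmonitor--daemon in the output of git version --build-options; treat that check as an assumption about your Git build and verify it locally. It works on a throwaway repository so it does not change any real config.

```shell
#!/bin/sh
set -eu

# Throwaway repository so no real configuration is modified.
repo=$(mktemp -d)
git init -q "$repo"

# Assumption: builds with the built-in daemon advertise it in their build options.
if git version --build-options | grep -q 'fsmonitor--daemon'; then
    fsmonitor_supported=yes
else
    fsmonitor_supported=no
fi
echo "built-in fsmonitor available: $fsmonitor_supported"

# Only enable the settings when this build can honour them.
if [ "$fsmonitor_supported" = yes ]; then
    git -C "$repo" config core.fsmonitor true
    git -C "$repo" config core.untrackedCache true
else
    echo "skipping: this build would need a hook-based watcher instead"
fi
```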

CODEOWNERS sharding by path

A monorepo is shared by many teams. Reviews must route to the right people. CODEOWNERS is the tool, but in a monorepo, it has to be carefully sharded by path.

A good pattern:

# Top-level defaults
*                                       @org/platform-leads

# Per-service ownership
/services/auth/                         @org/auth-team
/services/billing/                      @org/billing-team
/services/search/                       @org/search-team
/services/notifications/                @org/messaging-team

# App-level
/apps/web/                              @org/web-team
/apps/mobile/                           @org/mobile-team
/apps/admin/                            @org/internal-tools

# Shared libraries (list every team whose review should count; approval from any one listed owner satisfies the rule)
/libs/api-client/                       @org/platform-leads
/libs/ui-components/                    @org/design-system

# Infrastructure — locked down
/infra/                                 @org/sre @org/security
/.github/workflows/                     @org/sre

This way, a PR touching only services/billing/ automatically requests review from @org/billing-team and does not bother anyone else. A PR touching libs/api-client/ triggers platform review. A cross-cutting PR triggers multiple teams.

Combine with branch protection that requires Code Owner approval. Now the routing is enforced, not just suggested.

Conditional CI per path

Running every test in a 5 GB monorepo for every PR is wasteful. A change to services/billing/ should not trigger the mobile-app test suite.

Most CI providers support path filters. A GitHub Actions example:

on:
  pull_request:
    paths:
      - "services/billing/**"
      - "libs/api-client/**"
      - "libs/shared-types/**"

This workflow only runs when the PR touches those paths. Each service gets its own workflow file with its own path filter. Time saved per PR adds up across thousands of PRs a month.

For more complex setups (transitive dependencies between projects), build tools like Bazel, Nx, or Turborepo can compute the actual affected set. But for many monorepos, simple path filters get you 80% of the way there with 5% of the complexity.
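If your CI provider lacks path filters, or you want one job to decide which suites to run, the same idea can be approximated with git diff. This is a minimal sketch, not a replacement for a real affected-set tool: it assumes projects live exactly two directory levels deep, and the branch names and demo identity are made up.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/main

# Hypothetical helper: commit staged changes with a throwaway identity.
commit() {
    git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
        commit -q -m "$1"
}

mkdir -p "$repo/services/billing" "$repo/apps/mobile"
echo v1 > "$repo/services/billing/app.txt"
echo v1 > "$repo/apps/mobile/app.txt"
git -C "$repo" add .
commit "baseline"

# A feature branch that touches only the billing service.
git -C "$repo" checkout -q -b feature
echo v2 > "$repo/services/billing/app.txt"
git -C "$repo" add .
commit "change billing"

# Files changed since the branch point, reduced to their two-level project directory.
affected=$(git -C "$repo" diff --name-only main...feature | cut -d/ -f1-2 | sort -u)
echo "$affected"
```

A CI script could loop over the resulting list and start only the matching test suites. It does not follow dependencies between projects, which is where Bazel, Nx, or Turborepo come in.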

Maintenance commands that keep a monorepo fast

A few git commands and configurations that pay off at monorepo scale. Run them periodically (or automate them).

git maintenance

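Git 2.29 introduced the git maintenance command, which packages routine maintenance into a single command that can run on a schedule. Its tasks include gc, prefetching from remotes, updating the commit-graph, and incremental repacking. The scheduled setup typically favours the lighter incremental tasks over a full gc.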

# Schedule background maintenance on this repo
git maintenance start

The git-maintenance(1) man page covers the schedule options. Enabling this once per repo gives you most of the maintenance benefits without manual cron jobs.
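To see what individual tasks do before handing them to a scheduler, you can run them once in the foreground. The sketch below uses a throwaway repository and a made-up demo identity, and picks two of the lighter tasks; it does not register anything with cron, launchd, or any other scheduler.

```shell
#!/bin/sh
set -eu

# Throwaway repository with a few commits, all stored as loose objects.
repo=$(mktemp -d)
git init -q "$repo"
for i in 1 2 3; do
    echo "$i" > "$repo/file.txt"
    git -C "$repo" add file.txt
    git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
        commit -q -m "commit $i"
done

# Run two tasks once, in the foreground, without touching any scheduler.
git -C "$repo" maintenance run --task=loose-objects --task=commit-graph

# loose-objects collects loose objects into a pack file.
packs=$(git -C "$repo" count-objects -v | awk '/^packs:/ {print $2}')
echo "pack files after maintenance: $packs"
```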

Repacking

Over time, a busy repo accumulates many small pack files. Repacking merges them, often shrinking the repo by 20–50%.

git gc --aggressive
# or for very large repos:
git repack -a -d -f --depth=250 --window=250

This is slow (minutes for a large monorepo), so run it during off-hours.

Pruning unused objects

git prune --expire=now

Removes unreachable objects (ones that no branch, tag, or reflog entry leads to) from the object database immediately, with no grace period. Combine with git gc to fully reclaim disk space.

A weekly cron

Many monorepo teams run a weekly maintenance cron that performs these steps and reports the repo size. If size grows unexpectedly, that signals a problem (binary blobs committed, or build output and other generated files checked in).
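A weekly job along those lines might look like the sketch below. The helper name report_repo_size and the 5 GiB threshold are assumptions for illustration, and the script runs against a throwaway repository; a real job would point at your monorepo and send the warning somewhere people will see it.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: print the packed size of a repo in KiB.
report_repo_size() {
    git -C "$1" count-objects -v | awk '/^size-pack:/ {print $2}'
}

# Throwaway repository standing in for the monorepo.
repo=$(mktemp -d)
git init -q "$repo"
echo "hello" > "$repo/file.txt"
git -C "$repo" add file.txt
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial"

# The maintenance pass: repack everything, then drop unreachable objects.
git -C "$repo" gc --quiet --prune=now

size_kib=$(report_repo_size "$repo")
threshold_kib=5242880   # illustrative alert threshold: 5 GiB expressed in KiB
if [ "$size_kib" -gt "$threshold_kib" ]; then
    echo "WARNING: repo is ${size_kib} KiB, above ${threshold_kib} KiB"
else
    echo "repo size OK: ${size_kib} KiB"
fi
```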

Path-based ownership at scale

In a monorepo with 40 services and 8 teams, ownership routing is what keeps reviews sensible. CODEOWNERS by path was covered above; two additions worth knowing.

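Hierarchical defaults. GitHub's CODEOWNERS is last-match-wins: only the last pattern that matches a file applies, and its owners replace those of any earlier match. Put broader rules earlier and narrower rules later, and repeat the broader owner on the narrower line when both should apply. To say "anything under /services/ is owned by service-platform, but /services/payments/ is also owned by payments-team," the /services/payments/ line must list both teams.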

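Cross-team review for shared libraries. When a library is consumed by many teams, involving its consumers in review helps prevent one team from making changes that break others. List multiple owners on the path and turn on "Require review from Code Owners." Note that GitHub treats the requirement as satisfied once any one of the listed owners approves; it does not require every owner to approve. Sign-off from several teams needs enforcement beyond CODEOWNERS itself.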
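The last-match-wins rule is easy to get wrong, so here is a small sketch that resolves owners the same way for a simplified rule set. The helper owners_for is hypothetical and understands only "*" and rooted directory patterns such as /services/, not the full CODEOWNERS pattern syntax. The team names follow the example in the text.

```shell
#!/bin/sh
set -eu

# A simplified CODEOWNERS file: broad rules first, narrow rules last.
codeowners=$(mktemp)
cat > "$codeowners" <<'EOF'
*                     @org/platform-leads
/services/            @org/service-platform
/services/payments/   @org/service-platform @org/payments-team
EOF

# Hypothetical helper: print the owners of the last pattern matching a path.
owners_for() {
    path=$1
    result=""
    while read -r pattern owners; do
        case $pattern in
            ''|'#'*) continue ;;               # skip blank lines and comments
        esac
        if [ "$pattern" = "*" ]; then
            result=$owners
        else
            case "/$path" in
                "$pattern"*) result=$owners ;; # later matches overwrite earlier ones
            esac
        fi
    done < "$codeowners"
    echo "$result"
}

owners_for "services/payments/api.go"
owners_for "services/search/index.go"
owners_for "docs/README.md"
```

The payments file resolves to both teams only because the narrower line repeats @org/service-platform. Remove it from that line and the platform team drops out of the review.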

When NOT to monorepo

Monorepos have a real cost. They are not always the right answer.

Strongly independent teams. If your "platform" team has not touched the "data science" team's code in two years, and they have different release cadences, a monorepo creates coupling that does not need to exist.

External open-source release. If one of your projects needs to ship as an open-source library, putting it in a monorepo with proprietary code makes that release awkward. A separate repo from day one is simpler.

Wildly different toolchains. A monorepo where one service is in Rust, another in Python, another in Java is workable, but each toolchain wants its own conventions. Path-scoped CI helps; cultural overhead remains.

Teams below 5 engineers, fewer than 3 services. The benefits of a monorepo show up at coordination scale. With 3 engineers and 2 services, multi-repo is simpler.

A worked example: cutting a 5 GB clone to 90 seconds

To make the techniques concrete, here is a worked example combining everything. Start state: 5 GB monorepo, 40 services, default Git settings, 14-minute clones, 3-second git status.

Step 1: Partial clone.

git clone --filter=blob:none --no-checkout \
    https://github.com/org/monorepo.git
cd monorepo

Clone now downloads ~250 MB instead of 5 GB. Time: ~90 seconds.

Step 2: Sparse checkout for this engineer.

This engineer works on the billing service. She also touches shared-types and api-client regularly.

git sparse-checkout init --cone
git sparse-checkout set \
    services/billing \
    libs/shared-types \
    libs/api-client \
    docs
git checkout main

Working directory now has just those folders. About 180 MB on disk.

Step 3: Enable maintenance and fsmonitor.

git config core.commitGraph true
git config gc.writeCommitGraph true
git config core.fsmonitor true
git config core.untrackedCache true
git maintenance start

Background tasks now keep the commit-graph fresh and prefetch from remotes. git status drops from 3 seconds to under 200ms.

Step 4: Repository-level configuration (commit once, applies to everyone).

In the repo, set defaults that benefit everyone:

# .gitattributes
# Treat lockfiles as binary so Git skips text diffs (and textual merges) for them
*.lock binary
# Leave build output out of archives created by git archive
build/** export-ignore

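Client-side settings can live in a config file committed to the repo or documented in onboarding. Git never reads a tracked config file on its own, so each clone has to opt in once, for example by pointing include.path at it.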
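One way to wire up shared settings is Git's include.path mechanism. The sketch below is an illustration under assumptions: the file name .gitconfig.shared is made up, and the repository is a throwaway. Because an included file can change how Git behaves, review changes to it as carefully as code.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
git init -q "$repo"

# A settings file tracked in the repo; the name .gitconfig.shared is an assumption.
cat > "$repo/.gitconfig.shared" <<'EOF'
[core]
	commitGraph = true
	untrackedCache = true
[gc]
	writeCommitGraph = true
EOF

# One-time opt-in per clone. The path is relative to .git/config, hence the "../".
git -C "$repo" config --local include.path ../.gitconfig.shared

# Settings from the included file now show up in normal lookups.
effective=$(git -C "$repo" config --get gc.writeCommitGraph)
echo "gc.writeCommitGraph = $effective"
```

An onboarding script can run the git config line once after cloning, and later edits to the shared file then apply to every clone that opted in.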

Result: a new engineer can be productive in under 5 minutes from git clone. The full repo history is there. The full set of files is available when she needs to expand her sparse checkout. But day-to-day, she operates on a 180 MB slice.

This is what most monorepo teams converge on. The techniques are not exotic. They are just default-off in Git, and you have to opt in.

Common myths

Myth 1: "Monorepos require Bazel." Wrong. Bazel solves build speed at huge scale. It is a build-system answer to a build-system problem. Most monorepos do fine with the build tool each language already provides, plus path-scoped CI. Bazel becomes worth it when build times start dominating engineer-hours.

Myth 2: "Monorepos don't scale past a few GB." Wrong with modern Git. Sparse checkout + partial clone + commit-graph + fsmonitor make a 50 GB monorepo behave like a 500 MB one for daily work. Google, Meta, and Microsoft have run far larger.

Myth 3: "Monorepos force everyone onto the same release schedule." Wrong. Each service in a monorepo can deploy on its own schedule. The repo is shared; the deployment pipelines are not. Many monorepos have services that deploy every commit alongside services that deploy quarterly.

What to read next

If you want to feel how Git's maintenance commands keep a large repo fast, the Repository Maintenance lesson below opens a live terminal where you can try git gc, git commit-graph, and friends in two minutes.