Git's Batch Fsync Solved My Filesystem Corruption Problem — Then Created a New One
TL;DR: core.fsyncMethod=batch dramatically speeds up git operations that write many objects by batching disk flushes instead of syncing after every single object. It shipped in Git 2.37 (June 2022). If you're on an older Linux kernel, it can cause disk thrashing because the underlying sync_file_range() syscall behaves poorly under load, generating far more disk I/O than the amount of data being written warrants. If your git operations are getting interrupted and leaving zero-byte files behind, batch fsync might help, but check your kernel version first.
I ran into a problem a while back where git operations on my local dev environment were getting interrupted constantly. A git checkout, a git pull, sometimes even a git status would get cut short and leave behind zero-byte files in .git/objects/. Once those files existed, git was effectively broken — it would see the object hash, assume the object was valid, and then choke when it tried to read an empty file. I couldn’t check out branches, I couldn’t fetch, I couldn’t do much of anything until I hunted down and removed the zero-byte files manually.
This was happening often enough that I had a shell alias for the cleanup.
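The cleanup itself fits in a couple of lines: a zero-byte file under .git/objects/ is never a valid object, so it can be deleted outright. Here's a sketch of that alias, demonstrated on a scratch repo with a fake truncated object (the object name is invented for the demo):

```shell
# Demonstrate the cleanup on a scratch repo containing a fake
# truncated object; the object name below is made up for the demo.
set -e
repo=$(mktemp -d)
git init -q "$repo"
mkdir -p "$repo/.git/objects/ab"
: > "$repo/.git/objects/ab/00112233445566778899aabbccddeeff00112233"

# The cleanup: print and delete every empty file under .git/objects.
find "$repo/.git/objects" -type f -size 0 -print -delete

# fsck now sees a clean object store instead of choking on an empty
# file; genuinely missing objects come back on the next fetch.
git -C "$repo" fsck --full
```

On a real repo you'd point `find` at your own `.git/objects` directory instead of a scratch repo.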
After some digging I found core.fsyncMethod=batch, which seemed like exactly what I needed. It promised to make object writes more atomic and reduce the window during which a partial write could leave you with a corrupt object store. I enabled it and things improved — the interruption problem got much better.
Then I noticed my disk was getting hammered.
What Batch Fsync Actually Does
When git writes loose objects (the individual files under .git/objects/), it needs to make sure those writes survive a crash. The traditional approach is straightforward: write the object, call fsync(), move on to the next one. Every object gets its own sync to durable storage.
This is safe but slow. Each fsync() forces the disk’s hardware write cache to flush all the way to the physical media. On a git add that touches hundreds of files, you’re waiting for hundreds of sequential disk flushes. Benchmarks from the git mailing list showed that adding 500 files took 0.06 seconds without fsync, but 1.88 seconds with per-object fsync on Linux. On macOS it was worse — 11 seconds.
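You can feel this cost without involving git at all. The sketch below writes 200 small files twice: once with a data sync after every file (coreutils `sync -d`, which calls fdatasync), once with a single sync at the end. This is not git's code path, just the shape of the cost, and absolute numbers depend entirely on your disk:

```shell
# Write 200 small files with a data sync after each one, then again
# with a single sync at the end. Not git's actual code path, just an
# illustration of per-file vs batched flushing.
set -e
dir=$(mktemp -d)
N=200
now_ms() { date +%s%3N; }   # GNU date: epoch time in milliseconds

t0=$(now_ms)
for i in $(seq $N); do
    printf data > "$dir/per$i"
    sync -d "$dir/per$i"    # fdatasync each file, like per-object fsync
done
t1=$(now_ms)

for i in $(seq $N); do
    printf data > "$dir/batch$i"
done
sync                        # one flush for everything, like batch mode
t2=$(now_ms)

echo "per-file sync: $((t1 - t0)) ms, single sync: $((t2 - t1)) ms"
```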
Batch mode, configured with core.fsyncMethod=batch, changes the strategy. Instead of syncing each object individually, it:
- Creates a temporary directory for the new objects
- Writes each object to a file in that temp directory
- Issues a lightweight page cache writeback for each file (not a full hardware flush)
- After all objects are written, issues a single fsync() to flush everything at once
- Renames the objects from the temp directory to their final locations in .git/objects/
The single-fsync approach brings the 500-file benchmark down to 0.15 seconds on Linux. That’s a 12x speedup over per-object fsync, and nearly as fast as no fsync at all.
The temporary directory is important for correctness. If objects were written directly to .git/objects/ without being synced, git might see a hash that exists but hasn’t been durably written yet. If the system crashed at that moment, the object would be lost — and git would never try to write it again because it already “exists.” By staging objects in a temp directory and only renaming them after the fsync, batch mode ensures that any object visible in the object store is backed by durable data.
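The staging-and-rename sequence can be sketched in plain shell. This is a simplified model, not git's implementation — git uses sync_file_range() plus one fsync(), while plain sync(1) stands in here:

```shell
# A simplified model of batch mode's staging dance. git's real code
# uses sync_file_range() plus a single fsync(); sync(1) stands in.
set -e
objdir=$(mktemp -d)
staging=$(mktemp -d)

# Write each "object" into the staging directory first.
for i in 1 2 3; do
    printf 'object %s\n' "$i" > "$staging/obj$i"
done

# One flush covers everything written so far.
sync

# Rename into the real object directory. rename() within a filesystem
# is atomic, so a reader never sees a half-written object, and nothing
# becomes visible before it is durable.
for f in "$staging"/*; do
    mv "$f" "$objdir/${f##*/}"
done
ls "$objdir"
```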
When It Shipped
The feature arrived across two git releases:
- Git 2.36 (April 2022) introduced the core.fsync and core.fsyncMethod configuration knobs
- Git 2.37 (June 2022) added the batch option for core.fsyncMethod
On Windows (Git for Windows), batch mode with loose-object fsyncing is enabled by default. On Linux and macOS, you have to opt in.
To enable it:
git config --global core.fsyncMethod batch
git config --global core.fsync loose-object
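If you'd rather trial batch mode on a single repository before touching your global config, the same settings work per-repo (demonstrated here on a scratch repo so it's easy to back out):

```shell
# Apply the batch-fsync settings to one repository only, shown on a
# scratch repo; drop the --global flag and run inside your own repo.
set -e
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" config core.fsyncMethod batch
git -C "$repo" config core.fsync loose-object
git -C "$repo" config --get core.fsyncMethod
```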
The Older Kernel Problem
Here’s where I got burned.
Batch mode on Linux relies on sync_file_range() for the lightweight per-object writeback in step 3 above. This syscall is supposed to push dirty pages from the OS page cache to the disk’s hardware write cache without waiting for the hardware to actually commit them to durable storage. The final fsync() in step 4 then flushes the hardware cache once for everything.
In theory, this separates the expensive part (hardware flush) from the cheap part (page cache writeback) and lets you batch the expensive part. In practice, sync_file_range() has some ugly behaviors, especially on older kernels:
It blocks when it shouldn’t. Despite being designed to be lightweight, sync_file_range() blocks when the disk I/O queue is saturated. If you’re writing a lot of objects — say, during a large rebase or a fresh clone — the queue fills up and every subsequent sync_file_range() call stalls waiting for the disk to catch up. Instead of batching the I/O, you end up with worse throughput than per-object fsync because you’re paying the overhead of sync_file_range() plus the blocking.
It flushes more than you asked for. On XFS, sync_file_range() triggers writeback of neighboring dirty pages beyond the range you specified. This means writing a single small object can cause a cascade of I/O for unrelated data that happened to be nearby in the page cache.
Older kernels have worse writeback heuristics. Kernel changes over the years have adjusted when and how aggressively dirty pages get flushed to disk. Older kernels are more likely to let dirty pages accumulate until a threshold is hit, then flush everything at once in a storm that saturates the disk. Batch mode’s strategy of accumulating many dirty pages and then relying on a clean final flush interacts poorly with this behavior — the kernel-initiated flush competes with git’s explicit flush, and the disk thrashes between them.
The sync_file_range() man page itself calls the function “extremely dangerous” and warns against using it in portable programs. Christoph Hellwig, a Linux kernel developer who reviewed the git patches, was blunt: “Linux doesn’t have any set of syscalls to make batch mode safe.”
What I observed on my system was exactly this kind of thrashing. After enabling batch fsync, git operations that wrote many objects would cause sustained high disk I/O, far beyond what you'd expect for the amount of data being written. The system would slow to a crawl as the disk tried to service competing writeback requests. Ironically, I'd enabled batch mode to make git more resilient to interruptions, and the disk thrashing it caused was making the whole system less responsive.
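If you want to quantify the thrashing rather than eyeball it, /proc/diskstats gives a cheap before/after measure on Linux. Field 10 is sectors written; the helper below sums it across all block devices, which is a rough but serviceable proxy:

```shell
# Sum sectors written across all block devices (field 10 of
# /proc/diskstats, Linux only) before and after a git operation.
sectors_written() { awk '{s += $10} END {print s + 0}' /proc/diskstats; }

before=$(sectors_written)
# ... run the suspect git operation here, e.g. a large checkout ...
after=$(sectors_written)

# Sectors are 512 bytes; a delta far above the data you actually
# wrote points at writeback thrashing.
echo "written during operation: $(( (after - before) * 512 / 1024 )) KiB"
```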
What to Do About It
If you’re considering enabling batch fsync, check your kernel version first:
uname -r
On recent kernels (5.x and newer), the writeback behavior is generally better-tuned and batch mode works as intended. If you’re on an older kernel and you can’t upgrade, you might be better off sticking with the default fsync method or using writeout-only, which issues pagecache writeback without the batching machinery:
git config --global core.fsyncMethod writeout-only
This gives you some protection against partial writes without the sync_file_range() batching behavior that causes thrashing.
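Putting the kernel check and the fallback together, a decision sketch — note that the "5.x or newer" cutoff is my rough heuristic from this experience, not an official threshold:

```shell
# Choose a fsync method from the kernel major version. The "5 or
# newer" cutoff is a rough heuristic, not an official threshold.
major=$(uname -r | cut -d. -f1)
if [ "$major" -ge 5 ]; then
    method=batch
else
    method=writeout-only
fi
echo "core.fsyncMethod=$method"
# To apply it: git config --global core.fsyncMethod "$method"
```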
If you’re hitting the problem I originally had — interrupted git operations leaving zero-byte files — the real fix depends on what’s causing the interruptions. In my case, batch fsync was treating a symptom. If you find yourself in the same situation, investigate the root cause before reaching for fsync configuration. But if you’re on a modern kernel, batch fsync is a solid default — it’s fast, it’s reasonably safe for a developer workstation, and it avoids the zero-byte corruption scenario by staging objects in a temp directory before making them visible.
Just don’t assume it’s a free lunch on every system.