This document provides a high-level introduction to Git's architecture and the major subsystems that comprise the Git version control system. It explains how commands are dispatched, how data is stored and accessed, and how the various layers interact. This overview is intended to provide context for the more detailed documentation that follows.
Git is a distributed version control system built on a layered architecture. The system consists of four primary layers: the user interface layer that handles command dispatch, the command layer implementing porcelain operations, the core subsystems providing reusable functionality, and the storage layer managing persistent data. Two supporting layers, platform abstraction and the build system, sit alongside them.
Overall Git Architecture - Layer View
Architecture Layers
| Layer | Key Components | Importance | Responsibilities |
|---|---|---|---|
| User Interface | git.c | 2861 | Command parsing, dispatching to 150+ built-in commands, alias resolution |
| Command Layer | builtin/commit.c, builtin/fetch.c, sequencer.c | 822-1319 | High-level porcelain commands users interact with directly |
| Core Subsystems | refs.c, read-cache.c, object-file.c, diff.c, revision.c | 1273-2333 | Reference management, index/staging area, object storage, diff generation, history traversal |
| Storage Layer | packfile.c, odb.c, shallow.c | 2333 | Packfile compression, object database, shallow clone support |
| Platform Abstraction | git-compat-util.h, wrapper.c | 927 | Cross-platform compatibility, safe system call wrappers |
| Build System | Makefile, GIT-VERSION-GEN | 2861, 1757 | Platform detection, compilation orchestration, version generation |
The importance scores indicate how frequently each component is modified, reflecting its centrality to Git's operation. The highest-importance components are git.c (command dispatch) and Makefile (build system), both at 2861, followed by the core trio of read-cache.c, object-file.c, and packfile.c (data storage) at 2333.
Sources: git.c:1-900, Makefile:1-100, refs.c:1-100, read-cache.c:1-200, object-file.c:1-100, packfile.c:1-100, builtin/commit.c, builtin/fetch.c, sequencer.c, diff.c:1-100, revision.c:1-100
When a user runs a Git command, it flows through a well-defined dispatch mechanism that handles options, aliases, and routing to the appropriate implementation.
Command Dispatch Table
The command dispatch table in git.c:529-700 maps command names to implementation functions.
Key dispatch flags defined in git.c:21-31:
- RUN_SETUP - Requires a Git repository
- RUN_SETUP_GENTLY - Try to find a repository but don't fail
- USE_PAGER - Automatically paginate output
- NEED_WORK_TREE - Requires a working tree (not bare repo)
Sources: git.c:1-100, git.c:157-366, git.c:368-464, git.c:466-527, builtin.h:1-120
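The table-driven dispatch described above can be illustrated with a small sketch. It is not Git's actual code: the struct `cmd_entry`, its field names, the flag bit values, and the `demo_*` handlers are all assumptions made for illustration; only the flag names come from the list above.

```c
#include <stddef.h>
#include <string.h>

/* Flag names follow the list above; the bit values are illustrative. */
#define RUN_SETUP        (1u << 0)
#define RUN_SETUP_GENTLY (1u << 1)
#define USE_PAGER        (1u << 2)
#define NEED_WORK_TREE   (1u << 3)

/* Simplified, hypothetical stand-in for one dispatch-table entry. */
struct cmd_entry {
	const char *name;
	int (*fn)(int argc, const char **argv);
	unsigned int flags;
};

/* Hypothetical handlers that only report success. */
static int demo_log(int argc, const char **argv) { (void)argc; (void)argv; return 0; }
static int demo_status(int argc, const char **argv) { (void)argc; (void)argv; return 0; }
static int demo_version(int argc, const char **argv) { (void)argc; (void)argv; return 0; }

static const struct cmd_entry demo_commands[] = {
	{ "log",     demo_log,     RUN_SETUP | USE_PAGER },
	{ "status",  demo_status,  RUN_SETUP | NEED_WORK_TREE },
	{ "version", demo_version, 0 },
};

/* Linear lookup of a command name in the table; NULL if unknown. */
static const struct cmd_entry *find_command(const char *name)
{
	size_t i;
	for (i = 0; i < sizeof(demo_commands) / sizeof(demo_commands[0]); i++)
		if (!strcmp(demo_commands[i].name, name))
			return &demo_commands[i];
	return NULL;
}

/*
 * Look up a command, check its requirements, then call it.
 * in_repo and has_work_tree stand in for the result of repository
 * discovery. Returns -1 for an unknown command, -2 when a flag's
 * requirement is not met, otherwise the handler's return value.
 */
static int run_command_by_name(const char *name, int in_repo, int has_work_tree,
			       int argc, const char **argv)
{
	const struct cmd_entry *cmd = find_command(name);
	if (!cmd)
		return -1;
	if ((cmd->flags & RUN_SETUP) && !in_repo)
		return -2;
	if ((cmd->flags & NEED_WORK_TREE) && !has_work_tree)
		return -2;
	return cmd->fn(argc, argv);
}
```

Under these assumptions, `run_command_by_name("status", 1, 0, 0, NULL)` is rejected because the entry carries NEED_WORK_TREE, while `"version"` runs even outside a repository because it sets no flags.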
Git's data layer consists of three primary storage systems that work together to manage repository state: the object database (content storage), the index (staging area), and the reference system (branch/tag management).
Object Storage Architecture
The object database maintains a chain of object sources (primary plus alternates from $GIT_ALTERNATE_OBJECT_DIRECTORIES). Each source contains both loose objects (individual files) and packfiles (compressed collections). The packfile system at packfile.c:1-300 uses MRU (most-recently-used) ordering for performance and supports multi-pack indexes (MIDX) to aggregate multiple packs. Object operations check caches first, then search packfiles, and finally fall back to loose objects. Write operations support bulk fsync via tmp_objdir for transactional semantics.
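The MRU ordering mentioned above can be sketched in a few lines. This is a toy model, not the code in packfile.c: `toy_pack`, `pack_has`, and `find_in_packs` are hypothetical names, and object IDs are plain strings. It covers only the pack search step, not the caches or the loose-object fallback.

```c
#include <stddef.h>
#include <string.h>

#define MAX_IDS 4

/* Toy "pack": a name plus the (NULL-terminated) object ids it contains. */
struct toy_pack {
	const char *name;
	const char *ids[MAX_IDS];
};

/* Returns 1 if the pack holds the given id, 0 otherwise. */
static int pack_has(const struct toy_pack *p, const char *id)
{
	size_t i;
	for (i = 0; i < MAX_IDS && p->ids[i]; i++)
		if (!strcmp(p->ids[i], id))
			return 1;
	return 0;
}

/*
 * Search packs in their current order. On a hit, move that pack to the
 * front so the next lookup tries it first (most-recently-used ordering).
 * Returns the pack that holds the id, or NULL if no pack has it.
 */
static struct toy_pack *find_in_packs(struct toy_pack **packs, size_t n,
				      const char *id)
{
	size_t i;
	for (i = 0; i < n; i++) {
		if (pack_has(packs[i], id)) {
			struct toy_pack *hit = packs[i];
			/* Shift packs[0..i-1] one slot right, then put the hit first. */
			memmove(&packs[1], &packs[0], i * sizeof(*packs));
			packs[0] = hit;
			return hit;
		}
	}
	return NULL;
}
```

The idea behind the reordering is locality: objects requested close together often live in the same pack, so trying the last successful pack first tends to shorten the search.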
Working Tree & Index Management
The index at read-cache.c:1-200 is Git's staging area, containing cache_entry structures with file metadata and object IDs. It maintains several optimizations: a cache tree for efficient tree object creation, a name hash for fast path lookups, and support for split index mode. Index entries carry flags such as CE_MATCHED, CE_UPDATE, and CE_REMOVE to track their state during operations. Commands like git checkout and git reset use unpack_trees() to apply tree changes with different merge strategies (oneway, twoway, threeway). The index lifecycle involves reading, locking (for atomicity), modifying entries, and writing with COMMIT_LOCK.
Reference Management System
The reference system at refs.c:38-66 uses a pluggable backend architecture. The files backend at refs/files-backend.c:83-130 stores refs as files under .git/refs/ and maintains an in-memory cache. Refs can be packed into .git/packed-refs via the packed backend. The reftable backend at refs/reftable-backend.c offers better performance for repositories with many references. Reference operations include reading (with symref resolution), updating (single ref), transactions (atomic multi-ref updates with an OPEN→PREPARED→CLOSED state machine), deletion, and iteration. Locking uses .lock files with compare-and-swap semantics for atomicity. Reflogs at .git/logs/ track reference history.
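The OPEN→PREPARED→CLOSED state machine for transactions can be modeled as below. This is a minimal sketch with hypothetical names (`toy_txn`, `txn_add`, and so on), not the API in refs.c. It assumes updates may only be queued while the transaction is OPEN and that a commit requires an explicit prepare step; a real backend would take locks and verify old values during prepare.

```c
#include <stddef.h>

enum txn_state { TXN_OPEN, TXN_PREPARED, TXN_CLOSED };

#define MAX_UPDATES 8

/* One queued update: which ref, and the value it should point to. */
struct toy_update {
	const char *refname;
	const char *new_value;
};

struct toy_txn {
	enum txn_state state;
	struct toy_update updates[MAX_UPDATES];
	size_t nr;
};

/* Start an empty transaction in the OPEN state. */
static void txn_begin(struct toy_txn *t)
{
	t->state = TXN_OPEN;
	t->nr = 0;
}

/* Queue an update; only allowed while the transaction is OPEN. */
static int txn_add(struct toy_txn *t, const char *refname, const char *value)
{
	if (t->state != TXN_OPEN || t->nr == MAX_UPDATES)
		return -1;
	t->updates[t->nr].refname = refname;
	t->updates[t->nr].new_value = value;
	t->nr++;
	return 0;
}

/* OPEN -> PREPARED: the point where locks would be taken and checked. */
static int txn_prepare(struct toy_txn *t)
{
	if (t->state != TXN_OPEN)
		return -1;
	t->state = TXN_PREPARED;
	return 0;
}

/* PREPARED -> CLOSED: the point where all updates would become visible. */
static int txn_commit(struct toy_txn *t)
{
	if (t->state != TXN_PREPARED)
		return -1;
	t->state = TXN_CLOSED;
	return 0;
}

/* OPEN or PREPARED -> CLOSED without applying anything. */
static int txn_abort(struct toy_txn *t)
{
	if (t->state == TXN_CLOSED)
		return -1;
	t->state = TXN_CLOSED;
	return 0;
}
```

Rejecting `txn_add()` after prepare is what makes the multi-ref update all-or-nothing in this model: the set of refs is frozen before anything is committed.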
Sources: odb.c:1-100, packfile.c:1-300, read-cache.c:1-200, refs.c:38-100, refs/files-backend.c:83-204, refs/packed-backend.c:1-100, refs/reftable-backend.c:1-100
History & Revision System
The revision walking system at revision.c:1-100 centers on struct rev_info, which configures traversal parameters. Commands call setup_revisions() to parse arguments, prepare_revision_walk() to initialize, and get_revision() in a loop to iterate commits. Traversal uses a priority queue sorted by commit date, marks uninteresting parents for exclusion, and simplifies history using TREESAME detection and bloom filters. git log displays results via show_log() and the diff engine. The diff system at diff.c:1-100 supports rename detection via diffcore_rename(), pickaxe search, and multiple output formats. Object flags (SEEN, UNINTERESTING, TREESAME, SHOWN, BOUNDARY) control traversal behavior.
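The date-ordered traversal can be sketched with a toy commit graph. The names here (`toy_commit`, `toy_walk`, `walk_push`, `walk_next`) are hypothetical and are not Git's API. The queue is a small array scanned linearly for the newest commit rather than a real heap, and only the SEEN flag is modeled; UNINTERESTING handling and history simplification are left out.

```c
#include <stddef.h>

#define SEEN (1u << 0)   /* flag name from the text; bit value is illustrative */
#define MAX_QUEUE 16

/* Toy commit: a date, traversal flags, and up to two parents. */
struct toy_commit {
	const char *name;
	long date;
	unsigned int flags;
	struct toy_commit *parents[2];
};

/* Pending commits, conceptually ordered by date (newest first). */
struct toy_walk {
	struct toy_commit *queue[MAX_QUEUE];
	size_t nr;
};

/* Enqueue a commit once; the SEEN flag prevents duplicate visits. */
static void walk_push(struct toy_walk *w, struct toy_commit *c)
{
	if (!c || (c->flags & SEEN) || w->nr == MAX_QUEUE)
		return;
	c->flags |= SEEN;
	w->queue[w->nr++] = c;
}

/*
 * Remove and return the newest pending commit, then enqueue its
 * parents. Returns NULL when the walk is finished.
 */
static struct toy_commit *walk_next(struct toy_walk *w)
{
	size_t i, best = 0;
	struct toy_commit *c;

	if (!w->nr)
		return NULL;
	for (i = 1; i < w->nr; i++)
		if (w->queue[i]->date > w->queue[best]->date)
			best = i;
	c = w->queue[best];
	w->queue[best] = w->queue[--w->nr];   /* fill the hole with the last item */
	walk_push(w, c->parents[0]);
	walk_push(w, c->parents[1]);
	return c;
}
```

A caller would push the starting tips with `walk_push()` and then call `walk_next()` in a loop until it returns NULL, mirroring the prepare-then-iterate shape described above.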
Remote Operations Architecture
Remote operations at builtin/fetch.c are built on a layered architecture. remote.c manages configuration (remotes, refspecs, branches, URL rewriting). transport.c provides protocol abstraction supporting the Git native protocol, bundles, and external helpers. fetch-pack.c and send-pack.c implement the client-side protocols, using want/have negotiation for efficient transfer. On the server side, upload-pack.c and receive-pack.c handle incoming requests. Advanced features include shallow clones (shallow.c), partial clone with object filtering, and submodule recursion. Protocol v2 offers improved capability negotiation over legacy v0/v1.
Sources: revision.c:1-100, diff.c:1-100, builtin/log.c:1-100, builtin/fetch.c:1-100, builtin/push.c
Git manages output through a pager system that controls how command output is displayed to users. The pager configuration at pager.c:1-100 determines when output should be piped through a pager like less. Commands can specify the USE_PAGER flag in their cmd_struct entry at git.c:529-700 to enable automatic pagination.
The pager system checks configuration (core.pager, pager.<cmd>) and whether stdout is a terminal. If pagination is enabled, Git spawns a pager process at pager.c:70-150 and redirects stdout/stderr to it. The GIT_PAGER environment variable can override the configured pager. The default pager is defined as "less" at pager.c:12.
Sources: pager.c:1-150, git.c:126-148, git.c:489-493
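The selection logic described above can be condensed into one function. This is a simplified sketch, not the code in pager.c: `choose_pager` is a hypothetical helper that takes its inputs as parameters, and it models only what the text states (terminal check, GIT_PAGER over core.pager, "less" as the default). Per-command pager.<cmd> settings are not modeled.

```c
#include <stddef.h>

#define DEFAULT_PAGER "less"

/*
 * env_pager:     value of the GIT_PAGER environment variable, or NULL
 * config_pager:  value of core.pager from configuration, or NULL
 * stdout_is_tty: nonzero when stdout is a terminal
 *
 * Returns the pager program to spawn, or NULL when output should not
 * be paged at all.
 */
static const char *choose_pager(const char *env_pager,
				const char *config_pager,
				int stdout_is_tty)
{
	const char *pager;

	if (!stdout_is_tty)
		return NULL;              /* piped or redirected output is not paged */
	pager = env_pager;                /* the environment overrides the config */
	if (!pager)
		pager = config_pager;
	if (!pager)
		pager = DEFAULT_PAGER;    /* built-in fallback */
	return pager;
}
```

A caller might pass `getenv("GIT_PAGER")` and `isatty(1)` as the first and third arguments; keeping those lookups outside the function makes the precedence rule easy to test in isolation.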
Build System
The Makefile (Makefile:1-3000) orchestrates:
- Platform detection: config.mak.uname
- Compilation: git binary
- Version generation: git describe or version file
Configuration Precedence
Configuration is loaded in order (later overrides earlier) by config.c:2000-3000:
1. System: /etc/gitconfig
2. Global: ~/.gitconfig or $XDG_CONFIG_HOME/git/config
3. Local: .git/config
4. Worktree: .git/config.worktree
5. Environment variables: GIT_*
6. Command line: -c key=value
Key functions:
- git_config() at config.c:2000-2100 - Read and parse config files
- git_config_get_value() - Query a config key
- git_config_set() - Write config values
Sources: Makefile:1-1000, config.c:1-3000, generate-cmdlist.sh:1-100, git.c:157-280
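The "later overrides earlier" rule can be shown with a small model. It is a sketch only: `toy_setting` and `lookup_last_wins` are hypothetical names, not functions from config.c, and the settings are assumed to be already collected into one array in load order.

```c
#include <stddef.h>
#include <string.h>

/* One key/value pair together with the scope it was read from. */
struct toy_setting {
	const char *scope;
	const char *key;
	const char *value;
};

/*
 * entries must be in load order (system first, command line last).
 * The last entry that matches the key wins, so later scopes override
 * earlier ones. Returns NULL if the key was never set.
 */
static const char *lookup_last_wins(const struct toy_setting *entries,
				    size_t n, const char *key)
{
	const char *value = NULL;
	size_t i;

	for (i = 0; i < n; i++)
		if (!strcmp(entries[i].key, key))
			value = entries[i].value;
	return value;
}
```

For example, if core.editor is set to "vi" in the system scope and "emacs" in the local scope, the lookup yields "emacs" because the local entry is loaded later.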
Common Operation Pattern
Most Git commands follow this pattern implemented in git.c:466-527:
1. setup_git_directory() if RUN_SETUP flag set
2. git_config()
3. read_index() at read-cache.c:2300-2500
4. refs_resolve_ref_unsafe() at refs.c:390-399
Sources: git.c:466-527, read-cache.c:2300-2500, refs.c:390-421, packfile.c:2000-2100
Git's core data structures represent the fundamental entities managed by the version control system.
Cache Entry (Index Entry)
The struct cache_entry represents a single file in the index/staging area. Defined in the index management code and managed by read-cache.c:98-112, it contains:
- struct stat_data ce_stat_data - File metadata (mtime, ctime, size, inode)
- struct object_id oid - SHA-1/SHA-256 hash of the object content
- unsigned int ce_mode - File mode (regular file, symlink, gitlink)
- unsigned int ce_flags - Status flags (CE_VALID, CE_UPDATE, CE_MATCHED, etc.)
- unsigned int ce_namelen - Length of the file path
- char name[FLEX_ARRAY] - File path (variable length)
The index at read-cache.c:1-200 maintains an array of these entries sorted by path for efficient binary search.
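The benefit of keeping entries sorted by path is easiest to see in a lookup routine. The following is a hedged sketch, not Git's implementation: `toy_entry` keeps only a mode and a name, `toy_index_pos` is a hypothetical helper, and encoding "not found" as a negative insertion position is a convention chosen for this example.

```c
#include <stddef.h>
#include <string.h>

/* Pared-down index entry: just the mode and the path. */
struct toy_entry {
	unsigned int mode;
	const char *name;
};

/*
 * Binary search over entries sorted by name (byte-wise order).
 * Returns the position of the entry if the path is present.
 * Otherwise returns -(pos) - 1, where pos is the position at which
 * the path would have to be inserted to keep the array sorted.
 */
static int toy_index_pos(const struct toy_entry *entries, int nr,
			 const char *path)
{
	int lo = 0, hi = nr;

	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = strcmp(path, entries[mid].name);

		if (!cmp)
			return mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -lo - 1;
}
```

Returning the insertion position on a miss lets a caller add a new path without a second search, which is one reason a sorted array works well for a staging area that is read far more often than it is modified.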
Packed Git (Packfile)
The struct packed_git at packfile.h:15-53 represents a packfile and its index:
- struct pack_window *windows - Memory-mapped windows for large pack access
- const void *index_data - Pointer to loaded pack index
- size_t index_size - Size of the index data
- uint32_t num_objects - Number of objects in the pack
- unsigned char hash[GIT_MAX_RAWSZ] - Pack checksum (SHA-1 or SHA-256)
- int pack_fd - File descriptor for the packfile
- char pack_name[FLEX_ARRAY] - Path to the .pack file
Packfiles are accessed via packfile.c:270-380 and use memory mapping for efficient access to large packs.
Reference Store
The struct ref_store provides an abstract interface to the reference system. Defined in refs/refs-internal.h, it uses a vtable pattern:
- const struct ref_storage_be *be - Backend implementation (files, reftable, etc.)
- struct repository *repo - Repository context
- const char *gitdir - Path to Git directory
Concrete implementations include files_ref_store at refs/files-backend.c:83-94 and reftable_backend at refs/reftable-backend.c:39-42.
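The vtable pattern can be sketched as follows. All names (`toy_ref_backend`, `toy_ref_store`, `toy_read_ref`) and the canned return values are assumptions for illustration; the real backend interface has many more operations than the single `read_ref` callback shown here.

```c
#include <stddef.h>
#include <string.h>

struct toy_ref_store;

/* Backend "vtable": a name plus the operations the backend implements. */
struct toy_ref_backend {
	const char *name;
	const char *(*read_ref)(struct toy_ref_store *store, const char *refname);
};

/* Generic store: callers hold this and never touch a backend directly. */
struct toy_ref_store {
	const struct toy_ref_backend *be;
	const char *gitdir;
};

/* Hypothetical "files" backend that only knows about HEAD. */
static const char *files_read_ref(struct toy_ref_store *store, const char *refname)
{
	(void)store;
	return strcmp(refname, "HEAD") ? NULL : "ref: refs/heads/main";
}

/* Hypothetical table-based backend returning a different canned value. */
static const char *table_read_ref(struct toy_ref_store *store, const char *refname)
{
	(void)store;
	return strcmp(refname, "HEAD") ? NULL : "ref: refs/heads/trunk";
}

static const struct toy_ref_backend toy_files_backend = { "files", files_read_ref };
static const struct toy_ref_backend toy_table_backend = { "reftable", table_read_ref };

/* Generic entry point: dispatch through the store's backend pointer. */
static const char *toy_read_ref(struct toy_ref_store *store, const char *refname)
{
	return store->be->read_ref(store, refname);
}
```

Because callers only ever go through `toy_read_ref()`, switching a repository from one backend to the other means changing the `be` pointer and nothing else, which is the property the pluggable design relies on.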
Revision Info
The struct rev_info at revision.c configures revision walking.
This structure is central to commands like git log, git rev-list, and git format-patch.
Sources: read-cache.c:98-112, packfile.h:15-53, packfile.c:270-380, refs/refs-internal.h, refs/files-backend.c:83-94, revision.c:1-100
Git's architecture is organized into distinct layers, each with clear responsibilities, as summarized in the Architecture Layers table above.
The system uses a consistent pattern: commands flow through the dispatch layer, load configuration and repository state, perform operations using the data layer, and write results. The pluggable backend architecture (especially for references) allows Git to scale to different repository sizes and use cases.
Sources: git.c:1-900, Makefile:1-1000, config.c:1-500, read-cache.c:1-500, packfile.c:1-500, refs.c:1-500