The latest on Git updates - The GitHub Blog
https://github.blog/open-source/git/

Highlights from Git 2.52
https://github.blog/open-source/git/highlights-from-git-2-52/
Mon, 17 Nov 2025

The open source Git project just released Git 2.52 with features and bug fixes from over 94 contributors, 33 of them new. We last caught up with you on the latest in Git back when 2.51 was released.

To celebrate this most recent release, here is GitHub’s look at some of the most interesting features and changes introduced since last time.

Tree-level blame information

If you’re a seasoned Git user, then you are no doubt familiar with git blame, Git’s tool for figuring out which commit most recently modified each line at a given filepath. Git’s blame functionality is great for figuring out when a bug was introduced, or why some code was written the way it was.

If you want to know which commit last modified any portion of a given filepath, that’s easy enough to do with git log -1 -- path/to/my/file, since -1 will give us only the first commit which modifies that path. But what if instead you want to know which commit most recently modified every file in some directory? Answering that question may seem contrived, but it’s not. If you’ve ever looked at a repository’s file listing on GitHub, the middle column of information has a link to the commit which most recently modified that path, along with (part of) its commit message.

GitHub’s repository file listing, showing tree-level blame information.

The question remains: how do we efficiently determine which commit most recently modified each file in a given directory? You could imagine that you might enumerate each tree entry, feeding it to git log -1 and collecting the output there, like so:

$ git ls-tree -z --name-only HEAD^{tree} | xargs -0 -I{} sh -c '
    git log -1 --format="$1 %h %s" -- "$1"
  ' -- {} | column -t -l3
.cirrus.yml     1e77de10810  ci: update FreeBSD image to 14.3
.clang-format   37215410730  clang-format: exclude control macros from SpaceBeforeParens
.editorconfig   c84209a0529  editorconfig: add .bash extension
.gitattributes  d3b58320923  merge-file doc: set conflict-marker-size attribute
.github         5db9d35a28f  Merge branch 'js/ci-github-actions-update'
[...]

That works, but not efficiently. To see why, consider a case with files A, B, and C introduced by commits C1, C2, and C3, respectively. To blame A, we walk from C3 back to C1 in order to determine that C1 was the most recent commit to modify A. That traversal passed through C2 and C3, but since we were only looking for modifications to A, we’ll end up revisiting those commits when trying to blame B and C. In this example, we visit those three commits six times in total, which is twice the necessary number of history traversals.

Git 2.52 introduces a new command which comes up with the same information in a fraction of the time: git last-modified. To get a sense for how much faster last-modified is than the example above, here are some hyperfine results:

Benchmark 1: git ls-tree + log
  Time (mean ± σ):      3.962 s ±  0.011 s    [User: 2.676 s, System: 1.330 s]
  Range (min … max):    3.940 s …  3.984 s    10 runs

Benchmark 2: git last-modified
  Time (mean ± σ):     722.7 ms ±   4.6 ms    [User: 682.4 ms, System: 40.1 ms]
  Range (min … max):   717.3 ms … 731.3 ms    10 runs

Summary
  git last-modified ran
    5.48 ± 0.04 times faster than git ls-tree + log
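The speedup comes from walking history once: in a single newest-to-oldest pass, each path of interest is attributed to the first commit that touches it and then dropped from the search. Here is a toy model of that idea (hypothetical and heavily simplified, not Git's actual implementation):

```python
def last_modified(commits, paths):
    """Attribute each path to the newest commit touching it, in one walk.

    `commits` is a newest-first list of {"id": ..., "modified": set()}
    dicts; `paths` is the set of tree entries to attribute."""
    unresolved = set(paths)
    result = {}
    for commit in commits:
        if not unresolved:
            break  # every path already attributed; stop walking early
        for path in commit["modified"] & unresolved:
            result[path] = commit["id"]
            unresolved.discard(path)
    return result

# The A/B/C example from above, newest commit first:
history = [
    {"id": "C3", "modified": {"C"}},
    {"id": "C2", "modified": {"B"}},
    {"id": "C1", "modified": {"A"}},
]
assert last_modified(history, {"A", "B", "C"}) == {"A": "C1", "B": "C2", "C": "C3"}
```

Each commit is visited at most once no matter how many paths are being resolved, which is exactly the redundancy that the per-path `git log -1` loop cannot avoid.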

The core functionality behind git last-modified was written by GitHub over many years (originally called blame-tree in GitHub’s fork of Git), and is what has powered our tree-level blame since 2012. Earlier this year, we shared those patches with engineers at GitLab, who tidied up years of development into a reviewable series of patches which landed in this release.

There are still some features in GitHub’s version of this command that have yet to make their way into a Git release, including an on-disk format to cache the results of previous runs. In the meantime, check out git last-modified, available in Git 2.52.

[source, source, source]

Advanced repository maintenance strategies

Returning readers of this series may recall our coverage of the git maintenance command. If this is your first time reading along, or you could use a refresher, we’ve got you covered.

git maintenance is a Git command which can perform repository housekeeping tasks either on a scheduled or ad-hoc basis. The maintenance command can perform a variety of tasks, like repacking the contents of your repository, updating commit-graphs, expiring stale reflog entries, and much more. Put together, maintenance ensures that your repository continues to operate smoothly and efficiently.

By default (or when running the gc task), git maintenance relies on git gc internally to repack your repository and remove any unreachable objects. This has a notable drawback: git gc performs “all-into-one” repacks to consolidate the contents of your repository, which can be sluggish for very large repositories. As an alternative, git maintenance has an incremental-repack strategy, but that never prunes any unreachable objects.

Git 2.52 bridges this gap by introducing a new geometric task within git maintenance that avoids all-into-one repacks when possible, and prunes unreachable objects on a less frequent basis. This new task uses tools (like geometric repacking) that were designed at GitHub and have powered GitHub’s own repository maintenance for many years. Those tools have been in Git since 2.33, but were awkward to use or discover since their implementation was buried within git repack, not git gc.

The geometric task works by inspecting the contents of your repository to determine whether it can combine some number of packfiles to form a geometric progression by object count. If it can, it performs a geometric repack, condensing the contents of your repository without pruning any objects. Alternatively, if a geometric repack would pack the entirety of your repository into a single pack, then a full git gc is performed instead, which consolidates the contents of your repository and prunes out unreachable objects.
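As a rough model of that decision, here is a toy version of the split heuristic with the default factor of two (a sketch, not Git's actual implementation): sort packs by object count, and keep a pack only if it holds at least twice as many objects as all smaller packs combined.

```python
def plan_maintenance(object_counts, factor=2):
    """Toy model of the geometric task's choice between a geometric
    repack and a full gc (not Git's actual implementation)."""
    sizes = sorted(object_counts)
    split, total = 0, 0
    # A pack may stay only if it holds at least `factor` times as many
    # objects as all smaller packs combined (a geometric progression).
    for i, n in enumerate(sizes):
        if i > 0 and n < factor * total:
            split = i + 1
        total += n
    if split == len(sizes):
        return "full gc"  # everything rolls into one pack: gc and prune
    # Otherwise combine only the small packs, pruning nothing.
    return ("geometric repack", sizes[:split])

assert plan_maintenance([1, 1, 1, 8]) == ("geometric repack", [1, 1, 1])
assert plan_maintenance([1, 2, 4, 8]) == "full gc"
```

In the first case the three small packs roll up into one, leaving packs of 3 and 8 objects that satisfy the progression; in the second, every pack would combine anyway, so a full (pruning) gc is the better choice.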

Git 2.52 makes it a breeze to keep even your largest repositories running smoothly. Check out the new geometric strategy, or any of the many other things git maintenance can do in 2.52.

[source]


The tip of the iceberg…

Now that we’ve covered some of the larger changes in more detail, let’s take a quicker look at a selection of other new features and updates in this release.

  • This release saw a couple of new sub-commands added to git refs, Git’s relatively new tool for providing low-level access to your repository’s references. Prior to this release, git refs was capable of migrating between reference backends (e.g., to have your repository store reference data in the reftable format), along with verifying the internal representation of those references.

    git refs now includes two new sub-commands: git refs list and git refs exists. The former is an alias for git for-each-ref and supports the same set of options. The latter works like git show-ref --exists, and can be used to quickly determine whether or not a given reference exists.

    Neither of these new sub-commands introduces new functionality, but they do consolidate a couple of common reference-related operations into a single Git command rather than many individual ones.

    [source]

  • If you’ve ever scripted around Git, you are likely familiar with Git’s rev-parse command. If not, you’d be forgiven for thinking that rev-parse is designed to just resolve the various ways to describe a commit into a full object ID. In reality, rev-parse can perform functionality totally unrelated to resolving object IDs, including shell quoting, option parsing (as a replacement for getopt), printing local GIT_ environment variables, resolving paths inside of $GIT_DIR and so much more.

    Git 2.52 introduces the first step to giving some of this functionality a new home via its new git repo command. The git repo command—currently designated as experimental—is designed to be a general-purpose tool for retrieving pieces of information about your repository. For example, you can check whether or not a repository is shallow or bare, along with what type of object and reference format it uses, like so:

    $ keys='layout.bare layout.shallow object.format references.format'
    $ git repo info $keys
    layout.bare=false
    layout.shallow=false
    object.format=sha1
    references.format=files

    The new git repo command can also print out some general statistics about your repository’s structure and contents via its git repo structure sub-command:

    $ git repo structure
    Counting objects: 497533, done.
    | Repository structure | Value  |
    | -------------------- | ------ |
    | * References         |        |
    |   * Count            |   2871 |
    |     * Branches       |     58 |
    |     * Tags           |   1273 |
    |     * Remotes        |   1534 |
    |     * Others         |      6 |
    |                      |        |
    | * Reachable objects  |        |
    |   * Count            | 497533 |
    |     * Commits        |  91386 |
    |     * Trees          | 208050 |
    |     * Blobs          | 197103 |
    |     * Tags           |    994 |

    [source, source, source]

  • Back in 2.28, the Git project introduced the init.defaultBranch configuration option to provide a default branch name for any repositories created with git init. Since its introduction, the default value of that configuration option has been “master”, though many set init.defaultBranch to “main” instead.

    Beginning in Git 3.0, the default value for init.defaultBranch will change to “main”. That means that any repositories created in Git 3.0 or newer using git init will have their default branch named “main” without the need for any additional configuration.

    If you want to get a sneak peek of that, or any other planned change for Git 3.0, you can build Git locally with the WITH_BREAKING_CHANGES build flag to try out the new changes today.

    [source, source]

  • By default, Git uses SHA-1 to provide a content-addressable hash of any object in your repository. In Git 3.0, Git will instead use SHA-256 which offers more appealing security properties. Back in our coverage of Git 2.45, we talked about some new changes which enable writing out separate copies of new objects using both SHA-1 and SHA-256 as a transitory step towards interoperability between the two.

    In Git 2.52, the rest of that work towards interoperability begins. Though the changes that landed in this release are focused on laying the groundwork for future interoperability features, the hope is that eventually you can use a Git repository with one hash algorithm, while pushing and pulling from another repository using a different hash algorithm.

    [source]
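An object’s ID is just a hash of a short header plus its contents, so the difference between the two algorithms is easy to see with Python’s hashlib (this mirrors how Git computes object IDs; the interoperability machinery itself is far more involved):

```python
import hashlib

def object_id(kind: str, data: bytes, algo: str = "sha1") -> str:
    """Hash an object the way Git does: '<type> <size>\\0' + content."""
    header = f"{kind} {len(data)}\0".encode()
    return hashlib.new(algo, header + data).hexdigest()

# The same empty blob gets a different name under each algorithm:
print(object_id("blob", b""))            # the well-known empty-blob OID
print(object_id("blob", b"", "sha256"))  # its name in a SHA-256 repository
```

The interop work is about maintaining a mapping between those two names for every object, so that a SHA-1 repository and a SHA-256 repository can exchange the same content.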

  • Speaking of other bleeding-edge changes in Git, this release is the first to (optionally) use Rust code for some internal functionality within Git. This mode is optional and guarded behind a new WITH_RUST build flag. When built with this mode enabled, Git will use a Rust implementation for encoding and decoding variable-width integers.

    Though this release only introduces a Rust variant of some minor utility functionality, it sets up the infrastructure for much more interesting parts of Git to be rewritten in Rust.

    Rust support is not yet mandatory, so Git 2.52 will continue to run just fine on platforms that don’t have a Rust compiler. However, Rust support will be required for Git 3.0, at which point many more components of Git will likely depend on Rust code.

    [source, source, source]
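The functionality in question is Git’s variable-width integer encoding (see varint.c): seven data bits per byte, a continuation flag in the high bit, and an off-by-one offset on continuation bytes so that every integer has exactly one encoding. A Python transcription of that scheme:

```python
def encode_varint(value: int) -> bytes:
    """Encode using Git's varint scheme: 7 data bits per byte, MSB as a
    continuation flag, continuation bytes offset by one."""
    out = [value & 0x7F]
    while True:
        value >>= 7
        if not value:
            break
        value -= 1  # the off-by-one that makes encodings unique
        out.append(0x80 | (value & 0x7F))
    return bytes(reversed(out))

def decode_varint(buf: bytes) -> int:
    """Inverse of encode_varint, mirroring decode_varint() in varint.c."""
    val = buf[0] & 0x7F
    i = 0
    while buf[i] & 0x80:
        i += 1
        val = ((val + 1) << 7) + (buf[i] & 0x7F)
    return val

print(encode_varint(128).hex())  # -> '8000'
```

Small, self-contained, and easy to test exhaustively, which makes it a sensible first candidate for a Rust rewrite.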

  • Long-time readers may recall our coverage of changed-path Bloom filters within Git from back in 2.28. If not, a changed-path Bloom filter is a probabilistic data structure that can approximate which file path(s) were modified by a commit (relative to its first parent). Since Bloom filters never produce false negatives (i.e., they never indicate that a commit did not modify some path when in fact it did), they can be used to accelerate many path-scoped traversals throughout Git (including last-modified above!).

    More recently, we covered new ways of using Bloom filters within Git, like providing multiple paths of interest at the same time (e.g., git log /my/subdir /my/other/subdir) which previously were not supported with Bloom filters. At that time, we wrote that there were ongoing discussions about supporting Bloom filters in even more of Git’s expressive pathspec syntax.

    This release delivers the result of those discussions, and now supports the performance benefits of using Bloom filters in even more scenarios. One example here is when a pathspec contains wildcards in some, but not all of its components, like foo/bar/*/baz, where Git will now use its Bloom filter for the non-wildcard components of the path. To read about even more scenarios that can now leverage Bloom filters, check out the link below.

    [source]
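The no-false-negatives guarantee is what makes this optimization safe, and it is easy to see with a toy Bloom filter (a generic sketch; Git’s actual filters use murmur3 hashes and a compact on-disk encoding):

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k hash functions setting bits in a fixed-size field."""

    def __init__(self, nbits=256, nhashes=3):
        self.nbits, self.nhashes = nbits, nhashes
        self.bits = 0

    def _positions(self, item):
        # Derive k bit positions by hashing the item with k seeds.
        for seed in range(self.nhashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.nbits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # "False" is definitive; "True" may be a false positive.
        return all(self.bits >> pos & 1 for pos in self._positions(item))

changed = BloomFilter()
for path in ("src/main.c", "docs/README.md"):
    changed.add(path)

# Every path actually added is always reported as possibly present,
# so a traversal may safely skip a commit only on a "False" answer.
assert changed.might_contain("src/main.c")
assert changed.might_contain("docs/README.md")
```

A path-scoped traversal consults the filter for each commit and only falls back to a full tree diff when the answer is “maybe”, which is why the occasional false positive costs time but never correctness.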

  • This release also saw a number of performance improvements across many areas of the project. git describe learned how to use a priority queue to speed up performance by 30%. git remote picked up a couple of new tricks to optimize renaming references with its rename sub-command. git ls-files can keep the index sparse in cases where it couldn’t before. git log -L became significantly faster by avoiding some unnecessary tree-level diffs when processing merge commits. Finally, xdiff (the library that powers Git’s file-level diff and merge engine) benefitted from a pair of optimizations (here, and here) in this release, and even more optimizations that will likely land in a future release.

    [source, source, source, source]

  • Last but not least, some updates to Git’s sparse-checkout feature, which learned a new “clean” sub-command. git sparse-checkout clean can help you recover from tricky cases where some files are left outside of your sparse-checkout definition when changing which part(s) of the repository you have checked out.

    The details of how one might get into this situation, and why recovering from it with pre-2.52 tools alone was so difficult, are surprisingly technical. If you’re interested in all of the gory details, this commit has all of the information about this change.

    In the meantime, if you use sparse-checkout and have ever had difficulty cleaning up when switching your sparse-checkout definition, give git sparse-checkout clean a whirl with Git 2.52.

    [source]

…the rest of the iceberg

That’s just a sample of changes from the latest release. For more, check out the release notes for 2.52, or any previous version in the Git repository.

20 Years of Git, 2 days at GitHub HQ: Git Merge 2025 highlights 🎉
https://github.blog/open-source/git/20-years-of-git-2-days-at-github-hq-git-merge-2025-highlights/
Thu, 09 Oct 2025
Two decades after Linus’s first git commit, contributors from around the world gathered at GitHub HQ in San Francisco—not just to reflect on Git’s history, but to imagine its future. Git Merge 2025 marked 20 years of Git with technical talks, community collaboration, and the kind of hallway chats you can’t capture in slides. More than 100 people joined us in person, and over 600 tuned in online.

Day 1: Talks for everyone

From deep dives into Git internals to beginner-friendly sessions on creative workflows, this year’s program offered something for everyone. We heard from maintainers, educators, hobbyists, and even a high school student, sharing how Git shapes their work and learning. Speakers joined both in person and remotely from around the globe, making this one of our most accessible and inclusive Git Merge events yet.

Attendees gathered in the GitHub HQ amphitheater during Git Merge 2025.

Scott Chacon mixed comedy and code in a live demo of the GitButler CLI, while Google’s Martin von Zweigbergk unpacked how Jujutsu integrates with Git. Jacob Stopak reimagined Git learning through visualization and gamification, Steffen Hiller and Zoran Petrovic showcased new ways to visualize how repositories grow over time, and brian m. carlson unpacked what’s next for SHA-256 interoperability. Explore the playlist to watch these talks and more!

Day 2: Community at the center

The second day focused on collaboration with the annual Git Contributor’s Summit and an Unconference. Core maintainers and contributors met, both in person and remotely, to shape Git’s roadmap for the year ahead in one of our most remote-friendly gatherings yet. 

The Git Contributor’s Summit: laptops open around a long conference table at GitHub HQ’s Executive Briefing Center, with remote participants joining on large screens.

Alongside the summit, the Unconference opened the floor to everyone, and whiteboards filled quickly with ideas on branching strategies, Git education, and creative workflows.

Thank you

Git Merge 2025 wouldn’t have happened without this community. From the speakers who shared their work, to the contributors and volunteers who gave their time, to every attendee who showed up ready to learn and connect… thank you. 

Huge thanks to the GitHub teams behind the scenes who made the event seamless for attendees around the world. 

And a special thanks to our sponsors, GitButler and Google, for helping bring this year’s celebration to life. 

Until the next Git Merge, keep committing. <3

Catch up on the talks from Git Merge 2025 >

What’s next for Git? 20 years in, the community is still pushing forward
https://github.blog/open-source/whats-next-for-git-20-years-in-the-community-is-still-pushing-forward/
Mon, 22 Sep 2025
This year marks 20 years since Git’s first commit. Since then, it’s become the default version control system for everything from weekend side projects to the largest monorepos in the world.

Screenshot of the first Git commit on GitHub: hash e83c516, authored by Linus Torvalds on April 7, 2005, with the message “Initial revision of git, the information manager from hell”, shown on the master branch with tags v2.51.0-rc1 and v0.99 and 0 parents.

But Git’s evolution didn’t stop at git init. Every year, contributors continue to improve Git’s performance, UX, and interoperability while new tools and use cases push it into unfamiliar territory.

More than 20 years after its initial commit, the Git project continues to thrive thanks to a community that keeps pushing for better performance, new features, and a friendlier experience.

Taylor Blau, Principal Software Engineer at GitHub

At Git Merge 2025, we’re celebrating that momentum — not with nostalgia, but with a look at where Git is headed next:

  • Delivering faster merges, new backends, and experiments in correctness
  • Enabling SHA-256 interoperability for a more secure future
  • Distilling two decades of Git UX lessons into better clients
  • Teaching Git through visualization, simulation, and gamification
  • Powering surprising new use cases: local-first apps, genomic research, and WASM Git servers

AI meets Git: New workflows, new friction

As AI-powered coding agents generate more of our code, new questions emerge: how should those agents use Git responsibly?

In his talk, GitHub and GitButler co-founder Scott Chacon will explore practical strategies for teaching AI agents good Git hygiene, from writing meaningful commit messages to amending, squashing, and rebasing with context. He’ll also dig into forge management challenges, showcase several Git-related MCP servers, and demo tooling he’s developed to make human-agent collaboration inside Git smoother. Finally, Scott will share ideas on how both Git and MCP could evolve to better support these workflows.

👉 Explore the full Git Merge 2025 speaker lineup and register now to attend in-person or online.

Animated GIF showing all speakers and topics at Git Merge 2025.

Git Merge 2025 is made possible thanks to the support of our partners Google and GitButler.

Highlights from Git 2.51
https://github.blog/open-source/git/highlights-from-git-2-51/
Mon, 18 Aug 2025

The open source Git project just released Git 2.51 with features and bug fixes from over 91 contributors, 21 of them new. We last caught up with you on the latest in Git back when 2.50 was released.

To celebrate this most recent release, here is GitHub’s look at some of the most interesting features and changes introduced since last time.

Cruft-free multi-pack indexes

Returning readers will have likely seen our coverage of cruft packs, multi-pack indexes (MIDXs), and reachability bitmaps. In case you’re new around here or otherwise need a refresher, here’s a brief overview:

Git stores repository contents as “objects” (blobs, trees, commits), either individually (“loose” objects, e.g. $GIT_DIR/objects/08/10d6a05...) or grouped into “packfiles” ($GIT_DIR/objects/pack). Each pack has an index (*.idx) that maps object hashes to offsets. With many packs, lookups slow down to O(M*log(N)), where M is the number of packs in your repository and N is the number of objects within a given pack.

A MIDX works like a pack index but covers the objects across multiple individual packfiles, reducing the lookup cost to O(log(N)), where N is the total number of objects in your repository. We use MIDXs at GitHub to store the contents of your repository after splitting it into multiple packs. We also use MIDXs to store a collection of reachability bitmaps for some selection of commits to quickly determine which object(s) are reachable from a given commit.
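The complexity difference is easy to model with sorted lists standing in for pack *.idx files (a sketch, not Git’s on-disk layout):

```python
from bisect import bisect_left

def find_in_packs(oid, packs):
    """Pre-MIDX lookup: binary-search every pack's index in turn,
    O(M log N) for M packs of ~N objects each."""
    for pack_id, index in enumerate(packs):
        i = bisect_left(index, oid)
        if i < len(index) and index[i] == oid:
            return pack_id
    return None

def build_midx(packs):
    """A MIDX-like structure: one merged, sorted table over every pack."""
    return sorted((oid, pack_id)
                  for pack_id, index in enumerate(packs)
                  for oid in index)

def find_in_midx(oid, midx):
    """A single binary search over all objects: O(log N) total."""
    i = bisect_left(midx, (oid, -1))
    if i < len(midx) and midx[i][0] == oid:
        return midx[i][1]
    return None

packs = [["aaa", "ccc"], ["bbb"]]  # two packs, indexes already sorted
midx = build_midx(packs)
assert find_in_packs("bbb", packs) == find_in_midx("bbb", midx) == 1
```

Both strategies find the same pack; the MIDX simply answers with one search instead of one per pack.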

However, we store unreachable objects separately in what is known as a “cruft pack”. We originally wanted to exclude cruft packs from the MIDX entirely, but we realized pretty quickly that doing so was impossible. The exact reasons are spelled out in this commit, but the gist is as follows: if a once-unreachable object (stored in a cruft pack) later becomes reachable from some bitmapped commit, but the only copy of that object is stored in a cruft pack outside of the MIDX, then that object has no bit position, making it impossible to write a reachability bitmap.

Git 2.51 introduces a change to how the non-cruft portion of your repository is packed. When generating a new pack, Git used to exclude any object which appeared in at least one pack that would not be deleted during a repack operation, including cruft packs. In 2.51, Git now will store additional copies of objects (and their ancestors) whose only other copy is within a cruft pack. Carrying this process out repeatedly guarantees that the set of non-cruft packs does not have any object which reaches some other object not stored within that set of packs. (In other words, the set of non-cruft packs is closed under reachability.)

As a result, Git 2.51 has a new repack.MIDXMustContainCruft configuration which uses the new repacking behavior described above to store cruft packs outside of the MIDX. Using this at GitHub has allowed us to write significantly smaller MIDXs in a fraction of the time, resulting in faster repository read performance overall. (In our primary monorepo, MIDXs shrunk by about 38%, we wrote them 35% faster, and improved read performance by around 5%.)

Give cruft-less MIDXs a try today using the new repack.MIDXMustContainCruft configuration option.

[source]

Smaller packs with path walk

In Git 2.49, we talked about Git’s new “name-hash v2” feature, which changed the way that Git selects pairs of objects to delta-compress against one another. The full details are covered in that post, but here’s a quick gist. When preparing a packfile, Git computes a hash of all objects based on their filepath. Those hashes are then used to sort the list of objects to be packed, and Git uses a sliding window to search between pairs of objects to identify good delta/base candidates.

Prior to 2.49, Git used a single hash function based on the object’s filepath, with a heavy bias towards the last 16 characters of the path. That hash function, dating back all the way to 2006, works well in many circumstances, but can fall short when, say, unrelated blobs appear in paths whose final 16 characters are similar. Git 2.49 introduced a new hash function which takes more of the directory structure into account, resulting in significantly smaller packs in some circumstances.
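The 2006-era function is tiny; transcribed into Python (adapted from Git’s pack-objects.h), it makes the bias visible: each new character shifts the accumulated hash right by two bits, so characters more than 16 positions from the end are almost entirely shifted out.

```python
def pack_name_hash(name: str) -> int:
    """Git's original ("v1") delta name hash, adapted from pack-objects.h:
    skip whitespace, then hash = (hash >> 2) + (char << 24), in 32 bits."""
    h = 0
    for byte in name.encode():
        if chr(byte).isspace():
            continue
        h = ((h >> 2) + (byte << 24)) & 0xFFFFFFFF
    return h

# Unrelated paths sharing their final 16+ characters end up with
# (nearly) identical hashes, and therefore sort next to each other
# when Git orders objects for delta compression:
a = pack_name_hash("src/widgets/button-component.ts")
b = pack_name_hash("test/fixtures/b/button-component.ts")
assert abs(a - b) <= 1
```

The two example paths are hypothetical; the point is that their unrelated blobs would be considered neighbors by this hash, which is exactly the failure mode that name-hash v2 and the path-walk approach address.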

Git 2.51 takes the spirit of that change and goes a step further by introducing a new way to collect objects when repacking, called “path walk”. Instead of walking objects in revision order with Git emitting objects with their corresponding path names along the way, the path walk approach emits all objects from a given path at the same time. This approach avoids the name-hash heuristic altogether and can look for deltas within groups of objects that are known to be at the same path.

As a result, Git can generate packs using the path walk approach that are often significantly smaller than even those generated with the new name hash function described above. Its timings are competitive even with generating packs using the existing revision order traversal.

Try it out today by repacking with the new --path-walk command-line option.

[source]

Stash interchange format

If you’ve ever needed to switch to another branch, but wanted to save any uncommitted changes, you have likely used git stash. The stash command stores the state of your working copy and index, and then restores your local copy to match whatever was in HEAD at the time you stashed.

If you’ve ever wondered how Git actually stores a stash entry, then this section is for you. Whenever you push something onto your stash, Git creates three commits behind the scenes. Two of those commits capture the staged and unstaged changes: the staged changes represent whatever was in your index at the time of stashing, and the working directory changes represent everything you changed in your local copy but didn’t add to the index. Finally, Git creates a third commit listing the other two as its parents, capturing the entire snapshot.

Those internally generated commits are stored in the special refs/stash ref, and multiple stash entries are managed with the reflog. They can be accessed with git stash list, and so on. Since there is only one stash entry in refs/stash at a time, it’s extremely cumbersome to migrate stash entries from one machine to another.

Git 2.51 introduces a variant of the internal stash representation that allows multiple stash entries to be represented as a sequence of commits. Instead of using the first two parents to store changes from the index and working copy, this new representation adds one more parent to refer to the previous stash entry. That results in stash entries that contain four parents, and can be treated like an ordinary log of commits.

As a consequence of that, you can now export your stashes to a single reference, and then push or pull it like you would a normal branch or tag. Git 2.51 makes this easy by introducing two new sub-commands to git stash to import and export, respectively. You can now do something like:

$ git stash export --to-ref refs/stashes/my-stash
$ git push origin refs/stashes/my-stash

on one machine to push the contents of your stash to origin, and then:

$ git fetch origin '+refs/stashes/*:refs/stashes/*'
$ git stash import refs/stashes/my-stash

on another, preserving the contents of your stash between the two.

[source]


All that…

Now that we’ve covered some of the larger changes in more detail, let’s take a quicker look at a selection of some other new features and updates in this release.

  • If you’ve ever scripted around the object contents of your repository, you have no doubt encountered git cat-file, Git’s dedicated tool to print the raw contents of a given object.

    git cat-file also has specialized --batch and --batch-check modes, which take a sequence of objects over stdin and print each object’s information (and contents, in the case of --batch). For example, here’s some basic information about the README.md file in Git’s own repository.

    $ echo HEAD:README.md | git cat-file --batch-check
    d87bca1b8c3ebf3f32deb557ae9796ddc5b792ca blob 3662

    Here, Git is telling us the object ID, type, and size for the object we specified, just as we expect. cat-file produces the same information for tree and commit objects. But what happens if we give it the path to a submodule? Prior to Git 2.51, cat-file would just print missing. But Git 2.51 improves this output, making cat-file more useful in a variety of new scripting scenarios:

    [ pre-2.51 git ]
    $ echo HEAD:sha1collisiondetection | git cat-file --batch-check
    HEAD:sha1collisiondetection missing
    
    [ git 2.51 ]
    $ echo HEAD:sha1collisiondetection | git cat-file --batch-check
    855827c583bc30645ba427885caa40c5b81764d2 submodule

    [source]

  • Back in our coverage of 2.28, we talked about Git’s new changed-path Bloom filters. If you aren’t familiar with Bloom filters, or could use a refresher about how they’re used in Git, then read on.

    A Bloom filter is a probabilistic data structure that behaves like a set, with one difference. It can only tell you with 100% certainty whether an element is not in the set, but may have some false positives when indicating that an item is in the set.

    Git uses Bloom filters in its commit-graph data structure to store a probabilistic set of which paths were modified by that commit relative to its first parent. That allows history traversals like git log origin -- path/to/my/file to quickly skip over commits which are known not to modify that path (or any of its leading directories). However, because Git’s full pathspec syntax is far more expressive than that, Bloom filters can’t always optimize pathspec-scoped history traversals.

    Git 2.51 addresses part of that limitation by adding support for using multiple pathspec items, like git log -- path/to/a path/to/b, which previously could not make use of changed-path Bloom filters. At the time of writing, there is ongoing discussion about adding support for even more special cases.
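    No new syntax is needed on your part; once changed-path Bloom filters are stored in the commit-graph, pathspec-limited traversals consult them automatically. Here is a small, self-contained sketch (the repository layout and commit messages are invented for illustration):

```shell
# Sketch: write a commit-graph with changed-path Bloom filters, then run a
# multi-pathspec log that newer Git versions can accelerate with them.
cd "$(mktemp -d)"
git init -q -b main
git config user.email you@example.com
git config user.name "You"
mkdir -p path/to/a path/to/b
echo one > path/to/a/file && git add . && git commit -qm "touch a"
echo two > path/to/b/file && git add . && git commit -qm "touch b"

# Persist changed-path Bloom filters into the commit-graph (Git 2.28+).
git commit-graph write --reachable --changed-paths

# A traversal scoped to multiple pathspec items, which Git 2.51+ can now
# also serve from the filters.
git log --oneline -- path/to/a path/to/b
```

    In a large repository you would typically let git maintenance or the fetch.writeCommitGraph configuration keep the commit-graph up to date rather than writing it by hand.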

    [source]

  • The modern equivalents of git checkout, known as git switch and git restore, have been considered experimental since their introduction back in Git 2.23. These commands delineate the many jobs that git checkout can perform into separate, more purpose-built commands. Six years later5, these commands are no longer considered experimental, making their command-line interface stable and backwards compatible across future releases.
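    As a quick refresher, here is how the old checkout spellings map onto the two now-stable commands (the branch and file names are invented for illustration):

```shell
# git switch handles branch operations; git restore handles file contents.
cd "$(mktemp -d)"
git init -q -b main
git config user.email you@example.com
git config user.name "You"
echo v1 > app.txt && git add app.txt && git commit -qm "initial"

# git checkout -b feature    ->  git switch -c feature
git switch -c feature

# git checkout -- app.txt    ->  git restore app.txt
echo scratch > app.txt
git restore app.txt

# git reset HEAD app.txt     ->  git restore --staged app.txt
echo v2 > app.txt && git add app.txt
git restore --staged app.txt
```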

    [source]

  • Even if you’re a veteran Git user, it’s not unusual to encounter a new Git command (among the 144!6) every once in a while. One such command you might not have heard of is git whatchanged, which behaves like its modern alternative git log --raw.

    That command is now marked as deprecated with eventual plans to remove it in Git 3.0. As with other similar deprecations, you can still use this command behind the aptly-named --i-still-use-this flag7.
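    If you have git whatchanged in muscle memory or in scripts, the switch is mechanical; the modern spelling produces the same raw-diff style of output:

```shell
# git whatchanged is deprecated; git log --raw is its modern equivalent.
cd "$(mktemp -d)"
git init -q -b main
git config user.email you@example.com
git config user.name "You"
echo hello > greeting.txt && git add . && git commit -qm "add greeting"

# Instead of: git whatchanged
git log --raw
```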

    [source]

  • Speaking of Git 3.0, this release saw a few more entries added to the BreakingChanges list. First, Git’s reftable backend (which we talked about extensively in our coverage of Git 2.45) will become the new default format in repositories created with Git 3.0, when it is eventually released. Git 3.0 will also use the SHA-256 hash function as its default hash when initializing new repositories.

    Though there is no official release date yet planned for Git 3.0, you can get a feel for some of the new defaults by building Git yourself with the WITH_BREAKING_CHANGES flag.
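    You don’t have to build from source to get a feel for these two defaults in particular; both are available behind existing git init options (note that --ref-format requires Git 2.45 or newer, built with reftable support):

```shell
cd "$(mktemp -d)"

# SHA-256 object format, the planned Git 3.0 default (Git 2.29+).
git init -q -b main --object-format=sha256 sha256-repo
git -C sha256-repo rev-parse --show-object-format

# reftable reference backend, also planned as a Git 3.0 default (Git 2.45+).
git init -q -b main --ref-format=reftable reftable-repo ||
  echo "this Git build does not support reftable yet"
```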

    [source, source]

  • Last but not least, a couple of updates on Git’s internal development process. Git has historically prioritized wide platform compatibility, and, as a result, has taken a conservative approach to adopting features from newer C standards. Though Git has required a C99-compatible compiler since near the end of 2021, it has adopted features from that standard gradually, since some of the compilers Git targets only have partial support for the standard.

    One example is the bool keyword, which became part of the C standard in C99. Here, the project began experimenting with the bool keyword back in late 2023. This release declares that experiment a success and now permits the use of bool throughout its codebase. This release also began documenting C99 features that the project is using experimentally along with C99 features that the project doesn’t use.

    Finally, this release saw an update to Git’s guidelines on submitting patches, which have historically required contributions to be non-anonymous, and submitted under a contributor’s legal name. Git now aligns more closely with the Linux kernel’s approach, to permit submitting patches with an identity other than the contributor’s legal name.

    [source, source, source]

…and a bag of chips

That’s just a sample of changes from the latest release. For more, check out the release notes for 2.51, or any previous version in the Git repository.


1 For some bit position (corresponding to a single object in your repository), a 1 means that object can be reached from that bitmap’s associated commit, and a 0 means it is not reachable from that commit. There are also four type-level bitmaps (for blobs, trees, commits, and annotated tags); the XOR of those bitmaps is the all 1s bitmap. For more details on multi-pack reachability bitmaps, check out our previous post on Scaling monorepo maintenance. ⤴️

2 For the curious, each layer of the directory is hashed individually, then downshifted and XORed into the overall result. This results in a hash function which is more sensitive to the whole path structure, rather than just the final 16 characters. ⤴️

3 Usually. Git will sometimes generate a fourth commit if you stashed untracked (new files that haven’t yet been committed) or ignored files (that match one or more patterns in a .gitignore). ⤴️

4 Or five. ⤴️

5 Almost to the day; Git 2.23 was released on August 16, 2019, and Git 2.51 was released on August 18, 2025. ⤴️

6 It’s true; git --list-cmds=builtins | wc -l outputs “144” with Git 2.51. ⤴️

7 If you are somehow a diehard git whatchanged user, please let us know by sending a message to the Git mailing list. ⤴️

The post Highlights from Git 2.51 appeared first on The GitHub Blog.

Git security vulnerabilities announced https://github.blog/open-source/git/git-security-vulnerabilities-announced-6/ Tue, 08 Jul 2025 17:02:11 +0000 https://github.blog/?p=89409 Today, the Git project released new versions to address seven security vulnerabilities that affect all prior versions of Git.

The post Git security vulnerabilities announced appeared first on The GitHub Blog.


Today, the Git project released new versions to address seven security vulnerabilities that affect all prior versions of Git.

Vulnerabilities in Git

CVE-2025-48384

When reading a configuration value, Git will strip any trailing carriage return (CR) and line feed (LF) characters. When writing a configuration value, however, Git does not quote trailing CR characters, causing them to be lost when they are read later on. When initializing a submodule whose path contains a trailing CR character, the stripped path is used, causing the submodule to be checked out in the wrong place.

If a symlink already exists between the stripped path and the submodule’s hooks directory, an attacker can execute arbitrary code through the submodule’s post-checkout hook.

[source]

CVE-2025-48385

When cloning a repository, Git can optionally fetch a bundle, allowing the server to offload a portion of the clone to a CDN. The Git client does not properly validate the advertised bundle(s), allowing the remote side to perform protocol injection. When a specially crafted bundle is advertised, the remote end can cause the client to write the bundle to an arbitrary location, which may lead to code execution similar to the previous CVE.

[source]

CVE-2025-48386 (Windows only)

When cloning from an authenticated remote, Git uses a credential helper in order to authenticate the request. Git includes a handful of credential helpers, including Wincred, which uses the Windows Credential Manager to store its credentials.

Wincred uses the contents of a static buffer as a unique key to store and retrieve credentials. However, it does not properly bounds check the remaining space in the buffer, leading to potential buffer overflows.

[source]

Vulnerabilities in Git GUI and Gitk

This release resolves four new CVEs related to Gitk and Git GUI. Both tools are Tcl/Tk-based graphical interfaces used to interact with Git repositories. Gitk is focused on showing a repository’s history, whereas Git GUI focuses on making changes to existing repositories.

CVE-2025-27613 (Gitk)

When running Gitk in a specially crafted repository without additional command-line arguments, Gitk can write and truncate arbitrary writable files. The “Support per-file encoding” option must be enabled; however, the operation of “Show origin of this line” is affected regardless.

[source]

CVE-2025-27614 (Gitk)

If a user is tricked into running gitk filename (where filename has a particular structure), they may run arbitrary scripts supplied by the attacker, leading to arbitrary code execution.

[source]

CVE-2025-46334 (Git GUI, Windows only)

If a malicious repository includes an executable sh.exe, or common textconv programs (e.g., astextplain, exif, or ps2ascii), path lookup on Windows may locate these executables in the working tree. If a user running Git GUI in such a repository selects either “Git Bash” or “Browse Files” from the menu, these programs may be invoked, leading to arbitrary code execution.

[source]

CVE-2025-46835 (Git GUI)

When a user is tricked into editing a file in a specially named directory in an untrusted repository, Git GUI can create and overwrite arbitrary writable files, similar to CVE-2025-27613.

[source]

Upgrade to the latest Git version

The most effective way to protect against these vulnerabilities is to upgrade to Git 2.50.1, the newest release containing fixes for the aforementioned vulnerabilities. If you can’t upgrade immediately, you can reduce your risk by doing the following:

  • Avoid running git clone with --recurse-submodules against untrusted repositories.
  • Disable auto-fetching bundle URIs by setting the transfer.bundleURI configuration value to “false.”
  • Avoid using the wincred credential helper on Windows.
  • Avoid running Gitk and Git GUI in untrusted repositories.
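The configuration-level mitigations can be applied like so. This sketch points HOME at a scratch directory so the example doesn’t touch your real global config; drop that line to apply the settings for real:

```shell
# Scratch HOME so this example's global config writes are throwaway.
export HOME="$(mktemp -d)"

# Disable auto-fetching bundle URIs during clones.
git config --global transfer.bundleURI false

# Ensure the wincred helper isn't configured (relevant on Windows).
git config --global --unset-all credential.helper 2>/dev/null || true

git config --global --get transfer.bundleURI
```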

In order to protect users against attacks related to these vulnerabilities, GitHub has taken proactive steps. Specifically, we have scheduled updated releases of GitHub Desktop, and GitHub Codespaces and GitHub Actions will update their versions of Git shortly. GitHub itself, including Enterprise Server, is unaffected by these vulnerabilities.


CVE-2025-48384, CVE-2025-48385, and CVE-2025-48386 were discovered by David Leadbeater. Justin Tobler and Patrick Steinhardt provided fixes for CVEs 2025-48384 and 2025-48385, respectively. The fix for CVE-2025-48386 is joint work between Taylor Blau and Jeff King.

CVE-2025-46835 was found and fixed by Johannes Sixt. Mark Levedahl discovered and fixed CVE-2025-46334. Avi Halachmi discovered both CVE-2025-27613 and CVE-2025-27614, and fixed the latter. CVE-2025-27613 was fixed by Johannes Sixt.

The post Git security vulnerabilities announced appeared first on The GitHub Blog.

Highlights from Git 2.50 https://github.blog/open-source/git/highlights-from-git-2-50/ Mon, 16 Jun 2025 17:12:27 +0000 https://github.blog/?p=88787 The open source Git project just released Git 2.50. Here is GitHub’s look at some of the most interesting features and changes introduced since last time.

The post Highlights from Git 2.50 appeared first on The GitHub Blog.


The open source Git project just released Git 2.50 with features and bug fixes from 98 contributors, 35 of them new. We last caught up with you on the latest in Git back when 2.49 was released.

💡 Before we get into the details of this latest release, we wanted to remind you that Git Merge, the conference for Git users and developers, is back this year on September 29-30 in San Francisco. Git Merge will feature talks from developers working on Git, and in the Git ecosystem. Tickets are on sale now; check out the website to learn more.

With that out of the way, let’s take a look at some of the most interesting features and changes from Git 2.50.

Improvements for multiple cruft packs

When we covered Git 2.43, we talked about newly added support for multiple cruft packs. Git 2.50 improves on that with better command-line ergonomics, and some important bugfixes. In case you’re new to the series, need a refresher, or aren’t familiar with cruft packs, here’s a brief overview:

Git objects may be either reachable or unreachable. The set of reachable objects is everything you can walk to starting from one of your repository’s references: traversing from commits to their parent(s), trees to their sub-tree(s), and so on. Any object that you didn’t visit by repeating that process over all of your references is unreachable.

In Git 2.37, Git introduced cruft packs, a new way to store your repository’s unreachable objects. A cruft pack looks like an ordinary packfile with the addition of an .mtimes file, which is used to keep track of when each object was most recently written in order to determine when it is safe1 to discard it.

However, updating the cruft pack could be cumbersome, particularly in repositories with many unreachable objects, since a repository’s cruft pack must be rewritten in order to add new objects. Git 2.43 began to address this through a new command-line option: git repack --max-cruft-size. This option was designed to split unreachable objects across multiple packs, each no larger than the value specified by --max-cruft-size. But there were a couple of problems:

  • If you’re familiar with git repack’s --max-pack-size option, --max-cruft-size’s behavior is quite confusing. The former option specifies the maximum size an individual pack can be, while the latter involves how and when to move objects between multiple packs.
  • The feature was broken to begin with! Since --max-cruft-size also imposes on cruft packs the same pack-size constraints as --max-pack-size does on non-cruft packs, it is often impossible to get the behavior you want.

For example, suppose you had two 100 MiB cruft packs and ran git repack --max-cruft-size=200M. You might expect Git to merge them into a single 200 MiB pack. But since --max-cruft-size also dictates the maximum size of the output pack, Git will refuse to combine them, or worse: rewrite the same pack repeatedly.

Git 2.50 addresses both of these issues with a new option: --combine-cruft-below-size. Instead of specifying the maximum size of the output pack, it determines which existing cruft pack(s) are eligible to be combined. This is particularly helpful for repositories that have accumulated many unreachable objects spread across multiple cruft packs. With this new option, you can gradually reduce the number of cruft packs in your repository over time by combining existing ones together.

With the introduction of --combine-cruft-below-size, Git 2.50 repurposed --max-cruft-size to behave as a cruft pack-specific override for --max-pack-size. Now --max-cruft-size only determines the size of the outgoing pack, not which packs get combined into it.

Along the way, a bug was uncovered that prevented objects stored in multiple cruft packs from being “freshened” in certain circumstances. In other words, some unreachable objects don’t have their modification times updated when they are rewritten, leading to them being removed from the repository earlier than they otherwise would have been. Git 2.50 squashes this bug, meaning that you can now efficiently manage multiple cruft packs and freshen their objects to your heart’s content.

[source, source]

Incremental multi-pack reachability bitmaps

Back in our coverage of Git 2.47, we talked about preliminary support for incremental multi-pack indexes. Multi-pack indexes (MIDXs) act like a single pack *.idx file for objects spread across multiple packs.

Multi-pack indexes are extremely useful to accelerate object lookup performance in large repositories by binary searching through a single index containing most of your repository’s contents, rather than repeatedly searching through each individual packfile. But multi-pack indexes aren’t just useful for accelerating object lookups. They’re also the basis for multi-pack reachability bitmaps, the MIDX-specific analogue of classic single-pack reachability bitmaps. If neither of those are familiar to you, don’t worry; here’s a brief refresher.

Single-pack reachability bitmaps store a collection of bitmaps corresponding to a selection of commits. Each bit position in a pack bitmap refers to one object in that pack. In each individual commit’s bitmap, the set bits correspond to objects that are reachable from that commit, and the unset bits represent those that are not.

Multi-pack bitmaps were introduced to take advantage of the substantial performance increase afforded to us by reachability bitmaps. Instead of having bitmaps whose bit positions correspond to the set of objects in a single pack, a multi-pack bitmap’s bit positions correspond to the set of objects in a multi-pack index, which may include objects from arbitrarily many individual packs. If you’re curious to learn more about how multi-pack bitmaps work, you can read our earlier post Scaling monorepo maintenance.

However, like cruft packs above, multi-pack indexes can be cumbersome to update as your repository grows larger, since each update requires rewriting the entire multi-pack index and its corresponding bitmap, regardless of how many objects or packs are being added. In Git 2.47, the file format for multi-pack indexes became incremental, allowing multiple multi-pack index layers to be layered on top of one another forming a chain of MIDXs. This made it much easier to add objects to your repository’s MIDX, but the incremental MIDX format at the time did not yet have support for multi-pack bitmaps.

Git 2.50 brings support for the multi-pack reachability format to incremental MIDX chains, with each MIDX layer having its own *.bitmap file. These bitmap layers can be used in conjunction with one another to provide reachability information about selected commits at any layer of the MIDX chain. In effect, this allows extremely large repositories to quickly and efficiently add new reachability bitmaps as new commits are pushed to the repository, regardless of how large the repository is.

This feature is still considered highly experimental, and support for repacking objects into incremental multi-pack indexes and bitmaps is still fairly bare-bones. This is an active area of development, so we’ll make sure to cover any notable developments to incremental multi-pack reachability bitmaps in this series in the future.

[source]

The ORT merge engine replaces recursive

This release also saw some exciting updates related to merging. Way back when Git 2.33 was released, we talked about a new merge engine called “ORT” (standing for “Ostensibly Recursive’s Twin”).

ORT is a from-scratch rewrite of Git’s old merging engine, called “recursive.” ORT is significantly faster, more maintainable, and has many new features that were difficult to implement on top of its predecessor.

One of those features is the ability for Git to determine whether or not two things are mergeable without actually persisting any new objects necessary to construct the merge in the repository. Previously, the only way to tell whether two things are mergeable was to run git merge-tree --write-tree on them. That works, but merge-tree writes any new objects generated by the merge into the repository. Over time, these can accumulate and cause performance issues. In Git 2.50, you can make the same determination without writing any new objects by using merge-tree’s new --quiet mode and relying on its exit code.

Most exciting in this release is that ORT has entirely superseded recursive, which is no longer part of Git’s source code. When ORT was first introduced, it was only accessible through git merge’s -s option to select a strategy. In Git 2.34, ORT became the default choice over recursive, though the latter was still available in case there were bugs or behavior differences between the two. Now, 16 versions and two and a half years later, recursive has been completely removed from Git, with its author, Elijah Newren, writing:

As a wise man once told me, “Deleted code is debugged code!”

As of Git 2.50, recursive has been completely deleted (and thus, debugged). For more about ORT’s internals and its development, check out this five-part series from Elijah here, here, here, here, and here.

[source, source, source]


  • If you’ve ever scripted around your repository’s objects, you are likely familiar with git cat-file, Git’s purpose-built tool to list objects and print their contents. git cat-file has many modes, like --batch (for printing out the contents of objects), or --batch-check (for printing out certain information about objects without printing their contents).

    Oftentimes it is useful to dump the set of all objects of a certain type in your repository. For commits, git rev-list can easily enumerate a set of commits. But what about, say, trees? In the past, to filter down to just the tree objects from a list of objects, you might have written something like:

    $ git cat-file --batch-check='%(objecttype) %(objectname)' \
        --buffer <in | perl -ne 'print "$1\n" if /^tree ([0-9a-f]+)/'

    Git 2.50 brings Git’s object filtering mechanism used in partial clones to git cat-file, so the above can be rewritten a little more concisely like:

    $ git cat-file --batch-check='%(objectname)' --filter='object:type=tree' <in

    [source]

  • While we’re on the topic, let’s discuss a little-known git cat-file command-line option: --allow-unknown-type. This arcane option was used with objects that have a type other than blob, tree, commit, or tag. This is a quirk dating back a little more than a decade, which allows git hash-object to write objects with arbitrary types. In the time since, this feature has gotten very little use. In fact, git cat-file -p --allow-unknown-type can’t even print out the contents of one of these objects!

    $ oid="$(git hash-object -w -t notatype --literally /dev/null)"
    $ git cat-file -p $oid
    fatal: invalid object type
    

    This release makes the --allow-unknown-type option silently do nothing, and removes support from git hash-object to write objects with unknown types in the first place.

    [source]

  • The git maintenance command learned a number of new tricks this release as well. It can now perform a few new kinds of tasks, like worktree-prune, rerere-gc, and reflog-expire. worktree-prune mirrors git gc’s functionality to remove stale or broken Git worktrees. rerere-gc also mirrors existing functionality exposed via git gc to expire old rerere entries from previously recorded merge conflict resolutions. Finally, reflog-expire can be used to expire stale entries out of the reflog.

    git maintenance also ships with new configuration for the existing loose-objects task. This task removes lingering loose objects that have since been packed away, and then makes new pack(s) for any loose objects that remain. The size of those packs was previously capped at 50,000 objects, and can now be configured via the maintenance.loose-objects.batchSize configuration.
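    Put together, the new tasks and knob look something like this sketch (the batch size is an arbitrary example, and the worktree-prune task needs Git 2.50+):

```shell
cd "$(mktemp -d)"
git init -q -b main
git config user.email you@example.com
git config user.name "You"
echo data > file && git add file && git commit -qm "initial"

# Cap each pack produced by the loose-objects task at 10,000 objects.
git config maintenance.loose-objects.batchSize 10000
git maintenance run --task=loose-objects

# One of the tasks added in Git 2.50:
git maintenance run --task=worktree-prune ||
  echo "this task needs Git 2.50+"
```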

    [source, source, source]

  • If you’ve ever needed to recover some work you lost, you may be familiar with Git’s reflog feature, which allows you to track changes to a reference over time. For example, you can go back and revisit earlier versions of your repository’s main branch by doing git show main@{2} (to show main prior to the two most recent updates) or main@{1.week.ago} (to show where your copy of the branch was at a week ago).

    Reflog entries can accumulate over time, and you can reach for git reflog expire in the event you need to clean them up. But how do you delete the entirety of a branch’s reflog? If you’re not yet running Git 2.50 and thought “surely it’s git reflog delete”, you’d be wrong! Prior to Git 2.50, the only way to drop a branch’s entire reflog was to do git reflog expire $BRANCH --expire=all.

    In Git 2.50, a new drop sub-command was introduced, so you can accomplish the same as above with the much more natural git reflog drop $BRANCH.

    [source]

  • Speaking of references, Git 2.50 also received some attention to how references are processed and used throughout its codebase. When using the low-level git update-ref command, Git used to spend time checking whether or not the proposed refname could also be a valid object ID, which would make lookups ambiguous. Since update-ref is such a low-level command, this check is no longer done, delivering some performance benefits to higher-level commands that rely on update-ref for their functionality.

    Git 2.50 also learned how to cache whether or not any prefix of a proposed reference name already exists (for example, you can’t create a reference refs/heads/foo/bar/baz if either refs/heads/foo/bar or refs/heads/foo already exists).

    Finally, in order to make those checks, Git used to create a new reference iterator for each individual prefix. Git 2.50’s reference backends learned how to “seek” existing iterators, saving time by being able to reuse the same iterator when checking each possible prefix.

    [source]

  • If you’ve ever had to tinker with Git’s low-level curl configuration, you may be familiar with Git’s configuration options for tuning HTTP connections, like http.lowSpeedLimit and http.lowSpeedTime which are used to terminate an HTTP connection that is transferring data too slowly.

    These options can be useful when fine-tuning Git to work in complex networking environments. But what if you want to tweak Git’s TCP Keepalive behavior? This can be useful to control when and how often to send keepalive probes, as well as how many to send, before terminating a connection that hasn’t sent data recently.

    Prior to Git 2.50, this wasn’t possible, but this version introduces three new configuration options: http.keepAliveIdle, http.keepAliveInterval, and http.keepAliveCount which can be used to control the fine-grained behavior of curl’s TCP probing (provided your operating system supports it).
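    Setting the new knobs looks like this sketch (the values are illustrative, and whether they take effect depends on your curl and operating system support):

```shell
cd "$(mktemp -d)" && git init -q
# Existing tuning: give up if under 1000 bytes/s for 30 seconds.
git config http.lowSpeedLimit 1000
git config http.lowSpeedTime 30
# New in Git 2.50: probe an idle connection after 60s, then every 15s,
# giving up after 5 unanswered probes.
git config http.keepAliveIdle 60
git config http.keepAliveInterval 15
git config http.keepAliveCount 5
git config --get-regexp 'http\.keepalive'
```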

    [source]

  • Git is famously portable and runs on a wide variety of operating systems and environments with very few dependencies. Over the years, various parts of Git have been written in Perl, including some commands like the original implementation of git add -i. These days, very few remaining Git commands are written in Perl.

    This version reduces Git’s usage of Perl by removing it as a dependency of the test suite and documentation toolchain. Many Perl one-liners from Git’s test suite were rewritten to use other shell functions or builtins, and some were rewritten as tiny C programs. For the handful of remaining hard dependencies on Perl, those tests will be skipped on systems that don’t have a working Perl.

    [source, source]

  • This release also shipped a minor cosmetic update to git rebase -i. When starting a rebase, your $EDITOR might appear with contents that look something like:

    pick c108101daa foo
    pick d2a0730acf bar
    pick e5291f9231 baz
    

    You can edit that list to break, reword, or exec (among many others), and Git will happily execute your rebase. But if you change a commit message in your rebase’s TODO script, it won’t actually change!

    That’s because the commit messages shown in the TODO script are just meant to help you identify which commits you’re rebasing. (If you want to rewrite any commit messages along the way, you can use the reword command instead). To clarify that these messages are cosmetic, Git will now prefix them with a # comment character like so:

    pick c108101daa # foo
    pick d2a0730acf # bar
    pick e5291f9231 # baz
    

    [source]

  • Long time readers of this series will recall our coverage of Git’s bundle feature (when Git added support for partial bundles), though we haven’t covered Git’s bundle-uri feature. Git bundles are a way to package your repository’s contents (both its objects and the references that point at them) into a single *.bundle file.

    While Git has had support for bundles since as early as v1.5.1 (nearly 18 years ago!), its bundle-uri feature is much newer. In short, the bundle-uri feature allows a server to serve part of a clone by first directing the client to download a *.bundle file. After the client does so, it will try to perform a fill-in fetch to gather any missing data advertised by the server but not part of the bundle.

    To speed up this fill-in fetch, your Git client will advertise any references that it picked up from the *.bundle itself. But in previous versions of Git, this could sometimes result in slower clones overall! That’s because up until Git 2.50, Git would only advertise the branches in refs/heads/* when asking the server to send the remaining set of objects.

    Git 2.50 now advertises all references it knows about from the *.bundle when doing a fill-in fetch from the server, making bundle-uri-enabled clones much faster.

    For more details about these changes, you can check out this blog post from Scott Chacon.

    [source]

  • Last but not least, git add -p (and git add -i) now work much more smoothly in sparse checkouts by no longer having to expand the sparse index. This follows in a long line of work that has been gradually adding sparse-index compatibility to Git commands that interact with the index.

    Now you can interactively stage parts of your changes before committing in a sparse checkout without having to wait for Git to populate the sparsified parts of your repository’s index. Give it a whirl on your local sparse checkout today!
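    For instance, in a cone-mode sparse checkout, interactive staging now looks like the following sketch (directory names are invented; the piped "y" answers the single hunk prompt non-interactively):

```shell
cd "$(mktemp -d)"
git init -q -b main
git config user.email you@example.com
git config user.name "You"
mkdir -p src docs
echo code > src/main.c
echo text > docs/guide.md
git add . && git commit -qm "initial"

# Restrict the worktree to the src/ cone; docs/ drops out of the checkout.
git sparse-checkout set --cone src

echo more >> src/main.c
printf 'y\n' | git add -p    # stage the hunk interactively
git diff --cached --name-only
```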

    [source]


The rest of the iceberg

That’s just a sample of changes from the latest release. For more, check out the release notes for 2.50, or any previous version in the Git repository.

🎉 Git turned 20 this year! Celebrate by watching our interview of Linus Torvalds, where we discuss how it forever changed software development.

1 It’s never truly safe to remove an unreachable object from a Git repository that is accepting incoming writes, because marking an object as unreachable can race with incoming reference updates, pushes, etc. At GitHub, we use Git’s --expire-to feature (which we wrote about in our coverage of Git 2.39) in something we call “limbo repositories” to quickly recover objects that shouldn’t have been deleted, before deleting them for good. ↩️

The post Highlights from Git 2.50 appeared first on The GitHub Blog.

How the GitHub CLI can now enable triangular workflows https://github.blog/open-source/git/how-the-github-cli-can-now-enable-triangular-workflows/ Fri, 25 Apr 2025 16:00:37 +0000 https://github.blog/?p=85920 The GitHub CLI now supports common Git configurations for triangular workflows. Learn more about triangular workflows, how they work, and how to configure them for your Git workflows. Then, see how you can leverage these using the GitHub CLI.

The post How the GitHub CLI can now enable triangular workflows appeared first on The GitHub Blog.


Most developers are familiar with the standard Git workflow. You create a branch, make changes, and push those changes back to the same branch on the main repository. Git calls this a centralized workflow. It’s straightforward and works well for many projects.

However, sometimes you might want to pull changes from a different branch directly into your feature branch to help you keep your branch updated without constantly needing to merge or rebase. Meanwhile, you’ll still want to push local changes to your own branch. This is where triangular workflows come in.

It’s possible that some of you have already used triangular workflows, even without knowing it. When you fork a repo, contribute to your fork, then open a pull request back to the original repo, you’re working in a triangular workflow. While this can work seamlessly on github.com, the process hasn’t always been seamless with the GitHub CLI.

The GitHub CLI team has recently made improvements (released in v2.71.2) to better support these triangular workflows, ensuring that the gh pr commands work smoothly with your Git configurations. So, whether you’re working on a centralized workflow or a more complex triangular one, the GitHub CLI will be better equipped to handle your needs.

If you’re already familiar with how Git handles triangular workflows, feel free to skip ahead to learn about how to use gh pr commands with triangular workflows. Otherwise, let’s get into the details of how Git and the GitHub CLI have historically differed, and how, four and a half years after it was first requested, we have finally unlocked managing pull requests using triangular workflows in the GitHub CLI.

First, a lesson in Git fundamentals

To provide a framework for what we set out to do, it’s important to first understand some Git basics. Git, at its core, is a way to store and catalog changes on a repository and communicate those changes between copies of that repository. This workflow typically looks like the diagram below:

Figure 1: A typical git branch setup

The building blocks of this diagram illustrate two important Git concepts you likely use every day, a ref and push/pull.

Refs

A ref is a reference to a repository and branch. It has two parts: the remote, usually a name like origin or upstream, and the branch. If the remote is the local repository, it is blank. So, in the example above, origin/branch in the purple box is a remote ref, referring to a branch named branch on the repository named origin, while branch in the green box is a local ref, referring to a branch named branch on the local machine.

While working with GitHub, the remote ref is usually the repository you are hosting on GitHub. In the diagram above, you can consider the purple box GitHub and the green box your local machine.

Pushing and pulling

A push and a pull refer to the same action, but from two different perspectives. Whether you are pushing or pulling is determined by whether you are sending or receiving the changes. I can push a commit to your repo, or you can pull that commit from my repo, and the references to that action would be the same.

To disambiguate this, we will refer to different refs as the headRef or baseRef, where the headRef is sending the changes (pushing them) and the baseRef is receiving the changes (pulling them).

Figure 2: Disambiguating headRef and baseRef for push/pull operations.

When dealing with a branch, we’ll often refer to the headRef of its pull operations as its pullRef and the baseRef of its push operations as its pushRef. That’s because, in these instances, the working branch is the pull’s baseRef and the push’s headRef, so they’re already disambiguated.

The @{push} revision syntax

Turns out, Git has a handy built-in tool for referring to the pushRef for a branch: the @{push} revision syntax. You can usually determine a branch’s pushRef by running the following command:

git rev-parse --abbrev-ref @{push}

This will result in a human-readable ref, like origin/branch, if one can be determined.
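To watch it resolve, here is a disposable sketch that creates a bare repository to stand in for a hosted remote, pushes a branch with -u, and then asks Git for that branch's pushRef. All names here (remote.git, work) are illustrative:

```shell
# Throwaway demo repos; run anywhere disposable.
cd "$(mktemp -d)"
git init --bare remote.git            # stand-in for a hosted repository
git init work
cd work
git config user.email "you@example.com"   # identity for the demo commit
git config user.name "You"
git remote add origin ../remote.git
git commit --allow-empty -m "init"
git push -u origin HEAD               # -u records origin/<branch> as the upstream
git rev-parse --abbrev-ref @{push}    # prints the branch's pushRef, e.g. origin/main
```

In a plain centralized setup like this one, @{push} and @{upstream} resolve to the same ref; triangular configurations, covered below, are what pull them apart.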

Pull Requests

On GitHub, a pull request is a proposal to integrate changes from one ref to another. In particular, it acts as a simple “pause” before performing the actual integration operation, often called a merge, when changes are being pushed from one ref to another. This pause allows humans (code reviews) and robots (GitHub Copilot reviews and GitHub Actions workflows) to check the code before the changes are integrated. The name pull request came from this language specifically: You are requesting that a ref pull your changes into itself.

Figure 3: Demonstrating how GitHub Pull Requests correspond to pushing and pulling.

Common Git workflows

Now that you understand the basics, let’s talk about the workflows we typically use with Git every day.

A centralized workflow is how most folks interact with Git and GitHub. In this configuration, any given branch is pushing and pulling from a remote ref with the same branch name. For most of us, this type of configuration is set up by default when we clone a repo and push a branch. It is the situation shown in Figure 1.

In contrast, a triangular workflow pushes to and pulls from different refs. A common use case for this configuration is to pull directly from a remote repository’s default branch into your local feature branch, eliminating the need to run commands like git rebase <default> or git merge <default> on your feature branch to ensure the branch you’re working on is always up to date with the default branch. However, when pushing changes, this configuration will typically push to a remote ref with the same branch name as the feature branch.

Figure 4: Juxtaposing centralized workflows with triangular workflows

We complete the triangle when considering pull requests: the pull request’s headRef is the local branch’s pushRef, and its baseRef is the local branch’s pullRef:

Figure 5: a triangular workflow

We can go one step further and set up triangular workflows using different remotes as well. This most commonly occurs when you’re developing on a fork. In this situation, you usually give the fork and source remotes different names. I’ll use origin for the fork and upstream for the source, as these are common names used in these setups. This functions exactly the same as the triangular workflows above, but the remotes and branches on the pushRef and pullRef are different:

Figure 6: Juxtaposing triangular and centralized workflows with different remotes, such as with forks

Using a Git configuration file for triangular workflows

There are two primary ways that you can set up a triangular workflow using the Git configuration – typically defined in a `.git/config` or `.gitconfig` file. Before explaining these, let’s take a look at what the relevant bits of a typical configuration look like in a repo’s `.git/config` file for a centralized workflow:

[remote "origin"]
    url = https://github.com/OWNER/REPO.git
    fetch = +refs/heads/*:refs/remotes/origin/*
[branch "default"]
    remote = origin
    merge = refs/heads/default
[branch "branch"]
    remote = origin
    merge = refs/heads/branch

Figure 7: A typical Git configuration setup found in .git/config

The [remote "origin"] part assigns the name origin to the Git repository located at github.com/OWNER/REPO.git, so we can reference it elsewhere by that name. We can see that reference being used in the specific [branch] configurations for both the default and branch branches in their remote keys. This key, in conjunction with the branch name, typically makes up the branch’s pushRef: in this example, it is origin/branch.

The remote and merge keys are combined to make up the branch’s pullRef: in this example, it is origin/branch.
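You can read these keys back with `git config` rather than opening the file. Here is a disposable sketch that recreates the branch section from Figure 7 in a scratch repo (the repo and branch names are illustrative):

```shell
# Scratch repo reproducing the [branch "branch"] section from Figure 7.
cd "$(mktemp -d)"
git init demo
cd demo
git config branch.branch.remote origin
git config branch.branch.merge refs/heads/branch
git config branch.branch.remote   # → origin
git config branch.branch.merge    # → refs/heads/branch
```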

Setting up a triangular branch workflow

The simplest way to assemble a triangular workflow is to set the branch’s merge key to a different branch name, like so:

[branch "branch"]
    remote = origin
    merge = refs/heads/default

Figure 8: a triangular branch’s Git configuration found in .git/config

This will result in the branch pullRef as origin/default, but pushRef as origin/branch, as shown in Figure 9.

Figure 9: A triangular branch workflow
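If you’d rather not edit `.git/config` by hand, the same triangular branch configuration from Figure 8 can be applied with `git config` commands. This is a sketch in a throwaway repo; the repo and branch names are illustrative:

```shell
cd "$(mktemp -d)"
git init demo
cd demo
# Pull side: remote + merge combine into the pullRef origin/default...
git config branch.branch.remote origin
git config branch.branch.merge refs/heads/default
# ...while the push side still defaults to remote + branch name: origin/branch.
git config branch.branch.merge    # → refs/heads/default
```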

Setting up a triangular fork workflow

Working with triangular forks requires a bit more customization than triangular branches because we are dealing with multiple remotes. Thus, our remotes in the Git config will look different than the one shown previously in Figure 7:

[remote "upstream"]
    url = https://github.com/ORIGINALOWNER/REPO.git
    fetch = +refs/heads/*:refs/remotes/upstream/*
[remote "origin"]
    url = https://github.com/FORKOWNER/REPO.git
    fetch = +refs/heads/*:refs/remotes/origin/*

Figure 10: a Git configuration for a multi-remote Git setup found in .git/config

Upstream and origin are the most common names used in this construction, so I’ve used them here, but they can be named anything you want.¹

However, toggling a branch’s remote key between upstream and origin won’t actually set up a triangular fork workflow—it will just set up a centralized workflow with either of those remotes, like the centralized workflow shown in Figure 6. Luckily, there are two common Git configuration options to change this behavior.

Setting a branch’s pushremote

A branch’s configuration has a key called pushremote that does exactly what the name suggests: configures the remote that the branch will push to. A triangular fork workflow config using pushremote may look like this:

[branch "branch"]
    remote = upstream
    merge = refs/heads/default
    pushremote = origin

Figure 11: a triangular fork’s Git config using pushremote found in .git/config

This assembles the triangular fork repo we see in Figure 12. The pullRef is upstream/default, as determined by combining the remote and merge keys, while the pushRef is origin/branch, as determined by combining the pushremote key and the branch name.

Figure 12: A triangular fork workflow
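The pushremote setup from Figure 11 can likewise be applied from the command line. A sketch in a scratch repo; the remote and branch names are illustrative:

```shell
cd "$(mktemp -d)"
git init demo
cd demo
git config branch.branch.remote upstream        # pull side: upstream/default
git config branch.branch.merge refs/heads/default
git config branch.branch.pushRemote origin      # push side now targets origin
git config branch.branch.pushRemote             # → origin
```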

Setting a repo’s remote.pushDefault

To configure all branches in a repository to have the same behavior as what you’re seeing in Figure 12, you can instead set the repository’s pushDefault. The config for this is below:

[remote]
    pushDefault = origin
[branch "branch"]
    remote = upstream
    merge = refs/heads/default

Figure 13: a triangular fork’s Git config using remote.pushDefault found in .git/config

This assembles the same triangular fork repo as shown in Figure 12 above, however this time the pushRef is determined by combining the remote.pushDefault key and the branch name, resulting in origin/branch.

When using the branch’s pushremote and the repo’s remote.pushDefault keys together, Git will preferentially resolve the branch’s configuration over the repo’s, so the remote set on pushremote supersedes the remote set on remote.pushDefault.
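A quick way to convince yourself of this precedence is to set both keys and read them back. In the sketch below the names are illustrative, and fork is a hypothetical second remote; for branch, Git resolves the push remote to fork, while every other branch falls back to origin:

```shell
cd "$(mktemp -d)"
git init demo
cd demo
git config remote.pushDefault origin        # repo-wide default push remote
git config branch.branch.pushRemote fork    # per-branch override wins for "branch"
git config branch.branch.pushRemote         # → fork
git config remote.pushDefault               # → origin
```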

Updating the gh pr command set to reflect Git

Previously, the gh pr command set did not resolve pushRefs and pullRefs in the same way that Git does. This was due to technical design decisions that made this change both difficult and complex. Instead of discussing that complexity—a big enough topic for a whole article in itself—I’m going to focus here on what you can now do with the updated gh pr command set.

If you set up triangular Git workflows in the manner described above, we will automatically resolve gh pr commands in accordance with your Git configuration.

To be slightly more specific, when trying to resolve a pull request for a branch, the GitHub CLI will respect whatever @{push} resolves to first, if it resolves at all. Then it will fall back to respect a branch’s pushremote, and if that isn’t set, finally look for a repo’s remote.pushDefault config settings.

What this means is that the CLI assumes your branch’s pullRef is the pull request’s baseRef and the branch’s pushRef is the pull request’s headRef. In other words, if you’ve configured git pull and git push to work, then gh pr commands should just work.² The diagram below, a general version of Figure 5, demonstrates this nicely:

Figure 14: the triangular workflow supported by the GitHub CLI with respect to a branch’s pullRef and pushRef. This is the generalized version of Figure 5

Conclusion

We’re constantly working to improve the GitHub CLI, and we’d like the behavior of the GitHub CLI to reasonably reflect the behavior of Git. This was a team effort—everyone contributed to understanding, reviewing, and testing the code to enable this enhanced gh pr command set functionality.

It also couldn’t have happened without the support of our contributors, so we extend our thanks to them:

CLI native support for triangular workflows was 4.5 years in the making, and we’re proud to have been able to provide this update for the community.

The GitHub CLI Team
@andyfeller, @babakks, @bagtoad, @jtmcg, @mxie, @RyanHecht, and @williammartin


  1. Some commands in gh are opinionated about remote names and will resolve remotes in this order: upstream, github, origin, <other remotes unstably sorted>. There is a convenience command to supersede this: gh repo set-default [<repository>], which overrides the default behavior above and preferentially resolves <repository> as the default remote repo.
  2. If you find a git configuration that doesn’t work, please open an issue in the OSS repo so we can fix it. 

The post How the GitHub CLI can now enable triangular workflows appeared first on The GitHub Blog.

Git turns 20: A Q&A with Linus Torvalds https://github.blog/open-source/git/git-turns-20-a-qa-with-linus-torvalds/ Mon, 07 Apr 2025 22:58:14 +0000 https://github.blog/?p=86171 To celebrate two decades of Git, we sat down with Linus Torvalds—the creator of Git and Linux—to discuss how it forever changed software development.

The post Git turns 20: A Q&A with Linus Torvalds appeared first on The GitHub Blog.


Exactly twenty years ago, on April 7, 2005, Linus Torvalds made the very first commit to a new version control system called Git. Torvalds famously wrote Git in just 10 days after Linux kernel developers lost access to their proprietary tool, BitKeeper, due to licensing disagreements. In fact, in that first commit, he’d written enough of Git to use Git to make the commit!

Git’s unconventional and decentralized design—nowadays ubiquitous and seemingly obvious—was revolutionary at the time, and reshaped how software teams collaborate and develop. (To wit, GitHub!)

To celebrate two decades of Git, we sat down with Linus himself to revisit those early days, explore the key design decisions behind Git’s lasting success, and discuss how it forever changed software development.

Check out the transcript of our interview below, and watch the full video above.

The following transcript has been lightly edited for clarity.


Taylor Blau: It’s been 20 years, almost to the hour, since Git was self-hosted enough to write its initial commit. Did you expect to be sitting here 20 years later, still using it and talking about it?

Linus Torvalds: Still using it, yes. Maybe not talking about it. I mean, that has been one of the big surprises—basically how much it took over the whole SCM world. I saw it as a solution to my problems, and I obviously thought it was superior. Even literally 20 years ago to the day, I thought that first version, which was pretty raw—to be honest, even that version was superior to CVS.

But at the same time, I’d seen CVS just hold on to the market—I mean, SVN came around, but it’s just CVS in another guise, right?—for many, many decades. So I was like, okay, this market is very sticky. I can’t use CVS because I hate it with a passion, so I’ll do my own thing. I couldn’t use BitKeeper, obviously, anymore. So I was like, okay, I’ll do something that works for me, and I won’t care about anybody else. And really that showed in the first few months and years—people were complaining that it was kind of hard to use, not intuitive enough. And then something happened, like there was a switch that was thrown.

“I’ll do something that works for me, and I won’t care about anybody else.”

Well, you mentioned BitKeeper. Maybe we can talk about that.

Sure.

Pretty famously, you wrote the initial version of Git in around 10 or so days as a replacement for the kernel.

Yes and no. It was actually fewer than—well, it was about 10 days until I could use it for the kernel, yes. But to be fair, the whole process started like December or November the year before, so 2004.

What happened was BitKeeper had always worked fairly well for me. It wasn’t perfect, but it was light years ahead of anything else I’ve tried. But BitKeeper in the kernel community was always very, like, not entirely welcomed by the community because it was commercial. It was free for open source use because Larry McVoy, who I knew, really liked open source. I mean, at the same time, he was making a business around it and he wanted to sell BitKeeper to big companies. [It] not being open source and being used for one of the biggest open source projects around was kind of a sticking point for a lot of people. And it was for me, too.

I mean, to some degree I really wanted to use open source, but at the same time I’m very pragmatic and there was nothing open source that was even remotely good enough. So I was kind of hoping that something would come up that would be better. But what did come up was that Tridge in Australia basically reverse engineered BitKeeper, which wasn’t that hard because BitKeeper internally was basically a good wrapper around SCCS, which goes back to the 60s. SCCS is almost worse than CVS.

But that was explicitly against the license rules for BitKeeper. BitKeeper was like, you can use this for open source, but you can’t reverse engineer it. And you can’t try to clone BitKeeper. And that made for huge issues. And this was all in private, so I was talking to Larry and I was emailing with Tridge and we were trying to come up with a solution, but Tridge and Larry were really on completely opposite ends of the spectrum and there was no solution coming up.

So by the time I started writing Git, I had actually been thinking about the issue for four months and thinking about what worked for me and thinking about “How do I do something that does even better than BitKeeper does but doesn’t do it the way BitKeeper does it?” I did not want to be in the situation where Larry would say, “Hey, you did the one thing you were not supposed to do.”

“…how do I do something that does even better than BitKeeper does, but doesn’t do it the way BitKeeper does it.”

So yes, the writing part was maybe 10 days until I started using Git for the kernel, but there was a lot of mental going over what the ideas should be.

I want to talk about maybe both of those things. We can start with that 10-day period. So as I understand it, you had taken that period as a time away from the kernel and had mostly focused on Git in isolation. What was that transition like for you to just be working on Git and not thinking about the kernel?

Well, since it was only two weeks, it ended up being that way. It wasn’t actually a huge deal. I’d done things like that just for—I’ve been on, like in the last 35 years, I’ve been on vacation a couple of times, right, not very many times. But I have been away from the kernel for two weeks at a time before.

And it was kind of interesting because it was—one of my reactions was how much easier it is to do programming in the userspace. There’s so much less you need to care about. You don’t need to worry about memory allocations. You don’t need to worry about a lot of things. And debugging is so much easier when you have all this infrastructure that you’re writing when you’re doing a kernel.

So it was actually somewhat—I mean, I wouldn’t say relaxing, but it was fun to do something userspace-y where I had a fairly clear goal of what I wanted. I mean, a clear goal in the sense I knew the direction. I didn’t know the details.

One of the things I find so interesting about Git, especially 20 years on, is it’s so… the development model that it encourages, to me, seems so simple that it’s almost obvious at this point. But I don’t say that as a reductive term. I think there must have been quite a lot of thought into distilling down from the universe of source control ideas down into something that became Git. Tell me, what were the non-obvious choices you made at the time?

The fact that you say it’s obvious now, I think it wasn’t obvious at the time. I think one of the reasons people found Git to be very hard to use was that most people who started without using Git were coming from a background of something CVS like. And the Git mindset, I came at it from a file system person’s standpoint, where I had this disdain and almost hatred of most source control management projects, so I was not at all interested in maintaining the status quo.

And like the biggest issue for me—well, there were two huge issues. One was performance—back then I still applied a lot of patches, which I mean, Git has made almost go away because now I just merge other people’s code.

But for me, one of the goals was that I could apply a patch series in basically half a minute, even when it was like 50, 100 patches.

You shouldn’t need a coffee to…

Exactly. And that was important to me because it’s actually a quality-of-life thing. It’s one of those things where if things are just instant, some mistake happens, you see the result immediately and you just go on and you fix it. And some of the other projects I had been looking at took like half a minute per patch, which was not acceptable to me. And that was because the kernel is a very large project and a lot of these SCMs were not designed to be scalable.

“And that was important to me because it’s actually a quality-of-life thing.”

So that was one of the issues. But one of the issues really was, I knew I needed it to be distributed, but it needed to be really, really stable. And people kind of think that using the SHA-1 hashes was a huge mistake. But to me, SHA-1 hashes were never about the security. It was about finding corruption.

Because we’d actually had some of that during the BitKeeper things, where BitKeeper used CRCs and MD5s, right, but didn’t use it for everything. So one of the early designs for me was absolutely everything was protected by a really good hash.

And that kind of drove the whole project—having two or three really fundamental design ideas. Which is why at a low level it is actually fairly simple, right? And then the complexities are in the details and the user interfaces and in all the things it has to be able to do—because everybody wants it to do crazy things. But having a low-level design that has a few core concepts made it easier to write, and much easier to think about and, to some degree, explain to people what the ideas are.

And I kind of compare it to Unix. Unix has like a core philosophy of everything is a process, everything is a file, you pipe things between things. And then the reality is it’s not actually simple. I mean, there’s the simple concepts that underlie the philosophy, but then all the details are very complicated.

I think that’s what made me appreciate Unix in the first place. And I think Git has some of the same kind of, there’s a fundamental core simplicity to the design and then there’s the complexity of implementation.

There’s a through line from Unix into the way that Git was designed.

Yes.

You mentioned SHA-1. One of the things that I think about in this week or two where you were developing the first version of Git is you made a lot of decisions that have stuck with us.

Yeah.

Were there any, including SHA-1 or not, that you regretted or wish you had done differently?

Well, I mean, SHA-1 I regret in the sense that I think it caused a lot of pointless churn with the whole “trying to support SHA-256 as well as SHA-1.” And I understand why it happened, but I do think it was mostly pointless.

I don’t think there was a huge, real need for it, but people were worried, so it was short. So I think there’s a lot of wasted effort there. There’s a number of other small issues. I think I made a mistake in how the index file entries are sorted. I think there’s these stupid details that made things harder than they should be.

But at the same time, many of those things could be fixed, but they’re small enough. It doesn’t really matter. All the complexities are elsewhere in the end.

So it sounds like you have few regrets. I think that’s good. Were there any moments where you weren’t sure that what you were trying to achieve was going to work or come together or be usable? Or did you already have a pretty clear idea?

I had a clear idea of the initial stages but I wasn’t sure how it would work in the long run. So honestly, after the first week, I had something that was good for applying patches, but not so much for everything else. I had the basics for doing merges, and the data structures were in place for that, but it actually took, I think it took an additional week before I did my first merge.

There were a number of things where I had kind of the big picture and result in mind, but I wasn’t sure if I’d get there. Yeah, the first steps, I mean the first week or two, I mean, you can go and look at the code—and people have—and it is not complicated code.

No.

I think the first version was 10,000 lines or something.

You can more or less read it in a single sitting.

Yeah, and it’s fairly straightforward and doesn’t do a lot of error checking and stuff like that. It’s really a, “Let’s get this working because I have another project that I consider to be more important than I need to get back to.” It really was. It happened where I would hit issues that required me to do some changes.

“There were a number of things where I had kind of the big picture and result in mind, but I wasn’t sure if I’d get there.”

The first version—I think we ended up doing a backwards incompatible object store transfer at one point. At least fsck complains about some of the old objects we had because I changed the data format.

I didn’t know where that came from.

Yeah, no. The first version just was not doing everything it needed to do.

And I forget if I actually did a conversion or not. I may not have ever needed to convert. And we just have a few warnings for a few objects in the kernel where fsck will say, “Hey, this is an old, no longer supported format.” That kind of thing. But on the other, on the whole, it really worked, I mean, surprisingly well.

The big issue was always people’s acceptance of it.

Right.

And that took a long time.

“But on the other, on the whole, it really worked, I mean, surprisingly well.”

Well, we talked a little bit about how merging was put in place but not functional until maybe week two or week three. What were the other features that you left out of the initial version that you later realized were actually quite essential to the project?

Well, it wasn’t so much “later realized,” it was stuff that I didn’t care about, but I knew that if this is going to go anywhere, somebody else will. I mean, the first week when I was using it for the kernel, I was literally using the raw, what are now called “plumbing commands” by hand.

Of course.

Because there was no so-called porcelain. There was nothing above that to make it usable. So to make a commit, you’d do these very arcane things.

Set your index, commit-tree.

Yeah, commit-tree, write, and that just returns an SHA that you write by hand into the head file and that was it.

Did hash-object exist in the first version?

I think that was one of the first binaries that I had where I could just check that I could hash everything by hand and it would return the hash to standard out, then you could do whatever you wanted to it. But it was like the early porcelain was me scripting shell scripts around these very hard-to-use things.

And honestly, it wasn’t easy to use even with my shell scripts.

But to be fair, the first initial target audience for this were pretty hardcore kernel people who had been using BitKeeper. They at least knew a lot of the concepts I was aiming for. People picked it up.

It didn’t take that long before some other kernel developers started actually using it. I was actually surprised by how quickly some source control people started coming in. And I started getting patches from the outside within days of making the first Git version public.

I want to move forward a bit. You made the decision to hand off maintainership to Junio pretty early on in the project. I wonder if you could tell me a little bit about what it’s been like to watch him run the project and really watch the community interact with it at a little bit of a distance after all these years?

I mean, to be honest, I maintained Git for like three or four months. I think I handed it off in August [of 2005] or something like that.

And when I handed it off, I truly just handed it off. I was like, “I’m still around.” I was still reading the Git mailing list, which I don’t do anymore. Junio wanted to make sure that if he asked me anything, I’d be okay.

But at the same time, I was like, this is not what I want to do. I mean, this is… I still feel silly. My oldest daughter went off to college, and two months later, she sends this text to me and says that I’m more well-known at the computer science lab for Git than for Linux because they actually use Git for everything there. And I was like, Git was never a big thing for me. Git was an “I need to get this done to do the kernel.” And it’s kind of ridiculous that, yes, I used four months of my life maintaining it, but now, 20 years later…

Yes, you should definitely talk to Junio, not to me because he’s been doing a great job and I’m very happy it worked out so well. But to be honest I’ll take credit for having worked with people on the internet for long enough that I was like—during the four months I was maintaining Git, I was pretty good at picking up who has got the good taste to be a good maintainer.

My oldest daughter went off to college, and two months later, she sends this text to me and says that I’m more well known at the computer science lab for Git than for Linux because they actually use Git for everything there.

That’s what it’s about—taste—for you.

For me, it’s hard to describe. You can see it in patches, you can see it in how they react to other people’s code, “how they think” kind of things. Junio was not the first person in the project, but he was one of the early ones that was around from pretty much week one after I had made it public.

So he was one of the early persons—but it wasn’t like you’re the first one, tag you’re it. It was more like okay, I have now seen this person work for three months and I don’t want to maintain this project. I will ask him if he wants to be the maintainer. I think he was a bit nervous at first, but it really has been working out.

Yeah he’s certainly run the project very admirably in the…

Yeah, I mean, so taste is to me very important, but practically speaking, the fact that you stick around with a project for 20 years, that’s the even more important part, right? And he has.

I think he’s knowledgeable about almost every area of the tree to a surprising degree.

Okay, so we’ve talked a lot about early Git. I want to talk a little bit about the middle period of Git maybe, or maybe even the period we’re in now.

One of the things that I find so interesting about the tool, given how ubiquitous it’s become, it’s clearly been effective at aiding the kernel’s development, but it’s also been really effective for university students writing little class projects on their laptops. What do you think was unique about Git that made it effective at both extremes of the software engineering spectrum?

So the distributed nature really ends up making so many things so easy and that was one big part that set Git apart from pretty much all SCMs before, was… I mean there had been distributed SCMs, but there had, as far as I know, never been something where it was like the number one design goal—I mean, along with the other number one design goals—where it means that you can work with Git purely locally and then later if you want to make it available in any other place it’s so easy.

And that’s very different from, say, CVS where you have to set up this kind of repository and if you ever want to move it anywhere else it’s just very very painful and you can’t share it with somebody else without losing track of it.

Or there’s always going to be one special repository when you’re using a traditional SCM and the fact that Git didn’t do that, and very much by design didn’t do that, I mean that’s what made services like GitHub trivial. I mean I’m trivializing GitHub because I realized there’s a lot of work in making all the infrastructure around Git, but at the same time the basic Git hosting site is basically nothing because the whole design of Git is designed around making it easy to copy, and every repository is the same and equal.

And I think that ended up being what made it so easy to then use as an individual developer. When you make a new Git repository, it’s not a big deal. It’s like you do git init and you’re done. And you don’t need to set up any infrastructure and you don’t need to do any of the stuff that you traditionally needed to do with an SCM. And then if that project ever grows to be something where you decide, “Oh, maybe I want other people to work with it,” that works too. And again, you don’t have to do anything about it. You just push it to GitHub and again, you’re done.

That was something I very much wanted. I didn’t realize how many other people wanted it, too. I thought people were happy with CVS and SVN. Well, I didn’t really think that, but I thought they were sufficient for most people, let’s put it that way.

I’ve lived my whole life with version control as part of software development, and one of the things I’m curious about is how you see Git’s role in shaping how software development gets done today.

That’s too big of a question for me. I don’t know. It wasn’t why I wrote Git. I wrote it for my own issues.

I think GitHub and the other hosting services have made it clear how easy it is now to make all these random small projects in ways that it didn’t used to be. And that has resulted in a lot of dead projects too. You find these one-off things where somebody did something and left it behind and it’s still there.

But does that really change how software development is done in the big picture? I don’t know. I mean, it changes the details. It makes collaboration easier to some degree. It makes it easier to do these throwaway projects. And if they don’t work, they don’t work. And if they do work, now you can work together with other people. But I’m not sure it changed anything fundamentally in software development.

“It makes collaboration easier to some degree.”

Moving ahead a little bit, modern software development has never been changing faster than it is today…

Are you going to say the AI word?

I’m not going to say the AI word, unless you want me to.

No, no, no.

…what are some of the areas of the tool that you think have evolved or maybe still need to evolve to continue to support the new and demanding workflows that people are using it for?

I’d love to see more bug tracking stuff. I mean, everybody is doing that. I mean, there are, whether you call it bug tracking or issues or whatever you want to call it, I’d love to see that be more unified. Because right now it’s very fragmented where every single hosting site does their own version of it.

And I understand why they do it. A, there is no kind of standard good base. And B, it’s also a way to do the value add and keep people in that ecosystem even when Git itself means that it’s really easy to move the code.

But I do wish there was a more unified thing where bug tracking and issues in general would be something that would be more shared among the hosting sites.

You mentioned earlier that it’s at least been a while since you regularly followed the mailing list.

Yeah.

In fact, it’s been a little bit of time since you even committed to the project. I think by my count, August of 2022 was the last time…

Yeah, I have a few experimental patches in my tree that I just keep around. So these days I do a pull of the Git sources and I have, I think, four or five patches that I use myself. And I think I’ve posted a couple of them to the Git mailing list, but they’re not very important. They’re like details that tend to be very specific to my workflow.

But honestly, I mean, this is true of the Linux kernel, too. I’ve been doing Linux for 35 years, and it did everything I needed in the first year—right? And the thing that keeps me going on the kernel side is, A, hardware keeps evolving, and a kernel needs to evolve with that, of course. But B, it’s all the needs of other people. Never in my life would I need all of the features that the kernel does. But I’m interested in kernels, and I’m still doing that 35 years later.

When it came to Git, it was like Git did what I needed within the first year. In fact, mostly within the first few months. And when it did what I needed, I lost interest. Because when it comes to kernels, I’m really interested in how they work, and this is what I do. But when it comes to SCMs, it’s like—yeah, I’m not at all interested.

“When it came to Git, it was like Git did what I needed within the first year. In fact, mostly within the first few months.”

Have there been any features that you’ve followed in the past handful of years from the project that you found interesting?

I liked how the merge strategies got slightly smarter. I liked how some of the scripts were finally rewritten in C just to make them faster, because even though I don’t apply, like, 100 patch series anymore, I do end up doing things like rebasing for test trees and stuff like that and having some of the performance improvements.

But then, I mean, those are fairly small implementation details in the end. They’re not the kind of big changes that, I mean—I think the biggest change that I was still tracking a few years ago was all the multiple hashes thing, which really looks very painful to me.

Have there been any tools in the ecosystem that you’ve used alongside? I mean, I’m a huge tig user myself. I don’t know if you’ve ever used this.

I never—no, even early on when we had, like when Git was really hard to use and they were like these add-on UIs, the only wrapper around Git I ever used was gitk. And that was obviously integrated into Git fairly quickly, right? But I still use the entire command language. I don’t use any of the editor integration stuff. I don’t do any of that because my editor is too stupid to integrate with anything, much less Git.

I mean, I occasionally do statistics on my Git history usage just because I’m like, “What commands do I use?” And it turns out I use five Git commands. And git merge and git blame and git log are three of them, pretty much. So, I’m a very casual user of Git in that sense.

I have to ask about what the other two are.

I mean obviously git commit and git pull. I did this top five thing at some point and it may have changed, but there’s not a lot of—I do have a few scripts that then do use git rev-list and go really low and do statistics for the project…

In terms of your interaction with the project, what do you feel like have been some of the features in the project either from early on or in the time since that maybe haven’t gotten the appreciation they deserve?

I mean Git has gotten so much more appreciation than it deserves. But that’s the reverse of what I would ask me. A big thing for me was when people actually started appreciating what Git could do instead of complaining about how different it was.

And that, I mean, that was several years after the initial Git. I think it was these strange web developers who started using Git in a big way. It’s like Ruby on Rails, I think. Which I had no idea, I still don’t know what Ruby even is. But the Ruby on Rails people started using Git sometime in 2008, something like this.

It was strange because it brought in a completely new kind of Git user—at least one that I hadn’t seen before. It must have existed in the background, it just made it very obvious that suddenly you had all these young people who had never used SCM in their life before and Git was the first thing they ever used and it was what the project they were using was using, so it was kind of the default thing.

And I think it changed the dynamics. When you didn’t have these old timers who had used a very different SCM their whole life, and suddenly you had young people who had never seen anything else and appreciated it, and instead of saying, “Git is so hard,” I started seeing these people who were complaining about “How do I do this when this old project is in CVS?” So, that was funny.

But yeah, no. The fact that people are appreciating Git, I mean, way more than I ever thought. Especially considering the first few years when I got a lot of hate for it.

Really?

Oh, the complaints kept coming.

Tell me about it.

Oh, I mean, it’s more like I can’t point to details. You’d have to Google it. But the number of people who sent me, “Why does it do this?” And the flame wars over my choice of names. For example, I didn’t have git status, which actually is one of the commands I use fairly regularly now.

It’s in the top five?

It’s probably not in the top five, but it’s still something fairly common. I don’t think I’d ever used it with CVS because it was so slow.

And people had all these expectations. So I just remember the first few years, the complaints about why the names of the subcommands are different for no good reason. And the main reason was I just didn’t like CVS very much, so I did things differently on purpose sometimes.

And the shift literally between 2007 and 2010—those years, when people went from complaining about how hard Git was to use to really appreciating some of the power of Git, was interesting to me.

I want to spend maybe just a moment thinking about the future of the project. In your mind, what are the biggest challenges that Git either is facing or will face?

I don’t even know. I mean, it has just been so much more successful than I ever… I mean, the statistics are insane. It went from use for the kernel and a couple of other projects to being fairly popular to now being like 98% of the SCMs used. I mean, that’s a number I saw in some report from last year.

So, I mean, it’s—I don’t know how true that is, but it’s like big. And in that sense, I wouldn’t worry about challenges because I think SCMs, there is a very strong network effect. And that’s probably why, once it took off, it took off in a big way. Just when every other project is using Git, by default, all the new projects will use Git, too. Because the pain of having two different SCMs for two different projects to work on is just not worth it.

So I would not see that as a challenge for Git as much as I would see it as a challenge for anybody else who thinks they have something better. And honestly, because Git does everything that I need, the challenges would likely come from new users.

I mean, we saw some of that. We saw some of that with people who used Git in ways that explicitly were things I consider to be the wrong approach. Like Microsoft, the monorepo for everything, which showed scalability issues. I’m not saying Microsoft was wrong to do that. I’m saying this is literally what Git was not designed to do.

I assume most of those problems have been solved because I’m not seeing any complaints, but at the same time I’m not following the Git mailing list as much as I used to.

I don’t even know if the large file issue is considered to be solved. If you want to put a DVD image in Git, that was like, why would you ever want to do that?

But, I mean, that’s the challenge. When Git is everywhere, you find all these people who do strange things that you would never imagine—that I didn’t imagine and that I consider to be actively wrong.

But hey, I mean, that’s a personal opinion. Clearly other people have very different personal opinions. So that’s always a challenge. I mean, that’s something I see in the kernel, too, where I go, why the hell are you doing that? I mean, that shouldn’t work, but you’re clearly doing it.

“When Git is everywhere, you find all these people who do strange things that you would never imagine—that I didn’t imagine and that I consider to be actively wrong.”

We talked about how Git is obviously a huge dominant component in software development. At the same time, there are new version control upstarts that seem to pop up. Pijul comes to mind, Jujutsu, Piper, and things like that. I’m curious if you’ve ever tried any of them.

No, I don’t. I mean, literally, since I came from this, from being completely uninterested in source control, why would I look at alternatives now that I have something that works for me?

I really came into Git not liking source control, and now I don’t hate it anymore. And I think that databases are my particular—like, that’s the most boring-thing-in-life thing. But SCMs still haven’t been something I’m really interested in.

“I really came into Git not liking source control, and now I don’t hate it anymore.”

You’ve given me a little bit of an end to my last question for you. So on schedule, Linux came about 34 years ago, Git 20…

Oh, that question.

And so we’re maybe five or so years overdue for the next big thing.

No, no, I see it the other way around. All the projects that I’ve had to make, I had to make because I couldn’t find anything better that somebody else did.

But I much prefer other people solving my problems for me. So me having to come up with a project is actually a failure of the world—and the world just hasn’t failed in the last 20 years for me.

I started doing Linux because I needed an operating system and there was nothing that suited my needs. I started doing Git for the same reason. And there hasn’t been any… I started Subsurface, which is my divelog, well, no longer my divelog software, but that was so specialized that it never took off in a big way. And that solved one particular problem, but my computer use is actually so limited that I think I’ve solved all the problems.

Part of it is probably, I’ve been doing it so long that I can only do things in certain ways. I’m still using the same editor that I used when I was in college because my fingers have learned one thing and there’s no going back. And I know the editor is crap and I maintain it because it’s a dead project that nobody else uses.

“But I much prefer other people solving my problems for me. So me having to come up with a project is actually a failure of the world—and the world just hasn’t failed in the last 20 years for me.”

So, I have a source tree and I compile my own version every time I install a new machine and I would suggest nobody ever use that editor but I can’t. I’ve tried multiple times finding an editor that is more modern and does fancy things like colorize my source code and do things like that. And every time I try it, I’m like, “Yeah, these hands are too old for this.” So I really hope there’s no project that comes along that makes me go, “I have to do this.”

Well, on that note.

On that note.

Thank you for 20 years of Git.

Well, hey, I did it for my own very selfish reasons. And really—I mean, this is the point to say again that yes, out of the 20 years, I spent four months on it. So really, all the credit goes to Junio and all the other people who are involved in Git that have by now done so much more than I ever did.

In any event, thank you.

The post Git turns 20: A Q&A with Linus Torvalds appeared first on The GitHub Blog.

Highlights from Git 2.49 https://github.blog/open-source/git/highlights-from-git-2-49/ Fri, 14 Mar 2025 17:19:46 +0000 https://github.blog/?p=83226 The open source Git project just released Git 2.49. Here is GitHub’s look at some of the most interesting features and changes introduced since last time.


The open source Git project just released Git 2.49 with features and bug fixes from over 89 contributors, 24 of them new. We last caught up with you on the latest in Git back when 2.48 was released.

To celebrate this most recent release, here is GitHub’s look at some of the most interesting features and changes introduced since last time.

Faster packing with name-hash v2

Many times over this series of blog posts, we have talked about Git’s object storage model, where objects can be written individually (known as “loose” objects), or grouped together in packfiles. Git uses packfiles in a wide variety of functions, including local storage (when you repack or GC your repository), as well as when sending data to or from another Git repository (like fetching, cloning, or pushing).

Storing objects together in packfiles has a couple of benefits over storing them individually as loose. One obvious benefit is that object lookups can be performed much more quickly in pack storage. When looking up a loose object, Git has to make multiple system calls to find the object you’re looking for, open it, read it, and close it. These system calls can be made faster using the operating system’s block cache, but because objects are looked up by a SHA-1 (or SHA-256) of their contents, this pseudo-random access isn’t very cache-efficient.

But most interesting to our discussion is that since loose objects are stored individually, we can only compress their contents in isolation, and can’t store objects as deltas of other similar objects that already exist in your repository. For example, say you’re making a series of small changes to a large blob in your repository. When those objects are initially written, they are each stored individually and zlib compressed. But if the majority of the file’s content remains unchanged between successive versions, Git can further compress these objects by storing later versions as deltas of earlier ones. Roughly speaking, this allows Git to store the changes made to an object (relative to some other object) instead of multiple copies of nearly identical blobs.
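As a toy illustration of why deltas pay off (this is not Git’s actual delta format, just a sketch of the storage math):

```python
import random
import zlib

random.seed(42)
base = random.randbytes(4096)        # stand-in for a large, hard-to-compress blob
edited = base + b"one new line\n"    # a small append-only edit

# Loose storage: each version is zlib-compressed in isolation.
loose_total = len(zlib.compress(base)) + len(zlib.compress(edited))

# Delta storage: keep the base, plus only the change relative to it.
delta = edited[len(base):]           # trivially computable for an append
delta_total = len(zlib.compress(base)) + len(zlib.compress(delta))

print(loose_total, delta_total)
```

For incompressible data like this, the loose pair costs roughly two full copies while the delta pair costs one copy plus a few bytes, which is the intuition behind pack deltas.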

But how does Git figure out which pairs of objects are good candidates to store as delta-base pairs? One useful proxy is to compare objects that appear at similar paths. Git does this today by computing what it calls a “name hash”, which is effectively a sortable numeric hash that weights more heavily towards the final 16 non-whitespace characters in a filepath (source). This function comes from Linus all the way back in 2006, and excels at grouping functions with similar extensions (all ending in .c, .h, etc.), or files that were moved from one directory to another (a/foo.txt to b/foo.txt).
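In Python-flavored form, the v1 weighting idea looks roughly like this (a simplified sketch of the described behavior, not the exact C implementation inside Git):

```python
def pack_name_hash(path: str) -> int:
    """Sketch of Git's v1 name hash: each non-whitespace character is
    added into the top byte while the running hash is downshifted two
    bits per character, so only roughly the last 16 characters keep
    significant weight in the final value."""
    h = 0
    for ch in path:
        if ch.isspace():
            continue  # whitespace is ignored entirely
        h = ((h >> 2) + (ord(ch) << 24)) & 0xFFFFFFFF  # stay in 32 bits
    return h

# Renamed files (same name, different directory) hash close together,
# making them likely delta candidates when objects are sorted by hash:
h1 = pack_name_hash("a/foo.txt")
h2 = pack_name_hash("b/foo.txt")
```

Because the leading directory only survives as a heavily downshifted term, a/foo.txt and b/foo.txt land next to each other in the hash-sorted object list, which is exactly the moved-file case the heuristic targets.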

But the existing name-hash implementation can lead to poor compression when there are many files that have the same basename but very different contents, like having many CHANGELOG.md files for different subsystems stored together in your repository. Git 2.49 introduces a new variant of the hash function that takes more of the directory structure into account when computing its hash. Among other changes, each layer of the directory hierarchy gets its own hash, which is downshifted and then XORed into the overall hash. This creates a hash function which is more sensitive to the whole path, not just the final 16 characters.

This can lead to significant improvements not just in packing performance, but also in the resulting pack’s overall size. For instance, using the new hash function improved the time it took to repack microsoft/fluentui from ~96 seconds to ~34 seconds, and slimmed the resulting pack’s size from 439 MiB down to just 160 MiB (source).

While this feature isn’t (yet) compatible with Git’s reachability bitmaps feature, you can try it out for yourself using either git repack’s or git pack-objects’s new --name-hash-version flag via the latest release.

[source]

Backfill historical blobs in partial clones

Have you ever been working in a partial clone and gotten this unfriendly output?

$ git blame README.md
remote: Enumerating objects: 1, done.
remote: Counting objects: 100% (1/1), done.
remote: Total 1 (delta 0), reused 0 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (1/1), 1.64 KiB | 8.10 MiB/s, done.
remote: Enumerating objects: 1, done.
remote: Counting objects: 100% (1/1), done.
remote: Total 1 (delta 0), reused 0 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (1/1), 1.64 KiB | 7.30 MiB/s, done.
[...]

What happened here? To understand the answer to that question, let’s work through an example scenario:

Suppose that you are working in a partial clone that you cloned with --filter=blob:none. In this case, your repository is going to have all of its tree, commit, and annotated tag objects, but only the set of blobs which are immediately reachable from HEAD. Put otherwise, your local clone only has the set of blobs it needs to populate a full checkout at the latest revision, and loading any historical blobs will fault in the missing objects from wherever you cloned your repository.

In the above example, we asked for a blame of the file at path README.md. In order to construct that blame, however, we need to see every historical version of the file in order to compute the diff at each layer to figure out whether or not a revision modified a given line. But here we see Git loading in each historical version of the object one by one, leading to bloated storage and poor performance.

Git 2.49 introduces a new tool, git backfill, which can fault in any missing historical blobs from a --filter=blob:none clone in a small number of batches. These requests use the new path-walk API (also introduced in Git 2.49) to group together objects that appear at the same path, resulting in much better delta compression in the packfile(s) sent back from the server. Since these requests are sent in batches instead of one-by-one, we can easily backfill all missing blobs in only a few packs instead of one pack per blob.

After running git backfill in the above example, our experience looks more like:

$ git clone --sparse --filter=blob:none git@github.com:git/git.git
[...] # downloads historical commits/trees/tags
$ cd git
$ git sparse-checkout add builtin
[...] # downloads current contents of builtin/
$ git backfill --sparse
[...] # backfills historical contents of builtin/
$ git blame -- builtin/backfill.c
85127bcdeab (Derrick Stolee 2025-02-03 17:11:07 +0000 1) /* We need this macro to access core_apply_sparse_checkout */
85127bcdeab (Derrick Stolee 2025-02-03 17:11:07 +0000 2) #define USE_THE_REPOSITORY_VARIABLE
85127bcdeab (Derrick Stolee 2025-02-03 17:11:07 +0000 3)
[...]

But running git backfill immediately after cloning a repository with --filter=blob:none doesn’t bring much benefit, since it would have been more convenient to simply clone the repository without an object filter enabled in the first place. When using the backfill command’s --sparse option (the default whenever the sparse checkout feature is enabled in your repository), Git will only download blobs that appear within your sparse checkout, avoiding objects that you wouldn’t checkout anyway.

To try it out, run git backfill in any --filter=blob:none clone of a repository using Git 2.49 today!

[source, source]


  • We discussed above that Git uses compression powered by zlib when writing loose objects, or individual objects within packs and so forth. zlib is an incredibly popular compression library, and has an emphasis on portability. Over the years, there have been a couple of popular forks (like intel/zlib and cloudflare/zlib) that contain optimizations not present in upstream zlib.

The zlib-ng fork merges many of the optimizations from those forks, removes dead code and workarounds for historical compilers from upstream zlib, and places a further emphasis on performance. For instance, zlib-ng has support for SIMD instruction sets (like SSE2 and AVX2) built into its core algorithms. Though zlib-ng is a drop-in replacement for zlib, the Git project needed to update its compatibility layer to accommodate it.

In Git 2.49, you can now build Git with zlib-ng by passing ZLIB_NG when building with GNU Make, or the zlib_backend option when building with Meson. Early experimental results show a ~25% speed-up when printing the contents of all objects in the Git repository (from ~52.1 seconds down to ~40.3 seconds).

[source]

  • This release marks a major milestone in the Git project with the first pieces of Rust code being checked in. Specifically, this release introduces two Rust crates: libgit-sys, and libgit which are low- and high-level wrappers around a small portion of Git’s library code, respectively.

The Git project has long been evolving its code to be more library-oriented, doing things like replacing functions that exit the program with ones that return an integer and let the caller decide whether to exit, cleaning up memory leaks, etc. This release takes advantage of that work to provide a proof-of-concept Rust crate that wraps part of Git’s config.h API.

This isn’t a fully-featured wrapper around Git’s entire library interface, and there is still much more work to be done throughout the project before that can become a reality, but this is a very exciting step along the way.

[source]

  • Speaking of the “libification” effort, there were a handful of other related changes that went into this release. The ongoing effort to move away from global variables like the_repository continues, and many more commands in this release use the provided repository instead of using the global one.

This release also saw a lot of effort being put into squelching -Wsign-compare warnings, which occur when a signed value is compared against an unsigned one. This can lead to surprising behavior when comparing, say, negative signed values against unsigned ones, where a comparison like -1 < 2 (which should return true) ends up returning false instead.
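To see why, here is a small Python model (a hypothetical helper, not Git code) of C’s usual arithmetic conversions, under which the signed operand of a mixed signed/unsigned comparison is converted to unsigned first:

```python
def c_compare_lt(signed_val: int, unsigned_val: int, bits: int = 32) -> bool:
    """Model a C comparison between a signed and an unsigned integer:
    C converts the signed operand to the unsigned type (i.e. reduces it
    modulo 2**bits) before comparing."""
    return (signed_val % (1 << bits)) < unsigned_val

# In ordinary arithmetic -1 < 2 is true, but after conversion -1
# becomes 4294967295, so the C-style comparison is false:
surprising = c_compare_lt(-1, 2)
```

This is exactly the class of bug that -Wsign-compare flags at compile time.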

Hopefully you won’t notice these changes in your day-to-day use of Git, but they are important steps along the way to bringing the project closer to being able to be used as a standalone library.

[source, source, source, source, source]

  • Long-time readers might remember our coverage of Git 2.39 where we discussed git repack’s new --expire-to option. In case you’re new around here or could use a refresher, we’ve got you covered. The --expire-to option in git repack controls the behavior of unreachable objects which were pruned out of the repository. By default, pruned objects are simply deleted, but --expire-to allows you to move them off to the side in case you want to hold onto them for backup purposes, etc.

git repack is a fairly low-level command though, and most users will likely interact with Git’s garbage collection feature through git gc. In large part, git gc is a wrapper around functionality that is implemented in git repack, but up until this release, git gc didn’t expose its own command-line option to use --expire-to. That changed in Git 2.49, where you can now experiment with this behavior via git gc --expire-to!

[source]

  • You may have read that Git’s help.autocorrect feature is too fast for Formula One drivers. In case you haven’t, here are the details. If you’ve ever seen output like:
$ git psuh
git: 'psuh' is not a git command. See 'git --help'.

The most similar command is
push

…then you have used Git’s autocorrect feature. But its configuration options don’t quite match the convention of other, similar options. For instance, in other parts of Git, specifying values like “true”, “yes”, “on”, or “1” for boolean-valued settings all meant the same thing. But help.autocorrect deviates from that trend slightly: it has special meanings for “never”, “immediate”, and “prompt”, but interprets a numeric value to mean that Git should automatically run whatever command it suggests after waiting that many deciseconds.

So while you might have thought that setting help.autocorrect to “1” would enable the autocorrect behavior, you’d be wrong: it will instead run the corrected command before you can even blink your eyes¹. Git 2.49 changes the convention of help.autocorrect to interpret “1” like other boolean-valued settings, and positive numbers greater than 1 as it would have before. While you can’t specify that you want the autocorrect behavior in exactly 1 decisecond anymore, you probably never meant to anyway.
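As a config sketch, the post-2.49 semantics described above could be exercised like this (an illustrative gitconfig fragment, not an exhaustive list of values):

```ini
[help]
	# Git 2.49+: "1" simply enables autocorrect, like "true"/"yes"/"on".
	# autocorrect = 1

	# Numbers greater than 1 keep the old meaning: run the suggested
	# command after that many deciseconds (20 = wait two seconds).
	autocorrect = 20

	# Named modes are unchanged: never / immediate / prompt.
	# autocorrect = prompt
```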

[source, source]

  • You might be aware of git clone’s --branch option, which allows you to clone a repository’s history leading up to a specific branch or tag instead of the whole thing. This option is often used in CI farms when they want to clone a specific branch or tag for testing.

But what if you want to clone a specific revision that isn’t pointed at by any branch or tag in the repository? Prior to Git 2.49, the only thing you could do was initialize an empty repository, add the repository you’re fetching from as a remote, and then fetch the specific revision.

Git 2.49 introduces a convenient alternative to round out --branch‘s functionality with a new --revision option, which fetches history leading up to the specified revision, regardless of whether or not there is a branch or tag pointing at it.
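Usage might look like the following, where the URL and object ID are placeholders rather than real values:

```shell
# Clone only the history leading up to one specific commit; no branch
# or tag needs to point at it.
$ git clone --revision=<commit-oid> https://example.com/repo.git
```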

[source]

  • Speaking of remotes, you might know that the git remote command uses your repository’s configuration to store the list of remotes that it knows about. You might not know that there were actually two different mechanisms which preceded storing remotes in configuration files. In the very early days, remotes were configured via separate files in $GIT_DIR/branches (source). A couple of weeks later, the convention changed to use $GIT_DIR/remotes instead of the /branches directory (source).

Both conventions have long since been deprecated and replaced with the configuration-based mechanism we’re familiar with today (source, source). But Git has maintained support for them over the years as part of its backwards compatibility. When Git 3.0 is eventually released, these features will be removed entirely.

If you want to learn more about Git’s upcoming breaking changes, you can read all about them in Documentation/BreakingChanges.adoc. If you really want to live on the bleeding edge, you can build Git with the WITH_BREAKING_CHANGES compile time switch, which compiles out features that will be removed in Git 3.0.

[source, source]

  • Last but not least, the Git project had two wonderful Outreachy interns who recently completed their projects! Usman Akinyemi worked on adding support to include uname information in Git’s user agent when making HTTP requests, and Seyi Kuforiji worked on converting more unit tests to use the Clar testing framework.

You can learn more about their projects here and here. Congratulations, Usman and Seyi!

[source, source, source, source]

🎉 Join us at Git Merge 2025

Graphic promoting Git Merge 2025

To celebrate Git’s 20th anniversary, we’re hosting Git Merge 2025 at GitHub HQ in San Francisco on September 29–30, 2025. It’s a conference dedicated to the version control tool that started it all—and the people who use it every day. The call for speakers is open until May 13, so if you’ve got a great Git story to tell, we’d love to hear it.

The full schedule will be available in July. See you there!

The rest of the iceberg

That’s just a sample of changes from the latest release. For more, check out the release notes for 2.49, or any previous version in the Git repository.


  1. It’s true. It takes humans about 100-150 milliseconds to blink their eyes, and setting help.autocorrect to “1” will run the suggested command after waiting only 100 milliseconds (1 decisecond). 

The post Highlights from Git 2.49 appeared first on The GitHub Blog.

Git security vulnerabilities announced https://github.blog/open-source/git/git-security-vulnerabilities-announced-5/ Tue, 14 Jan 2025 18:04:36 +0000 https://github.blog/?p=82019 A new set of Git releases were published to address a variety of security vulnerabilities. All users are encouraged to upgrade. Take a look at GitHub’s view of the latest round of releases.


Today, the Git project released new versions to address a pair of security vulnerabilities, CVE-2024-50349 and CVE-2024-52006, that affect all prior versions of Git.

CVE-2024-50349

When Git needs to fill in credentials interactively without the use of a credential helper, it prints out the hostname and asks the user to fill in the appropriate username/password pair for that host. However, Git prints out the hostname after URL-decoding it. This allows an attacker to craft URLs containing ANSI escape sequences that may be used to construct an intentionally misleading prompt. The attacker may then tweak the prompt to trick a user into providing credentials for a different Git host back to the attacker.

[source]

CVE-2024-52006

When using a credential helper (as opposed to asking the user for their credentials interactively as above), Git uses a line-based protocol to pass information between itself and the credential helper. A specially-crafted URL containing a carriage return can be used to inject unintended values into the protocol stream, causing the helper to retrieve the password for one server while sending it to another.

This vulnerability is related to CVE-2020-5260, but relies on behavior where single carriage return characters are interpreted by some credential helper implementations as newlines.

[source]

Upgrade to the latest Git version

The most effective way to protect against these vulnerabilities is to upgrade to Git 2.48.1. If you can’t upgrade immediately, reduce your risk by taking the following steps:

  • Avoid running git clone with --recurse-submodules against untrusted repositories.
  • Avoid using the credential helper by only cloning publicly available repositories.

In order to protect users against attacks related to these vulnerabilities, GitHub has taken proactive steps. Specifically, we have scheduled releases of GitHub Desktop (CVE-2025-23040), Git LFS (CVE-2024-53263), and Git Credential Manager (CVE-2024-50338) for today, January 14, that prevent exploitation of these vulnerabilities.

GitHub has also proactively patched our products that were affected by similar vulnerabilities, including GitHub Codespaces and the GitHub CLI.


CVE-2024-50349 and CVE-2024-52006 were both reported by RyotaK. The fixes for both CVEs were developed by Johannes Schindelin, with input and review from members of the private git-security mailing list.

The post Git security vulnerabilities announced appeared first on The GitHub Blog.
