<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Matteo Bernardini]]></title><description><![CDATA[Finding my path between Music & Computer Science]]></description><link>https://teobe.substack.com</link><image><url>https://substackcdn.com/image/fetch/$s_!UjEY!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3d1f621-bca1-4d21-a75b-fe67e44cb4d7_1200x1200.png</url><title>Matteo Bernardini</title><link>https://teobe.substack.com</link></image><generator>Substack</generator><lastBuildDate>Tue, 14 Apr 2026 11:45:20 GMT</lastBuildDate><atom:link href="https://teobe.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Matteo Bernardini]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[teobe@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[teobe@substack.com]]></itunes:email><itunes:name><![CDATA[Matteo Bernardini]]></itunes:name></itunes:owner><itunes:author><![CDATA[Matteo Bernardini]]></itunes:author><googleplay:owner><![CDATA[teobe@substack.com]]></googleplay:owner><googleplay:email><![CDATA[teobe@substack.com]]></googleplay:email><googleplay:author><![CDATA[Matteo Bernardini]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[git-annex: Version Control of Large Files in git]]></title><description><![CDATA[Why git-annex is the most flexible solution for handling large files in git, instead of LFS? 
Because of its distributed nature and support for a multitude of "special remotes" for storing your large files like AWS S3, Google Drive, Dropbox, etc.]]></description><link>https://teobe.substack.com/p/git-annex-version-control-of-large</link><guid isPermaLink="false">https://teobe.substack.com/p/git-annex-version-control-of-large</guid><dc:creator><![CDATA[Matteo Bernardini]]></dc:creator><pubDate>Tue, 29 Jul 2025 05:07:57 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/cb0b3a64-db39-4ccd-8935-c6e3349420d0_6017x4003.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2><strong>Background and motivation</strong></h2><p><code>git</code> is the most commonly used version control system for software development<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>, also thanks to popular central hubs like GitHub, GitLab, BitBucket, etc.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> which greatly simplify collaboration for OSS projects or provide a central place for proprietary software development in IT companies.</p><p>There are scenarios in software development that may require or can benefit from versioning <em>large</em> files into the development cycle, for example:</p><ul><li><p>game development (graphics, soundtracks or sound effects, video clips, etc.)</p></li><li><p>machine learning (training data, processed data, model networks, etc.)</p></li><li><p>audio processing applications (audio plugins, eg. 
VSTs, sound assets, etc.)</p></li></ul><p>Sourcing large files or vendoring external assets in the same repository where the rest of the codebase lives can greatly reduce integration friction and simplify deployment reproducibility, since all source code and related asset updates can be kept in sync and committed together<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a>.</p><p>However, while git excels at handling thousands of small source text files, its design struggles when it comes to handling large binary files. There have been attempts at improving performance in this respect<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a>, but git remains a suboptimal choice for <em>large file management</em> when used alone.</p><p>External solutions have thus been developed to handle large files by extending git's abilities. Their common principle of operation relies on git's filter mechanism, specifically the &#8220;smudge/clean interface&#8221;, which automatically swaps between the <em>actual</em> file at tree checkout and a replacement &#8220;pointer file&#8221; that gets stored into git&#8217;s database at commit time. The actual large files get offloaded and managed separately by the git extension, essentially creating a side-versioning system to git for large files only. Using this approach, dealing with large files becomes almost transparent for the user, who can continue to follow their usual git versioning workflow.</p><p>The two most popular options are <a href="https://git-lfs.com/">Git LFS</a> (developed by GitHub, since 2014) and <a href="https://git-annex.branchable.com/">git-annex</a> (developed by Joey Hess, since 2010)<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a>. 
I want to focus on and promote git-annex over LFS for the following reasons:</p><ul><li><p>git-annex predates LFS and, since both projects are still actively maintained, is the more battle-tested of the two<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a>.</p></li><li><p>git-annex is more flexible than LFS when it comes to <em>where</em> to store large files and <em>when</em> to retrieve them. LFS behaves more <em>automagically</em><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a>, and while this may make it easier to use at the beginning, it becomes a hard wall when more flexibility is needed. git-annex may need more tuning, but its flexibility pays off in the long term.</p></li><li><p>git-annex embraces the distributed nature of git, preventing any lock-in to a specific remote or hosting provider, and instead allowing data replication among clones and "special remotes" at will<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-8" href="#footnote-8" target="_self">8</a>. 
LFS is tightly integrated into the most common git hosting providers (GitHub, GitLab, Bitbucket) which also provide storage for your large files, enforcing a centralised architecture and binding you to their pricing policies<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-9" href="#footnote-9" target="_self">9</a>.</p></li><li><p>git-annex&#8217;s multitude of "special remotes"<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-10" href="#footnote-10" target="_self">10</a> allows for more flexible configurations, including storing files in AWS S3, Google Drive, Dropbox, your external hard drive, you name it. Among them, it also has an LFS backend, making migration seamless.</p></li></ul><h2><strong>Introduction to git-annex</strong></h2><p>Before diving into a concrete example of how to set up git-annex for a project, let me introduce a foundation for how to understand git-annex in 2025. Specifically, this article is based on <code>git v2.39.5</code> and <code>git-annex v10.20250416</code>, so be mindful of features and deprecations in case you read this article far in the future.</p><p>git-annex is a relatively complex system, and it has gone through several <em>best practices</em> over the years, mostly to accommodate specific file-system constraints (e.g. support for symlinks) and improvements in git hooks (e.g. &#8220;direct mode&#8221;, now deprecated in favour of &#8220;unlocked files&#8221;). Moreover, Joey attempted to promote git-annex as a mechanism for cloud storage (<em>a-la-Dropbox</em>), introducing a bunch of commands that behave so <em>automagically</em> that they even bypass vanilla git behaviour (i.e. 
auto-commit, auto-pull, auto-push, auto-merge, etc.), so you may want to stay away from <code>git annex webapp</code>, <code>git annex assistant</code> and relatives &#8212; <code>git annex sync</code> is probably as automatic as you want to go (but see the configuration and caveats <a href="https://teobe.substack.com/i/167150783/initial-setup">in the following section</a>).</p><p>If however we put the &#8220;cloud storage&#8221; interface aside, git-annex is pretty straightforward and robust when it comes to its lower-level commands, which are probably the only commands you ever need to care about:</p><ul><li><p><code>git annex add</code></p></li><li><p><code>git annex get</code></p></li><li><p><code>git annex drop</code></p></li><li><p><code>git annex copy</code></p></li><li><p><code>git annex move</code></p></li></ul><p>In git-annex lingo, large files are referred to as "annexed files", since the rules to define what goes to <em>regular</em> git and what goes to the <em>annex</em> are arbitrary and totally up to you<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-11" href="#footnote-11" target="_self">11</a>, so for the rest of the article I'll simply refer to them as "annexed files".</p><h3>How are annexed files stored</h3><p>git-annex offloads annexed files for a repository by placing them in its <code>.git/annex/objects/</code> directory according to their hash (i.e. their <em>key</em>)<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-12" href="#footnote-12" target="_self">12</a>. This is convenient, because it means that a <em>git-annex-enabled</em> repository is <strong>self-contained</strong>: you can move it around in the file system without breaking its integrity.</p><p>Another implication of this model is that annexed files get <em>attached</em> to a clone. 
This is in contrast with Git LFS, where there&#8217;s a single <em>source-of-truth</em> for large files which would be the central repository (e.g. GitHub), while local copies are usually considered as a <em>cache</em>.</p><p>In git-annex, each clone is treated equally, and annexed files can be copied or moved among clones arbitrarily: this is achieved by assigning a uuid to each clone and keeping a log of which clone <em>uuid</em> contains which annexed file <em>key</em> via a special <em>orphan</em> branch<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-13" href="#footnote-13" target="_self">13</a> named &#8220;git-annex&#8221;. Any given clone can thus have a set of &#8220;present&#8221; and &#8220;missing&#8221; keys.</p><p>The default policy for git-annex is to guarantee that <strong>at least one clone</strong> in the network contains any given annexed file, so that <em>dropping</em> files from a clone can be done safely by checking the &#8220;git-annex&#8221; branch and explicitly communicating with available remotes to ensure the file to be dropped is replicated elsewhere<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-14" href="#footnote-14" target="_self">14</a>.</p><p>As with regular git, the contents stored in the internal database (as objects) and the contents of the <em>checked-out</em> working tree are independent. In git-annex&#8217;s case, the working tree can reference the annexed object in two ways: as a &#8220;locked&#8221; file or as an &#8220;unlocked&#8221; file.</p><ul><li><p><strong>locked form</strong>: the committed tree is saved as an absolute symlink that starts with <code>/annex/objects/...</code>, while the checked out file is a <strong>relative</strong> symlink that points to the file inside of the <code>.git/annex/objects/</code> directory. If the annexed file is not <em>present</em> in the current clone, then the symlink appears <em>broken</em> on checkout. 
Since the file inside of the objects directory is write-protected, this mechanism creates a &#8220;read-only&#8221; access to annexed files (which is why it&#8217;s called &#8220;locked form&#8221;)<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-15" href="#footnote-15" target="_self">15</a>.</p></li><li><p><strong>unlocked form</strong>: the committed tree is still saved as an absolute symlink like above, but its file mode bits are set differently to differentiate the form, while the checked out file is a <strong>copy</strong> of the file inside of the <code>.git/annex/objects/</code> directory, allowing you to perform edits over the file (which can then get committed under a new key). If the annexed file is not <em>present</em> in the current clone, then you get a little text file on checkout containing the key of the annexed file, instead of a broken symlink, which can be a bit confusing but can behave more transparently for programs that don&#8217;t like broken symlinks.</p></li></ul><h2>Example case: sourcing trained ML models and VSTs for an audio application</h2><p>For simplicity, we'll start from a classic centralised configuration where developers work on their clones and sync back to a single repository representing the source of truth of the codebase. This is the typical scenario for projects hosted on GitHub or similar. As for annexed files instead, since GitHub doesn't provide support for annex storage, we'll use an S3 bucket &#8212; given the pricing is quite affordable, compared to GitHub's LFS storage plan &#8212; via the <a href="https://git-annex.branchable.com/special_remotes/S3/">S3 git-annex special remote</a>.</p><h3>Initial Setup</h3><p>To start using git annex in an existing or new repository, simply issue:</p><pre><code>git annex init</code></pre><p>This command will create the "git-annex" branch and assign a uuid to the current clone. 
Next come a few <em>personal taste</em> configuration options that make git-annex <em>better behaved</em> with respect to a traditional git workflow:</p><pre><code># for git annex add / git add
git annex config --set annex.addunlocked true
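# optional check, added here as an illustration: read back the value stored in the git-annex branch
git annex config --get annex.addunlocked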

# for git annex sync
git annex config --set annex.autocommit false
git annex config --set annex.synccontent true
git annex config --set annex.synconlyannex true
git annex config --set annex.resolvemerge false</code></pre><p>These options only need to be configured once, since they will be recorded in the &#8220;git-annex&#8221; branch and be applied to every clone. Let me explain what they do:</p><ul><li><p>By default, annexed files are added in <em>locked</em> form (i.e. as symbolic links, as <a href="https://teobe.substack.com/i/167150783/how-are-annexed-files-stored">described earlier</a>). Some software may misbehave when symlinks are involved (especially if symlinks are broken, i.e. when annexed files are missing from the clone), so you may get more <em>transparent</em> behaviour when using &#8220;unlocked&#8221; files instead (which become simple small text files when annexed files are missing). The only tradeoff is that your data would get duplicated between the annex database and the working tree &#8212; but this is also true for any other file in a regular git repository (one blob in the git database, one file in the working tree) &#8212; so annexed files will take twice the amount of storage unless you use a CoW file-system (e.g. btrfs, zfs, apfs).</p></li><li><p><code>git annex sync</code> is the easiest command to let git-annex manage itself (i.e. sync the git-annex branch) and transfer files among remotes as needed. You may find it surprising, however, that it comes with a bunch of questionable default behaviours, specifically:</p><ul><li><p>if your working tree is dirty, git annex sync will auto-commit your changes with an auto-generated commit message &#8212; you may prefer to have complete control over what you commit.</p></li><li><p>despite the command name (sync), annexed files don&#8217;t get transferred between repositories (this however may change in a future version of git-annex<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-16" href="#footnote-16" target="_self">16</a>)</p></li><li><p>sync will push/pull your checked out branch (!) 
and create a bunch of <code>synced/&#8230;</code> prefixed branches for each existing branch in your repository &#8212; <code>synconlyannex</code> is a <strong>must</strong> to make git-annex care only about the annex and not interfere with your traditional <code>git</code> workflow in any other way.</p></li><li><p>similarly, <code>git annex sync</code> will attempt to automatically resolve merge conflicts involving annexed files by adding variants of the files to the working tree rather than keeping your index in a &#8220;conflict&#8221; state (that you can later manage by hand) &#8212; to be honest, I haven&#8217;t come across a conflicting scenario yet, so I&#8217;m unsure whether this option is better set on or off (feedback appreciated).</p></li></ul></li></ul><h3>Configure S3 remote</h3><p>The next step is to configure an S3 bucket to use for storing the annexed files in a central location, so that they can be shared with your colleagues. There&#8217;s no special requirement for creating the bucket, so I&#8217;ll skip the details (just use any infra workflow you&#8217;re already familiar with, e.g. sst, terraform, or even manually).</p><p>You will need a mechanism to export these environment variables: <code>AWS_ACCESS_KEY_ID</code>, <code>AWS_SECRET_ACCESS_KEY</code>, and <code>AWS_SESSION_TOKEN</code> (if you use temporary credentials), when interfacing with git-annex. I use this convenience script, but you can use any other compatible workflow:</p><pre><code># if you use AWS profiles, specify it here
export AWS_PROFILE="&lt;your_profile&gt;"

# check if session is still valid, and login otherwise
# note: enforcing code-based auth so we can login from a remote ssh device
if ! aws sts get-caller-identity &gt; /dev/null; then
    echo "Session expired, login again..."
    aws sso login --use-device-code
fi

# export AWS env vars to be used by every other tool depending on them (e.g. git-annex)
eval "$(aws configure export-credentials --format env)"
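# optional guard, not part of the original script: stop early if nothing was exported
# (uses `return` rather than `exit` because this file is meant to be sourced)
if [ -z "${AWS_ACCESS_KEY_ID:-}" ]; then
    echo "AWS credentials were not exported, check your profile and session"
    return 1
fi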
echo "AWS environment variables exported correctly"
echo "Your AWS session will expire: $(date -d "$AWS_CREDENTIAL_EXPIRATION")"</code></pre><p>which must be <em>sourced</em>, rather than invoked, like so:</p><pre><code>source aws-setup</code></pre><p><em>&#128161; Remember to run this command whenever you use a new shell or your credentials expire, otherwise git-annex operations that involve the S3 bucket will fail.</em></p><p>Let&#8217;s assume you have a bucket configured with the following properties:</p><pre><code><code>name:    acmecompany-myproject-annex
region:  eu-west-1</code></code></pre><p>To let git-annex store files in it, we&#8217;ll add an <em>S3 special remote</em>. This operation only needs to be done once, since the configuration will be stored in the &#8220;git-annex&#8221; branch (note: &#8220;s3storage&#8221; is an arbitrary name identifying the remote and it&#8217;s up to you).</p><pre><code>git annex initremote s3storage type=S3 encryption=none datacenter=eu-west-1 bucket=acmecompany-myproject-annex protocol=https partsize=200MiB signature=v4</code></pre><p>Additionally, to let <code>git annex sync</code> copy files automatically to the remote, and prevent colleagues from accidentally dropping files from the remote, we can enforce required content settings like so:</p><pre><code>git annex group s3storage backup
git annex required s3storage standard</code></pre><h3>Committing files into the annex</h3><p>Now that git-annex is configured and tamed, the easiest way to source large files into the annex is by explicitly adding them via:</p><pre><code>git annex add assets/lsp-plugins.vst3 assets/classifier.pt</code></pre><p>while any other <em>regular</em> file can be added to git in the usual way with:</p><pre><code>git add src/classifier.py</code></pre><p>and finally commit in the usual way:</p><pre><code>git commit -m "feat: updated classifier to run on band-passed audio via LSP"</code></pre><h3>Push &amp; Pull of commits and annexed files</h3><p>Remember that moving files around with git-annex is a manual operation, so simply running:</p><pre><code>git push</code></pre><p>will only push your git commit history to the remote (i.e. GitHub), regardless of whether the remote is git-annex-enabled or not. To explicitly <em>push</em> the files to the S3 bucket, you can run:</p><pre><code>git annex sync</code></pre><p>Similarly, when your colleagues want to integrate your changes, they&#8217;ll have to run the following three steps:</p><pre><code>git pull
git annex sync
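# optional, added as an illustration: list which repositories hold a file before fetching it
# (assets/classifier.pt is the example path committed earlier in this article)
git annex whereis assets/classifier.pt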
git annex get</code></pre><h3>Local management of annexed files</h3><p>The manual control of git-annex is extremely flexible. Say you&#8217;re working on a large monorepo containing several projects, each containing hundreds of MB of annexed files (e.g. test fixtures), but you only care about working on one specific project. Contrary to LFS, you&#8217;re not forced to an <em>all-or-nothing</em> situation, but instead can selectively <code>git annex get &lt;path/to/files&gt;</code> and <code>git annex drop &lt;path/to/files&gt;</code> according to your storage and network bandwidth constraints.</p><p>Moreover, since the annex is independent from your checked out branch, you may want to get rid of old versions of files that are not reachable by any branch anymore with <code>git annex find --unused</code> and <code>git annex drop --unused</code> (or, alternatively, <code>git annex move --unused --to s3storage</code>, in case the drop fails because files were not replicated).</p><p>In the scenario where you want to safely delete a clone, you can do the following:</p><pre><code># ensure all your commits are pushed
git push --all
git push --tags

# migrate all historical annexed files to S3, leaving the current clone with no annexed files left
git annex move -A --to s3storage
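# optional check, added as an illustration: list annexed files of the current branch still present here
# (expected to print nothing if the move above transferred everything)
git annex find --in here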

# propagate git-annex updates
git annex sync -g

# mark this clone uuid as dead to hide it from annex logs, etc.
git annex dead here

# finally delete the repository
cd .. &amp;&amp; rm -rf &lt;your-repo&gt;</code></pre><h3>Dealing with annexed files in CI</h3><p>git-annex assigns a uuid to each clone in order to map where annexed files are located. However, CI workflows usually run in a disposable environment, so we can avoid the overhead of letting git-annex track each one-time usage clone by adjusting the local git config <strong>before</strong> initialising git-annex on the CI cloned repo:</p><pre><code>git config annex.private true
git annex init
git annex enableremote s3storage
...</code></pre><p>And since we&#8217;re setting up CI, why not also check whether annexed files have been pushed to the S3 remote? This can be a handy PR check to ensure your colleagues pushed their annexed files. Here&#8217;s an example GitHub Workflow:</p><pre><code>name: "git-annex checks"

on:
  - "pull_request"

jobs:
  check-annexed-files:
    runs-on: ubuntu-latest
    timeout-minutes: 5

    steps:
      - name: Install git-annex
        uses: awalsh128/cache-apt-pkgs-action@latest
        with:
          packages: git-annex
          version: 1.0

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::&lt;your-company-id&gt;:role/GitHub
          aws-region: &lt;your-bucket-region&gt;

      - name: Checkout code
        uses: actions/checkout@v4
    
      - name: Setup git-annex
        run: |
          git config user.name github-actions
          git config user.email github-actions@github.com
          git config annex.private true
          git fetch origin git-annex
          git annex init
          git annex enableremote s3storage

      - name: Check annexed files in s3
        run: |
          missing=$(git annex find --not --in s3storage)
          if [ -n "$missing" ]; then
             echo "Un-pushed annexed files found:" "$missing"
             exit 1
          fi</code></pre><h2>Conclusions</h2><p>In this article I gave an overview of how to source &#8220;large files&#8221; in git thanks to git-annex, explored the basic internals of git-annex and walked through an example use case.</p><p>Hopefully, I&#8217;ve convinced you that git-annex is quite powerful and can enrich your DevOps workflow, without having to rely on Git LFS as the only option.</p><p>Let me know about your experience in the comments section and I can enrich the article accordingly :)</p><p>Thank you for reading!</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p><a href="https://rhodecode.com/blog/156/version-control-systems-popularity-in-2025">RhodeCode &#8212; Version Control Systems Popularity in 2025</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p><a href="https://gitprotect.io/blog/top-git-hosting-services/">GitProtect &#8212; Top Git Hosting Services for 2025</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p><a href="https://gamedev.stackexchange.com/questions/51649/source-control-for-storing-everything-of-game-project">Game Development (StackExchange) &#8212; Source control for storing everything of game project?</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p><a href="https://lwn.net/Articles/774125/">LWN.net &#8212; Large files with Git: LFS and 
git-annex</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p>First commit of git-annex: <a href="http://source.git-annex.branchable.com/?p=source.git;a=commit;h=91d319e849ca912e1ff77046cb277985db5844d3">91d319e</a></p><p>First commit of Git LFS: <a href="https://github.com/git-lfs/git-lfs/commit/d8f780329b64e789553bc8ccccfb993ebc430325">d8f7803</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><p>An extensive use of git-annex in the wild is via the DataLad project, which robustly sources terabytes of data for AI or Medical applications:</p><p>See: <a href="https://www.datalad.org/in-the-wild.html">DataLad in the wild</a>, <a href="https://openneuro.org/">OpenNeuro</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><p>See &#8220;Getting Started&#8221; section on <a href="https://git-lfs.com/">git-lfs.com</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-8" href="#footnote-anchor-8" class="footnote-number" contenteditable="false" target="_self">8</a><div class="footnote-content"><p><a href="https://git-annex.branchable.com/how_it_works/">git-annex &#8212; how it works</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-9" href="#footnote-anchor-9" class="footnote-number" contenteditable="false" target="_self">9</a><div class="footnote-content"><p><a href="https://stackoverflow.com/questions/32927704/how-to-specify-where-git-lfs-files-will-be-stored">StackOverflow &#8212; How to 
specify where Git LFS files will be stored?</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-10" href="#footnote-anchor-10" class="footnote-number" contenteditable="false" target="_self">10</a><div class="footnote-content"><p><a href="https://git-annex.branchable.com/special_remotes/">git-annex &#8212; special remotes</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-11" href="#footnote-anchor-11" class="footnote-number" contenteditable="false" target="_self">11</a><div class="footnote-content"><p><a href="https://git-annex.branchable.com/git-annex-matching-expression/">git-annex &#8212; git-annex-matching-expression</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-12" href="#footnote-anchor-12" class="footnote-number" contenteditable="false" target="_self">12</a><div class="footnote-content"><p>By default, git-annex uses SHA256 hashing, but the hashing back-end can be configured, even on a per-file basis (via git-annex matching expressions).</p><p>See: <a href="https://git-annex.branchable.com/backends/">git-annex &#8212; backends</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-13" href="#footnote-anchor-13" class="footnote-number" contenteditable="false" target="_self">13</a><div class="footnote-content"><p>An orphan branch is disconnected from any other branch in the repository and follows its own timeline. This effectively makes your git history become a graph (or more specifically, two disconnected graphs) rather than a tree. 
You can hide the <code>git-annex</code> branch from your git-log operations, then your git history will look exactly the same as usual.</p><p>See also: <a href="https://graphite.dev/guides/git-orphan-branches">Understanding orphan branches in Git</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-14" href="#footnote-anchor-14" class="footnote-number" contenteditable="false" target="_self">14</a><div class="footnote-content"><p>Beware however that if you forcefully delete a whole clone without ever copying its annexed files to other remotes, you can incur data loss. This is not a bug and is the same model as for any regular git repository, where you can lose commits (and hence trees and objects) if you never pushed them elsewhere. If you use a POSIX system, git-annex places permissions on its database directory to at least warn you if you issue a <code>rm -r</code> over a repository.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-15" href="#footnote-anchor-15" class="footnote-number" contenteditable="false" target="_self">15</a><div class="footnote-content"><p>But beware that this read-only access is weak, because you can totally re-link the path to a new file with the same name (depending on the nature of the file, your editor may even do so automatically). So &#8212; despite the name &#8212; don&#8217;t rely on this mechanism to create <em>unmodifiable</em> files.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-16" href="#footnote-anchor-16" class="footnote-number" contenteditable="false" target="_self">16</a><div class="footnote-content"><p><a href="https://git-annex.branchable.com/git-annex-sync/">git-annex &#8212; git-annex-sync</a></p></div></div>]]></content:encoded></item></channel></rss>