WCOW Base layer creation and export #901

Closed
TBBle wants to merge 3 commits into microsoft:main from TBBle:base-layer-manipulation

Conversation

@TBBle
Contributor

@TBBle TBBle commented Dec 2, 2020

Fixes: #853

Implementation of base-layer export and base-layer creation, intended to support FROM scratch non-bootable flows like #750, and to let the containerd snapshotter produce mountable, exportable layers which are not based on an existing MS-provided OCI image, so that the containerd snapshotter test suite can pass without significant changes.

It should work with bootable layers, or at least I've confirmed that passing a nanoserver tarball through these two new features produces something functionally-identical to the original nanoserver tarball.

So in theory, this could be used to "squash" several layers into a new single base layer, but I suspect that's not feasible for anyone except MS due to licensing requirements, and based on my learnings here, MS are clearly using a different tool internally to produce current container images.

Things I don't know, and will ignore unless told-otherwise:

  • Should bcd.bak and related backups be included in the exported tarstream, replacing the modified UtilityVM files they were backups of?
  • What does lack of a UtilityVM in a base layer actually mean?

Minimal base-layer export is working.

My test case for this stage is importing mcr.microsoft.com/windows/nanoserver:10.0.19042.630's base (and only) layer using wclayer import, then exporting it to a tarball and importing that tarball to a new directory.

Current differences/issues

No tests

It doesn't appear we have any wclayer tests, beyond some usage for wclayer.CreateScratchLayer on top of nanoserver or busyboxw, relying on Docker to provide the rest of the stack of images.

Certainly nothing testing import/export.

Files that are not round-tripping correctly: Probably working as intended, one open question.

Importing from my generated tarball ends up with different bcd.bak and bcd.log.bak compared to importing from the MS-built tarball. This is because the BCD in the tree is different after untarring, and the bcd.bak is a copy of the bcd from the original tarball.

❔ It's not clear if the exported tarball should invert the backup step, and redirect the BCD and BCD.LOG to use the bcd.bak and bcd.log.bak (etc) files that were saved during import. (Edit: See mutatedFiles and its usage. However, I don't think this is relevant to this PR anyway, as we're working at a lower layer than this.)

A bunch of other files differ, but they differ every time the same tarball is imported anyway.

And really, since my target here is not bootable images but FROM scratch for e.g., #750 or supporting containerd WCOW snapshots sanely, none of the being-changed files matter to me.

Details of differing files

Comparing a containerd import to the wclayer import of the same (MS-generated) tarball, the following files differ:

$ diff -ru ../containerd-data/root/io.containerd.snapshotter.v1.windows/snapshots/1 nanoorig/
Binary files ../containerd-data/root/io.containerd.snapshotter.v1.windows/snapshots/1/blank.vhdx and nanoorig/blank.vhdx differ
Binary files ../containerd-data/root/io.containerd.snapshotter.v1.windows/snapshots/1/blank-base.vhdx and nanoorig/blank-base.vhdx differ
Binary files ../containerd-data/root/io.containerd.snapshotter.v1.windows/snapshots/1/UtilityVM/Files/EFI/Microsoft/Boot/BCD and nanoorig/UtilityVM/Files/EFI/Microsoft/Boot/BCD differ
Binary files ../containerd-data/root/io.containerd.snapshotter.v1.windows/snapshots/1/UtilityVM/Files/EFI/Microsoft/Boot/BCD.LOG and nanoorig/UtilityVM/Files/EFI/Microsoft/Boot/BCD.LOG differ
Binary files ../containerd-data/root/io.containerd.snapshotter.v1.windows/snapshots/1/UtilityVM/SystemTemplate.vhdx and nanoorig/UtilityVM/SystemTemplate.vhdx differ
Binary files ../containerd-data/root/io.containerd.snapshotter.v1.windows/snapshots/1/UtilityVM/SystemTemplateBase.vhdx and nanoorig/UtilityVM/SystemTemplateBase.vhdx differ

We know that the BCD files are modified by the import process, and the rest are presumably generated on-the-fly and end up with timestamps inside them or something.

Comparing the wclayer import of the original tarball to the wclayer import of the tarball I generated with wclayer export, the following files differ:

Binary files nanoorig/bcd.bak and newnano/bcd.bak differ
Binary files nanoorig/bcd.log.bak and newnano/bcd.log.bak differ
Binary files nanoorig/blank.vhdx and newnano/blank.vhdx differ
Binary files nanoorig/blank-base.vhdx and newnano/blank-base.vhdx differ
Binary files nanoorig/UtilityVM/Files/EFI/Microsoft/Boot/BCD and newnano/UtilityVM/Files/EFI/Microsoft/Boot/BCD differ
Binary files nanoorig/UtilityVM/Files/EFI/Microsoft/Boot/BCD.LOG and newnano/UtilityVM/Files/EFI/Microsoft/Boot/BCD.LOG differ
Binary files nanoorig/UtilityVM/SystemTemplate.vhdx and newnano/UtilityVM/SystemTemplate.vhdx differ
Binary files nanoorig/UtilityVM/SystemTemplateBase.vhdx and newnano/UtilityVM/SystemTemplateBase.vhdx differ

And nailing down what all the BCD files are

$ sha256sum */bcd.bak */UtilityVM/Files/EFI/Microsoft/Boot/BCD
71b8db95fdc7339e00bc6ac87505d5cf21987fcdd171184a88e91836a884a83b *nanoorig/bcd.bak
6d6597ec802a43cb9b53b3bddb2e6eccd806abc2871cbb454495a3e38a087227 *newnano/bcd.bak
6d6597ec802a43cb9b53b3bddb2e6eccd806abc2871cbb454495a3e38a087227 *newnano2/bcd.bak
6d6597ec802a43cb9b53b3bddb2e6eccd806abc2871cbb454495a3e38a087227 *nanoorig/UtilityVM/Files/EFI/Microsoft/Boot/BCD
3546633c9b234e14b30fc06b24fd0757a474b36771382764db7db9f6a0de3a7e *newnano/UtilityVM/Files/EFI/Microsoft/Boot/BCD
f084cc86c3b06a96e497d354003b0436a6210b4412c26cf65f5bff3d262b6764 *newnano2/UtilityVM/Files/EFI/Microsoft/Boot/BCD

We can see that the bcd.bak that comes from importing my generated tarball is the BCD from the from-MS imported Utility VM; bcd.bak in the from-MS import is presumably the BCD that's actually inside the tarball.

This is an oddity of ociwclayer.ImportLayer. It backs up BCD and BCD.LOG*, but doesn't do anything with those backups; they are not referenced anywhere else in the hcsshim code base. I'm not sure if the intent was to copy them back after importing, or if they are supposed to be included in the export (i.e. invert the backup process).

File order is different: Only annoying when doing comparisons of the tar tv output.

I don't know what tool is being used to create the files published by MS, but clearly it's not filepath.Walk in Go, as the two disagree about the ordering of upper- and lower-case names. I suspect the most important thing here is that Files comes before UtilityVM. The order in which files are processed will affect hard-links in the tarball, but only cosmetically.

Hard links: Works for base layers, but not non-base layers (but the latter is unrelated to this PR).

Resolved, for this use-case. The layer export code now recognises hard-links on the filesystem, and adds them to the tar stream. The algorithm roughly matches the one given at the OCI Image Spec "Image Layer Filesystem Changeset" notes on hard-links.

The hard-links present in the nanoserver image are also present if that layer is reexported, but:

  • File order means a different file might be "the file" in the tarball, because the data is recorded against the first-seen name.
  • Somehow the nanoserver tarball from mcr contains hardlinks with different modtimes to their targets (observed in links from UtilityVM/Files/ to Files/). That information is lost when importing the layer (those details are a factor of the single file, not the multiple directory entries linking to it) so that's not possible to replicate. Nor desirable, I think. I'm not sure how that is possible in the first place, short of stale directory entries on the source system when creating the tarball.

In testing, I noticed that hard-links that exist in a non-base layer do not appear as hard-links when exporting, because modvmcompute.NewProc("ExportLayer") does not preserve hard links when copying the layer data into a temporary directory. It does not create BACKUP_LINK streams, and even if it did, go-winio's backuptar doesn't handle them anyway. It's not obvious if the BACKUP_LINK data format is well-defined but not documented, or is supposed to be application-defined, in which case the content would have to be part of ExportLayer's API and somehow make sense both intra-layer and inter-layer.

I also found that New-Item -ItemType HardLink didn't work to create cross-layer links (which makes sense, since I assume it wants to change the 'link count' on a file in a read-only layer), so I suspect that at this time, hcsshim will never export hard-links except when exporting a base layer (i.e. this new code). Maybe this used to work, since the code exists to handle both inter-layer and intra-layer hard links in tarballs during import.

Hard-link support also pulls in a fairly noisy Microsoft/go-winio bump which will resolvably conflict with #941, and since I was running go mod tidy as a matter of course, I think I inlined #942 in passing. If the latter lands first, this'll be less noisy, but the go-winio bump will still generate noise in both modules because it has bumped golang.org/x/sys quite a long way, which flows through into the vendor directory. (This was broken out into separate PRs)

Notes from before this was implemented

The tarball from MS for nanoserver's base layer includes lots of hardlinks, some between entries in Files, but most are from the UtilityVM/Files back to Files/. Not capturing these hard links when exporting the layer takes an uncompressed nanoserver base layer from about 280MB to about 430MB.

This one's a little intricate. winio.NewBackupFileReader doesn't generate winio.BackupLink, that's up to the code walking the tree, and I haven't yet worked out how, from Go, to actually identify that a file is a HardLink, which PowerShell can tell me, and track down the other links, which fsutil hardlinks can tell me. So the data's there, but the API has escaped my minimal searching. I am guessing I want BY_HANDLE_FILE_INFORMATION to tell me when something has hard-links, and match them by file index.

Dagnabbit: Go's os.File.Stat actually fetches exactly this structure, but doesn't grab the number of links, and the index values are not exported. So we'll be making syscalls of our own to achieve this. os.SameFile uses the indexes, but I can't see a good way to apply that here.

Also, github.com/Microsoft/go-winio/backuptar.WriteTarFileFromBackupStream doesn't handle winio.BackupLink anyway. So that would have to be implemented too, or intercept the call to backuptar.WriteTarFileFromBackupStream in ociwclayer.writeTarFromLayer like we do for deleted files. That's probably necessary anyway, as per the MS docs, the actual data in the winio.BackupLink object is up to the application. It would also mirror ociwclayer.writeLayerFromTar which similarly intercepts the call to writeBackupStreamFromTarAndSaveMutatedFiles for tar.TypeLink headers.

I did a quick test of just tarballing-up the tree using Windows-bundled tar, and it did find hardlinks correctly, although it disagreed about which one in the tarball was the "original", not that it matters. It also failed to include UtilityVM but did include UtilityVM\Files, but that could probably be fixed with a better command-line and more than the zero effort I put into this test:

tar cf simplenano.tar -C .\nanoorig Files UtilityVM\Files

No permission bits on the exported tarballs: This was already the case for the non-base layers.

backuptar.WriteTarFileFromBackupStream just doesn't have any way to set the permission bits on the tar headers it creates. I considered making it automatically apply 0644 for files and 0755 for directories to match the MS-generated tarballs, but (a) that's way over in go-winio, and is separate to this; and (b) I'm only 80% sure that won't invalidate every Windows Containers build-cache on the planet, and I don't think 20% is low-enough to risk having to change my name and go into hiding before an angry mob of CI/CD engineers comes looking for me.

Weirdly, the MS-built tarball doesn't have permission bits on the Files, UtilityVM, or UtilityVM/Files directories, which suggests that whatever is done to generate these tarballs, it's doing something fairly specialised and a bit unusual. And also suggests this isn't important.

Punted to microsoft/go-winio#184 for discussion/feedback.


Creating a base layer from some files on-disk is working.

This introduces a new API, hcsshim.ConvertToBaseLayer:

  • If any of the below files are missing, create them as empty files so hcsshim.ProcessBaseLayer will complete:
    • Files/Windows/System32/config/DEFAULT
    • Files/Windows/System32/config/SAM
    • Files/Windows/System32/config/SECURITY
    • Files/Windows/System32/config/SOFTWARE
    • Files/Windows/System32/config/SYSTEM
  • Checks that if UtilityVM/Files exists, it contains UtilityVM/Files/EFI/Microsoft/Boot/BCD, so that hcsshim.ProcessUtilityVMImage will complete (assuming the BCD is valid).
  • Runs ProcessBaseLayer and if necessary, ProcessUtilityVMImage
    • Basically, this is the part of (w *baseLayerWriter) Close() after reapplyDirectoryTimes.

I'm not sure when a UtilityVM is necessary. At the moment, I'm assuming it's only needed if the layer is used as the base of a container's scratch layer, and that's not the target use-case for this work.

As it happens, the test TestSCSIAddRemoveWCOW is a passable candidate for exercising this API, and particularly the use-case I'm targeting, as it creates scratch layers for attaching to a container as SCSI disks. Before this change, those scratch layers were based on the image used to create the container in the first place, i.e. nanoserver. With this API, it is much more similar to TestSCSIAddRemoveLCOW, which uses trivial empty EXT4 scratch layers.

All of the above could be handled by the hcsshim user, e.g., this was done (but never merged) by refactoring the containerd snapshot testsuite in containerd/containerd#2366 to use hcsshim.ProcessBaseLayer and fstest.Base, including defining fstest.Base in containerd/continuity in order to record an undocumented internal hcsshim requirement.

I propose instead to make containerd's snapshot Commit API call hcsshim.ConvertToBaseLayer at the appropriate point, so that callers of the snapshot system see no behavioural difference between Windows and non-Windows.

My fallback is to take all the code I wrote here, and add it to containerd instead, which of course means no one else benefits, and the knowledge needed to call hcsshim.Process* remains buried over in some non-hcsshim project.

Current differences/issues

Test-case TODO question

testutilities.CreateWCOWBlankRWLayer has a TODO:

// TODO: This is wrong. Need to search the folders.

It appears that it intends to search the provided layers to find one with a UtilityVM folder, but it's not clear why, as its current use-case is for scratch layers attached as SCSI devices via the Utility VM, which does not, as far as I can see, require a Utility VM.

It's a TODO here because before this change, the top layer has a Utility VM, but with this change, it does not.

The test passes so I think I'm okay, but it's possible there's a downlevel issue, as I'm testing on Windows 10 20H2.

cd tests
go test -v -tags "uvmscsi" -run TestSCSIAddRemoveWCOW  ./...

There is no step 3

Well, there is, but not here. Assuming that this work is completed and merged, I need to revendor into containerd, and rebase my work there (see containerd/containerd#4415 (comment)) to take advantage of this new API, and hopefully get the snapshot integration test suite to start passing, which creates arbitrary base layers in every test.

@TBBle force-pushed the base-layer-manipulation branch from 7b8341c to ecb0129 on December 6, 2020
@TBBle changed the title from "Base layer manipulation other than 'import'" to "WCOW Base layer creation and export" on December 6, 2020
@TBBle force-pushed the base-layer-manipulation branch 3 times, most recently from cff971b to 7a04cff on December 12, 2020
@TBBle marked this pull request as ready for review on December 12, 2020
@TBBle requested a review from a team as a code owner on December 12, 2020
@TBBle force-pushed the base-layer-manipulation branch from 2ed80aa to 269aad0 on December 19, 2020
@TBBle force-pushed the base-layer-manipulation branch from 269aad0 to 9f2e17d on January 2, 2021
@TBBle
Contributor Author

TBBle commented Jan 2, 2021

Per discussion in #916, the new API hcsshim.ConvertToBaseLayer might want to be exposed via the pkg/ tree instead. Although perhaps that is true of the entire layer.go API, which is already just thin bounces into internal/ anyway.

@TBBle
Contributor Author

TBBle commented Jan 2, 2021

Another API thought. Since this is currently being added to layer.go where almost all the APIs take info DriverInfo, layerId string rather than path string, should ConvertToBaseLayer be the same?

I didn't do this initially because the two current exceptions to this API structure are the functions the new API wraps, ProcessBaseLayer and ProcessUtilityVMImage.

@TBBle force-pushed the base-layer-manipulation branch from 9f2e17d to 014672b on January 10, 2021
@TBBle force-pushed the base-layer-manipulation branch from 014672b to 57a0abb on January 16, 2021
@slonopotamus
Contributor

> I will post a comment here if we need to make any changes and then we can merge this. Thanks for your patience.

@ambarve It looks like @TBBle is very patient; he's been waiting for half a year already. Maybe it's time to finally admit that the legal issue is not in fact an issue, because layer exporting does not redistribute anything to anyone, and move on with this PR, which currently blocks the BuildKit implementation for Windows containers?

@ambarve
Contributor

ambarve commented Mar 15, 2022

First of all, apologies for the delay in responses on this thread. We had reservations about the legal aspects of this PR for two reasons. First, the restrictions around creating (& publishing) empty scratch layers that are not bootable container layers. Most probably this won't be an issue but we are working with our internal teams to get a confirmation on it. I will try to provide an update here ASAP (hopefully within a week). Second, for the base layer export we still need to make sure that the exported base layer is correctly marked as a foreign layer (non-distributable layer) so that it won't be pushed to a repository. @TBBle can you please confirm if the base layers that are exported will be marked as foreign layers? Can you also please rebase this PR? Otherwise, this PR looks good and ready to be merged as soon as we have answers to both of these questions.

@TBBle
Contributor Author

TBBle commented Mar 16, 2022

There's nothing in what I have here that would mark an exported base layer as foreign, because the intended use-case is not for exporting layers that were imported from elsewhere, but for exporting layers built on 'scratch', which would not be marked foreign by default.

In fact, while you can re-export an imported base layer, you cannot export an imported non-base layer, i.e. if I pull a Windows Server image of any kind, I get the base layer plus a layer for the updates, but due to the on-disk storage, I can only export that first base layer. (This is not something I plan to solve, as it's intrinsic to HCS at the moment, and so tooling here can't really do anything about it. This only affects the HCS-driven layer export process, a generic diff of course can still re-export imported layers without Windows-specific metadata.)

hcsshim layer-handling has no concept of "foreign layer" at this time, that would have to be handled by (or in concert with) a caller like containerd; I'm not sure if containerd carries the 'foreign layer' concept down into the layer storage (Snapshotter) system, since that's an attribute of the layer reference in the manifest, not intrinsic to the layer itself, as we recently discovered in a BuildKit discussion on the same topic: See moby/buildkit#2561 and moby/buildkit#2555.

I haven't checked containerd, so I don't know if the Content Store even maintains the content-type of an object, and how it would handle seeing the same object (by hash) with different content types in different contexts. That BuildKit PR touches upon stargz-based layers which actually have a similar problem.

As an idea, a newly-created empty layer could have a marker that indicates it's safe for push, but it would still be on higher-level tools to recognise this annotation (it could be exposed by hcsshim import API) and apply the appropriate foreign content type when creating a manifest that references the layer.

Of course, all of this is only if one uses the hcsshim code and/or honours the resulting flags. There's no way to prevent someone exporting a layer into a manifest, editing the content type or ignoring some flag, and pushing it to a registry. After all, if one wanted to mark a layer as foreign that isn't currently (i.e. I built it myself, and don't want it repushed), it's the same process.

Technically, there's no requirement in OCI that foreign and non-foreign layers are actually interchangeable; so a marker could be included inside the tar stream to carry a "from-scratch" layer flag.

However, all discussion of flag files on-disk or in tar streams is complicated by the existence of both base and non-base layers without that flag, created by existing tools in the ecosystem.


Side-note: I don't think I'll have time to do the rebasing this week. My current repo is a mess as I haven't touched it in ages and last time I did, I was unsuccessfully trying to recreate a Windows Server 2019-seen issue in the containerd Snapshotter test suite. I also need to find a way to work out if TBBle@1803a21 is required for Windows Server 2022.

@TBBle
Contributor Author

TBBle commented Mar 26, 2022

Hmm. Rebase built and passed, but I guess we still aren't running the actual tests on CI, and I don't have access to a Windows Server 2022 machine to verify if TBBle@1803a21 is needed.

I'm not even sure if the current tests would trigger that case, I'll try and put together a test-case that does trigger it. I've relied on the containerd snapshotter test suite to test it before now, but that work needs a bunch more rebasing (and I know it has conflicts) so it makes more sense to add the tests here.

@TBBle
Contributor Author

TBBle commented Apr 18, 2022

Gaaah. So I finally think I found the reason I was having problems on RS5 in the containerd Snapshotter test suite, and it was me all along, leaking a reference to an opened file: https://github.com/microsoft/hcsshim/compare/0ce0a83127aa7ac90eae33b7b835781b70233333..3577daea1d5860867edce38a0129f8cd3cfafe45

Looks like somewhere after RS5 (and before 19042), this leaked reference stopped preventing deletion of the directory, which is why the error never reproduced for me locally. And any tests with the wclayer CLI binary closed the file when the owning process closed, so that never reproduced either. I'm actually surprised that the containerd test suite ever worked, so perhaps the leaked reference was only a problem for a short range of Windows versions: long enough to fail the integration tests I wrote here, and sometimes fail the containerd test suite, but not so long as to always fail the containerd test suite.

I also now have some integration tests for the base layer creation changes here which I plan to pull into this PR soon.

However, I suspect based on #901 (comment) that I'll need to split this PR into two, one for non-bootable base layer creation, where it sounds like the concern was minor:

> First the restrictions around creating (& publishing) empty scratch layers that are not bootable container layers. Most probably this won't be an issue but we are working with our internal teams to get a confirmation on it. I will try to provide an update here ASAP (hopefully within a week).

and one for base layer export where the concern was legal. In this latter case, the worst-case is that there's an existing ecosystem-wide legal issue that my PR has surfaced:

> Second, for the base layer export we still need to make sure that the exported base layer is correctly marked as a foreign layer (Non distributable layer) so that it won't be pushed to a repository.

My non-reassuring reply was focussed on this part.

For purposes of BuildKit functionality and (IIRC) the containerd snapshotter test suite, we don't actually need to export base layers, just be able to create them. And I reckon I could just punt 8778526 to bring us into that state, so this wouldn't be hard to do, if it's enough to get the rest of this PR merged.

I still need to write some import/export tests to validate if TBBle@1803a21 is needed on Server 2022 though, since that affects the base-layer creation code, not the base-layer export code.

@TBBle
Contributor Author

TBBle commented Apr 26, 2022

Latest rebase pushes the changes I've broken out into #1374 to the front, so that (assuming that lands soon) this PR is reduced to just those things stuck behind the legal concerns.

@brasmith-ms

Hey folks, jumping in here to provide an update and clear up some confusion from the Microsoft side. What held this up for so long were some reservations from our legal team on enabling an open-source tool to easily package and redistribute Windows binaries. With the secure supply chain executive order, and our general company policies on security, we are moving very cautiously on topics like these. With that said, I'm here to bring the good news that these issues have been resolved, and this PR is good to move forward.

I know it's been incredibly frustrating for the community here. I want to emphasize that we are here to support the community and do the best we can to make things easy for you. This definitely took longer than it should have and we're changing how things are done on our end to ensure it doesn't happen again. If you have concerns or questions, always feel welcome to reach out to me. (Brandon Smith (MS) on K8s Slack)

@ambarve would you be able to move this forward from here?

@TBBle
Contributor Author

TBBle commented Jul 20, 2022

Glad to hear that. I've clearly got some rebasing and updating to do, I'll try and make some time for that soon but I'm pretty snowed under at the moment. Maybe on the coming weekend I can flush the rebasing through, at least.

Due to #1375, I can't easily test this with the containerd Snapshotter test-suite as I did in the past. So I'll also have to pull-in and complete the tests I mentioned in April, and ensure everything's working on both LTSC 2019 and LTSC 2022 (although I'm confident it will be, since I squashed one bug and know of only one other likely stumbling point, for which I have a proposed fix). At least those platforms are available on GHA now.

@dcantah
Contributor

dcantah commented Jul 20, 2022

@TBBle I'm trying to resolve #1375 with this #1463 if you want to take a look. Still going back and forth on that

@TBBle
Contributor Author

TBBle commented Jul 23, 2022

Sorry, I haven't had a chance to look closely at #1463 yet. It shouldn't block this (unless an API I need changes...) as I think it's better to do tests in this repo anyway. I've rebased my existing work (it wasn't as bad as I feared; most of the big scary conflict warnings turned out to be delete/edit over test/vendor, and they're trivial to resolve.)

I've pushed my WIP tests over to https://github.com/TBBle/hcsshim/tree/layer-manipulation-tests and will perhaps go over them tonight or tomorrow, and try and get them clean-enough (and passing-enough, and high-coverage-enough) to pull into this PR, in which case I'd be confident this code is ready-to-go.

I see we now have a containerd integration test in the CI suite, but my Go practice is a little rusty, and I'm not sure if it will pick up the checked-out hcsshim in its containerd build, as I suspect it will prefer the vendored version. So once #1375 has been resolved in some way, if we teach the containerd integration test to build against the checked-out hcsshim, then we get the containerd snapshotter test suite also running for us, once that side lands in containerd/containerd#4419 or similar. (There's significant runway before that PR lands, which is another good reason to land this code here with its own tests.)

@dcantah
Contributor

dcantah commented Jul 23, 2022

> Sorry, I haven't had a chance to look closely at #1463 yet.

No worries at all. I found out I'm calling that FormatWritableLayerVhd function internally in a helper in the same package as well, which is rather annoying. Hoping to have a clearer picture this coming week on what we wanna do there.

> I see we now have a containerd integration test in the CI suite, but my Go practice is a little rusty, and I'm not sure if it will pick up the checked-out hcsshim in its containerd build as I suspect it will prefer the vendored version.

The containerd integration suite here should run against/build a shim from whatever is in the current PR. So containerd main + your branch for hcsshim.

@TBBle
Contributor Author

TBBle commented Jul 24, 2022

The problem for the integration test is that the code I'm working on isn't in the shim; it's an in-tree snapshotter in containerd linking against hcsshim directly. If it were an external snapshotter instead, that'd be a different question (and would also solve the manifesting blocker in passing). Although, as I recall, there isn't currently support for an external mounter in containerd, so we'd be a significant refactoring away from being able to decouple fully anyway, if that were a useful direction.

One option would be to programmatically add a replace rule to containerd's go.mod to use the local checkout of hcsshim for its in-tree work. That (done manually) is how my testing of the snapshotter worked in the containerd tree. That whole idea is somewhat flawed, though, as unlike the runtime shim, there isn't a stable API/ABI boundary for this part, so such tests would fail whenever hcsshim evolves its API in breaking ways, à la 0.9 and (I assume) 0.10.
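For reference, such a replace rule in containerd's go.mod would look something like this (the sibling-checkout path is illustrative, not from this PR):

```
replace github.com/Microsoft/hcsshim => ../hcsshim
```

This is what `go mod edit -replace` produces, and it only affects the local build, not downstream consumers.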

@dcantah
Contributor

dcantah commented Jul 24, 2022

@TBBle Ahh gotcha gotcha, you meant the containerd suite. Yea, it will still use whatever is in its go.mod for code where it uses hcsshim as a library. External snapshotter solving the manifesting issue is an interesting thought also... That's actually one of the suggestions we heard internally, but I'm not sure it makes sense for the windows snapshotter to not be built in.

@TBBle
Contributor Author

TBBle commented Jul 24, 2022

I also had "take the snapshotter external" as an option in the first "hcsshim ABI broke hard" ticket; in a green-field development, knowing what I know now, I'd have built the Windows snapshotter as an external snapshotter in this repo, since it parallels the external runtime shim and better isolates containerd from the vagaries of HCS/computestorage evolution. (And this is the direction the OS X containerd work was strongly encouraged to take by the containerd team, which is what put it on my radar at all; in that case, they also farmed off mounting by defining a CLI utility as the API boundary.)

But given the half-decade-or-so weight of code for the Windows Snapshotter already in containerd, as well as having access to the snapshotter test suite once we get that working, it'd be a time-costly and burden-shifting change that'd need more justification and stakeholder buy-in than I'm currently aware of.

TBBle added 3 commits August 11, 2022 01:26
This is the inverse of the baseLayerWriter: It walks Files/ and
UtilityVM/Files/ (if present) and ignores the rest of the layer data,
as it will be recreated when the layer is imported.

Signed-off-by: Paul "TBBle" Hampson <[email protected]>
This API allows turning any collection of files into a WCOW base layer.

It will create the necessary files in Files/ for
hcsshim.ProcessBaseLayer to function, validate the necessary files for
hcsshim.ProcessUtilityVMImage if UtilityVM/ exists, and then call those
two APIs to complete the process.

Calling this on a directory containing an untarred base-layer OCI
tarball gives a very similar outcome to passing the tar stream through
ociwclayer.ImportLayer.

The new API is used in `TestSCSIAddRemoveWCOW` to create nearly-empty
base layers for the scratch layers attached and removed from the utility
VM.

A wclayer command is also introduced: `makebaselayer` for testing and
validation purposes.

Signed-off-by: Paul "TBBle" Hampson <[email protected]>
@TBBle
Copy link
Contributor Author

TBBle commented Aug 11, 2022

Carrying back results from the now-unblocked testing with my containerd PR that depends on this work: it fails in what looks like #901 (comment), even on Windows Server 2019. So either something else has gone wrong, or a platform update has pulled in whatever change introduced that behaviour in later Windows builds.

So for this PR, I'll need to finish my existing test suite work, get it to the point where I can replicate that issue in this repo directly, and then rebase/reinstate/recreate TBBle@1803a21 into this PR as well, since it looks like it's necessary on all versions now. (A quick test suggests the cherry-pick will be easy; it didn't conflict anywhere except go.mod/go.sum and vendoring.)

Note to self: The repro case seen in containerd is the import from tar of an exported tarball (i.e. wclayer export -> wclayer import; the latter hits internal/wclayer.ImportLayer via pkg/ociwclayer.ImportLayerFromTar) so it won't be hard to repro in the tests.

Later edit: I had a few minutes to do the rebase of the "hives" commit (on a new branch, https://github.com/TBBle/hcsshim/tree/base-layer-manipulation-with-hives, to keep things separate) and although the code itself rebased fine, I noticed a couple of issues:

  • go mod vendor behaviour has changed, and the relevant data is no longer added to the vendor directory.
  • When pulling hcsshim into containerd, it pulled and vendored the hive-manipulation library.

This suggests I should rework the go generate code to just fetch the blob from GitHub. Anyway, the important thing is to confirm that this leaves us in a good state for the containerd snapshotter, or reveals some other bitrot or API drift that's gone unnoticed while that PR was blocked by time or circumstances.

@TBBle
Copy link
Contributor Author

TBBle commented Sep 27, 2022

Converted to draft as I'm currently moving countries, and hence don't have time or computing power for at least the next month (and maybe longer... we'll see) to rebase and resolve merge conflicts, let alone complete the work described in #901 (comment).

@gabriel-samfira
Copy link
Contributor

gabriel-samfira commented Sep 29, 2022

So for this PR, I'll need to finish my existing test suite work, and get it to the point where I can replicate that issue in this repo directly, and then rebase/reinstate/recreate TBBle@1803a21 into this PR as well, since it looks like it's all-versions necessary now. (A quick test suggests the cherry-pick will be easy, it didn't conflict anywhere except go.mod/go.sum and vendoring)

To generate the minimal hive, how about something like this:

package main

import (
	"fmt"
	"log"
	"os"
	"syscall"

	winio "github.com/Microsoft/go-winio"
	"golang.org/x/sys/windows/registry"
)

//go:generate go run golang.org/x/sys/windows/mkwinsyscall -output zsyscall_windows.go main.go
// Note: the third parameter of RegSaveKeyW is LPSECURITY_ATTRIBUTES (NULL here), not an access mask.
//sys   RegSaveKeyW(key syscall.Handle, file *uint16, securityAttributes uintptr) (regerrno error) = advapi32.RegSaveKeyW

func main() {
	// SeBackupPrivilege is required for RegSaveKeyW to succeed.
	privileges := []string{"SeBackupPrivilege"}
	if err := winio.EnableProcessPrivileges(privileges); err != nil {
		log.Fatal(err)
	}
	defer winio.DisableProcessPrivileges(privileges)

	k, existing, err := registry.CreateKey(registry.CURRENT_USER, "tempKey", registry.ALL_ACCESS)
	if err != nil {
		log.Fatal(err)
	}
	defer registry.DeleteKey(registry.CURRENT_USER, "tempKey")
	defer k.Close() // deferred calls run LIFO, so this closes the key before DeleteKey runs

	fmt.Println(k, existing)

	saveTo := "C:\\Users\\Administrator\\reg\\save.dat"
	if _, err := os.Stat(saveTo); err == nil {
		os.Remove(saveTo) // RegSaveKeyW fails if the target file already exists
	}

	if err := RegSaveKeyW(syscall.Handle(k), syscall.StringToUTF16Ptr(saveTo), 0); err != nil {
		log.Fatal(err)
	}
}

This exports an empty key. That exported file should be usable as a valid hive. The tempKey becomes the root of the new hive. I think we can also just embed the empty hive as a base64 string in the code if there are no licensing issues.

@riverar
Copy link

riverar commented Sep 29, 2022

Use the offline registry library functions instead https://learn.microsoft.com/windows/win32/devnotes/offline-registry-library-functions

@gabriel-samfira
Copy link
Contributor

Use the offline registry library functions instead https://learn.microsoft.com/windows/win32/devnotes/offline-registry-library-functions

That's a lot better! Thanks for the link!

@gabriel-samfira
Copy link
Contributor

@TBBle

Created a PR against your branch to generate empty hives using the offline registry library. I know you don't have time now, so have a look at your earliest convenience.

@TBBle
Copy link
Contributor Author

TBBle commented Mar 2, 2023

#1637, the successor for this PR, has merged.


Development

Successfully merging this pull request may close these issues.

Is it possible to create an empty scratch layer?