Conversation
|
Review requested:
|
|
The
notable-change
Please suggest a text for the release notes if you'd like to include a more detailed summary, then proceed to update the PR description with the text or a link to the notable change suggested text comment. Otherwise, the commit will be placed in the Other Notable Changes section. |
|
Nice! This is a great addition. Since it's such a large PR, this will take me some time to review. Will try to tackle it over the next week. |
| */ | ||
| existsSync(path) { | ||
| // Prepend prefix to path for VFS lookup | ||
| const fullPath = this.#prefix + (StringPrototypeStartsWith(path, '/') ? path : '/' + path); |
| validateObject(files, 'options.files'); | ||
| } | ||
|
|
||
| const { VirtualFileSystem } = require('internal/vfs/virtual_fs'); |
There was a problem hiding this comment.
Shouldn't we import this at the top level / lazy load it at the top level?
| ArrayPrototypePush(this.#mocks, { | ||
| __proto__: null, | ||
| ctx, | ||
| restore: restoreFS, |
There was a problem hiding this comment.
| restore: restoreFS, | |
| restore: ctx.restore, |
nit
lib/internal/vfs/entries.js
Outdated
| * @param {object} [options] Optional configuration | ||
| */ | ||
| addFile(name, content, options) { | ||
| const path = this._directory.path + '/' + name; |
lib/internal/vfs/virtual_fs.js
Outdated
| let entry = current.getEntry(segment); | ||
| if (!entry) { | ||
| // Auto-create parent directory | ||
| const dirPath = '/' + segments.slice(0, i + 1).join('/'); |
lib/internal/vfs/virtual_fs.js
Outdated
| let entry = current.getEntry(segment); | ||
| if (!entry) { | ||
| // Auto-create parent directory | ||
| const parentPath = '/' + segments.slice(0, i + 1).join('/'); |
lib/internal/vfs/virtual_fs.js
Outdated
| } | ||
| } | ||
| callback(null, content); | ||
| }).catch((err) => { |
There was a problem hiding this comment.
| }).catch((err) => { | |
| }, (err) => { |
lib/internal/vfs/virtual_fs.js
Outdated
| const bytesToRead = Math.min(length, available); | ||
| content.copy(buffer, offset, readPos, readPos + bytesToRead); |
lib/internal/vfs/virtual_fs.js
Outdated
| } | ||
|
|
||
| callback(null, bytesToRead, buffer); | ||
| }).catch((err) => { |
There was a problem hiding this comment.
| }).catch((err) => { | |
| }, (err) => { |
|
Left an initial review, but like @Ethan-Arrowood said, it'll take time for a more in depth look |
|
It's nice to see some momentum in this area, though from a first glance it seems the design has largely overlooked the feedback from real world use cases collected 4 years ago: https://github.com/nodejs/single-executable/blob/main/docs/virtual-file-system-requirements.md - I think it's worth checking that the API satisfies the constraints that users of this feature have provided, to not waste the work that have been done by prior contributors to gather them, or having to reinvent it later (possibly in a breaking manner) to satisfy these requirements from real world use cases. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #61478 +/- ##
==========================================
- Coverage 91.64% 89.65% -2.00%
==========================================
Files 337 691 +354
Lines 140455 213208 +72753
Branches 21779 40576 +18797
==========================================
+ Hits 128716 191144 +62428
- Misses 11517 14175 +2658
- Partials 222 7889 +7667
🚀 New features to boost your workflow:
|
|
And why not something like OPFS aka whatwg/fs? const rootHandle = await navigator.storage.getDirectory()
await rootHandle.getFileHandle('config.json', { create: true })
fs.mount('/app', rootHandle) // to make it work with fs
fs.readFileSync('/app/config.json')OR const rootHandle = await navigator.storage.getDirectory()
await rootHandle.getFileHandle('config.json', { create: true })
fs.readFileSync('sandbox:/config.json')fs.createVirtual seems like something like a competing specification |
5e317de to
977cc3d
Compare
I generally prefer not to interleave with WHATWG specs as much as possible for core functionality (e.g., SEA). In my experience, they tend to perform poorly on our codebase and remove a few degrees of flexibility. (I also don't find much fun in working on them, and I'm way less interested in contributing to that.) On an implementation side, the core functionality of this feature will be identical (technically, it's missing writes that OPFS supports), as we would need to impact all our internal fs methods anyway. If this lands, we can certainly iterate on a WHATWG-compatible API for this, but I would not add this to this PR. |
|
Small prior art: https://github.com/juliangruber/subfs |
8d711c1 to
73c18cd
Compare
|
I also worked on this a bit on the side recently: Qard@73b8fc6 That is very much in chaotic ideation stage with a bunch of LLM assistance to try some different ideas, but the broader concept I was aiming for was to have a module.exports = new VirtualFileSystem(new LocalProvider())I intended for it to be extensible for a bunch of different interesting scenarios, so there's also an S3 provider and a zip file provider there, mainly just to validate that the model can be applied to other varieties of storage systems effectively. Keep in mind, like I said, the current state is very much just ideation in a branch I pushed up just now to share, but I think there are concepts for extensibility in there that we could consider to enable a whole ecosystem of flexible storage providers. 🙂 Personally, I would hope for something which could provide both read and write access through an abstraction with swappable backends of some variety, this way we could pass around these virtualized file systems like objects and let an ecosystem grow around accepting any generalized virtual file system for its storage backing. I think it'd be very nice for a lot of use cases like file uploads or archive management to be able to just treat them like any other readable and writable file system. |
just a bit off topic... but this reminds me of why i created this feature request: Would not lie, it would be cool if NodeJS also provided some type of static example that would only work in NodeJS (based on how it works internally) const size = 26
const blobPart = BlobFrom({
size,
stream (start, end) {
// can either be sync or async (that resolves to a ReadableStream)
// return new Response('abcdefghijklmnopqrstuvwxyz'.slice(start, end)).body
// return new Blob(['abcdefghijklmnopqrstuvwxyz'.slice(start, end)]).stream()
return fetch('https://httpbin.dev/range/' + size, {
headers: {
range: `bytes=${start}-${end - 1}`
}
}).then(r => r.body)
}
})
blobPart.text().then(text => {
console.log('a-z', text)
})
blobPart.slice(-3).text().then(text => {
console.log('x-z', text)
})
const a = blobPart.slice(0, 6)
a.text().then(text => {
console.log('a-f', text)
})
const b = a.slice(2, 4)
b.text().then(text => {
console.log('c-d', text)
})An actual working PoC(I would not rely on this unless it became officially supported by nodejs core - this is a hack) const blob = new Blob()
const symbols = Object.getOwnPropertySymbols(blob)
const blobSymbol = symbols.map(s => [s.description, s])
const symbolMap = Object.fromEntries(blobSymbol)
const {
kHandle,
kLength,
} = symbolMap
function BlobFrom ({ size, stream }) {
const blob = new Blob()
if (size === 0) return blob
blob[kLength] = size
blob[kHandle] = {
span: [0, size],
getReader () {
const [start, end] = this.span
if (start === end) {
return { pull: cb => cb(0) }
}
let reader
return {
async pull (cb) {
reader ??= (await stream(start, end)).getReader()
const {done, value} = await reader.read()
cb(done ^ 1, value)
}
}
},
slice (start, end) {
const [baseStart] = this.span
return {
span: [baseStart + start, baseStart + end],
getReader: this.getReader,
slice: this.slice,
}
}
}
return blob
}currently problematic to do: also need to handle properly clone, serialize & deserialize, if this where to be sent of to another worker - then i would transfer a MessageChannel where the worker thread asks main frame to hand back a transferable ReadableStream when it needs to read something. but there are probably better ways to handle this internally in core with piping data directly to and from different destinations without having to touch the js runtime? - if only getReader could return the reader directly instead of needing to read from the ReadableStream using js? |
Restore isSea as a C++ function (was incorrectly changed to a boolean property), fixing cctest failures where helpers.js calls isSea(). Align all files under lib/internal/modules/ with upstream/main so VFS introduces zero changes to the module loader internals.
sea.js was treating isSea as a boolean property, but the C++ binding was restored to expose isSea as a function. This mismatch caused getAssetKeys() to crash with a C++ assertion failure when called outside a SEA, because the JS guard `if (!isSeaFlag)` never triggered (a function reference is always truthy). Restore sea.js from upstream/main to use isSea() as a function call.
Use wrapModuleLoad instead of custom embedderRunCjs code when VFS is enabled. This gives proper CJS cycle handling and standard module loading behavior through the registered VFS hooks. - Restore embedderRunCjs to match upstream (no VFS-specific code) - Restore embedderRequire to use loadBuiltinModuleForEmbedder - Add VFS entry path in embedderRunEntryPoint using wrapModuleLoad with path.posix.join for cross-platform path construction - Auto-include main script as VFS asset during blob generation so wrapModuleLoad can find it at the VFS mount point - Expose mainCodePath from SEA binding to construct correct VFS path
Add loaderStat(), loaderReadFile(), and setLoaderFsOverrides() to helpers.js, and modify toRealPath() to support a VFS toggle. Replace direct internalFsBinding.internalModuleStat() and fs.readFileSync() calls in the CJS loader, ESM resolver, ESM loader, translators, and package_json_reader with these wrappers. The VFS module_hooks.js now calls setLoaderFsOverrides() first in installHooks(), making loader fs interception order-independent and eliminating conflicts with cached fs method references. Fix two pre-existing bugs in esm/resolve.js finalizeResolution(): - StringPrototypeEndsWith() was called with internalFsBinding as first arg instead of path - StringPrototypeSlice(path, -1) returned the last char instead of stripping the trailing slash (now correctly uses path, 0, -1) Existing fs patches for user-facing operations are kept unchanged.
- Remove redundant #getBaseName/#getParentPath from MemoryProvider, use pathPosix.basename/dirname directly - Remove redundant getBaseName/getParentPath/splitPath from router.js, keep only functions with non-trivial VFS-specific logic - Convert RealFSProvider constant getters (readonly, supportsSymlinks) to readonly properties via ObjectDefineProperty - Fix joinVFSPath/normalizeVFSPath to detect Windows drive-letter paths by checking for ':' at position 1 instead of checking for leading '/', so bare '/' is always treated as a POSIX VFS path - Update test-vfs-internals.js to match router.js export changes
Use path.resolve() instead of pathPosix.normalize() for VFS path normalization so mount points resolve correctly on Windows (e.g. /virtual -> C:\virtual). Use path.sep and path.relative() in router for cross-platform mount point matching. Remove normalizeVFSPath and joinVFSPath wrappers in favor of direct path utility calls. Update tests to use path.resolve()/path.normalize() for platform-portable assertions. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Remove blank lines between JSDoc description and @param tags (jsdoc/tag-lines) and remove unused `common` variable assignment (no-unused-vars).
Wrap modulesBinding.readPackageJSON(), getNearestParentPackageJSON(), getPackageScopeConfig(), and getPackageType() with toggleable overrides so that VFS-mounted package.json files are read from virtual storage instead of the real filesystem. This fixes syntax detection and error decoration that re-read package.json bypassing VFS. The implementation follows the existing loaderStat/loaderReadFile toggle pattern in helpers.js.
Remove the vfs-mount/vfs-unmount events emitted on process, as they are insufficient as a security control and a proper permission system is needed instead.
Add tests covering require() of ESM modules from VFS with package.json type detection (.js with type:module, nested directory walk-up, ESM-to-ESM imports) and .mjs extension-based ESM loading without type:module.
Add SEA config validation to error when useVfs is enabled together with useSnapshot, useCodeCache, or mainFormat: "module", as these combinations are not supported.
Instead of monkey-patching fs.readFileSync, fs.statSync, etc. at
runtime in module_hooks.js, add VFS handler guards directly inside the
fs method bodies. This fixes the destructuring problem where capturing
a reference before vfs.mount() would bypass VFS:
const { readFileSync } = require('fs');
vfs.mount('/app');
readFileSync('/app/file.txt'); // now works correctly
The approach:
- Add shared vfsState object and setVfsHandlers() in internal/fs/utils
- Add a null-check guard at the top of 9 sync methods in lib/fs.js
and 3 async methods in lib/internal/fs/promises.js
- Replace installFsPatches() with a handler object registered via
setVfsHandlers() — same logic, no monkey-patching
- Rename module_hooks.js to setup.js (no longer does monkey-patching)
When no VFS is active, the overhead is a single null comparison per
call. When the last VFS is unmounted, handlers are cleared to restore
the zero-overhead path.
Remove ~550 lines of duplicated module resolution functions that reimplemented canonical Node.js resolver behavior (extension trying, package exports resolution, bare specifier walking, directory entry resolution, format detection). The canonical resolvers already consult VFS through the loader overrides installed by installModuleLoaderOverrides(), making the Module.registerHooks() resolve/load hooks redundant. Also add clearStatCache() to the CJS loader, called on VFS unmount to prevent stale stat cache entries from causing incorrect resolution after a VFS is deregistered.
Previously only 9 sync methods in fs.js and 3 async methods in promises.js were intercepted, causing callback-based and promise-based operations on VFS paths to fall through to the real filesystem. Add interception for ~40 fs methods across sync, callback, promise, FD-based, and stream APIs. Add VirtualWriteStream, writeSync, rmSync, and rm to the VFS class. Document all intercepted methods in vfs.md.
Add 14 new test blocks covering symlink ops, callback write ops, FD write callbacks, string writeSync, ReadStream start/end options, stream open events, VFS property getters, rmSync force, and promises.rm recursive. Fix must-call-assert lint error in test-vfs-destructuring.js.
Add VFS interceptions for truncate, ftruncate, link, mkdtemp, opendir, openAsBlob, chmod, and utimes. Create VirtualDir class for opendir support. Fix cp/cpSync to bypass C++ fast paths (cpSyncCopyDir, cpSyncOverrideFile, cpSyncCheckPaths, internalModuleStat) when operating on VFS paths, since C++ bindings cannot see virtual files. Add chmod and utimes as no-ops for VFS paths since VFS does not track permissions or timestamps. Add hard link support to MemoryProvider. Add comprehensive tests for all new interceptions including cp and glob on VFS.
Intercept the remaining ~18 fs methods that still called C++ bindings directly when operating on VFS paths/FDs: Path-based no-ops (VFS doesn't track permissions/ownership/timestamps): - chown/chownSync, lchown/lchownSync, lutimes/lutimesSync FD-based no-ops (VFS FDs don't need sync/permissions/ownership): - fchmod/fchmodSync, fchown/fchownSync, futimes/futimesSync, fdatasync/fdatasyncSync, fsync/fsyncSync Real implementations: - statfs/statfsSync (return synthetic StatFs for VFS mount) - readv/readvSync (scatter read via readSync loop) - writev/writevSync (gather write via writeSync loop) Promise API: - lchmod, lchown, chown, lutimes, statfs Also moves the VFS check in promises.lchmod() before the O_SYMLINK gate so it works on Linux where O_SYMLINK is undefined.
Convert isVfsEnabled from a boolean property set during Initialize() to a SetMethod function, matching the isSea() pattern. This addresses review feedback requesting consistency in the C++ binding API.
|
@joyeecheung this should align more of what you are asking for. Thanks for the direct feedback because it seemed an endless chase on the line comments as no direction seemed to go well. (Sorry for the extra 2000 lines of code in this huge PR). |
I don't think this is a fair statement. @mcollina is obviously invested in working on this, is obviously intending to keep iterating, and is obviously someone we can trust to follow up. As someone who has had to read through multiple iterations of this massive PR multiple times now, I'd really appreciate if we can get this landed and iterate in smaller steps moving forward. |
| > Stability: 1 - Experimental | ||
|
|
||
| In addition to using the `node:sea` API to access individual assets, you can use | ||
| the Virtual File System (VFS) to access bundled assets through standard `node:fs` |
There was a problem hiding this comment.
Nit: the Virtual File System (VFS) here could be a link to that doc. But non-blocking
I believe so too, but unfortunately module loader has an unusual risk surface in terms of compatibility and maintainability; if the implementation largely reuses existing code (like it currently does), "having someone committed to follow up" may be enough; having a 3rd diverging implementation like this PR once did, however, IMO would pass threshold for human wills to be enough to mitigate the risk and necessitate gates like runtime flags to prevent a 3rd implementation from incurring even more maintenance issues, since 2 diverging duplicate implementations are already difficult to manage and there needs to be a line. |
joyeecheung
left a comment
There was a problem hiding this comment.
The new version is much better than before, thanks! The module loading part mostly LGTM other than two missing intercepts.
| setLoaderFsOverrides, | ||
| setLoaderPackageOverrides, | ||
| } = require('internal/modules/helpers'); | ||
| setLoaderFsOverrides({ |
There was a problem hiding this comment.
This is missing the internalBinding('fs') overrides (specifically legacyMainResolve and getFormatOfExtensionlessFile ).
There was a problem hiding this comment.
From a glance there seem to be other fs intercepts missing too: e.g. fsPromises.open, realpathSync.native/realpath.native, globSync/glob/fsPromises.glob. But since this is getting big and having missing intercepts is something we can fix later without too much technical debt (not that I can think of), I will just leave the comments here in case they got forgotten later; for the purpose of landing an initial implementation to follow up later, LGTM, and thanks bearing with my reviews :)
…xtensionlessFile
Both internalBinding('fs') methods bypass VFS fs-level overrides since
they use uv_fs_stat/read directly in C++. Route them through toggleable
wrappers in helpers.js so the VFS can handle paths under active mounts.
|
Seems the Am I the only one here concerned about the security implications of being able to shadow file system access across the entire thread from any loaded JS file? Malicious dependencies could easily hijack file loads in just a few lines. They could technically do that already with a bunch of patching of the file system module, but this API makes it basically trivial. I'll concede if TSC folks think this is outside our threat model, but even so it seems a bit troubling to me. 🤔 |
ESM legacyMainResolve is only triggered via bare specifier package resolution, not directory path imports. Use VFS node_modules with wrapper .mjs modules that re-export from bare specifiers to properly exercise the VFS legacyMainResolve override.
Somewhere in this thread I asked @RafaelGSS to verify and Rafael concludes that as long as it's in-memory it's okay. The VFS stuff only allows user code to change where the sys calls go to at most, but when it gets to actually invoking the sys calls, that will still be gated by permission model e.g. VFS allows redirecting Even then, note that our threat model states that permission models are...
|
A first-class virtual file system module (
node:vfs) with a provider-based architecture that integrates with Node.js's fs module and module loader.Key Features
Provider Architecture - Extensible design with pluggable providers:
MemoryProvider- In-memory file system with full read/write supportSEAProvider- Read-only access to Single Executable Application assetsVirtualProvider- Base class for creating custom providersStandard fs API - Uses familiar
writeFileSync,readFileSync,mkdirSyncinstead of custom methodsMount Mode - VFS mounts at a specific path prefix (e.g.,
/virtual), clear separation from real filesystemModule Loading -
require()andimportwork seamlessly from virtual filesSEA Integration - Assets automatically mounted at
/seawhen running as a Single Executable ApplicationFull fs Support - readFile, stat, readdir, exists, streams, promises, glob, symlinks
Example
SEA Usage
When running as a Single Executable Application, bundled assets are automatically available:
Public API
Disclaimer: I've used a significant amount of Claude Code tokens to create this PR. I've reviewed all changes myself.
Fixes #60021