This is now working, so I would appreciate some feedback on the design. The basic design is the same as what I outlined earlier in this PR: there are two new functions that take a

approach
Basically the same as concurrent group members listing, except we don't need any recursion. I'm scheduling writes and using

new functions
Implicit groups
Partial hierarchies like

streaming v2 vs v3 node creation
Creating v3 arrays / groups requires writing 1 metadata document, but v2 requires 2. To get the most concurrency I await the write of each metadata document separately, which means that

Overlap with metadata consolidation logic
There's a lot of similarity between the stuff in this PR and routines used for consolidated metadata. It would be great to find ways to factor out some of the overlap areas.

still to do:
That sounds fine as it's clear that the
in the interest of a narrow scope, I've limited the public api to just
dcherian
left a comment
Nice. The public API create_hierarchy looks nice to me.
test failure is unrelated to this PR (looks like an fsspec thing)
This PR adds a few routines for creating a collection of arrays and groups (i.e., a dict with path-like keys and ArrayMetadata / GroupMetadata values) in storage concurrently.

create_hierarchy takes a dict representation of a hierarchy, parses that dict to ensure that there are no implicit groups (creating group metadata documents as needed), then invokes create_nodes and yields the results.

create_nodes concurrently writes metadata documents to storage, and yields the created AsyncArray / AsyncGroup instances.

I still need to wire up concurrency limits, and test them.
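For reviewers who want something concrete, here is a rough sketch of the two steps described above. It is not the code in this PR. GroupDoc, ArrayDoc, fill_implicit_groups, create_nodes_sketch and the limit parameter are hypothetical names, the "store" is a plain dict, and using asyncio.as_completed plus a semaphore for the concurrency limit is an assumption about one way this could be wired up.

```python
import asyncio
from collections.abc import AsyncIterator
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class GroupDoc:
    """Hypothetical stand-in for GroupMetadata."""


@dataclass(frozen=True)
class ArrayDoc:
    """Hypothetical stand-in for ArrayMetadata."""

    shape: tuple[int, ...]


Node = Union[GroupDoc, ArrayDoc]


def fill_implicit_groups(nodes: dict[str, Node]) -> dict[str, Node]:
    """Return a copy of `nodes` with a GroupDoc added for every missing parent."""
    filled = dict(nodes)
    for path in nodes:
        parts = path.split("/")
        # Walk every proper prefix of the path: "a/b/c" -> "a", "a/b".
        for depth in range(1, len(parts)):
            parent = "/".join(parts[:depth])
            if isinstance(filled.get(parent), ArrayDoc):
                raise ValueError(f"{parent!r} is an array, so it cannot contain {path!r}")
            filled.setdefault(parent, GroupDoc())
    return filled


async def create_nodes_sketch(
    store: dict[str, Node], nodes: dict[str, Node], limit: int = 4
) -> AsyncIterator[tuple[str, Node]]:
    """Write every document concurrently and yield each node as its write finishes."""
    semaphore = asyncio.Semaphore(limit)  # assumed way to cap in-flight writes

    async def write(path: str, doc: Node) -> tuple[str, Node]:
        async with semaphore:
            await asyncio.sleep(0)  # placeholder for real storage I/O
            store[path] = doc
        return path, doc

    # The input is flat, so all writes can be scheduled up front without recursion.
    tasks = [asyncio.create_task(write(p, d)) for p, d in nodes.items()]
    for finished in asyncio.as_completed(tasks):
        yield await finished


async def demo() -> tuple[dict[str, Node], list[str]]:
    store: dict[str, Node] = {}
    hierarchy = fill_implicit_groups({"a/b/c": ArrayDoc(shape=(10,))})
    created = [path async for path, _ in create_nodes_sketch(store, hierarchy)]
    return store, created
```

Running asyncio.run(demo()) writes the array at a/b/c along with groups at a and a/b. Nodes are yielded in completion order, so callers should not rely on parents arriving before children. The sketch also skips the root group and the v2 case where each node needs two documents.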
TODO: