This section will help you get up and running with the Zarr library in Python to efficiently manage and analyze multi-dimensional arrays.
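The examples in this section run in a single Python session. The following setup code removes output left over from earlier runs, imports helpers used later on, and seeds NumPy's random number generator:

```python
import shutil

import numpy as np
from pprint import pprint
import io
import warnings

# Start from a clean output directory
shutil.rmtree('data', ignore_errors=True)

# Silence the warning emitted when Numcodecs codecs are used with Zarr format 3
warnings.filterwarnings(
    "ignore",
    message="Numcodecs codecs are not in the Zarr version 3 specification*",
    category=UserWarning
)

# Make the random example data reproducible
np.random.seed(0)
```

To get started, you can create a simple Zarr array:

```python
import zarr
import numpy as np

# Create a 2D Zarr array
z = zarr.create_array(
    store="data/example-1.zarr",
    shape=(100, 100),
    chunks=(10, 10),
    dtype="f4"
)

# Assign data to the array
z[:, :] = np.random.random((100, 100))
print(z.info)
```

Here, we created a 2D array of shape (100, 100), chunked into blocks of (10, 10), and filled it with random floating-point data. Because `store` was given as a path string, the array was written to a [zarr.storage.LocalStore][] in the `data/example-1.zarr` directory.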
Zarr supports data compression and filters. For example, to use Blosc compression:
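```python
# Create a 2D Zarr array with Blosc compression
z = zarr.create_array(
    store="data/example-2.zarr",
    shape=(100, 100),
    chunks=(10, 10),
    dtype="f4",
    compressors=zarr.codecs.BloscCodec(
        cname="zstd",
        clevel=3,
        shuffle=zarr.codecs.BloscShuffle.shuffle
    )
)

# Assign data to the array
z[:, :] = np.random.random((100, 100))
print(z.info)
```

This compresses each chunk with the Blosc codec, using the Zstandard (`zstd`) compressor at compression level 3. Byte shuffling is enabled: it reorders the bytes of each chunk so that the same byte position of every element is stored contiguously, which often improves compression of numeric data.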
Zarr allows you to create hierarchical groups, similar to directories:
```python
# Create nested groups and add arrays
root = zarr.group("data/example-3.zarr")
foo = root.create_group(name="foo")
bar = root.create_array(
    name="bar", shape=(100, 10), chunks=(10, 10), dtype="f4"
)
spam = foo.create_array(name="spam", shape=(10,), dtype="i4")

# Assign values
bar[:, :] = np.random.random((100, 10))
spam[:] = np.arange(10)

# Print the hierarchy
print(root.tree())
```

This creates a root group containing a subgroup (`foo`) and an array (`bar`); the subgroup `foo` in turn contains the array `spam`.
Zarr can also create a whole collection of arrays and groups with a single function call. Suppose we want to copy the structure of an existing hierarchy into a new storage backend:

```python
from pprint import pprint

# Create nested groups and add arrays
root = zarr.group("data/example-4.zarr", attributes={'name': 'root'})
foo = root.create_group(name="foo")
bar = root.create_array(
    name="bar", shape=(100, 10), chunks=(10, 10), dtype="f4"
)

# Collect the metadata of the root group ('') and of its members
nodes = {'': root.metadata} | {k: v.metadata for k, v in root.members()}

# Report nodes
pprint(nodes, width=60, depth=3)

# Create a new hierarchy in memory from the collected metadata
new_nodes = dict(zarr.create_hierarchy(store=zarr.storage.MemoryStore(), nodes=nodes))
new_root = new_nodes['']
assert new_root.attrs == root.attrs
```

Note that [zarr.create_hierarchy][] only initializes arrays and groups; copying array data must be done in a separate step.
Zarr supports persistent storage on disk or in cloud-compatible backends. While the examples above
used a [zarr.storage.LocalStore][], a number of other storage options are available.
Zarr can also work with cloud object storage such as Amazon S3 and Google Cloud Storage through fsspec-based libraries like s3fs or gcsfs:

```python
# s3fs must be installed for Zarr to open "s3://" URLs
import s3fs

z = zarr.create_array(
    "s3://example-bucket/foo",
    shape=(100, 100),
    chunks=(10, 10),
    dtype="f4",
    overwrite=True,  # replace any existing array at this location
)
z[:, :] = np.random.random((100, 100))
```

A single-file store can also be created using the [zarr.storage.ZipStore][]:
```python
# Store the array in a ZIP file
store = zarr.storage.ZipStore("data/example-5.zip", mode="w")
z = zarr.create_array(
    store=store,
    shape=(100, 100),
    chunks=(10, 10),
    dtype="f4"
)

# Write to the array
z[:, :] = np.random.random((100, 100))

# The ZipStore must be explicitly closed
store.close()
```

To open an existing array from a ZIP file:
```python
# Open the ZipStore in read-only mode
store = zarr.storage.ZipStore("data/example-5.zip", read_only=True)
z = zarr.open_array(store, mode='r')

# Read the data as a NumPy array
print(z[:])

# Release the underlying ZIP file
store.close()
```

Read more about Zarr's storage options in the User Guide.