Python bindings and integrations for the excellent object_store crate.
The main idea is to provide a common interface to various storage backends, including the
object stores from most major cloud providers. The APIs are very focused and tailored
towards modern cloud-native applications, hiding away many features (and complexities)
encountered in full-fledged file systems.
Among the included backends are:
- Amazon S3 and S3 compliant APIs
- Google Cloud Storage Buckets
- Azure Blob Gen1 and Gen2 accounts (including ADLS Gen2)
- local storage
- in-memory store
The object-store-python package is available on PyPI and can be installed via

```shell
poetry add object-store-python
```

or using pip

```shell
pip install object-store-python
```

The main ObjectStore API mirrors the native object_store
implementation, with some slight adjustments for ease of use in Python programs.
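Conceptually, the put, get, get_range and copy calls used with ObjectStore treat a store as a flat mapping from path strings to whole byte blobs. The DictStore class below is a hypothetical, dict-backed stand-in written only to illustrate those semantics. It is not part of object-store-python, it assumes get_range returns `length` bytes starting at offset `start`, and it leaves out head and object metadata entirely.

```python
class DictStore:
    """Hypothetical dict-backed stand-in illustrating object-store semantics.

    Not part of object-store-python: paths are plain strings and objects are
    bytes that are always written as a whole.
    """

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, path: str, data: bytes) -> None:
        # A put replaces the whole object; there is no append or partial write.
        self._objects[path] = bytes(data)

    def get(self, path: str) -> bytes:
        # Raises KeyError if nothing is stored under `path`.
        return self._objects[path]

    def get_range(self, path: str, start: int, length: int) -> bytes:
        # Assumption: returns `length` bytes beginning at offset `start`.
        return self._objects[path][start : start + length]

    def copy(self, src: str, dst: str) -> None:
        # Duplicates the bytes under a new path; the source object is kept.
        self._objects[dst] = self._objects[src]

    def list(self) -> list[str]:
        # The namespace is flat: "directories" are only shared path prefixes.
        return sorted(self._objects)


store = DictStore()
store.put("data", b"some data")
assert store.get_range("data", start=0, length=4) == b"some"
store.copy("data", "copied")
print(store.list())  # ['copied', 'data']
```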
```python
from object_store import ObjectStore, ObjectMeta

# we use an in-memory store for demonstration purposes.
# data will not be persisted and is not shared across store instances
store = ObjectStore("memory://")

store.put("data", b"some data")

data = store.get("data")
assert data == b"some data"

# list the objects in the store
blobs = store.list()

# fetch object metadata without downloading the content
meta: ObjectMeta = store.head("data")

# read only the first 4 bytes of the object
partial = store.get_range("data", start=0, length=4)
assert partial == b"some"

store.copy("data", "copied")
copied = store.get("copied")
assert copied == data
```

As much as possible, we aim to make access to various storage backends dependent only on runtime configuration.
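In practice, the root URL passed to ObjectStore is what selects the backend: "memory://" gives the in-memory store and "az://<container-name>" an Azure container. The hypothetical helper backend_for below is a minimal sketch of that dispatch on the URL scheme. Only the memory and az schemes appear in this README; the file, s3 and gs schemes are assumptions modelled on common object-store conventions, so check the package documentation for the exact set it accepts.

```python
from urllib.parse import urlparse

# Hypothetical scheme-to-backend table. Only "memory" and "az" come from
# this README; "file", "s3" and "gs" are assumed for illustration.
BACKENDS = {
    "memory": "in-memory store",
    "file": "local storage",
    "s3": "Amazon S3 (or S3-compatible)",
    "gs": "Google Cloud Storage",
    "az": "Azure Blob Storage",
}


def backend_for(url: str) -> str:
    """Return the backend a root URL would select, based only on its scheme."""
    scheme = urlparse(url).scheme  # urlparse already lowercases the scheme
    try:
        return BACKENDS[scheme]
    except KeyError:
        raise ValueError(f"unsupported scheme: {scheme!r}") from None


# Switching backends is a configuration change: only the string differs.
for url in ("memory://", "az://<container-name>", "s3://my-bucket/prefix"):
    print(url, "->", backend_for(url))
```

Backend-specific credentials are then supplied separately, either as storage options or through the environment.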
```python
from object_store import ObjectStore

storage_options = {
    "azure_storage_account_name": "<my-account-name>",
    "azure_client_id": "<my-client-id>",
    "azure_client_secret": "<my-client-secret>",
    "azure_tenant_id": "<my-tenant-id>",
}

store = ObjectStore("az://<container-name>", storage_options)
```

We can provide the same configuration via the environment.
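Judging from the Azure examples in this README, each storage option key is the lowercase form of an environment variable name (azure_client_id corresponds to AZURE_CLIENT_ID, for instance). Assuming that correspondence holds for the keys you use, the hypothetical helper options_from_env below collects matching variables into a storage_options dict, which can help when checking what configuration a process would pick up. It is a sketch, not part of object-store-python.

```python
import os
from collections.abc import Mapping


def options_from_env(
    prefix: str = "AZURE_", env: Mapping[str, str] = os.environ
) -> dict[str, str]:
    """Collect environment variables starting with `prefix` as storage options.

    Hypothetical helper: assumes option keys are the lowercase form of the
    environment variable names, as in the Azure examples of this README.
    """
    return {
        name.lower(): value
        for name, value in env.items()
        if name.startswith(prefix)
    }


env = {
    "AZURE_STORAGE_ACCOUNT_NAME": "<my-account-name>",
    "AZURE_CLIENT_ID": "<my-client-id>",
    "HOME": "/home/user",  # ignored: does not match the prefix
}
print(options_from_env(env=env))
# {'azure_storage_account_name': '<my-account-name>', 'azure_client_id': '<my-client-id>'}
```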
```python
import os

from object_store import ObjectStore

os.environ["AZURE_STORAGE_ACCOUNT_NAME"] = "<my-account-name>"
os.environ["AZURE_CLIENT_ID"] = "<my-client-id>"
os.environ["AZURE_CLIENT_SECRET"] = "<my-client-secret>"
os.environ["AZURE_TENANT_ID"] = "<my-tenant-id>"

store = ObjectStore("az://<container-name>")
```

Stores can also be used from pyarrow: wrapping an ArrowFileSystemHandler in pyarrow.fs.PyFileSystem yields a filesystem that pyarrow's parquet and dataset APIs accept. The example below uses local storage rooted at the current working directory.

```python
from pathlib import Path

import numpy as np
import pyarrow as pa
import pyarrow.dataset as ds
import pyarrow.fs as fs
import pyarrow.parquet as pq

from object_store import ArrowFileSystemHandler

table = pa.table({"a": range(10), "b": np.random.randn(10), "c": [1, 2] * 5})

# expose the current working directory as a pyarrow filesystem
base = Path.cwd()
store = fs.PyFileSystem(ArrowFileSystemHandler(str(base.absolute())))

# write the first and second halves of the table as separate files
pq.write_table(table.slice(0, 5), "data/data1.parquet", filesystem=store)
pq.write_table(table.slice(5, 5), "data/data2.parquet", filesystem=store)

# read both files back as a single dataset
dataset = ds.dataset("data", format="parquet", filesystem=store)
```

If you do not have the just command runner installed and do not wish to install it,
have a look at the justfile to see the raw commands.
To set up the development environment and install a dev version of the native package, just run:

```shell
just init
```

This will also configure pre-commit hooks in the repository.
To run the Rust as well as the Python tests:

```shell
just test
```