AIStore Blog

2026

Eliminating Cluster Authentication Risks: AIStore with RSA and OIDC Issuer Discovery — Apr 09, 2026 · by Aaron Wilson

Back in February 1997, RFC 2104 introduced HMAC as a mechanism for authenticating messages based on a shared secret key.

Native Bucket Inventory: Up to 17x Faster Remote Bucket Listing — Apr 06, 2026 · by Tony Chen, Abhishek Gaikwad

AIStore 4.3 introduces Native Bucket Inventory (NBI), a new mechanism for accelerating large remote-bucket listings by turning a repeatedly expensive operation into a local, reusable metadata path. In…

Parallel Download: 9x Lower Latency for Large-Object Reads — Mar 25, 2026 · by Tony Chen

In AIStore 4.3, we introduced parallel download APIs to accelerate reads of large objects in an AIS cluster. Instead of pulling the entire object through one long sequential GET request stream, parall…

2025

The Many Lives of a Dataset Called ‘data’ — Dec 15, 2025 · by Alex Aizman

For whatever reason, a bucket called s3://data shows up with remarkable frequency as we deploy AIStore (AIS) clusters and populate them with user datasets. Likely for the same reason that `password …

Blob Downloader: Accelerate Remote Object Fetching with Concurrent Range-Reads — Nov 26, 2025 · by Tony Chen

In AIStore 4.1, we extended blob downloader to leverage the chunked object representation and speed up fetching remote objects. T…

GetBatch API: faster data retrieval for ML workloads — Oct 06, 2025 · by Abhishek Gaikwad

ML training and inference typically operate on batches of samples or data items. To simplify such workflows, AIStore 4.0 introduces the GetBatch API.

Automated API Documentation Generation with GenDocs — Aug 29, 2025 · by Anshika Ojha

Maintaining accurate and up-to-date HTTP API documentation is critical for the developer experience when building and debugging SDKs. Clear HTTP documentation saves developers from digging through AIS…

AIStore + HuggingFace: Distributed Downloads for Large-Scale Machine Learning — Aug 22, 2025 · by Nihal Nooney

Machine learning teams increasingly rely on large datasets from HuggingFace to power their models. But traditional download tools struggle with terabyte-scale datasets conta…

The Perfect Line — Jul 26, 2025 · by Alex Aizman

I didn’t want to write this blog.

Single-Object Copy/Transform Capability — Jul 25, 2025 · by Tony Chen

In version 3.30, AIStore introduced a lightweight, flexible API to copy or transform a single object between buckets. It provides a simpler alternative to existing batch-style operations, ideal for fa…

AIStore v3.28: Boost ETL Performance with Optimized Data Movement and Specialized Web Server Framework — May 15, 2025 · by Tony Chen, Abhishek Gaikwad

The current state of the art involves executing data pre-processing, augmentation, and a wide variety of custom ETL workflows on individual client machines. This approach lacks scalability and often r…

AIStore Python SDK: Maintaining Resilient Connectivity During Lifecycle Events — Apr 02, 2025 · by Abhishek Gaikwad

In distributed systems, maintaining seamless connectivity during lifecycle events is a key challenge. If the cluster’s state changes while read operat…

Unified Rate Limiting: Frontend and Backend — Mar 19, 2025 · by Alex Aizman

AIStore v3.28 introduces a unified rate-limiting capability that works at both the frontend (client-facing) and backend (cloud-facing) layers. It enables proactive control to prevent hitting limit…

Comparing OCI’s Native Object Storage and S3 API Backends — Feb 26, 2025 · by Ed McClanahan

The newly available support for Oracle Cloud Infrastructure (“OCI”) Object Storage was made…

Split-brain is Inevitable — Feb 16, 2025 · by Alex Aizman

Split-brain is inevitable. The way it approaches varies greatly but there are telltale signs that, in hindsight, you wish you’d taken more seriously.

Arrival of native backed OCI Object Storage support — Feb 06, 2025 · by Ed McClanahan

Oracle Cloud Infrastructure (“OCI”) has been supported via OCI’s Amazon S3 Compatibility…

2024

Adding Data to AIStore — PUT Performance — Nov 22, 2024 · by Aaron Wilson

AI training workloads primarily read data, and lots of it.

Enhancing ObjectFile Performance with Zero-Copy Techniques — Nov 21, 2024 · by Ryan Koo, Aaron Wilson

In our previous blog post, we introduced ObjectFile, a resilient, file-like interface in the AIStore Python SDK des…

Resilient Data Loading with ObjectFile — Sep 26, 2024 · by Ryan Koo, Aaron Wilson

Massively parallel loading of terabytes of data in a distributed system presents reliability challenges. This holds true even for data centers where network stability is supposed to be stellar. Consid…

Google Colab + AIStore: Easier Cloud Data Access for AI/ML Experiments — Sep 18, 2024 · by Abhishek Gaikwad

Working with data stored in cloud services like GCP, AWS, Azure, and OCI in Google Colab can be challenging. The entire process—from installing libraries and conf…

Accelerating AI Workloads with AIStore and PyTorch — Aug 28, 2024 · by Soham Manoli

As AI workloads are becoming increasingly demanding, our models need more and more data to train.[1] These massive datasets can overwhelm filesystems, both local and network-…

Initial Sharding of Machine Learning Datasets — Aug 16, 2024 · by Tony Chen, Alex Aizman

Over the past decade, and especially in the last 3-4 years, the size of AI datasets has grown significantly, often exceeding the combined capacity of block storage devices that can be attached to a si…

Very large — May 20, 2024 · by Alex Aizman

The idea of extremely large is constantly shifting, evolving. As time passes by we quickly adopt its new numeric definition and only rarely, with a mild sense of amusement, recall the old one.

AIS on NFS — Mar 30, 2024 · by Alex Aizman

This is an excerpt from an article that I posted at storagetarget.com. The full text can be found at:

Maximizing Cluster Bandwidth with AIS Multihoming — Feb 16, 2024 · by Aaron Wilson

Identifying bottlenecks in high-performance systems is critical to optimize the hardware and associated costs.

2023

AIStore as a Fast Tier Storage Solution: Enhancing Petascale Deep Learning Across Remote Cloud Backends — Nov 27, 2023 · by Abhishek Gaikwad, Aaron Wilson, Alex Aizman

The challenges associated with loading petascale datasets, crucial for training models in both vision and language processing, pose significant hurdles in the field of deep learning. These datasets, o…

AIStore with WebDataset Part 3 — Building a Pipeline for Model Training — Jun 09, 2023 · by Aaron Wilson

In the previous posts (pt1, pt2), we discu…

AIStore with WebDataset Part 2 — Transforming WebDataset Shards in AIS — May 11, 2023 · by Aaron Wilson

Note: This blog post references init_code which has been removed and replaced with init_class. For the most up-to-date ETL initialization methods, please refer to the [init_class documentati…

AIStore with WebDataset Part 1 — Storing WebDataset format in AIS — May 08, 2023 · by Aaron Wilson

Training AI models is expensive, so it’s important to keep GPUs fed with all the data they need as fast as they can consume it. WebDataset and AIStore each address different parts of this problem indi…

Transforming non-existing datasets — Apr 10, 2023 · by Alex Aizman

There’s an old trick that never quite gets old: you run a high-velocity exercise that generates a massive amount of traffic through some sort of a multi-part system, whereby some of those parts are (s…

AIStore SDK & ETL: Transform an image dataset with AIS SDK and load into PyTorch — Apr 03, 2023 · by Aaron Wilson

Note: This blog post references init_code which has been removed and replaced with init_class. For the most up-to-date ETL initialization methods, please refer to the [init_class documentati…

2022

AIStore 3.12 Release Notes — Nov 13, 2022 · by Alex Aizman

This AIStore release, version 3.12, has been in development for almost four months. It includes a number of significant changes that can be further detailed and grouped as follows:

AIStore: Data Analysis w/ DataFrames — Aug 15, 2022 · by Ryan Koo

Dask is a new and flexible open-source Python library for parallel/distributed computing and optimized memory usage. Dask extends many of today’s popular Python libraries …

Python SDK: Getting Started — Jul 20, 2022 · by Ryan Koo

Python has grounded itself as a popular language of choice among data scientists and machine learning developers. Python’s recent popularity in the field can be attributed to Python’s general *ease-of…

PyTorch: Loading Data from AIStore — Jul 11, 2022 · by Abhishek Gaikwad

Note: The torchdata.datapipes module has been deprecated and removed in recent versions of…

Promoting local and shared files — Mar 17, 2022 · by Alex Aizman

When it comes to working with files, the first question often is how? How to easily and quickly move or copy existing file datasets into AIS clusters?

What’s new in AIS v3.9 — Mar 15, 2022 · by Alex Aizman

AIS v3.9 is a substantial productization and performance-improving release. Much of the codebase has been refactored for consistency, with micro…

2021

What’s new in AIS v3.8 — Dec 15, 2021 · by Alex Aizman

AIStore v3.8 is a significant upgrade delivering long-awaited features, stabilization fixes, and performance improvements. There’s also the cumula…

Copying existing file datasets in two easy steps — Dec 07, 2021 · by Alex Aizman

AIStore supports numerous ways to copy, download, or otherwise transfer existing datasets. Much depends on where is…

AIStore & ETL: Using WebDataset to train on a sharded dataset (post #3) — Oct 29, 2021 · by Prashanth Dintyala, Janusz Marcinkiewicz, Alex Aizman, Aaron Wilson

Deprecated — WDTransform is no longer included as part of the AIS client, so this post only remains for educational purposes. ETL is in development and additional transformation tools will be inc…

AIStore & ETL: Using AIS/PyTorch connector to transform ImageNet (post #2) — Oct 22, 2021 · by Janusz Marcinkiewicz, Prashanth Dintyala, Alex Aizman

The goal now is to deploy our first ETL and have AIStore run it on each storage node, harnessing the distributed power (and close to data - meaning, fast). For the problem statement, background an…

AIStore & ETL: Introduction (post #1) — Oct 21, 2021 · by Alex Aizman, Janusz Marcinkiewicz, Prashanth Dintyala

AIStore (AIS) is a reliable lightweight storage cluster that deploys anywhere, runs user containers and functions, and scales linearly with no limitation. The deve…

Go: append a file to a TAR archive — Aug 10, 2021 · by Vladimir Markelov

AIStore supports a whole gamut of “archival” operations that allow you to read, write, and list archives such as .tar, .tgz, and .zip. When we started working on appending content to existing archives…

Integrated Storage Stack for Training, Inference, and Transformations — Jul 30, 2021 · by Alex Aizman

In the end, the choice, like the majority of important choices, comes down to a binary: either this or that. Either you go to storage, or you don’t. Either you cache a dataset in question (and then tr…

AIStore: an open system for petascale deep learning — Jul 30, 2021 · by Alex Aizman

AIStore (or AIS) has been in development for more than three years so far and has accumulated a fairly long list of capabilities, all duly noted via release notes on the corresponding GitHub pages. At…