A file utility for accessing both local and remote files through a unified interface

cached-path

A file utility library that provides a unified, simple interface for accessing both local and remote files. This can be used behind other APIs that need to access files agnostic to where they are located.

Installation

cached-path requires Python 3.7 or later.

Installing with pip

cached-path is available on PyPI. To install it, run:

pip install cached-path

Installing from source

To install cached-path from source, first clone the repository:

git clone https://github.com/allenai/cached_path.git
cd cached_path

Then run:

pip install -e .

Usage

from cached_path import cached_path

Given something that might be a URL or local path, cached_path() determines which. If it's a remote resource, it downloads the file and caches it to the cache directory, and then returns the path to the cached file. If it's already a local path, it makes sure the file exists and returns the path.

For URLs, http://, https://, s3:// (AWS S3), gs:// (Google Cloud Storage), and hf:// (HuggingFace Hub) are all supported out-of-the-box. beaker:// URLs of the form beaker://{user_name}/{dataset_name}/{file_path} are optionally supported as well, but they require beaker-py to be installed.
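The decision described above (remote URL versus local path) can be pictured with a small standard-library sketch. This is only an illustration and not the library's actual implementation; the helper name classify and the constant SUPPORTED_SCHEMES are hypothetical.

```python
from urllib.parse import urlparse

# Schemes listed above as supported, plus the optional beaker:// scheme.
# This set is an assumption made for illustration only.
SUPPORTED_SCHEMES = {"http", "https", "s3", "gs", "hf", "beaker"}


def classify(url_or_filename: str) -> str:
    """Return "remote" for a supported URL scheme, otherwise "local"."""
    scheme = urlparse(url_or_filename).scheme
    return "remote" if scheme in SUPPORTED_SCHEMES else "local"
```

Anything without one of the listed schemes, such as model.tar.gz or /tmp/model.bin, is treated as a local path in this sketch.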

For example, to download the PyTorch weights for the model epwalsh/bert-xsmall-dummy on HuggingFace, you could do:

cached_path("hf://epwalsh/bert-xsmall-dummy/pytorch_model.bin")

For paths or URLs that point to a tarfile or zipfile, you can also append the path of a specific file inside the archive to url_or_filename, preceded by a "!". The archive is then extracted automatically (provided you set extract_archive to True), and cached_path() returns the local path to that specific file. For example:

cached_path("model.tar.gz!weights.th", extract_archive=True)
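To make the "!" syntax concrete, here is a rough standard-library sketch of what splitting such a string and extracting a tar archive could look like. It is not how cached_path itself is implemented; the helpers split_archive_spec and extract_member are hypothetical, and the sketch handles tar archives only.

```python
import tarfile
from pathlib import Path


def split_archive_spec(url_or_filename: str):
    """Split "archive!member" into (archive, member); member is None without a "!"."""
    archive, sep, member = url_or_filename.partition("!")
    return archive, (member if sep else None)


def extract_member(spec: str, dest: Path) -> Path:
    """Extract a local tar archive into dest and return the path of the requested member."""
    archive, member = split_archive_spec(spec)
    if member is None:
        raise ValueError("expected a spec of the form 'archive!member'")
    # Only extract archives you trust: extractall() writes whatever the archive contains.
    with tarfile.open(archive) as tar:
        tar.extractall(dest)
    return Path(dest) / member
```

Under these assumptions, extract_member("model.tar.gz!weights.th", Path("extracted")) would unpack the archive into extracted/ and return extracted/weights.th.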

Using custom headers for HTTP requests

You can provide custom headers for HTTP requests, which is useful for accessing private resources that require authentication:

# Using an API token for private resources (e.g. Hugging Face).
# hf_token is a placeholder for your own access token.
headers = {"Authorization": f"Bearer {hf_token}"}
cached_path("https://huggingface.co/api/models/private-model/resolve/main/model.bin", headers=headers)

This is particularly useful for downloading private files from services like Hugging Face, GitHub, or any other API that uses Bearer token authentication.
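One way to avoid hard-coding a token is to read it from the environment and build the headers dictionary from that. The sketch below is only a suggestion: the helper bearer_headers is hypothetical, and the environment variable name HF_TOKEN is an assumption you can replace with whatever your setup uses.

```python
import os


def bearer_headers(env_var: str = "HF_TOKEN") -> dict:
    """Build an Authorization header from an environment variable.

    Returns an empty dict when the variable is unset or empty, so the same
    call still works for public resources.
    """
    token = os.environ.get(env_var)
    return {"Authorization": f"Bearer {token}"} if token else {}
```

The result can then be passed as the headers argument shown above, for example headers=bearer_headers().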

Cache directory

By default the cache directory is ~/.cache/cached_path/. There are several ways to override this setting:

  • set the environment variable CACHED_PATH_CACHE_ROOT,
  • call set_cache_dir(), or
  • set the cache_dir argument each time you call cached_path().
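How these three options interact can be sketched as a simple precedence chain. The order used here (per-call argument first, then a directory configured via set_cache_dir(), then the environment variable, then the default) is an assumption for illustration, not a documented guarantee; resolve_cache_dir and its parameters are hypothetical names.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Union

PathLike = Union[str, Path]

# Default location stated above.
DEFAULT_CACHE_DIR = Path.home() / ".cache" / "cached_path"


def resolve_cache_dir(
    cache_dir: Optional[PathLike] = None,
    configured_dir: Optional[PathLike] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Pick a cache directory using an assumed precedence order."""
    env = os.environ if env is None else env
    if cache_dir is not None:  # per-call argument
        return Path(cache_dir)
    if configured_dir is not None:  # stands in for a set_cache_dir() call
        return Path(configured_dir)
    if env.get("CACHED_PATH_CACHE_ROOT"):  # environment variable
        return Path(env["CACHED_PATH_CACHE_ROOT"])
    return DEFAULT_CACHE_DIR
```

If the order matters for your setup, check it against the library's documentation rather than relying on this sketch.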

Team

cached-path is developed and maintained by the AllenNLP team, backed by the Allen Institute for Artificial Intelligence (AI2). AI2 is a non-profit institute with the mission to contribute to humanity through high-impact AI research and engineering. To learn more about who specifically contributed to this codebase, see our contributors page.

License

cached-path is licensed under Apache 2.0. A full copy of the license can be found on GitHub.
