When a `FileSource` points to Parquet data and the path is a directory of partitioned Parquet files rather than a single file, the following lines raise an error:

```python
schema = ParquetFile(
    path if filesystem is None else filesystem.open_input_file(path)
).schema_arrow
```

```
OSError: Expected file path, but /home/ubuntu/project/data/driver_stats_partitioned is a directory
```

How to replicate:
- Start with a demo Feast project (`feast init`).
- Create a partitioned Parquet dataset. The following creates one that keeps only a single timestamp column (the `created` column is dropped), for inference:
  ```python
  import pyarrow.parquet as pq

  df = pq.read_table("./data/driver_stats.parquet")
  df = df.drop(["created"])
  pq.write_to_dataset(df, "./data/driver_stats_partitioned")
  ```
- Update the file source in `example.py` to look like this:
  ```python
  driver_hourly_stats = FileSource(
      path="/home/ubuntu/cado-feast/feature_store/exciting_sunbeam/data/driver_stats_partitioned2",
  )
  ```
- Run `feast apply`.
For now, I've worked around it by changing the lines above to:

```python
schema = ParquetDataset(
    path if filesystem is None else filesystem.open_input_file(path)
).schema.to_arrow_schema()
```