perf(postgres): Optimizing feast offline Store for date-range multi-FV retrieval by Vperiodt · Pull Request #6057 · feast-dev/feast

Vperiodt · 2026-03-04T07:05:32Z

What this PR does / why we need it:

Replaces the date-range multi-FV path in the Postgres offline store (previously base_entities + one LEFT JOIN LATERAL per feature view) with a set-based LOCF (Last Observation Carried Forward) implementation
Uses a single timeline: stack (UNION ALL of spine + feature rows), COUNT for group boundaries, FIRST_VALUE for forward-fill, then filter to spine and apply per-FV TTL.

Which issue(s) this PR fixes:

Fixes slow get_historical_features on the Postgres offline store for date-range retrieval with multiple feature views. The LATERAL/inequality joins had O(N×M) cost. This PR switches to a LOCF path: one stacked timeline of size L, sort O(L log L), and O(L) window passes, which makes total cost as O(L log L).

Misc

Scenario: get_historical_features(features=[...], start_date=2023-01-01, end_date=2023-01-07) with no entity DataFrame and two feature views:

driver_fv: entity driver_id, TTL 1 day, feature score
customer_fv: entity customer_id, TTL 0, feature amount

Inputs

Parameter	Value
`start_date`	2023-01-01
`end_date`	2023-01-07
`entity_df`	`None` (non-entity mode)
FVs	driver_fv (driver_id, score, TTL 1d), customer_fv (customer_id, amount, TTL 0)

1. Feature data window (`__data_raw`)

Feature data is explicitly pulled from lookback to end_date so LOCF can carry the last observation before start_date.

Time window used for feature data

FV	TTL		Time window
driver_fv	1 day		[lookback_start_date, end_date]
customer_fv	0		[start_date, end_date]

With lookback_start_date = start_date - max_ttl (e.g. 2022-12-31 for 1-day TTL), driver_fv rows before 2023-01-01 are now included so LOCF can fill correctly.

-- Per-FV __data_raw: feature rows from lookback_start_date..end_date
"driver_fv__data_raw" AS (
  SELECT "ts" AS event_timestamp, "driver_id", "score" AS "score"
  FROM "driver_table" AS sub
  WHERE "ts" BETWEEN '2023-01-01 00:00:00+00' - interval '1 day' AND '2023-01-07 00:00:00+00'
),
"customer_fv__data_raw" AS (
  SELECT "ts" AS event_timestamp, "customer_id", "amount" AS "amount"
  FROM "customer_table" AS sub
  WHERE "ts" BETWEEN '2023-01-01 00:00:00+00' AND '2023-01-07 00:00:00+00'
),

2. Spine: unified (entity, timestamp) grid

Spine is built from each FV’s __data / __data_raw over [start_date, end_date]. When FVs have different entity sets, the template emits NULL AS "entity" for the FV that doesn’t have that entity:

spine AS (
  SELECT DISTINCT d.event_timestamp, d."driver_id", NULL AS "customer_id"
  FROM "driver_fv__data_raw" d
  WHERE d.event_timestamp BETWEEN '2023-01-01 00:00:00+00' AND '2023-01-07 00:00:00+00'
  UNION
  SELECT DISTINCT d.event_timestamp, NULL AS "driver_id", d."customer_id"
  FROM "customer_fv__data_raw" d
  WHERE d.event_timestamp BETWEEN '2023-01-01 00:00:00+00' AND '2023-01-07 00:00:00+00'
),

Example spine shape (

Spine = distinct (event_timestamp, driver_id, customer_id) from both FVs; missing entity is NULL.

event_timestamp	driver_id	customer_id
2023-01-01 00:00:00+00	101	NULL
2023-01-01 00:00:00+00	NULL	201
2023-01-02 00:00:00+00	101	NULL
…	…	…

3. TTL in the final SELECT

-- driver_fv has TTL 86400 (1 day)
CASE WHEN (spine.event_timestamp - "driver_fv__f"."driver_fv__filled_ts") <= make_interval(secs => 86400)
     THEN "driver_fv__f"."score" ELSE NULL END AS "score",
-- customer_fv has TTL 0: no CASE
"customer_fv__f"."amount" AS "amount"

Example final output shape

After LOCF and TTL: one row per (event_timestamp, driver_id, customer_id); score is NULL when outside TTL.

event_timestamp	driver_id	customer_id	score	amount
2023-01-01 00:00:00+00	101	NULL	0.8	NULL
2023-01-01 00:00:00+00	NULL	201	NULL	99.5
2023-01-02 00:00:00+00	101	NULL	0.9	NULL
…	…	…	…	…

Signed-off-by: Vanshika Vanshika <[email protected]> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

tokoko · 2026-03-04T08:32:18Z

@Vperiodt thanks for the PR. if it's not too much trouble, can you give a small example of result queries for say.. 2 fv scenario and highlight differences this PR introduces? there jinja templates are hard enough to read through even when one knows what the intent is 😄

Vperiodt · 2026-03-04T09:07:27Z

@Vperiodt thanks for the PR. if it's not too much trouble, can you give a small example of result queries for say.. 2 fv scenario and highlight differences this PR introduces? there jinja templates are hard enough to read through even when one knows what the intent is 😄

sure ! will add it in the description itself

Signed-off-by: Vanshika Vanshika <[email protected]> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED

YassinNouh21

The LOCF approach is a solid improvement over the LATERAL join path — O(L log L) window passes vs O(N×M) inequality joins is a real win for large date-range queries. The per-FV independent pipeline is clean and avoids cross-FV interference.

A few issues to address before merge — one pre-existing bug that this PR should fix since it's modifying the non-entity path, dead code in the template, and some test coverage gaps.

YassinNouh21 · 2026-03-15T17:14:29Z