Skip to content

JSONB subscript index not used anymore after sqlalchemy 2.0.42 #12868

@valumapps

Description

@valumapps

Describe the bug

SQLAlchemy 2.0.42 changed the behavior regarding JSONB subscripting.
Unfortunately, this causes indexes on JSONB subscripts NOT to used anymore.

Optional link from https://docs.sqlalchemy.org which documents the behavior that is expected

No response

SQLAlchemy Version in Use

2.0.43

DBAPI (i.e. the database driver)

psycopg2

Database Vendor and Major Version

postgresql 17

Python Version

3.13

Operating system

linux

To Reproduce

We will use this example code with different PG versions:

#!/usr/bin/env python3
# imports
from sqlalchemy import Column, Index, Integer, func, select
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.engine import create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

# model declaration
Base = declarative_base()


class Entity(Base):
    __tablename__ = "entity"
    id = Column(Integer, primary_key=True)
    data = Column(JSONB)

    Index("ix_entity_json_abc_text", data["a"]["b"]["c"].astext)


engine = create_engine("postgresql://")
engine = engine.execution_options(isolation_level="AUTOCOMMIT")

Session = sessionmaker()
Session.configure(bind=engine)
session = Session()

# --- create tables
Base.metadata.create_all(engine)

# --- add a few entities
if session.scalars(select(func.count()).select_from(Entity)).one() == 0:
    for i in range(100000):
        session.add(Entity(data={"a": {"b": {"c": f"something {i}"}}}))
    session.flush()


# --- show query
q = select(Entity).where(Entity.data["a"]["b"]["c"].astext == "something 42")
print(q.compile(dialect=postgresql.dialect()))

Let's spawn an empty PG server:

% docker run --name pgsqlajsonindex --rm -e POSTGRES_PASSWORD=pg -d -p 3333:5432 docker.io/postgres:17

Start with a prior sqlalchemy version:

% pip install SQLAlchemy==2.0.41

Populate database and print the query filtering on the indexed data:

% PGDATABASE=postgres PGUSER=postgres PGPASSWORD=pg PGHOST=127.0.0.1 PGPORT=3333 python3 sqla-json-index.py
SELECT entity.id, entity.data 
FROM entity 
WHERE ((((entity.data -> %(data_1)s) -> %(param_1)s)) ->> %(param_2)s) = %(param_3)s

Let's see how our index was created, and also EXPLAIN the query:

% PGDATABASE=postgres PGUSER=postgres PGPASSWORD=pg PGHOST=127.0.0.1 PGPORT=3333 psql                      

postgres^# \d+ entity
                                                       Table "public.entity"
 Column │  Type   │ Collation │ Nullable │              Default               │ Storage  │ Compression │ Stats target │ Description 
────────┼─────────┼───────────┼──────────┼────────────────────────────────────┼──────────┼─────────────┼──────────────┼─────────────
 id     │ integer │           │ not null │ nextval('entity_id_seq'::regclass) │ plain    │             │              │ 
 data   │ jsonb   │           │          │                                    │ extended │             │              │ 
Indexes:
    "entity_pkey" PRIMARY KEY, btree (id)
    "ix_entity_json_abc_text" btree ((((data -> 'a'::text) -> 'b'::text) ->> 'c'::text))
Access method: heap

postgres^# explain analyze SELECT entity.id, entity.data 
FROM entity 
WHERE ((((entity.data -> 'a') -> 'b')) ->> 'c') = 'something 99';
                                                           QUERY PLAN                                                            
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 Index Scan using ix_entity_json_abc_text on entity  (cost=0.42..8.44 rows=1 width=64) (actual time=0.032..0.033 rows=1 loops=1)
   Index Cond: ((((data -> 'a'::text) -> 'b'::text) ->> 'c'::text) = 'something 99'::text)
 Planning Time: 0.147 ms
 Execution Time: 0.046 ms
(4 rows)

Upgrade sqlalchemy:

% pip install SQLAlchemy==2.0.43

Re-run the script (which won't populate database this time):

% PGDATABASE=postgres PGUSER=postgres PGPASSWORD=pg PGHOST=127.0.0.1 PGPORT=3333 python3 sqla-json-index.py
SELECT entity.id, entity.data 
FROM entity 
WHERE ((entity.data[%(data_1)s][%(param_1)s]) ->> %(param_2)s) = %(param_3)s

As expected by reading the changelog, the query changed.

But now the index isn't used anymore:

% PGDATABASE=postgres PGUSER=postgres PGPASSWORD=pg PGHOST=127.0.0.1 PGPORT=3333 psql                      

postgres^# explain analyze SELECT entity.id, entity.data 
FROM entity 
WHERE ((entity.data['a']['b']) ->> 'c') = 'something 99';
                                              QUERY PLAN                                               
───────────────────────────────────────────────────────────────────────────────────────────────────────
 Seq Scan on entity  (cost=0.00..2725.00 rows=500 width=64) (actual time=0.042..18.784 rows=1 loops=1)
   Filter: ((data['a'::text]['b'::text] ->> 'c'::text) = 'something 99'::text)
   Rows Removed by Filter: 99999
 Planning Time: 0.234 ms
 Execution Time: 18.810 ms
(5 rows)

I suspect there's a problem in postgres itself not using the index. Or maybe [] and -> aren't equivalent after all?
Anyway, the change in sqlalchemy sort of broke the index usage.

Error

postgres^# explain analyze SELECT entity.id, entity.data 
FROM entity 
WHERE ((entity.data['a']['b']) ->> 'c') = 'something 99';
                                              QUERY PLAN                                               
───────────────────────────────────────────────────────────────────────────────────────────────────────
 Seq Scan on entity  (cost=0.00..2725.00 rows=500 width=64) (actual time=0.042..18.784 rows=1 loops=1)
   Filter: ((data['a'::text]['b'::text] ->> 'c'::text) = 'something 99'::text)
   Rows Removed by Filter: 99999
 Planning Time: 0.234 ms
 Execution Time: 18.810 ms
(5 rows)

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingdocumentationgreat mcveAn issue with a great mcvejsonthings to do with JSONpostgresqlregressionsomething worked and was broken by a change

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions