FTS_SDK_PLAN.md

Full Text Search SDK Implementation Plan - High Level

Overview

This document outlines the high-level changes needed in the Python SDK to support Full Text Search (FTS) features from the 2026-01.alpha API version. The implementation will add support for schema-based index creation and document search operations while maintaining backward compatibility with existing functionality.

UX Philosophy

Core Principles:

Zero Breaking Changes - All existing code continues to work unchanged
Progressive Enhancement - New features are additive, not replacements
Developer Experience First - Reduce boilerplate, provide helpful errors, enable autocomplete
Flexible Access Patterns - Support multiple ways to accomplish the same task
Smart Defaults - Minimize required parameters with sensible defaults

Key UX Strategies:

Builder Patterns - Fluent APIs for complex objects (schemas, filters)
Convenience Methods - Simple helpers for common use cases
Factory Functions - Easy creation of query objects
Flexible Input - Accept single items or lists, dicts or objects
Helpful Errors - Clear messages with suggestions and migration hints
Type Safety - Extensive type hints for IDE support and error prevention

Current State

SDK currently uses 2025-10 stable API version
OpenAPI client code is generated in pinecone/core/openapi/
No support for alpha API versions yet
Existing index creation uses spec-based approach
Existing search uses vector-based queries only

Required Changes

1. OpenAPI Code Generation for Alpha API

What: Generate Python client code from 2026-01.alpha API specifications

Where:

codegen/apis/_build/2026-01.alpha/ contains the OAS files
Need to generate code to pinecone/core/openapi/db_control_alpha/ and pinecone/core/openapi/db_data_alpha/

Key Generated Models:

Schema, SchemaField (control plane)
DocumentSearchRequest, DocumentSearchResponse, Document (data plane)
TextQuery, VectorQuery, ScoreByQuery (data plane)
DocumentUpsertRequest, DocumentUpsertResponse (data plane)

Action: Run OpenAPI codegen script to generate alpha API client code

2. API Version Management

What: Add support for using 2026-01.alpha API version alongside stable version

Changes Needed:

Create alpha API client instances when FTS features are used
Update API client setup to support version selection
Ensure alpha API version header (X-Pinecone-Api-Version: 2026-01.alpha) is sent

Files to Modify:

pinecone/utils/setup_openapi_client.py - Add alpha API version support
pinecone/db_control/db_control.py - Use alpha client for schema operations
pinecone/db_data/index.py - Use alpha client for document operations
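
A minimal sketch of how the version header could be chosen per client. The header name and both version strings come from this plan; the constant names and build_version_headers() are hypothetical and are not existing SDK helpers.

```python
from __future__ import annotations

# Version strings taken from this plan; the constant names are illustrative.
STABLE_API_VERSION = "2025-10"
ALPHA_API_VERSION = "2026-01.alpha"


def build_version_headers(use_alpha: bool) -> dict[str, str]:
    """Return the version header for either the stable or the alpha client."""
    version = ALPHA_API_VERSION if use_alpha else STABLE_API_VERSION
    return {"X-Pinecone-Api-Version": version}
```

Keeping the choice in one function would let the control-plane and data-plane clients share the same rule instead of hard-coding the header in several places.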

3. Schema Models (Control Plane)

What: Create user-friendly Python models for index schema definition

New Files:

pinecone/db_control/models/schema.py - Schema container and field models
pinecone/db_control/models/schema_field.py - Field type definitions
pinecone/db_control/models/schema_builder.py - Fluent builder for schemas

Key Models:

Schema - Container for field definitions (DictLike for easy conversion)
StringField - String field with filterable and full_text_searchable options
IntegerField - Integer field with filterable option
DenseVectorField - Vector field with dimension and metric
SparseVectorField - Sparse vector field
SemanticTextField - Integrated inference field
SchemaBuilder - Fluent builder for constructing schemas

UX Features:

Builder pattern: SchemaBuilder().string("title", full_text_searchable=True).build()
Dict-like access: schema["title"] or schema.fields["title"]
Validation with helpful error messages
Type hints for IDE autocomplete

Purpose: Provide Pythonic interface for defining index schemas with minimal boilerplate
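
The fluent builder and dict-like access described above could look roughly like the sketch below. It is illustrative only: the class and method names follow this plan, but the field attributes, the duplicate-name check, and the internal storage are assumptions, and only three of the planned field types are shown.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class StringField:
    filterable: bool = False
    full_text_searchable: bool = False


@dataclass(frozen=True)
class IntegerField:
    filterable: bool = False


@dataclass(frozen=True)
class DenseVectorField:
    dimension: int
    metric: str = "cosine"


@dataclass
class Schema:
    """Container for field definitions, keyed by field name."""

    fields: dict[str, object] = field(default_factory=dict)

    def __getitem__(self, name: str) -> object:
        # Enables schema["title"] in addition to schema.fields["title"].
        return self.fields[name]


class SchemaBuilder:
    def __init__(self) -> None:
        self._fields: dict[str, object] = {}

    def _add(self, name: str, spec: object) -> "SchemaBuilder":
        # Assumption: defining the same field twice is treated as an error.
        if name in self._fields:
            raise ValueError(f"Field '{name}' is already defined")
        self._fields[name] = spec
        return self  # returning self is what makes the calls chainable

    def string(self, name, *, filterable=False, full_text_searchable=False):
        return self._add(name, StringField(filterable, full_text_searchable))

    def integer(self, name, *, filterable=False):
        return self._add(name, IntegerField(filterable))

    def dense_vector(self, name, *, dimension, metric="cosine"):
        return self._add(name, DenseVectorField(dimension, metric))

    def build(self) -> Schema:
        # Copy so later builder calls cannot mutate an already built schema.
        return Schema(fields=dict(self._fields))
```

With this shape, SchemaBuilder().string("title", full_text_searchable=True).build()["title"] returns the StringField for "title".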

4. Index Creation Updates (Control Plane)

What: Update create_index() to support schema-based index creation

Changes:

Add schema parameter to create_index() methods (optional, for backward compatibility)
Support new deployment structure (replacing spec for serverless)
Handle read_capacity as top-level parameter
Maintain backward compatibility with existing spec-based API
Auto-detect API version: use alpha when schema provided, stable otherwise

Files to Modify:

pinecone/db_control/resources/sync/index.py
pinecone/db_control/resources/asyncio/index.py
pinecone/pinecone.py
pinecone/pinecone_asyncio.py

UX Enhancements:

Convenience Methods:
- create_text_index() - Simplified for text-only indexes
- create_hybrid_index() - Helper for text + vector indexes
Smart Defaults:
- Default deployment: serverless/aws/us-east-1 if not specified
- Auto-generated index names if not provided
Backward Compatibility:
- Existing create_index(name, spec=..., dimension=...) calls work unchanged
- New create_index(name, schema=...) calls use alpha API automatically
- Both patterns can coexist

Key Consideration: Zero breaking changes - all existing code continues to work
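
The auto-detection rule (alpha when a schema is provided, stable otherwise) can be sketched as a small pure function. resolve_api_version() is a hypothetical name, and rejecting a call that passes both schema and spec is an assumption this plan does not state.

```python
from __future__ import annotations

from typing import Any, Optional

STABLE_API_VERSION = "2025-10"
ALPHA_API_VERSION = "2026-01.alpha"


def resolve_api_version(schema: Optional[Any] = None, spec: Optional[Any] = None) -> str:
    """Pick the API version from the arguments passed to create_index()."""
    if schema is not None and spec is not None:
        # Assumption: the two styles are mutually exclusive within one call.
        raise ValueError(
            "Pass either schema (new style) or spec (existing style), not both."
        )
    # Existing spec-based calls never pass schema, so they stay on stable.
    return ALPHA_API_VERSION if schema is not None else STABLE_API_VERSION
```

Because the default branch returns the stable version, existing callers see no change in which API they reach.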

5. Document Query Models (Data Plane)

What: Create user-friendly models for document search queries

New Files:

pinecone/db_data/dataclasses/text_query.py - TextQuery wrapper
pinecone/db_data/dataclasses/vector_query.py - VectorQuery wrapper
pinecone/db_data/dataclasses/score_by_query.py - ScoreByQuery union type
pinecone/db_data/query_helpers.py - Convenience factory functions

Key Models:

TextQuery - Text search with field and text_query string (DictLike)
VectorQuery - Vector search with field and values/sparse_values (DictLike)
ScoreByQuery - Union type for query objects

UX Enhancements:

Factory Functions:
- text_query(field, query) - Simple text query creation
- TextQuery.from_phrase(field, phrase) - Create phrase query
- TextQuery.from_terms(field, required, optional) - Create with required/optional terms
- vector_query(field, values) - Simple vector query creation
Builder Pattern (optional):
- TextQueryBuilder().field("title").query("pink panther").build()
Validation:
- Clear errors for v0 limitations (single query only)
- Helpful suggestions for common mistakes

Purpose: Pythonic interface matching API schema structure with convenience helpers
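
A rough sketch of the query wrappers and the two simple factory functions follows. The names and the field, text_query, values and sparse_values attributes come from this plan; the to_dict() helper and the choice to omit unset values from the output are assumptions.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class TextQuery:
    field: str
    text_query: str

    def to_dict(self) -> dict:
        return asdict(self)


@dataclass(frozen=True)
class VectorQuery:
    field: str
    values: Optional[list[float]] = None
    sparse_values: Optional[dict] = None

    def to_dict(self) -> dict:
        # Assumption: unset keys are dropped rather than sent as null.
        return {k: v for k, v in asdict(self).items() if v is not None}


# Union alias for anything accepted in score_by.
ScoreByQuery = Union[TextQuery, VectorQuery]


def text_query(field: str, query: str) -> TextQuery:
    """Factory for the common case: one field, one query string."""
    return TextQuery(field=field, text_query=query)


def vector_query(field: str, values: list[float]) -> VectorQuery:
    """Factory for a dense vector query against one field."""
    return VectorQuery(field=field, values=list(values))
```

Frozen dataclasses keep query objects immutable, so a query built once can be reused across calls without defensive copying.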

6. Document Search Implementation (Data Plane)

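What: Add search_documents() method for text search and vector search over documents (v0 accepts a single query, so combining text and vector scoring in one call is not yet available)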

New Method Signature:

def search_documents(
    self,
    namespace: str,
    score_by: list[TextQuery | VectorQuery] | TextQuery | VectorQuery,  # Accept single or list
    filter: dict[str, Any] | FilterBuilder | None = None,  # Support FilterBuilder
    include_fields: list[str] | str | None = None,  # None is treated as ["*"] (all fields); avoids a mutable default
    top_k: int = 10,
) -> DocumentSearchResponse: ...

UX Enhancements:

Flexible Input:
- Accept single query: score_by=TextQuery(...) (auto-wrap in list)
- Accept list: score_by=[TextQuery(...)]
- Support FilterBuilder for type-safe filter construction
Smart Defaults:
- include_fields defaults to ["*"] (return all fields)
- Clear error if score_by is empty or has >1 item (v0 limitation)
Convenience Overloads:
- search_documents(namespace, text_query="...", field="title") - Simple text search
- search_documents(namespace, vector=[...], field="embedding") - Simple vector search
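
The flexible-input behaviour can be sketched as two small normalizers that run before the request is built. Both function names are hypothetical. Treating None as "all fields" and always normalizing to a list are assumptions about internal handling, not about the wire format.

```python
from __future__ import annotations

from typing import Any, Union


def normalize_score_by(score_by: Any) -> list[Any]:
    """Wrap a single query object in a list; copy an existing list or tuple."""
    if isinstance(score_by, (list, tuple)):
        return list(score_by)
    return [score_by]


def normalize_include_fields(include_fields: Union[list[str], str, None]) -> list[str]:
    """Treat None as all fields and wrap a single field name in a list."""
    if include_fields is None:
        return ["*"]
    if isinstance(include_fields, str):
        return [include_fields]
    return list(include_fields)
```

Normalizing first means the validation and request-building code only ever has to handle lists.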

Files to Modify:

pinecone/db_data/index.py - Add sync method
pinecone/db_data/index_asyncio.py - Add async method
pinecone/db_data/interfaces.py - Add abstract method
pinecone/db_data/index_asyncio_interface.py - Add async abstract method
pinecone/db_data/resources/sync/record.py - Implement endpoint call
pinecone/db_data/resources/asyncio/record_asyncio.py - Async endpoint call
pinecone/db_data/request_factory.py - Add request building logic

Endpoint: POST /namespaces/{namespace}/documents/search (alpha API)

Backward Compatibility:

Existing query() and search() methods unchanged
New method is additive, doesn't replace existing functionality

7. Document Response Models (Data Plane)

What: Create response models for document search results

New Files:

pinecone/db_data/dataclasses/document_search_response.py - Response wrapper
pinecone/db_data/dataclasses/document.py - Document model wrapper

Key Models:

DocumentSearchResponse - Contains documents array and usage (DictLike)
Document - Contains _id, score, and dynamic fields from include_fields (DictLike)

UX Enhancements:

Flexible Access:
- Attribute access: doc.title, doc.genre (with type hints)
- Dict access: doc["title"], doc.get("title")
- Both patterns work for maximum compatibility
Helper Methods:
- response.documents - List of Document objects
- response.usage - Usage information
- response[0] - Access first document (if needed)
Type Safety:
- Type hints for common fields
- IDE autocomplete support
- Clear error messages for missing fields

Purpose: User-friendly access to search results with multiple access patterns
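
One way to offer both attribute and dict access on a result document is sketched below. The Document name and the _id and score keys come from this plan; storing fields in a private dict and the wording of the error message are assumptions.

```python
from __future__ import annotations

from typing import Any


class Document:
    """Wraps one search hit and exposes its fields two ways."""

    def __init__(self, data: dict[str, Any]) -> None:
        self._data = dict(data)

    def __getattr__(self, name: str) -> Any:
        # Called only when normal attribute lookup fails. Reading through
        # __dict__ avoids infinite recursion if _data is not set yet.
        data = self.__dict__.get("_data", {})
        if name in data:
            return data[name]
        available = ", ".join(sorted(data)) or "none"
        raise AttributeError(
            f"Document has no field '{name}'. Available fields: {available}"
        )

    def __getitem__(self, name: str) -> Any:
        return self._data[name]

    def get(self, name: str, default: Any = None) -> Any:
        return self._data.get(name, default)

    def to_dict(self) -> dict[str, Any]:
        return dict(self._data)
```

Dynamic fields resolved through __getattr__ are invisible to static type checkers, so precise type hints would only be possible for fixed keys such as _id and score.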

8. Document Upsert Support (Data Plane)

What: Add support for document upsert API (if not already implemented)

Endpoint: POST /namespaces/{namespace}/documents/upsert (alpha API)

Changes:

May already exist via upsert_records() - verify and enhance if needed
Ensure it works with schema-based indexes

9. Validation and Error Handling

What: Add validation for FTS-specific constraints with helpful error messages

Validations:

V0 limitation: score_by array must have exactly 1 item
- Error: "V0 limitation: score_by must contain exactly 1 query. Found {count}. For multiple queries, wait for v1 support."
Schema field type validation
- Error: "Invalid field type '{type}'. Must be one of: string, integer, dense_vector, sparse_vector, semantic_text"
Field name validation (no reserved names)
- Error: "Field name '{name}' is reserved. Reserved names: _id, _values, _sparse_values"
include_fields format validation (array or "*" string)
- Error: "include_fields must be a list of strings or the string '*'. Got: {type}"
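
The checks above could be implemented as plain functions that raise before any request is sent. The error strings are copied from this list; FTSValidationError and the function names are hypothetical.

```python
from __future__ import annotations

RESERVED_FIELD_NAMES = ("_id", "_values", "_sparse_values")
VALID_FIELD_TYPES = ("string", "integer", "dense_vector", "sparse_vector", "semantic_text")


class FTSValidationError(ValueError):
    """Hypothetical FTS-specific exception raised before any API call."""


def validate_score_by(score_by: list) -> None:
    count = len(score_by)
    if count != 1:
        raise FTSValidationError(
            f"V0 limitation: score_by must contain exactly 1 query. Found {count}. "
            "For multiple queries, wait for v1 support."
        )


def validate_field(name: str, field_type: str) -> None:
    if name in RESERVED_FIELD_NAMES:
        raise FTSValidationError(
            f"Field name '{name}' is reserved. "
            f"Reserved names: {', '.join(RESERVED_FIELD_NAMES)}"
        )
    if field_type not in VALID_FIELD_TYPES:
        raise FTSValidationError(
            f"Invalid field type '{field_type}'. "
            f"Must be one of: {', '.join(VALID_FIELD_TYPES)}"
        )


def validate_include_fields(include_fields: object) -> None:
    is_star = include_fields == "*"
    is_str_list = isinstance(include_fields, list) and all(
        isinstance(item, str) for item in include_fields
    )
    if not (is_star or is_str_list):
        raise FTSValidationError(
            "include_fields must be a list of strings or the string '*'. "
            f"Got: {type(include_fields).__name__}"
        )
```

Subclassing ValueError would let callers who already catch ValueError handle these errors without code changes.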

UX Enhancements:

Helpful Error Messages:
- Include what was wrong and what's expected
- Provide suggestions when possible
- Link to documentation for complex issues
Early Validation:
- Validate before API call to provide immediate feedback
- Type checking with mypy-friendly errors
Migration Hints:
- Suggest alternatives when using deprecated patterns
- Provide code examples in error messages when helpful

Files to Create/Modify:

pinecone/db_data/validators.py - Add FTS validators
pinecone/db_control/validators.py - Add schema validators
pinecone/exceptions/fts_exceptions.py - FTS-specific exceptions

10. Testing

What: Comprehensive test coverage for FTS features

Integration Tests:

Schema-based index creation
Document upsert with text fields
Text search queries (phrase, required terms)
Metadata filtering with text search
Field selection in responses
Error cases and edge cases

Unit Tests:

Schema model serialization/deserialization
Query model validation
Request factory logic
Response model parsing

Files to Create:

tests/integration/rest_sync/db/control/test_create_index_with_schema.py
tests/integration/rest_sync/db/data/test_document_search.py
tests/integration/rest_async/db/data/test_document_search_async.py
tests/unit/db_control/test_schema_models.py
tests/unit/db_data/test_text_query.py

11. Documentation

What: Update documentation with FTS examples and guides

Files to Update:

docs/working-with-indexes.rst - Schema-based index creation
docs/rest.rst - Document search API
README.md - Add FTS examples if needed

Content:

Creating indexes with full_text_searchable fields
Performing text search queries
Combining text and vector search (v0 limitations)
Query syntax reference

Implementation Phases

Phase 1: Foundation (Prerequisites)

Run OpenAPI codegen for 2026-01.alpha API
Verify generated models match API schema
Set up API version management infrastructure

Phase 2: Control Plane (Index Creation)

Create schema models
Update create_index() methods
Add integration tests

Phase 3: Data Plane (Search)

Create query models
Implement search_documents() method
Create response models
Add integration tests

Phase 4: Polish

Add validation and error handling
Update documentation
Edge case testing
Backward compatibility verification

UX Improvements & Backward Compatibility Strategy

UX Enhancements

Schema Builder Pattern (Similar to FilterBuilder)
- Fluent API for building schemas: SchemaBuilder().string("title", full_text_searchable=True).integer("year", filterable=True).build()
- Reduces boilerplate and prevents errors
- Provides autocomplete and type checking
Convenience Methods & Helpers
- create_text_index() - Simplified method for text-only indexes
- create_hybrid_index() - Helper for indexes with both text and vector fields
- text_query() - Factory function for creating TextQuery objects
- vector_query() - Factory function for creating VectorQuery objects
Smart Defaults
- Auto-detect API version based on parameters (use alpha when schema provided)
- Default include_fields=["*"] for document search (return all fields)
- Sensible defaults for deployment (serverless/aws/us-east-1)
Better Error Messages
- Clear validation errors with suggestions
- Helpful messages for v0 limitations
- Migration hints when using deprecated patterns
Document Model Access
- Document objects support attribute access: doc.title, doc.genre
- Fallback to dict access: doc["title"]
- Type hints for better IDE support
Query Building Helpers
- TextQuery.from_string() - Create from simple string
- TextQuery.from_phrase() - Create phrase query
- TextQuery.from_terms() - Create query with required/optional terms

Backward Compatibility Strategy

Zero Breaking Changes
- All existing create_index() calls continue to work unchanged
- Existing query() and search() methods remain fully functional
- No changes to existing method signatures
Gradual Migration Path
- Old way still works: create_index(name="idx", spec=ServerlessSpec(...), dimension=1536)
- New way available: create_index(name="idx", schema=Schema(...))
- Both can coexist - SDK detects which API version to use
Deprecation Strategy (Future)
- Add deprecation warnings only after FTS is stable
- Provide clear migration guides
- Long deprecation period (multiple releases)
API Version Selection
- Automatic: Use alpha API when schema is provided, stable otherwise
- Explicit: Allow users to specify API version if needed
- Transparent: Users don't need to think about API versions
Response Compatibility
- DocumentSearchResponse has similar structure to existing search responses
- Document objects can be converted to dicts for compatibility
- Existing code patterns continue to work

Key Design Decisions

Dual API Version Support: SDK will use stable API by default, alpha API when FTS features are used
Backward Compatibility: Existing methods (create_index() with spec, query(), search()) remain unchanged
Additive API: New methods (search_documents()) don't replace existing ones
Type Safety: Extensive use of type hints and TypedDict for API contracts
User-Friendly Wrappers: Create Pythonic models on top of generated OpenAPI models
Builder Patterns: Use fluent builder APIs for complex objects (schemas, queries)
Convenience Methods: Provide simple helpers for common use cases
Smart Defaults: Reduce boilerplate with sensible defaults

Dependencies

✅ API submodule on jhamon/fts branch with FTS API definitions
⏳ OpenAPI codegen must be run to generate 2026-01.alpha client code
⏳ Backend must support schema-based index creation and document search endpoints

Success Criteria

✅ Can create indexes with full_text_searchable fields using schema
✅ Can perform text search queries with phrase matching and required terms
✅ Can combine metadata filtering with text search
✅ Can select specific fields in search responses
✅ All tests pass (unit + integration)
✅ Documentation is complete
✅ Backward compatibility maintained (existing code continues to work)

Notes

V0 limitations: Single text field OR single vector field per query (not both)
score_by array limited to 1 item in v0 (minItems: 1, maxItems: 1)
Text query syntax is handled by backend; SDK passes through query strings
include_fields supports both array of strings and "*" string
Filter expressions support $text_match operator for required/excluded terms

Migration Examples

Creating an Index

Old Way (Still Works):

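from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_API_KEY")  # placeholder key

pc.create_index(
    name="my-index",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)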

New Way (Schema-Based):

# Using SchemaBuilder (recommended)
schema = (
    SchemaBuilder()
    .string("title", full_text_searchable=True)
    .integer("year", filterable=True)
    .dense_vector("embedding", dimension=1536, metric="cosine")
    .build()
)

pc.create_index(name="my-index", schema=schema)

# Or using convenience method
pc.create_hybrid_index(
    name="my-index",
    text_fields={"title": {"full_text_searchable": True}},
    vector_fields={"embedding": {"dimension": 1536, "metric": "cosine"}}
)

Searching Documents

Old Way (Still Works):

results = index.query(vector=[...], top_k=10, filter={"genre": "comedy"})

New Way (Document Search):

# Simple text search
results = index.search_documents(
    namespace="movies",
    score_by=text_query("title", "pink panther"),
    filter={"genre": "comedy"},
    top_k=10
)

# Using FilterBuilder
genre_filter = FilterBuilder().eq("genre", "comedy").build()  # avoid shadowing the built-in filter()
results = index.search_documents(
    namespace="movies",
    score_by=TextQuery(field="title", text_query="pink panther"),
    filter=genre_filter
)

# Access results
for doc in results.documents:
    print(f"{doc.title} ({doc.year}) - Score: {doc.score}")

Breaking Changes Assessment

Zero Breaking Changes:

All existing code continues to work unchanged
New features are additive only
No method signatures changed
No behavior changes to existing methods

Future Considerations:

When FTS moves to stable API, may deprecate old patterns
Will provide long deprecation period and migration guides
Will maintain backward compatibility for multiple releases
