
rootflo_llm -> lazy fetching llm config #166

Merged
vizsatiz merged 6 commits into develop from fix_rootflo_llm
Nov 22, 2025

Conversation

@rootflo-hardik
Contributor

@rootflo-hardik rootflo-hardik commented Nov 21, 2025

  • The fetch_llm_config call was synchronous; when the server calling flo-ai and the base_url pointed to the same server, this caused a deadlock.

Summary by CodeRabbit

  • Refactor
    • Initialization now defers model/SDK creation and performs lazy, async, thread-safe initialization on first use.
  • New Features
    • Added support for a vLLM provider and a factory pathway to construct it.
  • Bug Fixes
    • Improved asynchronous config retrieval with clearer error handling for remote calls.
  • Behavior
    • Public LLM calls (generate/stream/get_message_content) now trigger lazy initialization before use.


@coderabbitai

coderabbitai bot commented Nov 21, 2025

Walkthrough

RootFloLLM now defers LLM configuration and SDK instantiation until first use via an async initializer guarded by an asyncio.Lock; it fetches model/type/config with httpx.AsyncClient and selects the provider at runtime. LLMFactory adds an openai_vllm provider and a constructor for OpenAIVLLM instances.
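The lazy-init pattern the walkthrough describes can be sketched as follows (names like `LazyClient` are illustrative, not the actual `RootFloLLM` code; `_fetch_config` stands in for the `httpx.AsyncClient` call):

```python
import asyncio

class LazyClient:
    """Defers expensive setup until first use; safe under concurrent first calls."""

    def __init__(self):
        self._initialized = False
        self._init_lock = asyncio.Lock()  # created once in __init__ (fine on Python 3.10+)
        self._config = None

    async def _fetch_config(self):
        # Stands in for the remote config fetch (e.g. an HTTP call)
        await asyncio.sleep(0)
        return {"model": "demo"}

    async def _ensure_initialized(self):
        if self._initialized:              # fast path: no lock once initialized
            return
        async with self._init_lock:
            if self._initialized:          # double-check after acquiring the lock
                return
            self._config = await self._fetch_config()
            self._initialized = True       # set only after setup succeeds

    async def generate(self):
        await self._ensure_initialized()
        return self._config["model"]

async def main():
    client = LazyClient()
    # Five concurrent first calls initialize exactly once
    return await asyncio.gather(*(client.generate() for _ in range(5)))

print(asyncio.run(main()))  # prints ['demo', 'demo', 'demo', 'demo', 'demo']
```

The fast path avoids lock contention on every call once initialization has completed; the second check inside the lock covers tasks that were already waiting when another task finished initializing.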

Changes

Cohort / File(s) Summary
Lazy async initialization refactor
flo_ai/flo_ai/llm/rootflo_llm.py
Removed eager config fetch from __init__; stored init params (_base_url, _model_id, _app_key, _app_secret, _issuer, _audience, _access_token, _temperature, _kwargs) and added lazy state (_llm, _initialized, _init_lock); added async def _fetch_llm_config_async(...) and async def _ensure_initialized() with double-checked locking; switched to httpx.AsyncClient; deferred token/header creation and SDK wrapper instantiation based on fetched llm_model/llm_type (includes VLLM); generate() and stream() now await _ensure_initialized(); get_message_content() requires init.
LLM factory — new provider
flo_ai/flo_ai/helpers/llm_factory.py
Added 'openai_vllm' to SUPPORTED_PROVIDERS; implemented static _create_openai_vllm_llm(model_config: Dict[str, Any], **kwargs) to validate inputs and construct an OpenAIVLLM (requires model and base_url; accepts optional api_key, temperature); create_llm dispatches openai_vllm to this constructor.
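A minimal sketch of this factory pathway, assuming the real `_create_openai_vllm_llm` validates `model` and `base_url` as described (a plain dict stands in for a real `OpenAIVLLM` instance):

```python
from typing import Any, Dict

SUPPORTED_PROVIDERS = {"openai", "anthropic", "gemini", "openai_vllm"}

def _create_openai_vllm_llm(model_config: Dict[str, Any], **kwargs) -> Dict[str, Any]:
    # model and base_url are required; api_key and temperature are optional
    model = model_config.get("model")
    base_url = model_config.get("base_url")
    if not model or not base_url:
        raise ValueError("openai_vllm requires both 'model' and 'base_url'")
    return {
        "provider": "openai_vllm",
        "model": model,
        "base_url": base_url,
        "api_key": model_config.get("api_key"),
        "temperature": model_config.get("temperature", 0.7),
        **kwargs,
    }

def create_llm(model_config: Dict[str, Any], **kwargs) -> Dict[str, Any]:
    # Illustrative dispatcher; validates the provider, then routes to a constructor
    provider = model_config.get("provider", "").lower()
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"Unsupported provider: {provider!r}")
    if provider == "openai_vllm":
        return _create_openai_vllm_llm(model_config, **kwargs)
    return {"provider": provider}

llm = create_llm(
    {"provider": "openai_vllm", "model": "llama-3", "base_url": "http://localhost:8000/v1"}
)
print(llm["model"])  # prints llama-3
```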

Sequence Diagram

sequenceDiagram
    participant Client
    participant RootFloLLM
    participant InitLock as asyncio.Lock
    participant ConfigAPI as Config API
    participant SDK as LLM SDK

    Note over RootFloLLM: Lazy async init on first use
    Client->>RootFloLLM: generate()/stream()
    RootFloLLM->>InitLock: acquire()
    alt already initialized
        InitLock-->>RootFloLLM: return
    else first call
        RootFloLLM->>ConfigAPI: _fetch_llm_config_async() (httpx.AsyncClient)
        ConfigAPI-->>RootFloLLM: config (llm_model, llm_type, creds)
        RootFloLLM->>RootFloLLM: build headers/token, resolve provider
        RootFloLLM->>SDK: instantiate SDK wrapper (OpenAI/Anthropic/Gemini/VLLM/OpenAIVLLM)
    end
    InitLock-->>RootFloLLM: release()
    RootFloLLM->>SDK: delegate generate()/stream()
    SDK-->>Client: response / stream

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Review focus:
    • Concurrency and double-checked locking in _ensure_initialized() (cancellation/exception handling).
    • _fetch_llm_config_async() HTTP error handling, timeouts, and response parsing.
    • Provider mapping and correct SDK instantiation for VLLM and OpenAIVLLM.
    • Security of token/JWT/header handling moved into async path.
    • Input validation and defaults in LLMFactory._create_openai_vllm_llm.
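On the first focus point, the key property to verify is that a failed initialization leaves the initialized flag unset so a later call can retry, and that the lock is released when the body raises. A self-contained sketch (the `RetryableInit` class is hypothetical, not the PR's code):

```python
import asyncio

class RetryableInit:
    def __init__(self, fail_times: int):
        self._fail_times = fail_times
        self._attempts = 0
        self._initialized = False
        self._lock = asyncio.Lock()

    async def _ensure_initialized(self):
        if self._initialized:
            return
        async with self._lock:  # released even if the body raises
            if self._initialized:
                return
            self._attempts += 1
            if self._attempts <= self._fail_times:
                raise RuntimeError("config fetch failed")
            # Flag is set only after setup succeeds, so failures can be retried
            self._initialized = True

async def demo():
    obj = RetryableInit(fail_times=1)
    try:
        await obj._ensure_initialized()  # first attempt fails
    except RuntimeError:
        pass
    await obj._ensure_initialized()      # retry succeeds
    return obj._attempts, obj._initialized

print(asyncio.run(demo()))  # prints (2, True)
```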

Poem

🐰 I waited quiet, paws on keys,
I fetched the config on first breeze,
A lock, a token, model set free,
Instantiated only when called to be,
Tiny hops of async jubilee.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check — ✅ Passed: check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check — ✅ Passed: the title accurately reflects the main change, converting synchronous LLM config fetching to lazy initialization, which is the core objective of the PR.
  • Docstring Coverage — ✅ Passed: docstring coverage is 100.00%, which is sufficient (required threshold: 80.00%).

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4b0910e and 560c1bf.

📒 Files selected for processing (1)
  • flo_ai/flo_ai/llm/rootflo_llm.py (8 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
flo_ai/flo_ai/llm/rootflo_llm.py (1)
flo_ai/flo_ai/llm/base_llm.py (1)
  • BaseLLM (9-136)
🔇 Additional comments (7)
flo_ai/flo_ai/llm/rootflo_llm.py (7)

7-7: LGTM!

The new imports support lazy initialization (asyncio for Lock) and VLLM provider support (OpenAIVLLM).

Also applies to: 12-12


22-22: LGTM!

VLLM provider added to the enum, consistent with existing provider definitions.


89-117: LGTM! Race condition fix confirmed.

The initialization parameters are correctly stored for lazy initialization, and the lock is created in __init__ (line 103) rather than lazily. This addresses the race condition identified in past review comments where concurrent coroutines could create multiple lock instances.


119-172: LGTM!

The async config fetch correctly uses httpx.AsyncClient to avoid blocking, with comprehensive error handling and appropriate timeout.


173-276: LGTM! Double-checked locking correct, VLLM includes custom_headers.

The lazy initialization correctly implements double-checked locking with the lock created in __init__. All providers, including VLLM (line 268), now receive custom_headers, which addresses the issue raised in past review comments.


277-299: LGTM!

Both generate and stream correctly ensure initialization before delegating to the underlying LLM.


301-307: LGTM!

get_message_content correctly guards against uninitialized access with a clear error message.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (4)
flo_ai/flo_ai/llm/rootflo_llm.py (4)

87-107: Lazy state storage and BaseLLM placeholder model/api_key look reasonable

Persisting constructor parameters (_base_url, _model_id, JWT fields, temperature, **kwargs) for later use and introducing _llm/_initialized flags align well with the new lazy-init flow. Calling BaseLLM.__init__ with placeholder model/api_key and then overwriting self.model and self.api_key in _ensure_initialized is a pragmatic way to preserve the parent’s setup without doing eager I/O in __init__.

One tiny nit: the comment “Will be set during initialization” above self.base_url/self.model_id is now slightly misleading since they’re set immediately, but this is non-blocking and can be tidied later if desired.

Also applies to: 108-115


117-169: Async config fetch via httpx.AsyncClient is correct; consider more specific exception types

The async config fetch logic is solid: correct URL construction, auth headers with JWT, a reasonable timeout, and explicit shape validation for data.llm_model and data.type. Switching from a sync client to httpx.AsyncClient should avoid the earlier deadlock if callers share the same base_url as flo-ai.

One improvement you might want to consider later is raising a more specific custom exception (e.g., RootFloLLMConfigError) instead of generic Exception so callers can distinguish config/bootstrap failures from downstream LLM errors more easily.
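The suggested custom exception could look like this (`RootFloLLMConfigError` is the hypothetical name from the suggestion above; the payload shape mirrors the `data.llm_model`/`data.type` validation described earlier):

```python
class RootFloLLMConfigError(Exception):
    """Raised when fetching or validating remote LLM config fails."""

def parse_config(payload: dict) -> dict:
    # Validate the response shape: data.llm_model and data.type must be present
    data = payload.get("data")
    if not data or "llm_model" not in data or "type" not in data:
        raise RootFloLLMConfigError(f"Malformed config response: {payload!r}")
    return data

try:
    parse_config({"data": {}})
except RootFloLLMConfigError as e:
    print("caught:", e)
```

Callers can then catch `RootFloLLMConfigError` specifically, while unrelated LLM errors propagate as before.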


171-264: _ensure_initialized flow looks correct; minor robustness tweaks possible

The lazy-init sequence here is well structured:

  • Fast-path and double-checked locking avoid redundant work under concurrency.
  • JWT generation or reuse of a provided access_token is clean; api_token is then propagated consistently to the config fetch and downstream SDK wrappers.
  • Config response is validated, llm_type normalized via LLMProvider, and misconfigurations surface as explicit ValueErrors.
  • The full proxy inference URL and X-Rootflo-Key headers are built once and passed into the appropriate wrapper (OpenAI, Anthropic, Gemini), and _initialized is only set after _llm is constructed.

Two small robustness ideas:

  1. Surface provider/type in errors – in the else branch at lines 259-260, you might include both the raw llm_type and enum value in the error to aid debugging, though this is non-essential.
  2. Optionally cache config fields – if you foresee needing raw llm_type/llm_model from the API for introspection, you could cache self._llm_model_from_config / self._llm_type_from_config, but it’s not required for correctness.

Overall, this method cleanly centralizes all initialization concerns and matches the PR’s goal.


289-303: Guard synchronous helper methods against uninitialized _llm

With the shift to lazy initialization, it’s now possible (at least in theory) for callers to access get_message_content, format_tool_for_llm, format_tools_for_llm, or format_image_in_message before generate/stream has been called. In that case, self._llm would still be None, and these methods would raise an AttributeError instead of a clearer message.

To make failure modes more predictable, consider adding a simple guard:

    def get_message_content(self, response: Any) -> str:
        """Extract message content from response"""
-        return self._llm.get_message_content(response)
+        if self._llm is None:
+            raise RuntimeError(
+                "RootFloLLM is not initialized. Call 'generate' or 'stream' "
+                "at least once before using helper methods."
+            )
+        return self._llm.get_message_content(response)

and similarly for the other helper methods, or at least document that generate/stream must be called first. This isn’t strictly required if all current call sites follow that pattern, but it will make the class more defensive and easier to use correctly.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5b0bbc3 and 64f65c9.

📒 Files selected for processing (1)
  • flo_ai/flo_ai/llm/rootflo_llm.py (7 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
flo_ai/flo_ai/llm/rootflo_llm.py (3)
flo_ai/flo_ai/llm/openai_llm.py (1)
  • OpenAI (17-209)
flo_ai/flo_ai/llm/anthropic_llm.py (1)
  • Anthropic (18-277)
flo_ai/flo_ai/llm/gemini_llm.py (1)
  • Gemini (19-260)
🔇 Additional comments (4)
flo_ai/flo_ai/llm/rootflo_llm.py (4)

55-57: Lazy-init note in docstring matches PR intent

The added note about fetching config lazily on first generate/stream clearly documents the behavioral change and the deadlock fix rationale. No issues here.


273-276: generate correctly gates on _ensure_initialized

Having generate await _ensure_initialized before delegating to self._llm.generate is exactly what’s needed to guarantee that the underlying SDK wrapper and config are ready without changing the public method signature. This should remove the previous deadlock risk tied to eager HTTP calls in the constructor.


285-287: stream integration with lazy init is consistent

stream mirrors generate by awaiting _ensure_initialized once and then streaming from the underlying wrapper. This keeps the interface stable while making lazy init transparent to callers.


98-101: Review comment is outdated and does not apply to this codebase's Python version.

The concern about event-loop mismatch when creating asyncio.Lock() in a synchronous __init__ was valid for Python versions prior to 3.10. However, this codebase explicitly requires Python >=3.10 (per pyproject.toml), and Python 3.10 fundamentally changed how asyncio synchronization primitives handle event loop binding.

In Python 3.10+, asyncio.Lock() no longer captures the event loop at construction time. Instead, it defers creating loop-specific resources until the lock is first used in an async context. This design eliminates the "wrong loop" RuntimeError that could occur in earlier Python versions.

The current implementation at lines 98–101 is safe and correct for the target Python version. The suggested refactoring to defer lock creation is unnecessary complexity.

Likely an incorrect or invalid review comment.

    # Actual model will be set during lazy initialization
    super().__init__(
-       model=llm_model, api_key=api_token, temperature=temperature, **kwargs
+       model='placeholder',  # Will be updated during lazy init
Member


Change to None

            )
        elif llm_provider == LLMProvider.GEMINI:
            # Gemini SDK - pass base_url which will be handled via http_options
            self._llm = Gemini(
Member


Add support for vLLM and Ollama


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
flo_ai/flo_ai/llm/rootflo_llm.py (2)

98-102: Potential event‑loop dependency when creating asyncio.Lock() in __init__

Instantiating asyncio.Lock() in a synchronous __init__ can raise RuntimeError (“no running event loop”) depending on the Python version and how/when RootFloLLM is constructed (e.g., at import time or app startup before an event loop is running). This would surface as an immediate construction failure.

Consider deferring lock creation until first use inside _ensure_initialized, which always runs in an async context, e.g.:

-        self._init_lock = asyncio.Lock()
+        self._init_lock = None

and in _ensure_initialized:

-        # Acquire lock for initialization
-        async with self._init_lock:
+        # Lazily create the lock in an async context
+        if self._init_lock is None:
+            self._init_lock = asyncio.Lock()
+
+        # Acquire lock for initialization
+        async with self._init_lock:

Please verify how RootFloLLM is constructed in your deployment (startup vs inside an async request handler) to confirm whether this is observable under your Python runtime.


171-264: Initialization guard and provider dispatch are correct, but helper methods can observe uninitialized _llm

The double-checked locking in _ensure_initialized, JWT generation, config fetch, mapping of type → LLMProvider, and instantiation of OpenAI/Anthropic/Gemini all look correct, and the fast-path if self._initialized: return keeps steady-state overhead low.

However, get_message_content, format_tool_for_llm, format_tools_for_llm, and format_image_in_message access self._llm without calling _ensure_initialized. If any caller uses these helpers before the first generate/stream (e.g., formats tools or images before sending the first request), self._llm will still be None and you’ll get an AttributeError.

Two options you might consider:

  1. Defensive guard with clearer error (no interface change, just better failure mode):
    def get_message_content(self, response: Any) -> str:
        """Extract message content from response"""
-        return self._llm.get_message_content(response)
+        if not getattr(self, "_initialized", False) or self._llm is None:
+            raise RuntimeError(
+                "RootFloLLM is not initialized yet; call generate() or stream() first."
+            )
+        return self._llm.get_message_content(response)

and similarly for the other helper methods.

  2. Contract clarification: If your current call-sites always invoke generate/stream first, document that these helpers must only be used after initialization, and consider adding an explicit async initialize() method that some components can await upfront.

Please check existing usages (e.g., any code that calls format_tools_for_llm or format_image_in_message immediately after constructing the LLM) to make sure this change doesn’t introduce a latent NoneType access.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 64f65c9 and eb942e1.

📒 Files selected for processing (2)
  • flo_ai/flo_ai/helpers/llm_factory.py (4 hunks)
  • flo_ai/flo_ai/llm/rootflo_llm.py (7 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
flo_ai/flo_ai/helpers/llm_factory.py (1)
flo_ai/flo_ai/llm/openai_vllm.py (1)
  • OpenAIVLLM (6-99)
flo_ai/flo_ai/llm/rootflo_llm.py (3)
flo_ai/flo_ai/llm/openai_llm.py (1)
  • OpenAI (17-209)
flo_ai/flo_ai/llm/anthropic_llm.py (1)
  • Anthropic (18-277)
flo_ai/flo_ai/llm/gemini_llm.py (1)
  • Gemini (19-260)
🔇 Additional comments (4)
flo_ai/flo_ai/helpers/llm_factory.py (1)

18-26: openai_vllm provider wiring and example look correct

Adding 'openai_vllm' to SUPPORTED_PROVIDERS, the explicit branch in create_llm, and the doc example are consistent and should work as expected with a lower‑cased provider field and the OpenAIVLLM implementation.

Also applies to: 68-75, 76-91

flo_ai/flo_ai/llm/rootflo_llm.py (3)

55-57: Lazy initialization design is a good fit for avoiding startup blocking

Storing the config/auth fields on self._*, introducing _llm/_initialized state, and deferring remote config fetch until first generate/stream is a clean way to remove synchronous I/O from __init__. The BaseLLM.__init__ call with placeholder model/api_key followed by later assignment in _ensure_initialized is consistent with how the underlying provider wrappers are used.

Also applies to: 87-107, 108-115


117-144: Async config fetch with httpx.AsyncClient and clear error paths look solid

The asynchronous _fetch_llm_config_async function, including headers, timeout, and explicit handling of missing data/llm_model/type fields plus detailed error messages for HTTPStatusError/RequestError, is structured well for debugging and avoids blocking the event loop with synchronous I/O.


265-276: generate and stream correctly delegate after ensuring initialization

Adding await self._ensure_initialized() at the top of generate and stream ensures the underlying provider wrapper and auth are ready before use, while preserving the original async signatures and delegation to the SDK‑specific implementations. This matches the lazy‑init intent and should be safe under concurrent calls thanks to the lock.

Also applies to: 278-287


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eb942e1 and ff32552.

📒 Files selected for processing (1)
  • flo_ai/flo_ai/llm/rootflo_llm.py (8 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
flo_ai/flo_ai/llm/rootflo_llm.py (5)
flo_ai/flo_ai/llm/base_llm.py (1)
  • BaseLLM (9-136)
flo_ai/flo_ai/llm/openai_llm.py (1)
  • OpenAI (17-209)
flo_ai/flo_ai/llm/gemini_llm.py (1)
  • Gemini (19-260)
flo_ai/flo_ai/llm/anthropic_llm.py (1)
  • Anthropic (18-277)
flo_ai/flo_ai/llm/openai_vllm.py (1)
  • OpenAIVLLM (6-99)
🔇 Additional comments (7)
flo_ai/flo_ai/llm/rootflo_llm.py (7)

7-7: LGTM! Imports support lazy initialization.

The asyncio import enables the lock-based initialization guard, and OpenAIVLLM import supports the new VLLM provider.

Also applies to: 12-12


22-22: LGTM! VLLM provider added.

The new provider type aligns with the OpenAIVLLM adapter and factory support.


119-171: LGTM! Robust async config fetching.

The implementation correctly uses httpx.AsyncClient with proper error handling and response validation. Creating a new client for each fetch is acceptable since this is only called once per instance during lazy initialization.


276-287: LGTM! Lazy initialization correctly implemented.

The method properly calls _ensure_initialized before delegating to the underlying LLM, ensuring the SDK is initialized on first use.


289-298: LGTM! Lazy initialization correctly implemented.

The method properly calls _ensure_initialized before delegating to the underlying LLM.


113-114: No issues found—the lazy initialization pattern prevents the concern from materializing.

The empty strings passed to BaseLLM.__init__ (lines 113-114) are safely updated in _ensure_initialized() before any LLM operations. All public async methods (generate, stream) call await self._ensure_initialized() first, and the actual LLM work delegates to self._llm, not the instance attributes. No synchronous code path accesses these values prematurely, so the design is sound.


300-314: Based on my analysis of the code, the original review comment identifies a real architectural concern with RootFloLLM's lazy initialization pattern, but the severity assessment is incomplete.

Key findings from code analysis:

The usage patterns from the script output show:

  • get_message_content: Always called after generate() in agent.py (lines 155, 244, 307, 418, 605) and llm_router.py (line 117)
  • format_tools_for_llm: Called after generate() in agent.py's tool loop (line 232)
  • format_tool_for_llm: Not found being called directly in production code; only used in tests and in format_tools_for_llm implementations
  • format_image_in_message: Called in base_agent.py (line 105) during message history building, which can occur before the first generate() call

The actual issue is more nuanced:

The primary risk exists for format_image_in_message in the message preparation flow, not equally across all four methods. The other three methods' usage patterns suggest they're called after initialization. However, RootFloLLM's lazy initialization design does create a foot-gun that could fail if any of these methods are invoked before the first generate() or stream() call.

Recommendation:

The original review's suggestion to add initialization guards is sound and defensive, but the proposed diffs should be applied with an understanding that in current production code flow, only format_image_in_message has a documented execution path before initialization. Adding guards to all four methods is still valid defensive programming for a public API.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
flo_ai/flo_ai/llm/rootflo_llm.py (1)

281-323: Guard all helper methods that rely on _llm to avoid AttributeError before first use

With lazy init, _llm remains None until generate() or stream() has run _ensure_initialized(). get_message_content() now guards against this, but:

  • format_tool_for_llm
  • format_tools_for_llm
  • format_image_in_message

still call self._llm.* directly. If any caller uses these helpers before the first generate()/stream() (which is a reasonable pattern when preparing tools or messages up-front), this will raise an AttributeError: 'NoneType' object has no attribute ... instead of a clear initialization error.

You can centralize the initialization check and reuse it across all sync helpers:

     def get_message_content(self, response: Any) -> str:
         """Extract message content from response"""
-        if not getattr(self, '_initialized', False) or self._llm is None:
-            raise RuntimeError(
-                'RootFloLLM is not initialized yet; call generate() or stream() first.'
-            )
-        return self._llm.get_message_content(response)
+        self._ensure_sync_initialized()
+        return self._llm.get_message_content(response)
+
+    def _ensure_sync_initialized(self) -> None:
+        """
+        Ensure the underlying LLM wrapper is available for synchronous helper methods.
+        """
+        if not getattr(self, '_initialized', False) or self._llm is None:
+            raise RuntimeError(
+                'RootFloLLM is not initialized yet; call generate() or stream() first.'
+            )
@@
     def format_tool_for_llm(self, tool: 'Tool') -> Dict[str, Any]:
         """Format a tool for the specific LLM's API"""
-        return self._llm.format_tool_for_llm(tool)
+        self._ensure_sync_initialized()
+        return self._llm.format_tool_for_llm(tool)
@@
     def format_tools_for_llm(self, tools: List['Tool']) -> List[Dict[str, Any]]:
         """Format a list of tools for the specific LLM's API"""
-        return self._llm.format_tools_for_llm(tools)
+        self._ensure_sync_initialized()
+        return self._llm.format_tools_for_llm(tools)
@@
     def format_image_in_message(self, image: ImageMessageContent) -> str:
         """Format a image in the message"""
-        return self._llm.format_image_in_message(image)
+        self._ensure_sync_initialized()
+        return self._llm.format_image_in_message(image)

This keeps the lazy-init contract while giving callers deterministic and descriptive failures if they use these helpers too early.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ff32552 and c7c4fb5.

📒 Files selected for processing (1)
  • flo_ai/flo_ai/llm/rootflo_llm.py (8 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
flo_ai/flo_ai/llm/rootflo_llm.py (5)
flo_ai/flo_ai/llm/base_llm.py (1)
  • BaseLLM (9-136)
flo_ai/flo_ai/llm/openai_llm.py (3)
  • OpenAI (17-209)
  • stream (119-149)
  • get_message_content (151-156)
flo_ai/flo_ai/llm/gemini_llm.py (3)
  • Gemini (19-260)
  • stream (151-206)
  • get_message_content (208-212)
flo_ai/flo_ai/llm/anthropic_llm.py (3)
  • Anthropic (18-277)
  • stream (155-200)
  • get_message_content (202-206)
flo_ai/flo_ai/llm/openai_vllm.py (1)
  • OpenAIVLLM (6-99)
🔇 Additional comments (2)
flo_ai/flo_ai/llm/rootflo_llm.py (2)

119-171: Async config fetch looks solid and non-blocking

The new _fetch_llm_config_async uses httpx.AsyncClient with a per-request timeout, clear auth headers, and specific exception handling (HTTPStatusError, RequestError, generic) with useful error messages. This achieves the PR goal of avoiding a synchronous HTTP call during initialization while keeping failure modes explicit.


233-274: VLLM provider integration via OpenAIVLLM and headers

The provider branching and VLLM case look consistent with the other backends:

  • llm_type is normalized via LLMProvider(llm_type.lower()), so "vllm" from the API maps cleanly to LLMProvider.VLLM.
  • For VLLM, OpenAIVLLM is instantiated with model, base_url, api_key, temperature, and custom_headers, matching how OpenAI/Anthropic/Gemini receive proxy headers.
  • Since OpenAIVLLM.__init__ forwards **kwargs to OpenAI.__init__, custom_headers will correctly reach the OpenAI client.

This closes the earlier gap around forwarding custom_headers to the vLLM proxy.
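The normalization step can be illustrated with a minimal enum (the member names mirror the description above; the real LLMProvider may differ):

```python
from enum import Enum

class LLMProvider(Enum):
    OPENAI = "openai"
    ANTHROPIC = "anthropic"
    GEMINI = "gemini"
    VLLM = "vllm"

# A raw "VLLM" string from the API maps cleanly onto the enum after lower-casing;
# an unknown value raises ValueError, surfacing misconfigurations explicitly
provider = LLMProvider("VLLM".lower())
print(provider is LLMProvider.VLLM)  # prints True
```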

Comment on lines +89 to +104
# Store initialization parameters for lazy initialization
self._base_url = base_url
self._model_id = model_id
self._app_key = app_key
self._app_secret = app_secret
self._issuer = issuer
self._audience = audience
self._access_token = access_token
self._temperature = temperature
self._kwargs = kwargs

# Lazy initialization state
self._llm = None
self._initialized = False
self._init_lock = None


@coderabbitai coderabbitai bot Nov 22, 2025


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Fix race in lazy initialization lock creation

The current double-checked locking pattern can create two different asyncio.Lock instances if two coroutines hit _ensure_initialized concurrently while _init_lock is still None. Each coroutine can then async with a different lock and both proceed to run the one-time initialization concurrently (duplicated config fetch and LLM construction, last writer wins).

Given this lock is lightweight, it’s simpler and safer to create a single instance in __init__ and always use it, instead of lazily creating it inside _ensure_initialized.

Consider a change along these lines:

@@
-        # Lazy initialization state
-        self._llm = None
-        self._initialized = False
-        self._init_lock = None
+        # Lazy initialization state
+        self._llm = None
+        self._initialized = False
+        # Single lock instance to protect one-time async initialization
+        self._init_lock = asyncio.Lock()
@@
-        # Lazily create the lock in an async context
-        if self._init_lock is None:
-            self._init_lock = asyncio.Lock()
-
-        # Acquire lock for initialization
-        async with self._init_lock:
+        # Acquire lock for initialization
+        async with self._init_lock:
             # Double-check: another task might have initialized while we waited
             if self._initialized:
                 return

Please verify this matches your supported Python/asyncio versions’ expectations for creating asyncio.Lock() in __init__.

To double-check the lock semantics for your runtime, you can consult the asyncio docs:

Also applies to: 173-188


🌐 Web query:

Is it safe to construct `asyncio.Lock()` outside a running event loop (e.g., in `__init__`) in Python <your target version>?

💡 Result:

Short answer: No for older Pythons — yes (safe) on modern Python, but prefer lazy creation.

Details:

  • On Python < 3.10 asyncio primitives (Lock, Event, Condition, Future, etc.) were bound to the event loop at construction time; creating asyncio.Lock() outside the intended running loop can attach it to the wrong loop and lead to RuntimeError when used later. [1][2]
  • Starting around Python 3.10 the behaviour was changed/cleaned up (the explicit loop argument was deprecated/removed and the library stopped requiring a loop at construction), so creating primitives before calling asyncio.run() is not the same problem on recent Python versions. [3][4]
  • Recommendation: to be robust across versions, don't construct asyncio primitives at import / in init unless you know which loop they should attach to. Instead initialize them lazily (e.g., self._lock = None and set self._lock = asyncio.Lock() inside an async initializer or on first use), or construct them from inside the running event loop. [2][5]

Sources:

  1. Python docs — asyncio synchronization primitives (shows loop parameter / behaviour). [2]
  2. StackOverflow discussion and examples explaining RuntimeError when Lock created outside running loop. [3]
  3. Python docs / issue notes about deprecating loop parameter and behaviour changes in 3.10+. [6]
  4. Python asyncio policy / event loop docs describing changes in get_event_loop/get_running_loop behaviour. [1]

(References: [1] asyncio policy/event loop docs; [2] asyncio sync docs; [3] StackOverflow; [6] Python issue/changes.)


I need to determine the target Python version for this codebase to verify whether the proposed fix is safe.


🏁 Script executed:

#!/bin/bash
# Search for Python version specifications in common config files
fd -e 'pyproject.toml' -e 'setup.py' -e 'setup.cfg' -e '.python-version' -e 'tox.ini' -e 'README.md' | head -20

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check pyproject.toml if it exists for Python version
if [ -f pyproject.toml ]; then
  echo "=== pyproject.toml ==="; 
  cat pyproject.toml | grep -A5 -B5 -i "python"
fi

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check setup.py if it exists
if [ -f setup.py ]; then
  echo "=== setup.py ===";
  cat setup.py | grep -i "python_requires\|version"
fi

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check for .python-version or similar
if [ -f .python-version ]; then
  echo "=== .python-version ===";
  cat .python-version
fi

Length of output: 0


🏁 Script executed:

#!/bin/bash
# List root directory to see structure
ls -la | head -30

Length of output: 1094


🏁 Script executed:

#!/bin/bash
# Broader search for config files anywhere in repo
fd -i 'setup|pyproject|requirements|tox' --type f | head -20

Length of output: 65


🏁 Script executed:

#!/bin/bash
# Check the actual rootflo_llm.py file for imports and context
cat flo_ai/flo_ai/llm/rootflo_llm.py | head -120

Length of output: 3902


🏁 Script executed:

#!/bin/bash
# Check flo_ai/pyproject.toml for Python version
echo "=== flo_ai/pyproject.toml ===" && cat flo_ai/pyproject.toml

Length of output: 1831


🏁 Script executed:

#!/bin/bash
# Check flo_ai/setup.py for Python version
echo "=== flo_ai/setup.py ===" && cat flo_ai/setup.py

Length of output: 651


🏁 Script executed:

#!/bin/bash
# Search for the helper methods mentioned in the review
rg -n 'format_tool_for_llm|format_tools_for_llm|format_image_in_message' flo_ai/flo_ai/llm/rootflo_llm.py

Length of output: 405


🏁 Script executed:

#!/bin/bash
# Get full view of the _ensure_initialized method and surrounding code
sed -n '170,220p' flo_ai/flo_ai/llm/rootflo_llm.py

Length of output: 1962


Fix race condition in lazy initialization lock and add guards to helper methods

The double-checked locking pattern at lines 182-184 has a race window: two concurrent coroutines can each see _init_lock is None and create two different asyncio.Lock instances, allowing both to proceed past the lock. While Python 3.10+ (your target version per pyproject.toml) permits safe asyncio.Lock() construction in __init__, the official recommendation is still lazy initialization for better practice.

Additionally, format_tool_for_llm, format_tools_for_llm, and format_image_in_message (lines 313–323) call self._llm.* without checking initialization, risking AttributeError if invoked before generate/stream. These methods should guard against uninitialized state.

Suggested approach:

  1. Lock: Create once in __init__:
-        self._init_lock = None
+        # Single lock instance to protect one-time async initialization
+        self._init_lock = asyncio.Lock()

Then remove the lazy lock creation from _ensure_initialized (lines 182–184).

  2. Helper methods: Add an initialization guard:
def format_tool_for_llm(self, tool: 'Tool') -> Dict[str, Any]:
    if not self._initialized:
        raise RuntimeError('LLM not yet initialized; call generate/stream first')
    return self._llm.format_tool_for_llm(tool)

Apply the same guard pattern to format_tools_for_llm and format_image_in_message.

🤖 Prompt for AI Agents
In flo_ai/flo_ai/llm/rootflo_llm.py around lines 89 to 104 and referenced init
code (lines ~182-184 and helper methods ~313-323), avoid creating the init lock
lazily to prevent a race: instantiate self._init_lock = asyncio.Lock() in
__init__ (where other lazy-init fields are set) and remove any on-the-fly lock
creation from _ensure_initialized; additionally, add guards to
format_tool_for_llm, format_tools_for_llm, and format_image_in_message that
raise a clear RuntimeError if self._initialized is False (e.g., "LLM not yet
initialized; call generate/stream first") before delegating to self._llm to
prevent AttributeError when these helpers are called too early.

Member

Please fix


@vizsatiz, I'll provide the fixes for both issues:

Fix 1: Create lock in __init__ to prevent race condition

In the __init__ method, change the lock initialization:

# Lazy initialization state
self._llm = None
self._initialized = False
# Single lock instance to protect one-time async initialization
self._init_lock = asyncio.Lock()

Then in the _ensure_initialized method, remove the lazy lock creation (lines 182-184):

async def _ensure_initialized(self):
    """
    Ensure the LLM is initialized by fetching config on first use.
    Uses double-checked locking to prevent race conditions in concurrent scenarios.
    """
    # Fast path: already initialized
    if self._initialized:
        return

    # Acquire lock for initialization
    async with self._init_lock:
        # Double-check: another task might have initialized while we waited
        if self._initialized:
            return

        # ... rest of the initialization logic

Fix 2: Add initialization guards to helper methods

Update the three helper methods to check initialization state:

def format_tool_for_llm(self, tool: 'Tool') -> Dict[str, Any]:
    """Format a tool for the specific LLM's API"""
    if not self._initialized or self._llm is None:
        raise RuntimeError(
            'RootFloLLM is not initialized yet; call generate() or stream() first.'
        )
    return self._llm.format_tool_for_llm(tool)

def format_tools_for_llm(self, tools: List['Tool']) -> List[Dict[str, Any]]:
    """Format a list of tools for the specific LLM's API"""
    if not self._initialized or self._llm is None:
        raise RuntimeError(
            'RootFloLLM is not initialized yet; call generate() or stream() first.'
        )
    return self._llm.format_tools_for_llm(tools)

def format_image_in_message(self, image: ImageMessageContent) -> str:
    """Format a image in the message"""
    if not self._initialized or self._llm is None:
        raise RuntimeError(
            'RootFloLLM is not initialized yet; call generate() or stream() first.'
        )
    return self._llm.format_image_in_message(image)

These changes ensure thread-safety in the lazy initialization and prevent AttributeError when helper methods are called before the LLM is initialized.

Comment on lines +89 to +104
# Store initialization parameters for lazy initialization
self._base_url = base_url
self._model_id = model_id
self._app_key = app_key
self._app_secret = app_secret
self._issuer = issuer
self._audience = audience
self._access_token = access_token
self._temperature = temperature
self._kwargs = kwargs

# Lazy initialization state
self._llm = None
self._initialized = False
self._init_lock = None

Member

Please fix

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
flo_ai/flo_ai/llm/rootflo_llm.py (1)

301-319: Guard helper methods against uninitialized _llm (runtime bug & prior feedback)

get_message_content now raises a clear RuntimeError when the LLM isn’t initialized, but the helper methods:

  • format_tool_for_llm
  • format_tools_for_llm
  • format_image_in_message

still call self._llm.* directly. If any of these are used before generate() / stream() are awaited, they will raise an AttributeError: 'NoneType' object has no attribute ... instead of a clear, intentional error. In the previous eager-init design these helpers were usable immediately after construction; with lazy init this becomes a regression and matches an unresolved earlier review comment.

To align behavior and fail fast with a clear message, factor the guard into a small private helper and reuse it in all four methods, e.g.:

@@
-    def get_message_content(self, response: Any) -> str:
-        """Extract message content from response"""
-        if not getattr(self, '_initialized', False) or self._llm is None:
-            raise RuntimeError(
-                'RootFloLLM is not initialized yet; call generate() or stream() first.'
-            )
-        return self._llm.get_message_content(response)
+    def _require_initialized(self) -> None:
+        if not getattr(self, '_initialized', False) or self._llm is None:
+            raise RuntimeError(
+                'RootFloLLM is not initialized yet; call generate() or stream() first.'
+            )
+
+    def get_message_content(self, response: Any) -> str:
+        """Extract message content from response"""
+        self._require_initialized()
+        return self._llm.get_message_content(response)
@@
     def format_tool_for_llm(self, tool: 'Tool') -> Dict[str, Any]:
         """Format a tool for the specific LLM's API"""
-        return self._llm.format_tool_for_llm(tool)
+        self._require_initialized()
+        return self._llm.format_tool_for_llm(tool)
@@
     def format_tools_for_llm(self, tools: List['Tool']) -> List[Dict[str, Any]]:
         """Format a list of tools for the specific LLM's API"""
-        return self._llm.format_tools_for_llm(tools)
+        self._require_initialized()
+        return self._llm.format_tools_for_llm(tools)
@@
     def format_image_in_message(self, image: ImageMessageContent) -> str:
         """Format a image in the message"""
-        return self._llm.format_image_in_message(image)
+        self._require_initialized()
+        return self._llm.format_image_in_message(image)

This preserves lazy initialization semantics while ensuring all helper entrypoints either work or raise the same explicit RuntimeError instead of a confusing AttributeError.

🧹 Nitpick comments (1)
flo_ai/flo_ai/llm/rootflo_llm.py (1)

16-23: Keep provider docs in sync with new VLLM support

Enum and wiring now support LLMProvider.VLLM, but the class docstring still says this proxies only “OpenAI, Gemini, and Anthropic”. Consider updating the docstring to include VLLM (and mention lazy config fetch) so callers don’t miss the new provider.

Also applies to: 27-29

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c7c4fb5 and 4b0910e.

📒 Files selected for processing (2)
  • flo_ai/flo_ai/helpers/llm_factory.py (4 hunks)
  • flo_ai/flo_ai/llm/rootflo_llm.py (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • flo_ai/flo_ai/helpers/llm_factory.py
🧰 Additional context used
🧬 Code graph analysis (1)
flo_ai/flo_ai/llm/rootflo_llm.py (5)
flo_ai/flo_ai/llm/base_llm.py (1)
  • BaseLLM (9-136)
flo_ai/flo_ai/llm/openai_llm.py (3)
  • OpenAI (17-209)
  • stream (119-149)
  • get_message_content (151-156)
flo_ai/flo_ai/llm/gemini_llm.py (3)
  • Gemini (19-260)
  • stream (151-206)
  • get_message_content (208-212)
flo_ai/flo_ai/llm/anthropic_llm.py (3)
  • Anthropic (18-277)
  • stream (155-200)
  • get_message_content (202-206)
flo_ai/flo_ai/llm/openai_vllm.py (1)
  • OpenAIVLLM (6-99)
🔇 Additional comments (4)
flo_ai/flo_ai/llm/rootflo_llm.py (4)

89-108: Lazy-init state and BaseLLM initialization look consistent

Storing ctor arguments on private fields, tracking _llm/_initialized, and creating a single _init_lock in __init__, while calling BaseLLM.__init__ with placeholder model/api_key, is a reasonable pattern given you overwrite self.model/self.api_key in _ensure_initialized. No functional issues spotted here.

Also applies to: 110-117


119-171: Async config fetch is robust and well-bounded

The _fetch_llm_config_async helper correctly uses httpx.AsyncClient with a timeout, raises on non-2xx via raise_for_status, validates presence of data/llm_model/type, and normalizes network vs. HTTP vs. generic failures into clear exception messages. For a one-time per-instance call this is perfectly adequate.


173-275: Lazy initialization and provider selection logic look correct

The double-checked locking with _init_lock, JWT generation (when access_token is absent), config fetch, mapping type → LLMProvider, construction of the per-provider wrapper (including custom_headers for all, VLLM included), and final _initialized flag set all look consistent and race-safe. The error message for unknown provider types is also clear.


277-298: generate/stream delegation matches lazy-init design

Having generate and stream await self._ensure_initialized() before delegating to self._llm keeps the public async API unchanged while safely deferring configuration. Argument forwarding (functions, output_schema, **kwargs) matches the BaseLLM and concrete provider signatures.

@vizsatiz vizsatiz merged commit 19b3395 into develop Nov 22, 2025
6 checks passed
@vizsatiz vizsatiz deleted the fix_rootflo_llm branch November 22, 2025 10:39