Skip to content

feat: add cleanup for stale function executions#11146

Merged
ChiragAgg5k merged 1 commit into1.8.xfrom
feat-cleanup-stale-executions
Jan 17, 2026
Merged

feat: add cleanup for stale function executions#11146
ChiragAgg5k merged 1 commit into1.8.xfrom
feat-cleanup-stale-executions

Conversation

@ChiragAgg5k
Copy link
Copy Markdown
Member

Summary

  • Adds a new interval task cleanupStaleExecutions that runs every 5 minutes
  • Marks executions stuck in processing status for more than 30 minutes as failed
  • Adds new environment variable _APP_INTERVAL_CLEANUP_STALE_EXECUTIONS (default: 300 seconds)

Test plan

  • Deploy and verify the interval task starts correctly
  • Create a function execution and verify it processes normally
  • Manually set an execution to processing with old $createdAt and verify cleanup marks it as failed

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Jan 16, 2026

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Adds a new environment variable _APP_INTERVAL_CLEANUP_STALE_EXECUTIONS=300 to .env and docker-compose.yml. Updates Interval::action signature to action(Database $dbForPlatform, callable $getProjectDB, Certificate $queueForCertificates) and adds a private cleanupStaleExecutions(Database $dbForPlatform, callable $getProjectDB): void method. The new task iterates projects, resolves per-project DB via the injected callable, finds executions with status processing older than 20 minutes, marks them failed with an error, and logs progress and per-project errors.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat: add cleanup for stale function executions' accurately and concisely summarizes the main change - introducing a new cleanup mechanism for stale function executions.
Description check ✅ Passed The description clearly outlines the feature being added, including specific details about the new interval task, the condition for marking executions as failed, and the environment variable configuration.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Jan 16, 2026

Security Scan Results for PR

Docker Image Scan Results

Package Version Vulnerability Severity
libpng 1.6.51-r0 CVE-2025-66293 HIGH
libpng-dev 1.6.51-r0 CVE-2025-66293 HIGH

Source Code Scan Results

🎉 No vulnerabilities found!

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adds automated cleanup functionality for stale function executions that get stuck in 'processing' status. The cleanup runs as a periodic task every 5 minutes and marks executions that have been stuck for more than 30 minutes as 'failed'.

Changes:

  • Added new cleanupStaleExecutions method to the Interval task that identifies and fails executions stuck in 'processing' status
  • Added environment variable _APP_INTERVAL_CLEANUP_STALE_EXECUTIONS with a default of 300 seconds (5 minutes)
  • Injected getProjectDB callable into the Interval task to enable per-project database operations

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

File Description
src/Appwrite/Platform/Tasks/Interval.php Implements stale execution cleanup logic, adds getProjectDB injection, and creates new cleanup method that runs every 5 minutes
docker-compose.yml Adds _APP_INTERVAL_CLEANUP_STALE_EXECUTIONS environment variable to the interval task service
.env Defines default value (300 seconds) for the new cleanup interval configuration

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Jan 16, 2026

✨ Benchmark results

  • Requests per second: 2,559
  • Requests with 200 status code: 460,628
  • P99 latency: 0.071155039

⚡ Benchmark Comparison

Metric This PR Latest version
RPS 2,559 1,306
200 460,628 235,164
P99 0.071155039 0.160954703

Adds a new interval task that marks executions stuck in 'processing'
status for more than 30 minutes as 'failed' with a timeout error.
@ChiragAgg5k ChiragAgg5k force-pushed the feat-cleanup-stale-executions branch from 84da96a to f5a61fb Compare January 16, 2026 11:51
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/Appwrite/Platform/Tasks/Interval.php`:
- Line 87: The stale threshold in Interval.php is incorrectly set to 20 minutes;
update the value used in DatabaseDateTime::addSeconds(new DateTime(), -1200) by
changing the offset to -1800 so $staleThreshold reflects 30 minutes (i.e., use
-1800 seconds) to match the PR description and tests.
♻️ Duplicate comments (3)
src/Appwrite/Platform/Tasks/Interval.php (3)

99-99: Consider using $updatedAt instead of $createdAt for stale detection.

An execution created 21+ minutes ago might have been in 'waiting' or 'scheduled' status for most of that time and only recently started processing. Using $updatedAt would more accurately identify executions that have been stuck in 'processing' status.


97-101: Pagination within project: only first 100 stale executions are cleaned per run.

If a project has more than 100 stale executions, only the first 100 will be processed. Consider implementing a loop to handle all stale executions or documenting this as expected behavior (relying on subsequent runs to catch the rest).


109-112: Execution update should include responseStatusCode for consistency.

When marking executions as failed due to timeout, setting responseStatusCode to 500 would maintain consistency with other failure scenarios in the codebase (e.g., in Functions.php and Executions/Create.php).

🧹 Nitpick comments (1)
.env (1)

104-105: Minor: Consider alphabetical ordering of environment variables.

The static analysis tool flags that _APP_INTERVAL_CLEANUP_STALE_EXECUTIONS should come before _APP_INTERVAL_DOMAIN_VERIFICATION alphabetically. However, the current grouping is also reasonable since both are interval-related configs.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 84da96a and f5a61fb.

📒 Files selected for processing (3)
  • .env
  • docker-compose.yml
  • src/Appwrite/Platform/Tasks/Interval.php
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-08T01:20:14.364Z
Learnt from: stnguyen90
Repo: appwrite/appwrite PR: 10119
File: app/controllers/api/account.php:1226-1232
Timestamp: 2025-07-08T01:20:14.364Z
Learning: In Appwrite, `_APP_DOMAIN` is a required environment variable that must always be set for the system to function properly.

Applied to files:

  • docker-compose.yml
🧬 Code graph analysis (1)
src/Appwrite/Platform/Tasks/Interval.php (1)
app/realtime.php (1)
  • getProjectDB (80-133)
🪛 dotenv-linter (4.0.0)
.env

[warning] 105-105: [UnorderedKey] The _APP_INTERVAL_CLEANUP_STALE_EXECUTIONS key should go before the _APP_INTERVAL_DOMAIN_VERIFICATION key

(UnorderedKey)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Setup & Build Appwrite Image
  • GitHub Check: Check if utopia-php/database changed
  • GitHub Check: Setup & Build Appwrite Image
  • GitHub Check: scan
🔇 Additional comments (5)
docker-compose.yml (1)

883-883: LGTM!

The new environment variable is correctly added to the appwrite-task-interval service, and all required database configuration variables are already present for the cleanup task to function properly.

src/Appwrite/Platform/Tasks/Interval.php (4)

26-29: LGTM!

The getProjectDB callable injection is correctly added alongside existing dependencies. This follows the established pattern seen in other Appwrite components like realtime.php.


38-50: LGTM!

The new cleanup interval follows the established goroutine pattern used for domain verification. The default 5-minute interval is appropriate for detecting stale executions without creating excessive database load.


114-116: Good error handling pattern.

The try-catch block correctly logs errors per-project without aborting the overall cleanup loop. This ensures a failure in one project doesn't prevent cleanup of other projects.


118-121: Verify pagination behavior of Database::foreach with Query::limit(100) in large deployments.

The concern about Query::limit(100) potentially skipping projects in large deployments is valid and warrants clarification. The Database::foreach method is from the external utopia-php/database library (v4.*), and its internal pagination handling cannot be definitively determined from the Appwrite codebase alone.

Check whether Database::foreach implements cursor-based pagination internally (like the foreachDocument helper in Action.php does) or if the limit acts as a hard ceiling. If it's the latter, the limit should be removed or increased to ensure all projects are processed on each cleanup cycle.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

private function cleanupStaleExecutions(Database $dbForPlatform, callable $getProjectDB): void
{
$time = DatabaseDateTime::now();
$staleThreshold = DatabaseDateTime::addSeconds(new DateTime(), -1200); // 20 minutes ago
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Threshold mismatch: Code uses 20 minutes but PR description specifies 30 minutes.

The stale threshold is set to 1200 seconds (20 minutes), but the PR description and test plan state that executions should be marked as failed after 30 minutes. This inconsistency should be resolved.

Proposed fix to align with PR description (30 minutes)
-        $staleThreshold = DatabaseDateTime::addSeconds(new DateTime(), -1200); // 20 minutes ago
+        $staleThreshold = DatabaseDateTime::addSeconds(new DateTime(), -1800); // 30 minutes ago
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
$staleThreshold = DatabaseDateTime::addSeconds(new DateTime(), -1200); // 20 minutes ago
$staleThreshold = DatabaseDateTime::addSeconds(new DateTime(), -1800); // 30 minutes ago
🤖 Prompt for AI Agents
In `@src/Appwrite/Platform/Tasks/Interval.php` at line 87, The stale threshold in
Interval.php is incorrectly set to 20 minutes; update the value used in
DatabaseDateTime::addSeconds(new DateTime(), -1200) by changing the offset to
-1800 so $staleThreshold reflects 30 minutes (i.e., use -1800 seconds) to match
the PR description and tests.

@ChiragAgg5k ChiragAgg5k merged commit ea76213 into 1.8.x Jan 17, 2026
44 checks passed
@ChiragAgg5k ChiragAgg5k deleted the feat-cleanup-stale-executions branch January 17, 2026 07:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants