feat: add cleanup for stale function executions#11146
Conversation
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. 📝 WalkthroughWalkthroughAdds a new environment variable Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Security Scan Results for PRDocker Image Scan Results
Source Code Scan Results🎉 No vulnerabilities found! |
There was a problem hiding this comment.
Pull request overview
This pull request adds automated cleanup functionality for stale function executions that get stuck in 'processing' status. The cleanup runs as a periodic task every 5 minutes and marks executions that have been stuck for more than 30 minutes as 'failed'.
Changes:
- Added new
cleanupStaleExecutionsmethod to the Interval task that identifies and fails executions stuck in 'processing' status - Added environment variable
_APP_INTERVAL_CLEANUP_STALE_EXECUTIONSwith a default of 300 seconds (5 minutes) - Injected
getProjectDBcallable into the Interval task to enable per-project database operations
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| src/Appwrite/Platform/Tasks/Interval.php | Implements stale execution cleanup logic, adds getProjectDB injection, and creates new cleanup method that runs every 5 minutes |
| docker-compose.yml | Adds _APP_INTERVAL_CLEANUP_STALE_EXECUTIONS environment variable to the interval task service |
| .env | Defines default value (300 seconds) for the new cleanup interval configuration |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
✨ Benchmark results
⚡ Benchmark Comparison
|
Adds a new interval task that marks executions stuck in 'processing' status for more than 30 minutes as 'failed' with a timeout error.
84da96a to
f5a61fb
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/Appwrite/Platform/Tasks/Interval.php`:
- Line 87: The stale threshold in Interval.php is incorrectly set to 20 minutes;
update the value used in DatabaseDateTime::addSeconds(new DateTime(), -1200) by
changing the offset to -1800 so $staleThreshold reflects 30 minutes (i.e., use
-1800 seconds) to match the PR description and tests.
♻️ Duplicate comments (3)
src/Appwrite/Platform/Tasks/Interval.php (3)
99-99: Consider using$updatedAtinstead of$createdAtfor stale detection.An execution created 21+ minutes ago might have been in 'waiting' or 'scheduled' status for most of that time and only recently started processing. Using
$updatedAtwould more accurately identify executions that have been stuck in 'processing' status.
97-101: Pagination within project: only first 100 stale executions are cleaned per run.If a project has more than 100 stale executions, only the first 100 will be processed. Consider implementing a loop to handle all stale executions or documenting this as expected behavior (relying on subsequent runs to catch the rest).
109-112: Execution update should includeresponseStatusCodefor consistency.When marking executions as failed due to timeout, setting
responseStatusCodeto 500 would maintain consistency with other failure scenarios in the codebase (e.g., in Functions.php and Executions/Create.php).
🧹 Nitpick comments (1)
.env (1)
104-105: Minor: Consider alphabetical ordering of environment variables.The static analysis tool flags that
_APP_INTERVAL_CLEANUP_STALE_EXECUTIONSshould come before_APP_INTERVAL_DOMAIN_VERIFICATIONalphabetically. However, the current grouping is also reasonable since both are interval-related configs.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
.envdocker-compose.ymlsrc/Appwrite/Platform/Tasks/Interval.php
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-08T01:20:14.364Z
Learnt from: stnguyen90
Repo: appwrite/appwrite PR: 10119
File: app/controllers/api/account.php:1226-1232
Timestamp: 2025-07-08T01:20:14.364Z
Learning: In Appwrite, `_APP_DOMAIN` is a required environment variable that must always be set for the system to function properly.
Applied to files:
docker-compose.yml
🧬 Code graph analysis (1)
src/Appwrite/Platform/Tasks/Interval.php (1)
app/realtime.php (1)
getProjectDB(80-133)
🪛 dotenv-linter (4.0.0)
.env
[warning] 105-105: [UnorderedKey] The _APP_INTERVAL_CLEANUP_STALE_EXECUTIONS key should go before the _APP_INTERVAL_DOMAIN_VERIFICATION key
(UnorderedKey)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Setup & Build Appwrite Image
- GitHub Check: Check if utopia-php/database changed
- GitHub Check: Setup & Build Appwrite Image
- GitHub Check: scan
🔇 Additional comments (5)
docker-compose.yml (1)
883-883: LGTM!The new environment variable is correctly added to the
appwrite-task-intervalservice, and all required database configuration variables are already present for the cleanup task to function properly.src/Appwrite/Platform/Tasks/Interval.php (4)
26-29: LGTM!The
getProjectDBcallable injection is correctly added alongside existing dependencies. This follows the established pattern seen in other Appwrite components likerealtime.php.
38-50: LGTM!The new cleanup interval follows the established goroutine pattern used for domain verification. The default 5-minute interval is appropriate for detecting stale executions without creating excessive database load.
114-116: Good error handling pattern.The try-catch block correctly logs errors per-project without aborting the overall cleanup loop. This ensures a failure in one project doesn't prevent cleanup of other projects.
118-121: Verify pagination behavior ofDatabase::foreachwithQuery::limit(100)in large deployments.The concern about
Query::limit(100)potentially skipping projects in large deployments is valid and warrants clarification. TheDatabase::foreachmethod is from the external utopia-php/database library (v4.*), and its internal pagination handling cannot be definitively determined from the Appwrite codebase alone.Check whether
Database::foreachimplements cursor-based pagination internally (like theforeachDocumenthelper in Action.php does) or if the limit acts as a hard ceiling. If it's the latter, the limit should be removed or increased to ensure all projects are processed on each cleanup cycle.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
| private function cleanupStaleExecutions(Database $dbForPlatform, callable $getProjectDB): void | ||
| { | ||
| $time = DatabaseDateTime::now(); | ||
| $staleThreshold = DatabaseDateTime::addSeconds(new DateTime(), -1200); // 20 minutes ago |
There was a problem hiding this comment.
Threshold mismatch: Code uses 20 minutes but PR description specifies 30 minutes.
The stale threshold is set to 1200 seconds (20 minutes), but the PR description and test plan state that executions should be marked as failed after 30 minutes. This inconsistency should be resolved.
Proposed fix to align with PR description (30 minutes)
- $staleThreshold = DatabaseDateTime::addSeconds(new DateTime(), -1200); // 20 minutes ago
+ $staleThreshold = DatabaseDateTime::addSeconds(new DateTime(), -1800); // 30 minutes ago📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| $staleThreshold = DatabaseDateTime::addSeconds(new DateTime(), -1200); // 20 minutes ago | |
| $staleThreshold = DatabaseDateTime::addSeconds(new DateTime(), -1800); // 30 minutes ago |
🤖 Prompt for AI Agents
In `@src/Appwrite/Platform/Tasks/Interval.php` at line 87, The stale threshold in
Interval.php is incorrectly set to 20 minutes; update the value used in
DatabaseDateTime::addSeconds(new DateTime(), -1200) by changing the offset to
-1800 so $staleThreshold reflects 30 minutes (i.e., use -1800 seconds) to match
the PR description and tests.
Summary
cleanupStaleExecutionsthat runs every 5 minutesprocessingstatus for more than 30 minutes asfailed_APP_INTERVAL_CLEANUP_STALE_EXECUTIONS(default: 300 seconds)Test plan
processingwith old$createdAtand verify cleanup marks it asfailed